Inequalities on partial correlations in Gaussian graphical models containing star shapes

ABSTRACT This short paper proves inequalities that restrict the magnitudes of the partial correlations in star-shaped structures in Gaussian graphical models. These inequalities have to be satisfied by distributions that are used for generating simulated data to test structure-learning algorithms, but methods that have been used to create such distributions do not always ensure that they are. The inequalities are also noteworthy because stars are common and meaningful in real-world networks.


Introduction and definitions
Networks that model real-world phenomena often have few edges but several hubs, which are vertices that are connected to many others. Hubs are often important. For example, when Gaussian graphical models (GGMs) are used to model gene regulation networks, as for example in Friedman et al. (2000), Roverato (2006, 2009), and Edwards et al. (2010), hubs are likely to correspond to genes that code for transcription factors that regulate other genes. If a GGM is used to model currency values, as in Carvalho et al. (2007), then a hub might correspond to a country that has large-scale trade with many others.
In a GGM the strength of the direct association between two vertices is measured by the magnitude of their partial correlation. The partial correlation is the correlation between the two vertices given the values of all the other vertices, and if there is no edge between the vertices then the partial correlation is zero (Lauritzen, 1996, section 5.1). This paper shows that the magnitudes of the partial correlations are always small, in a certain sense, in the case of a star, which is a structure that consists of a hub and a set of vertices that have edges to the hub but not to each other. (We use the term "star" because the results do not apply to a hub where two or more of the vertices that have edges to the hub also have edges to each other.)
Several definitions and relations will be needed. Let $G = (V, E)$ be an undirected graph with $V = \{1, \ldots, n\}$, let $X \sim N_n(\mu, \Sigma)$, and suppose that $G$ is the graph for a graphical model that includes the distribution of $X$; this means that if $(i, j) \notin E$ then $X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}}$ ($X_i$ is conditionally independent of $X_j$ given $X_{V \setminus \{i, j\}}$). For the multivariate Gaussian distribution, $X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i, j\}} \Leftrightarrow \omega_{ij} = 0$, where $\Omega = \Sigma^{-1}$ is the precision matrix. Let $M$ be the standardized precision matrix, which means that $M = D^{-1/2} \Omega D^{-1/2}$, where $d_{ij} = 0$ for $i \neq j$ and $d_{ii} = \omega_{ii}$. It follows that $m_{ij} = \omega_{ij} / \sqrt{\omega_{ii} \omega_{jj}}$ and $-1 \le m_{ij} \le 1$. The partial correlation between $X_i$ and $X_j$ is then $p_{ij} = -m_{ij}$, for $i \neq j$.
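The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the example precision matrix are chosen here and do not come from the paper, and putting 1 on the diagonal of the partial correlation matrix is a convention assumed for convenience.

```python
import math

def standardized_precision(omega):
    """Return M with m_ij = omega_ij / sqrt(omega_ii * omega_jj)."""
    n = len(omega)
    return [[omega[i][j] / math.sqrt(omega[i][i] * omega[j][j]) for j in range(n)]
            for i in range(n)]

def partial_correlations(omega):
    """Return P with p_ij = -m_ij for i != j (diagonal set to 1 by convention)."""
    m = standardized_precision(omega)
    n = len(m)
    return [[1.0 if i == j else -m[i][j] for j in range(n)] for i in range(n)]

# Illustrative precision matrix for a three-vertex star with hub 1
# (arbitrary values; the zero entries encode the absent edge {2, 3}).
omega_example = [
    [4.0, -1.0, 2.0],
    [-1.0, 1.0, 0.0],
    [2.0, 0.0, 4.0],
]
p_example = partial_correlations(omega_example)
```

With these values the partial correlations on the two edges are 0.5 and -0.5, and the partial correlation for the absent edge is zero.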

Sylvester's criterion
Sylvester's criterion states that a symmetric matrix is positive-definite if and only if the determinants of all its square upper-left submatrices are positive; these determinants are called the leading principal minors of the matrix. The origins of this result are obscure but some light is shed on them by Smith (2008). A proof is given in Gilbert (1991).
For GGMs it is common to assume that $\Sigma$ is positive-definite, which is equivalent to the support of $X$ being $\mathbb{R}^n$. This assumption is made in the propositions below. It is also equivalent to $\Omega$ or $M$ being positive-definite.
If $\Sigma$ is not assumed to be positive-definite, it must at least be positive-semidefinite. It might be conjectured that Sylvester's criterion could be adapted to this case, to state that a matrix is positive-semidefinite if and only if all its leading principal minors are non-negative. But this is not true, and a counterexample is given by Swamy (1973).
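To see concretely why non-negative leading principal minors are not enough, one small standard example is the following symmetric matrix. It is not necessarily the counterexample given by Swamy (1973), and the name $A$ is introduced here only for illustration.

```latex
A = \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
\det(A_{1}) = 0, \quad \det(A_{2}) = \det(A) = 0, \qquad
x^{T} A x = -1 < 0 \ \text{ for } x = (0, 1)^{T}.
```

Both leading principal minors are non-negative, yet $A$ is not positive-semidefinite. A correct analogue of the criterion requires all the principal minors to be non-negative, not only the leading ones.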

The inequalities
This section presents four inequalities. The first two are the main results and the other two are corollaries. Proposition 1 is about graphs that consist entirely of a single star-structure.
Proposition 1. Suppose that $E = \{(1, i) : i \in \{2, \ldots, n\}\}$, so that $G$ is a star-shaped graph centered at vertex 1. Then a necessary and sufficient condition for $M$ to be positive-definite is that $\sum_{i=2}^{n} p_{1i}^2 < 1$.
Proof. One of the square upper-left submatrices of $M$ is the whole matrix $M$ itself. The determinant of $M$ is $1 - \sum_{i=2}^{n} m_{1i}^2$, and this being positive is equivalent to $\sum_{i=2}^{n} m_{1i}^2 < 1$. If this inequality holds then the other determinants in Sylvester's criterion are also positive, because the upper-left $k \times k$ submatrix of $M$ has determinant $1 - \sum_{i=2}^{k} m_{1i}^2 \ge 1 - \sum_{i=2}^{n} m_{1i}^2$. So this inequality on its own is a necessary and sufficient condition for $M$ being positive-definite, and it is obviously equivalent to $\sum_{i=2}^{n} p_{1i}^2 < 1$.
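The determinant formula in this proof can be checked numerically. The following sketch builds the standardized precision matrix of a star and compares its leading principal minors with the closed form; the helper names and the two sets of example values are assumptions made for illustration, not part of the paper.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def star_matrix(hub_entries):
    """M for a star with hub 1, where hub_entries holds m_12, ..., m_1n."""
    n = len(hub_entries) + 1
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, value in enumerate(hub_entries, start=1):
        m[0][k] = m[k][0] = value
    return m

def leading_principal_minors(matrix):
    """Determinants of the upper-left k x k submatrices, k = 1, ..., n."""
    return [determinant([row[:k] for row in matrix[:k]])
            for k in range(1, len(matrix) + 1)]

def satisfies_proposition_1(hub_entries):
    """True if the sum of squared partial correlations is below 1."""
    return sum(v * v for v in hub_entries) < 1.0

inside = [-0.5, 0.5, 0.5]    # sum of squares 0.75, so M is positive-definite
outside = [-0.6, 0.6, 0.6]   # sum of squares 1.08, so M is not positive-definite
minors_inside = leading_principal_minors(star_matrix(inside))
minors_outside = leading_principal_minors(star_matrix(outside))
```

For the first set of values every leading principal minor is positive and the last one equals $1 - 0.75$. For the second set the last minor is negative, in agreement with the proposition.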
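Proposition 2 is about graphs that contain star-shaped subgraphs. It states that if $G$ contains a star as an induced subgraph (Lauritzen, 1996, section 2.1.1), then an inequality similar to the one in Proposition 1 holds in that subgraph as a necessary condition. In the notation used below, the star has hub $i$ and other vertices $j_1, \ldots, j_s$, and the necessary condition for $M$ to be positive-definite is $\sum_{a=1}^{s} p_{i j_a}^2 < 1$. This follows from Proposition 1 because every principal submatrix of a positive-definite matrix is itself positive-definite.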
Proposition 3 is a corollary of Proposition 2 in which the inequality may be easier to interpret.
Proposition 3. If the graph is as in Proposition 2, then the mean magnitude of the partial correlations $p_{i j_1}, \ldots, p_{i j_s}$ must be less than $1/\sqrt{s}$.
Proof. Suppose that $q_1, \ldots, q_s \ge 0$ and $\sum_{a=1}^{s} q_a^2 = 1$. The method of Lagrange multipliers can be used to show that the maximum value of $(\sum_{a=1}^{s} q_a)/s$ is $1/\sqrt{s}$. Proposition 2 implies that $|p_{i j_1}|, \ldots, |p_{i j_s}|$ satisfy the same conditions as $q_1, \ldots, q_s$, except that the equals sign is replaced by a less-than sign. It follows that $(\sum_{a=1}^{s} |p_{i j_a}|)/s < 1/\sqrt{s}$.
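The Lagrange multiplier step in this proof can be written out as follows. This is a sketch of one way to do the calculation; the symbol $\mathcal{L}$ for the Lagrangian is introduced here.

```latex
\mathcal{L}(q, \lambda) = \frac{1}{s} \sum_{a=1}^{s} q_a - \lambda \Bigl( \sum_{a=1}^{s} q_a^2 - 1 \Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial q_a} = \frac{1}{s} - 2 \lambda q_a = 0
\;\Rightarrow\; q_a = \frac{1}{2 \lambda s} \ \text{ for all } a,

\sum_{a=1}^{s} q_a^2 = 1 \;\Rightarrow\; q_a = \frac{1}{\sqrt{s}},
\qquad
\frac{1}{s} \sum_{a=1}^{s} q_a = \frac{1}{s} \cdot \frac{s}{\sqrt{s}} = \frac{1}{\sqrt{s}}.
```

The same bound also follows from the Cauchy-Schwarz inequality, $\sum_{a=1}^{s} q_a \le \sqrt{s} \, (\sum_{a=1}^{s} q_a^2)^{1/2}$, which gives the strict inequality directly when $\sum_{a=1}^{s} q_a^2 < 1$.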
For example, in a star with $s$ edges, at least one of the edges must have a partial correlation whose magnitude is less than $\sqrt{1/s}$. In any graph that contains a V-shape (three vertices with two edges), which means any graph that does not consist entirely of disjoint cliques, there must be at least one partial correlation on an edge that has magnitude less than $\sqrt{1/2} \approx 0.707$.
Two classes of graphs that contain many stars are forests and trees. Forests can be defined as graphs that have no cycles, and trees are connected forests. These are very restricted classes of graphs, but forest and tree graphical models have several advantages and have been widely studied and used (Willsky et al., 2002; Meilă and Jaakkola, 2006; Eaton and Murphy, 2007; Edwards et al., 2010; Anandkumar et al., 2012). If $G$ is a forest or tree, then Proposition 2 holds with $i$ as any of the vertices. In Proposition 4 this fact is used to derive an inequality on the partial correlations throughout the graph.
Proof. Apply Proposition 2 to all the vertices in $V \setminus L$ in turn, and sum all these $n - |L|$ inequalities. For each $(i, j) \in E$, if $i \notin L$ and $j \notin L$ then $p_{ij}^2$ will appear twice on the left-hand side, if $i \in L$ or $j \in L$ but not both then it will appear once, and if both $i \in L$ and $j \in L$ then it will not appear at all.
It might be conjectured that if the inequality in Proposition 2 is satisfied for all stars that appear as induced subgraphs of G, then M is positive-definite. But this does not hold. For shapes other than stars, there do not seem to be any inequalities that are as notable as Propositions 1 and 2. It is possible to write down the inequalities that result from Sylvester's criterion, but it is generally not easy to rearrange them into a meaningful form.
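A small numerical illustration of why the conjecture fails is a path on four vertices with every partial correlation equal to $-0.7$. This example is chosen here for illustration and is not taken from the paper; the function names are likewise assumptions. The only induced stars with more than one edge are centered at the two interior vertices, and each satisfies the inequality in Proposition 2, yet the last leading principal minor of $M$ is negative.

```python
def leading_principal_minors_tridiagonal(off_diagonal):
    """Leading principal minors of a symmetric tridiagonal matrix with unit diagonal.

    Uses the recurrence D_k = D_{k-1} - m_{k-1,k}^2 * D_{k-2}, with D_0 = D_1 = 1.
    """
    minors = [1.0]      # D_1
    previous = 1.0      # D_0
    for m in off_diagonal:
        current = minors[-1] - m * m * previous
        previous = minors[-1]
        minors.append(current)
    return minors

def star_inequalities_hold_on_path(off_diagonal):
    """Check the Proposition 2 inequality at every interior vertex of the path."""
    return all(a * a + b * b < 1.0
               for a, b in zip(off_diagonal, off_diagonal[1:]))

# Path 1 - 2 - 3 - 4 with m_12 = m_23 = m_34 = 0.7, so each p_ij is -0.7.
path_entries = [0.7, 0.7, 0.7]
path_minors = leading_principal_minors_tridiagonal(path_entries)
```

Here each interior vertex gives $0.49 + 0.49 = 0.98 < 1$, but the determinant of the full $4 \times 4$ matrix is about $-0.23$, so by Sylvester's criterion $M$ is not positive-definite.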
The conditional independence relations shown by graphs also imply conditions on the marginal correlations (which are usually just called correlations). These can be found by inverting $M$ and then standardizing to find the correlation matrix $C$. For example, if the graph is as in Proposition 1 then $c_{jk} = c_{1j} c_{1k}$ for all distinct $j, k \in \{2, \ldots, n\}$. This is a special case of the fact that the correlation between two vertices in a tree is the product of the correlations along the edges that connect them (Pearl, 1988, section 8.3.4; Tan et al., 2010).
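For the star in Proposition 1 the product rule can be verified directly from the block form of $M$. The following is a sketch of that calculation, writing $m = (m_{12}, \ldots, m_{1n})^T$ and introducing the shorthand $\delta = 1 - m^T m$, which is positive when $M$ is positive-definite; the symbols $m$ and $\delta$ are notation assumed here.

```latex
M = \begin{pmatrix} 1 & m^{T} \\ m & I_{n-1} \end{pmatrix},
\qquad
M^{-1} = \frac{1}{\delta} \begin{pmatrix} 1 & -m^{T} \\ -m & \delta I_{n-1} + m m^{T} \end{pmatrix},

c_{1j} = \frac{-m_{1j}/\delta}{\sqrt{(1/\delta)\,(1 + m_{1j}^2/\delta)}}
       = \frac{-m_{1j}}{\sqrt{\delta + m_{1j}^2}},
\qquad
c_{jk} = \frac{m_{1j} m_{1k}/\delta}{\sqrt{(1 + m_{1j}^2/\delta)(1 + m_{1k}^2/\delta)}}
       = \frac{m_{1j} m_{1k}}{\sqrt{(\delta + m_{1j}^2)(\delta + m_{1k}^2)}}
       = c_{1j} c_{1k} \quad (j \neq k).
```

Since $p_{1j} = -m_{1j}$, the first expression can also be written as $c_{1j} = p_{1j} / \sqrt{\delta + p_{1j}^2}$.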

Relevance to experiments on structure-learning algorithms
Proposition 2 arises when doing a certain type of experiment on algorithms for learning the structure of GGMs from data. In these experiments, simulated data is generated from a multivariate Gaussian distribution that corresponds to a known graph, then the structure-learning algorithm is used on the data, and finally the output of the algorithm is compared with the original graph. The first step in making the simulated data is to create a covariance matrix that corresponds to the original graph, and naturally this has to fulfil the inequality in Proposition 2.
Numerous publications describe experiments of this type, for example Friedman et al. (2007), Moghaddam et al. (2009), Albieri (2010), Wang and Li (2012), Wang (2012), and Green and Thomas (2013). Most of these do not mention the issue of ensuring that $\Sigma$ is positive-definite, suggesting that it was not a problem. One experiment that does mention the issue appears in Meinshausen and Bühlmann (2006). They used large graphs whose vertices have maximum degree 4, and chose all the partial correlations to be 0.245. They state without proof that absolute values less than 0.25 guarantee that $\Sigma$ is positive-definite. This condition is stronger than Proposition 2, which for a star with four edges implies only that the mean absolute value has to be less than 0.5.
One detailed procedure for creating $\Omega$ is described for the first example in section 4.1 of Guo et al. (2011). This procedure presumably gave positive-definite matrices when it was used for this example, with $n = 100$ and small numbers of extra edges, but it does not always do so. To show how this procedure sometimes fails, it is convenient to start by considering specific values from the uniform distributions that are used, though obviously the probability that these exact values would be drawn is zero. Suppose that $n \ge 4$, $s_i - s_{i-1} = 0.9$ for $i = 2, \ldots, n$, the extra edges include $\{1, 3\}$ and $\{1, 4\}$ but no other edges between any of the first five vertices (the first four if $n = 4$), and the corresponding four new elements of $\Omega$ are all 0.95. The exact value of $\Omega$ can now be calculated by using equation 3.2 in Barrett (1979) to calculate the tridiagonal matrix and then adding the new elements. For all $n \ge 6$ the upper-left $5 \times 5$ submatrix of $\Omega$ is the same, and its determinant is negative, which means that $\Omega$ is not positive-definite; the cases $n = 4$ and $n = 5$ can be checked separately. This argument still holds if all the instances of 0.9 and 0.95 are replaced by slightly different values, because if the determinant of the upper-left $5 \times 5$ submatrix is written as a function of $s_1, \ldots, s_n$, $\omega_{13}$, and $\omega_{14}$, then this function is continuous. It follows that the procedure fails with positive probability for all $n \ge 4$, assuming that it is possible for the new edges between the first five vertices to be as in this counterexample.

Discussion
Proposition 1 is a necessary and sufficient condition for the covariance matrix to be positive-definite, but it only applies to graphs that consist of a single star-structure. Proposition 2 applies much more widely, to graphs that contain star-structures, but it is only necessary, not sufficient. Nevertheless, Proposition 2 is useful and important in practice. When creating a covariance matrix it is natural to want to choose specific values for the partial correlations, and Proposition 2 places strong restrictions on what these can be.
Proposition 2 has several other interesting consequences or interpretations. If $X_1$ has sufficiently strong direct associations (partial correlations) with $X_2$ and $X_3$ then there must also be a direct association between $X_2$ and $X_3$. On the other hand, if $X_2$ and $X_3$ are both almost deterministic functions of $X_1$ (and not of each other), then the marginal correlations $c_{12}$ and $c_{13}$ will be close to 1 or $-1$, but at least one of the partial correlations $p_{12}$ and $p_{13}$ must have magnitude less than $1/\sqrt{2}$. Obviously both of these consequences generalize to larger $n$.
The proofs of Propositions 1 and 2 are straightforward applications of Sylvester's criterion. This criterion is well known in some fields but does not seem to have been previously used or even mentioned in connection with partial correlation matrices for GGMs.
Since Proposition 2 is not a sufficient condition, the question arises of how to create a possible covariance matrix for an arbitrary given graph. There are several methods that are guaranteed to work, though these do not easily allow specific values to be chosen for the partial correlations or elements of the covariance matrix. One method is described in the appendix of Roverato (2002). This uses the Cholesky decomposition $\Omega = \Phi^T \Phi$, where $\Phi$ is an upper-triangular matrix. The diagonal elements of $\Phi$ and the elements that correspond to edges in the graph can be chosen freely, and the other elements have to be calculated according to Roverato's equation (10). For decomposable graphs, the calculations for this second set of elements can be avoided: if the vertices are ordered according to a perfect vertex elimination scheme (Lauritzen, 1996, section 2.1.3), then these elements are all zero.
An alternative method to create a covariance matrix for any graph is as follows. Start with any $n \times n$ symmetric matrix in which the diagonal elements are positive and the elements corresponding to absent edges are zero, find its eigenvalues, and if any of these are negative then let $-\lambda$ be the lowest one and add $(\lambda + \epsilon) I_n$ to the matrix, for some $\epsilon > 0$. The resulting matrix's eigenvalues are all positive, which means that it is positive-definite, and it still has the symmetry and the zeroes in the same places.
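The eigenvalue-shift method just described might be sketched as follows, using NumPy. The function name, the default value of `epsilon`, and the four-cycle example are assumptions made for illustration. The sketch also shifts when the lowest eigenvalue is exactly zero, which is a small extension of the description above, and it treats the starting matrix as a candidate precision matrix, so the covariance matrix would be obtained by inverting the result.

```python
import numpy as np

def shift_to_positive_definite(matrix, epsilon=0.1):
    """Add (lambda + epsilon) * I when the lowest eigenvalue -lambda is not positive."""
    a = np.asarray(matrix, dtype=float)
    lowest = np.linalg.eigvalsh(a).min()   # eigenvalues of a symmetric matrix
    if lowest <= 0:
        a = a + (-lowest + epsilon) * np.eye(a.shape[0])
    return a

# Four-cycle 1 - 2 - 3 - 4 - 1: zeros for the absent edges {1, 3} and {2, 4}.
start = np.array([
    [1.0, 0.9, 0.0, 0.9],
    [0.9, 1.0, 0.9, 0.0],
    [0.0, 0.9, 1.0, 0.9],
    [0.9, 0.0, 0.9, 1.0],
])
shifted = shift_to_positive_definite(start)
```

The shift leaves the off-diagonal entries unchanged but enlarges the diagonal, so the standardized entries $m_{ij}$ and hence the partial correlations shrink. This illustrates why the method does not allow specific partial correlations to be chosen in advance.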
Propositions 1-4 also apply to directed acyclic graphical models (also known as Bayesian networks), because stars in undirected graphical models are equivalent to stars in directed acyclic graphical models, if the edges are all directed from the hub to the other vertices. The edges are oriented like this if the hub corresponds to a gene that codes for a transcription factor, for example.
Inequalities that are essentially the same as Propositions 1-4 also apply to covariance graphical models (Wermuth and Cox, 2001), in which an edge that is absent from the graph means that the two variables are marginally independent (rather than conditionally independent as in GGMs) and corresponds to zeroes in the covariance and correlation matrices (rather than the precision and partial correlation matrices). The four propositions hold for these models if $M$ is just replaced by the correlation matrix and $p_{ij}$ is replaced by the correlation between $X_i$ and $X_j$. Covariance graphical models are an active topic of research (Chaudhuri et al., 2008; Drton and Richardson, 2008; El Karoui, 2008; Bien and Tibshirani, 2011; Wang, 2014; Wang, 2015) and can be used to analyze gene expression data, protein networks, and financial data.