Inference in Graphical Gaussian Models with Edge and Vertex Symmetries with the gRc package for R

In this paper we present the R package gRc for statistical inference in graphical Gaussian models in which symmetry restrictions have been imposed on the concentration or partial correlation matrix. The models are represented by coloured graphs where parameters associated with edges or vertices of the same colour are restricted to being identical. We describe algorithms for maximum likelihood estimation and discuss model selection issues. The paper illustrates the practical use of the gRc package.


Introduction
This paper describes the R (R Development Core Team 2007) package gRc for statistical inference in a class of graphical Gaussian models with edge and vertex symmetries as introduced by Højsgaard and Lauritzen (2007); see also Højsgaard and Lauritzen (2005). The models generalise graphical Gaussian models (hereafter abbreviated GGMs) (Whittaker 1990; Lauritzen 1996), also known as covariance selection models (Dempster 1972).
There are two types of models available in gRc. In one type, denoted RCON models, selected elements of the concentration matrix (the inverse covariance matrix) are restricted to being identical. These models are all linear in the inverse covariance matrix and are therefore instances of the models discussed by Anderson (1970). In the other class, denoted RCOR models, it is the partial correlations rather than the concentrations which are restricted to being equal.
We use RCOX models as a generic term for both types. The gRc package is part of the gR initiative (Lauritzen 2002), which aims to make graphical models available in R.

Graph colouring
Consider an undirected graph G = (V, E). Colouring the vertices of G with R ≤ |V| different colours induces a partitioning of V into disjoint sets V_1, …, V_R called vertex colour classes, where all vertices in V_r have the same colour. Here |V| denotes the number of elements in V. A similar colouring of the edges E with S ≤ |E| different colours yields a partitioning of E into disjoint sets E_1, …, E_S called edge colour classes, where all edges in E_s have the same colour. We say that V = {V_1, …, V_R} is a vertex colouring and E = {E_1, …, E_S} is an edge colouring.
A colour class with only one element is said to be atomic. A colour class which is not atomic is composite. A set a ⊂ V is called neutral if its induced subgraph has only atomic colour classes.
When drawing vertices/edges we make the convention that black and white are used for atomic colour classes. Thus two edges displayed in black will be in different (atomic) colour classes.

Graphical Gaussian models
Graphical Gaussian models are concerned with the distribution of a multivariate random vector Y = (Y_α)_{α∈V} following a N_d(µ, Σ) distribution where d = |V|. For simplicity we assume throughout that µ = 0. In the following we use Greek letters to refer to single variables and Latin letters to refer to sets of variables. We let K = Σ⁻¹ denote the inverse covariance, also known as the concentration, with elements (k_αβ)_{α,β∈V}. The partial correlation between Y_α and Y_β given all other variables is then

ρ_{αβ|V\{α,β}} = −k_αβ / √(k_αα k_ββ).   (1)

Thus k_αβ = 0 if and only if Y_α and Y_β are conditionally independent given all other variables.
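The relation (1) between concentrations and partial correlations is easy to verify numerically. The following is a minimal numpy sketch (not part of gRc; the function name and the toy matrix are illustrative) that computes the matrix of partial correlations from a concentration matrix:

```python
import numpy as np

def partial_correlations(K):
    """Partial correlations from a concentration matrix K, cf. (1):
    rho_{ab|rest} = -k_ab / sqrt(k_aa * k_bb)."""
    d = np.sqrt(np.diag(K))
    R = -K / np.outer(d, d)
    np.fill_diagonal(R, 1.0)   # convention for the diagonal
    return R

# k_02 = 0, so variables 0 and 2 are conditionally
# independent given variable 1
K = np.array([[ 2.0, -0.8,  0.0],
              [-0.8,  2.5, -0.6],
              [ 0.0, -0.6,  1.5]])
R = partial_correlations(K)
```

A zero concentration gives a zero partial correlation, matching the conditional independence interpretation in the text.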
A graphical Gaussian model (hereafter abbreviated GGM) is represented by an undirected graph G = (V, E) where V is a set of vertices representing the variables and E is a set of undirected edges. The graph represents the model with K being a positive definite matrix having k_αβ = 0 whenever there is no edge between α and β in G.

RCON models -Restricted CONcentration models
An RCON model with vertex colour classes V and edge colour classes E is obtained by restricting the elements of K = Σ⁻¹ further as follows: 1) all partial variances (i.e. all diagonal elements of K) corresponding to vertices in the same vertex colour class must be identical; 2) all off-diagonal entries of K corresponding to edges in the same edge colour class must be identical. Thus the diagonal of K can be specified by an R-dimensional vector η while the off-diagonal elements are given by an S-dimensional vector δ, so we can write K = K(η, δ). Figure 1 (a) thereby represents a concentration matrix of this restricted form.

RCOR models -Restricted partial CORrelation models
An RCOR model with vertex classes V and edge classes E is obtained by restricting the elements of K = Σ⁻¹ as follows: 1) all partial variances corresponding to vertices in the same vertex colour class must be identical; 2) all partial correlations corresponding to edges in the same edge colour class must be identical.
As an RCOR model, Figure 1 (b) represents a concentration matrix K written as K = ACA, where A is a diagonal matrix with entries a_αα = √(k_αα) and C is a symmetric matrix with ones on the diagonal and off-diagonal entries c_αβ = k_αβ/√(k_αα k_ββ).
Hence from (1), A contains the inverse partial standard deviations on the diagonal while C contains minus the partial correlations on the off-diagonal. The vertex colour classes of an RCOR model thus restrict elements of A whereas the edge colour classes restrict elements of C.
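The decomposition K = ACA can be sketched numerically as follows (plain numpy, not the gRc code; the function name is illustrative):

```python
import numpy as np

def aca_decomposition(K):
    """Write K = A C A where A = diag(sqrt(k_aa)) holds the inverse partial
    standard deviations and C has ones on the diagonal and minus the
    partial correlations off the diagonal (a numerical sketch)."""
    a = np.sqrt(np.diag(K))
    A = np.diag(a)
    C = K / np.outer(a, a)
    return A, C

K = np.array([[ 2.0, -0.8],
              [-0.8,  2.5]])
A, C = aca_decomposition(K)   # A @ C @ A reproduces K
```

Restricting entries of C (RCOR) and restricting entries of K directly (RCON) are therefore genuinely different parametrisations of the same graph.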

Working data set: Mathematics marks
The gRc package will be illustrated on the basis of the following data set (taken from Mardia, Kent, and Bibby (1979); see also Edwards (2000)). The data contain the examination marks for 88 students in 5 different mathematics subjects: Mechanics (me), Vectors (ve), Algebra (al), Analysis (an) and Statistics (st). The data are contained in the data set math. A stepwise backward model selection yields the "butterfly" model shown in Figure 2 (a); see also Whittaker (1990), p. 4.

Specifying the butterfly model -a GGM
Initially we specify the butterfly model for the mathmark data as a GGM which, by definition, is also an RCON and an RCOR model. The engine for specifying and fitting the models is the rcox function, which takes a type argument specifying the model type. The default model type is type='rcon'.
In the following we shall show different ways of specifying models. For a GGM, the edge and vertex colour classes can be specified indirectly by a generating class, e.g. the cliques of the independence graph. For example, the butterfly model m0 can be specified as:

Mixed representations of RCON / RCOR models
In connection with model specification it is convenient to be able to work with a mixed representation of a model as a triple (C, V, E) where C is the generating class for a GGM. The convention for such a triple specification is as follows: 1) C specifies vertices and edges in the model; these are a priori unrestricted. 2) E also specifies edges; some of these may already have been specified in C, but in that case the restrictions in E will be imposed.
3) V also specifies vertices; some of these may already have been specified in C, but in that case the restrictions in V will be imposed.

Adding symmetry restrictions to the butterfly model
To illustrate RCON models which are not standard GGMs we impose the following restrictions on m0 to obtain m1 (which is illustrated in Figure 2, (b)): 1. Vertices me and st are in the same vertex colour class and so are ve and an. 2. Edges me:ve and me:al are in the same edge colour class and so are ve:al and al:st. Note that here we have specified some of the edges through a generating class for a graphical model, i.e. ~al:an:st.

Setting type='rcor' in rcox will similarly fit the corresponding RCOR model:

Retrieving vertex and edge colour classes
The colour classes can be retrieved using the getecc and getvcc functions. The tests in the table are Wald tests for the corresponding parameters being zero. These tests only make sense for edge colour classes, but the standard errors for vertex colour classes are still informative about how precisely the parameters are estimated.
An alternative summary type is "KC", which when applied to the RCON model above gives:

summary(m1, type = "KC")
RCON model: logL = -1279.710, dimension = 7, method = scoring, time = 0.03
vcc: ~al, ~me + st, ~ve + an
ecc: ~al:an, ~an:st, ~me:ve + me:al, ~ve:al + al:st

           me           ve           al           an           st
me 0.07661336 -0.002957588 -0.002957588  0.000000000  0.000000000
ve 0.38519248  0.100220207 -0.004738956  0.000000000  0.000000000
al 0.23030890  0.282101238  0.167618661 -0.008025724 -0.004738956
an 0.00000000  0.000000000  0.477756429  0.100220207 -0.001763193
st 0.00000000  0.000000000  0.369024986  0.229636063  0.076613361

The fitted concentrations for edge colour classes appear above the diagonal (some of these are restricted to being identical under the model). The diagonal contains the fitted concentrations for vertex colour classes, i.e. the partial variances (some of these are restricted to being identical under the model). Below the diagonal are the corresponding partial correlations (of which some are restricted to being identical under the model).
Other types of summaries are "K" and "ACA".
Standard methods like coef (for obtaining the parameter estimates) and vcov (for obtaining the asymptotic variance for the estimators) are available.

Maximum likelihood estimation
This section describes estimation in RCON and RCOR models.See Højsgaard and Lauritzen (2007) for discussion of problems concerning existence and uniqueness of estimators and convergence properties of the algorithms.

Likelihood analysis of RCON models
We consider a sample y^1, …, y^n of n observations of Y ∼ N_d(0, Σ) where d = |V| and Σ = K⁻¹, and let W = Σ_ν y^ν (y^ν)^⊤ denote the matrix of sums of squares and products. The log-likelihood function based on the sample is

log L = (f/2) log det(K) − (1/2) tr(KW),   (2)

where in this case f = n is the degrees of freedom in the Wishart distribution of W. Taking into account a possible unknown mean µ, and calculating W based on residuals, would yield f = n − 1.

First we consider an RCON model (V, E). For each vertex colour class u ∈ V let T^u be the d × d diagonal matrix with entries T^u_αα = 1 if α ∈ u and 0 otherwise. For each edge colour class u ∈ E let T^u be the d × d symmetric matrix with entries T^u_αβ = 1 if {α, β} ∈ u and 0 otherwise. For convenience we identify a vertex α with the set {α, α} so that vertex classes and edge classes can be treated simultaneously in the following. Hence we can refer to a generator u of an RCON model (V, E) without specifying whether u is a vertex colour class or an edge colour class. Consequently, we can rewrite (η, δ) as θ, which is an (R + S)-dimensional vector.
The concentration matrix K = K(θ) can then be written

K(θ) = Σ_{u ∈ V∪E} θ_u T^u,

so that RCON models are linear exponential families (Barndorff-Nielsen 1978).
Taking first and second derivatives of the logarithm of the normalising constant uses that

∂ log det K(θ)/∂θ_u = tr{T^u K(θ)⁻¹} = tr(T^u Σ).   (3)
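The linear structure K(θ) = Σ_u θ_u T^u and the log-likelihood (2) can be sketched directly in code. The following numpy fragment (illustrative only; the helper names and the toy colour classes are not part of gRc) builds the indicator matrices T^u and evaluates the log-likelihood:

```python
import numpy as np

def T_matrix(d, u):
    """Indicator matrix T^u for a colour class u: vertices (integers) set
    diagonal entries, edges (pairs) set symmetric off-diagonal entries."""
    T = np.zeros((d, d))
    for elem in u:
        a, b = (elem, elem) if np.isscalar(elem) else elem
        T[a, b] = T[b, a] = 1.0
    return T

def K_of_theta(theta, Ts):
    """K(theta) = sum_u theta_u T^u, the linear structure of an RCON model."""
    return sum(t * T for t, T in zip(theta, Ts))

def loglik(K, W, f):
    """log L = (f/2) log det(K) - (1/2) tr(KW), cf. (2)."""
    return 0.5 * f * np.log(np.linalg.det(K)) - 0.5 * np.trace(K @ W)

# Toy RCON model on 3 variables: one vertex colour class {0, 1, 2} and one
# edge colour class {(0, 1), (1, 2)} (variable labels are illustrative).
Ts = [T_matrix(3, [0, 1, 2]), T_matrix(3, [(0, 1), (1, 2)])]
theta = [1.5, -0.4]
K = K_of_theta(theta, Ts)
```

Each parameter θ_u simply scales one indicator matrix, which is what makes the model linear in K.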

Algorithms for estimation in RCON models
This section describes algorithms for estimation in RCON models. Let Σ = K⁻¹ denote the current estimate of Σ at any time during the iteration.

Scoring algorithm
The likelihood equations for RCON models can be solved using Fisher scoring if good starting values can be found and if the Fisher information matrix is moderate in size.The algorithm, however, is not globally convergent in general.
It is convenient to parametrise the model with λ_u = log η_u. With this parametrisation, differentiation of (2) yields the score function

S_u(λ, δ) = (f/2){tr(T^u Σ) − tr(T^u W)/f}.   (4)

Differentiating further and changing sign gives the Fisher information matrix

I_{uv}(λ, δ) = (f/2) tr(T^u Σ T^v Σ) for u, v ∈ E.   (5)

Writing θ = (λ, δ), the Fisher scoring step becomes

θ^{n+1} ← θ^n + I(θ^n)⁻¹ S(θ^n),   (6)

which we found to sometimes be unstable in practice for RCON models.
Jensen, Johansen, and Lauritzen (1991) describe an estimation algorithm for linear exponential families (see also Lauritzen (1996), p. 269) which applies Newton iteration to the reciprocal of the f-th root of the likelihood function. This algorithm becomes

θ^{n+1} ← θ^n + {I(θ^n) + S(θ^n)S(θ^n)^⊤/f}⁻¹ S(θ^n).   (7)

This algorithm is globally convergent in the one-parameter case (Jensen et al. 1991). In the multi-parameter case the global convergence properties are unknown, but empirical evidence suggests that it is quite stable and may be globally convergent.
We further define the discrepancy ∆(λ, δ) = 2S(λ, δ)/f where S is the score vector. Expressed in terms of ∆, (7) becomes

θ^{n+1} ← θ^n + {(2/f)I(θ^n) + ∆(θ^n)∆(θ^n)^⊤/2}⁻¹ ∆(θ^n).   (8)

Using the default method='scoring' in gRc for RCON models invokes (7), which can be seen as a stabilised version of Fisher scoring (6).

Iterative partial maximisation
Jensen et al. (1991) show that (7) is globally convergent in an exponential family if applied to one parameter at a time while keeping all other parameters fixed at their current values.
This iterative partial maximisation scheme works as follows for RCON models. Repeatedly loop through the elements u ∈ V ∪ E until convergence, doing the following. The discrepancy is ∆_u = tr(T^u Σ) − tr(T^u W)/f = 2S_u/f, and (8) for a single parameter becomes in this case

θ_u^{n+1} ← θ_u^n + ∆_u/{tr(T^u Σ T^u Σ) + ∆_u²/2}.   (9)

The substitution (9) is repeated until convergence for the set u before moving on to the next set in V ∪ E. Thus the algorithm consists of two nested loops: 1) an outer loop running over the elements u ∈ V ∪ E and 2) an inner loop maximising L with respect to θ_u while keeping all other parameters fixed.
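The inner loop can be sketched as follows in numpy (illustrative only, not the gRc implementation; `update_class` and the toy matrices are assumptions). It repeats the single-parameter update (9) until the likelihood equation tr(T^u Σ) = tr(T^u W)/f holds:

```python
import numpy as np

def update_class(K, W, f, T, tol=1e-12, max_iter=200):
    """Inner Newton loop (9) for a single colour class with indicator
    matrix T, holding all other parameters fixed (a sketch of the scheme,
    not the gRc code).  At convergence, tr(T Sigma) = tr(T W)/f."""
    for _ in range(max_iter):
        Sigma = np.linalg.inv(K)
        delta = np.trace(T @ Sigma) - np.trace(T @ W) / f
        step = delta / (np.trace(T @ Sigma @ T @ Sigma) + 0.5 * delta ** 2)
        K = K + step * T          # theta_u <- theta_u + step
        if abs(step) < tol:
            break
    return K

# Toy example: update the composite vertex class {0, 1} of a 2 x 2 model
# while the off-diagonal concentration stays fixed at -0.2.
T = np.diag([1.0, 1.0])
f = 10.0
W = np.array([[4.0, 1.0],
              [1.0, 3.0]])
K = update_class(np.array([[1.0, -0.2], [-0.2, 1.0]]), W, f, T)
Sigma = np.linalg.inv(K)
```

After convergence, the canonical sufficient statistic for the class equals its expectation, as required for the MLE in a linear exponential family.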
Contrary to scoring, iterative partial maximisation does not directly produce the asymptotic variance-covariance matrix of the parameter estimates.However, after convergence the inverse Fisher information may be calculated.
To ensure global convergence, the substitution in (9) should only be effectuated if the new matrix K is positive definite. Otherwise the update should be made as follows: find the largest value of λ (where λ < 1) for which an update θ_u^n + λQ_u, with Q_u denoting the Newton step in (9), would yield a positive semidefinite value of K, i.e. where the update would bring the parameter to the boundary of the parameter space.
The update should then be θ_u^{n+1} ← θ_u^n + λQ_u/2, so that the update moves half the distance towards the boundary of the parameter space. This refinement has not been implemented and seems unnecessary. (We have not shown that the Newton steps are guaranteed to keep K positive definite, but empirical evidence suggests that this is so.) For some colour classes, the partial maximisation of the likelihood can be made explicitly using a single step of the iterative proportional scaling (IPS) algorithm for graphical Gaussian models, see e.g. Lauritzen (1996), p. 134. For such colour classes the iterative scheme in (9) need not be applied.
Consider a neutral set {α, β}. The parameters k_αα, k_ββ and k_αβ are not restricted other than through K being positive definite. In this case these parameters can be updated with a single IPS step, i.e. without using the Newton method. Let a = {α, β}, let b denote the complement of a, let K_aa be the 2 × 2 submatrix of K comprising k_αα, k_ββ and k_αβ, and let K_ab and K_bb be defined similarly. The likelihood equations state that (K_aa − K_ab(K_bb)⁻¹K_ba)⁻¹ = W_aa/f, which is solved by setting

K_aa ← f(W_aa)⁻¹ + K_ab(K_bb)⁻¹K_ba.   (10)

This IPS step maximises the likelihood over the particular section of the parameter space given by k_αα, k_ββ and k_αβ, and thus no iteration is needed. The step can be applied for any neutral set a ⊂ V. A special case is a single parameter k_αα (i.e. an atomic vertex colour class).
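The IPS update (10) can be sketched numerically as follows (plain numpy, not gRc code; `ips_step` and the toy matrices are illustrative). After the step, the block of Σ indexed by a equals W_aa/f, which is exactly the likelihood equation:

```python
import numpy as np

def ips_step(K, W, f, a):
    """One IPS update (10) on the block a:
    K_aa <- f*(W_aa)^{-1} + K_ab K_bb^{-1} K_ba.
    Afterwards Sigma_aa = W_aa/f exactly (a sketch, not the gRc code)."""
    b = [i for i in range(K.shape[0]) if i not in a]
    K = K.copy()
    B = K[np.ix_(a, b)] @ np.linalg.inv(K[np.ix_(b, b)]) @ K[np.ix_(b, a)]
    K[np.ix_(a, a)] = f * np.linalg.inv(W[np.ix_(a, a)]) + B
    return K

f = 20.0
W = f * np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.4],
                  [0.1, 0.4, 1.0]])
K0 = np.eye(3)
K1 = ips_step(K0, W, f, a=[0, 1])
Sigma1 = np.linalg.inv(K1)
```

Because the step solves the block's likelihood equation in closed form, no inner iteration is needed for neutral sets.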
An IPS step can also be used for estimating a single parameter k_αβ where α ≠ β, i.e. for an atomic edge colour class. Following the notation above, let B = K_ab(K_bb)⁻¹K_ba. The likelihood equations state that {(K_aa − B)⁻¹}_αβ = W_αβ/f, where {A}_αβ denotes the off-diagonal element of a symmetric 2 × 2 matrix A with entries indexed by α and β. Writing m_αα and m_ββ for the diagonal entries of K_aa − B, w = W_αβ/f and x = k_αβ − B_αβ, this yields the following 2nd degree equation in x:

w x² − x − w m_αα m_ββ = 0.   (11)

By inspection of the equation it can be seen that only one of the solutions leads to a positive definite K, and the solution is

x = {1 − √(1 + 4w² m_αα m_ββ)}/(2w).   (12)

The IPM algorithm is illustrated in an example below. In this connection it is convenient to work with a mixed representation of an RCON model as a 3-tuple (V, E, N) where N is a set of neutral sets.
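The closed-form update of a single off-diagonal concentration can be sketched as follows (a numpy illustration, not the gRc code; the function name, the root formula as written in (12), and the toy matrices are assumptions of this sketch):

```python
import numpy as np

def update_atomic_edge(K, W, f, alpha, beta):
    """Closed-form update of one off-diagonal concentration k_ab for an
    atomic edge colour class: solve the 2nd degree likelihood equation
    and pick the root (cf. (12)) that keeps K positive definite."""
    a = [alpha, beta]
    b = [i for i in range(K.shape[0]) if i not in a]
    B = K[np.ix_(a, b)] @ np.linalg.inv(K[np.ix_(b, b)]) @ K[np.ix_(b, a)]
    m_aa = K[alpha, alpha] - B[0, 0]
    m_bb = K[beta, beta] - B[1, 1]
    w = W[alpha, beta] / f
    if w == 0.0:
        x = 0.0     # degenerate case: the likelihood equation gives k_ab = B_ab
    else:
        x = (1.0 - np.sqrt(1.0 + 4.0 * w * w * m_aa * m_bb)) / (2.0 * w)
    K = K.copy()
    K[alpha, beta] = K[beta, alpha] = B[0, 1] + x
    return K

f = 10.0
W = f * np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
K = np.array([[ 2.0, -0.3,  0.0],
              [-0.3,  2.0, -0.2],
              [ 0.0, -0.2,  2.0]])
K_new = update_atomic_edge(K, W, f, 0, 1)
Sigma = np.linalg.inv(K_new)   # Sigma[0, 1] now equals W[0, 1]/f
```

The update solves the likelihood equation for the edge exactly, so Σ_αβ = W_αβ/f afterwards, with K still positive definite.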
Example 1. The graph in Figure 4 has vertex colour classes V = {[1], [2, 3], [4], [5]} and edge colour classes E = {(1:2, 2:3), (3:4), (4:5)}, and {4, 5} is a neutral set. Algorithmically, we add {4, 5} to N, remove (4:5) from E, and remove [4] and [5] from V. This yields the mixed representation V = {[1], [2, 3]}, E = {(1:2, 2:3), (3:4)}, N = {{4, 5}}. One full cycle of the outer loop of the IPM algorithm goes as follows:

1. [1] is an atomic vertex colour class so k_11 can be updated with a single IPS step (10) on a 1 × 1 matrix.

2. The colour classes [2, 3] and (1:2, 2:3) are composite, so k_22 = k_33 and k_12 = k_23 must be updated separately with the Newton sequence (9). This step is computationally demanding.

3. (3:4) is an atomic edge colour class so k_34 can be updated with a single IPS step (12) on the off-diagonal of a 2 × 2 matrix.
4. {4, 5} is a neutral set so all three parameters k 44 , k 55 and k 45 can be updated in a single IPS step (10) on a 2 × 2 matrix.
To avoid complex bookkeeping, the current version of gRc does not exploit that {4, 5} is a neutral set which can be fitted as such. Instead step 4 above is replaced with: 4a. Update in turn k_44, k_55 and k_45, each with a single IPS step (10).
The method='ipm' for RCON models is used for the scheme where IPS is applied whenever possible (i.e. for atomic colour classes) and where (9) is applied for all composite colour classes.

Computational savings
The following considerations lead to additional substantial computational savings: 1. The matrices T^u are symmetric matrices consisting of zeros and ones. It is not necessary (and becomes prohibitive in terms of space requirements for high dimensional models) to actually create these matrices. When calculating traces such as tr(T^u Σ T^u Σ) and related quantities it is sufficient to identify the relevant entries of Σ etc. to be added up.
2. After updating entries of K, it is not necessary to find Σ = K⁻¹ in full. The relevant part of Σ is Σ_aa = (K_aa − K_ab(K_bb)⁻¹K_ba)⁻¹. Note here 1) that K_ab(K_bb)⁻¹K_ba is fixed throughout the whole Newton sequence and 2) that the dimension of Σ_aa is often much smaller than the dimension of Σ.

Starting values
A starting value for K for both the scoring and iterative partial maximisation methods is found on the basis of the method of score matching (Hyvärinen 2005). The score matching estimate Ǩ is obtained by solving a linear system of R + S equations with R + S unknowns, where R + S is the number of parameters in K. Hence Ǩ can be calculated very easily. While the score matching estimate is asymptotically unbiased, it is not guaranteed to be positive definite. Moreover, we have not shown that the diagonal elements of Ǩ are indeed positive.
If Ǩ is positive definite then Ǩ is taken as the initial value of K. If Ǩ is not positive definite we make it so as follows. If the diagonal elements of Ǩ are not all positive we apply score matching to a model with the same vertex colour classes as the model under consideration but with no edge colour classes. This yields an estimate K̄, which is also the MLE under this hypothesis and is formed by taking simple averages over corresponding diagonal elements of W. Hence K̄ has positive diagonal elements. We then replace the diagonal elements of Ǩ by the corresponding elements of K̄.
If the modified Ǩ is still not positive definite, let diag(Ǩ) be the diagonal matrix whose diagonal entries are those of Ǩ. Starting from α = 0.95 and working downwards in steps of 0.05, we search for the largest 0 < α < 1 such that K_α = αǨ + (1 − α) diag(Ǩ) is positive definite. To obtain numerical stability we then set α ← 0.95α, calculate K_α again, and take this as the initial value of K.
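This repair of the starting value can be sketched as follows (numpy, illustrative only; the exact convex combination used by gRc is not spelled out in this text, so the form K_α = αǨ + (1 − α)diag(Ǩ) is an assumption of the sketch):

```python
import numpy as np

def is_pd(K):
    """Positive definiteness via eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(K) > 0.0))

def repair_start(K_check):
    """Make a (possibly indefinite) score-matching estimate positive
    definite by shrinking towards its diagonal (assumed combination:
    K_alpha = alpha*K + (1 - alpha)*diag(K))."""
    D = np.diag(np.diag(K_check))
    alpha = 0.95
    while alpha > 0.0:
        if is_pd(alpha * K_check + (1.0 - alpha) * D):
            break
        alpha -= 0.05
    if alpha <= 0.0:
        return D                  # fall back to the (positive) diagonal
    alpha *= 0.95                 # extra shrinkage for numerical stability
    return alpha * K_check + (1.0 - alpha) * D

K_check = np.array([[1.0, 2.0],   # indefinite: eigenvalues -1 and 3
                    [2.0, 1.0]])
K_start = repair_start(K_check)
```

Shrinking towards a positive diagonal always succeeds for α small enough, so the search terminates with a usable starting value.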
Setting method='matching' means that we first find an initial estimate of K as described above and then perform one iteration of the scoring algorithm (7). This yields a fast estimate of K which is efficient to the first order.

Comparison of the estimation methods
The scoring method is in general somewhat faster than iterative partial maximisation, but iterative partial maximisation will tend to be more economical in terms of space requirements.

Likelihood analysis of RCOR models
For an RCOR model (V, E) we write K(η, δ) = A(η)C(δ)A(η), where A is diagonal and consists of the inverse partial standard deviations while C has ones on the diagonal and contains minus the partial correlations off the diagonal. The log likelihood is

log L = f log det A(η) + (f/2) log det C(δ) − (1/2) tr{A(η)C(δ)A(η)W}.   (13)

Algorithms for estimation in RCOR models
This section describes two iterative algorithms for estimation in RCOR models.

Scoring algorithm
As for RCON models, Fisher's method of scoring can be applied for solving the likelihood equations. It is convenient to parametrise the model with λ_u = log η_u, in which case the score S(λ, δ) can be derived; differentiating further and changing sign yields the observed information matrix, and taking expectations gives the Fisher information matrix I(λ, δ). For the explicit expressions we refer to Højsgaard and Lauritzen (2007). Using method='scoring' for RCOR models invokes the iteration (7) with this score and information. For RCOR models we have found that the iteration (7) can lead to a decrease of the log likelihood. When this occurs, the step size {I(λ, δ) + S(λ, δ)S(λ, δ)^⊤/f}⁻¹S(λ, δ) is repeatedly halved until the log likelihood has increased.
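The step-halving safeguard can be sketched generically as follows (a plain Python illustration of the idea, not the gRc implementation; the function name and the 1-d toy objective are assumptions):

```python
def damped_step(theta, step, loglik_fn, max_halvings=30):
    """Accept theta + step only if it increases the log likelihood;
    otherwise halve the step, as gRc's scoring method does for RCOR
    models (a sketch)."""
    ll0 = loglik_fn(theta)
    for _ in range(max_halvings):
        candidate = theta + step
        if loglik_fn(candidate) > ll0:
            return candidate
        step /= 2.0
    return theta  # no improving step found; keep the current value

# 1-d illustration: the raw step overshoots the maximum of -(x - 1)^2
loglik = lambda x: -(x - 1.0) ** 2
x_new = damped_step(0.0, 4.0, loglik)
```

Halving can only shorten the proposed step, so the iteration never decreases the log likelihood.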

Iterative partial maximisation
Contrary to RCON models, the restrictions on the concentration matrix of an RCOR model are in general not linear in η and δ. However, for known η the restrictions are linear in δ, and for known δ the restrictions are quadratic in η. This suggests estimating the parameters by alternating between η and δ as follows: 1. Suppose that C, i.e. δ, is known. Then we maximise log L over η. Maximising log L over a given η_u while keeping the other ηs fixed yields a 2nd order equation which has a unique positive root. Note that η_u depends on the remaining ηs; therefore we must iterate to solve for η. For the specific form of these equations we refer to Højsgaard and Lauritzen (2007).
2. Suppose that A, i.e. η, is known, and let Q = AWA. Then tr(ACAW) = tr(CQ), and log L can be maximised over δ by maximising (f/2) log det C(δ) − (1/2) tr{C(δ)Q}. This maximisation can be made by applying the IPM algorithm for RCON models to the off-diagonal elements of C only, letting Q play the role of W in the likelihood equations for RCON models. That is, the diagonal elements of C remain constantly equal to one.
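The trace identity underlying step 2 is just cyclic invariance of the trace; the following numpy fragment checks it numerically (illustrative only; the random matrices are stand-ins for A, C and W):

```python
import numpy as np

# With A (i.e. eta) held fixed, the RCOR log likelihood depends on delta
# only through (f/2) log det C - (1/2) tr(CQ) with Q = A W A, because
# tr(ACAW) = tr(CQ) by cyclic invariance of the trace.
rng = np.random.default_rng(1)
d = 4
A = np.diag(rng.uniform(0.5, 2.0, size=d))  # inverse partial std deviations
M = rng.normal(size=(d, d))
W = M @ M.T                                 # stand-in for the SSP matrix
C = np.eye(d)
C[0, 1] = C[1, 0] = -0.3                    # minus a partial correlation
Q = A @ W @ A
lhs = np.trace(A @ C @ A @ W)
rhs = np.trace(C @ Q)
```

This is what allows the RCON machinery to be reused for the δ-step, with Q in place of W.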
Atomic edge colour classes are updated with an IPS step and composite edge colour classes are updated with the modified Newton algorithm.
The method='ipm' in gRc for RCOR models is used for the scheme where IPS is applied whenever possible (i.e. for neutral sets and for atomic edge colour classes) and where (9) is applied for all composite colour classes.

Adding and dropping edge colour classes
Adding an edge colour class corresponds to adding a set of edges to the graph and forcing these to be in the same colour class. Dropping an edge colour class corresponds to deleting a set of edges from the graph. (Corresponding operations on vertex colour classes make no sense.) For example:

update(m1, addecc = ~me:an + ve:st)
update(m1, dropecc = ~me:ve + me:al)

Methods for comparison of colour classes
This section describes methods for investigating model reductions, that is, 1) investigating whether the parameters for two colour classes u and v in a model are significantly different (so that u and v can be joined) and 2) investigating whether the parameter for an edge colour class u in a model is significantly different from zero (so that u can be dropped). We also describe methods for investigating model expansions, that is, 1) investigating whether a composite colour class can be split into atomic colour classes and 2) investigating whether an atomic edge colour class can be added to the model.
The output of these methods is a list with two components: 1) a data frame with the results of the tests and 2) a list of the colour classes.

Model comparison
Model reductions Consider a model M_0 and two colour classes u and v (of the same type) in M_0. The feasibility of joining u and v can be judged by the likelihood ratio statistic. This requires fitting a new model M_1 ⊂ M_0 in which u and v are joined. Then M_1 can be tested against M_0 with the deviance (likelihood ratio) test statistic D = −2 log(L_1/L_0). To avoid fitting the model M_1, one can instead use the Wald statistic (θ̂_u − θ̂_v)²/Var(θ̂_u − θ̂_v).

The asymptotic variance of the difference is obtained from the inverse Fisher information matrix. Under the hypothesis, both statistics have approximately a χ²_d distribution with d = df_1 − df_0. The same alternatives are available for testing whether the parameter θ_u for an edge colour class is zero. This generalises immediately to cases where several colour classes are compared simultaneously.
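The two statistics can be sketched generically as follows (plain Python/numpy, not the gRc functions; the function names and the toy numbers are illustrative):

```python
import numpy as np

def deviance(loglik_small, loglik_large):
    """D = 2(log L_large - log L_small) for nested models."""
    return 2.0 * (loglik_large - loglik_small)

def wald_join(theta_u, theta_v, V):
    """Wald statistic for the hypothesis theta_u = theta_v; V is the 2 x 2
    asymptotic covariance of (theta_u, theta_v), taken from the inverse
    Fisher information of the fitted model."""
    var_diff = V[0, 0] + V[1, 1] - 2.0 * V[0, 1]
    return (theta_u - theta_v) ** 2 / var_diff

V = np.array([[0.004, 0.001],
              [0.001, 0.004]])
W_stat = wald_join(0.30, 0.10, V)   # compare to a chi-squared(1) quantile
```

The Wald version only needs the estimates and covariances from the already-fitted model, which is why it is cheap.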
The comparisons can also be made using AIC (Akaike 1974) or BIC (Schwarz 1978). With reference to the setup above, M_1 is preferred to M_0 (according to BIC) if D < d log n; otherwise M_0 is preferred. If using AIC, log n is replaced by 2. Because the Wald and deviance statistics are asymptotically equivalent, we can calculate an approximation to AIC/BIC by replacing D with W.
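The AIC/BIC decision rule amounts to comparing the deviance with a penalised parameter count, sketched here (illustrative helper, not a gRc function):

```python
import math

def prefer_smaller(deviance, d, n, criterion="BIC"):
    """Prefer the smaller model M_1 over M_0 when D < d*log(n) (BIC) or
    D < 2*d (AIC), with d the difference in number of parameters."""
    penalty = math.log(n) if criterion == "BIC" else 2.0
    return deviance < d * penalty

# n = 88 students, dropping d = 1 parameter, deviance D = 3.5:
choice_bic = prefer_smaller(3.5, 1, 88)           # log(88) is about 4.48
choice_aic = prefer_smaller(3.5, 1, 88, "AIC")    # penalty 2
```

With these numbers BIC accepts the reduction while AIC rejects it, illustrating that BIC penalises model size more heavily for moderate n.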
Functions for making model reductions accept a stat keyword.Default is stat='wald' because it is the fastest to calculate.
Model expansions Consider a model M_0 and a composite colour class u in M_0. The feasibility of splitting u can be judged by the likelihood ratio statistic. This requires fitting a new model M_1 ⊃ M_0 in which u is split. The deviance becomes D = −2 log(L_0/L_1). When models are expanded, the Wald statistic is not available because the larger model has not been fitted to data. Apart from that, everything is as above.
Consequently, functions for model expansion are all based on the deviance statistic and therefore do not accept a stat keyword.

Model reductions
Pairwise comparisons of edge/vertex colour classes can be made using the comparecc function.
In comparecc, all colour classes specified in cc1 are compared with all those given in cc2 (duplicate entries are not compared). If cc2=NULL (the default) then all colour classes specified in cc1 are compared with all colour classes in the model except those specified in cc1. If cc1=NULL and cc2=NULL (the defaults) then all pairwise comparisons are made. Hence there is no evidence that adding additional (atomic) edge colour classes to the model would significantly enhance the fit of the model.

Stepwise procedures
This section contains a description of functions for stepwise model selection. The output of these functions is either a new model object or NULL if no change was made. These functions are based on repeated applications of the functions described in Section 6, and therefore their arguments are similar. The default selection criterion is AIC.

The need for selection strategies
The number of different models which can be formed by colouring edges/vertices in a given graph is enormous. To illustrate the complexity, consider graphs with three vertices (for which there are 8 different graphs). A tedious enumeration shows that there are in total (over all 8 graphs) 15 possible edge colourings. There are 5 possible vertex colourings, which gives 5 × 15 = 75 different models. Therefore, good model selection strategies become important. This section discusses methods which would be part of model selection strategies; however, much additional work is required in this area.
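The enumeration for three vertices can be verified with a short computation: colourings of a set are its partitions, counted by Bell numbers (the function below is a standard recurrence; the variable names are illustrative):

```python
from math import comb

def bell(n):
    """Bell number B_n: the number of ways to partition an n-element set,
    i.e. the number of colourings of n vertices (or of n edges)."""
    B = [1]
    for i in range(n):
        B.append(sum(comb(i, k) * B[k] for k in range(i + 1)))
    return B[n]

# Vertex colourings of a 3-vertex graph: partitions of the vertex set.
vertex_colourings = bell(3)

# Edge colourings summed over all 8 graphs on 3 vertices: a graph with
# m of the 3 possible edges admits bell(m) edge colourings, and there
# are comb(3, m) such graphs.
edge_colourings = sum(bell(m) * comb(3, m) for m in range(4))

models = vertex_colourings * edge_colourings
```

This reproduces the counts 5, 15 and 75 quoted above; the super-exponential growth of Bell numbers is what makes exhaustive search infeasible for realistic graphs.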

Nested versus non-nested models
It should be noted that addition of an edge to a coloured graph can have different meanings. The model M_0 in Figure 5 is given by [1][2][3](1:2, 2:3). Consider addition of the edge 2:3 in M_0. If a new colour class is formed, as in M_1, then M_1 will contain M_0 and these two models can be tested e.g. with a deviance test. An alternative is to add the edge to the already existing colour class, as in M_2. In that case, M_0 and M_2 are not nested. Likewise, dropping e.g. 1:2 from M_0 does not lead to a model reduction. A model comparison can however be made in terms of AIC or BIC.
The methods described in the following all act within nested models and hence AIC and BIC as well as significance testing can be used as selection criteria.

Aspects of stepwise procedures
Stepwise joining of the two most homogeneous colour classes In RCOX models, model reductions can be achieved by joining colour classes. Suppose there are p colour classes of a given type, e.g. edge colour classes. The number of ways these can be combined for joining is enormous. We therefore consider only joining pairs of colour classes, of which there are p(p − 1)/2 to consider, and we join the two most homogeneous colour classes, i.e. the pair whose parameter estimates differ the least.
The stepjoin1 function facilitates doing this in a stepwise fashion.
Stepwise dropping of the least significant edge colour class Model reductions can also be achieved by dropping edge colour classes, which is the counterpart of dropping insignificant edges in GGMs. The stepdrop1 function facilitates doing this. In this case the function returns NULL because it is not feasible to drop any of the edge colour classes.

Stepwise model expansions
Stepwise split of composite colour classes The split operation for colour classes can be applied in a stepwise fashion. The starting model is shown in Figure 7, left. In this case, the function returns NULL because it is not feasible to add any edge colour classes.

Discussion and perspectives
We have described an R package gRc for statistical inference in RCON and RCOR models.These models have been described in some detail, including a description of various algorithms for maximum likelihood estimation.For further details on the models and their properties we refer to Højsgaard and Lauritzen (2007).
The facilities of the package cover model editing functions, functions for comparing colour classes, and stepwise model selection functions. We have also presented examples of how to use the package.
Improvements of gRc can be made in several directions, of which we outline some here: i) The current implementation of gRc is made entirely in R. Computing time could be reduced by implementing larger parts of the algorithms, in particular the iterative partial maximisation, in a compiled language. Further, it would be desirable to develop faster estimation algorithms so that gRc can be applied to problems of higher dimension.
ii) Additional model selection criteria and strategies in RCON and RCOR models must be investigated further and implemented.

Figure 1 :
Figure 1: Coloured graphs. (a): The edges 1:2 and 1:3 are in the same (light blue) edge colour class, as also indicated by the "+" sign. Likewise, 2:4 and 3:4 are in the same (green) edge colour class, indicated by "++". The vertices 1 and 4 are in the red vertex colour class (also indicated by "*") while vertices 2 and 3 are in the blue vertex colour class (indicated by "**"). (b): Illustration of atomic colour classes. The vertices 2 and 3 are drawn in black and are atomic, so 2 and 3 are in different vertex colour classes. Likewise for edges 2:4 and 3:4.

Figure 2 :
Figure 2: (a) The graphical Gaussian "butterfly" model selected for the mathmarks data. All vertices and edges are neutral, i.e. the parameters are unrestricted. In the following this model is called m0. (b) A representation of an RCON / RCOR model with restrictions on both vertices and edges. This model is denoted m1 in the following.

Figure 3 :
Figure 3: The display of the model m1 produced by the plot() function.

Figure 4 :
Figure 4: Representation of the RCON model with the restrictions that k_22 = k_33 and k_12 = k_23.
Joining colour classes leads to a model reduction. The join1 function is essentially a wrapper for comparecc. Based on the Wald statistic (the default), pairwise comparisons of the colour classes of a specific type are made. The set of colour classes under consideration can be restricted using the scope argument (default is that all colour classes are considered), e.g.

join1(m1, scope = list(~an:st, ~me:ve + me:al, ~ve:al + al:st), type = "ecc")

Figure 5 :
Figure 5: Illustration of addition of the edge 2:3 to a coloured graph. If the edge is added to M_0 by forming a new edge colour class, as in M_1, then M_0 and M_1 are nested and can be tested. Alternatively, the edge can be added to an existing edge colour class, as in M_2. The models M_0 and M_2 are then not nested.

Figure 6 :
Figure 6: The model obtained after 1) first successively joining vertex colour classes and 2) then successively joining edge colour classes.

Figure 7 :
Figure 7: Left: Starting model, in which me:ve and me:al are in the same edge colour class and so are ve:al and al:st. Middle: Model after splitting vertex colour classes. Right: Model after also splitting edge colour classes.
The reason for displaying the colour classes below the body of the table is that a colour class can consist of many edges/vertices thereby making the table very large.
RCON models are thus linear exponential families where (−t_1/2, …, −t_{R+S}/2) are canonical sufficient statistics and ψ(θ) = −(f/2) log det(K) is the logarithm of the normalising constant. The maximum likelihood estimate is unique and is obtained by equating the canonical sufficient statistics to their expectation (Barndorff-Nielsen 1978).

Dropping edge colour classes It is possible to test whether the parameter for an edge colour class is zero. (Such a test makes no sense for vertex colour classes.) The set of edge colour classes under consideration can be restricted using the scope argument (default is that all edge colour classes are considered).

Splitting colour classes Splitting a (composite) colour class leads to a model expansion. Investigating whether the edge colour classes in m1 can be split shows that there is no evidence that splitting either of the composite edge colour classes would significantly enhance the fit of the model. Note that splitting a composite colour class into atomic colour classes can lead to a fairly large increase in the model complexity, i.e. in the number of parameters in the model.

Adding edge colour classes In the same spirit one can make a test for addition of (atomic) edge colour classes to the model.