Bergm: Bayesian Exponential Random Graphs in R

In this paper we describe the main featuress of the Bergm package for the open-source R software which provides a comprehensive framework for Bayesian analysis for exponential random graph models: tools for parameter estimation, model selection and goodness-of-fit diagnostics. We illustrate the capabilities of this package describing the algorithms through a tutorial analysis of two well-known network datasets.


Introduction
The Bergm package for R (R Development Core Team, 2011) implements Bayesian analysis for Exponential Random Graph Models (ERGMs) (Wasserman and Pattison (1996); Robins et al. (2007)) using the methods described by Caimo and Friel (2011) and Caimo and Friel (2013).The package provides a comprehensive framework for Bayesian inference and model selection using Markov chain Monte Carlo (MCMC) algorithms.It can also supply graphical Bayesian goodness-of-fit procedures that address the issue of model adequacy.Although computationally intensive, the package is simple to use and represents an attractive way of analyzing network data as it offers the advantange of a complete probabilistic treatment of uncertainty.Bergm is based on the ergm package (Hunter et al., 2008b) which is part of the statnet suite of packages (Handcock et al., 2007) and therefore it makes use of the same model set-up and network simulation algorithms.The Bergm package has been continually improved in terms of speed performance over the last two years and one of the purposes of this paper is to highlight these improvements.We feel that this package now offers the end-user a feasible option for carrying out Bayesian inference for exponential random graphs.
Two well-known network datasets will be used throughout this tutorial for illustrative purposes: the first is the Kapferer Tailor Shop dataset (Kapferer, 1972) whose directed edges represent work interactions in a tailor shop in Zambia (then Northern Rhodesia) and nodal attributes refer to the job status.The second network is Zachary's karate club (Zachary, 1977) which represents the undirected social network graph of friendships between 34 members of a karate club at a US university in the 1970s.Figure 1 displays the graphs of these two networks, The exact R code used to produce both of these plots is given in Appendix A.
In this paper we describe how to install and load Bergm (Section 2) providing a brief summary of what Bayesian ERGMs are (Section 3).Sections 4, 5, and 6 overview the algorithms and the functions used to produce posterior estimates for the parameters, Bayesian goodness-of-fit procedures and model selection respectively.Two examples are developed in Section 4.3 and Section 6.1.This paper does not provide an exhaustive description of all the functionality and options available, and more information about the commands and methods mentioned are available through the R help system within the package.

Getting Bergm
The Bergm package can be obtained and loaded in R using the following commands: R> install.packages("Bergm")R> library("Bergm") Since Bergm depends on ergm (Hunter et al., 2008b) (which in turn depends on network (Butts, 2008)), coda (Plummer et al., 2006), and mvtnorm (Genz et al., 2012).Installing the package will automatically load all the dependencies.All of these packages are available on the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org.
The results presented in this paper have been obtained using R version 2.15.1 on a Mac using Bergm version 2.5; ergm version 3.0-3; network version 1.7-1; coda version 0.15-2; and mvtnorm version 0.9-9993.

Bayesian exponential random graphs
The Bayesian approach to statistical problems is probabilistic.Inference is based on the posterior distribution which is the conditional probability of the unknown quantities given the observed ones.The posterior distribution extracts the information in the data and provide a complete summary of the uncertainty about the unknowns.
In the ERGM context (see Wasserman and Pattison (1996) and Robins et al. (2007)), the purpose of Bayesian inference is to learn about the posterior distribution of the model parameters θ of an observed graph y on n nodes: where s(y) is a known vector of sufficient network statistics (Figure 2) (Morris et al., 2008), p(θ) is a prior distribution placed on θ, z(θ) is the likelihood normalizing constant, and p(y) is the model evidence.Equation 1 provides a probabilistic statement about how likely parameter values are after observing the data y.The likelihood p(y|θ) is translated into a proper probability distribution that can be summarised by computing expected values, standard deviations, quantiles, etc.
Unfortunately the posterior distribution (Equation 1) is doubly-intractable as both z(θ) and p(y) cannot be evaluated analytically (Koskinen, 2004).This makes the use of standard MCMC procedures infeasible.
In order to carry out Bayesian inference for ERGMs, the Bergm package makes use of a combination of Bayesian algorithms and MCMC techniques.The exchange algorithm circumvents the problem of computing the normalizing constants of the ERGM likelihoods, while the use of multiple chains interacting with each others (population MCMC approach) by means of adaptive direction sampling is able to speed up the computations and improve chain mixing quite significantly.

Bayesian parameter estimation
In order to estimate Equation 1, the Bergm package uses the exchange algorithm described in Section 4.1 of Caimo and Friel (2011) to sample from the following distribution: where p(y |θ ) is the likelihood on which the simulated data y are defined, (θ |θ) is any arbitrary proposal distribution for the augmented variable θ .As we will see in the next section, this proposal distribution is set to be a normal centered at θ.At each MCMC iteration, the exchange algorithm consists of a Gibbs update of θ followed by a Gibbs update of y , which is drawn from the p(•|θ ) via an MCMC algorithm (Hunter et al., 2008b).Then a deterministic exchange or swap from the current state θ to the proposed new parameter θ .This deterministic proposal is accepted with probability: min 1, q θ (y )p(θ ) (θ|θ )q θ (y) where q θ and q θ indicates the unnormalised likelihoods with parameter θ and θ , respectively.Notice that all the normalising constants cancel above and below in the fraction above, in this way avoiding the need to calculate the intractable normalising constant.
The exchange algorithm is implemented by the bergm function in the following way: where s(y) is the observed vector of network statistics and s(y ) is the simulated vector of network statistics.

Block-update sampler
Step 1 of the algorithm consists in generating θ from some proposal distribution within each iteration.Bergm uses a block-update sampler with normal proposal to simultaneously update of the parameter values in the MCMC chain: Typically, tuning the parameter Σ of the proposal distribution from which θ is drawn represents the crucial part of the algorithm since a poor tuning of the proposal parameter Σ can slow down the chain's mixing rate and therefore the algorithm can take a very long time to converge to the stationary posterior density.

Parallel adaptive direction sampler
In order to improve mixing a parallel adaptive direction sampler (ADS) (Gilks et al., 1994;Roberts and Gilks, 1994) is considered: at the i-th iteration of the algorithm we have a collection of H different chains interacting with one another.By construction, the state space consists of {θ 1 , . . ., θ H } with target distribution p(θ A parallel ADS move consists of generating a new value θ h from the difference of two parameters θ h 1 and θ h 2 (randomly selected from other chains) multiplied by a scalar term γ which is called parallel ADS move factor plus a random term called parallel ADS move parameter (Figure 3) which is equivalent to the block-update sampler defined in (Equation 2).The algorithm can be summarised as follows:

Kapferer tailor shop network
Consider the Kapferer Tailor Shop network and a 3-dimensional model including the following network statistics: edges (edges), mutual edges (mutual) and cyclic triples (ctriple) involving nodes with the same job status, where the job status is represented by a categorical nodal attribute variable consisting of 8 levels: i<j<k y ij y jk y ki where i, j, k have the same job status The format of the model specification is the same of an ergm formula: R> formula <-y ~edges + mutual + ctriple("job") Then we can use the bergm function to sample from the posterior distribution using the MCMC algorithm described above.In this example we use the parallel ADS procedure described in Section 4.2.By default, the number of chains in the population is set as twice the number of dimensions of the model.It is possible to choose a different number of chains by using the argument nchains.In order to perform the block-site update described in Section 4.1 it is necessary to set nchains = 1.For each chain, we can then set the number of burn-in iterations (burn.in)and the number of iterations after the burn-in (main.iters).The number of iterations used to simulate a network y at each iteration is defined by the argument aux.iters.
R> post.est<-bergm(formula, The population MCMC with parallel ADS move is the default procedure of the bergm function.The total number of iterations, e.g., the size of the posterior sample, is nchains × main.iters.The proposal covariance structure Σ is defined by the argument sigma.epsilonwhich is set to be a diagonal matrix with every diagonal entry equal to a small number.In many cases, good mixing of the chain is ensured by a sensible tuning of the parallel ADS move factor gamma and therefore the argument sigma.epsiloncan be generally left at its default setting.As said above, parallel ADS is adopted as the default procedure but it is automatically disabled in the case of uni-dimensional models where the block-update sampler is used and the argument gamma is used to tune the variance of the normal proposal distribution . After The output above shows the results of the MCMC estimation: posterior means and standard deviations, and acceptance rates for every chain in the population and for the overall chain.Notice that the posterior summaries of each chain are consistent with each other.Figure 4 displays the MCMC diagnostic plots produced by the bergm.outputfunction.In this example, we observe a low density effect expressed by the negative value of the posterior mean of the edge effect parameter (θ 1 ) combined with the positive mutuality and transitivity within people having the same job status expressed by the mutual edge parameter (θ 2 ) and cyclic triple parameter (θ 3 ) respectively.The overall acceptance rate is around 24% and the autocorrelation is negligible after lag 50.

Bayesian goodness-of-fit diagnostics
An important contribution of this article is to propose a Bayesian procedure to establish whether the estimated parameter posterior of the model achieves a good fit to the key topological features of the observed network.
The bgof function provides a useful tool for assessing Bayesian goodness-of-fit so as to examine the fit of the data to the posterior model obtained by the bergm function.The observed network data y are compared with a set of networks y 1 , y 2 , . . ., y S simulated from S independent realisations θ 1 , θ 2 , . . ., θ S of the posterior density estimate.This comparison is made in terms of high-level characteristics g(•) such as higher degree distributions, etc. (see Hunter et al. (2008a)).The algorithm can be summarised as follows: The set of statistics used for the comparison of directed networks includes the in-degree distribution, the out-degree distribution, the minimum geodesic distance distribution and the edgewise shared partner distribution.The arguments n.ideg, n.odeg, n.dist, and n.esp indicates the number of boxplots to plot for each distribution respectively.
In Figure 5 we see, based on the various goodness of fit statistics, that the networks simulated from the posterior distribution are in reasonable agreement with the observed network.We can therefore conclude that the data are a reasonable fit to the model, despite its simplicity.

Bayesian model selection
An important problem in statistical analysis is the choice of an optimal model from a set of a priori competing models.In the ERGM context, this task translates into the choice of which subset of network statistics should be included into the model.(Green, 1995) was designed to explore this type of posterior distribution across the joint model and parameter space.It is therefore a type of MCMC algorithm that allows one to jointly explore the uncertain between and within models.
This approach is very appealing since it relies exclusively on probabilistic considerations but is very challenging from a computational viewpoint.As stated above, the intractability of the likelihood normalising constant z(θ) in Equation 1renders standard RJMCMC techniques infeasible.However, the exchange algorithm used for parameter estimation can be easily generalised so as to include model indicators.
The auto-RJ exchange algorithm described in Caimo and Friel (2013)  The bergmS function implements the auto-RJ exchange algorithm which consists of two parts: an offline step and an online step.In the online step, samples from the posterior p(θ|y, m) are gathered from each competing model using the bergm function and then approximated by normal distributions N ( μm , Σm ) determined by the first and second moments from a sample from the model.
The second step (online run) of the algorithm consists of a Gibbs update of m followed by a Gibbs update of θ which is generated via the independence sampler w(θ |m ) ∼ N ( μm , Σm ).This is followed by a Gibbs update of y which is generated from p(•|θ , m ).
Then a deterministic exchange move from a current state (θ, m) to the proposed new state (θ , m ) is accepted with probability: q θ,m (y) q θ ,m (y ) where q θ,m and q θ ,m indicates the unnormalised likelihoods under model m with parameter θ and under model m with parameter θ respectively.
The structure of the bergmS function can be described in the following way: between model proposals to be uniform.The reader is referred to Caimo and Friel (2013) for more details on this algorithm.

Karate club network
In this example, we consider the Karate club network and we propose three competing models to fit the data using a set of new specification statistics introduced by Snijders The bergmS command is then used to carry out the algorithm.To do this we have to specify several arguments for both the offline and online step.
The offline run consists of running the bergm function for each of the models proposed.
Therefore we set some arguments main.iters,burn.ins,gammas which are vectors containing values for bergm arguments: main.iters,burn.in,gamma for each competing model.
The argument iters refers to the number of iterations used for the online run.The number of MCMC steps used for network simulation is specified as usual by the argument aux.iters and this will be used in both the offline and the online step.The command below should take around 10 minutes depending on the CPU speed of the computer.

Discussion
The software package Bergm aims to help researchers and practitioners in two ways.
Firstly, it is currently the only package for R that provides a simple and complete range of tools for conducting Bayesian analysis for exponential random graph models.Secondly, Bergm makes available a platform that can be easily customised, extended, and adapted to address different requirements.
The software package is under continual development and it is far from finished.The main limitation of the software is its computational cost which makes it unsuitable for managing network graphs larger than hundreds of nodes, however it is perfectly suited for networks involving up to a hundred nodes.An important improvement in terms of computational time and efficiency will be done by turning some of the R functions into C functions integrated with the ergm package.We expect that will yield further reductions in computional run time.
Future versions of the Bergm package will address several issues including Bayesian analysis of Curved Exponential Random Graph Models (Hunter and Handcock, 2006) and exponential random graph models with missing data (Koskinen et al., 2010).
A. R code for loading and plotting the network data Here we give the code to load the datasets presented in this paper and to plot the network graphs displayed in Figure 1.The main functions to load and plot network data are network and plot.networkrespectively.They are both included in the network package which is one of the depencencies of Bergm and it is automatically loaded by typing the library("Bergm") command (see Section 2).We include the set.seedstatement in order to produce exactly the same graphs of Figure 1.Both the datasets used in this paper can be downloaded from the author's online network repository at .
The Kapferer Taylor Shop network and nodal attribute data can be loaded into R by typing: R> y <-read.table("kapferer_y.dat")R> y <-network(as.matrix(y),matrix.type="adjacency",directed=TRUE)R> x <-read.table("kapferer_x.dat")R> y %v% 'job' <-x The last command is used to attache the nodal covariate "job" represented by the object x to the network object y.
To plot the network graph as in Figure 1 (top) we used the following code: R> CC <-colors() [c(24,135,53,142,258,28,551,119) The legend function creates a legend showing the colors associated to the levels of the "job" nodal attribute.For more information about this function type ?legend.
Similarly to above, the following code was used to load the Zachary Karate Club network and to plot it as in Figure 1 (bottom):

Figure 2 :
Figure 2: Some of the most used configurations for directed (a) and undirected (b) graphs.

Figure 3 :
Figure 3: The parallel ADS move of the current state (darker blue dot) consists of generating a new parameter value along the direction made by the difference of two randomly sampled parameter states (light blue dots) belonging to different chains plus a random term.

Figure 4 :
Figure 4: MCMC diagnostics for the overall chain.
Let m indicate a particular model from a set of competing models with corresponding parameters θ.Following the Bayesian paradigm, interest focuses on exploring the posterior distribution, p(m, θ|y) ∝ p(y|m, θ) p(θ|m) p(m) where p(θ|m) and p(m) are prior distributions within model m, and on model m, respectively.The reversible jump Markov chain Monte Carlo (RJMCMC) algorithm represents a trans-dimensional RJMCMC extension of the exchange algorithm involving an independence sampler based on a distribution fitting a parametric density approximation to the within-model posterior.This approach overcomes the issue of the likelihood intractability sampling from: p(θ , θ, m , m, y |y) ∝ p(y|θ, m)p(θ|m)p(m)w(θ |m )h(m |m)p(y |θ , m ) (3) where m and m are two competing models, p(y|θ, m) and p(y |θ , m ) are the two likelihoods for the data y under model m and the simulated data y under model m respectively, p(θ|m) and p(m) are the priors for the parameter and the respective model m, w(•) is a within-model proposal (independence sampler) which fit a parametric density approximation to the model posteriors and h(•) is a between-model proposal.Notice that the marginal distribution for θ and m in Equation 3) is the target distribution of interest p(θ, m|y).
Σm ) 3. simulate y from p(•|θ , m ) 4. update (θ, m) → (θ , m ) with the log of the probability: min 0, θ t [s m (y ) − s m (y)] + θ t [s m (y ) − s m (y)] + log p(θ ) p(θ) w(θ| μm , Σm ) w(θ | μm , Σm ) end for where s m (y) and s m (y) are the observed vectors of network statistics under model m and m respectively, and s m (y ) and s m (y ) are the simulated vector of network statistics under model m and m respectively.Here, we have assumed the model priors and the et al. (2006): geometrically weighted edgewise shared partners (gwesp) and geometrically weighted degrees (gwdegree):gwesp e φv n−2 k=1 1 − 1 − e −φv k EP k (y) gwdegree e φu n−1 k=1 1 − 1 − e −φu k D k (y)where the scale parameters φ v = 0.2 and φ u = 0.8.D k (y) is the number of pairs that have exactly k common neighbours and EP k (y) is the number of connected pairs with exactly k common neighbours.The specification of these models requires the creation of a list of formulas:R> formulae <-c(y ~edges + gwesp(0
(Plummer et al., 2006)ion, post.est is an object of the class bergm and contains a list of attributes among which are the real and CPU time (in seconds) taken by the It is possible to visualise the results of the MCMC estimation by using the bergm.outputfunctionwhich is based on the coda package(Plummer et al., 2006)which is an R package for MCMC output analysis and diagnostics.
R> bergm.output(post.est,lag.max=50)MCMC results for Model: y ~edges + mutual + ctriple("job") . sample θ i from the estimate of p(θ|y) i ) end forFor example, the code below is used to compare the Kapferer Tailor Shop network with a series of networks simulated from S = 100 random realisations (sample.size) of the estimated posterior distribution post.estusing50,000 iterations (aux.iters) for the network simulation step.The bgof function may take a few seconds to run and, at the end of the execution, it will automatically plot the results as shown in Figure5.