R2MLwiN : A Package to Run MLwiN from within R

R2MLwiN is a new package designed to run the multilevel modeling software program MLwiN from within the R environment. It allows for a large range of models to be speciﬁed which take account of a multilevel structure, including continuous, binary, proportion, count, ordinal and nominal responses for data structures which are nested, cross-classiﬁed and/or exhibit multiple membership. Estimation is available via iterative generalized least squares (IGLS), which yields maximum likelihood estimates, and also via Markov chain Monte Carlo (MCMC) estimation for Bayesian inference. As well as employing MLwiN ’s own MCMC engine, users can request that MLwiN write BUGS model, data and initial values statements for use with WinBUGS or OpenBUGS (which R2MLwiN automatically calls via rbugs ), employing IGLS starting values from MLwiN . Users can also take advantage of MLwiN ’s graphical user interface: for example to specify models and inspect plots via its interactive equations and graphics windows. R2MLwiN is supported by a large number of examples, reproducing all the analyses conducted in MLwiN ’s IGLS and MCMC manuals.


Introduction
In research fields as diverse as education, economics, medicine, psychology, and biology, it is commonplace to encounter data which are clustered: for example, exam results from many students across a smaller number of schools in a cross-sectional study, or clinical measurements taken repeatedly from the same individuals in a longitudinal study. Multilevel modeling is an efficient way to model such data; it accounts for the lack of independence in clustered data, and adjusts the standard errors accordingly, whilst also opening avenues of enquiry to which more traditional multiple regression techniques are ill-suited, such as investigating group effects (e.g., Goldstein 2010;Pinheiro and Bates 2000;Raudenbush and Bryk 2002). Multilevel models (also known as mixed models, random effects models, hierarchical models, etc.) achieve this by treating the units at each level (in the above examples: students and schools, measurement occasion and individuals, respectively), as a random sample from a larger population with an assumed distribution, partitioning the residual variance between levels.

MLwiN software
The statistical software package MLwiN (Rasbash et al. 2009) is designed to allow researchers to specify, estimate and interpret multilevel models. It was first released, as a Windowsbased program, in 1998, and is still actively-developed by the Centre for Multilevel Modelling (now at the University of Bristol). With an estimated 18,000 users worldwide, the current version (v2.36, at time of writing) was released in April 2016, and native versions of the MLwiN engine for Mac OS X and Linux have been recently made available. MLwiN allows for a variety of response types to be modeled, including continuous, binary, proportion, count, ordinal, nominal, and multivariate combinations (i.e., simultaneous equations); estimation is available via iterative generalized least squares (IGLS, Goldstein 1986), which yields maximum likelihood estimates, and also via Markov chain Monte Carlo (MCMC) estimation for Bayesian inference (Browne 2012). It supports a range of data structures, including nested, crossclassified (i.e., crossed random effects) and/or multiple membership structures (Browne et al. 2001). Other features include the fitting of complex level 1 variance (heteroskedastic) models, multilevel factor analysis (MCMC only; with multiple -correlated or uncorrelated -factors at each level), adjustments for measurement errors in predictors, spatial conditional auto regressive (CAR) models, autoregressive residual structures at level 1, and a selection of MCMC algorithms to increase efficiency.
Although MLwiN can be run via a macro language and the command line, this can prove unwieldy, reflecting its origins building on the earlier statistical package NANOStat (Healy 1989). Indeed, it is likely that many users operate MLwiN via its graphical user interface (GUI); this has a number of innovative features, such as an interactive equations window, in which users can point-and-click on elements of the fully-formatted mathematical formulae representing their model to change its specification, and interactive graphical displays, allowing users to ascertain the identity of plotted points by simply clicking on them.
Funding from the UK's Economic and Social Research Council (ESRC) has allowed MLwiN to be offered free to UK-based academics, and it is otherwise available for purchase, and as an unrestricted 30-day trial version. The Centre for Multilevel Modelling's website (http:// www.bristol.ac.uk/cmm/) provides details on how to obtain the software, as well as manuals and other resources offering guidance to MLwiN, and indeed to multilevel modeling in general.

R
R (R Core Team 2016) is a freely-available, open source "language and environment for statistical computing and graphics" (https://www.R-project.org/). It is designed to be highly extensible, as evidenced by the large number of user-authored add-on packages available on the Comprehensive R Archive Network (CRAN; https://CRAN.R-project.org/). In con-trast to MLwiN, R is chiefly designed to be operated via a command line interface (CLI), although alternative user interfaces are available. Considerable support and advice is available to R users via https://www.R-project.org/, together with a vibrant online community of other forums and resources, as well as a range of books and manuals.

Fitting a 2-level continuous response model via IGLS
For our first example, we will fit a 2-level continuous response model using IGLS, employing an educational dataset available as the sample dataset tutorial in both MLwiN and R2MLwiN. This is a subset of data from a much larger dataset of examination results from six inner-London Education Authorities (school boards). The original analysis (Goldstein et al. 1993) sought to establish whether some secondary schools had better student exam performance at 16 than others, after taking account of variations in the characteristics of students when they started secondary school; i.e., the analysis investigated the extent to which schools 'added value' (with regard to exam performance), and then examined what factors might be associated with any such differences.

R> data("tutorial")
The variables we will analyze in the following examples are summarized in Table 1 (although you can view descriptions of all the variables in the original dataset by typing ?tutorial).
We will start with an example of the simplest multilevel model one can fit: a 2-level variance components model with a continuous response. Such models partition the variance in the response variable between the levels specified in the model. We will choose normexam as our response variable for student i in school j; see Equation 1.
Here, u j corresponds to the school-level random effect whilst e ij is the student-level residual error; u j and e ij are assumed to be independent of one another and normally-distributed with zero means and constant variances σ 2 u and σ 2 e . The model thus partitions the variance in normexam between that attributable to schools (σ 2 u ), corresponding to departures of the school means from the overall mean (β 0 ), and that attributable to students within schools (σ 2 e ), corresponding to departures of students' normexam scores from the mean of the school they attend.

Variable
Description school Numeric school identifier. student Numeric student identifier. normexam Students' exam score at age 16, normalized to have approximately a standard normal distribution. standlrt Student's score at age 11 on the London reading test (LRT), standardized using Z-scores. Here we have created an object, VarCompModel, to which we have assigned a call (with appropriate arguments) to R2MLwiN's runMLwiN function. In this example the Formula and data arguments are the only ones explicitly declared. In the case of the Formula, normexam has been specified as the response variable, since it is to the left of the tilde (~), and only an intercept is included in the fixed part of the model. Note that the intercept is not included by default (this is keeping with the manner in which models are specified in MLwiN), and so is explicitly added by including 1 to the right of the tilde. The random part of the model is specified in sets of parentheses arranged in descending order with respect to their hierarchy. So here we have specified that the coefficient of the intercept is allowed to randomly-vary at level 2 (1 | school) and level 1 (1 | student). Note that the variable containing the level 1 ID needs to be explicitly specified unless the model is a discrete response model (in which case it should not be specified). See Table 2 (plus later examples) for details of how to specify the Formula argument when modeling other distributions, and also Section 10 for an example using categorical explanatory variables.
Other than nominating the data.frame containing the variables to be modeled in the data argument, we accept the defaults for runMLwiN's other arguments (see ?runMLwiN). These include the distribution to be modeled (defaulting to D = "Normal"; see Table 2 for other distributions, many of which are explored in later examples) and also an argument pertaining to estimation options; this defaults to IGLS estimation, which is denoted by estoptions = list(EstM = 0). The argument estoptions is a list with a large number of possible terms -some are covered in later examples, but a more comprehensive account is provided in the relevant R2MLwiN help files. We also accept the default workdir = tempdir(), which indicates the output files are to be saved in the temporary directory.
Once this command has been entered, functions within the R2MLwiN package will then take this input and create an MLwiN macro file; MLwiN is then called and executes the macro script, and the output is returned to R for post-processing, such as producing the following output table (which can be reproduced via e.g., summary(VarCompModel)): -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-     Table 2: A summary of options for the Formula argument in R2MLwiN. They assume an intercept is added (which needs to be explicitly specified, here via the addition of 1). <link> denotes the link function, <y1>, <y2>, etc. represent response variables, <denom> denotes the denominator (optional; if not specified, a constant of ones is used as denominator), <offs> the offset (optional), <L2>, <L1>, etc. the variables containing the level 2 and level 1 identifying codes, and <ref_cat> represents the reference category of a categorical response variable (optional: if unspecified the lowest level of the factor is used as the reference category). Explanatory variables are specified as e.g., <x1> + <x2>. For "Ordered Multinomial", "Multivariate Normal" and "Mixed" responses, [<common>] indicates a common coefficient (i.e., the same for each category) is to be fitted; here <common> takes the form of a numeric identifier indicating the responses for which a common coefficient is to be added (e.g., [1:5] to fit a common coefficient for categories 1 to 5 of a 6-point ordered variable, [1] to fit a common coefficient for the response variable specified first in the Formula object for a "Mixed" response model, etc.). Otherwise a separate coefficient (i.e., one for each category) is added. <Normal_resp> indicates the position (as an integer) the Normal response(s) appears in the formula (2 in the examples in the table). For "Mixed" response models, the Formula arguments need to be grouped in the order the distributions are listed in D. Note that, when using IGLS, quasi-likelihood methods are used for discrete response models (and not, for example, adaptive quadrature). ‡ denotes IGLS only.
Beneath that, there is a reiteration of the model formula/levels we specified earlier, followed by the model estimates. First we see the coefficient estimates for the fixed part of the model, which in this case consists only of the intercept, indicating that the overall mean of normexam is −0.013 (which is close to zero since it has been standardized), with a standard error of 0.054, together with the corresponding z score, its p value, and confidence intervals (although these latter statistics are of little substantive interest in this particular instance).
Next we have the estimates (with standard errors) for the random part of the model, indicating that the school means are distributed around the overall mean with an estimated variance of 0.169 (σ 2 u ), with the (within-school) student scores distributed around their school mean with an estimated variance of 0.848 (σ 2 e ).

Conducting a likelihood ratio test
If we wish to investigate whether there are significant school differences in normexam (i.e., whether it is necessary to take account of clustering at the school-level or not), we can conduct a likelihood ratio test: subtracting the deviance statistic from the current 2-level model from the corresponding value from the simpler single-level model (Equation 2) and comparing this against a χ 2 distribution with the appropriate degrees of freedom (in this case 1, since the addition of only one parameter distinguishes the two models).

Calculating the variance partition coefficient (VPC)
If we wanted to find out how much of the variance in normexam is due to differences between schools, and how much is due to differences between students within schools, then we could calculate the variance partition coefficient using the relevant slots for this S4 class object to pull out the estimates we need for the calculation; in this instance we use the slot "RP", which stands for "random part" (typing slotNames(VarCompModel) will list all available slots): [1] 0.1659064 i.e., approximately 17% of the total variance in normexam is attributable to differences between schools. (Note that calculating the VPC is less straightforward for random coefficients and discrete response models; see, e.g., Goldstein et al. 2002.)

Storing residuals
By default, R2MLwiN does not store residuals (at any level), but it is straightforward to request it to do so by simply toggling the value of resi.store, in the list of estoptions, as follows (see Section 8, as well as ?runMLwiN, for more advanced options regarding saving residuals): R> VarCompResid <-runMLwiN(Formula = F1, data = tutorial, + estoptions = list(resi.store = TRUE)) We can then, for instance, plot the residuals using R2MLwiN's caterpillar function, a wrapper for the plot function with the addition of error bars, yielding the plot displayed in Figure 1; see the R2MLwiN demo MCMCGuide04 for an alternative choice of caterpillar plot:  model; see Figure 2. Pressing the 'Resume macro' button, within MLwiN, results in the model iterating to convergence, at which point the estimates turn green. Pressing the button again returns a dialogue box inviting the user to either close the application, or continue working in the MLwiN GUI; in either case estimates from the model as fitted up to that point are returned to R once the MLwiN application is closed (closing the application via another method may result in model estimates not being returned to R, and an error being displayed as a result). Working with MLwiN's GUI allows us to employ some of its useful features, such as the interactive equations and graph windows, as well as providing a useful aid to trouble-shooting in the case of model misspecification (for further guidance see Rasbash et al. 2012;Browne 2012).
Including show.file = TRUE within our list of estoptions would display the MLwiN macro file produced by R2MLwiN (clean.files = FALSE may also need to be added to the list of estoptions, depending on the R interface you are using); an example taken from our variance components model, above, can be found in the supplementary materials; for further information about MLwiN commands see Rasbash et al. (2003). This macro file is also saved in the working directory, as specified in the runMLwiN argument workdir, together with a range of other outputs, such as a data file in Stata format (i.e., *.dta; these can be imported by both MLwiN and R); these are deleted from the working directory once MLwiN closes, so running in debugmode provides an opportunity to access them.

Fitting a 2-level continuous response model via MCMC
We fitted our variance components model, above, using the default estimation method of IGLS, but we can fit the same model using MCMC methods by simply toggling the value of EstM (away from the default) as demonstrated below. MLwiN's MCMC engine uses Gibbs sampling for steps in the algorithm where the conditional posterior distribution has a standard form (as is the case for all parameters in the current model), and, if not, uses single site random walk Metropolis sampling with a normal proposal distribution and adaptive procedure (Browne and Draper 2000).
R has, of course, a variety of other packages which fit models using MCMC methods (e.g., Park et al. 2016), for example the MCMCglmm package written by Jarrod Hadfield (Hadfield 2010) covers many of the same models as MLwiN and also has an impressively efficient MCMC implementation, particularly for normal response models. However, for non-normal models it does require that they contain over-dispersion terms for each observation, in part so that it can efficiently estimate the model. This is less of an issue for count data, but for binary responses can be problematic as it is not clear in such models that over-dispersion at the observation level makes sense (Skrondal and Rabe-Hesketh 2007). Gelman and Hill (2007) cover many aspects of multilevel modelling within R, including, from an MCMC perspective, how to write an MCMC algorithm directly in the R language, and to call WinBUGS from R via R2WinBUGS (Sturtz et al. 2005), which requires the user to write their own model code to send to WinBUGS, whilst R2MLwiN generates these files, using (by default) IGLS starting values, on behalf of the user; see Section 12.
Below that we have the deviance information criterion (DIC) model-fit statistic (Spiegelhalter et al. 2002), and the quantities from which it is derived: •D (Dbar): Average deviance across all (post-burnin) iterations.
• DIC: Computed as D(θ) + 2pD: i.e., it is a measure of goodness of model fit which adjusts for model complexity, thus allowing models to be compared: models with smaller DIC values are preferred to those with large values, with differences of 5 or more considered substantial (e.g., Lunn et al. 2012).
Finally, following the model formula, we have the model estimates, which are largely similar to those from IGLS, and include 95% credible intervals (i.e., the 2.5th and 97.5th quantiles of each chain) for both the fixed and random part of the model. Note that, for the fixed part of the model, users can toggle between displaying the z score and its associated (two-tailed) p value (i.e., assuming normality), and a one-sided Bayesian p value (see Section 6 for an example).
The accompanying statistics include the ESS, or effective sample size (Kass et al. 1998), i.e., the number of iterations necessary to obtain these estimates if the sample were independent and identically-distributed. As such, a low ESS (compared to the number of iterations the chain has actually run for) would indicate a strongly positively autocorrelated chain, suggesting it may need to be run longer and/or sampled in a different manner, or other alternative MCMC methodologies could be employed to aid efficiency (see Section 11).
As well as the ESS, other MCMC diagnostic aids available in the R2MLwiN package include trajectory plots, as well as a larger suite of diagnostics available via the sixway function. We will demonstrate the latter when plotting the chain for the variance partition coefficient (VPC) in the following section; with regard to the former, however, entering yields the plots presented in Figure 3, namely the trajectories for the deviance, the intercept in the fixed part of the model (FP_Intercept), and the variances at level 2 (RP2_var_Intercept), and level 1 (RP1_var_Intercept) associated with the random part of the model.

Calculating the variance partition coefficient (VPC)
For our IGLS model fitted earlier, we calculated the VPC based on a single set of point estimates; in the case of MCMC, however, we have chains for σ 2 u and σ 2 e , and so can derive a chain for the VPC, as follows: RP2_var_Intercept"]) R> sixway(VPC_MCMC, name = "VPC") As you can see (Figure 4), using R2MLwiN's sixway function we obtain a range of useful statistics we can use to describe this function, including mean, mode, and interval estimates (e.g., the 95% Bayesian credible interval runs from 0.123 to 0.235), as well as a range of other charts and diagnostics: in addition to the trajectory plot (top-left), we also have a kernel density plot of the posterior distribution (top-right), the Raftery-Lewis diagnostic (Raftery and Lewis 1992) -indicating how long the chain should run to estimate the 0.025 and 0.975 quantiles (respectively) to a given accuracy (see defaults for the raftery.diag function in the coda package), and also the Brooks-Draper diagnostic -indicating the number of iterations the chain needs to run to estimate the mean to two significant places; Chapter 3 of Browne (2012) describes these, and the other diagnostics returned, in more detail.
The chains returned by R2MLwiN, as mcmc objects, are also, of course, available for analysis by a range of other packages and functions in R, including the many useful functions available in the coda package (Plummer et al. 2006).

Adding a predictor to the fixed part of a model
If we wish to see if a predictor, e.g., students' exam performance at age 11, appreciably improves the fit of the model, we can add it to the fixed part, as in Equation 3.
In the resulting output (truncated below) we see its coefficient estimate is positive and much greater than its standard error, indicating that it is a highly significant predictor, as reflected in its 95% credible interval, which excludes zero: We also see a large reduction in the DIC in this model): [1] 1640.952 7. Priors, starting values, and random seeds Browne (2012) describes in detail the priors, starting values, and random seeds used by default by MLwiN's MCMC engine, and how to modify them to suit users' own needs, so we will not fully reiterate those details here, other than providing the following few examples by means of illustration of how to do so via R2MLwiN. For fixed parameters MLwiN employs, by default, an improper uniform prior p(β) ∝ 1, but conjugate informative priors can readily be specified as a normal distribution with a mean and standard deviation as determined by the user. Let us assume we have gleaned, from prior work, some knowledge leading us to propose a prior for standlrt with a mean of 1 and standard deviation of 0.01; we can pass this information onto priorParam, listed within mcmcMeth, itself listed within estoptions, by specifying (via fixe) that we wish to construct a proper normal prior for the term in the fixed part of the model called standlrt, with c(mean, sd). See Figure 5 for a plot produced using the lattice package (Sarkar 2008), automatically loaded with R2MLwiN, indicating a strong prior which disagrees with the data.
By default, MLwiN's MCMC engine uses IGLS estimates as starting values for each parameter; this is one of the reasons its MCMC engine converges comparatively quickly which is an advantage over for example WinBUGS. However, if we wished to change the starting values, we can easily do so via the estoptions argument: specifically the startval list. Here we have specified starting values of -2 and 5 for the coefficient estimates of the Intercept and standlrt, respectively, in the fixed part of the model, and starting values of 2 and 4 for σ 2 u and σ 2 e , respectively, in the random part of the model. For illustrative purposes, we also dispense with the burnin, allowing us to subsequently view the first 500 iterations, as these will be most influenced by the starting values: R> OurStartingValues <-list(FP.b = c(-2, 5), RP.b = c(2, 4)) R> startvalMCMC <-runMLwiN(F3, data = tutorial, + estoptions = list(EstM = 1, startval = OurStartingValues, + mcmcMeth = list(burnin = 0, iterations = 500))) R> trajectories(startvalMCMC["chains"], Range = c(1, 500)) Here we can see, in Figure 6, the chains quickly moving from the starting values we have given them to values closer to those we would have received from an initial IGLS fit.
Finally, it is also a simple matter to change the random number seed from the default of 1 (e.g., estoptions = list(EstM = 1, mcmcMeth = list(seed = 2)).

Adding a random slope/coefficient
In the modelling so far we have assumed that only the intercept term in our models varies across clusters. We have already included standlrt in the fixed part of the model to allow for a fixed slope effect; if we also include it to the left of | school in the random part of the model, we will additionally allow the coefficient of our predictor to randomly vary across level 2 units, thus specifying Equation 4. Here we see that including such a random slope increases the effective number of parameters (pD), but the reduction in DIC indicates that, despite this increase in complexity, it is a better model: R> ComparingModels <-rbind(standlrtMCMC["BDIC"], standlrtRS_MCMC["BDIC"]) R> rownames(ComparingModels) <-c("Random intercept", "Random slope") R> ComparingModels We included resi.store.levs = 2 in our list of estoptions, above, to illustrate R2MLwiN's predLines function; this draws predicted lines against an explanatory variable for each group at a higher level, calculating the median and quantiles in doing so (note that predLines uses a lot of contiguous memory, so we recommend using the 64-bit version of R to mitigate against any problems this may cause). For example, the script below produces a plot of all 65 school lines, plus their intervals (see Figure 7). As well as specifying the model object, we also provide the name of the predictor (xname), the level (lev), and the probabilities (probs) used to calculate the lower and upper quantiles from which the error bars are plotted: R> predLines(standlrtRS_MCMC, xname = "standlrt", lev = 2, + probs = c(0.025, 0.975), legend = FALSE) By not declaring otherwise, we have accepted the default of selected = NULL and have thus produced quite a busy plot, although a general fanning-out pattern is apparent, reflecting the positive intercept/slope covariance at the school level (cov_Intercept_standlrt, in the model output above). To better distinguish individual schools, however, we can instead choose to select just a few (requesting a legend whilst doing so; see Figure 8

Modeling complex level 1 variance
Another important aspect of multilevel models is their ability to model complex level 1 variance -i.e., residual heteroscedasticity. Whilst it is not currently possible to fit complex level 1 variance models in R packages such as lme4 (Bates et al. 2015), it is possible to do so via MLwiN (using both IGLS and MCMC, although here we focus on the latter). For example, taking the model we have just fitted above, we can allow for the possibility that variability in normexam within-schools (i.e., around their fitted regression lines) may also depend on standlrt (see Browne et al. 2002, for an algorithm to fit this extension).
Finally, the following sample of three trajectory plots, plotting only the last 500 iterations (see Figure 10), illustrate that MLwiN has recognized that a different sampling method -Metropolis Hastings (MH), with its characteristic block-like appearance -is necessary to update the level 1 variance parameters, with Gibbs sampling updating the other parameters.

Fitting a 2-level binary response model via MCMC
We will now move on from continuous response models to consider some of the other response distribution types supported by R2MLwiN. As Table 2 illustrates, many of these, including the "Binomial" distribution we explore in the next example, can be fitted using IGLS or MCMC estimation; here, though, we will stay with MCMC in order to illustrate alternative MCMC methodology implemented in MLwiN, and also interoperability with BUGS, in the following sections.
Our example dataset is a sub-sample of the 1989 Bangladesh Fertility Survey (Huq and Cleland 1990), provided as bang1 in both MLwiN and R2MLwiN (see Table 3 for a list of the variables to be modeled).
Below we construct a logit model to investigate whether the use of contraceptives at the time

Variable Description woman
Identifying code for each woman. district Identifying code for each district. use Contraceptive use status at time of survey; levels are Not_using and Using. lc Number of living children at time of survey; levels correspond to none (None), one (One_child), two (Two_children), or three or more children (Three_plus). age Age of woman at time of survey (in years), centered on the sample mean of 30 years. urban Type of region of residence; levels are Rural and Urban. of the survey is predicted by the age of the respondent, their number of children living at the time of the survey (lc), and the type of region in which they reside (urban); furthermore, we will allow the coefficient of the latter predictor to randomly-vary at the district-level to explore whether variability between Urban areas within districts differs from that between Rural areas within districts, producing the random coefficient model in Equation 5.
logit(π ij ) = β 0 + β 1 age ij + β 2 lcOne_child ij + β 3 lcTwo_children ij + β 4 lcThree_plus ij (5) We can fit this, via R2MLwiN, as follows: R> data("bang1") R> F6 = logit(use)~1 + age + lc + urban + (1 + urban | district) R> binomialMCMC <-runMLwiN(Formula = F6, D = "Binomial", data = bang1, + estoptions = list(EstM = 1)) R> trajectories(binomialMCMC["chains"][,"FP_Intercept", drop = FALSE]) As discussed in Table 2, the terms in the parentheses immediately following logit are of the form response variable, denominator; specifying the demoninator is actually optional, and if not specified will take the form of a constant of ones, which is appropriate here. Since MLwiN requires binary response variables to be of the form 0/1, R2MLwiN transforms it into that format having detected it is indeed binary, and thus we are modeling a Bernoulli distribution. lc is a factor variable and so dummy variables will be added as predictors with None (see min(levels(bang1$lc))) as the reference category (the relevel function, part of the stats package, can be used to assign an alternative reference category, by re-ordering the levels of a factor prior to an runMLwiN call).
We have requested, on the last line, a plot of the chain trajectory for the intercept in the fixed part of the model. As the resulting plot ( Figure 11) indicates, mixing here looks relatively poor, and so it may be worthwhile examining some of the alternative MCMC methods implemented in MLwiN; we review these, and apply one to the current example, in the next section.

Alternative MCMC methods implemented in MLwiN
MLwiN's MCMC engine features a number of methods which can improve the efficiency of MCMC estimation (see Browne 2012, chapters 21-25). Table 4 provides a summary of available methods, one of which -orthogonal parameterization -we explore in the example below, before moving on to discuss further options via MLwiN's interoperability with BUGS.

An example using orthogonal parameterization to improve mixing
Given that we encountered poor mixing in our last model fit, above, we will examine whether orthogonal parameterization may help; this replaces a set of fixed effect predictors with an alternative group of predictors that span the same parameter space, but are orthogonal, and is particularly useful for instances such as this, where we are modeling a non-normal distribution (since fixed effects in normal models are generally updated in a block).
Here, then, we redefine estoptions, toggling orth to 1 in the list of mcmcOptions, and run the model again:

Method
As currently implemented in ML-wiN As specified in mcmcOptions Structured MCMC (e.g., Sargent et al. 2009) 2-level nested normal response models, with no complex level 1 variation, only smcm = 1 Structured MVN framework (e.g., Browne et al. 2009a) 2-level variance components models only smvn = 1 Othogonal fixed effect vectors (e.g., Browne et al. 2009b) All models orth = 1 Parameter expansion (e.g., Liu and Wu 1999) All models paex = c(<L>, 1) Hierarchical centring (e.g., Gelfand et al. 1995) All models hcen = <L>  As such, we see that the mixing is considerably better (Figure 12), with the ESS considerablyimproved for the fixed effects (see comparison table in the following section). The model indicates that the probability of contraceptive use decreases with age, but increases with the number of living children, and is greater in Urban areas; furthermore, the variance in contra-ceptive use differs between Rural areas (estimated as 0.417), and Urban areas (estimated as 0.417 − 2 × 0.419 + 0.7 = 0.28).

Using R2MLwiN to write BUGS code
As well as using its own MCMC estimation engine, MLwiN can write code to fit models in WinBUGS (Lunn et al. 2000) and OpenBUGS (Lunn et al. 2009), with the inclusion of its own IGLS estimates as starting values. With the aid of the rbugs package (Yan and Prates 2013), the user can employ a single runMLwiN function call to obtain starting values from an IGLS run in MLwiN, automatically generate the necessary BUGS model code, initial values, data files, and script, and then fit the model in BUGS.
As demonstrated in the script below, WinBUGS and OpenBUGS are called via the runMLwiN argument BUGO; this is a vector where n.chains corresponds to the number of chains, debug determines whether BUGS stays open following completion of the model run, seed sets the random number seed, bugs specifies the path, and OpenBUGS can toggle between FALSE for WinBUGS, and TRUE for OpenBUGS. Here we fit the same model as in the example above (the burnin, iterations and thinning we specify are the defaults, we simply make them explicit here for the purposes of this example); by specifying show.file = TRUE in the list of estoptions, the BUGS model is returned (see supplementary materials): R> WinBUGS <-"C:/WinBUGS14/WinBUGS14.exe" R> BUGSmodel = runMLwiN(Formula = F6, D = "Binomial", + data = bang1, estoptions = list(EstM = 1, + mcmcMeth = list(burnin = 500, iterations = 5000, thinning = 1)), + BUGO = c(n.chains = 1, debug = FALSE, seed = 1, + bugs = WinBUGS, OpenBugs = FALSE)) R> summary(BUGSmodel) On finishing the model run, the model estimates are returned (including the quantiles, not displayed in the truncated output above), but this time as a summary of an mcmc.list (see coda (Plummer et al. 2006)).
Running this on our machine took 140 seconds in WinBUGS, which uses (by default) a multivariate Metropolis update of the fixed effects (Gamerman 1997), whilst it took 16 seconds to run the model for the same length of chain in MLwiN (both via R2MLwiN), which uses a single site Metropolis sampler. As the table below indicates, however, the effective sample sizes (ESSs) from the run in BUGS are considerably larger than those from the run (without orthogonal parameterization) in MLwiN (although there is considerable improvement when we opt for orthogonal parameteristion in MLwiN, at least for the fixed effects; this took 15 seconds to run on our machine).

Modeling a cross-classified data structure
So far the multilevel models we have examined have been strictly hierarchical, but researchers may well encounter data structures which are cross-classified (i.e., crossed random effects), exhibit multiple membership, or indeed a combination of these; R2MLwiN allows users to fit such models via MCMC estimation.
In the following example we use an educational dataset from Fife in Scotland (available as the sample dataset xc in MLwiN and R2MLwiN); see Table 5 for a description of the variables we will model. We wish to model students' examination attainment, whilst controlling for clustering due to their current (secondary) school (sid), but also controlling for clustering due to their (earlier) primary school too (pid); since not all students who attended a given primary school all subsequently attended the same secondary school, the former are not nested within the latter, and so we have a cross-classified structure; see Equation 6, which uses the classification notation as per Browne et al. (2001).

Modeling a multiple membership data structure
Multiple membership models arise when we wish to control for clustering at a given 'level' or classification, but have lower-level units which belong to more than one group at that higher level. For example, if we wished to model employees' earnings over the past financial year, we might want to control for non-independence due to the companies which employed them during this time. If the employees were employed by more than one company, however (i.e., they were a 'member' of more than one group at this higher level), it would be judicious to control for the effects of all of them.
Below we fit such a model, using the wage1 sample dataset (available in both MLwiN and R2MLwiN); see Table 6 for a description of the variables we will model, and Equation 7 for the model itself.
You can see, below, that the argument mm, specifying the structure of the multiple membership model, has been included in the list of estoptions (NB since mm is non-NULL, xc defaults to TRUE). The argument mm can be a list of variable names, a list of vectors, or a matrix (e.g., see ?df2matrix). Here we employ variable names, which need to be assigned as lists to mmvar, specifying the classification units, and weights, specifying the weights. In the

Variable Description id
Identifying code for each office worker. company Identifying code for first company worked for over the last 12 months. company2 If worked for > 1 company over the last 12 months, identifying code for second company. company3 If worked for > 2 company over the last 12 months, identifying code for third company. company4 If worked for > 3 company over the last 12 months, identifying code for fourth company. logearn Workers' (natural) log-transformed earnings over the last financial year. numjobs The number of companies worked for over the last 12 months. weight1 Proportion of time worked for employer listed in company. weight2 Proportion of time worked for employer listed in company2. weight3 Proportion of time worked for employer listed in company3. weight4 Proportion of time worked for employer listed in company4. age_40 Age of workers, centered on 40 years. variables comprising the mmvar list, the same company, e.g., ACME Ltd, has the same value, e.g., 8, throughout. The weights specify the employee-level weighting given to each company they worked for; here we have chosen the proportion of time an employee has spent with a particular company. Each element of the list corresponds to a level (classification) of the model, in descending order, so the final NA indicates that there is not a multiple membership classification at level 1.
R> data("wage1") R> F8 = logearn~1 + age_40 + numjobs + (1 | company) + (1 | id) R> OurMultiMemb <-list(list( + mmvar = list("company", "company2", "company3","company4"), + weights = list("weight1", "weight2", "weight3", "weight4")), NA) R> ( The model indicates (as calculated at the bottom) that company accounted for 17.6% of the variance in (log-transformed) earnings, having controlled for age and number of jobs. Looking at the fixed effects, earnings on average increase with age, but those employees working for more companies over the past 12 months earn on average less; it would be interesting to investigate whether these effects persist once other variables available in the dataset are taken into account, such as sex of the employee, and whether they work part-time or not, but we will leave this model as it is, and move on to our next example.

Multivariate response models
Here we illustrate fitting a multivariate response model. In this example dataset (see Browne 2012), taken from the larger junior school project (JSP) dataset (Mortimore et al. 1988), we have two responses (see Table 7): english -an English test score marked out of 100, and behaviour -a binary behaviour score taken in the same year. We're interested in the correlation between English test performance and behaviour, and the effect of predictorssuch as sex, and an earlier test score (ravens) -on both of them, and also the effect of an earlier English fluency indicator (fluent) on the english test (i.e., not behaviour, since preliminary investigation revealed no correlation between the two); see Equation 8.
Note that the MCMC engine in MLwiN can fit mixed response multivariate models, as well as normal response models, but only if they consist of a mixture of normal responses and binary responses (with a probit, rather than logit, link), as in this case. In practice latent normal variables are constructed for each binary response with the value of the binary response governing the sign of the latent variable as illustrated below: R> data("jspmix1") R> F9 = c(english, probit(behaviour))~1 + sex + ravens + fluent[1] + + (1 | school) + (1[1] | id) R> (MixedRespMCMC <-runMLwiN(Formula = F9, D = c("Mixed", "Normal", + "Binomial"), data = jspmix1, estoptions = list(EstM = 1, + mcmcMeth = list(fixM = 1, residM = 1, Lev1VarM = 2)))) Here then we have two terms to the left of the tilde (~) in the model formula, corresponding to the two responses (their order matters, insofar as we use it to distinguish them when referring to the response variables in other parts of our model specification  and then our binary variable behaviour: this is modeled via a probit link and, since we have not specified otherwise, a constant of ones as the denominator. The default is for predictors added to the right of the tilde to have separate coefficients (i.e., one for each category) unless their name is suffixed by square brackets. In the case of the latter, a common coefficient is added to the model for the response(s) indicated in the numeric identifier within the square brackets. In this example, then, the model formula specifies separate coefficients for sex and ravens, and a common coefficient for fluent to be added to the response we earlier listed first (english). Finally, separate coefficients for the intercept are allowed to vary at level 2 (school) and also at level 1 (id), but the latter is only applicable in the case of the normal response variable (english; hence 1[1]).
Next, after requesting MCMC estimation, and Gibbs sampling (denoted by 1) as the method used to update both the fixed (fixM) and the random effects (residM), and univariate MH (denoted by 2; 3 corresponds to multivariate MH -see ?write.MCMC for further details) to update the level 1 variance (Lev1VarM), we specify our runMLwiN function call. The arguments here will be familiar to you now, although we encounter a slightly different format for the value we assign to the distribution: if our multivariate response model consisted only of normal responses, we would simply specify "Multivariate Normal"; since we have mixed response types, however, we must identify the distribution of each (in an order corresponding to their position in the preceding model formula), hence the character string: D = c("Mixed", "Normal", "Binomial").    740 -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- The model indicates that the Year 1 ravens test score has a positive effect on both of the Year 3 response variables we have fitted, whilst the Year 1 measure of fluency in English has a positive effect on the Year 3 English test score. Girls are found to both have a higher behaviour rating, and a better Year 3 English test score, than boys. There is a positive correlation between the two response variables at the pupil-level, but the correlation is negligible at the school-level: Note, as reflected in the output returned, that the DIC is currently not available for mixed response models fitted in MLwiN.

Running simulations/fitting multiple models
Whilst MLwiN can be run via a macro language and the command line, the R environment is particularly well-suited to requesting the estimation of a series of models, and the efficient post-processing of their results.
For example, simulation studies can be a useful tool for researchers wishing to investigate hypotheses such as how sensitive, or biased, their model estimates are to the different estimation methods which could be used to derive them. We will illustrate this with an example based on Browne andDraper (2000, 2006), and also Chapter 8 of Browne (2012), comparing Bayesian and likelihood-based methods of estimating multilevel models with very few level 2 units. Here we simulate a series of datasets from 'true' parameter values, and then fit each dataset to a model via the estimation methods of interest, before investigating how sensitive the model estimates are with regard to the methods used to derive them (see demo(MCMCGuide08) for an example of conducting such a study using parallel processing).

Other models and features
In addition to the models covered in the examples above, MLwiN, via R2MLwiN, has a number of other useful features; examples of many of these can be found in R2MLwiN's demos (see Section 1.3), e.g., (with relevant demo in parentheses): adjusting for measurement errors in predictors (MCMCGuide14), fitting multiple membership multiple classification (MMMC) models (MCMCGuide16), fitting spatial data models (MCMCGuide17), creating imputations for multilevel datasets with missing values (MCMCGuide18), modeling autoregressive residual structures at level 1 (MCMCGuide19), and multilevel factor analysis (MCMCGuide20).

Conclusions
This paper has introduced R2MLwiN, a new package which calls the multilevel modeling software MLwiN from within the R environment. As such, it offers a route other than the GUI, or direct authoring of macros, to run MLwiN, enabling users to benefit from working within the R environment (e.g., to parsimoniously fit multiple models, and compile and analyze the results returned using the many statistical and graphical functions available in R). In addition, R2MLwiN offers functionality not commonly-supported by existing R packages, such as modeling complex level 1 variance, multiple membership models, multivariate response models, writing (and running) BUGS models with automatically-generated model code and IGLS starting values, and MLwiN's interactive equation and graphics windows available via its GUI, which can be opened from R (if running via Windows or in Wine (Wine Development Team 2015) on other platforms). As a means to call MLwiN externally R2MLwiN joins the recently-developed programme Stat-JR ) (which interoperates with a large range of statistical software programmes) and the Stata package (StataCorp. 2015) runmlwin . This work therefore has the potential to be of benefit to a range of research communities, offering an accessible means of addressing a common research problem: namely modeling datasets with multilevel structures.