runjags : An R Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS

The runjags package provides a set of interface functions to facilitate running Markov chain Monte Carlo models in JAGS from within R. Automated calculation of appropriate convergence and sample length diagnostics, user-friendly access to commonly used graphical outputs and summary statistics, and parallelized methods of running JAGS are provided. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. Automated simulation study functions are implemented to facilitate model performance assessment, as well as drop-k type cross-validation studies, using high performance computing clusters such as those provided by parallel. A module extension for JAGS is also included within runjags, providing the Pareto family of distributions and a series of minimally-informative priors including the DuMouchel and half-Cauchy priors. This vignette is taken from the publication for the runjags package (Denwood 2016). It outlines the primary functions of this package, and gives an illustration of a simulation study to assess the performance of equivalent model specifications.


Introduction
Over the last two decades, the increased availability of computing power has led to a substantial increase in the availability and use of Markov chain Monte Carlo (MCMC) methods for Bayesian estimation (Gilks, Richardson, and Spiegelhalter 1998). However, such methods have potential drawbacks if used inappropriately, including difficulties in identifying convergence (Toft, Innocent, Gettinby, and Reid 2007; Brooks and Roberts 1998) and the potential for auto-correlation to decrease the effective sample size of the numerical integration process (Kass, Carlin, Gelman, and Neal 1998). Although writing MCMC sampling algorithms such as the Metropolis-Hastings algorithm (Hastings 1970) is relatively straightforward, many users employ software such as the Bayesian analysis Using Gibbs Sampling (BUGS) software variants WinBUGS and OpenBUGS (Lunn, Thomas, Best, and Spiegelhalter 2000). Just Another Gibbs Sampler (JAGS; Plummer 2003) is a cross-platform alternative with a direct interface to R using rjags (Plummer 2016), which can be easily extended with user-specified modules supporting additional distributions and random number generators (Wabersich and Vandekerckhove 2014). Each of these uses the BUGS syntax to allow the user to define arbitrary models more easily, which is attractive and attainable for researchers who are more familiar with traditional modeling techniques. However, many of these less experienced users may not be aware of the potential issues with MCMC analysis, hence the prominent warning that "MCMC sampling can be dangerous" in the WinBUGS user manual (Lunn et al. 2000). Some of this potential risk for inexperienced users can be reduced by using a wrapper for the model-fitting software that analyzes the model output for common problems, such as failure to converge, parameter auto-correlation and small effective sample size, which may otherwise be overlooked by the end user.
Bayesian statistical methods, such as those used by BUGS and JAGS, also require prior belief to be incorporated into the model. There are a number of different recommendations for an appropriate choice of prior distribution under various circumstances: for example, the half-Cauchy distribution has been recommended as a reasonable choice for standard deviation parameters within hierarchical models (Gelman 2006; Polson and Scott 2011), and DuMouchel (1994) gives an argument for the use of π(τ) = s0 / (s0 + τ)^2 as a prior for a variance parameter τ in meta-analysis models. However, these are not available as built-in distributions in BUGS or JAGS.
This paper describes the runjags package (Denwood 2016) for R (R Core Team 2015), which can be used to automate MCMC fitting and summarizing procedures for JAGS models and is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=runjags. The functions are designed to be user-friendly (particularly for those less experienced with MCMC analysis), and provide a number of features to make the recommended convergence and sample size checks more obvious to the end user. The runjags package also provides additional distributions to extend the core functionality of JAGS, including the half-Cauchy and DuMouchel distributions, as well as functions implementing different types of simulation studies to assess the performance of JAGS models. Section 3 gives a worked example of usage to assess the sensitivity of an over-dispersed count observation model to various minimally-informative prior distributions. Some prior familiarity with the BUGS programming language and the underlying MCMC algorithms is assumed. All code shown below is also included in an R file in the supplementary material.

Preparation
The core functionality of the runjags package allows a model specified by the user to be run in JAGS, using the run.jags function. The help file for this function gives an overview of the core functionality of the runjags package and provides links to other relevant functions. All functions require installation of JAGS, which is an open source software package available from https://mcmc-jags.sourceforge.io/.
Before running a model for the first time, it is advisable to check the installation of JAGS and set any desired global settings, such as installation locations and warning message preferences, using the runjags.options function. For example, the following will first test the JAGS installation, and then set function feedback from runjags and simulation updates from JAGS to be suppressed for future model runs in this R session:

R> testjags()
You are using R version 3.3.0 (2016-05-03) on a unix machine, with the X11 GUI
JAGS version 4.2.0 found successfully using the command '/usr/local/bin/jags'
The rjags package is installed

R> runjags.options(silent.runjags = TRUE, silent.jags = TRUE)
The help file for the runjags.options function gives a list of other possible global options, and instructions on how to set these in the R profile file for permanent use.

Basic usage
The run.jags function requires a valid model definition to the model argument and a character string of monitored variables to the monitor argument before a model can be run. The model can be specified in an external text file, or as a character string within R. The former is likely to be preferable for more complex model formulations, but the latter eliminates the need for multiple text files. Data will be necessary for most models, and it is highly recommended to provide over-dispersed starting values for multiple chains; the default settings give a warning if no initial values are provided.
There are a number of ways to provide data and initial values, depending on the preferences of the user. It is possible for the text file containing the model to also contain data and initial value "blocks", in which case these will be automatically imported with the model by run.jags, and the number of chains is inferred from the number of initial value lists found. This is also compatible with standard WinBUGS or OpenBUGS text files, although the addition of curly brackets is necessary to demarcate the data and initial value blocks in the same way as for the model block. It is also necessary to convert any BUGS arrays from row-major order to column-major order, which is done automatically if the variables are specified inside a list (as is the case for BUGS, but not for R). To over-ride this setting within a specific data or initial value block, the user can include #BUGSdata# to ensure all arrays are converted from row- to column-major order, #Rdata# to ensure none of the arrays are converted, and #modeldata# to pass the data block directly to JAGS for data transformation (see Section 7.0.4 of the JAGS user manual).
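As an illustrative sketch (the model and variable names here are hypothetical, not taken from the original article), a self-contained model string combining model, data and initial value blocks might look like the following, with #Rdata# indicating that the values are already in R's column-major format:

```r
library(runjags)

# A minimal sketch of a combined model/data/inits specification.
# The #Rdata# tag tells runjags not to convert arrays from row-major
# to column-major order, i.e., the values are already R-formatted.
modelstring <- "
model{
  for (i in 1:N) {
    Y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.001, 0.001)
}

data{
#Rdata#
list(N = 5, Y = c(1.2, 0.8, 1.5, 0.9, 1.1))
}

inits{
list(mu = -10, tau = 10)
}

inits{
list(mu = 10, tau = 0.01)
}
"

# Two inits blocks imply two chains:
results <- run.jags(modelstring, monitor = c("mu", "tau"))
```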
As a basic example, we can use the Salmonella example from Chapter 6.5.2 of the BUGS book (http://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-the-bugs-book/bugs-book-examples/the-bugs-book-examples-chapter-6-6-5-2/, with thanks to Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012, for permission to reproduce their model). Simulation-specific options can be provided to the run.jags function, which may include the required burn-in period, sampling length and thinning interval. A basic model run with a fixed burn-in period (default 4,000 iterations after 1,000 adaptive iterations) and sampling period (default 10,000 iterations) can be obtained as follows:

R> filestring <- "
+ The BUGS Book example Chapter 6.5.2
+
+ model{
+   for (i in 1:6) {
+     for (j in 1:3) {
+       y[i,j] ~ dpois(mu[i,j])
+       log(mu[i,j]) <- alpha + beta * log(x[i] + 10) + gamma * x[i]
+     }
+   }
+   alpha ~ dnorm(0, 0.0001)
+   beta ~ dnorm(0, 0.0001)
+   gamma ~ dnorm(0, 0.0001)
+ }
+
+ Data {
+   list(y = structure(.Data = c(15, 21, 29, 16, 18, 21, 16, 26, 33,
+     27, 41, 60, 33, 38, 41, 20, 27, 42), .Dim = c(6, 3)),
+     x = c(0, 10, 33, 100, 333, 1000))
+ }
+
+ Inits {
+   list(alpha = 0, beta = 0, gamma = 0)
+ }
+ "
R> results <- run.jags(filestring, monitor = c("alpha", "beta", "gamma"))

Warning message:
Convergence cannot be assessed with only 1 chain

A single chain was used for this model because only one set of initial values was found in the example file, resulting in the warning message regarding convergence assessment. The results of the simulation can be examined using the default print method. The results show similar inference to that provided by Lunn et al.
(2012), although with additional information regarding the effective sample size (SSeff), auto-correlation at a lag of 10 (AC.10), and the potential scale reduction factor (psrf) of the Gelman-Rubin statistic (Gelman and Rubin 1992) for models with multiple chains (the latter is sometimes referred to as 'Rhat'). In this case, an insufficient number of samples has been taken for this highly auto-correlated model (although it is important to note that the auto-correlation is markedly reduced if the 'glm' module is loaded in JAGS). Displaying the effective sample size with the summary information will alert the user to the fact that additional steps should be taken before sensible inference can be made.

The data can also be specified to run.jags using the data argument, in which case it should take the format of a named list, data frame, character string as produced by dump.format, or a function (with no arguments) returning one of these. Similarly, the initial values can be specified using the inits argument as a list with length equal to the number of chains, with each element specifying a named list, data frame or character string for the initial values for that chain. The initial values may also be specified as a function taking either no arguments (as for the data argument) or one argument (specifying the chain number), in which case an additional n.chains argument will be required by run.jags to determine the number of chains required.
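A sketch of these two mechanisms (the model and variable names are illustrative, not from the original article): data supplied as a named list, and initial values supplied as a one-argument function, which requires n.chains to be given explicitly:

```r
library(runjags)

# Hypothetical model; N and Y are passed via the data argument below.
model <- "model {
  for (i in 1:N) { Y[i] ~ dnorm(mu, tau) }
  mu ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.001, 0.001)
}"

set.seed(1)
Y <- rnorm(10, 5, 2)

# A one-argument function returning over-dispersed initial values
# that differ by chain number:
inits <- function(chain) {
  list(mu = c(-10, 10)[chain], tau = c(0.01, 10)[chain])
}

# Because inits takes the chain number, n.chains is required:
results <- run.jags(model, monitor = c("mu", "tau"),
                    data = list(N = length(Y), Y = Y),
                    inits = inits, n.chains = 2)
```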

Alternative usage
To facilitate a more streamlined function call within R, an alternative method of specifying data and initial values is provided. The model formulation may contain special inline comments including: #data#, which indicates that the comma-separated variable names to the right of the statement are to be included in the simulation as data, and #inits#, which indicates variables for which initial values are to be provided. Any variables specified by #data# and #inits# will be automatically retrieved from a named list, data frame or environment passed to the data and inits argument (or function returning one of these), or from the global environment. Any variable names specified in this way may also match a function returning an appropriate vector, and in the case of initial values, this function may accept a single argument indicating the chain for which the initial values are to be used. Note that any variables specified by #data# or #inits# will be ignored if a character string is provided to the data or inits arguments, which may be useful for temporarily over-riding the values specified in the model file. See the dump.format function for a way to generate these. In addition to #data# and #inits#, a number of optional inline comments are supported as follows:

• #monitors# – a comma-separated list of monitored variables to use, which may include the special variables "DIC" (Spiegelhalter, Best, Carlin, and van der Linde 2002) and "PED" (Plummer 2008), which can be used to assess model fit;

• #modules# – a comma-separated list of any JAGS extension modules required, optionally also specifying the status (e.g., #modules# glm on, dic on);

• #factories# – a comma-separated list of any JAGS factories and types required, optionally also specifying the status (e.g., #factories# mix::TemperedMix sampler on);

• #response# – a single variable name specifying the response variable;

• #residual# – a single variable name specifying a variable that represents the residuals;

• #fitted# – a single variable name specifying a variable that represents the fitted value.
Each of these options can also be supplied directly to the relevant function call in R. An example of running a model using this style of model specification first simulates the data, and then specifies functions that return initial values (including RNG seeds) for each chain. The use of switch within these functions allows different initial values to be chosen for chains one and two, ensuring that initial values are over-dispersed.
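A sketch in this spirit, assuming a simple linear regression with simulated data (the model, variable names and initial values here are illustrative assumptions, not reproduced from the original article):

```r
library(runjags)

# Model using inline #data# and #monitor# comments; N, X and Y are
# retrieved automatically from the calling environment by run.jags.
model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(true.y[i], precision)
    true.y[i] <- m * X[i] + c
  }
  m ~ dunif(-1000, 1000)
  c ~ dunif(-1000, 1000)
  precision ~ dexp(1)
  #data# N, X, Y
  #monitor# m, c, precision
}"

# Simulate the data
set.seed(1)
N <- 100
X <- 1:N
Y <- rnorm(N, 2 * X + 10, 1)

# A function returning over-dispersed initial values (and RNG seeds)
# for each chain; switch() selects the values by chain number.
initfunction <- function(chain) {
  switch(as.character(chain),
    "1" = list(m = -10, c = 10, precision = 0.01,
               .RNG.name = "base::Super-Duper", .RNG.seed = 1),
    "2" = list(m = 10, c = -10, precision = 100,
               .RNG.name = "base::Wichmann-Hill", .RNG.seed = 2))
}

results <- run.jags(model, n.chains = 2, inits = initfunction)
```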

Extending models
The autorun.jags function can be used in the same way as run.jags, but the burn-in period and sample length are calculated automatically rather than being directly controlled by the user. The autorun.jags function will continually extend a simulation until convergence – as assessed by the Gelman-Rubin statistic (Gelman and Rubin 1992) – has been achieved for all monitored variables, and will then extend the simulation further to compensate for any observed auto-correlation. The automated assessment of convergence should be verified graphically before making inference from models fit to real data, but a fully automated analysis is useful for simulated data and for reinforcing the importance of convergence assessment for novice users. Such a run can be extended as necessary up to a maximum total elapsed time of, for example, one hour. Alternatively, an existing model may be extended by the user in order to increase the sample size of the MCMC chains using either the extend.jags or autoextend.jags function. For these functions, the arguments add.monitor, drop.monitor and drop.chain are provided in order to change the monitored variables and number of chains being run. The combine argument controls whether the old MCMC chains should be discarded, or combined with the new chains. For example, the following code will manually extend the existing simulation by 5000 iterations, and then extend the simulation again with automatic control of convergence and sample size diagnostics:

R> results <- extend.jags(results, sample = 5000)
R> results <- autoextend.jags(results)
In the second function call, the automated diagnostics run by autoextend.jags determine that the simulation has converged and already has an adequate sample size, so no additional samples are taken. For more details on these functions, including detailed descriptions of the other arguments and additional examples, consult the help pages for run.jags and autorun.jags. Once a valid 'runjags' class object has been obtained, the full representation of the model, data and current status of the random number generators can be saved to a file using the write.jagsfile function. This allows a model to be run from the last sampled values using the run.jags function at a later time point, and it may also be instructive to use this function to examine the format of a syntactically-valid and complete model file that can be read directly using the run.jags function. It is also possible to specify a value of 0 for the sample argument in the original run.jags function call, and then subsequently use write.jagsfile to produce a model file with the initial values specified.
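The fully automated workflow described above can be sketched as follows (the model here is a hypothetical stand-in; the max.time argument caps the total elapsed time):

```r
library(runjags)

# autorun.jags extends the burn-in until the Gelman-Rubin statistic
# is acceptable for all monitored variables, then extends the sample
# to offset any observed auto-correlation.
model <- "model {
  for (i in 1:N) { Y[i] ~ dnorm(mu, tau) }
  mu ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.001, 0.001)
  #data# N, Y
  #monitor# mu, tau
}"

set.seed(1)
N <- 20
Y <- rnorm(N, 5, 2)

# Give up extending after at most one hour of total elapsed time:
results <- autorun.jags(model, n.chains = 2, max.time = "1hr")
```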

Visualization methods
The output of these functions is an object of class 'runjags'. This class is associated with a number of S3 methods, as well as utility functions for combining multiple 'runjags' objects (combine.jags), and for conversion to and from objects produced by the rjags package (as.runjags and as.jags). Many of these allow a vars argument giving a subset of monitored nodes (using partial matching), as well as a mutate argument. The latter should specify a function (or a list with first element a function and remaining elements arguments to this function), and can be used to add new variables to the posterior chains that are derived from the directly monitored variables in JAGS. This allows the variables to be summarized or extracted as part of the MCMC objects as if they had been calculated in JAGS, but without the computational or storage overheads associated with calculating them directly in JAGS.
One possible application for this is for pair-wise comparisons of different levels within fixed effects using the supplied contrasts.mcmc function.
The print method displays relevant overview information, including summary statistics for monitored variables calculated and stored by the run.jags function. The summary method returns a summary table for the monitored variables, which is taken from the stored values created by run.jags if available; otherwise it will be recalculated during the function call. Alternatively, summary statistics can be recalculated and stored in the 'runjags' object using the add.summary function. There are a series of options available to these summary functions, including vars and mutate as outlined above, confidence, which specifies a numeric vector of confidence intervals to calculate, and custom, which allows one or more statistics calculated by a user-supplied function to be appended to the summary statistics. Note that summary options may also be passed to run.jags in order to control the summary statistics calculated and appended to the 'runjags' object.
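For instance, these summary options might be used along the following lines (a sketch only: results is assumed to be a 'runjags' object from an earlier run, and the custom function is assumed to receive the posterior draws for each variable):

```r
# Recalculate summaries for a subset of variables (partial matching
# on "mu"), requesting a 90% credible interval and appending a
# custom statistic (here the median absolute deviation).
summary(results, vars = "mu",
        confidence = c(0.90),
        custom = function(x) c(mad = mad(x)))

# Alternatively, store the recalculated summaries in the object so
# that subsequent print calls use them:
results <- add.summary(results, confidence = c(0.90))
```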
The plot method produces a series of relevant plots for the selected variables, including trace plots, empirical cumulative distribution function plots, histograms, and a cross-correlation plot, with additional options allowing auto-correlation plots and density plots if desired.
Further plot parameters can be specified using the col and separate.chains arguments, as well as a named list for each plot type, which will be passed to the underlying lattice functions (Sarkar 2008). The primary intention with these plots is to provide rapid access to commonly used convergence diagnostics; plot methods associated with 'mcmc' or 'mcmc.list' objects may be more flexible and intuitive for producing more specific graphical output from converged MCMC chains. The coda package (Plummer, Best, Cowles, and Vines 2006) provides such plotting methods, as well as many of the underlying functions that calculate the summaries given by runjags. A typical examination of a simulation output (the default print method, and a plot output for variable names partially matching the letter "c") could be obtained as follows:

R> results
R> plot(results, vars = "c", layout = c(3, 3))

The standard plot method presents the commonly required information in an easily readable format (including model fit statistics where available), but the same information can be returned in the form of a numeric matrix using the summary method. To extract additional information from the 'runjags' object not covered by these summary statistics, see the extract method.

GLMM templates
There are many available frameworks for fitting standard generalized linear mixed models (GLMMs) in R, but new users to MCMC may find that running relatively simple models in JAGS and comparing the results to those obtained through other software packages allows them to better understand the flexibility and syntax of BUGS models. To this end, the runjags package provides a template.jags function which generates model specification files based on a formula syntax similar to that employed by the well-known lme4 package (Bates, Maechler, Bolker, and Walker 2014). After generating the template model, the user is encouraged to examine the model file and make whatever changes are necessary before running the model using run.jags. For example, a basic generalized linear model (from the help file for glm) can be compared to the output of JAGS as follows:

R> counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
R> outcome <- gl(3, 1, 9)
R> treatment <- gl(3, 3)
R> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
R> template.jags(counts ~ outcome + treatment, family = "poisson")

Your model template was created at "JAGSmodel.txt" - it is highly
advisable to examine the model syntax to be sure it is as intended
You can then run the model using run.jags("JAGSmodel.txt")

R> jags.D93 <- run.jags("JAGSmodel.txt")
The results of these comparisons are not displayed here, but show how the same inference is presented slightly differently in a Bayesian framework. The template.jags function supports Gaussian, (zero-inflated) binomial, (zero-inflated) Poisson and (zero-inflated) negative binomial distributions, as well as linear and fixed effects, 2-way interactions and random intercept terms specified using the same syntax as used in lme4. Additional distributions and link functions can be introduced by manually editing the template model file. All necessary data, initial values, monitored variables and modules are saved to the model file using the previously described comment syntax, and the template function also saves information about the response variable, fitted estimates and residuals to the model file, allowing residuals and fitted methods to be used with the objects returned by run.jags.

JAGS module
In addition to the R code used to facilitate running JAGS models and summarizing results, the runjags package also provides a modular extension to the JAGS language that supplies additional distributions. The module can be loaded using the following command:

R> load.runjagsmodule()
module runjags loaded
This makes the module available to any JAGS model, including those run using the rjags package. The available distributions extend the Pareto Type I distribution provided within JAGS to Pareto Types II, III and IV, as well as providing the generalized Pareto distribution, the Lomax distribution (a special case of the Pareto Type II distribution with µ = 0), and two distributions advocated for use as "minimally-informative" priors for variance parameters: the DuMouchel distribution (DuMouchel 1994), and the half-Cauchy distribution (Gelman 2006). The usage, probability density function (PDF) and lower bound for the support of each of the distributions provided by the module are shown in Table 1, and an example of how to use the distributions in this module is given in Section 3.
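For instance, these priors could be placed on variance parameters along the following lines (a sketch: the surrounding hierarchical model is hypothetical, but dmouch and dhalfcauchy are the module's distribution names):

```r
library(runjags)
load.runjagsmodule()

# Hypothetical hierarchical sketch: a DuMouchel prior (scale 1) on a
# variance parameter, and a half-Cauchy prior (scale 25) on a
# between-group standard deviation. Note that JAGS dnorm uses
# precision, hence the 1 / variance parameterizations.
model <- "model {
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[group[i]], 1 / variance)
  }
  for (g in 1:G) {
    mu[g] ~ dnorm(overall.mean, 1 / (sigma^2))
  }
  overall.mean ~ dnorm(0, 0.0001)
  variance ~ dmouch(1)
  sigma ~ dhalfcauchy(25)
  #data# N, G, group, Y
  #monitor# overall.mean, variance, sigma
}"
```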
One limitation of the module provided within runjags is that it is only made available for the 'rjags' and 'rjparallel' methods when loaded within R. However, a standalone JAGS module containing the same functions for use with any JAGS installation (independently of R) is available from http://runjags.sourceforge.net/. This module is named 'paretoprior' to avoid naming conflicts with the internal runjags module, and should install on a variety of platforms using the standard './configure', 'make', 'make install' convention. Binary installers are also provided for some platforms.

Method options
There are a number of different methods for calling JAGS from within R using runjags, which can be controlled using the method argument or by changing the global option using the runjags.options function. The main difference between these is that some allow multiple chains to be run in parallel using separate JAGS models, with automatic pseudo-random number generation handled by runjags where necessary. The "interruptible", "rjags", "parallel" or "bgparallel" methods are recommended for most situations, but all possible methods and their advantages and disadvantages are summarized in Table 2. Note that a pre-existing cluster created using the parallel package can be used by specifying a cl argument, and a maximum number of parallel simulations for these methods can optionally be specified using an n.sims argument to the main function call (the default will use a separate simulation per chain, but it is possible to specify fewer simulations than chains). The model fit statistics are not available with parallel methods because multiple chains within the same model are required for calculation of DIC and PED, but these can be obtained using the extract method, which will extend the simulation using a single simulation. The adaptation phase is always explicitly controlled to allow MCMC simulations with the same pseudo-random number seed to be reproducible regardless of the method used to call JAGS.
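Switching between methods is a matter of a single argument. As a sketch (assuming a valid model string with inline #data#, as in earlier examples):

```r
library(runjags)
library(parallel)

# Run two chains in parallel JAGS processes; runjags handles separate
# pseudo-random number generation for each simulation automatically.
# 'model' is assumed to be a valid model string with inline #data#.
results <- run.jags(model, n.chains = 2, method = "parallel")

# The same model using a pre-created cluster from the parallel
# package, passed via the cl argument:
cl <- makeCluster(2)
results <- run.jags(model, n.chains = 2, method = "rjparallel", cl = cl)
stopCluster(cl)
```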
The two background methods do not return a completed simulation, but instead create a folder in the working environment where the simulation results will be written once the JAGS process has completed. For example, the following code will allow a JAGS simulation to be run in the background using two processors in parallel, saving the results in a folder called 'mysimulation' in the current working directory:

R> info <- run.jags(model, n.chains = 2, method = "bgparallel",
+     keep.jags.files = "mysimulation", n.sims = 2)

[Table 1: Distributions provided by the JAGS module included with the runjags package. The name, JAGS code with parameterization (e.g., dpar1(alpha, sigma) with α > 0, σ > 0 for the Pareto Type I distribution), PDF and lower bound of each distribution are shown; all distributions have an upper bound of ∞ unless otherwise stated. Footnotes: (1) dpar1(alpha, sigma) is equivalent to the dpar(alpha, c) distribution and is provided for naming consistency. (2) This is referred to as the "2nd kind Pareto" distribution by Van Hauwermeiren and Vose (2009); an alternative form for the PDF of this distribution is given by α σ^α / (x + σ)^(α+1). (3) The generalized Pareto distribution also has an upper bound of x ≤ µ − σ/ξ for ξ < 0.]
Starting the simulations in the background...

The JAGS processes are now running in the background
This returns control of the terminal to the user, who can then carry on working in R while waiting for the simulation to complete.

Table 2: Methods provided by the runjags package to run simulations in JAGS.

• "interruptible" (1) – JAGS called using a shell, with output monitored and displayed within R. Method options: none.

• "background" (1, *) – JAGS called as a background process, with the R prompt returned to the user. Method options: none.

• "simple" (1) – JAGS called directly using a shell. Method options: none.

• "parallel" (3) – Multiple JAGS instances called using separate shells to allow chain parallelization. Method options: n.sims (the number of parallel simulations).

• "bgparallel" (3, *) – Multiple JAGS instances called using separate background processes to allow chain parallelization. Method options: n.sims (the number of parallel simulations).

• "rjags" (2) – JAGS called within the R session using the rjags package. Method options: none.

• "rjparallel" (4) – Multiple chains run in parallel within R using the rjags package. Method options: cl (a pre-created cluster to be used) and n.sims (the number of parallel simulations).

• "snow" (1) – Multiple JAGS instances called using separate shells set up using a parallel cluster. Method options: cl and n.sims (as above), and remote.jags (the JAGS path on the cluster nodes).

Availability of JAGS modules is as follows: (1) installed in JAGS; (2) loadable in the R session; (4) loadable in R code run remotely on the cluster nodes (except DIC). Methods marked with * are not compatible with autorun.jags and autoextend.jags.

The default behavior on completion of the simulations is to alert the user by emitting a beep from the speakers, but configuration using runjags.options allows a shell script file to be executed instead. The info variable in this code contains the name and directory of the simulation, which is given to the user if the object is printed. The results can be retrieved using either the folder name or the variable returned by the function that started the simulation:

R> background.results <- results.jags("mysimulation")
If the simulation has not yet completed, the results.jags function will display the JAGS output so that the user can gauge how much longer the simulation will take. Further options for the results.jags function include recover.chains, which allows the results of successful simulations to be read even if other parallel simulations did not produce output, and read.monitor, which allows only a chosen subset of the monitored variables to be read from the MCMC output. For all methods except "rjags" and "rjparallel", any calls to run.jags where the keep.jags.files argument is specified will result in a folder being created in the working directory that can be reimported using results.jags. Any failed simulations created are also kept using the same mechanism, and a message is displayed detailing how the user can attempt to recover these simulations. These failed simulation folders are automatically cleaned up when the R session is terminated. The failed.jags function returns any output captured from JAGS in such cases, and is helpful for debugging model code.

Simulation studies
One of the principal motivations behind the development of the runjags package is to automate the analysis of simulated data sets for the purposes of model validation. A common motivation for this type of analysis is a drop-k validation study, also known as a leave-one-out cross-validation where k = 1. This procedure re-fits the same model to a single data set multiple times, with one or more of the observed data points removed from each re-fit of the model. This can either be a randomly selected group of a fixed number "k" of data points, or each individual data point in turn. The goal is to evaluate the ability of the model to predict each observation from the explanatory variables, so that any unusual observations can be identified. While it is possible to repeatedly use the autorun.jags function to analyze multiple data sets, the higher level run.jags.study and drop.k functions are provided to automate much of this process. Large simulation studies are likely to be computationally intensive, but are ideal candidates for parallelization. For this reason, parallel computation is built directly into these functions using the parallel package. This can be used to parallelize the simulation locally, or to run the simulation on any cluster set up using the snow package (Tierney, Rossini, Li, and Sevcikova 2013). This allows the available computing power to be maximized without requiring the end user to write any additional code, and includes an initial check to ensure that the model compiles and runs locally before beginning the parallelized study.
A drop-k study is implemented in runjags using the drop.k function as follows. The 'runjags' class object on which the drop-k analysis will be performed must first be obtained using the run.jags function. Here, we will use the simple linear regression model obtained in Section 2.3, with the result of run.jags contained in the variable results. The drop.k function takes arguments drop (indicating the data variables to remove between simulations) and k (indicating the number of data points to drop for each simulation). In this case, a drop-1 study is run with the number of simulations equal to the number of data points. All individual simulations are run using the underlying autorun.jags function; additional arguments for autorun.jags can be passed through drop.k as required. The initial values for each simulation are taken from the parent simulation, including the observed values of the removed data points to ensure that the model will compile. The drop-1 study is run and the results displayed using the following syntax (limited to the first five data-points for brevity):

R> assessment <- drop.k(results, drop = "Y[1:5]", k = 1)
R> assessment

Values obtained from a drop-k study with a total of 5 simulations:

The results show the 95% confidence interval (CI) for each data point obtained from the corresponding simulation where this data point was removed, which in this case indicates that the first five data-points were predicted reasonably well. For drop-k cross-validation with "k" greater than 1, the indicated number of data points will be randomly removed from each simulation, and the average values for the corresponding summary statistics from each data point will be shown. In this case, the argument simulations must also be provided. Additional arguments to autorun.jags can also be provided to the drop.k function. For example, the following syntax will run 100 simulations with a random selection of 2 of the first five data-points removed from each:

R> assessment <- drop.k(results, drop = "Y[1:5]", k = 2, simulations = 100,
+     method = "simple", psrf.target = 1.1)
R> assessment

Average values obtained from a drop-k study with a total of 100 simulations:

In the latter case, inference was made on each data point in several different data sets, so the results present the mean values of each summary statistic obtained from the multiple simulations.
The drop.k function is a wrapper for the run.jags.study function, which can be used to perform various different types of simulation studies. This function takes the following arguments: the number of data sets to analyze, the model to use, a function to produce data that will be provided to each simulation, and a named list of "target" variables with true values representing parameters to be monitored and used to summarize the output of the simulation. Inline #monitor# statements can be used as with run.jags, and any target variables are also automatically monitored. Any variables specified using the inline #data# statement will be retrieved from the working environment as usual and will be common to all simulations; data which is intended to change between simulations must therefore be provided using the datafunction argument instead. Initial variables can be specified using #inits# in the model file, but it is also necessary to pass a character string of all variable names required to the export.cluster argument to ensure these variables are visible on the cluster nodes. It may be preferable to specify initial values as a function, to which the data will be made available by run.jags at run time (this may be required in cases where the choice of appropriate initial values depends on the values in the data). An illustration of the run.jags.study function is provided in Section 3.
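As a hedged illustration of the datafunction argument described above (the function name and data variable are purely illustrative), a function returning a different simulated data set for each numbered simulation could be written as:

R> mydatafunction <- function(i) list(Y = rnorm(10, i, 1))

When the supplied function takes an argument, run.jags.study calls it once per simulation with the simulation number, and supplies the returned named list as data alongside any common #data# variables.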

Illustration of usage with a simulation study
Here we will consider a worked example of a simulation study analysis using runjags, in order to assess the performance of two equivalent model formulations with two different "minimally-informative" priors. The application is an over-dispersed count model, the use of which is widespread in many biological fields (Bolker, Brooks, Clark, Geange, Poulsen, Stevens, and White 2009), including parasitology (Wilson, Grenfell, and Shaw 1996; Wilson and Grenfell 1997; Shaw, Grenfell, and Dobson 1998), where Bayesian methods of analysis have been shown to provide more robust inference than traditional methods (Denwood, Stear, Matthews, Reid, Toft, and Innocent 2008; Denwood, Reid, Love, Nielsen, Matthews, McKendrick, and Innocent 2010).

Model formulation and assessment
The gamma distribution is parameterized in JAGS and BUGS by the shape (α) and rate (β) parameters, with the expectation given by α/β and variance given by α/β². This distribution can be used to describe underlying variability in a Poisson observation, representing an unknown amount of over-dispersion between observations. In this situation the extra-Poisson coefficient of variation cv is a useful measure of the variability of the underlying gamma distribution, and is a simple function of the shape parameter:

cv = 1/√α

A candidate JAGS Model A (using inline data and monitor statements to be detected by runjags) is as follows: This model allows each observed Count to follow a Poisson distribution with lambda drawn from a gamma distribution with shape parameter to be estimated, and rate parameter calculated from the shape parameter and the mean of the distribution, which is also to be estimated. The prior distribution used for the mean and shape parameters is the DuMouchel prior distribution as shown in Table 1; this distribution is provided by the runjags extension module, which can be loaded using the #modules# tag. Here we use the same minimally-informative prior distribution for both shape and mean parameters. The #data# statement is used to include N as data that does not change between simulations. The Count variable is also observed, but will vary between simulations so it is not retrieved from R memory using #data#. An alternative formulation of this same model could be provided using a negative binomial distribution rather than a gamma mixture of Poisson distributions, as represented in Model B: In this model, the same priors are placed on the parameters shape and mean, but the negative binomial distribution is parameterized by a probability p in place of the parameter mean. However, the gamma-Poisson and negative binomial distributions are equivalent (see Appendix A), and these models share the same prior distributions for the two parameters of interest. The two might therefore be expected to give equivalent inference.
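The model listings for Models A and B are not reproduced in this extract; the following hedged sketches are consistent with the description above (the variable names Count, N, lambda, shape and mean follow the surrounding text, and dmouch is the DuMouchel distribution provided by the runjags module):

Model A (gamma-Poisson):

model {
  for (i in 1:N) {
    Count[i] ~ dpois(lambda[i])
    lambda[i] ~ dgamma(shape, rate)
  }
  rate <- shape / mean
  shape ~ dmouch(1)
  mean ~ dmouch(1)
  #data# N
  #modules# runjags
  #monitor# mean, shape
}

Model B (negative binomial):

model {
  for (i in 1:N) {
    Count[i] ~ dnegbin(p, shape)
  }
  p <- shape / (shape + mean)
  shape ~ dmouch(1)
  mean ~ dmouch(1)
  #data# N
  #modules# runjags
  #monitor# mean, shape
}

In Model B the probability parameter p is derived from shape and mean, so both formulations imply the same marginal distribution for Count (with expectation mean) and place identical priors on the two parameters of interest.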
The posterior coverage and auto-correlation of these models can be assessed using simulation studies, with data generated from a distribution with a mean of 2, cv of 1.1, and sample size of 20. These values are chosen to exaggerate any model performance issues by providing a comparatively small data set with a large number of zero observations, and are similar to those typically found in veterinary parasitological data sets (Denwood 2010). The two parameters of interest are the mean parameter, which is directly monitored in the model, and the cv parameter, which is a function of the monitored shape parameter. Rather than calculate the cv parameter in JAGS, this can be calculated more efficiently in R using a mutate function. The model performance assessment can be automated using run.jags.study by creating a function to return a pre-generated simulated data set for each simulation:
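The mutate function itself does not survive in this extract; a minimal sketch, assuming the relationship cv = 1/√shape given above and using the illustrative name getcv, is:

R> getcv <- function(x) return(list(cv = 1 / sqrt(x[, "shape"])))

A function of this form can be supplied via the mutate argument of run.jags or run.jags.study, so that the derived cv parameter is appended to the monitored variables without needing to be calculated within JAGS.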

+   ))
+ })
R> datafunction <- function(i) return(list(Count = alldata[[i]]))
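The beginning of the data-generation code is truncated above; a hedged reconstruction using the simulation parameters stated in the text (mean 2, cv 1.1, sample size 20, and the 1,000 simulated data sets referred to later; all object names are illustrative) might be:

R> N <- 20
R> truemean <- 2
R> truecv <- 1.1
R> trueshape <- 1 / truecv^2
R> alldata <- lapply(1:1000, function(s)
+   rpois(N, rgamma(N, trueshape, rate = trueshape / truemean)))

Each element of alldata then provides the Count variable for one simulated data set, as returned by the datafunction shown above (a gamma distribution with shape 1/cv² and rate shape/mean has the required mean of 2 and cv of 1.1).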
In this case we specify the initial values as a function, illustrating the potential to make use of the stochastically-generated data while creating the initial values within the function:

+   .RNG.name <- c("base::Super-Duper", "base::Wichmann-Hill")[chain]
+   return(list(shape = shape, mean = mean, .RNG.seed = .RNG.seed,
+     .RNG.name = .RNG.name))
+ }

Finally, a parallel cluster with 10 nodes is set up on the local machine, before the two simulation studies are run on this cluster using the same data. The run.jags.study function will check each of the models locally using a single randomly chosen data set to ensure that the model is valid before it is passed to the cluster:

R> library("parallel")
R> cl <- makeCluster(10)
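The calls that run the two studies on the cluster are truncated in this extract; a hedged sketch (assuming model strings modelA and modelB and the true parameter values given above; these object names are illustrative) might be:

R> resultsA <- run.jags.study(simulations = 1000, model = modelA,
+   datafunction = datafunction, targets = list(mean = 2, cv = 1.1),
+   cl = cl)
R> resultsB <- run.jags.study(simulations = 1000, model = modelB,
+   datafunction = datafunction, targets = list(mean = 2, cv = 1.1),
+   cl = cl)

Because the same cluster and the same pre-generated datafunction are used, both models are assessed against identical simulated data sets.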

Sensitivity to prior distributions
The ability to incorporate prior information is an advantage of Bayesian methods, but there is often a variety of potential distributions that could be equally justifiable in a given situation. The choice between these possibilities is known to affect the shape of the posterior in some situations (Lele and Dennis 2009), particularly when the information in the data is relatively sparse. In particular, there are various different minimally-informative priors advocated for use with variance parameters in hierarchical models, including the Gamma(0.001, 0.001) distribution, which is characterized by a mean of one and a very large variance. The sensitivity of a model to the choice of priors between this gamma prior and the DuMouchel prior can be evaluated using the run.jags.study function, with a total of four candidate sets of priors (using each combination of DuMouchel and gamma distributions for the mean and shape parameters). These were applied to the same 1,000 simulated data sets using Model B and very similar R code to that given above. The results of these four simulation studies are shown in Table 3. There are small but noticeable differences between the inference made for both parameters using these prior distributions. The bias and auto-correlation are both approximately doubled for the mean parameter between DuMouchel and gamma priors, and more substantial changes in bias and auto-correlation are seen between priors for the cv parameter. In addition, the 95% confidence intervals for the cv parameter have less than 90% coverage when using the gamma prior, despite a slightly larger average range of these confidence intervals relative to the DuMouchel prior.
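Only the prior lines of the model need to change between these four candidate sets; for example, replacing the DuMouchel prior on the shape parameter:

shape ~ dmouch(1)

with the corresponding gamma prior:

shape ~ dgamma(0.001, 0.001)

and similarly for the mean parameter, gives the four combinations assessed in Table 3.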

Discussion
The results presented here demonstrate the utility of simulation studies facilitated by the runjags package to evaluate the relative performance of alternative model formulations and the effect of prior distribution choices. In this case, the DuMouchel prior out-performed the more standard gamma prior, and it also possesses properties that are theoretically desirable for a minimally-informative distribution, such as invariance to inverse transformation, infinite variance and a mode of zero. DuMouchel (1994) proposed this prior for use with variance parameters in hierarchical models, but it has also been used in situations outside the meta-analysis application for which it was originally devised (see for example Phillips, Tam, Conti, Rodrigues, Brown, Iturriza-Gomara, Gray, and Lopman 2010; Conti, Presanis, van Veen, Xiridou, Donoghoe, Rinder Stengaard, and De Angelis 2011; Yin, Conti, Desai, Stafford, Slater, Gill, and Simms 2013). Christiansen and Morris (1997) also used the same distribution as a prior for a hierarchical regression model, and Daniels (1999) uses a uniform shrinkage prior which is equivalent to the DuMouchel distribution. Although this connection is not stated directly by DuMouchel (1994), the distribution is equivalent to a Lomax distribution with τ = x, s0 = σ and α = 1, and therefore to a Pareto type II distribution with τ = x, s0 = σ, α = 1 and µ = 0 (Table 1). The choice of σ dictates the median; a value of 1 is advocated since this also ensures invariance to the inverse transformation of τ, so this prior is equivalent in terms of variance and precision. The half-Cauchy distribution has a similar form to the DuMouchel distribution, and has also been suggested for use as a prior for variance parameters (Gelman 2006; Polson and Scott 2011).
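For reference, the density referred to above (the Lomax form with shape α = 1 and scale s0) can be written as:

p(τ) = s0 / (s0 + τ)²,  τ ≥ 0

This density has its mode at zero, median s0, and infinite variance; furthermore, when s0 = 1 the density of 1/τ is identical to that of τ, which is the invariance to inverse transformation noted above.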
Although it is also possible to extend other variants of BUGS, JAGS is fully open source and written in C++, making extension modules such as the one provided by runjags much easier to implement. A very useful tutorial on writing and installing a standalone JAGS module is provided by Wabersich and Vandekerckhove (2014), but it is arguably more straightforward to implement a shared JAGS library inside an R package. The configure script provided inside the runjags package sets up the necessary environmental variables for compilation on any platform, and can be used as a template for creating additional extension modules within R packages.

Summary
There are several advantages to using MCMC, but also some potential disadvantages associated with failure to identify poor convergence and high Monte Carlo error. The runjags package attempts to partially safeguard against some of these difficulties by calculating and automatically reporting convergence and sample length diagnostics every time a JAGS model is run, and provides a more user-friendly way to access commonly used visual convergence diagnostics and summary statistics. Implementations of common GLMMs are provided using a standard formula-style interface, in order to encourage new users to explore the potential of MCMC inference without having to generate the full code for the model themselves. A further application of the runjags package is in implementing simulation studies so that model formulations and prior specifications can be validated using techniques such as drop-k cross-validation studies. Given that the inference made using JAGS and BUGS can be sensitive to subtly different model specifications and prior distributions, a user-friendly mechanism to perform these types of analyses is potentially very useful.

A. Formulation of the negative binomial as a gamma-Poisson
The compound probability mass function of a Poisson distribution (with mean λ) integrated over a gamma distribution (with shape and rate parameters α and β respectively) is given in Equation 1.
Substituting α = r and β = (1 − p)/p into Equation 1 gives Equation 2, which can be re-written and simplified to Equation 4. Equation 7 is the probability mass function of the negative binomial distribution defining the number of successes x before r failures with success probability p, which is therefore exactly equivalent to a gamma-Poisson compound distribution with mean α/β = pr/(1 − p) and shape α = r.
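The numbered equations do not survive in this extract; the derivation they summarize can be reconstructed as follows (the first line is the compound mass function of Equation 1):

P(X = x) = ∫₀^∞ [λ^x e^(−λ) / x!] [β^α λ^(α−1) e^(−βλ) / Γ(α)] dλ
         = [β^α / (x! Γ(α))] ∫₀^∞ λ^(x+α−1) e^(−(1+β)λ) dλ
         = [Γ(x + α) / (x! Γ(α))] (β / (1 + β))^α (1 / (1 + β))^x

Substituting α = r and β = (1 − p)/p gives β/(1 + β) = 1 − p and 1/(1 + β) = p, so that:

P(X = x) = [Γ(x + r) / (x! Γ(r))] p^x (1 − p)^r = C(x + r − 1, x) p^x (1 − p)^r

which is the negative binomial probability mass function for x successes before r failures with success probability p.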

Figure 1: A series of plots displayed by the plot method for the 'runjags' class, showing only parameters partially matched using the letter "c" with plots shown in a 3 × 3 layout. Multiple chains are shown using different colors as indicated by the key plot.

Table 3: Average values for the inference on the mean parameter (true value 2) and cv parameter (true value 1.1) obtained from a negative binomial MCMC model formulation using DuMouchel and gamma priors for the mean and shape parameters.