PrevMap : An R Package for Prevalence Mapping

In this paper we introduce a new R package, PrevMap , for the analysis of spatially referenced prevalence data, including both classical maximum likelihood and Bayesian approaches to parameter estimation and plug-in or Bayesian prediction. More speciﬁcally, the new package implements ﬁtting of geostatistical models for binomial data, based on two distinct approaches. The ﬁrst approach uses a generalized linear mixed model with logistic link function, binomial error distribution and a Gaussian spatial process as a stochastic component in the linear predictor. A simpler, but approximate, alternative approach consists of ﬁtting a linear Gaussian model to empirical-logit-transformed data. The package also includes implementations of convolution-based low-rank approximations to the Gaussian spatial process to enable computationally eﬃcient analysis of large spatial datasets. We illustrate the use of the package through the analysis of Loa loa prevalence data from Cameroon and Nigeria. We illustrate the use of the low rank approximation using a simulated geostatistical dataset.


Introduction
This article introduces PrevMap, an R package for classical and Bayesian inference on spatially referenced prevalence data. The package implements fitting and spatial prediction for the standard geostatistical model used in the context of prevalence mapping (Diggle, Tawn, and Moyeed 1998). This model falls within the generalized linear mixed model framework with binomial error distribution, logistic-link function and a latent Gaussian spatial process in the linear predictor. For classical analysis, we estimate parameters by Monte Carlo maximum likelihood (MCML), which uses importance sampling techniques so as to approximate the high-dimensional intractable integral that defines the likelihood function; see for example Christensen (2004). Plug-in spatial prediction is then carried out by fixing the model pa- rameters at the corresponding MCML estimates. In order to account for uncertainty in the model parameter estimates, we also consider a Bayesian approach in which plug-in predictive distributions at different values of the model parameters are weighted according to their posterior probabilities. A simpler, approximate procedure consists of fitting a geostatistical linear Gaussian model to empirical-logit-transformed prevalences. Table 1 summarizes the common functionalities required for prevalence mapping that are available in PrevMap and the existing packages geoR (Diggle and Ribeiro 2007;Ribeiro and Diggle 2001), geoRglm (Christensen and Ribeiro 2002), geostatsp (Brown 2015), geoBayes and spBayes (Finley, Banerjee, and Carlin 2007;Finley, Banerjee, and Gelfand 2015). Overall, PrevMap provides the most extensive functionality. Specifically, PrevMap provides the following features: implementation of a convolution-based low-rank approximation that can be used to reduce the computational burden when analyzing large spatial datasets; accurate numerical computation of MCML standard errors for both regression and covariance parameters estimates; inclusion of both individual-level and location-level explanatory variables with random effects defined at location-level when repeated observations are made at the same location; more flexible prior specifications for the covariance parameters; implementation of an efficient Hamiltonian Markov chain Monte Carlo algorithm for Bayesian parameter estimation.
The paper is structured as follows. In Section 2, we briefly introduce the geostatistical binomial logistic (henceforth BL) model, describe methods for classical and Bayesian inference, and outline approximate procedures based on the empirical logit transformation and low-rank approximations. Section 3 describes geostatistical analyses of Loa loa prevalence data using either the approximate linear model for the empirical logit transformation of prevalence or the exact BL model using Monte Carlo methods, both for classical and Bayesian analysis. In Section 4, we illustrate the use of the low-rank approximation by fitting a BL model to a simulated geostatistical dataset. Section 5 is a concluding discussion on planned extensions to the package.

Methodological framework
The ingredients of a geostatistical BL model are: random variables Y i of positive counts, binomial denominators m i , explanatory variables d i ∈ R p and associated sampling locations x i : i = 1, . . . , n in a given region of interest A ⊆ R 2 . Conditionally on a zero-mean Gaussian process S(x) and mutually independent zero-mean Gaussian variables Z i , Y i follows a binomial distribution with mean E[Y i |S(x i ), Z i ] = m i p i such that where we set d i = d(x i ) to emphasize the spatial context. In (1), we write τ 2 for the variance of Z i and model S(x) as a stationary isotropic Gaussian process with variance σ 2 and Matérn (1986) correlation function given by where φ > 0 is a scale parameter, K κ (·) is the modified Bessel function of the second kind of order κ > 0 and u is the distance between two sampling locations. The shape parameter κ determines the smoothness of S(x), in the sense that S(x) is κ − 1 times mean-square differentiable, with κ denoting the smallest integer greater than or equal to κ.
In most of the functions available in PrevMap, the Matérn shape parameter κ is treated as fixed. One reason for this is that, as shown by Zhang (2004), not all of the three parameters σ 2 , φ and κ can be consistently estimated under in-fill asymptotics, and in practice this translates to κ often being poorly identified. Additionally, the parameter κ is rarely of direct scientific interest. We therefore recommend either fixing κ at a plausible value, or considering a discrete set of values e.g., {1/2, 3/2, 5/2} corresponding to different levels of smoothness, and profiling on κ.

Monte Carlo maximum likelihood
The likelihood function for the parameters β and θ = (σ 2 , φ, τ 2 ) is obtained by integrating out the random effects in T i as defined by Equation 1. Let D denote the n by p matrix of explanatory variables and y = (y 1 , . . . , y n ) the vector of binomial observations. The marginal distribution of T is multivariate Gaussian with mean vector Dβ and covariance matrix Σ(θ) with diagonal elements σ 2 + τ 2 and off-diagonal elements σ 2 ρ(u ij ), where u ij is the distance between locations a product of independent binomial probability functions. The likelihood function for β and θ follows as where N (·; µ, Σ) is the density function of a multivariate Gaussian distribution with mean vector µ and covariance matrix Σ.
The MCML method (Geyer and Thompson 1992;Geyer 1994Geyer , 1996Geyer , 1999 uses conditional simulation from the distribution of T given Y = y to approximate the high-dimensional integral in Equation 3. Specifically, the likelihood function can be rewritten as where f (y, t) = N (t; Dβ 0 , Σ(θ 0 ))f (y|t) is the joint distribution of Y and T for pre-defined, fixed values of β 0 and θ 0 . We use a Markov Chain Monte Carlo (MCMC) algorithm to obtain m samples t (i) from the conditional distribution of T given Y = y under β 0 and θ 0 and approximate Equation 4 with Note that L m (β, θ) is a consistent estimator of L(β, θ), whether or not the samples t (i) are correlated. The optimal choices for β 0 and θ 0 are the maximum likelihood estimates of β and θ, for which max β,θ L m (β, θ) → 1 as m → ∞. Since our choices for β 0 and θ 0 will necessarily differ from the actual maximum likelihood estimates, the distance of L m (β m ,θ m ) from 1, whereβ m andθ m are the MCML estimates, provides a measure of the quality of the Monte Carlo approximation. In practice, we embed the maximization of L m (β, θ) within the following iterative procedure. Letβ m andθ m denote the values that maximize L m (β, θ) using an initial guess at suitable values β 0 and θ 0 , repeat the maximization with β 0 =β m and θ 0 =θ m and continue until convergence.
For maximization of the approximation to the log-likelihood l m (β, θ) = log L m (β, θ) in Pre-vMap, the user can choose between a BFGS algorithm or unconstrained optimization with PORT routines. Let ψ = log θ; analytical expressions for the first and second derivatives of l m (β, ψ) with respect to β and ψ are internally passed to the optimization functions maxBFGS of the maxLik package (Henningsen and Toomet 2011) in the former case and to the nlminb function in the latter. This can be very useful in order better to locate the global maximum on a relatively flat likelihood surface, as it is often the case for the ψ parameter. The MCML standard errors are then estimated by taking the square-roots of the diagonal elements of the inverse of the negative Hessian of l m (β m ,ψ m ). The inherent accuracy of this approximation for the standard errors is context-specific in addition to being affected by the Monte Carlo error. As a partial check, the resulting standard errors for β are typically larger than those estimated using an ordinary logistic regression.
In the examples of Section 3.3 and Section 4, the number of simulated samples is sufficiently large to make the Monte Carlo error negligible.
In the PrevMap package, conditional simulation of T given y with fixed parameters β and θ is implemented by the function Laplace.sampling. This function uses a Langevin-Hastings algorithm to update the random variableT =Σ 1/2 (T −t), wheret andΣ are, respectively, the mode and the inverse of the negative Hessian of the density of the conditional distribution. The objective of this linear transformation is to break the dependence between the different components of T so as to allow for faster convergence of the MCMC algorithm. However, when using the function binomial.logistic.MCML for parameter estimation, conditional simulation is carried out internally; see Section 3.3.1.

Bayesian inference
In the Bayesian framework, a joint prior distribution for β and θ is combined with the likelihood function through Bayes' theorem so as to obtain the corresponding posterior distribution. We assume that the prior distributions for θ and β are of the form where g(·) can be any distribution for θ, and ξ and Ω are the mean vector and a p by p covariance matrix for the Gaussian prior of β. The posterior distribution for β, θ and T is given by π(β, θ, t|y) ∝ g(θ)N (β; ξ, σ 2 Ω)N (t; Dβ, Σ(θ))f (y|t).
The function binomial.logistic.Bayes can be used to obtain samples from the above posterior distribution. This uses an MCMC algorithm, where θ, β and T are updated in turn using the following procedure.
2. Following the procedure proposed by Christensen, Roberts, and Sköld (2006), use the following re-parametrization for the covariance parameters and update each of them in turn using a random-walk Metropolis Hastings (RWMH). In each of the three RWMH forθ 1 ,θ 2 andθ 3 , the standard deviation, h say, of the Gaussian proposal at i-th iteration is given by where c 1 > 0 and c 2 ∈ (0, 1] are pre-defined constants, α i is the acceptance probability at the i-th iteration and 0.45 is the optimal acceptance probability for a univariate Gaussian distribution.
3. Update β using a Gibbs step. The required conditional distribution of β given θ and T is Gaussian, independent of y and with meanξ and covariance matrix σ 2Ω given bỹ where σ 2 R(θ) = Σ(θ).
4. Update the distribution of T given β, θ and y using a Hamiltonian Monte Carlo algorithm (Neal 2011). Specifically, let H(t, u) be the Hamiltonian function where u ∈ R n is the vector of the momentum variables and f (t|y, β, θ) is the conditional density of T given β, θ and y. The partial derivatives of H(u, t) determine how u and t change over time v according to the Hamiltonian equations In order to implement the Hamiltonian dynamic, the above differential equations are discretized using the leapfrog method (Neal 2011, pages 121-122) and approximate solutions are then found.
Two auxiliary functions, control.prior and control.mcmc.Bayes, define prior distributions and tuning parameters for the above MCMC scheme.

Empirical logit transformation
An alternative approach to exact fitting methods is to use a trans-Gaussian approximation of the model in Equation 1. This consists of fitting a linear model to the empirical logit transformation of the data,Ỹ i = log The method then assumes thatỸ i |S( having the same properties as previously defined. Guidance on when this model can be used safely is given by Stanton and Diggle (2013).
In the PrevMap package the empirical logit transformation is implemented both for classical and Bayesian inference in the functions linear.model.MLE and linear.model.Bayes.

Low-rank approximation
The Gaussian process S(x) in Equation 1 can be represented as a convolution of Gaussian noise (Higdon 1998(Higdon , 2002 , where B is Brownian motion, · is the Euclidean distance and K(·) is the Matérn kernel given by the following expression Let (x 1 , . . . ,x r ) be a grid of spatial knots. By discretization of Equation 9, and for r sufficiently large, we obtain a low-rank approximation where the U i are independent zero-mean Gaussian variables with variance σ 2 . This approximation is particularly beneficial for relatively large values of the scale parameter φ, when a small number of spatial knots is required to give a good approximation over the study-region. Note also that the number of spatial knots r is independent of the sample size n, making this approach computationally attractive when n is large.
Since the resulting approximation in Equation 11 is no longer a stationary process, we adjust the value of σ 2 by multiplying it by the following quantity The adjusted value of σ 2 is then a closer approximation to the actual variance of the Gaussian process S(x).
Different implementations of this method are possible, depending on whether we use an exact fitting method or an empirical logit approximation. In the PrevMap package, low-rank approximations can be used in each of the fitting functions listed in Table 2; we give an example in Section 4.
Implementations of the low-rank approximation for the BL and linear model are as follows.
• BL model. In this implementation the nugget effect is not included, hence τ 2 = 0. For both the classical and Bayesian analysis, conditional simulation from the distribution of the random effect U given the data y (and the model parameters in the Bayesian case) is used, hence avoiding matrix inversion.
• Linear model. The low-rank approximation is here used for the empirical logit transformation of prevalence. In this case τ 2 > 0, since the nugget effect is now a proxy for binomial sampling variation. Inversion of the covariance matrix and computation of the determinant are simplified as follows. Let K(θ) denote the n by r kernel matrix. The covariance matrix now assumes the form where I n is the n by n identity matrix. The Woodbury identity for matrix inversion gives where ν 2 = τ 2 /σ 2 . Inversion of Σ(θ) now requires inversion of an r by r matrix. Computation of the determinant, denoted by | · |, can also be simplified by using Sylvester's determinant theorem. This gives which again reduces the dimensionality of the required matrix operations from n by n to r by r.

Spatial prediction
We now consider the prediction of T * = (T (x n+1 ), . . . , T (x n+q )) at q additional locations not included in the data. This requires all relevant explanatory variables to be available at the prediction locations. We do not include the mutually independent random variables Z i in Equation 1 as part of our target for prediction, hence T ( Conditionally on T = (T 1 , . . . , T n ), β, θ and y, the target for prediction T * follows a multivariate Gaussian distribution with mean and covariance matrix where C is the cross-covariance matrix between T and T * , V is the covariance matrix of T * and D * is a q by p matrix of explanatory variables at the prediction locations. Let T *

(j)
denote the j-th simulated sampled from the posterior distribution of T * for j = 1, . . . , m. If the sample mean is to be used as a point predictor of T , the package uses the following result to reduce the associated Monte Carlo error, Prediction of the functional W (T * ) = (W (T (x n+1 )), . . . , W (T (x n+q ))) follows immediately by computing W (j) = W (T * (j) ) for j = 1, . . . , m. The PrevMap package provides automatic computation of the following functionals.
variances. In this case, the Monte Carlo error in the computation of the posterior mean is reduced by noticing that Another summary of the posterior distribution that is often relevant, particularly in problems of hotspot detection, is the exceedance probability P (T (x n+i ) > l | y) for a given threshold l and i = 1, . . . , q. We estimate this as where I(a > l) is 1 if a > l and 0 otherwise, and T (j) (x n+i ) is the i-th element of T * (j) . The spatial.pred.binomial.MCML and spatial.pred.binomial.Bayes functions can be used for classical and Bayesian spatial prediction, respectively. As we later illustrate, one of the available options is also the computation of either joint or marginal predictions. For example, joint predictions are needed when the target for prediction is an average over a sub-region. Spatial prediction for the empirical logit transformation using classical and Bayesian approaches is implemented in the spatial.pred.linear.MLE and spatial.pred.linear.Bayes functions, respectively. Low-rank approximations for each of the above functions are also available; see Section 3.3.

Example: Loa loa prevalence mapping
The data for this example relate to a study of the prevalence of Loa loa (eyeworm) in a series of surveys undertaken in 197 villages in Cameroon and southern Nigeria; see  for more details. Figure 2(a) shows the locations of the sampled villages.

Exploratory analysis
Exploratory analysis is useful under both classical and Bayesian inferential frameworks, to identify a provisional model for the data. under the classical framework, initial values for the model parameters are also needed for numerical optimization of the likelihood. Initial values for the regression coefficients can be easily obtained from an ordinary logistic regression fit. Choosing initial values for the covariance parameters is less straightforward. The shape parameter κ of the Matérn function is typically chosen from a discrete set of candidate values, which can be compared by evaluating a profile likelihood for κ based on the empirical logit transformation of the observed prevalence, as in the following example.
R> library("PrevMap") R> data("loaloa") R> loaloa$logit <-log((loaloa$NO_INF + 0.5)/ + (loaloa$NO_EXAM -loaloa$NO_INF + 0.5)) R> profile.kappa <-shape.matern(formula = logit~1, + coords =~LONGITUDE + LATITUDE, + data = loaloa, set.kappa = seq(0.2, 1.5, length = 15), + start.par = c(0.2, 0.05), coverage = 0.95) R> c(profile.kappa$lower, profile.kappa$upper) [1] 0.2140705 1.1044392 R> profile.kappa$kappa.hat  The shape.matern function evaluates the profile likelihood for κ and obtains a corresponding confidence interval with coverage specified by the argument coverage. The set of values that are used for evaluation of the profile likelihood is specified through the set.kappa argument. Computation of the confidence interval uses the interpolated profile log-likelihood as shown in Figure 1: the red line corresponds to an interpolating spline and the likelihood threshold, denoted by the horizontal dashed line, is obtained using the asymptotic distribution of a chisquared with one degree of freedom. Since the maximum likelihood estimate is very close to 1/2, we then fix the shape parameter κ at this value for the subsequent analysis.
The package geoR provides several functions that are useful for an initial exploratory analysis of geostatistical data. For example, using the function variofit, a least-squares estimation of the empirical variogram can be used in order to choose initial values for the covariance parameters of the Gaussian spatial process.

Linear model
In this section we show how to fit a linear model with Matérn correlation function to the empirical logit transformation of the Loa loa data using the maximum likelihood method. The linear.model.MLE function has its counterpart in the likfit function in geoR but, unlike likfit, uses analytic expressions for the gradient function and Hessian matrix, and delivers an estimated covariance matrix of the maximum likelihood estimates accordingly. As shown in the next section, the binomial.logistic.MCML function uses the same approach in fitting a BL model. Legend: sigma^2 = variance of the Gaussian process phi = scale of the spatial correlation tau^2 = variance of the nugget effect The first argument of linear.model.MLE specifies the covariates used in the regression as a formula object; in this case formula = logit~1 since we only fit an intercept. The argument start.cov.pars provides the initial values of φ and ν 2 (= τ 2 /σ 2 ), respectively, used in the optimization algorithm. The argument fixed.rel.nugget allows the relative variance of the nugget effect ν 2 to be fixed if desired. Additionally, two different maximization algorithms are available: if method = "BFGS" (set by default), the maxBFGS function in the maxLik package is used, otherwise method = "nlminb" and the nlminb function is then used for unconstrained optimization using PORT routines. When calling a summary of the fitted model, estimates and standard errors of the covariance parameters are given on the log-scale by default. Setting log.scale = FALSE gives estimates and standard errors on the original scale.
The function loglik.linear.model can be used either for computation of the profile likelihood for φ and/or ν 2 or for evaluation of the likelihood keeping the other parameters fixed. The auxiliary function control.profile is used to define the set of values for φ and/or ν 2 used in the evaluation of the likelihood, and the fixed values for β and σ 2 , if necessary. The shape parameter κ is also fixed at the value defined in the fitted model object that must be specified as first argument of loglik.linear.model. Control profile: parameters have been set for evaluation of the profile log-likelihood.

Binomial logistic model
We now show how to fit a BL model to the Loa loa data using either the MCML method (Section 3.3.1) or a Bayesian approach (Section 3.3.2).

Likelihood-based analysis
For the MCML method, we set the parameters of the importance sampling distribution, β 0 and θ 0 , to the estimates reported in Section 3.1 using ordinary logistic regression and a least squares fit to the variogram, respectively. The above code fits a BL model by simulating 10,000 samples and retaining every eighth sample after a burn-in of 2,000 values to approximate the likelihood integral. The function control.mcmc.MCMCL sets the control parameters of the MCMC algorithm. The argument h represents the proposal density of the Langevin-Hastings (see Section 2.1). Our suggestion is to set this to 1.65/n 1/6 , where n is the sample size, which corresponds to the optimal value for sampling from a standard multivariate Gaussian distribution (Roberts and Rosenthal 2001).

R>
Note that updating β 0 and θ 0 with the resulting MCML estimates at each iteration results in the maximum value of the approximation to the log-likelihood function approaching zero. This is an indication that the MCML estimates are converging towards the actual maximum likelihood estimates of β and θ, for which the value of the Monte Carlo likelihood is exactly zero. We now carry out spatial predictions over a 0.1 by 0.1 degree regular grid, fixing the model parameters at the MCML estimates, and summarize the predictive distribution of prevalence in each grid cell through its mean, standard deviation and probability that the estimated prevalence is above 20%.
we can also specify the scale on which predictions are required: "logit", "prevalence" or "odds". Exceedance probability thresholds and the scale on which they are provided are specified through the arguments thresholds and scale.thresholds, respectively. Figure 4 shows the images of prevalence estimates, standard errors and exceedance probabilities with associated contours. These plots are obtained using the methods plot.pred.PrevMap and contour.pred.PrevMap, whose arguments type and summary can be used to specify which summaries should be displayed.
The following code generates a set of diagnostic plots, shown in Figure 5, that provide checks on convergence of the MCMC. In the first row of Figure 5, the target for prediction is the spatial average of logit-transformed prevalence, in the second and third rows the target is logit-transformed prevalence at each of two randomly sampled location. The three columns show: the autocorrelation plot of a thinned sequence of 10000 MCMC samples; the trace plot of these same 10000 samples; the empirical cumulative distribution functions of the first 5000 and the second 5000 of these 10000 samples. None of these plots show any evidence of non-convergence.
In the PrevMap package, the control.prior function can be used to set a Gaussian prior on β and any required prior distribution for the covariance parameters σ 2 , φ and τ 2 . The arguments beta.mean and beta.covar are the mean vector and the covariance matrix of the Gaussian prior for β. Log-Gaussian and uniform priors can also be directly defined for each covariance parameter by using the corresponding arguments. For example, log.normal.sigma2 and uniform.sigma2 define log-Gaussian and uniform priors, respectively, for σ 2 . In both cases a vector of length two must be provided. If the prior is log-Gaussian the two elements are the mean and standard deviation of the distribution on the log scale. If the prior is uniform the two elements are the lower and upper limits of the support of the uniform distribution.
Control parameters for the MCMC algorithm (see Section 2.2) are specified with the function control.mcmc.Bayes.
R> mcmc.Bayes <-control.mcmc.Bayes(n.sim = 6000, burnin = 1000, thin = 1, + h.theta1 = 1, h.theta2 = 0.7, h.theta3 = 0.05, L.S.lim = c(5, 50), + epsilon.S.lim = c(0.03, 0.06), start.beta = -2.3, start.sigma2 = 2.6, + start.phi = 0.8, start.nugget = 0.05, start.S = predict(fit.glm)) The arguments h.theta1, h.theta2 and h.theta3 are the starting values for the standard deviations of the Gaussian proposals; these are then tuned according to the adaptive scheme given by Equation 7. The control parameters for the Hamiltonian Monte Carlo procedure, used to update the random effects, are L.S.lim and epsilon.S.lim. These represent, respectively, the intervals used to randomly generate from a uniform distribution the number of steps and the step size in the leapfrog method at each iteration of the MCMC (see Section 2.2). Legend: sigma^2 = variance of the Gaussian process phi = scale of the spatial correlation tau^2 = variance of the nugget effect The above code fits a Bayesian BL model and returns summaries of the posterior distribution for each of the model parameters. In the output, high posterior density credible intervals are also computed, with associated coverage specified through the argument hpd.coverage.
R> par(mfrow = c(2, 4)) R> autocor.plot(fit.Bayes, param = "beta", component.beta = 1)   Autocorrelation plots can be obtained with the autocor.plot function, whose argument param specifies the model component for which the autocorrelation plot is required. If param = "beta", then component.beta must be used to specify the component of the regression coefficients. To display autocorrelation plots for the random effect, then param = "S" and component.S must be either a positive integer indicating the component of the random effect, or "all" in order to display the autocorrelation for all components in a single plot. Using a similar syntax, the functions trace.plot and dens.plot are also available for visualization of trace-plots and kernel density estimates based on the posterior samples.

Example: Simulated data
In this example, we use a simulated binomial dataset, available in the package as data_sim.
For these data, a zero-mean Gaussian process was generated over a 30 by 30 grid covering the unit square, with parameters σ 2 = 1, φ = 0.15 and κ = 2; the nugget effect was not included, hence τ 2 = 0. Binomial observations, with 10 trials at each grid point and probabilities given by the anti-logit of the simulated values of the Gaussian process, constitute the variable y in the data. To illustrate the accuracy of the low-rank approximation, we analyze these data using three different grids covering the square [−0.2, 1.2] × [−0.2, 1.2] with 25, 100 and 225 spatial knots, respectively. By letting some knots lie outside of the unit square, we avoid the presence of edge-effects due to the restriction of the integral in Equation 9 to a sub-region of the real plane.

Conclusions and future developments
We have illustrated the use the PrevMap package for geostatistical modelling of spatially referenced prevalence data. The package is intended to be compatible with the existing geoR and geoRglm packages, but with increased functionality. By comparison with these earlier packages, PrevMap provides more accurate numerical procedures for maximum likelihood estimation of the geostatistical linear and BL models, as well as routines for evaluation of the profile likelihood. Computationally faster approximations of the likelihood function for geostatistical BL models can be obtained using the Laplace approximation (LA). However, the resulting parameter estimates can be substantially biased in the case of binomial observations with small denominators (Joe 2008), whereas the MCML method delivers asymptotically unbiased estimates.
For likelihood-based inference we have used a Langevin-Hastings MCMC algorithm because the availability of optimal scaling results makes it easier to tune than the Hamiltonian MCMC. However, for Bayesian inference where model parameters are also updated at each iteration, the required computation of the mode of the random effects conditional distribution (see Section 2.1) would have been computationally too demanding. For Bayesian analysis, we have therefore implemented an efficient Hamiltonian MCMC scheme that updates the random effects on their original scale and allows a more flexible prior specification for the model parameters.
The sampling-based approach to inference allows the user to access any predictive target through post-processing of the samples from the joint predictive distribution of the val- ues of the latent field at all prediction points. Within the package, the user can specify whether marginal or joint predictions are required for different predictive targets: logit, prevalence, odds and exceedance probabilities. Software based on analytical approximations to the marginal predictive distributions, such as the geostatsp package that uses Integrated nested Laplace approximations (Rue, Martino, and Chopin 2009), cannot routinely calculate predictive distributions for arbitrary functionals of the latent field.
The package includes several functions for automatic post-processing of the results, such as the diagnostic plots illustrated in Figure 5. As is the case for any MCMC application, these can only reveal or fail to reveal non-convergence rather than guarantee convergence, but are nevertheless useful as partial checks. We therefore considered it important to make them easily accessible to users.
The accuracy of the low-rank approximations that are incorporated into the package is context-specific. However, used with care they offer computationally efficient procedures for analyzing large datasets. The spBayes package implements a low-rank procedure based on Gaussian predictive process models (Banerjee, Gelfand, Finley, and Sang 2008). In this approach, the latent field S(x) in Equation 1 is replaced by the conditional expectation of S(x) given S(x i ) for i = 1, . . . , r < n, wherex i is a set of pre-defined spatial knots. This is particularly useful and computationally advantageous when spatial interpolation is the sole objective of the analysis. In this context, other computationally efficient procedures could also be considered, such as low-rank spline smoothers (Wood 2003). However, for applications that involve a range of inferential objectives, including both spatial prediction and estimation of covariate effects, it is desirable that the low-rank method approximates the same probabilistic model that would be used were computational burden not an issue, rather than changing the model specification. For this reason, we consider our version of low-rank approximation (Section 2.4) to be more suitable for disease mapping applications where, typically, the objectives include inference for regression parameters, both to assess the importance of hypothesized risk-factors and to enable spatial prediction under a range of scenarios. A specific example is the construction of predictive maps for malaria under different climate scenarios, or before and after widespread distribution of insecticide-treated bed-nets.
Another feature not illustrated in the present paper is the possibility of fitting a BL model to prevalence data from household surveys so as to include information at both household and individual level. More specifically, let i and j identify the i-th household and the j-th individual within that household; in this case the linear predictor is where the random effects are now defined at household level. With the exception of the geostatsp package, the model in Equation 14 can not be fitted in the other packages reported in Table 1 other than by replacing individual-level explanatory variables d ij by their locationlevel averages. However, this would invalidate inferences on the regression coefficients β by introducing ecological bias (Wakefield and Lyons 2010).
Possible extensions of the package include the implementation of functions for spatio-temporal analyses, for geostatistical modelling of zero-inflated data, for combining data from multiple spatially referenced prevalence surveys (Giorgi, Sesay, Terlouw, and Diggle 2015) and for openends count data through a Poisson log-linear formulation. We will report these extensions separately in due course.