geoCount: An R Package for the Analysis of Geostatistical Count Data

We describe the R package geoCount for the analysis of geostatistical count data. The package performs Bayesian analysis for the Poisson-lognormal and binomial-logitnormal spatial models, which are subclasses of the class of generalized linear spatial models proposed by Diggle, Tawn, and Moyeed (1998). The package implements the computational intensive tasks in C++ using an R/C++ interface, and has parallel computation capabilities to speed up the computations. geoCount also implements group updating, Langevin- Hastings algorithms and a data-based parameterization, algorithmic approaches proposed by Christensen, Roberts, and Skold (2006) to improve the efficiency of the Markov chain Monte Carlo algorithms. In addition, the package includes functions for simulation and visualization, as well as three geostatistical count datasets taken from the literature. One of those is used to illustrate the package capabilities. Finally, we provide a side-by-side comparison between geoCount and the R packages geoRglm and INLA.


Introduction
Spatial data are nowadays routinely collected and analyzed in numerous scientific disciplines such as ecology, epidemiology, forestry, geography and meteorology to name a few. Following the classification of spatial data in Cressie (1993), we consider in this work geostatistical (also called point-referenced) data that contain information about both location and an attribute of interest. The basic data structure consists of pairs {(x i , Y i ) : i = 1, . . . , n}, where x i are the coordinates of the i-th sampling location within some region of interest A ⊂ R 2 , and Y i is the observation taken at x i . Each observed value Y i is either a direct measurement of, or statistically related to, a spatially varying attribute of interest at x i , where the latter will be modeled by transforming an underlying continuous spatial process.
collected from an agricultural field in Sweden. Sections 7 and 8 provide comparisons between geoCount and, respectively, the R packages geoRglm and INLA, that can also fit geostatistical count data. Finally, Section 9 provides a summary of the geoCount capabilities and limitations.

Models
The spatial data to be modeled here consist of triplets {(x i , Y i , t i ); i = 1, . . . , n}, where x i are the coordinates of the i-th sampling location within the region of interest A ⊂ R 2 , Y i is the measurement taken at x i , and t i is a known positive value associated with x i that depends on the sampling design. There may also be available location-dependent covariates d 1 (·), . . . , d p (·) (see below). The class of GLSM introduced by Diggle, Tawn, and Moyeed (1998) is based on the following general assumptions: (A1) The spatially varying attribute of interest, which is unobservable, is related by a one-toone function to a Gaussian random field with certain parametric mean and covariance functions.
(A2) For any set of locations the observations of the response variable at these locations are conditionally independent given the values of the attribute of interest at these locations. In addition, the response variable at any location is stochastically related to the attribute of interest only at that location.
In this section we provide the model definition for the attribute of interest and response at the sampling locations, and defer the model definition at the prediction locations until Section 6.5. Let {S(x) : x ∈ A} be the Gaussian random field that is functionally related to the spatially varying attribute of interest, and S = (S(x 1 ), . . . , S(x n )) . Although the latter is not observable, it is assumed that each observed value Y i is stochastically related to the attribute of interest at x i . The general class of GLSM introduced by Diggle, Tawn, and Moyeed (1998) is hierarchically specified as follows: where: {Y i : i = 1, . . . , n} are conditionally independent given S, and have marginal pdfs/pmfs p(· | µ i ).
µ i = E(Y i | S(x i )) and g(·) is a known one-to-one link function. D = (1, d 1 , . . . , d p ) is a known n × (p + 1) design matrix, assumed of full rank, with 1 the n × 1 vector of ones and d j = (d j (x 1 ), . . . , d j (x n )) , where d j (x i ) is the value of the j-th covariate at the i-th sampling location, and β = (β 0 , β 1 , . . . , β p ) are unknown regression parameters. Σ = (σ ij ) is an n × n positive definite variance-covariance matrix with σ ij = σ 2 ρ(u ij ), where σ 2 > 0 is the unknown constant variance, ρ(u ij ) is a parametric isotropic correlation function, and u ij = ||x i − x j || is Euclidean distance.
Three families of correlation functions are implemented in geoCount: the spherical family given by where 1 A (u) is the indicator function of the set A; the Matérn family given by where Γ(·) is the gamma function and K κ (·) is a modified Bessel function of order κ; and the power exponential family given by For all of the above families the so-called range parameter φ controls the rate of correlation decay, and for the latter two families the so-called smoothness parameter κ controls smoothness of the random field S(·).
Although the above general framework can be used to model a variety of non-Gaussian spatial data, it has been mostly used to model spatial count data. The two most widely used GLSM for spatial count data are the Poisson-lognormal and binomial-logitnormal spatial models; see Diggle, Tawn, and Moyeed (1998), Christensen and Waagepetersen (2002), and Zhang (2002). For the first model, the top two levels of the hierarchical model state that while for the second model the bottom two levels of the above models are both equal to the corresponding levels in the general model (1). The above models use the parametrization in Christensen, Roberts, and Sköld (2006), which is slightly different from the one used in Diggle, Tawn, and Moyeed (1998). The R package geoCount has been written to make different kinds of Bayesian statistical inferences about these two GLSM. Among other tasks, it simulates samples from the posterior distribution of the model parameters and latent values at the sampling locations, and simulates samples from the posterior predictive distribution of the latent variables and potential counts at unsampled locations.

Priors
Specification of prior distributions for the parameters of these GLSM is a relatively unexplored area. Previous works have mostly glossed over this aspect of the model construction and have used priors chosen somewhat ad hoc and in analogy to priors used for hierarchical and non-hierarchical Gaussian models. In the present absence of a better alternative the implementation in geoCount follows the same approach and uses some default "non-informative" priors. The parameters β, σ, φ and κ are assumed independent a priori, with marginal distributions given by geoCount provides two options for π(σ): the 'half-t' family proposed by Gelman (2006) where a 1 , a 2 are user-defined hyper-parameters, which includes the (improper) uniform distribution (a 2 = −1) and the (proper) half-Cauchy distribution (a 2 = 1); and the inverse gamma family, π IG (σ) ∝ σ −(a 1 +1) exp − a 2 /σ , which includes the (improper) reciprocal distribution π(σ) ∝ 1/σ (a 1 = a 2 = 0). The latter option yields an improper posterior distribution though; see Natarajan and Kass (2000). The user-defined hyper-parameters b 1 and b 2 that determine the support of the range parameter φ can be chosen either subjectively or based on exploratory data analysis, e.g., by inspecting empirical semivariogram plots. On the other hand, the values that determine the support of the smoothness parameter κ are fixed in geo-Count: (c 1 , c 2 ) = (0.05, 4.95) for the Matérn family, and (c 1 , c 2 ) = (0.05, 1.95) for the power exponential family.

Programming strategies
4.1. R and C++ Implementation of MCMC algorithms can be computationally very intensive for some models. This is the case for the algorithms we consider for GLSM because of the large number of latent variables to be sampled, and the handling and computation of large matrices. Both computations are very time-consuming for R to handle solely, and demand a lower level and faster language. Our choice for this is C++ which is one of the most popular programming languages with both low-level and high-level language features. It provides fast computation as well as good portability and extendibility.
In geoCount the programs for the implementation of MCMC algorithms and computation of large matrices are written in C++, with the help of GSL and Armadillo libraries. The former is a numerical library for C and C++ which provides a wide range of mathematical routines, such as random number generators and computation of special functions. The latter is an open-source C++ linear algebra library (matrix maths) which supports common data types and a subset of trigonometric and statistical functions.
For efficient communication between R and C++ we employ Rcpp API developed by Eddelbuettel and François (2011). In this new API, Rcpp::RObject is the basic class which actually encapsulates SEXP objects with a thin wrapper and comes with methods that are appropriate for all types of objects and transparently manage garbage collection. In fact the SEXP is indeed the only data member of an RObject. RObject manages the life cycle of underlying SEXP wisely. The constructor of RObject takes necessary measures to guarantee the protection from garbage collection during C++ scope, and the destructor takes the responsibility to withdraw that protection. It also defines several member functions common to all objects (e.g., isS4(), attributeNames, . . . ) and the derived classes define specific member functions.

Parallel computing in R
There are many tasks for which parallel computation can be quite useful for the analysis of geostatistical data with GLSM. Some of these are Simultaneous simulation of several datasets.
Simultaneous posterior simulation of several Markov chains.
Generation of replicated data.
Simultaneous prediction at several locations.
Multi-processor and multi-core computers, either in the form of personal computers (PC) or high performance computing (HPC) clusters, have recently become much more accessible. So it is desirable, and sometimes even necessary, to optimize the computations in our package with the help of parallel computation techniques. R will only utilize one processor under the default build, regardless of the number of available processors. So we employ several R packages to solve this shortcoming, including snow by Rossini, Tierney, and Li (2007) and snowfall by Knaus (2013).

Group updating
It is by now known that when performing Gibbs or Metropolis-Hastings sampling, grouping usually leads to faster convergence rates, specially when the variables to be simulated are highly correlated. This is the case in GLSM where the components of S and θ are often highly correlated a posteriori. In addition, grouping usually reduces the overall computational work in high dimensional settings, which is also the case in GLSM since there are n latent variables. The package geoCount groups the latent variables and parameters in three groups: S, β and θ, where the components of the first two groups are updated jointly while the components of the last group are updated individually.

Langevin-Hastings algorithm
Metropolis-Hastings algorithms that use random walk proposals are the most commonly used algorithms due to the ease of implementation in many problems, but they often suffer from slow mixing and convergence issues. This is the case for the algorithm used by Diggle, Tawn, and Moyeed (1998), as shown by Christensen, Roberts, and Sköld (2006) and Jing (2011). This behavior is due to the fact that the chain moves from the current point following a proposal distribution that completely ignores the information in the target distribution. In contrast, Langevin-Hastings algorithms use local information of the target density to make proposals, which can be significantly more efficient, specially in high dimensional problems. Some properties of these algorithms for performing simulation-based Bayesian inference were investigated by Roberts and Tweedie (1996) and Roberts and Rosenthal (1998), while the application of Langevin-Hastings algorithms for fitting GLSM appeared in Christensen and Waagepetersen (2002), Christensen, Roberts, and Sköld (2006) and Jing (2011). It was shown in Roberts and Rosenthal (1998) that when a proper scale is chosen for the proposal, algorithms that use Langevin proposals take on average O(r 1/3 ) steps to converge, with r the dimension of the vector being updated. This compares quite favorably to the O(r) steps that take algorithms that use random walk proposals to converge, and the benefit of Langevin-Hastings algorithms increase with the dimension r. This is specially relevant for the current models where the possibly high-dimensional vector S (for which r = n) is one of the groups to be jointly updated.

Data-based parameterization
It is by now well recognized that convergence of MCMC algorithms may crucially depend on the choice of parameterization; see Roberts and Sahu (1997), Papaspiliopoulos, Roberts, and Sköld (2003) and Papaspiliopoulos, Roberts, and Sköld (2007). In GLSM the components of S and θ are strongly dependent a priori, and depending on the observed value of Y , S and θ may also be strongly dependent a posteriori, which will reduce the efficiency of MCMC algorithms. Papaspiliopoulos, Roberts, and Sköld (2007) described two basic types of parameterizations in hierarchical models, called centered and non-centered parameterizations, and showed that none of them is uniformly better than the other. These parameterizations are complementary in the sense that when an MCMC algorithm displays slow convergence under one parameterization, it often converges much faster under the other parameterization. But which one occurs for a particular dataset depends on how informative the observed value of Y is about S. As a compromise Papaspiliopoulos, Roberts, and Sköld (2007) proposed the use of data-based parameterizations aimed at reducing the dependence among S and θ a posteriori, and argued that MCMC algorithms based on these parameterizations would display good convergence behavior regardless of the observed data. In this sense, these parameterizations and the MCMC algorithms based on them are considered robust.
Based on this idea, Christensen, Roberts, and Sköld (2006) proposed a data-based parameterization in GLSM aimed at making the components of (S, β, θ) approximately uncorrelated a posteriori, with equal means and equal variances. The latter is recommended since the efficiency of Langevin-Hastings algorithms is sensitive to variance inhomogeneities of the components, a situation expected to occur due to different meanings of S, β and θ.
Usually it is not possible to find explicitly the desired parameterization, unless the posterior distribution is multivariate Gaussian. Based on a N (ξ, Ω) prior for β, Christensen, Roberts, and Sköld (2006) proposed the use of a Gaussian approximation to the distribution of (S, β | θ, y) to orthogonalize and standardize these components, with y being the observed data. This data-based parameterization, denoted by (S,β,θ), is given bỹ S =S(S, β, θ, y) whereΣ 1/2 andΩ 1/2 are the Cholesky decompositions ofΣ andΩ, respectively; when π(β) ∝ 1 is used, the terms Ω −1 and Ω −1 ξ are set to zero. Also, Λ(S) is the diagonal matrix with diagonal entries − ∂ 2 By construction the components ofS andβ will be, a posteriori, approximately uncorrelated with mean 0 and variance 1. Christensen, Roberts, and Sköld (2006) . When y i = 0 in the Poisson-lognormal spatial model (y i ∈ {0, t i } in the binomiallogitnormal spatial model), both Λ(Ŝ) ii and (Λ(Ŝ)Ŝ) i are set to 0, the limit that results when y i → 0 (y i → 0 or y i → t i ). geoCount uses a Metropolis-within-Gibbs algorithm to updateS andβ as separate blocks based on a Langevin proposal with scalings chosen by trial and error aimed at obtaining an acceptance rate of about 0.57, which is close to optimal, Roberts and Rosenthal (1998). Christensen, Roberts, and Sköld (2006) reported that using the default scalings 1.65 2 /n 1/3 and 1.65 2 /p 1/3 forS andβ, respectively, works often reasonably well, which was also found to be so in Jing (2011); see Christensen, Roberts, and Sköld (2006) for details.
For the covariance parameters it has been found that the posterior correlation between φ and σ is usually strong, but the technique used above to reparametrize S and β is not feasible for θ due to the complex way in which the covariance parameters enter into the likelihood. Following Christensen, Roberts, and Sköld (2006), geoCount uses the parameterizatioñ where these parameters are updated using individual one-dimensional Metropolis-Hastings steps with random walk proposals having scalings chosen by trial and error aimed at obtaining an acceptance rate of about 0.45, which is close to optimal, Gelman, Roberts, and Gilks (1996).
6. Package functionality 6.1. Integrated datasets The package geoCount includes three geostatistical count datasets: Rongelap: This dataset appeared in Diggle, Tawn, and Moyeed (1998) and was modeled using the Poisson-lognormal spatial model. It consists of photon emission counts emitted from radioactive caesium and collected at 157 locations in the Rongelap Island, a part of the Marshall Islands. This dataset is a four-component list: the first component contains the coordinates of the sampling locations, x i , the second contains the measured counts, y i , the third contains the times (in seconds) over which counts were accumulated, t i , and the last the coordinates for the Rongelap Island boundary. For further details see Diggle, Harper, and Simon (1997).
Weed: This dataset appeared in Guillot, Lorén, and Rudemo (2009) and was modeled using the Poisson-lognormal spatial model. It consists of weed counts of non-crop plants collected over 100 frames all of 0.5 by 0.7 meters at the Bjertorp Farm in the south-west of Sweden. This dataset is a 100 by 4 data frame: the first two columns contain the coordinates of the sampling locations, x i , the third contains the observed counts, y i , and the last are estimates of the counts obtained from photographs. The areas of the frames over which the counts were collected (the t i s in this case) are all equal, so they can be assumed equal to 1.
Rhizoc: This dataset appeared in Zhang (2002) and was modeled using the binomial-logitnormal spatial model. It consists of counts of the root disease Rhizoctonia root rot present in barley, and collected at 100 sampling locations in the Cunningham Farm in the northwest of the US. For each sampling location 15 plants were pulled from the ground for inspection. This dataset is a 100 by 5 data frame: the first two columns contain the coordinates of the sampling locations, x i , the third contains the total number of crown roots in the plants pulled (the t i s in this case), the fourth contains the total number of infected crown roots in the plants pulled, y i , and the last is the barley yields at the sampling locations.

Data simulation and visualization
The function simData simulates geostatistical data from either the Poisson-lognormal or binomial-logitnormal spatial model. An illustration of the former is: R> library("geoCount") R> loc <-expand.grid(1:10, 1:10) R> dat <-simData(loc, L = 0, X = NULL, beta = 2, Y.family = "Poisson", + rho.family = "rhoPowerExp", cov.par = c(1, 0.2, 1)) where loc is an n by 2 matrix containing the coordinates of the sampling locations and L is a vector of length n containing the t i s; the default value 0 indicates that t i = 1 for all i. For internal computations geoCount decomposes the trend part of the model as where X is a n × p matrix (p ≥ 0) containing the values of the covariates, if there are any, and β + = (β 1 , . . . , β p ) . With this convention X = NULL indicates that there are no covariates, and beta is a vector of length p + 1 containing the intercept and the covariate coefficients, if there are any. Y.family sets the model type to be simulated, "Poisson" or "Binomial". The argument rho.family sets the family of correlation functions, "rhoSph", "rhoMatern" or "rhoPowerExp" for, respectively, the spherical, Matérn and power exponential families, and cov.par is a vector of length 3 containing the covariance parameters (σ, φ, κ).
The output of this function is a list with two vector elements: data containing the counts Y , and latent containing the latent variables S.
We will use throughout the paper the Weed dataset described above to illustrate the package functionality which will be fit using the Poisson-lognormal spatial model (4) with constant mean and power exponential correlation function (3).
The function plotData produces a graphical display of geostatistical data. An illustration using the Weed dataset is given in Figure 1: R> data("Weed") R> plotData(Weed[, 3], Weed[, 1:2], xlab = "East (meter)", + ylab = "North (meter)") The "bubbles" are centered at the sampling locations and their diameters are proportional to the magnitude of the observations (this default shape can be changed by setting the pch parameter). plotData can also display the boundary of the region of interest A, when this information is provided by the two-column matrix argument bdry.
The starting point to simulate or analyze geostatistical data is a set of sampling locations. geoCount provides three functions that generate "regular" arrangements of sampling locations: locGrid generates a rectangular lattice of locations, locCircle generates a set of locations on a circle centered at the origin, and locSquad generates a set of locations on a square centered at the origin. For instance, locGrid(a, b, n1, n2) generates a rectangular lattice on [0, a] × [0, b] having n1 equally spaced locations per row and n2 equally spaced locations per column. An illustration on the use of these functions and the display of the datasets Figure 2: Plot of three geostatistical data sets: dat1 was simulated at sampling locations loc1 (black); dat2 was simulated at loc2 (red); and dat3 was simulated at loc3 (green).

Posterior sampling
The function MCMCinput is used to set up the sampling model, prior and environmental settings needed to run the MCMC algorithm. An example for the Weed dataset is: R> input.Weed <-MCMCinput(Y.family = "Poisson", rho.family = "rhoPowerExp", + run = 2200, run.S = 1, phi.bound = c(10, 300), priorSigma = "Halft", + parSigma = c(1, 1), ifkappa = 0, initials = list(c(4), 0.7, 90, 1), + scales = c(0.55, 3.5, 0.5, 0.4, 1)) where run is the number of iterations of the MCMC algorithm and run.S is the number of times S is updated as a group within each MCMC iteration to improve accuracy and reduce autocorrelations. Increasing run.S usually does not increase dramatically the running time of the algorithm. phi.bound sets the support for the range parameter φ, which needs to be chosen with care to avoid deteriorating the accuracy and efficiency of the algorithms. priorSigma sets the type of prior for σ, with options "Halft" or "InvGamma", and parSigma sets its hyper-parameters; see Section 3. ifkappa is an indicator of whether the covariance parameter κ must be sampled from its posterior distribution (ifkappa = 1) or fixed at a value (ifkappa = 0). initials sets the starting values for β, σ, φ and κ which can be chosen from an exploratory data analyses, and scales sets the scalings of the Langevin and random walk proposals for S, β, σ, φ and κ; the last component of the last two arguments are ignored when ifkappa = 0. In general the algorithm converges fast, and it can be run a few times with different sets of starting values as a numerical check.
The function runMCMC samples from the posterior distribution in Poisson-lognormal and binomial-logitnormal spatial models using MCMC algorithm described in Section 5. Specifically, this function simulates posterior samples for (S,β,θ) which are later transformed back to obtain posterior samples for the latent variables S and parameters (β, θ) in the original parameterization using (6) and (7). An example for the Weed dataset is: where Y is the n × 1 vector of observed counts. runMCMC has additional arguments that can set the sampling model, prior and environmental settings, but these do not need to be specified or are disregarded when MCMCinput is given. The output from runMCMC is a list with the following elements: AccRate: A vector of length 4 or 5 containing the acceptance rates for S, β, σ, φ and κ.
The above function uses one CPU to perform the computational tasks. When several CPUs are available (as in the case of multi-core PCs or high performance computing clusters), the function runMCMC.sf performs the robust MCMC algorithms to generate several posterior samples in parallel, each of them of length run. This function enables different CPUs to run the runMCMC function simultaneously with different initial values, and performs the parallel computation using the packages snow, snowfall, and rlecuyer (Sevcikova and Rossini 2012). The arguments of this function are the same as those in runMCMC, plus additional arguments with settings for the parallel computation. An example for the Weed dataset is: R> library("snowfall") R> library("rlecuyer") R> res.Weed.p <-runMCMC.sf(loc = Weed[, 1:2], Y = Weed[, 3], L = 0, + X = NULL, MCMCinput = input.Weed, n.chn = 4, n.cores = 4, + cluster.type = "SOCK") where n.chn is the number of chains that want to be simulated in parallel and n.core is the number of CPUs used for the computation. cluster.type sets the type of cluster used in the parallel computation, with options "SOCK", "MPI", "PVM" or "NWS". The output from this function is a list of length n.chn, in which each element is itself a list like the output from runMCMC.
In practice we recommend to first simulate short pilot chains using runMCMC to discover values for the argument scales in MCMCinput that make the MCMC algorithm efficient (see Section 5.3), and then use these to simulate several longer chains in parallel using runMCMC.sf.

Chain handling and diagnostics
The function cutChain takes as input the output of runMCMC to perform the desired amount of "burn-in" and "thinning" on the simulated chain. For the Weed dataset we have: R> res.Weed <-cutChain(res.Weed, chain.ind = 1:4, burnin = 100, + thinning = 10) where chain.ind indicates the element numbers in res.Weed that are processed. The output of this function is the same as the output of runMCMC, except that the element AccRate is absent. When parallel chains are available from the output of runMCMC.sf, lapply can be used to apply cutChain to each chain individually: R> res.Weed.p <-lapply(res.Weed.p, cutChain, chain.ind = 1:4, + burnin = 200, thinning = 10) These parallel chains need to be examined for mixing and convergence before they are combined to be used for inference. The R package coda (Plummer, Best, Cowles, and Vines 2006) is a well-known package for output analysis and diagnostics of MCMC simulations. It provides a variety of functions for visualizing posterior samples and performing convergence tests. In addition, geoCount includes the function plotACF that plots the auto-correlation curves of the simulated latent variables and the function findMode that computes the modes of the estimated marginal posterior densities. An example of the above functions applied to the output res.Weed.p appears below and in Figure 3: R> par(mfrow = c(2, 2)) R> phi.est <-c() R> for (  R> res.Weed.p <-mixChain(res.Weed.p) Based on this posterior sample inference about any of the model components can be summarized in the usual way. For instance, for the Weed dataset we use posterior quantiles to summarize all parameters and three latent variables, which are reported in Table 1:

Prediction
In this section we describe how geoCount performs predictive inference about the spatially varying attribute of interest and potential response counts. Let x p 1 , x p 2 , . . . , x p m be the set of prediction locations where inference about the spatially varying attribute is sought. This is given by g −1 (S(·)) = e S(·) (an intensity) for the Poisson-lognormal spatial model (4), and g −1 (S(·)) = e S(·) /(1 + e S(·) ) (a probability) for the binomial-logitnormal spatial model (5). Also, let Y p j be the response count that could have been measured (but was not) at x p j , following the top level of the hierarchy in either model (4) or (5). Before describing how geoCount performs predictive inference we first create a regular grid of prediction locations that covers the convex hull of the sampling locations of the Weed dataset: For prediction location x p j , the posterior predictive distribution of (Y p j , S(x p j )) is given by p(y p j , s p j | y) = p(y p j , s p j , s, β, θ | y)dsdβdθ = p(y p j | s p j )p(s p j | s, β, θ)p(s, β, θ | y)dsdβdθ, where p(y p j | s p j ) is given by the top level of the hierarchy in either model (4) or (5), with a conjectured value t p j , and p(s p j | s, β, θ) is the normal pdf with parameters obtained from the standard formulas for the conditional mean and variance of Gaussian processes; the absence of some of the conditioning variables in the densities of the integrand above follow from the general assumptions (A1) and (A2) in Section 2. Hence, simulation from this posterior predictive distribution is easily obtained from a sample from the posterior distribution of (S, β, θ).
The function predY simulates a sample from the posterior predictive distribution of S p = (S(x p 1 ), . . . , S(x p m )) and Y p = (Y p 1 , . . . , Y p m ) . This function can perform the computations serially or in parallel. An illustration of the latter using the prediction locations locp created above is: R> Ypred <-predY(res.m = res.Weed.p, loc = Weed[, 1:2], locp = locp, + X = NULL, Xp = NULL, Lp = rep(1, nrow(locp)), k = 1, + Y.family = "Poisson", rho.family = "rhoPowerExp", + parallel = "snowfall", n.cores = 4, cluster.type = "SOCK") where run.m is the output of the function runMCMC, or of its parallel counterpart, locp is an m × 2 matrix with the coordinates of the prediction locations, Xp is an m × p matrix with the values of the covariates (if there are any) at the prediction locations, and Lp is a vector of length m with the conjectured values t p j s. The optional argument k specifies the value of the covariance parameter κ, when this is assumed fixed, and the argument parallel is included when parallel computation is desired; when the latter is absent the computation is done serially. The output of this function is a list with two elements, latent.predict and Y.predict, which are m × q matrices containing samples from the posterior predictive distribution of S p and Y p , respectively, in which q is the length of the chain in the input run.m.

Multiple chain generation with parallel computing
In this section we report the results of experiments to assess running times when the parallel capabilities of geoCount are used. The experiments were run in the following computer environment:   To perform the experiments we used a dataset simulated at 100 sampling locations in a regular grid of [0, 1] × [0, 1] using the Poisson-lognormal spatial model with exponential covariance function, constant mean, and parameters β = 5, σ = 0.5 and φ = 0.2.
Based on the above model and dataset we simulated posterior samples for the parameters and latent variables using different number of chains of size 1000 each (with cluster type "SOCK", see documentation of snow for details); the results are summarized in Table 2. It shows that the running times increase with the number of chains, but the average time for generating 1000 posterior samples decreases and achieves the minimum for 8 chains (there were a total of 8 available processors).

Comparison between geoCount and geoRglm
The R package geoRglm, developed by Christensen and Ribeiro (2002), is an extension of the well-known R package geoR developed by Ribeiro and Diggle (2001), that also provides functions for Bayesian inference and prediction for the Poisson-lognormal and binomiallogitnormal models (4) and (5). Like geoCount, geoRglm uses MCMC algorithms to sample from the posterior distribution of (S, β, θ), where group updating is done based on Langevin-Hasting updates for the first two blocks and one-dimensional random walk Metropolis for the components of the last block. But unlike geoCount, geoRglm uses the so-called non-centered parameterization (see Papaspiliopoulos, Roberts, and Sköld (2007) for details) that seeks to turn the components of S and β uncorrelated a priori, rather than a posteriori as geoCount does, and no capability for parallel computation is available. geoRglm implements the computationally expensive MCMC sampling and computation of large matrices in C; for details see Christensen and Waagepetersen (2002), Diggle and Ribeiro (2007), and the geoRglm webpage http://gbi.agrsci.dk/~ofch/geoRglm/. Christensen, Roberts, and Sköld (2006) found that the performance of MCMC algorithms based on the centered and non-centered parameterizations depend greatly on the observed data, and for some datasets the algorithms based on these parameterizations may perform quite poorly. To illustrate this they used two real datasets, one consisting of large counts (the Rongelap dataset described in Section 6.1) and the other of small counts, and compared the efficiency of MCMC algorithms based on the centered, non-centered and data-based parameterizations. They found that neither the centered nor the non-centered parameterizations were particularly efficient for these datasets, since posterior samples of parameters and latent values displayed very strong autocorrelations. Nevertheless, they found that the centered parameterization was more efficient than the non-centered one for the large count dataset, while the opposite held true for the small count dataset. On the other hand, the data-based parameterization was substantially more efficient than the other two for both datasets, since posterior autocorrelations of parameters and latent values were substantially smaller.
In this section we carry out a similar efficiency comparison between geoRglm that implements the non-centered parameterization and geoCount that implements the data-based parameterization, using the Weed and Rhizoc datasets described in Section 6.1. For each combination of dataset and package we simulated a single chain of size 55000, removed the first 5000 simulated values (burn-in), and then kept every 10th simulated value to assess mixing and convergence of the simulated chain. We do not use the parallel capability of geoCount to make the fitting comparable with geoRglm in most possible respects.
A referee pointed out that the above comparison might not be fair to geoRglm since it does not take into account computational speed, and the faster geoRglm should be allowed to run longer to have a better chance to reach convergence. To investigate this we perform a second run of geoRglm, but now with the number of iterations, burn-in and thinning lengths all multiplied by 10, so we end up with the same effective simulation size as in the previous run: R> MCc.w2 <-mcmc.control(S.scale = 0.0013, phi.scale = 4.5, phi.start = 150, + burn.in = 50000, thin = 100, n.iter = 500000) . These plots suggest that, for the Weed dataset, the MCMC algorithm implemented in geoRglm has serious convergence problems, and running longer chains seems to be of no avail.

Fitting the Rhizoc dataset
Following the analysis in Zhang (2002), we fit the Rhizoc dataset using the binomial-logitnormal spatial model (5) with constant mean and spherical covariance function. The chosen priors for the parameters are the same as those used to fit the Weed dataset. Below we display first the geoRglm code and later the geoCount code to fit the Rhizoc dataset: R> data("Rhizoc") R> MOD.rh <-model.glm.control(trend.d = "cte", cov.model = "spherical") R> MCc.rh <-mcmc.control(S.scale = 0.025, phi.scale = 300, phi.start = 40, + burn.in = 5000, thin = 10, n.iter = 50000) R> PGC.rh <-prior.glm.control(beta.prior = "flat", phi.prior = "uniform", R> ppk.rh <-cutChain(ppk.rh, chain.ind = 1:4, burnin = 5000, thinning = 10) Figures 8 and 9 display, respectively, side-by-side autocorrelation and trace plots of the latent variables at the sampling locations and parameters of the simulated chains from the output of geoRglm (left) and geoCount (right). For this dataset the efficiency of the algorithms in both packages is comparable since the autocorrelation among all latent variables and parameters is either low or negligible (with the exception of φ in geoRglm). In regard to mixing, the chains produced by geoCount seem to mix a bit better though, as they do a better job at visiting the tails of the (marginal) posterior distributions.
The efficiency advantages of geoCount over geoRglm do not come without a price though: the running times of the former are larger than the latter. This difference is due mainly to two reasons. First, the data-based parameterization implemented in geoCount involves more computational overhead and a larger amount of matrix computations. Second, a fair amount of the running time savings in geoRglm are achieved by precomputing and storing the inverse of the correlation matrix ρ(u ij ) for a grid of points of the range parameter φ.
A side effect of this choice is that the efficiency of the algorithm depends somewhat on the selected grid and poor mixing may result, even when using the maximum grid size (currently at 2001). In addition, it is not possible under this approach to account for the uncertainty in the smoothness parameter κ, which needs to be assumed fixed.
To offset this slowdown, geoCount allows the simulation of several posterior samples in parallel, which can roughly divide the running times by the number of CPUs that are used; see Section 6.6. In addition, shorter chains are usually sufficient for adequate inference due to the faster convergence of the MCMC algorithm implemented in geoCount.

Comparison between geoCount and INLA
The R package INLA performs Bayesian inference for a large class of hierarchical models called latent Gaussian models, and uses the computational approach proposed by Rue, Martino, and Chopin (2009) called Integrated Nested Laplace Approximation. This class of models includes the Poisson-lognormal spatial model (4). But unlike geostatistical models where the latent Gaussian model is specified in terms of covariance functions, INLA specification uses Gaussian Markov random fields (GMRF). This is computationally advantageous as no inversions of covariance matrices are required, but has the drawback that it is difficult to construct GMRF with covariance functions having common desired behaviors, such as stationarity or isotropy. INLA performs Bayesian inference by approximating marginal distributions of interest and their summaries deterministically (rather than stochastically as is done in MCMC algorithms), using a combination of Laplace approximations and numerical quadrature rules. The combination of these two features makes INLA an alternative that is computationally much faster than the MCMC algorithms implemented in geoCount and geoRglm. At the time of writing INLA is not available from the Comprehensive R Archive Network (CRAN), but can be obtained from the website http://www.R-INLA.org/.
Recently, Lindgren, Rue, and Lindström (2011) provided a partial solution to the aforementioned drawback, based on a result in Whittle (1963) which states that Gaussian random fields with covariance functions in a certain subfamily of the Matérn family are solutions of some stochastic partial differential equations (SPDE) driven by Gaussian white noise. Lindgren, Rue, and Lindström (2011) found that these solutions can be expanded in terms of certain basis functions with random coefficients given by a certain GMRF. As a result, they proposed to do the modeling using covariance functions from this Matérn subfamily and the computations using the GMRF representation of the corresponding random field. This is the approach implemented in INLA.
As a final point we note that the spatially varying attribute of interest is usually the intensity that drives the observed counts, which is represented by e S(·) in the Poisson-lognormal spatial model. Currently, INLA provides predictive inference only about S(·), while the samplingbased approaches in geoCount and geoRglm allow predictive inference about any function of S(·).

Summary
This work describes the R package geoCount for the analysis of geostatistical count data. The package implements several tasks for the analysis of the Poisson-lognormal and binomiallogitnormal spatial models, two members in the class of generalized linear spatial models that are commonly used for the analysis of spatial count data. The main package capabilities are: 1. Simulation and visualization of geostatistical count data.
2. Simulation from the posterior distribution of the parameters and latent variables at the sampling locations.
3. Simulation from the posterior predictive distribution of the latent variables and potential counts at unsampled locations.
On the programing side geoCount implements the computational intensive tasks in C++ and has parallel computation capability to speed up the computation processes, in case multiprocessor or multi-core computers are available. On the algorithmic side geoCount implements group updating, Langevin-Hastings algorithms and a data-based parameterization to improve the efficiency of the MCMC algorithms. The MCMC algorithm implemented in geoCount appears more efficient than that implemented in geoRglm, and geoCount allows inference on the natural spatially varying quantity of interest, an intensity or probability, while INLA allows inference only on a specific transformation of it.
geoCount also implements some methods for assessing model adequacy based on posterior predictive Bayesian p values, proposed by Bayarri and Castellanos (2007), and transformed residuals, proposed by Jing (2011). These developments are still incipient and under current investigation, so they are not reported here; see the package documentation and Jing (2011) for a description of these methods.

Limitations
We end with a brief description of some limitations of the package geoCount and the models considered here, as well as possible areas for future improvement.

Package
The current implementation of geoCount allows the fitting of moderate size datasets (say in the hundreds) in a relatively short amount of time, but it will require substantially longer running times for large datasets (say in the thousands). In this regard, geoRglm and INLA run much faster, with the latter being the fastest. Implementing recent algorithms for storage, manipulation and computation with large non-structured covariance matrices is a possible avenue for substantial reduction of computation times in geoCount. Also, the Langevin-Hastings MCMC algorithm based on the data-based parameterization implemented in geoCount usually produces convergent and well mixed chains, but convergence issues may arise for some datasets. These are manifested by simulated chains with very strong autocorrelations. On the other hand, convergence is not an issue for INLA since it uses deterministic approximations. A possible avenue for improvement would be to also implement Hamiltonian Monte Carlo algorithms (Neal 2011), which seem to be capable to efficiently sample from posterior distributions with highly correlated components.

Models
The Poisson-lognormal spatial model uses the log-link function which implies that the spatially varying attribute of interest is log-normally distributed. Depending on the dataset, this may or may not be a reasonable assumption. Christensen (2004) extended it by using the Box-Cox family of link functions, and offered evidence that the identity-link function provided a better fit to the Rongelap dataset. De Oliveira (2013) proposed a class models for spatial count data that allows a more direct modeling of the distribution of the spatially varying attribute of interest thought the use of copulas. It was shown that the models in this class, that includes the Poisson-lognormal spatial model, are not capable of representing moderate or strong spatial correlation in datasets consisting mostly of small counts. Similar comments and considerations might also apply to the binomial-logitnormal spatial model.
Finally, it is worthwhile to note that for this class of spatial models the prior distribution of the model parameters may have a sizeable influence on both the resulting posterior distribution and the efficiency of the MCMC algorithm. Because of this, it would be desirable to use objective or default priors aimed at minimizing the influence of the prior on the resulting posterior distribution (e.g., as those described in De Oliveira (2010) for Gaussian random fields), but such priors are yet to be developed for the models described here.