spBayes for large univariate and multivariate point-referenced spatio-temporal data models

In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in: i) improved sampler convergence rate and efficiency by reducing the parameter space; ii) decreased sampler run-time by avoiding expensive matrix computations; and iii) increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.


Introduction
The scientific community is moving into an era where open-access data-rich environments provide extraordinary opportunities to understand the spatial and temporal complexity of processes at broad scales. Unprecedented access to spatial data is a result of investments to collect data for regulatory, monitoring, and resource management objectives, and technological advances in spatially-enabled sensor networks along with geospatial information storage, analysis, and distribution systems. These data sources are increasingly diverse and specialized, e.g., computer model outputs, monitoring station instruments, remotely located sensors, and georeferenced field measurements. Across scientific fields, researchers face the challenge of coupling these data with imperfect models to better understand variability in their system of interest. The inference garnered through these analyses often supports decisions with important economic, environmental, and public health implications; therefore, it is critical to correctly estimate inferential uncertainty. However, developing modeling frameworks capable of accounting for various sources of uncertainty is not a trivial task: massive datasets from multiple sources with complex spatial dependence structures only serve to aggravate the challenges.
Proliferation of spatial data has spurred considerable development in statistical modeling; see, for example, the books by Cressie (1993), Chilès and Delfiner (2012), Møller and Waagepetersen (2003), Schabenberger and Gotway (2004), Wackernagel (2003), Diggle and Ribeiro (2007) and Cressie and Wikle (2011) for a variety of methods and applications. The statistical literature acknowledges that spatial and temporal associations are captured most effectively using models that build dependencies in different stages or hierarchies. Hierarchical models are especially advantageous with datasets having several lurking sources of uncertainty and dependence, where they can estimate much richer models with less stringent assumptions than traditional modeling paradigms. These models follow the Bayesian framework of statistical inference (see, e.g., Carlin and Louis 2011; Gelman, Carlin, Stern, and Rubin 2004), where analysis uses sampling from the posterior distributions of model parameters.
Computational advances with regard to Markov chain Monte Carlo (MCMC) methods have contributed enormously to the popularity of hierarchical models in a wide array of disciplines (e.g., Gilks, Richardson, and Spiegelhalter 1996; Robert and Casella 2004), and spatial modeling is no exception (see, e.g., Banerjee, Carlin, and Gelfand 2004). In the realm of spatial statistics, hierarchical models have been widely applied to analyze both areally referenced as well as point-referenced or geostatistical data. For the former, a class of models known as Conditionally Autoregressive (CAR) models have become very popular, as they are easily implemented using MCMC methods such as the Gibbs sampler. In fact, these models are somewhat naturally suited for the Gibbs sampler, which draws samples from conditional distributions that are fully specified by the CAR models. Their popularity has increased in no small measure due to their automated implementation in the OpenBUGS software package, which offers a flexible and user-friendly interface to construct multilevel models that are implemented using a Gibbs sampler. This is performed by identifying a multilevel model with a directed acyclic graph (DAG) whose nodes form the different components of the model and allow the language to identify the full conditional distributions that need to be updated. OpenBUGS is an offshoot of the BUGS (Bayesian inference Using Gibbs Sampling) project and the successor of the WinBUGS software.
From an automated implementation perspective, the challenges are somewhat greater for point-referenced models. First, expensive matrix computations are required that can become prohibitive with large datasets. Second, routines to fit unmarginalized models are less suited for direct updating using a Gibbs sampler in the BUGS paradigm and result in slower convergence of the chains. Third, investigators often encounter multivariate spatial datasets with several spatially dependent outcomes, whose analysis requires multivariate spatial models that involve matrix computations that are poorly implemented in BUGS. These issues have, however, started to wane with the delivery of relatively simpler R (R Core Team 2013) packages via the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org) that help automate Bayesian methods for point-referenced data and diagnose convergence. The Analysis of Spatial Data (Bivand 2013) and Handling and Analyzing Spatio-Temporal Data (Pebesma 2013) CRAN Task Views provide a convenient way to identify packages that offer functions for modeling such data. These packages are generally listed under the Geostatistics section in the Task View. Here, those packages that fit Bayesian models include geoR (Ribeiro Jr. and Diggle 2012), geoRglm (Christensen and Ribeiro Jr. 2011), spTimer (Bakar and Sahu 2015), spBayes (Finley and Banerjee 2013), spate (Sigrist, Kuensch, and Stahel 2013), and ramps (Smith, Yan, and Cowles 2011). In terms of functionality, spBayes offers users a suite of Bayesian hierarchical models for Gaussian and non-Gaussian univariate and multivariate spatial data as well as dynamic Bayesian spatio-temporal models.
Our initial development of spBayes (Finley, Banerjee, and Carlin 2007) provided functions for modeling Gaussian and non-Gaussian univariate and multivariate point-referenced data.
These hierarchical Bayesian spatial process models, implemented through MCMC methods, offered increased flexibility to fit models that would be infeasible with classical methods within inappropriate asymptotic paradigms. However, with this increased flexibility comes substantial computational demands. Estimating these models involves expensive matrix decompositions whose computational complexity increases in cubic order with the number of spatial locations, rendering such models infeasible for large spatial datasets. Through spBayes version 0.2-4, released on CRAN on 4/24/12, very little attention was given to addressing these computational challenges. As a result, fitting models with more than a few hundred observations was very time consuming: on the order of hours to fit models with ∼1,000 locations. spBayes version 0.3-7 (CRAN 6/1/13) comprises a substantial reformulation and rewrite of core functions for model fitting, with a focus on improving computational efficiency, flexibility, and usability. Among other improvements, this and subsequent versions offer: i) improved sampler convergence rate and efficiency by reducing the parameter space; ii) decreased sampler run-time by avoiding expensive matrix computations; and iii) increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing models, new functions were added to model data indexed in both space and time. These functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete. The subsequent sections highlight the fundamentals of models now implemented in spBayes.

Bayesian Gaussian spatial regression models
All Gaussian spatial models in spBayes arise from a generic hierarchical linear mixed model whose joint distribution is

N(y | Xβ + Z(θ)α, D(θ)) × N(α | 0, K(θ)) × N(β | µ_β, Σ_β) × p(θ) ,  (1)

where y is an n × 1 vector of possibly irregularly located observations, X is a known n × p matrix of regressors (p < n), K(θ) and D(θ) are families of r × r and n × n covariance matrices, respectively, and Z(θ) is n × r with r ≤ n, all indexed by a set of unknown process parameters θ. The r × 1 random vector α ∼ N(0, K(θ)) and the p × 1 slope vector β ∼ N(µ_β, Σ_β), where µ_β and Σ_β are known. The hierarchy is completed by assuming θ ∼ p(θ), a proper prior distribution. The Gaussian spatial models in spBayes emerge as special cases of (1), as we will see later. Bayesian inference is carried out by sampling from the posterior distribution of {β, α, θ}, which is proportional to (1).
Below, we provide some details behind Bayesian inference for (1). This involves sampling the parameters θ, β and α from their marginal posterior distributions and carrying out subsequent predictions. Direct computations usually entail inverting and multiplying dense matrices and also computing determinants. In software development, care is needed to avoid redundant operations and ensure numerical stability. Therefore, in the subsequent sections we describe how we use Cholesky factorizations, solve triangular systems, and minimize expensive matrix operations (e.g., dense matrix multiplications) to perform all the computations.

Sampling the process parameters
Sampling from (1) employs MCMC methods, in particular Gibbs sampling and random walk Metropolis steps (e.g., Robert and Casella 2004). For faster convergence, we integrate out β and α from the model and first sample θ from p(θ | y) ∝ p(θ) × N(y | Xµ_β, Σ_{y|θ}), where Σ_{y|θ} = Z(θ)K(θ)Z(θ)⊤ + D(θ) + XΣ_βX⊤. This matrix needs to be constructed for every update of θ. Usually D(θ) is diagonal and XΣ_βX⊤ is fixed, so the computation involves the matrix Z(θ)K(θ)Z(θ)⊤. Assuming that Z(θ) and K(θ) are computationally inexpensive to construct for each θ, Z(θ)K(θ)Z(θ)⊤ requires rn² flops (floating point operations).
We adopt a random-walk Metropolis step with a multivariate normal proposal (of the same dimension as the number of parameters in θ) after transforming parameters to have support over the entire real line. This involves evaluating

log p(θ) − (1/2) log det Σ_{y|θ} − (1/2) Q(θ) ,  (2)

where Q(θ) = (y − Xµ_β)⊤ Σ_{y|θ}^{−1} (y − Xµ_β). We first compute L = chol(Σ_{y|θ}), where chol(Σ_{y|θ}) returns the lower-triangular Cholesky factor L of Σ_{y|θ}. This involves O(n³/3) flops. Next, we obtain u = trsolve(L, y − Xµ_β), which solves the triangular system Lu = y − Xµ_β. This involves O(n²) flops, and Q(θ) = u⊤u requires another 2n flops. The log-determinant in (2) is evaluated as 2 ∑_{i=1}^n log l_ii, where the l_ii are the diagonal entries in L. Since L has already been obtained, the log-determinant requires another n steps. Therefore, the Cholesky factorization dominates the work and computing (2) is achieved in O(n³) flops. If instead a flat prior is assigned to β, integrating out β and α yields the Metropolis target

log p(θ) − (1/2) log det Σ_{y|β,θ} − (1/2) log det(X⊤Σ_{y|β,θ}^{−1}X) − (1/2) Q(θ) ,  (3)

where Σ_{y|β,θ} = Z(θ)K(θ)Z(θ)⊤ + D(θ). Computations proceed similar to the above. We first evaluate L = chol(Σ_{y|β,θ}) and then obtain [v : U] = trsolve(L, [y : X]), so Lv = y and LU = X. Next, we evaluate W = chol(U⊤U) and e = trsolve(W, U⊤v), whence Q(θ) = v⊤v − e⊤e and the log-determinants in (3) are 2 ∑ log l_ii and 2 ∑ log w_ii, where the w_ii's and l_ii's are the diagonal elements in W and L, respectively. The number of flops is again of cubic order in n.
Importantly, our strategy above avoids computing inverses. We use Cholesky factorizations and solve only triangular systems. If n is not large, say ∼10², this strategy is feasible. The use of efficient numerical linear algebra routines fetches a substantial reduction in computing time (see Section 3). Our implementation employs matrix-vector multiplication and avoids dense matrix-matrix multiplications wherever possible. Multiplications involving diagonal matrices are programmed using closed form expressions, and inverses are obtained by solving triangular linear systems after obtaining a Cholesky decomposition. However, when n ∼ 10³ or higher, the computation becomes too onerous for practical use and alternative updating strategies are required. We address this in Section 2.3.
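The Cholesky-plus-triangular-solve strategy above can be sketched in a few lines. The following is an illustrative NumPy/SciPy translation (not the package's C++ implementation); `marginal_loglik` and the synthetic inputs are hypothetical names used only for this sketch.

```python
import numpy as np
from scipy.linalg import solve_triangular

def marginal_loglik(y, X, mu_beta, Sigma):
    """log N(y | X mu_beta, Sigma) via one Cholesky factorization and a
    triangular solve -- no explicit inverse or determinant of Sigma."""
    n = len(y)
    L = np.linalg.cholesky(Sigma)                         # O(n^3/3) flops
    u = solve_triangular(L, y - X @ mu_beta, lower=True)  # O(n^2) flops
    quad = u @ u                                          # Q(theta) = u'u
    logdet = 2.0 * np.sum(np.log(np.diag(L)))             # from chol diagonal
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Sanity check against a naive computation with explicit solves.
rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
mu_beta = np.array([1.0, -0.5])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)          # a positive definite covariance
y = rng.normal(size=n)
r = y - X @ mu_beta
sign, logdet_naive = np.linalg.slogdet(Sigma)
naive = -0.5 * (n * np.log(2.0 * np.pi) + logdet_naive
                + r @ np.linalg.solve(Sigma, r))
assert np.isclose(marginal_loglik(y, X, mu_beta, Sigma), naive)
```

The point of the sketch is that only one O(n³) factorization is performed per θ update, with everything else costing O(n²) or less.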
Mapping point or interval estimates of spatial random effects is often helpful in identifying missing regressors and/or building a better understanding of model adequacy.
Posterior samples of β and α are recovered from the post burn-in samples of θ by composition sampling. Given a sampled θ, the full conditional distribution of β (with α integrated out) is N(Bb, B), where

B^{−1} = Σ_β^{−1} + X⊤Σ_{y|β,θ}^{−1}X and b = Σ_β^{−1}µ_β + X⊤Σ_{y|β,θ}^{−1}y ,  (4)

and for each k = 1, 2, . . ., M we evaluate (4) at θ^(k) and draw

β^(k) ∼ N(Bb, B) .  (5)

Analogously, with β integrated out, the full conditional distribution of α is N(Bb, B), where now

B^{−1} = K(θ)^{−1} + Z(θ)⊤Σ_{y|α,θ}^{−1}Z(θ) and b = Z(θ)⊤Σ_{y|α,θ}^{−1}(y − Xµ_β) ,  (6)

with Σ_{y|α,θ} = D(θ) + XΣ_βX⊤. The vector b here is computed analogously as for β. For each k = 1, 2, . . ., M we now evaluate (6) at θ^(k) and draw α^(k). For computing B, one could proceed as for β, but that would involve chol(K(θ)), which may become numerically unstable for certain covariance functions (e.g., the Gaussian or the Matérn with large ν). For robust software performance we define G(θ)^{−1} = Z(θ)⊤Σ_{y|α,θ}^{−1}Z(θ) and utilize the identity (Henderson and Searle 1981)

(K(θ)^{−1} + G(θ)^{−1})^{−1} = K(θ) − K(θ)(K(θ) + G(θ))^{−1}K(θ) ,

which never requires inverting or factorizing K(θ). We remark that estimating the spatial effects involves Cholesky factorizations for n × n positive definite linear systems. The above steps ensure numerical stability, but they can become computationally prohibitive when n becomes large. While some savings accrue from executing the above steps only for the post burn-in samples, for n on the order of thousands we recommend the low rank spatial models offered by spBayes (see Sections 2.3 and 4.2).
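The Henderson and Searle identity used above is easy to check numerically. The sketch below (illustrative NumPy, with arbitrary synthetic K and G standing in for K(θ) and G(θ)) confirms that the right-hand side, which never inverts K, matches the direct computation.

```python
import numpy as np

# Numerical check of the Henderson and Searle (1981) identity:
#   (K^{-1} + G^{-1})^{-1} = K - K (K + G)^{-1} K.
# The right-hand side avoids inverting or factorizing K, which is what
# makes it attractive when chol(K(theta)) is numerically unstable.
rng = np.random.default_rng(7)
r = 6
A = rng.normal(size=(r, r))
B = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)   # positive definite, stands in for K(theta)
G = B @ B.T + r * np.eye(r)   # positive definite, stands in for G(theta)

lhs = np.linalg.inv(np.linalg.inv(K) + np.linalg.inv(G))
rhs = K - K @ np.linalg.solve(K + G, K)   # no inverse of K appears
assert np.allclose(lhs, rhs)
```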

The special case of low-rank models
The major computational load in estimating (1) arises from unavoidable Cholesky decompositions for dense n × n positive definite matrices. The required number of flops is of cubic order and must be executed in each iteration of the MCMC. For example, when a specific form of (1) is used to analyze a dataset comprising n = 2,000 locations and p = 2 predictors, each iteration requires ∼0.3 seconds of CPU time (see Section 4.2.1). Marginalization, as described in Section 2.1, typically requires fewer iterations to converge. But even if 10,000 iterations are required to deliver full inferential output, the associated CPU time is ∼50 minutes. Clearly, large spatial datasets demand specialized models.
One strategy is to specify Z(θ) with r << n. Such models are known as low-rank models. Specific choices for Z(θ) will be discussed later; spBayes models Z(θ) using the predictive process (see Section 4.2). To elucidate how savings accrue in low-rank models, consider the marginal Gaussian likelihood obtained by integrating out α from (1),

N(y | Xβ, Σ_{y|β,θ}), where Σ_{y|β,θ} = Z(θ)K(θ)Z(θ)⊤ + D(θ) .

We could have integrated out β too, as in Section 2.1, but there is apparently no practical advantage to that. For the low-rank model, each iteration of the Gibbs sampler updates β and θ from their full conditional distributions.
β is drawn from N(Bb, B), where b and B are as in (4). The strategy in Section 2.2 would be expensive for large n because computing B, though itself p × p, involves a Cholesky factorization of the n × n matrix Σ_{y|β,θ} for every new update of θ. Instead, we utilize the Sherman-Woodbury-Morrison formula

Σ_{y|β,θ}^{−1} = D(θ)^{−1} − D(θ)^{−1}Z(θ)(K(θ)^{−1} + Z(θ)⊤D(θ)^{−1}Z(θ))^{−1}Z(θ)⊤D(θ)^{−1} = D(θ)^{−1/2}(I_n − H⊤H)D(θ)^{−1/2} ,  (7)

where H = trsolve(F, Z(θ)⊤D(θ)^{−1/2}) and F = chol(K(θ)^{−1} + Z(θ)⊤D(θ)^{−1}Z(θ)). The only factorizations now involve r × r matrices, while D(θ) is diagonal and trivially inverted. We perform the above operations for each iteration of the Gibbs sampler, using the current update of θ, and sample β as in (5).
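The savings from the Sherman-Woodbury-Morrison formula can be demonstrated directly. The following illustrative NumPy sketch (synthetic Z, K, and diagonal D standing in for Z(θ), K(θ), D(θ)) computes Σ_{y|β,θ}^{−1} by factorizing only an r × r matrix and verifies it against the dense inverse.

```python
import numpy as np

# Sherman-Woodbury-Morrison for Sigma = D + Z K Z' with diagonal D:
#   Sigma^{-1} = D^{-1} - D^{-1} Z (K^{-1} + Z' D^{-1} Z)^{-1} Z' D^{-1}.
# Only an r x r system is solved; all n x n work is elementwise.
rng = np.random.default_rng(3)
n, r = 200, 10
Z = rng.normal(size=(n, r))
A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)            # r x r, stands in for K(theta)
d = rng.uniform(1.0, 2.0, size=n)      # diagonal of D(theta)

Dinv_Z = Z / d[:, None]                # D^{-1} Z without forming D^{-1}
M = np.linalg.inv(K) + Z.T @ Dinv_Z    # r x r inner matrix
Sigma_inv = np.diag(1.0 / d) - Dinv_Z @ np.linalg.solve(M, Dinv_Z.T)

Sigma = np.diag(d) + Z @ K @ Z.T       # the dense n x n covariance
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```

The dominant cost drops from O(n³) to O(nr²) per θ update, which is the source of the low-rank speed-ups reported later.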
We update the process parameters θ using a random-walk Metropolis step with target log-density analogous to (2), now evaluated through the Sherman-Woodbury-Morrison representation above. In particular, the determinant identity det Σ_{y|β,θ} = det D(θ) det K(θ) det(K(θ)^{−1} + Z(θ)⊤D(θ)^{−1}Z(θ)) reduces the log-determinant to sums of the form ∑ log d_ii(θ) and 2 ∑ log t_ii, where d_ii(θ) and t_ii are the diagonal entries of D(θ) and of the triangular Cholesky factor T of the r × r matrices, respectively.
Once the Gibbs sampler has converged and we have obtained posterior samples for β and θ, posterior samples for α can be obtained following closely the description in Section 2.2. In fact, since the posterior samples of β are already available, we can draw α from its full conditional distribution, given both β and θ. This amounts to replacing µ_β with β and Σ_{y|α,θ} with D(θ) in (6). The algorithm now proceeds exactly as in Section 2.2, and we achieve computational savings because D(θ) is usually cheaper to handle than Σ_{y|α,θ}.

Spatial predictions
To predict a random t × 1 vector y_0 associated with a t × p matrix of predictors X_0, we assume that, given β and θ, the joint distribution of y and y_0 is multivariate normal,

(y⊤, y_0⊤)⊤ ∼ N( (Xβ⊤, X_0β⊤)⊤, [ C_11(θ), C_12(θ) ; C_12(θ)⊤, C_22(θ) ] ) ,  (10)

where C_11(θ) = Σ_{y|β,θ} is the variance-covariance matrix of y, C_12(θ) is the n × t cross-covariance matrix between y and y_0, and C_22(θ) is the variance-covariance matrix for y_0. How these are constructed is crucial for ensuring a legal probability distribution or, equivalently, a positive-definite variance-covariance matrix for (y⊤, y_0⊤)⊤ in (10). A legitimate joint distribution will supply a conditional distribution p(y_0 | y, β, θ), which is normal with mean and variance

µ_p = X_0β + C_12(θ)⊤C_11(θ)^{−1}(y − Xβ) and Σ_p = C_22(θ) − C_12(θ)⊤C_11(θ)^{−1}C_12(θ) .  (11)

Bayesian prediction proceeds by sampling from the posterior predictive distribution p(y_0 | y) = ∫ p(y_0 | y, β, θ) p(β, θ | y) dβ dθ. For each posterior sample of {β, θ}, we draw a corresponding y_0 ∼ N(µ_p, Σ_p). This produces samples from the posterior predictive distribution.
Observe that the posterior predictive computations involve only the retained MCMC samples after convergence. Furthermore, most of the ingredients needed to compute µ_p and Σ_p have already been obtained while updating the model parameters. For any posterior sample {β^(k), θ^(k)}, we draw y_0^(k) ∼ N(µ_p^(k), Σ_p^(k)). Low-rank models, where r << n, are again cheaper here. The operations are dominated by the computation of C_12(θ)⊤C_11(θ)^{−1}C_12(θ), which can be evaluated as U⊤U − V⊤V, where U = D(θ)^{−1/2}C_12(θ), V = HU and H is as in (7). This avoids direct evaluation of C_11(θ)^{−1} and avoids redundant matrix operations.
Updating the y_0^(k)'s requires a Cholesky factorization of Σ_p, which is t × t and can be expensive if t is large. In most practical settings, it is sufficient to take t = 1 and perform independent individual predictions. However, if the joint predictive distribution is sought, say when full inference is desired for a function of y_0, then the predictive step is significantly cheaper if we use the posterior samples of α as well. Posterior predictive sampling now amounts to drawing y_0^(k) from a normal distribution whose mean is X_0β^(k) plus the spatial term evaluated at α^(k) and whose covariance is the prediction-location analogue of D(θ^(k)); this is cheap because D(θ) is usually diagonal. Low rank models are especially useful here, as posterior sampling for α is much cheaper with r << n.
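The conditional mean and covariance in (11) can be computed without ever forming C_11(θ)^{−1}, in the same Cholesky/triangular-solve style as the rest of the sampler. The sketch below is illustrative NumPy; `exp_cov` is an assumed exponential covariance used only for this example, and the residual vector stands in for y − Xβ.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Assumed exponential covariance, for illustration only.
def exp_cov(s, t, sigma2=2.0, phi=6.0):
    d = np.linalg.norm(s[:, None, :] - t[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(5)
n, t_new = 30, 4
coords = rng.uniform(size=(n, 2))       # observed locations
coords0 = rng.uniform(size=(t_new, 2))  # prediction locations

C11 = exp_cov(coords, coords) + 0.1 * np.eye(n)  # nugget on the diagonal
C12 = exp_cov(coords, coords0)                   # n x t cross-covariance
C22 = exp_cov(coords0, coords0)

L = np.linalg.cholesky(C11)
W = solve_triangular(L, C12, lower=True)         # solves L W = C12
resid = rng.normal(size=n)                       # stands in for y - X beta
v = solve_triangular(L, resid, lower=True)
mu_p = W.T @ v                                   # C12' C11^{-1} (y - X beta)
Sigma_p = C22 - W.T @ W                          # C22 - C12' C11^{-1} C12

# Agreement with the direct (inverse-based) formulas in (11).
assert np.allclose(mu_p, C12.T @ np.linalg.solve(C11, resid))
assert np.allclose(Sigma_p, C22 - C12.T @ np.linalg.solve(C11, C12))
```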

Computing environment
The MCMC algorithms described in the preceding sections are implemented in spBayes functions. These functions are written in C++ and leverage R's Foreign Language Interface to call Fortran BLAS (Basic Linear Algebra Subprograms, see Blackford, Demmel, Dongarra, Duff, Hammarling, Henry, Heroux, Kaufman, Lumsdaine, Petitet, Pozo, Remington, and Whaley 2001) and LAPACK (Linear Algebra Package, see Anderson, Bai, Bischof, Blackford, Demmel, Dongarra, Du Croz, Greenbaum, Hammarling, McKenney, and Sorensen 1999) libraries for efficient matrix computations. Table 1 offers a list of key BLAS and LAPACK functions used to implement the MCMC samplers. Referring to Table 1 and following from Section 2.1, chol corresponds to dpotrf and trsolve can be either dtrsv or dtrsm, depending on the form of the equation's right-hand side. As noted previously, we use dense matrix-matrix multiplication, i.e., calls to dgemm, sparingly due to its computational overhead. Often careful formulation of the problem can result in fewer calls to dgemm and other expensive BLAS level 3 and LAPACK functions.

Models offered by spBayes
All the models offered by spBayes emerge as special instances of (1). The matrix D(θ) is always taken to be diagonal or block-diagonal (for multivariate models). The spatial random effects α are assumed to arise from a partial realization of a spatial process, and the spatial covariance matrix K(θ) is constructed from the covariance function specifying that spatial process. To be precise, if {w(s) : s ∈ D ⊆ ℝ^d} is a Gaussian spatial process with positive definite covariance function C(s, t; θ) (see, e.g., Bochner 1955) and if {s_1, s_2, . . ., s_r} is a set of any r locations in D, then α = (w(s_1), w(s_2), . . ., w(s_r))⊤ and K(θ) is its r × r covariance matrix.

Full rank univariate Gaussian spatial regression
For Gaussian outcomes, geostatistical models customarily regress a spatially referenced dependent variable, say y(s), on a p × 1 vector of spatially referenced predictors x(s) (with an intercept) as

y(s) = x(s)⊤β + w(s) + ε(s) ,  (12)

where s ∈ D ⊆ ℝ² is a location. The residual comprises a spatial process, w(s), and an independent white-noise process, ε(s), that captures measurement error or micro-scale variation.
With any collection of n locations, say S = {s_1, . . ., s_n}, we assume the independent and identically distributed ε(s_i)'s follow a normal distribution N(0, τ²), where τ² is called the nugget. The w(s_i)'s provide local adjustment (with structured dependence) to the mean and capture the effect of unmeasured or unobserved regressors with spatial pattern.
Customarily, one assumes stationarity, which means that C(s, t) = C(s − t) is a function of the separation between sites only. Isotropy goes further and specifies C(s, t) = C(‖s − t‖), where ‖s − t‖ is the Euclidean distance between the sites s and t. We further specify C(s, t) = σ²ρ(s, t; φ) in terms of spatial process parameters, where ρ(·; φ) is a correlation function while φ includes parameters quantifying the rate of correlation decay and the smoothness of the surface w(s). Var(w(s)) = σ² represents a spatial variance component. Apart from the exponential, ρ(s, t; φ) = exp(−φ‖s − t‖), and the powered exponential family, ρ(s, t; φ) = exp(−φ‖s − t‖^α), spBayes also offers users the Matérn correlation function

ρ(s, t; φ) = (1 / (2^{ν−1}Γ(ν))) (‖s − t‖φ)^ν K_ν(‖s − t‖φ) .  (13)

Here φ = {φ, ν}, with φ controlling the decay in spatial correlation and ν controlling process smoothness. Specifically, if ν lies between positive integers m and m + 1, then the spatial process w(s) is mean-square differentiable m times, but not m + 1 times. Also, Γ is the usual Gamma function while K_ν is a modified Bessel function of the second kind with order ν.
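The Matérn family above can be written compactly in code. The following illustrative Python sketch (using SciPy's modified Bessel function; `matern` is a name used only here) also checks the well-known special case ν = 1/2, for which the Matérn reduces to the exponential correlation.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(d, phi, nu):
    """Matern correlation with decay phi and smoothness nu, following
    the parameterization rho(d) = (d*phi)^nu K_nu(d*phi) / (2^(nu-1) Gamma(nu))."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)        # rho(0) = 1 by convention
    pos = d > 0
    x = d[pos] * phi
    out[pos] = (x ** nu) * kv(nu, x) / (2.0 ** (nu - 1.0) * gamma(nu))
    return out

d = np.linspace(0.01, 1.0, 5)
# With nu = 1/2 the Matern collapses to the exponential correlation.
assert np.allclose(matern(d, phi=6.0, nu=0.5), np.exp(-6.0 * d))
```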
The hierarchical model built from (12) emerges as a special case of (1), where y is n × 1 with entries y(s_i), X is n × p with the x(s_i)⊤ as its rows, α is n × 1 with entries w(s_i), Z(θ) = I_n, K(θ) is n × n with entries C(s_i, s_j; θ) and D(θ) = τ²I_n. We denote by θ the set of process parameters in K(θ) and D(θ). Therefore, with the Matérn covariance function in (13), we define θ = {σ², φ, ν, τ²}.

Example
The marginalized specification of (12) is implemented in the spLM function. The primary output of this function is posterior samples of θ. As detailed in the preceding sections, sampling is conducted using a Metropolis algorithm. Hence, users must specify Metropolis proposal variances, i.e., tuning values, and monitor acceptance rates for these parameters.
Alternately, an adaptive MCMC Metropolis-within-Gibbs algorithm, proposed by Roberts and Rosenthal (2009), is available for a more automated function call.
A key advantage of the first stage Gaussian model is that samples from the posterior distribution of β and w can be recovered in a posterior predictive fashion, given samples of θ. In practice we often choose to use only a subset of post burn-in θ samples to collect corresponding samples of β and w. This composition sampling, detailed in Section 2.2, is conducted by passing a spLM object to the spRecover function.
An analysis of a synthetic dataset serves to illustrate use of the spLM and spRecover functions.
The data are formed by drawing 200 observations from (12) within a unit square domain. The model mean includes an intercept and a covariate with associated coefficients β_0 = 1 and β_1 = 5, respectively. Model residuals are generated using an exponential spatial correlation function, with τ² = 1, σ² = 2 and φ = 6. This choice of φ corresponds to an effective spatial range of 0.5 distance units. For our purposes, the effective spatial range is the distance at which the correlation equals 0.05. Figure 1 provides a surface plot of the observed spatial random effects along with the location of the 200 observations.
All spLM function arguments, and those of other functions highlighted in this paper, are defined in the package manual available on CRAN. Here we illustrate only some of the possible argument specifications. In addition to a symbolic model statement, the spLM function requires the user to specify: i) the number of MCMC samples to collect; ii) prior distributions, with associated hyperparameters, for each parameter; iii) starting values for each parameter; and iv) tuning values for each parameter, unless the adaptive MCMC option is chosen via the amcmc argument.
For this analysis, we assume an inverse-Gamma (IG) distribution for the variance parameters, τ² and σ². These distributions are assigned shape and scale hyperparameters equal to 2 and 1, respectively. With a shape of 2, the mean of the IG equals the scale and the variance is infinite. In practice, the choice of the scale value can be guided by exploratory data analysis using a variogram or similar tools that provide estimates of the spatial and non-spatial variances. The spatial decay parameter φ is assigned a uniform (U) prior with support that covers the extent of the domain. Here, we assume the effective spatial range lies between 0.1 and 1 distance units, i.e., working from our definition of the effective spatial range this corresponds to the prior U(−log(0.05)/1, −log(0.05)/0.1). In the code below, we define these priors along with the other necessary arguments that are passed to spLM. The resulting posterior samples of θ are summarized using the coda package's summary function, and each parameter's posterior median and 95% credible interval (CI) are printed.
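The endpoints of that uniform prior follow from simple arithmetic on the effective range. A quick check (illustrative Python, not part of the R workflow) makes the mapping from effective range to φ explicit:

```python
import numpy as np

# With rho(d) = exp(-phi * d), the effective range d0 is the distance at
# which correlation drops to 0.05: exp(-phi * d0) = 0.05, so
# phi = -log(0.05) / d0. Ranges of 1 and 0.1 units give the U prior bounds.
lower = -np.log(0.05) / 1.0    # effective range of 1 distance unit
upper = -np.log(0.05) / 0.1    # effective range of 0.1 distance units
print(round(lower, 2), round(upper, 2))  # → 3.0 29.96
```

So the stated prior is approximately U(3, 30) on φ.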

Number of covariates 2 (including intercept if specified).
Using the exponential spatial correlation model.
The previous implementation updated β from its full conditional distribution in each MCMC iteration and sampled θ using a Metropolis algorithm that did not take advantage of triangular solvers and other efficient computational approaches detailed in the preceding sections.
For comparison, the current version of spLM generates the same number of samples in 0.069 minutes.

Low-rank predictive process models
spBayes offers low-rank models that allow the user to choose and fix r << n within a hierarchical linear mixed model framework such as (1). Given the same modeling scenario as in Section 4.1, the user chooses r knot locations, say S* = {s*_1, s*_2, . . ., s*_r}, and defines the process

w̃(s) = c(s; θ)⊤C*(θ)^{−1}w* ,

where w* = (w(s*_1), . . ., w(s*_r))⊤, C*(θ) is the r × r covariance matrix of w*, and c(s; θ) is the r × 1 vector of covariances between w(s) and the w(s*_j)'s. Banerjee, Gelfand, Finley, and Sang (2008) call w̃(s) the predictive process. Replacing w(s) with w̃(s) in (12) yields the predictive process counterpart of the univariate Gaussian spatial regression model.
The predictive process produces a low-rank model and can be cast into (1). For example, if we take α to be the r × 1 random vector with the w(s*_i) as its entries, then the predictive process counterpart of (12) is obtained from (1) with D(θ) = τ²I_n, K(θ) = C*(θ) and Z(θ) = C(θ)C*(θ)^{−1}, where C(θ) is the n × r matrix whose entries are the covariances between the w(s_i)'s and the w(s*_j)'s and C*(θ) is the r × r covariance matrix of the w(s*_i)'s. When employing the computational strategy for generic low-rank models described in Section 2.3, an alternative, but equivalent, parametrization is obtained by letting K(θ) = C*(θ)^{−1} and Z(θ) = C(θ). This has the added benefit of avoiding the computation of C*(θ)^{−1}, which, though not expensive for low-rank models, can become numerically unstable depending upon the choice of the covariance function. Now α ∼ N(0, C*(θ)^{−1}) is no longer a vector of process realizations over the knots, but it is still an r × 1 random vector with a legitimate probability law. If the spatial effects over the knots are desired, they can easily be obtained from the posterior samples of α and θ as C*(θ)α.
We also offer an improvement over the predictive process, which attempts to capture the residual from the low-rank approximation by adjusting for the residual variance (see, e.g., Finley, Sang, Banerjee, and Gelfand 2009). The difference between the spatial covariance matrices for the full rank model (12) and the low-rank model is C_w(θ) − Z(θ)K(θ)Z(θ)⊤, where C_w(θ) is the n × n covariance matrix of the spatial random effects for (12).
The modified predictive process model approximates this "residual" covariance matrix by absorbing its diagonal elements into D(θ). Therefore, D(θ) = diag{C_w(θ) − Z(θ)K(θ)Z(θ)⊤} + τ²I_n, where diag(A) denotes the diagonal matrix formed with the diagonal entries of A. The remaining specifications for Z(θ), K(θ) and α in (1) remain the same as for the predictive process.
We often refer to the modified predictive process as w̃_ε(s) = w̃(s) + ε̃(s), where w̃(s) is the predictive process and ε̃(s) is an independent process with zero mean and variance given by var{w(s)} − var{w̃(s)}. In terms of the covariance function of w(s), the variance of ε̃(s) is C(s, s; θ) − c(s; θ)⊤C*(θ)^{−1}c(s; θ), where c(s; θ) is the r × 1 vector with the covariances between w(s) and the w(s*_j)'s as its entries. Also, w*, w̃ and w̃_ε denote the collections of the w(s*_i)'s over the r knots, the w̃(s_i)'s over the n locations and the w̃_ε(s_i)'s over the n locations, respectively.
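The predictive process ingredients, and the modified process's diagonal variance adjustment, can be sketched numerically. The following is an illustrative NumPy construction (an assumed exponential covariance `exp_cov`, random knots, and the K(θ) = C*(θ), Z(θ) = C(θ)C*(θ)^{−1} parameterization from the text); it also checks that the absorbed residual variances are non-negative, as the theory implies.

```python
import numpy as np

# Assumed exponential covariance, for illustration only.
def exp_cov(s, t, sigma2=2.0, phi=6.0):
    d = np.linalg.norm(s[:, None, :] - t[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(9)
n, r, tau2 = 100, 16, 1.0
coords = rng.uniform(size=(n, 2))
knots = rng.uniform(size=(r, 2))            # S* = {s*_1, ..., s*_r}

C_star = exp_cov(knots, knots)              # r x r, C*(theta)
C_cross = exp_cov(coords, knots)            # n x r, C(theta)
Z = C_cross @ np.linalg.inv(C_star)         # Z(theta) = C(theta) C*(theta)^{-1}

Cw = exp_cov(coords, coords)                # full-rank covariance of w
low_rank = Z @ C_star @ Z.T                 # Z K Z' with K = C*(theta)
resid_var = np.diag(Cw - low_rank)          # diagonal of the "residual"
D = np.diag(resid_var) + tau2 * np.eye(n)   # modified-process D(theta)

# The low-rank approximation underestimates each site's variance, so the
# residual variances absorbed into D(theta) must be non-negative.
assert np.all(resid_var >= -1e-8)
```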
A key issue in low-rank models is the choice of knots. Given a computationally feasible r, one could fix the knot locations using a grid over the extent of the domain, a space-covering design (e.g., Royle and Nychka 1998), or a more sophisticated approach aimed at minimizing a predictive variance criterion (see, e.g., Finley et al. 2009; Guhaniyogi, Finley, Banerjee, and Gelfand 2011). In practice, if the observed locations are evenly distributed across the domain, we have found relatively small differences in inference based on knot locations chosen using a grid, space-covering design, or other criterion. Rather, it is the number of knot locations that has the greater impact on parameter estimates and subsequent prediction. Therefore, we often investigate sensitivity of inference to different knot intensities, within a computationally feasible range.

Example
Moving from (12) to its predictive process counterpart is as simple as passing an r × 2 matrix of knot locations, via the knots argument, to the spLM function. The choice between the non-modified and modified predictive process models, i.e., w̃(s) and w̃_ε(s), is specified using the modified.pp logical argument. Passing a spLM object, specified for a predictive process model, to spRecover will yield posterior samples from w̃ or w̃_ε and w*.
We construct a second synthetic dataset using the same model and parameter values from Section 4.1.1, but now generate 2,000 observations. Parameters are then estimated using the following candidate models: i) non-modified predictive process with a 25 knot grid; ii) modified predictive process with a 25 knot grid; iii) non-modified predictive process with a 100 knot grid; and iv) modified predictive process with a 100 knot grid.
The spLM call for the 25 knot non-modified predictive process model is given below. The starting, priors, and tuning arguments are taken from Section 4.1.1. As noted above, the knots argument invokes the predictive process model. The value of this argument, c(5, 5, 0), specifies a 5 by 5 knot grid which should be placed over the extent of the observed locations. The third value in this vector controls the extent of this grid, e.g., one may want the knot grid to extend beyond the convex hull of the observed locations. The placement of these knots is illustrated in Figure 2(b). Users can also pass in their own knot locations via the knots argument.

Number of covariates 2 (including intercept if specified).
Using the exponential spatial correlation model.
Using non-modified predictive process with 25 knots.
For comparison with Table 2, the full rank model required 5.18 minutes to generate the 5,000 posterior samples. Also, parameter estimates from the full rank model were comparable to those of model iv. These attractive qualities of the predictive process models do not extend to all settings. For example, if the range of spatial dependence is short relative to the spacing of the knots, then covariance parameter estimation will suffer. We are obviously forgoing some information about the underlying spatial process when using an array of knots that is coarse compared to the number of observations. This is most easily seen by comparing estimated spatial random effects surfaces to the true surface used to generate the data, as shown in Figure 2. This smoothing of the random effects surface can translate into diminished predictive ability and, in some cases, poorer model parameter inference, compared to a full rank model. Following from Section 2.4, given coordinates and predictors for new locations, and a spLM object, the spPredict function returns posterior predictive samples from y_0. The spPredict function provides a generic interface for prediction using most model functions in spBayes.
The code below illustrates prediction using model iv for 1,000 holdout locations. Here, X.ho is the 1,000 × 2 (i.e., t × p) predictor matrix associated with the 1,000 holdout coordinates stored in coords.ho.
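A minimal sketch of that prediction workflow follows. The object name m.iv and the MCMC sub-sampling arguments (start, thin) are illustrative assumptions, not taken from the text; only X.ho, coords.ho, and the spPredict interface come from the description above.

```r
## Illustrative sketch only: `m.iv` stands for the fitted spLM
## predictive process model iv; `X.ho` and `coords.ho` hold the
## 1,000 holdout predictors and coordinates described in the text.
m.iv.pred <- spPredict(m.iv, pred.covars = X.ho,
                       pred.coords = coords.ho,
                       start = 2501, thin = 2)

## Each column of p.y.predictive.samples is one posterior predictive
## draw of the holdout outcomes; summarize rows with, e.g., the median.
y.ho.median <- apply(m.iv.pred$p.y.predictive.samples, 1, median)
```

Discarding the first half of the chain via start and thinning the remainder is one common choice; any burn-in and thinning consistent with the sampler diagnostics would serve equally well.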

Multivariate Gaussian spatial regression models
Multivariate spatial regression models consider m point-referenced outcomes that are regressed, at each location, on a known set of predictors: y_j(s) = x_j(s)'β_j + w_j(s) + ε_j(s), for j = 1, 2, ..., m, where x_j(s) is a p_j × 1 vector of predictors associated with outcome j, β_j is the p_j × 1 slope vector, and w_j(s) and ε_j(s) are the spatial and random error processes associated with outcome y_j(s). Customarily, we assume the unstructured residuals ε(s) = (ε_1(s), ε_2(s), ..., ε_m(s))' follow a zero-centered multivariate normal distribution with an m × m dispersion matrix Ψ.
Spatial variation is modeled using an m × 1 Gaussian process w(s) = (w_1(s), ..., w_m(s))', specified by a zero mean and a cross-covariance matrix C_w(s, t) whose entries are the covariances between w_i(s) and w_j(t). spBayes uses the linear model of coregionalization (LMC) to specify the cross-covariance. This assumes that C_w(s, t) = A M(s, t) A', where A is m × m lower-triangular and M(s, t) is m × m diagonal with each diagonal entry a spatial correlation function endowed with its own set of process parameters.
Suppose we have observed the m outcomes at each of b locations. Let y be n × 1, where n = mb, obtained by stacking up the y(s_i)'s over the b locations. Let X be the n × p matrix of predictors associated with y, where p = Σ_{j=1}^m p_j, and β is p × 1 with the β_j's stacked correspondingly. Then, the hierarchical multivariate spatial regression models arise from (1) with the following specifications: D(θ) = I_b ⊗ Ψ, α is n × 1 formed by stacking the w_i's, and K(θ) is n × n partitioned into m × m blocks given by A M(s_i, s_j) A'. The positive-definiteness of K(θ) is ensured by the linear model of coregionalization (Gelfand, Schmidt, Banerjee, and Sirmans 2004). spBayes also offers low rank multivariate models involving the predictive process and the modified predictive process that can be estimated using strategies analogous to Section 2.3. Both the full rank multivariate Gaussian model and its predictive process counterpart are implemented in the spMvLM function. Notation and additional background for fitting these models are given by Banerjee et al. (2008) and Finley et al. (2009), as well as example code in the spMvLM documentation examples.
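A hedged sketch of a bivariate spMvLM fit is given below. The data objects (y.1, y.2, x, coords), the hyperparameter values, and the prior support choices are all illustrative assumptions; only the function name, the LMC parameterization through the lower-triangular A, and the diagonal Ψ come from the text.

```r
## Illustrative sketch only: a bivariate (m = 2) spMvLM call; `y.1`,
## `y.2`, `x`, and `coords` are assumed to exist in the workspace.
q <- 2                                    # number of outcomes m
A.starting <- diag(1, q)[lower.tri(diag(1, q), TRUE)]  # lower-tri of A

m.mv <- spMvLM(list(y.1 ~ x, y.2 ~ x), coords = coords,
               starting = list("phi" = rep(3/0.5, q), "A" = A.starting,
                               "Psi" = rep(1, q)),
               tuning = list("phi" = rep(0.1, q),
                             "A" = rep(0.01, length(A.starting)),
                             "Psi" = rep(0.01, q)),
               priors = list("phi.Unif" = list(rep(3/1, q), rep(3/0.1, q)),
                             "K.IW" = list(q + 1, diag(0.1, q)),  # prior on AA'
                             "Psi.ig" = list(rep(2, q), rep(0.1, q))),
               cov.model = "exponential", n.samples = 5000)
```

Here the inverse-Wishart prior is placed on K = AA' and independent inverse-Gamma priors on the diagonal elements of Ψ, which is one common specification; consult the spMvLM documentation for the full set of prior options.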

Non-Gaussian models
Two typical non-Gaussian first stage settings are implemented in spBayes: i) binary response at locations modeled using logit or probit regression, and ii) count data at locations modeled using Poisson regression. Diggle, Moyeed, and Tawn (1998) unify the use of generalized linear models in spatial data contexts. See also Lin, Wahba, Xiang, Gao, and Klein (2000), Kammann and Wand (2003), and Banerjee et al. (2004). Here we replace the Gaussian likelihood in (1) with the assumption that E[y(s)] is linear on a transformed scale, i.e., η(s) ≡ g(E[y(s)]) = x(s)'β + w(s), where g(·) is a suitable link function. We refer to these as spatial generalized linear models (GLMs).
With the Gaussian first stage, we can marginalize over the spatial effects and implement our MCMC over a reduced parameter space. With a binary or Poisson first stage, such marginalization is precluded and we have to update the spatial effects in running our Gibbs sampler. We offer both the traditional random-walk Metropolis as well as the adaptive random-walk Metropolis (Roberts and Rosenthal 2009) to update the spatial effects. spBayes also provides low-rank predictive process versions for spatial GLMs. The analogue of (1) takes f(·) to be a Bernoulli or Poisson density with η(s) the mean of y(s) on a transformed scale. This model and its predictive process counterpart are implemented in the spGLM function. These models are extended to accommodate multivariate settings, outlined in Section 5, using the spMvGLM function.
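A minimal sketch of a spatial logistic regression fit via spGLM follows. The data objects (y, x, coords, weights), starting values, tuning variances, prior supports, and batch settings are illustrative assumptions; the adaptive Metropolis interface via the amcmc argument reflects the Roberts and Rosenthal (2009) updates mentioned above.

```r
## Illustrative sketch only: spatial logistic regression via spGLM;
## `y` (binary), `x`, `coords`, and binomial `weights` are assumed to
## exist in the workspace. Note the spatial effects `w` require
## starting values and tuning because they are updated by Metropolis.
m.glm <- spGLM(y ~ x, family = "binomial", coords = coords,
               weights = weights,
               starting = list("beta" = rep(0, 2), "phi" = 3/0.5,
                               "sigma.sq" = 1, "w" = 0),
               tuning = list("beta" = rep(0.1, 2), "phi" = 0.1,
                             "sigma.sq" = 0.1, "w" = 0.1),
               priors = list("beta.Normal" = list(rep(0, 2), rep(10, 2)),
                             "phi.Unif" = c(3/1, 3/0.1),
                             "sigma.sq.IG" = c(2, 1)),
               amcmc = list("n.batch" = 200, "batch.length" = 25,
                            "accept.rate" = 0.43),
               cov.model = "exponential")
```

Supplying amcmc switches the sampler from the traditional random-walk Metropolis to the adaptive variant, with batch-wise tuning toward the stated target acceptance rate.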

Dynamic spatio-temporal models
There are many different flavors of spatio-temporal data and an extensive statistical literature that addresses the most common settings. The approach adopted here applies to the setting where space is viewed as continuous, but time is assumed to be discrete. Put another way, we view the data as a time series of spatial process realizations and work in the setting of dynamic models. Building upon previous work in the setting of dynamic models by West and Harrison (1997), several authors, including Stroud, Müller, and Sansó (2001) and Gelfand, Banerjee, and Gamerman (2005), proposed dynamic frameworks to model residual spatial and temporal dependence. These proposed frameworks are flexible and easily extended to accommodate nonstationary and multivariate outcomes.
Dynamic linear models, or state-space models, have gained tremendous popularity in recent years in fields as disparate as engineering, economics, genetics, and ecology. They offer a versatile framework for fitting several time-varying models (West and Harrison 1997). Gelfand et al. (2005) adapted the dynamic modeling framework to spatio-temporal models with spatially varying coefficients. Alternative adaptations of dynamic linear models to space-time data can be found in Stroud et al. (2001).

Model specification
spBayes offers a relatively simple version of the dynamic models in Gelfand et al. (2005). Suppose y_t(s) denotes the observation at location s and time t. We model y_t(s) through a measurement equation that provides a regression specification with a space-time varying intercept and serially and spatially uncorrelated zero-centered Gaussian disturbances as measurement error ε_t(s). Next a transition equation introduces a p × 1 coefficient vector, say β_t, which is a purely temporal component (i.e., time-varying regression parameters), and a spatio-temporal component u_t(s). Both of these are generated through transition equations, capturing their Markovian dependence in time. While the transition equation of the purely temporal component is akin to usual state-space modeling, the spatio-temporal component is generated using Gaussian spatial processes. The overall model is written as

y_t(s) = x_t(s)'β_t + u_t(s) + ε_t(s),   ε_t(s) ~ ind. N(0, τ²_t);
β_t = β_{t-1} + η_t,   η_t ~ i.i.d. N(0, Σ_η);
u_t(s) = u_{t-1}(s) + w_t(s),   w_t(s) ~ ind. GP(0, C_t(·, θ_t)),

where the abbreviations ind. and i.i.d. denote independent and independent and identically distributed, respectively. Here x_t(s) is a p × 1 vector of predictors and β_t is a p × 1 vector of coefficients. In addition to an intercept, x_t(s) can include location specific variables useful for explaining the variability in y_t(s). The GP(0, C_t(·, θ_t)) denotes a spatial Gaussian process with covariance function C_t(·; θ_t). We customarily specify C_t(s_1, s_2; θ_t) = σ²_t ρ(s_1, s_2; φ_t), where θ_t = {σ²_t, φ_t}, ρ(·; φ) is a correlation function with φ controlling the correlation decay, and σ²_t represents the spatial variance component. We further assume β_0 ~ N(m_0, Σ_0) and u_0(s) ≡ 0, which completes the prior specifications leading to a well-identified Bayesian hierarchical model with reasonable dependence structures. In practice, estimation of model parameters is usually very robust to these hyper-prior specifications. Also note that (17) reduces to a simple spatial regression model for t = 1.
We consider settings where the inferential interest lies in spatial prediction or interpolation over a region for a set of discrete time points. We also assume that the same locations are monitored at each time point, resulting in a space-time matrix whose rows index the locations and columns index the time points, i.e., the (i, j)-th element is y_j(s_i). Our algorithm will accommodate the situation where some cells of the space-time data matrix have missing observations, as is common in monitoring environmental variables.
Conducting full Bayesian inference for (17) is computationally onerous, so spBayes also offers a modified predictive process counterpart of (17). This is achieved by replacing u_t(s) in (17) with ũ_t(s) = Σ_{k=1}^t [w̃_k(s) + ε̃_k(s)], where w̃_k(s) is the predictive process as defined in (14) and the "adjustment" ε̃_t(s) compensates for the oversmoothing by the conditional expectation component and the consequent underestimation of spatial variability (see Finley, Banerjee, and Gelfand 2012 for details). The spDynLM function accepts a list of symbolic model formulas, one defining the regression within each time step. This list can be easily assembled using the lapply function as shown in the code below. Here too, we define the station coordinates as well as starting, tuning, and prior distributions for the model parameters. Exploratory data analysis using time step specific variograms can be helpful for defining starting values and prior support for parameters in θ_t and τ²_t. To avoid cluttering the code, we specify the same prior for the φ_t's, σ²_t's, and τ²_t's. As in the other spBayes model functions, one can choose among several popular spatial correlation functions including the exponential, spherical, Gaussian, and Matérn. The exponential correlation function is specified in the spDynLM call below. Unlike the other model functions described in the preceding sections, the spDynLM function will accept NA y_t(s) values. The sampler will provide posterior predictive samples for these missing values. If the get.fitted argument is TRUE then these posterior predictive samples are saved along with posterior fitted values for locations where the outcomes are observed.
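A hedged sketch of that formula assembly and model fit is given below. The object names (y.names, dat, elev) and the reuse of common starting, tuning, and priors lists are illustrative assumptions standing in for the station data and specifications described in the text.

```r
## Illustrative sketch only: `y.names` holds the column names of the
## space-time outcome matrix in the data frame `dat`, and `elev` is an
## assumed station-level predictor; `coords`, `starting`, `tuning`,
## and `priors` are assumed to be defined as described in the text.

## One symbolic formula per time step, assembled with lapply.
mods <- lapply(paste(y.names, "elev", sep = " ~ "), as.formula)

m.dyn <- spDynLM(mods, data = dat, coords = coords,
                 starting = starting, tuning = tuning, priors = priors,
                 cov.model = "exponential",   # exponential correlation
                 get.fitted = TRUE,           # keep fitted/predicted values
                 n.samples = 5000)
```

Because get.fitted = TRUE, posterior predictive samples for any NA outcome values are returned alongside fitted values for the observed outcomes.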

Model choice
The spDiag function provides several approaches to assessing model performance and subsequent comparison for spLM, spMvLM, spGLM, and spMvGLM objects. These include the popular Deviance Information Criterion (Spiegelhalter, Best, Carlin, and van der Linde 2002), as well as a measure of posterior predictive loss detailed in Gelfand and Ghosh (1998) and a scoring rule defined in Gneiting and Raftery (2007).
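As a minimal sketch of that comparison step (the object name m.iv, the burn-in choice, and the returned element names are assumptions for illustration, not from the text):

```r
## Illustrative sketch only: `m.iv` stands for a fitted spLM object.
## For spLM, the beta and w samples are first recovered via spRecover
## before the criteria can be computed.
m.iv <- spRecover(m.iv, start = 2501)
m.iv.diag <- spDiag(m.iv)

m.iv.diag$DIC  # Deviance Information Criterion summary
m.iv.diag$GP   # Gelfand and Ghosh (1998) posterior predictive loss
m.iv.diag$GRS  # Gneiting and Raftery (2007) scoring rule
```

Lower DIC and posterior predictive loss, and a higher score, would favor a candidate model in this comparison.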
Summary and future direction
spBayes version 0.3-7 (CRAN 6/1/13), and subsequent versions, offers a complete reformulation and rewrite of core functions for efficient estimation of univariate and multivariate models for point-referenced data using MCMC. The substantial increase in computational efficiency and flexibility in model specification, compared with earlier spBayes package versions, is the result of careful MCMC sampler formulation that focused on reducing the parameter space and avoiding expensive matrix operations. In addition, all core functions provide predictive process models able to accommodate the large datasets that are increasingly encountered in many fields.
We are currently developing an efficient modeling framework and sampling algorithm to accommodate multivariate spatially misaligned data, i.e., settings where not all of the outcomes are observed at all locations, that will be added to the spMvLM and spMvGLM functions. Prediction of these missing outcomes should borrow strength from the covariance among outcomes both within and across locations. In addition, we hope to add functions for non-stationary multivariate models such as those described in Gelfand et al. (2004) and more recent predictive process versions we developed in Guhaniyogi, Finley, Banerjee, and Kobe (2013). We will also continue developing spDynLM and helper functions. Ultimately, we would like to provide more flexible specifications of spatio-temporal dynamic models and allow them to accommodate non-Gaussian and multivariate outcomes.

Finley et al. (2007) outline the first version of spBayes as an R package for estimating Bayesian spatial regression models for point-referenced outcomes arising from Gaussian, binomial, or Poisson distributions. For the Gaussian case, the recent version of spBayes offers several Bayesian spatial models emerging from the hierarchical linear mixed model framework

Figure 1: Interpolated surface of the observed (a) and estimated (b) spatial random effects.

Figure 2: Interpolated surfaces of the (a) observed spatial random effects and (b), (c), (d), (e) the estimated spatial random effects from models i, ii, iii, and iv, respectively. Filled circle symbols in (b), (c), (d), (e) show the locations of the predictive process knots. (f) plots holdout observed versus candidate model iv predicted medians and 95% CI intervals with a 1:1 line.
Model fit with 28 observations in 62 time steps.
Number of missing observations 117.
Number of covariates 4 (including intercept if specified).
Using the exponential spatial correlation model.
Number of MCMC samples 5000.
Priors and hyperpriors: beta normal:

Figure 4: Posterior distribution medians and 95% credible intervals for the model intercept and predictors.

Figure 6: Posterior predicted distribution medians and 95% credible intervals, solid and dashed lines respectively, for three stations. Open circle symbols indicate those observations used for model parameter estimation and filled circle symbols indicate those observations withheld for validation.

Table 1: Common BLAS and LAPACK functions used in spBayes function calls.

dpotrf: LAPACK routine to compute the Cholesky factorization of a real symmetric positive definite matrix.
dtrsv: Level 2 BLAS routine to solve the system of equations Ax = b, where x and b are vectors and A is a triangular matrix.
dtrsm: Level 3 BLAS routine to solve the matrix equations AX = B, where X and B are matrices and A is a triangular matrix.
dgemv: Level 2 BLAS matrix-vector multiplication.
dgemm: Level 3 BLAS matrix-matrix multiplication.

A heavy reliance on BLAS and LAPACK functions for matrix operations allows us to leverage multi-processor/core machines via threaded implementations of BLAS and LAPACK, e.g., Intel's Math Kernel Library (MKL; http://software.intel.com/en-us/intel-mkl). With the exception of dtrsv, all functions in Table 1 are threaded in Intel's MKL. Use of MKL, or similar threaded libraries, can dramatically reduce sampler run-times. For example, the illustrative analyses offered in subsequent sections were conducted using R, and hence spBayes, compiled with MKL on an Intel Ivy Bridge i7 quad-core processor with hyperthreading. The use of these parallel matrix operations results in a near linear speedup in the MCMC sampler's run-time with the number of CPUs; at least 4 CPUs were in use in each function call. spBayes also depends on several R packages including: coda (Plummer, Best, Cowles, Vines, Sarkar, and Almond 2012) for casting the MCMC chain results as coda objects for easier posterior analysis; abind (Plate and Heiberger 2013) and magic (Hankin 2013) for forming multivariate matrices; and Formula (Zeileis 2013) for interpreting symbolic model formulas.

Sampled: 5000 of 5000, 100.00%
Report interval Metrop. acceptance rate: 32.24%
Overall Metrop. acceptance rate: 33.72%

Table 2: Parameter estimates and run-time (wall time) in minutes for candidate predictive process models. Parameter posterior summaries are 50 (2.5, 97.5) percentiles.