R package for Nearest Neighbor Gaussian Process models

This paper describes and illustrates functionality of the spNNGP R package. The package provides a suite of spatial regression models for Gaussian and non-Gaussian point-referenced outcomes that are spatially indexed. The package implements several Markov chain Monte Carlo (MCMC) and MCMC-free Nearest Neighbor Gaussian Process (NNGP) models for inference about large spatial data. Non-Gaussian outcomes are modeled using a NNGP Polya-Gamma latent variable. OpenMP parallelization options are provided to take advantage of multiprocessor systems. Package features are illustrated using simulated and real data sets.


Introduction
This paper introduces the spNNGP R (R Core Team 2018) package that provides a suite of univariate spatial regression models for Gaussian and non-Gaussian outcomes observed at point-referenced locations. There are, by now, many R packages that provide similar basic functionality. A recent read of the "Analysis of Spatial Data" CRAN Task View (Bivand 2019) yielded ∼46 packages listed for geostatistical analysis-and this is not an exhaustive accounting of packages available for such analyses. Our software design focus and unique contribution is to provide Bayesian models and associated diagnostic and prediction functions capable of handling data sets with a large number of locations via the Nearest Neighbor Gaussian Process (NNGP; Datta, Banerjee, Finley, and Gelfand 2016). Specifically, functions in spNNGP implement recent methodological and algorithmic developments presented in .
There have been many recent methodological developments within the large spatial data literature that aim to deliver massively scalable spatial processes. Sun, Li, and Genton (2011) and Banerjee (2017) provide background and discussion of current work in this area. A recent contribution by Heaton, Datta, Finley, Furrer, Guinness, Guhaniyogi, Gerber, Gramacy, Hammerling, Katzfuss et al. (2019) is particularly useful as it provides an overview of modeling approaches for large spatial data that are under active software development, and a comparison of these approaches based on the analysis of a common dataset in the form of a "friendly competition." In addition to NNGP models, the comparison presented by Heaton et al. (2019) considered covariance tapering via the spam package (Furrer and Sain 2010;Fur-rer 2016), gapfilling via gapfill (Gerber 2017), metakriging (Guhaniyogi and Banerjee 2018), spatial partitioning (Sang, Jun, and Huang 2011;Barbian and Assunção 2017), fixed rank kriging via FRK (Cressie and Johannesson 2008;Zammit-Mangion and Cressie 2017), multiresolution approximation (Katzfuss 2017), stochastic partial differential equations via INLA (Rue, Martino, Lindgren, Simpson, Riebler, Krainski, and Fuglstad 2017), lattice kriging via LatticeKrig (Nychka, Bandyopadhyay, Hammerling, Lindgren, and Sain 2015), local approximate Gaussian processes via laGP (Gramacy and Apley 2015;Gramacy 2016), and reduced rank predictive processes (Banerjee, Gelfand, Finley, and Sang 2008;Finley, Sang, Banerjee, and Gelfand 2009) via spBayes (Finley, Banerjee, and Gelfand 2015). The comparison was based on out-of-sampled predictive performance and, to a lesser extent, computing time for a moderately sized simulated and real dataset comprising 105,569 observations. Comparisons showed NNGP models yielded highly competitive predictive performance and computation time. More recently, Risser and Turek (2019) developed the BayesNSGP package for nonstationary Gaussian process modeling with options to use NNGPs for large data settings. In a frequentist setup, fast maximum likelihood-based parameter estimation and predictions using nearest neighbor approximations to the Gaussian Process likelihood are available in the GpGp (Guinness 2018a) and BRISC (Saha and Datta 2018b) packages on CRAN. The latter also offers inference on the spatial covariance parameters using a fast spatial bootstrap (Saha and Datta 2018a). While most of the software noted above, exploit sparsity in the spatial covariance or precision matrix, or pursue a low-rank approximation, the ExaGeoStat package (Abdulah, Ltaief, Sun, Genton, and Keyes 2018) tackles decomposition of the full dense spatial covariance matrix head-on using high performance linear algebra libraries associated with various leading edge parallel architectures.
Our contribution here is many fold. We propose a novel extension for analyzing spatially correlated non-Gaussian (binary) responses using a NNGP spatial generalized linear mixed model (GLMM), and show how using a Pólya-Gamma prior (Polson, Scott, and Windle 2013) for the regression coefficients leads to an efficient data-augmented Gibbs sampler. To our knowledge, this is the first sampler for Binomial spatial-GLMM that ensures closed form Gibbs updates for most parameters. We discuss model comparison and model adequacy and show how different implementations of the NNGP models have fundamentally different implementation and interpretation of these metrics. This is an important pragmatic aspect for practitioners choosing between different NNGP models. Finally, we provide detailed documentation and exposition of the user-friendly spNNGP R package that: 1) implements NNGP model fitting algorithms for Gaussian response spatial data presented in Datta et al. (2016); ; 2) provides NNGP models for the non-Gaussian spatial response via the Pólya-Gamma data-augmented sampler; 3) offers support functions for NNGP model fit diagnostics, summary, and prediction. We discuss code-optimization aspects which ensured that spNNGP can handle very large data sets by being judicious with memory use, taking advantage of properties of the NNGP dependence scheme to efficiently store and retrieve neighbor information, and applying parallel processing where advantageous. Core model fitting and prediction functions in spNNGP have achieved unprecedented scalability, delivering inference for data sets in the hundreds of millions of spatial locations.
The remainder of this article proceeds as follows. Section 2.1 provides a brief overview of Gaussian process models followed by specifics about the NNGP models in Section 2.2. In Section 2.3 we propose the extension for Binomial spatial data and outline the Pólya-Gamma data-augmented Gibbs sampler. Section 2.4 discusses appropriate model comparison and adequacy measures for the different NNGP models. Section 2.5 gives a brief description of the code underlying spNNGP and some software development specifics for large data sets. This is followed by analysis of simulated and real data sets in Section 3 meant to provide a practical tour of some of the package's features. Finally, Section 4 provides a brief summary with an eye toward future development.

Review of Gaussian Process models for spatial data
The standard geostatistical paradigm envisions each data unit as a triplet (s i , x(s i ), y(s i )) where s i denotes the geographical location where the data is recorded, x(s i ) is the p × 1 vector of covariates and y(s i ) is the response of interest, for i = 1, 2, . . . , n. If the covariates fail to account for all of the structured variation observed on the response, a spatial linear mixed effects model for analyzing the data is specified as: where β is the p × 1 vector of regression coefficients, (s i ) is the random noise, customarily modeled as independent and identically distributed (iid) observations from N (0, τ 2 ), and w(s i ) is the location-specific spatial random effect. Typically, the spatial random effects are assumed to be smooth across space, i.e., w(s i ) can be conceived of as realizations of a smooth latent surface {w(s) | s ∈ D} where D denotes the geographical domain of interest. Gaussian Processes (GP) (Rasmussen 2003) are widely used in modeling smooth functions or surfaces. A GP model for the spatial surface {w(s)}, denoted by w(s) ∼ GP (0, C(·, · | θ)), where C(·, · | θ) is a covariance function, implies that for any finite set of locations s 1 , . . . , s n , the vector of random effects w = (w(s 1 ), . . . , w(s n )) follows a zero-mean multivariate Gaussian distribution with covariance matrix C(θ) = C = (c ij ) where c ij = C(s i , s j | θ). The parametric covariance function C(·, · | θ) is often selected from the Matérn family of functions (Stein 2012), popular due to its versatility to model surfaces with varying degrees of smoothness.
The mixed effects model for the response and the GP model for the random effects can be combined into the hierarchical model: where y denotes the vector formed by stacking the y(s i )'s, and similarly X is the n × p covariate matrix. Equivalently, one can integrate out w from (2) and write y ∼ N (Xβ, C(θ) + τ 2 I). ( Parameter estimation in a frequentist paradigm maximizes the likelihood from (3) with respect to β, τ 2 and θ, while in a Bayesian framework, priors are assigned to these parameters, and one can use either the hierarchical model (2) or the marginal model (3) to obtain posterior inference via Markov chain Monte Carlo (MCMC).

Nearest Neighbor Gaussian Processes for large spatial data
Gaussian Processes encounter computational roadblocks when data is observed at a large number of locations. Both the joint likelihood from the hierarchical model (2) or the data likelihood from the marginalized model (3) involves a multivariate Gaussian distribution with a dense n×n covariance matrix (C(θ) and C(θ)+τ 2 I, respectively). This task involves O(n 2 ) storage, and O(n 3 ) computations (floating point operations or FLOPs), which is infeasible when n is large.
One of the scalable solutions is replacing the GP prior for the spatial random effects with a Nearest Neighbor Gaussian Process prior (Datta et al. 2016). The NNGP prior implies that the random effect vector w is now endowed with a multivariate Gaussian distribution The NNGP covariance function C is constructed from the original covariance function C and ensures that the matrix C(θ) −1 is sparse and, consequently, the likelihood (4) can be evaluated using only O(n) memory and storage. This makes NNGP a scalable replacement for the full GP, while delivering inference almost indistinguishable from the full GP. The NNGP can, in fact, be looked upon as a special case of a Gaussian Markov Random Field (GMRF; Rue and Held 2005) with a specific neighborhood structure defined through a directed acyclic graph. It is better motivated using likelihood approximation ideas in Vecchia (1988) and Stein, Chi, and Welty (2004) rather than GMRFs and has the advantage of providing sufficiently accurate approximations to Gaussian random fields without requiring mesh-based finite element approximations of SPDE representations of Gaussian random fields (as done in Lindgren, Rue, and Lindström (2011)).

Latent NNGP
The original implementation of the NNGP model proposed in Datta et al. (2016) used a fully Bayesian hierarchical specification for running an MCMC algorithm, where, all the parameters (w, β, θ and τ 2 ) are updated in a Gibbs sampler. Use of Gaussian priors for β and the variance components ensure that they produce conjugate full conditionals in the Gibbs sampler. The remaining covariance parameters are updated using a Metropolis random-walk step. The full conditional distribution for w from (5) is given by Unfortunately, this block update of w is not practical. While the full conditional precision matrix C(θ) −1 +I/τ 2 has the same sparsity as C(θ) −1 , unlike C(θ) −1 , its determinant cannot be calculated in O(n) FLOPs. Instead, Datta et al. (2016) recommended sequentially updating the full conditionals w i | · for i = 1, . . . , n and outlined an algorithm that accomplishes this entire sequence of updates in O(n) FLOPs. We refer to this NNGP model as the latent NNGP.

Response NNGP
The latent NNGP algorithm involves running an MCMC of dimension O(n) where each of the n parameters w i are sequentially updated. While this ensures linear scalability of the NNGP model with sample size, it can also imply very slow convergence of the highdimensional MCMC. Instead of using NNGP for the latent Gaussian process w(s),  directly considered the marginal Gaussian process for the response, i.e., {y(s)} ∼ GP (x(s) β, Σ(·, ·)) where the marginalized covariance function Σ is specified as Σ(s i , s j ) = C(s i , s j | θ) + τ 2 δ(s i , s j ), δ denoting the Kronecker delta. Since the covariance function of an NNGP can be derived from any parent GP,  replaced the covariance function Σ with its NNGP analogue Σ yielding the response model where Σ is the NNGP covariance matrix for derived from Σ = C(θ) + τ 2 I.

Conjugate NNGP model
The conjugate NNGP algorithm is an implementation of the response NNGP model which fixes certain spatial covariance parameters leading to exact (MCMC-free) posterior Bayesian inference. Recall that under the full-GP specification, the covariance function for y(s) is specified as Σ(s i , s j ) = C(s i , s j | θ) + τ 2 δ(s i , s j ). Usually, the covariance functions C(·, · | θ) can be expressed as σ 2 R(·, · | φ) where σ 2 is the marginal variance, and R is the correlation function parameterized by φ, i.e., θ = (σ 2 , φ). Rewriting τ 2 = ασ 2 , we have Σ(s i , s j ) = σ 2 (R(s i , s j | φ) + α δ(s i , s j )). The conjugate NNGP model fixes φ and α, and generates the NNGP covariance function approximation M (·, · | α, φ) of R(·, · | φ) + α δ(·, ·). This implies we have the following marginal model where M = M(φ, α) is a known covariance matrix once φ and α are fixed. The model in (7) is the standard Bayesian linear model with only unknowns β and σ 2 . Using a Normal-Inverse-Gamma prior for (β, σ 2 ) leads to conjugate Normal-Inverse- The fixed values of φ and α are either chosen based on a variogram or can be selected in a more formal fashion using cross-validation. While the cross-validation enforces multiple runs of the conjugate model for a grid of values of φ and α, these runs can proceed in parallel. Also, φ is usually one or two dimensional, hence, there are only 2 or 3 tuning parameters for the model and empirical results in  suggest that a crude-resolution grid often suffices to yield accurate predictive inference. Empirical comparisons in  and also in Heaton et al. (2019) suggest that the conjugate NNGP is orders of magnitude faster than the MCMC-based NNGP algorithms and also other competing big-spatial data methods while delivering highly competitive prediction performance.

Binomial response
All current implementations of the NNGP model assume that the response is Gaussian. We here propose an implementation of a Bayesian NNGP model for Binomial response. Note that the response and conjugate NNGP model explicitly rely on Gaussianity to derive the marginal distribution for the response y. Such closed form marginal distributions are not available for non-Gaussian responses. However, conceptually we can extend the latent NNGP model of (5) to non-Gaussian settings. Assuming y( is modeled as a NNGP, the joint likelihood for this model is given by: Since w is given an NNGP prior, this likelihood can still be evaluated using O(n) memory and FLOPs. However, unlike the latent NNGP model for Gaussian case, where the full conditional distributions w(s i ) | · used in the Gibbs updates were Gaussian distributions, for the non-Gaussian case these full conditionals w(s i ) | · do not belong to any standard family. Alternatively, one can resort to Metropolis-Hastings (MH) random walk updates for each w(s i ). Introducing n MH random walk updates in every iteration would exacerbate the slow convergence issues already plaguing the latent NNGP model.
To eschew the numerous Metropolis random-walk updates, we here propose a Gibbs sampler for the latent NNGP model for Binomial responses exploiting the Pólya-Gamma dataaugmented sampler of Polson et al. (2013). Like before, we write C(θ) = σ 2 R(φ) where R is the NNGP approximation of the GP correlation matrix. We then assume a N (µ, V) prior for β, IG(a, b) prior for σ 2 , and p(φ) prior for the other covariance parameters φ. Letting κ(s i ) = y(s i ) − n(s i )/2 we introduce the augmented data ω = (ω(s) 1 ), . . . , ω(s n )) . We can then write the conditional likelihood: We see that using the augmented data ω, this likelihood for the Binomial mixed linear model is similar to the usual likelihood from a spatial mixed linear model with NNGP using responses y(s i ) * = κ(s i )/ω(s i ) and heteroskedastic variances τ 2 (s i ) = 1/ω(s i ).
An immediate consequence is that the updates for β and w are analogous to the updates in the NNGP with y(s i ) replaced with y(s i ) * and the constant nugget τ 2 replaced by τ 2 (s i ).
Note that for y(s i ) * and τ 2 (s i ) are functions of the augmented data ω and will change in every iteration. Other than that, the updates remain exactly the same as in the latent model.
The only additional update we need to do for non-Gaussian responses is that of the ω(s i )'s. We update these using the full conditionals: where P G(b, z) denotes the Pólya-Gamma random variable with shape parameter b and tilting parameter z (see Polson et al. 2013, for more details about this distribution).
This completes a Gibbs sampler for spatial GLMM with Binomial responses where, like the Gaussian case, every parameter except the covariance parameters (φ) have closed form full conditionals. The memory and storage requirements also remain O(n) thereby ensuring that the Binomial NNGP model is a viable candidate for analysis of massive non-Gaussian spatial data. Next we consider generating replicate data at the observed data locations. This is often used for model checking and model adequacy evaluations. We note that while both the latent model (2) and the response model (3) lead to the same marginal distribution for y, the two models differ fundamentally on the definition for replicate data. For the hierarchical model, conditional on β, θ and w's, the y(s i )'s are independent with iid N (0, τ 2 ) distributed error terms. Hence, for each post-burn-in sample of the parameters, replicates at each data location can be generated as

Fitted values, replicates, and predictions
, τ 2 (l) )}. For the response model, however, we can think of y as a single multivariate observation from N (Xβ, Σ). Things are further complicated when switching to the NNGP covariance matrix Σ as, unlike Σ, we cannot express Σ as sum of a purely spatial covariance matrix and a diagonal matrix of error variances. Hence, replicates for the marginalized model are basically new multivariate draws from N (Xβ (l) , Σ(θ (l) )). Unlike the replicates from the hierarchical model which are expected to exhibit strong spatial alignment with the observed data owing to the correlation induced through the use of the common w's, the replicates for the response model are only correlated with the observed data through β and θ and hence are not expected to have similar spatial contours as the observed data. For the conjugate NNGP, which also relies on the marginalized model, the replicates are draws from N (Xβ Finally, we turn our attention to predictions at a new location s. For the latent model, predictions are also generated hierarchically by first generating samples of w(s) (l) | w(N (s)) (l) , θ (l) . This conditional distribution is Gaussian with mean and variance being the kriging mean and variance at s based on a set of nearest neighbors N (s). The exact expressions for these quantities are specified in Algorithm 2 of . Subsequent to generating samples of w(s), we generate replicates for the response as y(s) (l) | w(s) (l) , β (l) , τ 2 (l) ∼ N (x(s) β (l) + w(s) (l) , τ 2 (l) ). For the response model, the prediction algorithm is presented in Algorithm 4 of  and involves directly generating samples of y(s) (l) | y(N (s)), β (l) , θ (l) . This is once against a conditional normal distribution equivalent to kriging at s given its neighbors N (s) for the response process y(·). For the conjugate model, exact predictive distributions are available as y(s) | data following a t-distribution with location and scale parameters provided in Algorithm 5 of . Finally, for Binomial responses, predictions follow the same strategy as the latent model for Gaussian responses as both models rely on the hierarchical latent variable formulation. We initially generate samples of w(s) (l) | w(N (s)) (l) , θ (l) followed by generating Binomial samples for the response as where n(s) denotes the total count at s.

Software features
Two model fitting functions spNNGP and spConjNNGP, with associated support functions, offer users a set of efficient regression and prediction tools that implement the methods outlined in Sections 2.2 and 2.3. Following from Section 2.2, spNNGP provides full MCMC-based inference for the latent (Section 2.2.1) and response (Section 2.2.2) spatial regression models. The latent model can be specified for Gaussian and Binomial (via the Pólya-Gamma distribution Section 2.3) outcome variables, while the response model only fits Gaussian outcome variables. The highly efficient MCMC-free conjugate NNGP algorithm used for Gaussian outcome variables described in Section 2.2.3 is available via the spConjNNGP function.
spNNGP and spConjNNGP as well as their support functions (fitted for generating regression fitted and replicated data, spDiag for computing model diagnostics, and spPredict for sampling from posterior predictive distributions) are written in C/C++ using R's foreign language interface and offer parallelization using openMP (Dagum and Menon 1998).
As noted previously, our aim was to provide software capable of handling data sets with 10s to 100s of millions of locations. This aim was met by minimizing memory requirements, and taking advantage of parallelization where possible. We attempted to minimize memory requirements in several ways. First, we avoid making copies of input data, e.g., the regression design matrix, spatial coordinate matrix, etc. Second, we do not hold any euclidean distance or subsequent spatial covariance vectors and matrices in memory, but rather compute them when needed. For example, all sampling algorithms require the computation of the spatial covariance vector between each observed location and its set of nearest neighbors, and the among neighbors spatial covariance matrix. For any given observation this storage requirement is not large, i.e., at most a m length vector and upper or lower triangle of an m × m matrix, where m is the number of neighbors (which is small, e.g., m = 15). However, when n is large, storing n m length vectors and m×m matrices is substantial. Therefore, we made the decision to compute these covariance vectors and matrices "on the fly" for each observation and for each iteration of the given sampler. To do this efficiently, we find each observation's neighbor set via a fast code book search described in Ra and Kim (1993) which is modified to accommodate the ordering imposed on the neighbor graph described in Datta et al. (2016). The neighbor search is performed only once at the beginning of the program, and the result is n vectors of at most m integers that record the data row index for each observation's neighbors (i.e., the indexes in the input outcome vector and corresponding design and coordinate matrices).
Given the conditional ordering required by NNGP (see Datta et al. 2016), the neighbor graph can be recorded as a sparse lower-triangular matrix where each row is an observation and the non-zero elements in each row hold the neighbor indexes. Within the software, this matrix is held in Compressed Row Storage (CRS) format to minimize memory use and neighbors' data retrieval time. The CRS format does not store zeros and organizes non-zero matrix elements ordered by row in contiguous memory. This is advantageous in our setting because it makes traversing all observations' neighbor indexes as simple as a single loop over the vector that holds the CRS matrix's non-zero elements. Given the index for a given neighbor, retrieving its spatial coordinates, design matrix, and outcome value is further simplified because all input data are stored in contiguous memory column-major format, which allows for fast retrieval of data using integer indexing. This straightforward looping and use of integer indexing to retrieve neighbor input data facilitates parallelization using an openMP omp for pragma at several points in the sampling algorithms. Figure 1 provides an example of a neighbor graph and the corresponding sparse matrix that we store in CRS format. Following the sampling algorithms in Datta et al. (2016) and , the response and conjugate models require only each observation's neighbor information; however, the latent model sampling algorithm requires information on which observations have a given observation as a neighbor. Or put another way, we need to know which neighbor sets each observation is a member of. This information is easily accessed by noting the row index of each column's non-zero elements in Figure 1(b), e.g., observation 1 is in the neighbor sets for observation 2, 3, and 4. Efficiently accessing and traversing these indexes is done in the software by converting the CRS neighbor index matrix to a Compressed Column Storage (CCS) format, which effectively reorders the needed row indexes in contiguous memory by column. Storing all vectors and matrices in contiguous memory column-major format also simplifies calls to Fortran Basic Linear Algebra Subprograms (BLAS; www.netlib.org/blas) and Linear Algebra Package (LAPACK; www.netlib.org/lapack), which are used for all computationally intensive matrix operations. Hence, additional speed-up can be realized if R calls a threaded implementation of BLAS, e.g., openBLAS (Zhang 2016) or Intel's Math Kernel Library (Intel 2019), while working on a multiple processor computer.  provide details about where and how each sampling algorithm uses parallelization.

Analysis of simulated data
The basic functionality of spNNGP and spConjNNGP, along with their support functions, are illustrated using a small n=5000 simulated data set with locations distributed at random within a unit square domain. This data set was kept purposely small so the NNGP models could be compared with the full GP model. The Gaussian outcome variable was generated from (2) with latent w centered on zero and covariance matrix elements c ij = σ 2 exp(−φ||s i − s j ||), where φ=6 and σ 2 =1. The columns in the design matrix comprise an intercept and variable x which was drawn from a Normal distribution with mean zero and variance 1. The regression coefficients were β = (1, −0.1) and measurement error τ 2 =0.1. An additional n 0 =2500 observations on a grid were set aside to illustrate prediction.

MCMC-based inference
A call to spNNGP generates samples from parameters' posterior distributions and, optionally, corresponding samples of regression fitted values and replicates via composition sampling as described in Section 2.4. Similar to R's lm function, the desired model is passed to spNNGP via the formula argument. The model's outcome vector and design matrix components are then found in a data frame passed to the data argument or, if data is not specified, taken from the calling environment. Additionally, the locations corresponding to the observed data are passed as a matrix via the coords argument. One can also pass coords a character vector of column names in data's data frame.
Specifying the latent or response model is done via the method argument. As detailed in , for the latent model (initialized by setting method="latent") β, w, σ 2 , and τ 2 are updated from their respective full conditional distributions via a Gibbs algorithm. The spatial correlation function decay parameter φ and, if cov.model="matern" (where cov.model specifies the desired spatial correlation function), smoothness parameter ν are updated via a MH algorithm. If method="response", only β has an efficient closed form full conditional Gibbs update and the remaining parameters σ 2 , τ 2 , φ, and perhaps ν are updated via MH. For both models, starting values and prior distributions must be specified for all parameters (with the optional exception of β and w) via the starting and priors arguments, as illustrated below. Those parameters updated via MH are transformed to have support on the real line so that a Normal proposal distribution can be used. The variance of the parameter specific proposal distribution is controlled via the tuning argument. Tuning values should be selected to maintain the desired MH acceptance rate (in general we aim for ∼25-50 percent, see Roberts and Rosenthal (2009) and Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin (2013) for guidance). Information about acceptance rate and other model specifics is printed to the terminal when verbose=TRUE.
As described in Datta et al. (2016) and , to implement a NNGP model the user must impose an ordering on the neighbor graph and specify the number of neighbors to consider in each observations' conditional set. By default, spNNGP and spConjNNGP order in increasing value of the first column of the matrix passed to coords, which, in most settings, produces an excellent approximation to a full GP (see supplemental experiments in Datta et al. (2016)); however, as demonstrated by Guinness (2018b) improvements can be achieved with different ordering designs. If the user wishes to try different orderings, they can be passed as an integer index vector via the optional ord argument. For illustration, we bypass the default ordering in the call to spNNGP below and order using the sum of locations' x and y coordinate values. The number of neighbors to consider is controlled by the n.neighbors argument. As demonstrated in Datta et al. (2016) 15 neighbors is usually sufficient; however, depending on the data, one can achieve a good approximation to the full GP with as few as five neighbors. If n is large, selecting fewer neighbors can result in substantial decrease in runtime.
Users can choose a slow brute force or fast code book nearest neighbor search algorithm by setting the search.type argument to "brute" or "cb", respectively. The fast code book search is a modified version of the algorithm detailed in Ra and Kim (1993). If locations do not have identical coordinate values on the axis used for the nearest neighbor ordering then "cb" and "brute" should produce identical neighbor sets. However, if there are identical coordinate values on the axis used for nearest neighbor ordering, then the search algorithms might produce different, but equally valid, neighbor sets. If n is large (e.g., a million or more), constructing the nearest neighbor sets can take a long time even when search.type="cb".
To save time, the neighbor set can be reused in subsequent model calls if the values passed to coords, ord, and n.neighbors do not change. Setting return.neighbor.info = TRUE in spNNGP or spConjNNGP returns the necessary neighbor information in an object called neighbor.info. Then passing this object to the optional neighbor.info argument in subsequent calls to spNNGP or spConjNNGP avoids the costly nearest neighbor search. The information in neighbor.info can also be used if one wishes to plot the neighbor sets (see example code in the spNNGP manual page).
The call to spNNGP in the code below generates 2000 MCMC samples using the latent model (with the n.samples argument specifying the desired number of samples). The n.report argument defines the sample interval for reporting the MH acceptance rate, which in this case is once every 1000 samples. The fit.rep=TRUE indicates that we are requesting regression fitted and replicate data be generated via composition sampling (i.e., one-for-one with each MCMC sample). The argument sub.sample specifies that we only want regression fitted and replicate samples starting at MCMC sample 1000 and for each subsequent sample, i.e., start=1000 (the list passed to sub.sample can also define end and thin for additional control over MCMC chain samples used in the computations).
The computer used to conduct this analysis has multiple processors and R compiled with openMP. Therefore, setting n.omp.threads=4 should decrease runtime, see Section 3.2 for more details on computing time.
Objects returned by spNNGP and other functions in the package use S3 methods summary and print. As illustrated later, S3 methods for fitted and residuals are also implemented.
The print method prints the initial call to the function as well as some model and sampler specifics. The default behavior of summary is to print posterior summaries for model parameters using samples from the second half of the MCMC chains. The values for arguments sub.sample and quantiles passed to summary allow for finer control on posterior summaries. Alternatively the user can access all posterior samples as coda (Plummer, Best, Cowles, and Vines 2006) mcmc class objects in the spNNGP return objects' p.beta.samples, p.theta.samples, and p.w.samples, which as the names suggest hold β, θ, and w samples, respectively.

summary(sim.s)
Call: spNNGP(formula = y~x, coords = coords, method = "latent", n.neighbors = 10, starting = starting, tuning = tuning, priors = priors, cov.model = cov.model, n.samples = n.samples, n.omp.threads = 4, ord = ord, return.neighbor.info = TRUE, fit.rep = TRUE, sub.sample = list(start = 1000), n.report = 1000) Model class is NNGP latent gaussian family. As requested by setting fit.rep=TRUE the spNNGP return object also holds samples for the regression fitted values, labeled y.hat.samples, and replicated data, labeled y.rep.samples. For convenience, the median and lower and upper 95% credible intervals for MCMC samples at each location are provided in y.hat.quants and y.rep.quants. These samples and corresponding summaries can be accessed directly in the spNNGP return object or extracted using the S3 fitted method. Beyond simply extracting the regression fitted values and replicated data from spNNGP and other model objects in the package, the fitted function performs additional composition sampling if the requested MCMC sample subset differs from the one initially specified in the model call. For example, the initial call to spNNGP specified sub.sample=list(start=1000), but if later we decide we want regression fitted values and replicated data for every 10 t h MCMC sample starting at sample 100, a call to fitted(sim.s, sub.sample=list(start=100, thin=10)) would generate the desired subset (the residuals function provides the same behavior).

MCMC-free inference
The conjugate model is called using the spConjNNGP function. Fixed α, φ, and perhaps ν are specified using a named vector passed to the theta.alpha argument. Alternatively, a K-fold cross-validation (where K is set via the k-fold argument) is used to discover the "optimal" set of parameters if theta.alpha is passed a matrix with columns named alpha, phi, and perhaps nu. The "optimal" set of parameter values (i.e., a row in theta.alpha) is the one that minimizes the average value of the specified scoring rule over the K folds. This scoring rule is set via the score.rule argument with options "rmspe" and "crps" for root mean squared prediction error (RMSPE) and continuous ranked probability score (CRPS; Gneiting and Raftery 2007). The K-fold cross-validation progress is printed to the screen as illustrated below. Once the optimal parameter set is identified, a final model is fit using all the available data. The description of this final model is given in the Model description section followed by the optimal set of α, φ, and, if the Matérn correlation model is used, ν.
The printout following the call to spConjNNGP below also includes a section called Computing replicates that reports on the exact sampling from the model parameters and regression fitted values posterior distributions, and generation of replicated data. Posterior sampling and generation of replicated data is optional and controlled by the fit.rep and n.samples arguments, with n.samples being set to the number of desired samples to collect. When fit.rep=TRUE, spConjNNGP effectively calls the S3 method fitted function for the conjugate model class. Hence, if posterior samples are not collected in the initial call to spConjNNGP or a different number of samples is needed, then a call to fitted using the spConjNNGP object will generate the required samples.

Number of covariates 2 (including intercept if specified).
Using the exponential spatial correlation model. Results from the K-fold cross-validation are returned by spConjNNGP and held in the k.fold.scores matrix which is a copy of theta.alpha with additional columns for K-fold RMSPE and CRPS as illustrated below. k.fold.scores is useful for assessing predictive performance sensitivity to choice of covariance parameters. Figure 2(a) was created by plotting the search grid parameter values and their resulting minimum RMSPE. Note, the call to spConjNNGP specifies score.rule = "rmspe" which means the "optimal" α, φ, and, if cov.model = "matern", ν will be the set that minimizes RMSPE. If one sets score.rule = "crps" the k.fold.scores can be used to identify the set of covariance parameters that minimize CRPS (the result of which is illustrated in Figure 2(b)).
The S3 summary method provides parameter point estimates using the optimal set and, if n.samples is specified, posterior summaries. The sub.sample argument can be used in summary if the spConjNNGP fit.rep=FALSE or if you wish for a posterior summary for a different number of n.samples. If sub.sample is specified in summary, the function returns a matrix of the requested number of samples. The output from summary(sim.c) which uses the initial number of 200 samples is given later in Table 1.  Figure 2: Simulated data analysis spConjNNGP parameter search grid and K-fold crossvalidation results for RMSPE (a) and CRPS (b) scoring rulese. The "optimal" parameter combination is circled. A plus symbol identifies the "true" α and φ used to generate the data.

Model diagnostics and prediction
When passed a spNNGP or spCongNNGP object the spDiag function returns a list that, depending on the model method, includes some or all of the following elements: DIC a data frame holding the Deviance information criterion (DIC) and associated values defined by Spiegelhalter, Best, Carlin, and Van Der Linde (2002). The DIC data frame includes rows labeled DIC the criterion (lower is better), D a goodness of fit, and pD the effective number of parameters.
WAIC a data frame holding Watanabe-Akaike information criteria (WAIC) and associated values. The WAIC data frame includes rows labeled LPPD log pointwise predictive density, P.1 penalty term defined in unnumbered equation above Equation (11) in Gelman et al. (2013), P.2 an alternative penalty term defined in Equation (11), and the criteria WAIC.1 and WAIC.2 (lower is better) computed using P.1 and P.2, respectively.
GPD a data frame holding the values needed to compute the predictive criterion D = G + P defined by Gelfand and Ghosh (1998). The GPD data frame includes rows labeled G a goodness of fit, P a penalty term, and D the criterion (lower is better).
GRS a scoring rule, see Equation 27 in Gneiting and Raftery (2007) for details, with larger values of GRS indicating better model fit.
Among the four model comparison metrics, DIC and WAIC rely on the assumption that given all the parameters (including latent ones), the datapoints are conditionally independent. The response and conjugate NNGP models do not preserve this conditional independence structure and do not provide samples from the latent effect. Hence, it is not appropriate to compute WAIC and DIC for them. Comparisons across the models using GPD and GRS scores require generating replicate data. We have discussed in Section 2.4 how replicates have fundamentally different interpretation for the latent and response models. For the former, the replicates are generated conditional on the latent random effects and hence are spatially correlated with the original data, whereas for the marginalized response model, the replicate is simply a new realization of a multivariate Gaussian distribution with the same mean and covariance structure as the original data. Hence, generally the GPD and GRS scores will be better for the latent models (as is evident in Table 2). It is not advisable to compare the latent and response NNGP models using GPD and GRS as they represent different principles of replication. Finally, we can compare the response model with the conjugate model using GPD and GRS as they both use the same form of replicates. However, the conjugate models uses cross-validation to tune hyper-parameters, violating the principles of all these model comparison metrics tailored for classical Bayesian procedures, and it is difficult to interpret the model comparison values for it.
The code below calls spDiag for the spNNGP latent and spConjNNGP model output sim.s and sim.c, respectively. Additionally, for comparison, we ran spNNGP for the response model (i.e., by setting method="response") and called the resulting object sim.r. Fit diagnostics for all three models are given in Table 2. However, as shown in Table 1, all models recover the "true" parameter values well, and Figure 4 shows all models produce comparable predictive surfaces.
s.diag <-spDiag(sim.s) r.diag <-spDiag(sim.r) c.diag <-spDiag(sim.c) Given the discordance in the interpretation of the traditional model comparison metrics for the NNGP models, a pragmatic way to compare them is based on their predictive performance on a hold out set. The spPredict function is used to generate posterior predictive samples for new locations with associated covariates, given a spNNGP or spConjNNGP object. The code below generates posterior predictive samples for the n 0 =2500 holdout locations using the latent model. The latent model is the only method that provides posterior predictive samples for the latent effect w. These samples are held in the p.w.0 matrix in the s.pred object generated below, and a summary of these samples along with the "true" w are given in Figure 3. The spPredict function was also called for the response sim.r and conjugate sim.c model objects to generate posterior predictive samples for the holdout locations (held in output object p.y.0) along with subsequent surface summaries in Figure 4.

Number of covariates 2 (including intercept if specified).
Using the exponential spatial correlation model.

Timing given n and number of CPUs
Here we provide a brief overview of the relationship between n, model type, and number of cores. The computer used for these runtime experiments (and analysis in subsequent sections) is running a linux operating system with two Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz chips each with 18 cores and R compiled with openMP with thread-enabled openBLAS (Zhang 2016). Timings generated by proc.time() is returned by core spNNGP functions in the run.time vector. Figure 5(a) provides a sense of runtime (i.e., "wall time") needed to collect 1000 MCMC samples for n=100000 using the response and latent algorithms for a range of cores. Execution time for the conjugate model is about equal to one MCMC iteration of the response model. As this figure shows, for this computer and n, there is no speed-up beyond ∼18 cores mostly due to communication overhead. Then fixing the number of cores at 18, Figure 5(b) gives a sense of computing time required for different size n.

Analysis of forest canopy height
In this section we analyze a forest canopy height dataset at n=188,717 locations using spN-NGP. Digital maps of forest structure are key inputs to many ecosystem and Earth system modeling efforts (Finney 2004;Hurtt, Dubayah, Drake, Moorcroft, Pacala, Blair, and Fearon 2004;Stratton 2006;Lefsky 2010;Klein, Randin, and Korner 2015). These and similar applications seek inference about forest canopy height variables and predictions that can be propagated through computer models of ecosystem function to yield more robust error quantification. Given the scientific and applied interest in forest structure, there is increasing demand for wall-to-wall (i.e., complete domain coverage) forest canopy height data at national and biome scales. Next generation LiDAR systems capable of large-scale mapping of forest canopy characteristics, such as ICESat-2 (Abdalati, Zwally, Bindschadler, Csatho, Farrell, Fricker, Harding, Lefsky, Markus, Marshak, Neumann, Palm, Schutz, Smith, Spinhirne, and Webb 2010; ICESat-2 2015), Global Ecosystem Dynamics Investigation LiDAR (GEDI 2014), and NASA Goddard's LiDAR, Hyperspectral, and Thermal (G-LiHT) Airborne Imager (Cook, Corp, Nelson, Middleton, Morton, McCorkel, Masek, Ranson, Ly, and Montesano 2013), sample forest features using LiDAR instruments in long transects or cluster designs (see, e.g., the strips of LiDAR in Figure 6(a)). These next generation systems yield LiDAR data over the desired large spatial extents; however, the sparseness of the LiDAR sampling designs means prediction is required to deliver the desired wall-to-wall data products.
Our goal is to create high spatial resolution forest canopy height predictions, with accompanying uncertainty estimates for the Bonanza Creek Experimental Forest (BCEF; https: //www.lter.uaf.edu) located in interior Alaska, USA. The BCEF domain delineated for this study, Figure 6(a), is ∼21,000 ha and includes a section of the Tanana River floodplain along the southeastern border. The BCEF is a mixture of non-forest and forest vegetation featuring white spruce, black spruce, tamarack, quaking aspen, and balsam poplar trees mixed with willow and alder shrubland species Bonanza Creek LTER (2019). Figure 6(a) also shows location of the n=188,717 G-LiHT LiDAR forest canopy height (FCH) estimates. These data are included in the spNNGP package.
The 188,717 FCH estimates come from the G-LiHT LiDAR point cloud summarized to a 13 × 13 m grid cell size (G-LiHT: Goddard's LiDAR, Hyperspectral & Thermal Imager 2019). Over each grid cell, the maximum canopy height (i.e., FCH) was estimated using the 100 th percentile height of the point cloud. A Landsat derived percent tree cover (PTC) data product developed by Hansen, Potapov, Moore, Hancher, Turubanova, Tyukavina, Thau, Stehman, Goetz, Loveland, Kommareddy, Egorov, Chini, Justice, and Townshend (2013), shown as the underlying surface in Figure 6(a) is used as a predictor variable for FCH. PTC is the percent tree cover estimates for peak growing season in 2010 and was created using a regression tree model applied to Landsat 7 ETM+ annual composites. A semivariogram of the non-spatial regression model residuals can inform how the residual spatial/non-spatial variance (i.e., outcome variance not explained by the regression mean) is partitioned and the spatial range, see, e.g., Chapter 5 in Banerjee, Carlin, and Gelfand (2014) for details. Here we consider the residuals from where y is the vector of observed FCH estimates, β 0 is an intercept, β P T C is the slope coefficient associated with the PTC predictor variable denoted as x, and ε is the n × 1 vector following N (0, τ 2 I n ). In the subsequent analyses we use an exponential spatial correlation function that approaches zero as the distance between locations increases. Therefore we define the distance, d 0 , at which this correlation drops to 0.05 as the "effective spatial range," which allows us to solve φ = − log(0.05)/d 0 ≈ 3/d 0 . Using the variog and variofit functions in the geoR package (Ribeiro Jr and Diggle 2018), the semivarogram and empirical parameter estimates for the BCEF are given in Figure 6(b). Due to computational constraints we used a random subset of 25,000 residuals from (10) to generate the variogram.

Estimation and prediction
Here we consider the non-spatial, latent, response, and conjugate models for BCEF data.
Posterior samples for the non-spatial regression model were generated using the bayesLMRef function in spBayes. Models are assessed using output from spDiag. Further, out-of-sample predictive performance was assessed by fitting the models to 100,000 observations (selected at random from the 188,717) and then predicting for the remaining holdout 88,717 observations. Finally, the models are used to predict FCH for a grid of 237,617 locations over the BCEF where PTC was recorded and resulting maps are compared.
Number of MCMC samples 5000.
Posterior summaries are given in Table 3. As suggested by the exploratory variogram analysis, and now confirmed with formal model estimates, the spatial range is quite short, e.g., the latent model estimate of the median effective spatial range is ∼0.43 km (i.e., − log(0.05)/6.91). Despite this short range, the spatial variance is large relative to the non-spatial variance. Such results are not surprising given the BCEF's composition and structure are the result of myriad large and small spatial scale biotic (e.g., insect disturbance) and abiotic (e.g., soil, topography, climate, wind, fire) factors that cause spatially complex mortality and regrowth patterns. Formal model fit diagnostics provided in Table 4 suggests the addition of the latent spatial effect does improve fit compared with the non-spatial model.  Table 4: BCEF data analysis model fit via spDiag and out-of-sampled prediction diagnostics for the non-spatial and three spatial model types fit to the BCEF data. The last four rows were calculated using prediction for the holdout set. The row labeled CI Cover is the percent of 95% posterior predictive distribution credible intervals that cover the observed holdout value. The row labeled CI Width is the average width of the 95% posterior predictive credible interval. Lastly, and as illustrated in the synthetic data analysis, a call to spPredict yields posterior predictive samples for the entire domain of interest which in this case is sampling over the grid of n 0 =237617 locations where only PTC was recorded. A surface of the posterior distributions' mean and variance are given in Figures 8(a) and 8(b), respectively. Locations where observations are available to inform prediction (i.e., along flight lines) are clearly delineated by high prediction precision in the prediction variance surfaces Figure 8(b). Given the relatively short spatial range this information borrowing to inform prediction does not extend too far off of the flight lines. These surfaces along with the out-of-sampled prediction performance metrics given in the last four rows in Table 4 suggest there is not too much difference among the spatial models.

Analysis of species distributions
In this Section, we present analysis of Species distribution using the Binomial NNGP model. Species distribution models (SDMs) project the outcome of community assembly processes dispersal, the abiotic environment, and biotic factors onto geographic space (Guisan and Zimmermann 2000;Pulliam 2000). Here, we reanalyze data recently presented in Lany, Zarnetske, Finley, and McCullough (2019) to develop a SDM for eastern hemlock (Tsuga canadensis L.) coded as TSCA in subsequent analysis. The data comprise hemlock occurrence (Binomial outcome) on 17,743 forest stands across Michigan, USA. A set of predictors were also observed at each stand and subsequently used to explain the probability of hemlock occurrence. Predictor variables included minimum winter temperature (MIN), maximum summer temperature (MAX), total precipitation in the coldest quarter of the year (WIP), total precipitation in the warmest quarter of the year (SUP), annual actual evapotranspiration (AET) and annual climatic water deficit (DEF).
We consider three candidate models: 1) non-spatial logistic regression using the Pólya-Gamma data-augmented sampler of Polson et al. (2013) as implemented in the our spNNGP PGLogit function; 2) logistic regression with a space-varying Gaussian Predictive Process (GPP) random effect Banerjee et al. (2008) using the spBayes' spGLM function; 3) logistic regression with space-varying NNGP again via the Pólya-Gamma sampler invoked using the family="binomial" argument in the spNNGP function as illustrated in the code below. Models were fit using n=15,000 observations and assessed using goodness of fit metrics and out-of-sampled predictive performance based on a holdout set of n 0 =2,743.
Like a spNNGP class object, the return object from the PGLogit function can be passed to spDiag to yield model diagnostics as illustrated in the code below. Both PGLogit and spNNGP with faimly="binomial" accept different number of trials for each location, i.e., n i in Section 2.3, which is passed via the weights argument; however, for the current setting we accept the default of all weights equal to 1, i.e., each stand yields either presence or absence of hemlock.
Number of MCMC samples 10000.
Priors and hyperpriors: beta flat. sigma.sq IG hyperpriors shape=2.00000 and scale=10.00000 phi Unif hyperpriors a=0.01000 and b=0.30000 Source compiled with OpenMP support and model fit using 10 thread(s). Parameter estimates for the three models are given in Table 5, and shows that a number of the predictor variables help explain the probability of hemlock occurrence across the study area. Due to possible spatial confounding (Hanks, Schliep, Hooten, and Hoeting 2015), we should interpret the sign and significance of these regression parameters in the spatial models with caution. Within sample fit diagnostics given in Table 6 and out-of-sample receiver operating characteristic (ROC) prediction curves (based on n 0 =2,743 holdout locations) given in Figure 9, suggest the latent spatial variables improve the species distribution models over that of the non-spatial model. Among the spatial models, the NNGP model outperforms the reduced rank predictive process model. The computing times per 1000 samples for the non-spatial, GPP, and NNGP models are 7, 35, and 19 seconds, respectively, using 10 cores.

Statistical gap-filling of a massive remotely sensed data
As reviewed in Section 1, methods and software are now emerging that can readily fit geostatistical models to data sets comprising locations in the 10s to 100s of thousands. However, as the number of observations climb into the millions, inferential and software options are quite limited. It is common to encounter data sets of such size in a variety of fields. For example, remotely sensed data of this magnitude either as gridded or scattered data products is now ubiquitous. Here we consider a massive-scale "gap-filling" exercise of missing Normalized Difference Vegetation Index (NDVI)-a measure of vegetation greenness-data from around 39 million locations. The NDVI data for a LandSat 8 sensor image was taken over Limpopo National Park, Mozambique, Africa on 7/17/2015. The image shown in Figure 10(a) has n 0 =778644 pixels that are missing NDVI (denoted in gray) due to cloud cover during image acquisition. The red box in Figure 10(a) delineates a region with a large number of missing pixels. Filling in missing NDVI values for this and other images in a time series over Limpopo was a step in a larger study conducted by Desanker, Dahlin, and Finley (2019) that looked at environmental drivers in vegetation phonology change. Our aim is to build a geostatistical model to predict the n 0 missing NDVI pixels given the n=38825052 observed NDVI pixels and complete coverage land surface elevation predictor variable shown in Figure 10(b).
Source compiled with OpenMP support and model fit using 18 thread(s).
------------Estimation for parameter set(s) Set phi=0.12000 and alpha=0.01000 Given the size of this data set, the search time to identify nearest neighbor sets can take a considerable amount of time, even when using the Ra and Kim (1993) fast code book algorithm invoked by search.type="cb". In this case the search time was 73.5 minutes (search proc.time() is held in neighbor.info$nn.indx.run.time in the model object). Once the optimal φ and α was found, the runtime for parameter estimation and prediction for the missing pixels was 166.03 minutes. We have experimented with a variety of treebased data structures (e.g., kd-trees modified to accommodate the nearest neighbor search constraints) and associated search algorithms, and believe there are substantial gains in search time efficiency to be had with additional development.
As shown below, a subsequent call to summary returns the optimal parameter set as well as point estimates for the other model parameters. Given the large size of this data set, parameter variance estimates are remarkably small (for some parameters the variance might be smaller than machine precision). The message at the end of the summary output reminds us that posterior samples were not requested in the initial call to spConjNNGP or summary. Generating a reasonable number of posterior samples for parameters would take a few hours for a data set of this size.

summary(ser.c, digits = 8)
Call: spConjNNGP(formula = ndvi~elev, data = ser.mod, coords = c("x", "y"), n.neighbors = 10, theta.alpha = theta.alpha, sigma.sq.IG = c(2, 0.01), cov.model = "exponential", k.fold = 2, score.rule = "crps", X.0 = cbind(1, ser.ho$elev), coords.0 = as.matrix(ser.ho[, c("x", "y")]), n.omp.threads = 18) Model class is NNGP conjugate gaussian family.  For prediction, as illustrated in the previous analyses, we can pass the spConjNNGP object ser.c to spPredict. Alternatively, a slightly more efficient option (which avoids computing observed location covariance with their neighbor set twice, i.e., once for estimating parameters in spCongNNGP and again in subsequent call to spPredict) is to specify the prediction locations and associated design matrix via coords.0 and X.0 in the initial call to spConjNNGP. In this case, ser.c includes mean (y.0.hat) and variance (y.0.hat.var) estimates for the prediction locations. These predicted mean and variance of the mean estimates are shown for the missing pixels delineated by the red box in Figure 10(a) (see Figure 11(a) for a zoomed in view) in Figures 11(b) and (c), respectively. As expected, the prediction variance surface shows that pixels close to observed pixels have higher precision due to borrowing of information through the spatial correlation structure.

Summary
The spNNGP R package provides a suite of NNGP-based (Datta et al. 2016) spatial regression models for both Gaussian and non-Gaussian point-referenced outcomes that are spatially indexed. The package implements the MCMC and MCMC-free algorithms detailed  with the addition of the Pólya-Gamma latent variable model for binomial outcomes. Special care was taken to design algorithms that take advantage of multiprocessor computer via OpenMP and those with threaded BLAS and LAPACK libraries. Our future aim is to add functionality to accommodate multivariate outcomes, where we envisage two settings, first, where a limited number of outcomes (e.g., fewer than 10) might be handled using a NNGP linear model of coregionalization (Gelfand, Schmidt, Banerjee, and Sirmans 2004), second, where the number of outcomes is larger and requires some dimension reduction, e.g., via a NNGP spatial factor model, akin to the model detailed in Taylor-Rodriguez, Finley, Datta, Babcock, Andersen, Cook, Morton, and Banerjee (2019). For such highly multivariate data, besides factor models, we will also explore new multivariate covariance functions using sparse inter-variable graphical models that parsimoniously capture the relationship between multiple variables. Future releases will also provide the flexibility to specify a general space-varying coefficient model like the spBayes spSVC function (Finley and Banerjee 2019).