spTimer: Spatio-Temporal Bayesian Modeling Using R

Hierarchical Bayesian modeling of large point-referenced space-time data is increasingly becoming feasible in many environmental applications due to the recent advances in both statistical methodology and computation power. Implementation of these methods using the Markov chain Monte Carlo (MCMC) computational techniques, however, requires development of problem-specific and user-written computer code, possibly in a low-level language. This programming requirement is hindering the widespread use of the Bayesian model-based methods among practitioners and, hence there is an urgent need to develop high-level software that can analyze large data sets rich in both space and time. This paper develops the package spTimer for hierarchical Bayesian modeling of stylized environmental space-time monitoring data as a contributed software package in the R language that is fast becoming a very popular statistical computing platform. The package is able to fit, spatially and temporally predict large amounts of space-time data using three recently developed Bayesian models. The user is given control over many options regarding covariance function selection, distance calculation, prior selection and tuning of the implemented MCMC algorithms, although suitable defaults are provided. The package has many other attractive features such as on the fly transformations and an ability to spatially predict temporally aggregated summaries on the original scale, which saves the problem of storage when using MCMC methods for large datasets. A simulation example, with more than a million observations, and a real life data example are used to validate the underlying code and to illustrate the software capabilities.


Introduction
Model-based Bayesian analysis methods are becoming popular for taking account of uncertainties in the analysis and spatial and temporal prediction of environmental space-time data.
Practitioners are increasingly benefiting from their ability to reduce uncertainty in the inference statements that arise from joint space-time modeling (Cressie and Wikle 2011). Bayesian methods are also popular because of their ability to combine information from several sources using melding or data fusion (Sahu, Gelfand, and Holland 2010). However, currently there is no suitable software package for Bayesian modeling and analysis of large space-time data. In this paper, our interest is on modeling and analyzing spatio-temporal point-referenced data (Banerjee, Carlin, and Gelfand 2004), where random observations are measured over time at a number of spatial locations, which vary continuously over a study region.
General purpose R software packages, such as MCMCpack (Martin, Quinn, and Park 2011), MCMCglmm (Hadfield 2010), and BLR (de los Campos and Perez Rodriguez 2014), are available for implementing Bayesian models. However, these are not suitable for analyzing spatially correlated data. Packages that can handle spatial correlations include spBayes (Finley, Banerjee, and Carlin 2007;Finley, Banerjee, and Gelfand 2015), geoR (Ribeiro and Diggle 2001), geoRglm (Christensen and Ribeiro 2002), rjags (Plummer 2014) and R2WinBUGS (Sturtz, Ligges, and Gelman 2005) which is an R interface to WinBUGS (Spiegelhalter, Thomas, Best, and Lunn 2003). However, these are not suitable and sometimes complicated for modeling data rich in both space and time, although the package spBayes can model spatially varying short-length time series data. These packages are not intended to handle large, e.g., more than a million, space-time data and these packages do not allow incorporation of popular models in the time series literature such as the auto-regressive models.
Non-Bayesian packages implementing the generalized additive models such as gam (Hastie 2014) and mgcv (Wood 2006) can also fit models for spatial data by implementing functional relationships between the response and the coordinates of the spatial locations, e.g., latitude and longitude. However, these modeling approaches are not process-based, i.e., do not incorporate random spatial and temporal processes, and we find that our process-based models implemented in spTimer have superior performance in out of sample spatial predictions, see Sections 4 and 5.
The main contribution of this paper is the development of the R package spTimer that is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package= spTimer. The package enables Bayesian modeling of regularly monitored point-referenced data obtained from a sparse spatial network of monitoring stations. Data at each monitoring station are obtained as a regular time series but may contain missing data. The package is also able to incorporate an arbitrary number of explanatory variables that may vary in both space and time. The residual spatio-temporal variation is modeled using three different recently developed modeling methods appropriate for analyzing space-time environmental monitoring data. All the inferences in this package and the paper are proposed to be under the Bayesian paradigm using MCMC methods.
Using Bayesian computation methods, the package spTimer is able to process, i.e., fit and predict, both spatially and temporally, large space-time data sets that may contain missing data. In so doing, the user is able to choose a particular covariance function from the Matérn family (Cressie 1993) for the underlying Gaussian process and also a suitable method for calculating distances between two locations. In addition, the user can select the hyper-parameters of the prior distributions and is also given the ability to choose the tuning parameters for the implemented MCMC algorithms. The package also allows the user to select one of the two possible (log and square-root) on the fly transformations for the response variable. Another attractive feature of the package is the ability to spatially predict temporally aggregated summaries, e.g., annual mean from daily data, on the original scale, which only requires storage of the annual aggregate, instead of the full length time series, for each prediction location at each MCMC iteration when we fit and predict for large data sets.
Users of spTimer only need to provide the model specifications in the high-level R language, but the main body of the code is developed using the C programming language, that is hidden from the user. This enables faster computation and better data handling capacity than what can be achieved by simply using R. However, once the MCMC iterations have been finished the output can be analyzed using any other contributed R package such as coda (Plummer, Best, Cowles, and Vines 2006). For model selection purposes, the package automatically reports the values of a predictive model selection criteria (Gelfand and Ghosh 1998). Many other utility functions for model validation and output analyzes are also provided. The main functions of the package spTimer are discussed in detail in Section 3.
The first of the three models (see Section 2) implemented in spTimer is a hierarchical nugget effect model together with an independent Gaussian process (GP) model at each time point. The Gaussian process implies a spatio-temporal random effect that captures the space-time interactions, see e.g., Cressie and Wikle (2011, Chapter 6). Overall, this model parallels the spatial random effect model in spatial only data analysis and naturally provides a very simple starting model in many investigations involving space-time data, see Banerjee et al. (2004); Diggle and Ribeiro (2007); and Gelfand, Diggle, Fuentes, and Guttorp (2010). The second implemented model is the hierarchical auto-regressive model for space-time data developed by Sahu, Gelfand, and Holland (2007). An explicit auto-regressive term for the underlying true spatio-temporal process is assumed in a hierarchical set-up that includes the overall nugget effect. This model is included in the spTimer package because of its superior performance in modeling air-pollution data, see e.g., Cameletti, Ignaccolo, and Bande (2009) and Sahu and Bakar (2012a). This auto-regressive (AR) model is modified, as the third and final implemented model, to include the recently developed Gaussian predictive process approximation technique for handling large spatial and spatio-temporal data following Sang (2008), Finley, Sang, Banerjee, andGelfand (2009) and Sahu and Bakar (2012b). This last paper illustrates the capability of spTimer in handling and processing more than a million space-time observations within a reasonable amount of computing time in a standard personal computer. This paper illustrates the package spTimer with two examples. The first is a simulation example (see Section 4) that is used for code verification and illustration of the software capabilities. The highlight of this example is that it simulates more than a million space-time observations from the third model-based on a Gaussian predictive process approximation. It then fits the model and illustrates prediction using MCMC methods. The code for the other two models are verified using a smaller simulation data set. A real data example, modeling the daily 8-hour maximum ozone concentration in the months of July and August 2006 in the state of New York, is used for rapid illustration of the models and methods, see Section 5.

Preliminaries
The Bayesian spatio-temporal models can be represented in a hierarchical structure, where, according to Gelfand (2012), we specify distributions for data, process and parameters in three stages: In the second stage, the process can add different levels, for example in Gaussian process (GP) models (Cressie and Wikle 2011;Gelfand et al. 2010) we have true underlying process in the first level and the spatio-temporal random effect in the second level of the hierarchy. Some illustrations are provided below in Section 2.2. Further examples, based on temporal processes, are given in Sections 2.3 and 2.4. In the third stage of the hierarchy we introduce the prior distribution of the parameters or hyper-parameters.
The models are described for time series data that are segmented using two different units of time, such as hours within days or days within years, to have extra flexibility in modeling and inputting data into the package. This enables us, for example, to model observed ozone concentration levels during the high ozone season (May to September in the United States) in each year for several years without having to model data for the remaining months in each year when ozone concentration levels are low and not harmful (Sahu and Bakar 2012b). However, by default, the package works for modeling data indexed by just one unit of time.
Let l and t denote the two units of time where l denotes the longer unit, e.g., year, l = 1, . . . , r, and t denotes the shorter unit, e.g., day, t = 1, . . . , T l where r and T l denote the total number of two time units, respectively. Let Z l (s i , t) denote the observed point-referenced data and O l (s i , t) be the true value corresponding to Z l (s i , t) at site s i , i = 1, ..., n at time denoted by two indices l and t. Let Z lt = (Z l (s 1 , t), ..., Z l (s n , t)) and O lt = (O l (s 1 , t), ..., O l (s n , t)) . We shall denote all the observed data by z and z * will denote all the missing data. Let N = n r l=1 T l be the total number of observations to be modeled. Throughout, the notation lt = ( l (s 1 , t), ..., l (s n , t)) will be used to denote the so called nugget effect or the pure error term assumed to be independently normally distributed N (0, σ 2 I n ), where σ 2 is the unknown pure error variance and I n is the identity matrix of order n. The spatio-temporal random effects will be denoted by η lt = (η l (s 1 , t), ..., η l (s n , t)) and these will be assumed to follow N (0, Σ η ) independently in time, where Σ η = σ 2 η S η , σ 2 η is the site invariant spatial variance and S η is the spatial correlation matrix obtained from the, often used, general Matérn correlation function (Matérn 1986;Handcock and Stein 1993;Handcock and Wallis 1994) defined as: where Γ(ν) is the standard gamma function, K ν is the modified Bessel function of second kind with order ν, and ||s i − s j || is the distance between sites s i and s j . The parameter φ controls the rate of decay of the correlation as the distance ||s i − s j || increases and the parameter ν controls smoothness of the random field (Banerjee et al. 2004;Cressie 1993). The package spTimer allows several possibilities regarding estimation of the correlation parameters φ and ν that range from fixing them (point mass prior) to estimating them by assuming suitable prior distributions, see Section 2.5 for further details. In addition, spTimer is also able to incorporate the spherical correlation function, see Banerjee et al. (2004).
In the development below, we assume that there are p covariates, including the intercept, denoted by the n × p matrix X lt . Some of these covariates may vary in space and time. The notation β = (β 1 , ..., β p ) will be used to denote the p × 1 vector of regression coefficients. We shall use the generic notation θ to denote all the parameters.

GP model specification
The independent Gaussian process (GP) model is specified hierarchically by: for each l = 1, . . . , r and t = 1, . . . , T l , where we assume that lt and η lt are independent and each are normally distributed with their respective parameters as given in Section 2.1. Let O denote all the random effects, O lt , l = 1, . . . , r and t = 1, . . . , T l . Let θ = (β, σ 2 , σ 2 η , φ, ν) denote all the parameters of this model and let π(θ) denote the prior distribution that we shall specify later. The logarithm of the joint posterior distribution of the parameters and the missing data for this GP model is given by: The prior distribution π(θ) is specified in Section 2.5 and the full conditional distributions of the parameters, required for Gibbs sampling, are provided in Appendix A.

AR model specification
Following Sahu et al. (2007), we specify the hierarchical AR models as follows: for all l and t; where, ρ denotes the unknown temporal correlation parameter assumed to be in the interval (−1, 1). Obviously, for ρ = 0, these models reduce to the GP models described above in Section 2.2. We continue to assume the Gaussian distributions, introduced in Section 2.1, for lt and η lt for all values of l and t.
The auto-regressive models require specification of the initial term O l0 for each l = 1, . . . , r.
Here we specify an independent spatial model for each O l0 with mean µ l and the covariance matrix σ 2 l S 0 where the correlation matrix S 0 is obtained using the Matérn correlation function in Equation (1) with the same set of correlation parameters φ and ν for η lt .
Let θ denote all the parameters, i.e., θ = (β, ρ, σ 2 , σ 2 η , φ, ν, µ l , σ 2 l ) and we also suppose that O contains all the parameters O lt for l = 1, . . . , r, t = 0, . . . , T l . The logarithm of the joint posterior distribution of the parameters and the missing data is now given by: As for the GP models of previous section, the prior distributions are specified in Section 2.5. Full conditional distributions are provided in Appendix B.

Spatio-temporal models based on GPP
These models are based on the recent work by Sahu and Bakar (2012b) and the main idea here is to define the random effects η l (s i , t) at a smaller number, m, of locations, called the knots, and then use kriging to predict those random effects at the data and prediction locations.
Here, an AR model is only assumed for the random effects at the knot locations and not for all the random effects at the observation locations. At the top level we assume the model: for all l and t, where A = CS −1 η and C denotes the n by m cross-correlation matrix between the random effects at the n observation locations and m knot locations, s * 1 , . . . , s * m , and S η is the m by m correlation matrix of the m random effects w lt . We specify w lt at the knots conditionally given w lt−1 as: for all l and t, where η lt ∼ N (0, Σ η ) independently, Σ η = σ 2 η S η . Note that, here Σ η is an m × m matrix which is of much lower dimensional than the same for two previous models GP and AR since we assume that m << n.
The auto-regressive models are completed by the assumption for the initial conditions, w l0 ∼ N (0, σ 2 l S 0 ) independently for each l = 1, . . . , r, where the correlation matrix S 0 is obtained by using the Matérn correlation function in Equation (1) with decay parameter φ. Let w denote the random effects w lt for l = 1, . . . , r and t = 0, 1, . . . , T l . Let θ denote all the parameters β, ρ, σ 2 , σ 2 w , φ, ν, σ 2 l , l = 1, ..., r. The logarithm of the joint posterior distribution of the parameters and the missing data is given by: As previously in Sections 2.2 and 2.3, we specify the prior distributions in Section 2.5. The full conditional distributions for Gibbs sampling are provided in Appendix C.

Prior distributions
The Bayesian model for each of the above three specifications is completed by assuming suitable prior distributions for the underlying parameters. For simplicity and convenience, we group the model parameters of different spatial processes and data into three different types depending on whether those describe: the mean, the variance or the correlation. All the parameters describing the mean, e.g., β, ρ, other than the random effects, are given independent normal prior distributions, where the user can specify the means (µ β , µ ρ ) and variances (δ 2 β , δ 2 ρ ). In our illustrations in Section 3, we have taken all those means to be 0 and variances to be 10 10 , that correspond to our assumption of flat prior distributions. We also define an independent normal prior distribution with mean 0 and σ 2 µ , assumed to be 10 10 in our illustrations, for each component of the n-dimensional vector µ l in the autoregressive model.
The prior distribution for a typical precision (inverse of variance) parameter is specified through a gamma distribution with mean a/b and variance a/b 2 . The user can specify any suitable values for a and b, but in our illustrations we have chosen a = 2 and b = 1 to have a proper prior distribution for any variance component that will guarantee a proper posterior distribution (Gelman, Carlin, Stern, and Rubin 2004).
There is a large literature regarding the identifiability and the consistency of the parameters describing the correlation in a Gaussian process; see for example, Stein (1999) and Zhang (2004). These problems manifest themselves in the Bayesian estimation literature as well, see e.g., Sahu, Gelfand, and Holland (2006), who use an empirical Bayes (EB) approach. To facilitate estimation using this EB approach, spTimer allows the user to run Gibbs sampling by fixing the correlation parameters φ and ν so that a grid search can be performed to find the optimal values of these parameters.
Full Bayesian estimation of φ and ν is also allowed in spTimer corresponding to discrete uniform prior distributions for both these parameters. In all our illustrations the smoothing parameter ν is estimated using a discrete uniform prior distribution taking values from 0 to 1.5 with the increment 0.05. In addition to a discrete uniform distribution, we also allow a Gamma prior distribution with suitable hyper-parameter values for the decay parameter φ.
In practice, in any Bayesian analysis, the sensitivity with respect to the chosen distribution must be studied, and the spTimer package that we have developed here makes it easy to do so without much programming effort.

Model fitting
All the models are fitted using Gibbs sampling (Gelfand and Smith 1990). The conjugate prior distributions assumed for all the parameters except for the φ and ν enable standard conjugate sampling from the full conditional distributions. These details are given in Appendices A, B, and C. Missing data values are sampled from their conditional distributions at each iteration of the Gibbs sampler.
The full conditional distributions of the correlation parameters φ and ν are non-standard. The package provides two options for sampling these parameters corresponding to the two different prior distributions. The full conditional distribution will be discrete, and hence easy to sample from, if a discrete uniform prior distribution has been assumed for them. The second option, only allowed for the decay parameter φ, is to assign a continuous uniform prior distribution over an interval or a Gamma prior distribution and then to use the random-walk Metropolis-Hastings algorithm to draw samples from it. The proposal distribution is taken as the normal distribution with the mean at the current value and the variance σ 2 p which is tuned to have an acceptance rate between 15 to 40%, see Gelman et al. (2004) for theoretical justifications. The Metropolis algorithm is implemented on the log-scale for φ, i.e., we work with the density of log(φ) instead of φ since the support of the normal proposal distribution is the real line. In this sampling scheme, keeping φ within a range, as may be required by the assumed prior distribution, is trivial since any proposal value outside the range is rejected forthwith.
The quality of the model fit and its predictive abilities are assessed by calculating the predictive model choice criteria (PMCC, Gelfand and Ghosh 1998) which is given by, where, Z l (s i , t) rep denotes a future replicate of the data z l (s i , t). The first term in the PMCC assesses the goodness of fit and second term is a penalty term for model complexity. The PMCC, justified using a squared error loss function, is most suitable for comparing Bayesian hierarchical models that involve a first stage Gaussian model. The values of the criteria are estimated by sampling from the posterior predictive distributions discussed below.

Prediction details
Spatial prediction at a new location and temporal prediction at future time points proceed with the posterior predictive distribution for Z l (s 0 , t ), where s 0 denotes a new location and t is a time point, which can be in the future. The posterior predictive distribution for Z l (s 0 , t ) is obtained by integrating over the parameters with respect to the joint posterior distribution as: Note that for the GPP models in Section 2.4 we replace the random effects O l (s, t ) by w l (s, t ). Predictions are obtained by composition sampling. First, a random sample θ (j) , O (j) is drawn from the posterior distribution π(θ, O, z * |z) using the details in the model fitting described in Sections 2.2, 2.3, 2.4. Then, Bayesian kriging is applied to draw a sample, O At the end of the MCMC run, the samples Z l (s 0 , t ), j = 1, . . . , J where J is a large number, are summarized to make predictive inference. If a transformation, such as the log or the square Function Description spT.Gibbs For model fitting using Gibbs sampling approach. It can also predict simultaneously if suitable options are turned on. predict.spT Using output from spT.Gibbs this is able to perform prediction, or predict either spatially or temporally or both. root, has been applied then the posterior predictive samples Z (j) l (s 0 , t ) are first transformed back to the original scale before their summaries are calculated. Further, details regarding these predictions are provided in Bakar (2012) for the GP models, Sahu et al. (2007) for the AR models and Sahu and Bakar (2012b) for the GPP-based models.

Main functions in spTimer
There are two main functions in the spTimer package, namely, spT.Gibbs for model fitting and predict to obtain spatial and temporal predictions based on the fitted models. Table 1 provides a snapshot of these two main functions. In the following sub-sections we discuss these functions more elaborately.

spT.Gibbs
The function spT.Gibbs is used to fit all three models using the Gibbs sampling approach and it takes a number of arguments that define and control its behavior. For example, the argument formula is used to specify the linear part of the model. It is an object of class formula, which is same as that for the lm function to fit linear regression models in R.
The argument data provides the data frame to be used for model fitting. The package also has the capability to support data classes STFDF of spacetime and SpatialPointsDataFrame of sp. The data set must be first ordered by the location index, and for each location the data must be ordered by time. Time series data with more than one segment, for example, T l unit (e.g., daily) observations in the lth segment (l = 1, . . . , r) (e.g., month), must be ordered first by the longer unit (e.g., month) and then by the shorter unit (e.g., day). The total length of the time series at each location, r l=1 T l , must be the same for each location and the data rows in each site must correspond to the same basic time unit.
Varying segment length time series are also allowed, e.g., T 1 = 31, T 2 = 28, T 3 = 31, and T 4 = 30 when it is intended to model daily data for the first four calendar months (r = 4) in a year which is not a leap year. Moreover, all sites must have data rows for the exact same total number of time units and the rows for the shorter time units in each site must correspond to those for all other sites. This is to prohibit passing of irregularly observed time series to the data argument, i.e., in the current example one observation site cannot replace, for example, 2 days of January data by data for two extra days in May when all the other sites have daily data for the first four months. However, it is possible to have missing observations for the response variable and the missing values must be denoted by the standard NA identifier. No missing data values are allowed for the covariates since handling of such situations requires additional modelling. Details on how to define the time segments are provided in Section 3.3.
A typical code for the GP model is written as: spT.Gibbs(formula, data = parent.frame(), model = "GP", time.data = NULL, coords, knots.coords, newcoords = NULL, newdata = NULL, priors = NULL, initials = NULL, nItr = 5000, nBurn = 1000, report = 1, tol.dist = 0.05, distance.method = "geodetic:km", cov.fnc = "exponential", scale.transform = "NONE", annual.aggrn = "NONE", spatial.decay = spT.decay(distribution = "FIXED")) The model argument specifies the model to be fitted which can be any one of the three: GP, AR, and GPP with GP being the default. In spT.Gibbs the argument priors specify the hyperparameter values of the prior distributions. The default value of this argument is NULL when automatic values are chosen. More details regarding this are provided in Section 3.3.
The required argument coords can be supplied in different formats. It can be specified as an n × 2 matrix or data frame containing the co-ordinates of the n spatial locations, or as a formula object defining the coordinates, e.g., coords =~Longitude + Latitude. The optional argument knots.coords is used only for the GPP model and, must be provided as an m × 2 matrix of coordinates of the m (< n) knot locations.
The two optional arguments newcoords and newdata need to be provided if it is intended to do model fitting and prediction simultaneously. The argument newcoords provides the new coordinate points where we want to predict and newdata must contain the values of the covariates at the prediction locations. No predictions are performed if these two arguments are omitted. In that case, predcitions can still be performed after model fitting by calling the predict function, see Section 3.2. The argument initials specifies the starting values for the model parameters. Default initial values can be chosen by specifying the option initials = NULL, and in that case the following values will be set: σ 2 η = 0.1, σ 2 = 0.01. The default initial value for the spatial decay parameter, φ, is set as − log(0.05)/d max ≈ 3/d max (Finley et al. 2007), where d max is the maximum distance calculated from the coordinates of the model fitting locations, which ensures that the effective spatial range (the distance by which the spatial correlation becomes negligible) is d max . The initial values for the regression parameters and the auto-regressive parameter are obtained by fitting a simple linear model using the lm function.
The arguments nItr, nBurn, and report control the running of the MCMC algorithm where: nItr specifies the total number of iterations, nBurn denotes the number of burn-in iterations to discard, and report allows how many reports of progress of MCMC to print on screen.
The distance.method argument allows the distance between any two locations to be calculated using "geodetic:km" (the default), "geodetic:mile" and "euclidean" for distance in kilometers, miles and Euclidean unit respectively. Related to this is the tol.dist, which defaults to 0.05, that allows the user to avoid modelling locations which are less than the tol.dist away. This is to avoid singularity in the covariance matrices. The argument cov.fnc provides the choice of the spatial covariance function and can take any one of the values: "exponential" (the default), "gaussian", "spherical", and "matern". The handling of the spatial decay parameters is discussed in Section 3.3.
The argument scale.transform specifies the on-the-fly transormation for the response variable and it can take one of the three possible values "SQRT", "LOG" and "NONE" with the last one being the default. Note that all the predictions will be performed on the orginal scale of the data and it is not possible to use any on-the-fly transformation for the explanatory variables.
An optional argument annual.aggrn is also used only if the predictions are to be made using spT.Gibbs. This argument specifies the required type of time series aggregation in the predictions and, currently it can take any of the values: "ave" for annual average, "an4th" for obtaining the annual 4th highest value and "NONE" for no annual aggregates. For example, if dataset has 365 daily observations in each of 10 years, then the use of annual.aggrn = "ave" yields the 10 annual averages at each MCMC iteration without having to store 3650 iterates of the daily values. Thus, this argument helps to solve the storage problem when it is of interest to predict the aggregated summaries rather than the individual atomic space-time data.

predict
The function predict is used to get spatial and temporal predictions based on the results obtained from the routine spT.Gibbs. The two required arguments of predict are the newcoords, which must contain the new coordinate points where we want to predict and newdata, which contains the values of the covariates. The argument type in predict specifies the type of prediction the user wants to make which can be either "spatial" or "temporal". If the value is "spatial" then only spatial prediction will be performed at the newcoords which must be different from the fitted sites provided by the coords argument. When the "temporal" option is specified then forecasting will be performed and in this case the newcoords may also contain elements of the fitted sites in which case only temporal forecasting beyond the last fitted time point will be performed.

Some other useful functions
The package spTimer includes several utility functions, e.g., spT.time, spT.priors, spT.decay, spT.initials, and spT.validation for performing various important tasks. The function spT.time is used to specify the temporal structure of the data that can be used in spT.Gibbs. It has two arguments: (i) t.series that specifies the number of observations in each segment of the time series and (ii) segments that specifies the total number of segments. For example, to model data for 5 years with 30 days in each year we call the function spT.time(t.series = 30, segments = 5) and send the output to spT.Gibbs. The default value for segments is 1 and this is what should be used for simple time series modelling problems where time is described by one unit only, e.g., day.
The package can also handle unequal length time segments which can be specified by a vector valued t.series argument in spT.time. For example, to model daily data for the first four calendar months in a non-leap year, the appropriate code is spT.time(t.series = c(31, 28, 31, 30), segments = 4). However, in this paper we only illustrate with equal segment length time series data.
The spT.priors function is used to define the hyper-parameter values of the prior distributions. As the parameters are model dependent, the call to this function requires a model argument and then a list of prior distributions for the associated parameters. The abbreviation "Gamm" denotes the gamma prior distribution, while the abbreviation "Norm" denotes the normal prior distribution. For example, the call R> prior <-spT.priors(model, inv.var.prior = Gamm(a = 2, b = 1), + beta.prior = Norm(0, 10^10), rho.prior = Norm(0, 10^10)) will specify independent gamma prior distribution with parameters 2 and 1 for the inverse of the each variance component, and each of the regression parameters β and the autoregressive parameter ρ will be assigned an independent normal prior distribution with mean 0 and variance 10 10 . A default proper prior with large variance will be assumed if prior for any parameter is missing.
The spT.decay function is used to assign prior distribution for the spatial decay parameter φ in one of the following three possible ways: 1. Fixed: This choice fixes φ at a particular value, which is achieved by writing distribution = "FIXED" in the argument. For example, for fixing φ at 0.01 we write: spatial.decay = spT.decay(distribution = "FIXED", value = 0.01) The default initial value, 3/d max , will be assumed if the particular value is not provided. This "FIXED" option is the package default.

Uniform distribution:
This option corresponds to assuming a discrete uniform prior for φ in a specified interval. A typical specification is provided below: spatial.decay = spT.decay(distribution=Unif(.01,.02), npoints=5) where the npoints argument specifies the number of support points in the prior distribution under this option. Default value for npoints is 5. At the moment it is not possible to specify a continuous uniform distribution as the prior for the decay parameter.
3. Gamma distribution: The gamma prior distribution for φ can be assumed by the statement: spatial.decay = spT.decay(distribution = Gamm(a = 2, b = 1), tuning = 0.08) where the tuning parameter specifies the standard deviation of the normal proposal distribution centered at the current value for the random-walk Metropolis sampling algorithm implemented for sampling φ on the log-scale.
The function spT.initials is used to gather initial values for the parameters in the model.
where, m is the total number of observations we want to validate, z i is the data indexed by i, z i is the prediction value,z andz p are the arithmetic mean of the observations and predictions respectively.

Simulation study
The main purpose of this example is to validate the body of code underpinning the package spTimer. The code for carrying out the main inference tasks: estimation of model parameters and predictions are validated for each of three models GP, AR and GPP. This section reports the results from experiments on larger data sets using GPP-based models. We examine different scenarios, for example an intercept only model and a model with four covariates to test inference of the regression (β) parameters. We also provide a sensitivity analysis for different signal-to-noise ratios and compare the performance of spTimer with spBayes and mgcv, which adopts a non-Bayesian framework. The code lines for doing this analysis are provided in the accompanying R file, v63i15.R.

Simulation design and results
The spatial domain of the simulation study is taken as a square ranging from zero to 1000 units, and for fitting GP and AR models we suppose that the sampling locations form a regular grid of 12 × 12 = 144 points (see Figure 1(a)) inside the square. For the GPPbased approximation model we consider 55 × 55 = 3025 grid points inside the square, that is moderately large (see Figure 1(b)). The number of knots are defined for the GPP-based models is a 10 × 10 square grid inside the range. The temporal domain is taken as 365 days in a year. Hence, for GP and AR models we obtain 52, 560 (= 144 × 365) observations in total, and for the GPP model we generate a data set with 1, 104, 125 (= 3025 × 365) observations.
Out of these data locations (144 for the GP and AR models; and 3025 for the GPP-based model), we randomly choose 10% locations, as validation sites and model data from the remaining 90% sites. Hence, we model data from 129 locations for the GP and AR models   Table 2: True values of the parameters and their estimates using the summary statistics of the MCMC samples for all three models. The column Median represents the posterior median and a 95% credible interval is also provided for each of the parameters. and validate them for the set aside data from the remaining 15 locations. Similarly for the GPP-based models we set aside data from 300 locations for validations and model data from the remaining 2725 locations. We also assume that 5% data are missing at random for the GP, AR and GPP-based models. This is done to test the missing data handling capabilities of the Bayesian approach which is very common in practice. The Euclidean distance is used in the simulation study while the real life example in the next section uses the geodetic distance for a geographic spatial domain. We illustrate throughout using the exponential covariance function (ν = 0.5) for both simulation and model fitting, although we have validated using all other covariance functions.
Three different data sets are simulated from the three models using the following values of the model parameters.
We assume that four covariate effects are captured including the intercept term, and take the true value of the parameters as β 0 = 5.0, β 1 = 2.0, β 2 = 1.0 and β 3 = 0.5. The covariates x 1 , . . . , x 3 are generated from the standard normal distribution. The spatial effect variance, i.e., the signal σ 2 η under GP, AR and GPP-based models is assumed to be randomly chosen from the uniform distribution in (0, 1) and the pure error (or nugget effect) variance is set at σ 2 = 0.01. We have also provided a sensitivity analysis of the signal-to-noise ratio later in this section. The spatial decay parameter φ is assumed to be 0.003 that implies high spatial correlation even at large distances. The temporal auto-correlation parameter, ρ is set at 0.2 for the GP and GPP-based models. This moderate value was deemed to be enough for simulating space-time dependent data over and above the previously assumed high spatial correlation. For the AR model, the initial mean and variance for O l (s i , 0) are taken to be 5.0 and 0.5 for l = 1. The number of knots for the GPP-based model is taken to be 100 (see Figure 1(b) for their locations) and we assume that the initial values for w l (s i , 0) follow the normal distribution with 0 and variance 0.5 independently for l = 1 under this model.
In this study we illustrate throughout with the Metropolis-Hastings algorithm for simulating from the conditional distribution of the spatial decay parameter since that was found to be the best among all three methods available in the package (Bakar 2012). In all our MCMC implementations we use a small number of iterations to choose the tuning parameter (mostly by trial and error) that achieves an acceptance rate between 15 to 40%, see Section 2.6. For the simulation example, the Gibbs sampler is then run for 5,000 iterations for making inference with first 1,000 iterations as burn-in. We replicate the experiment 25 times and store the MCMC iterates of the model parameters, see Table 2 for the summary statistics.
The Gibbs sampler produced estimates of the parameters that were very close to the true simulation values, see Table 2. Moreover, the acceptance rate for sampling the decay parameter φ was also reasonable.

Fitted plots and missing values
Figure 2(a) shows the fitted mean surface plots for time point 5 with the GPP-based model. Note that, in this simulation example we only take 30 time points to reduce the computational burden of model fitting. The fitted surface plot for the mean is obtained using spTimer S3 class function plot, by using the argument surface inside the function. The argument surface is set to "Mean" or "SD" to plot the fitted mean or standard deviations respectively. It is also possible to change the color pallete of the surface plot using the argument col; for example, we can use package colorspace (Zeileis, Hornik, and Murrell 2009) and write col = rainbow_hcl(100, start = 200, end = 0). In addition, argument a3d = TRUE can be used to obtain a 3-dimensional plot of the fitted observations. Moreover, contours can be added to the plots. The accompanying R file, v63i15.R contains the detailed code for doing this by sending the fitted model output to two new functions plot.spT and contour.spT which are also provided. Note that these functions will require the akima (Akima 2013) package. In Figure 2(b) we provide a residual (z −ẑ) surface plot. We observe that the residual surface plot varies from −0.4 to 0.4 and in most places the fitted values are close to the true observations.
A time series plot of the fitted values is shown in Figure 3(a), for the first location in the simulated data set, using the GPP model. We see that the fitted mean is very close to the true value, and in addition it also shows that the estimates of the missing values are very close to the true value. We also provide residual plots for 4 randomly chosen locations in Figure 3(b), where we see the time series plots for the error are close to zero without any visible dependence structure. The missing values are indicated by circles, which are fitted using the Bayesian modeling discussed in Section 2.

Sensitivity of the signal-to-noise ratio (SNR)
We also check the sensitivity of the fit for different signal-to-noise ratio (SNR), i.e., ζ = σ 2 η /σ 2 , through the simulated data. We use 3 different scenarios: (1) σ 2 η and σ 2 are same, i.e., ζ = 1, (2) when σ 2 η is 10 times higher than σ 2 , i.e., ζ = 10 and (3) when ζ = 15. We do not allow ζ to be less than one since it is very unusual for the nugget effect, σ 2 , to be larger than the spatial variance, σ 2 η . We simulate data sets in 144 locations for 365 days for the intercept only model defined earlier in Section 4.1. We also replicate the procedure 25 times and obtain the MCMC results for the variance parameters. Figure 4 shows density plots of ζ for the MCMC samples obtained from the GP model. The true value of ζ is shown using a vertical line. We observe, the distribution of ζ's for different scenarios include its true value. The MCMC summary statistics for β 0 , i.e., the intercept coefficient is given in Table 3. Note that the 95% credible intervals are almost totally unaffected by the value of ζ. True β 0 Median 95% Interval ζ = 1 5.0 5.01 (4.96, 5.07) ζ = 10 5.0 4.98 (4.94, 5.03) ζ = 15 5.0 5.02 (4.97, 5.08) Table 3: Summary statistics for β 0 for different scenarios of signal-to-noise ratio (ζ).

Comparison study
In this section we compare the performance of spTimer models with spBayes and also with a non-Bayesian model, e.g., the additive model as implemented in the mgcv package. From a modeling perspective we particularly focus on the issues of model fitting, prediction and computation time. To proceed, we start with a simple time series data having just 5 time points and then increase this to 10, 20 and 60. We simulate 49 locations (see Figure 5(a)) from the spatial domain defined in Section 4.1; where in addition to the grid points, the sampling locations are also chosen randomly, see Figure 5(b). We use the same model parameters as defined earlier in Section 4.1. Note that in this section we consider the intercept only model and obtain 4,000 MCMC samples after discarding first 1,000 samples as burn-in. Figure 6 represents the density plots for the coefficients used in the models. As expected, we observe that all three packages provide estimates of the β 0 parameter close to the true value. However, we also observe that the estimate obtained from spTimer has a shorter length credible interval than that using the spBayes package. Table 4 shows a comparison of approximate computation times for the GP models using spTimer, and spBayes for data sets with different length of time series. Computation times for using the mgcv package are not included in this table since that package does not require  an iterative fitting method such as MCMC. This table clearly shows that spTimer provides faster model fitting than spBayes when the number of time points in the data is greater than one. Computation times for implementing the model when there is data for exactly one time point are comparable for the spTimer and spBayes packages and hence are not shown here.
To compare the off-site predictive performance we set aside 20% data for validation purposes.
We  Table 5: Comparison of validation statistics, mean squared error (MSE) and mean absolute error (MAE) obtained from spTimer, spBayes and mgcv models. the models. We observe that both the validation criteria, MSE and MAE, are smallest for the spTimer model for the spatio-temporal data, see Table 5. Note that we omit the predictive results for spBayes at T = 60, because it is computationally prohibitive as it does not finish computations for predictions even after running continuously for several days, although the model fitting job finishes within a day as reported in Table 4. We also provide further comparison of predictive performance of the models using the real life data example in Section 5.

A practical example
We use a real life data set, previously analysed by Sahu and Bakar (2012a), on daily maximum 8-hour average ground level ozone concentration for the months of July and August in 2006, observed at 28 monitoring sites in the state of New York. We consider three important covariates: maximum temperature (cMAXTEMP in degree Celsius), wind speed (WDSP in nautical miles) and percentage average relative humidity (RH) for building a spatio-temporal model for ozone concentration. Further details regarding the covariate values and their spatial interpolation are provided in Bakar (2012). Figure 7 represents a map of the study region together with the 28 monitoring locations of which 8 have been set aside for model validation purposes. Moreover, we also set aside the data for the last 2 days (August 30 and 31) for validating the temporal forecasts. The following set of code lines are used for data preparation.
To fit GP model using spTimer we use the following code: R> set.seed(11) R> post.gp <-spT.Gibbs(formula = o8hrmax~cMAXTMP + WDSP + RH, + data = DataFit, model = "GP", + coords =~Longitude + Latitude, scale.transform = "SQRT", + spatial.decay = spT.decay(distribution = Gamm(2, 1), tuning = 0.1)) A number of remarks are in order. The fitted model is the GP model, see Equations 2 and 3. The linear (covariate) part of the model is specified by the formula argument that automatically includes the intercept. Secondly, the square-root transformation is used, on the fly, to stabilize the variance (Sahu et al. 2007;Sahu and Bakar 2012a). The initial values and the values of the hyper-parameters for the prior distributions are assumed by default. Moreover, by default MCMC is run for 5,000 further iterations after discarding first 1,000. The spTimer user manual lists all the defaults and the ways to change them.
The package spTimer provides the usual R print and summary commands for obtaining summaries of the model fit. Here is some sample output: We can also view the MCMC trace plots of the parameters using the plot function as: R> plot(post.gp) Figure 8 shows the MCMC trace plots for the four model parameters for 15000 iterations. Further MCMC diagnostics can be done using coda (Plummer, Best, Cowles, and Vines 2012). For example, one can use the code R> autocorr.diag(as.mcmc(post.gp)) to generate the autocorrelation plots. One can also obtain residual plots using the function plot with the additional argument residuals = TRUE as: R> plot(post.gp, residuals = TRUE) However, none of these plots are included here for brevity.
The predictive model choice criteria (PMCC) described in Section 2.6 is obtained as post.gp$PMCC. We obtain the parameter estimates from the MCMC samples using the familiar summary command: These parameter estimates show that except for RH, all regression coefficients are statistically significant for the GP model since the 95% credible intervals do not contain zero. The estimate of the spatial decay parameter φ = 0.007 implies an effective range of 427 kilometers. We also observe that, as expected, the spatial variance σ 2 η is higher than the nugget effect σ 2 . Prediction capabilities of spTimer are explored using the predict function, as noted in Table 1. After the above model fitting we can use the following code lines to perform and examine prediction. The package spTimer can also perform temporal forecasting at both observed and unobserved locations using the same predict function when it is called with the additional argument type = "temporal" -details are provided in the package manual. A predictive map can be drawn using the predictive output obtained from the predict function. Figure 9 provides such maps for daily ozone concentration levels and their standard deviations on August 29, 2006. The accompanying R file v63i15.R provides the code to produce these figures.
We conclude this example with a comparison study with the non-Bayesian generalized additive models (Hastie and Tibshirani 1990) using the R package mgcv (Wood 2006) as suggested by a reviewer. The GP models, with typical code as given above, are fitted to the data from the 20 fitting sites and then spatial predictions are obtained for the 8 set aside validation sites. The code for implementing this comparison study is provided in the accompanying R file, v63i15.R.   For the additive model we use the gam function and fit additive model and report the predictive performance of the model, see Table 6. We observe that the MSE for the Bayesian space-time GP model is reduced by about 56% compared to the generalized additive models, showing superiority of the spatio-temporal models implemented in the package spTimer. Further similar comparison studies have been performed by Crimp, Bakar, Kokic, Jin, Nicholls, and Howden (2015), where spTimer GP model also shows better predictive performance compared to the additive models for analyzing frost levels in south-east Australia.

Summary
This paper introduces the contributed R package spTimer that enables model fitting, spatial and temporal predictions for large structured point-referenced spatio-temporal data sets. Currently, the package is able to analyze data using three substantial and well established spatio-temporal models. The package also includes a number of attractive features ranging from on the fly transformation to the ability to infer for certain temporal aggregates. The main body of the code has been validated using a substantial simulation example and a real life data example.
The underlying code for the package has been written using the C programming language that is very portable across many different operating systems such as Microsoft Windows, Linux and Macintosh. The end-user, however, does not need to work with the C language as all the analysis can be performed by using commands in R. The MCMC-based Bayesian hierarchical modeling, as implemented in the package, is relatively fast for moderate (a few thousand) to large (more than a million) data sets. In particular, the GPP-based model is the fastest to run as we have reported in our related work (Sahu and Bakar 2012b).
The package can be extended in several ways, for example, for modeling multivariate data and for modeling data with a non-Gaussian first stage model. In addition, it will be very fruitful to add modeling capabilities for spatially varying coefficient process models. Other possible extensions include the ability to handle data from sensor networks that vary over time. Moreover, the implemented models can be enhanced to model mixture of discrete and continuous data such as rainfall. Lastly, extension of the package for handling spatially mis-aligned data will also be of considerable interest in the literature.

B. Full conditional distributions for the AR model
The full conditional distribution of β can be obtained from the joint posterior distribution of AR models (7) as: π(β|..., z) ∼ N (∆χ, ∆), where: