Spatio-Temporal Areal Unit Modeling in R with Conditional Autoregressive Priors Using the CARBayesST Package

Spatial data relating to non-overlapping areal units are prevalent in fields such as economics, environmental science, epidemiology and social science, and a large suite of modeling tools have been developed for analysing these data. Many utilize conditional autoregressive (CAR) priors to capture the spatial autocorrelation inherent in these data, and software packages such as CARBayes and R-INLA have been developed to make these models easily accessible to others. Such spatial data are typically available for multiple time periods, and the development of methodology for capturing temporally changing spatial dynamics is the focus of much current research. A sizeable proportion of this literature has focused on extending CAR priors to the spatio-temporal domain, and this article presents the R package CARBayesST, which is the first dedicated software package for spatio-temporal areal unit modeling with conditional autoregressive priors. The software package allows to fit a range of models focused on different aspects of spacetime modeling, including estimation of overall space and time trends, and the identification of clusters of areal units that exhibit elevated values. This paper outlines the class of models that the software package implement, before applying them to simulated and two real examples from the fields of epidemiology and housing market analysis.


Introduction
Areal unit data are a type of spatial data where the observations relate to a set of K contiguous but non-overlapping areal units, such as electoral wards or census tracts.Each observation relates to an entire areal unit, and thus is typically a summary measure such as an average, proportion, or total, for the entire unit.Examples include the total yield in sectors in an agricultural field trial (Besag and Higdon 1999), the proportion of people who are Catholic in lower super output areas in Northern Ireland (Lee, Minton, and Pryce 2015), the average score on SAT college entrance exams across US states (Wall 2004), and the total number of cases of chronic obstructive pulmonary disease from populations living in counties in Georgia, USA (Choi and Lawson 2011).Areal unit data such as these have become increasingly available in recent times, due to the creation of databases such as Scottish Statistics (http://statistics.gov.scot/), the Health and Social Care Information Centre Indicator Portal (http://www.hscic.gov.uk/indicatorportal), and cancer registries such as the Surveillance Epidemiology and End Results program (http://seer.cancer.gov/).These databases provide data on a set of K areal units for N consecutive time periods, yielding a rectangular array of K × N spatio-temporal observations.The motivations for modeling these data are varied, and include estimating the effect of a risk factor on a response (see Wakefield 2007 andLee, Ferguson, andMitchell 2009), identifying clusters of contiguous areal units that exhibit an elevated risk of disease compared with neighboring areas (see Charras-Garrido, Abrial, andde Goer 2012 andAnderson, Lee, andDean 2014), and quantifying the level of segregation in a city between two or more different groups (see Lee et al. 2015).As a result different space-time structures have been proposed for modeling spatio-temporal data, which depend on the underlying question(s) of interest and the goals of the analysis.
However, a common challenge when modeling these data is that of spatio-temporal autocorrelation, namely that observations from geographically close areal units and temporally close time periods tend to have more similar values than units and time periods that are further apart.Temporal autocorrelation occurs because the data relate to largely the same populations in consecutive time periods, while spatial autocorrelation can arise for a number of reasons.The first is unmeasured confounding, which occurs when a spatially patterned risk factor for the response variable is not included in a regression model, and hence its omission induces spatial structure into the residuals.Other causes of spatial autocorrelation include neighborhood effects, where the behaviors of individuals in an areal unit are influenced by individuals in adjacent units, and grouping effects where groups of people with similar behaviors choose to live close together.
Predominantly, a Bayesian approach is taken to modeling these data in the statistical community, where the spatio-temporal structure is modeled via sets of autocorrelated random effects.Conditional autoregressive (CAR; Besag, York, and Mollié 1991) priors and spatio-temporal extensions thereof are typically assigned to these random effects to capture this autocorrelation, which are special cases of a Gaussian Markov random field (GMRF).A range of models have been proposed with different space-time structures, which can be used to answer different questions of interest about the data.For example, Knorr-Held (2000) proposed an ANOVA style decomposition of the data into overall spatial and temporal trends and a spacetime interaction, which allows common patterns such as the region-wide average temporal trend to be estimated.In contrast, Li, Best, Hansell, Ahmed, and Richardson (2012) developed a model for identifying areas that exhibited unusual trends that were different from the overall region-wide trend, while Lee and Lawson (2016) presented a model for identifying spatio-temporal clustering in the data.
An array of freely available software packages can now implement purely spatial areal unit models, ranging from general purpose statistical modeling software such as BUGS (Lunn, Spiegelhalter, Thomas, and Best 2009) and R-INLA (Rue, Martino, and Chopin 2009), to specialist spatial modeling packages in the statistical software environment R (R Core Team 2017) such as CARBayes (Lee 2013), spatcounts (Schabenberger 2009) and spdep (Bivand and Piras 2015).Due to the flexibility of general purpose software, implementing spatial models, in say BUGS, requires a degree of programming that is non-trivial for applied researchers.
Specialist software for spatio-temporal modeling is much less well developed, with examples for geostatistical data including spTimer (Bakar and Sahu 2015) and spBayes (Finley, Banerjee, and Gelfand 2015).For areal unit data the surveillance package (Meyer, Held, and Höhle 2017) models epidemic data, the plm (Croissant and Millo 2008) and splm (Millo and Piras 2012) packages model panel data, while the nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2015) and lme4 (Bates, Mächler, Bolker, and Walker 2015) packages have functionality to model spatial and temporal random effects structures.However, software packages to fit a range of spatio-temporal areal unit models with CAR type autocorrelation structures are not available.This has motivated us to develop the R package CARBayesST (Lee, Rushworth, and Napier 2018).
Package CARBayesST can fit models with different spatio-temporal structures.This enables the user to answer different questions about their data within a common software environment.A number of models can be implemented, including: (i) a spatially varying linear time trends model (similar to Bernardinelli, Clayton, Pascutto, Montomoli, Ghislandi, and Songini 1995); (ii) a spatial and temporal main effects and interaction model (similar to Knorr-Held 2000); (iii) a spatially autocorrelated autoregressive time series model (Rushworth, Lee, and Mitchell 2014); (iv) a model with a common temporal trend but varying spatial surfaces (similar to Napier, Lee, Robertson, Lawson, and Pollock 2016); (v) a spatially adaptive smoothing model for localized spatial smoothing (Rushworth, Lee, and Sarran 2017); and (vi) a spatiotemporal clustering model (Lee and Lawson 2016).The package has the same syntax and feel as the R package CARBayes for spatial areal unit modeling, and retains all of its easy-touse features.These include specifying the spatial adjacency information via a single matrix (unlike BUGS that requires 3 separate list objects), fitting models via a one-line function call, and compatibility with package CARBayes that allows it to share the latter's model summary functionality for interpreting the results.The models available in this software package can be fitted to binomial, Gaussian (in some cases) or Poisson data, and Section 2 summarizes the models that can be implemented.Section 3 provides an overview of the software package and its functionality, while Section 4 presents simulation results to illustrate the correctness of the CARBayesST implementation of one of the models.Sections 5 and 6 present two worked examples illustrating how to use the package for epidemiological and housing market research, while Section 7 concludes with a summary of planned future development for this package.

Spatio-temporal models for areal unit data
This section outlines the class of Bayesian hierarchical models that package CARBayesST can implement, where in all cases inference is based on Markov chain Monte Carlo (MCMC) simulation.The first part of this section outlines the general hierarchical model that can be fitted, while the second describes the range of space-time random effects structures that are available.

General Bayesian hierarchical model
The study region comprises a set of k = 1, . . ., K non-overlapping areal units S = {S 1 , . . ., S K }, and data are recorded for each unit for t = 1, . . ., N consecutive time periods.Thus data are available for a K × N rectangular array with K rows (spatial units) and N columns (time periods).The response data are denoted by Y = (Y 1 , . . ., Y N ) K×N , where Y t = (Y 1t , . . ., Y Kt ) denotes the K × 1 column vector of observations for all K spatial units for time period t.Next, a vector of known offsets are denoted by O = (O 1 , . . ., O N ) K×N , where similarly O t = (O 1t , . . ., O Kt ) denotes the K × 1 column vector of offsets for time period t.Finally, x kt = (x kt1 , . . ., x ktp ) is a vector of p known covariates for areal unit k and time period t, and can include factors or continuous variables and a column of ones for the intercept term.Note, non-linear covariate-response relationships can be included by adding transformations (e.g., squared) or spline basis functions (e.g., using ns()) of covariates to x kt .Package CARBayesST can fit a generalized linear mixed model to these data, whose general form is: (1) The vector of covariate regression parameters are denoted by β = (β 1 , . . ., β p ), and a multivariate Gaussian prior is assumed with mean µ β and diagonal variance matrix Σ β that can be chosen by the user.The ψ kt term is a latent component for areal unit k and time period t encompassing one or more sets of spatio-temporally autocorrelated random effects, and the complete set are denoted by ψ = (ψ 1 , . . ., ψ N ), where ψ t = (ψ 1t , . . ., ψ Kt ).Package CARBayesST can fit a number of different spatio-temporal structures for ψ kt , all of which are outlined in Section 2.2 below.The package can implement Equation 1 for binomial, Gaussian and Poisson data, and the exact specifications of each are given below: • Binomial: In the binomial model (n kt , θ kt ) respectively denote the number of trials and the probability of success in each trial in area k and time period t, while in the Gaussian model ν 2 is the observation variance.An Inverse-Gamma(a, b) prior is specified for ν 2 , and default values of (a = 1, b = 0.01) are specified in the software package but can be changed by the user.

Spatio-temporal random effects models
All models implementable in this package induce spatio-temporal autocorrelation into the response data Y via the latent random effects ψ, using CAR-type prior distributions and spatio-temporal extensions thereof.Spatial autocorrelation is controlled by a symmetric nonnegative K × K neighborhood, weight or adjacency matrix W = (w kj ), where w kj represents the spatial closeness between areal units (S k , S j ).Larger valued elements represent spatial closeness between the two areas in question, whereas smaller or zero values correspond to areas that are not spatially close.Typically, W is assumed to be binary, where w kj = 1 if areal units (S k , S j ) share a common border (i.e., are spatially close) and is zero otherwise.Additionally, w kk = 0.Under this binary specification the values of (ψ kt , ψ jt ) for spatially adjacent areal units (where w kj = 1) are spatially autocorrelated, whereas values for nonneighboring areal units (where w kj = 0) are conditionally independent given the remaining {ψ it } values.This binary specification of W based on sharing a common border is the most commonly used for areal data, but the only requirement by package CARBayesST is for W to be symmetric, non-negative, and for each row sum to be greater than zero.Similarly, the model ST.CARanova() defined below uses a binary N × N temporal neighborhood matrix D = (d tj ), where d tj = 1 if |j − t| = 1 and d tj = 0 otherwise.Package CARBayesST can fit the models for ψ outlined in Table 1, and full details for each one are given below.This set of models allows users to fit different spatio-temporal structures to their data and thus examine different underlying hypotheses.Out of these models ST.CARlinear(), ST.CARanova() and ST.CARar() are the simplest in terms of parsimony, and thus missing (NA) values are allowed in the response data (Y) for these models.Missing values are not allowed in the remaining models as they have more complex forms, and exploratory simulation-based testing showed that missing values could not be well recovered in these cases.

ST.CARlinear()
This model is a modification of that proposed by Bernardinelli et al. (1995), and estimates separate but correlated linear time trends for each areal unit.Thus it is appropriate if the goal of the analysis is to estimate which areas are exhibiting increasing or decreasing (linear) trends in the response over time.The full model specification is given below. where Here t = (1/N ) N t=1 t, and thus the modified linear temporal trend covariate is t * = (t − t)/N , and runs over a centered unit interval.Thus areal unit k has its own linear time trend, with a spatially varying intercept β 1 + φ k and a spatially varying slope α + δ k .Note, the β 1 term comes from the covariate component x kt β in Equation 1.Each set of random effects φ = (φ 1 , . . ., φ K ) and δ = (δ 1 , . . ., δ K ) are modeled as spatially autocorrelated by the CAR prior proposed by Leroux et al. (2000), and are mean centered, that is K j=1 φ j = K j=1 δ j = 0.Here (ρ int , ρ slo ) are spatial dependence parameters, with values of one corresponding to strong spatial smoothness that is equivalent to the intrinsic CAR prior proposed by Besag et al. (1991), while values of zero correspond to independence (for example if ρ slo = 0 then δ k ∼ N(0, τ 2 slo )).Flat uniform priors on the unit interval are Model Eq.Description ST.CARlinear() 2 This model is similar to that proposed by Bernardinelli et al. (1995), and represents the spatio-temporal pattern in the mean response with spatially varying linear time trends.The slope and intercept parameters for each area's temporal trend are spatially autocorrelated, by assigning each the conditional autoregressive prior proposed by Leroux, Lei, and Breslow (2000).

ST.CARanova() 3
This model is similar to that proposed by Knorr-Held (2000), and represents the spatio-temporal pattern in the mean response with an ANOVA style decomposition into overall spatial and temporal main effects and a space-time interaction.The spatial and temporal main effects are modeled by the conditional autoregressive prior proposed by Leroux et al. (2000), while the interactions are assumed to be independent.

ST.CARsepspatial() 4
This model is that proposed by Napier et al. (2016), and represents the spatio-temporal pattern in the mean response with an overall temporal effect and separate independent spatial effects for each time period.Each time-period's spatial effect as well as the overall temporal effect are modeled by the conditional autoregressive prior proposed by Leroux et al. (2000).

ST.CARar() 5
This model is that proposed by Rushworth et al. (2014), and represents the spatio-temporal pattern in the mean response with a single set of spatially and temporally autocorrelated random effects.The effects follow a multivariate autoregressive process of order 1, where spatial autocorrelation is induced via the precision matrix based on the conditional autoregressive prior proposed by Leroux et al. (2000).

ST.CARadaptive() 7
This model is that proposed by Rushworth et al. (2017), and has the same spatio-temporal random effect structure as the ST.CARar() model.Additionally, the spatially autocorrelated precision matrix based on the adjacency matrix W is estimated in the model, by means of treating those elements of W that correspond to spatially adjacent areas as random quantities on the unit interval.This allows adaptive levels of spatial smoothness in the random effects surface.

ST.CARlocalised() 9
This model is that proposed by Lee and Lawson (2016), and has the same spatio-temporal random effect structure as the ST.CARar() model.The difference is that it additionally has a piecewise constant intercept term with a maximum of G different levels, which can capture step-changes in disease risk between two spatially adjacent areas.
Table 1: Summary of the models available in the CARBayesST package together with the equation numbers (Eq.) defining them mathematically in this paper.
specified for the spatial dependence parameters (ρ int , ρ slo ), while conjugate Inverse-Gamma and Gaussian priors are specified for the random effects variances (τ 2 int , τ 2 slo ) and the overall slope parameter α respectively.The corresponding hyperparameters (a, b, µ α , σ 2 α ) can be chosen by the user, and the default values specified by the software are (a = 1, b = 0.01, µ α = 0, σ 2 α = 1000), which correspond to weakly informative prior distributions.Alternatively, the dependence parameters (ρ int , ρ slo ) can be fixed at values in the unit interval [0, 1] rather than being estimated in the model, by specifying arguments to the ST.CARlinear() function.For example, using the arguments fix.rho.slo= TRUE, rho.slo = 1 sets ρ slo = 1 when fitting the model.Finally, missing (NA) values are allowed in the response data Y for this model.

ST.CARanova()
The model is a modification of that proposed by Knorr-Held (2000), and decomposes the spatio-temporal variation in the data into 3 components, an overall spatial effect common to all time periods, an overall temporal trend common to all spatial units, and a set of independent space-time interactions.Thus this model is appropriate if the goal is to estimate overall time trends and spatial patterns.The model specification is given below. (3) Here the spatio-temporal autocorrelation is modeled by a common set of spatial random effects φ = (φ 1 , . . ., φ K ) and a common set of temporal random effects δ = (δ 1 , . . ., δ N ), and both are modeled by the CAR prior proposed by Leroux et al. (2000).Additionally, the model can incorporate an optional set of independent space-time interactions γ = (γ 11 , . . ., γ KN ), which can be specified by the argument interaction = TRUE (the default) in the function call.All sets of random effects are mean centered.Fixed uniform (ρ S , ρ T ) or conjugate (τ 2 S , τ 2 T , τ 2 I ) priors are specified for the remaining parameters, and the default specifications for the latter are (a = 1, b = 0.01).Alternatively, in common with the ST.CARlinear() function the dependence parameters (ρ S , ρ T ) can be fixed at values in the unit interval [0, 1] rather than being estimated in the model, for full details see the help file for this function.Finally, missing (NA) values are allowed in the response data Y for this model.

ST.CARsepspatial()
The model is a generalization of that proposed by Napier et al. (2016), and represents the data by two components, an overall temporal trend, and separate spatial surfaces for each time period that share a common spatial dependence parameter but have different spatial variances.This model is appropriate if the goal is to estimate both a common overall temporal trend and the extent to which the spatial variation in the response has changed over time.The model specification is given below. (4) where φ −kt = (φ 1,t , . . ., φ k−1,t , φ k+1,t , . . ., φ K,t ).This model fits an overall temporal trend to the data δ = (δ 1 , . . ., δ N ) that is common to all areal units, which is augmented with a separate (uncorrelated) spatial surface φ t = (φ 1t , . . ., φ Kt ) at each time period t.The overall temporal trend and each spatial surface are modeled by the CAR prior proposed by Leroux et al. (2000), and the latter have a common spatial dependence parameter ρ S but a temporally-varying variance parameter τ 2 t .Thus the collection (τ 2 1 , . . ., τ 2 N ) allows one to examine the extent to which the magnitude of the spatial variation in the data has changed over time.Note that here we fix ρ S to be constant in time as it is not orthogonal to τ 2 t , thus if it varied then any changes in τ 2 t would not directly correspond to changes in spatial variance over time.As with all other models the random effects are zero-mean centered, while flat and conjugate priors are specified for (ρ S , ρ T ) and (τ 2 T , τ 2 1 , . . ., τ 2 N ) respectively with (a = 1, b = 0.01) being the default values.Alternatively, in common with the ST.CARlinear() function, the dependence parameters (ρ S , ρ T ) can be fixed at values in the unit interval [0, 1] rather than being estimated in the model.

ST.CARar()
The model is that proposed by Rushworth et al. (2014), and represents the spatio-temporal structure with a multivariate first order autoregressive process with a spatially correlated precision matrix.This model is appropriate if one wishes to estimate the evolution of the spatial response surface over time without forcing it to be the same for each time period.The model specification is given below. (5) In this model φ t = (φ 1t , . . ., φ Kt ) is the vector of random effects for time period t, which evolve over time via a multivariate first order autoregressive process with temporal autoregressive parameter ρ T .The temporal autocorrelation is thus induced via the mean ρ T φ t−1 , while spatial autocorrelation is induced by the variance τ 2 Q(W, ρ S ) −1 .The corresponding precision matrix Q(W, ρ S ) was proposed by Leroux et al. (2000) and corresponds to the CAR models used in the other models above.The algebraic form of this matrix is given by where 1 is the K × 1 vector of ones while I is the K × K identity matrix.In common with all other models the random effects are zero-mean centered, while flat and conjugate priors are specified for (ρ S , ρ T ) and τ 2 respectively, with (a = 1, b = 0.01) being the default values for the latter.In common with the ST.CARanova() function the dependence parameters (ρ S , ρ T ) can be fixed at values in the unit interval [0, 1] rather than being estimated in the model.Finally, missing (NA) values are allowed in the response data Y for this model.

ST.CARadaptive()
The model is that proposed by Rushworth et al. (2017), and is an extension of ST.CARar() proposed by Rushworth et al. (2014) to allow for spatially adaptive smoothing.It is appropriate if one believes that the residual spatial autocorrelation in the response after accounting for the covariates is consistent over time but has a localized structure.That is, it is strong in some parts of the study region but weak in others.The model has the same autoregressive random effects structure as the previous model ST.CARar(), namely: However, the random effects from ST.CARar() have a single level of spatial dependence that is controlled by ρ S .All pairs of adjacent areal units will have strongly autocorrelated random effects if ρ S is close to one, while no such spatial dependence will exist anywhere if ρ S is close to zero.However, real data may exhibit spatially varying dependencies, as two adjacent areal units may exhibit similar values suggesting a value of ρ S close to one, while another adjacent pair may exhibit very different values suggesting a value of ρ S close to zero.
This model allows for localized spatial autocorrelation by allowing spatially neighboring random effects to be correlated (inducing smoothness) or conditionally independent (no smoothing), which is achieved by modeling the non-zero elements of the neighborhood matrix W as unknown parameters, rather than assuming they are fixed constants.These adjacency parameters are collectively denoted by w + = {w kj |k ∼ j}, where k ∼ j means areas (k, j) share a common border.Estimating w kj ∈ w + as equal to zero means (φ kt , φ jt ) are conditionally independent for all time periods t given the remaining random effects, while estimating it close to one means they are correlated.The adjacency parameters in w + are each modeled on the unit interval, by assuming a multivariate Gaussian prior distribution on the transformed scale v + = log w + /(1 − w + ) .This prior is a shrinkage model with a constant mean µ and a diagonal variance matrix with variance parameter ζ 2 , and is given by The prior distribution for v + assumes that the degree of smoothing between pairs of adjacent random effects is not spatially dependent, which results from the work of Rushworth et al. (2017) that shows poor estimation performance when v + (and hence w + ) is assumed to be spatially autocorrelated.Under small values of τ 2 w the elements of v + are shrunk to µ, and here we follow the work of Rushworth et al. (2017) and fix µ = 15 because it avoids numerical issues when transforming between v + and w + and implies a prior preference for values of w kj close to 1.That is as τ 2 w → 0 the prior becomes the global smoothing model ST.CARar(), where as when τ 2 w increases both small and large values in w + are supported by the prior.As with the other models the default values for the Inverse-Gamma prior for τ 2 w are (a = 1, b = 0.01).Alternatively, it is possible to fix ρ S using the rhofix argument, e.g., rhofix = 1 fixes ρ S = 1, so that globally the spatial correlation is strong and is altered locally by the estimates of w + .For further details see Rushworth et al. (2017).

ST.CARlocalised()
The model was proposed by Lee and Lawson (2016), and augments the smooth spatiotemporal variation in ST.CARar() with a piecewise constant intercept component.This model is appropriate when the aim of the analysis is to identify clusters of areas that exhibit elevated (or reduced) values of the response compared with their geographical and temporal neighbors.Thus this model is similar to ST.CARadaptive(), in that both relax the restrictive assumption that if two areas are close together then their estimated random effects must be similar.This model captures any step-changes in the response via the mean function, whereas ST.CARadaptive() captures them via the correlation structure (via W).Model ST.CARlocalised() is given by where the " − " in Q(W) − denotes a generalized inverse.The random effects φ = (φ 1 , . . ., φ T ) are modeled by a simplification of the ST.CARar() model with ρ S = 1, which corresponds to the intrinsic CAR model proposed by Besag et al. (1991).Note, for this model the inverse Q(W) −1 does not exist as the precision matrix is singular.This simplification is made so that the random effects capture the globally smooth spatio-temporal autocorrelation in the data, allowing the other component to capture localized clustering and step-changes.
where λ 0 = −∞ and λ G+1 = ∞.Here Z kt ∈ {1, . . ., G} and controls the assignment of the (k, t)th data point to one of the G intercept levels.A penalty based approach is used to model Z kt , where G is chosen larger than necessary and a penalty prior is used to shrink it to the middle intercept level.This middle level is is even, and this penalty ensures that Z kt is only in the extreme low and high risk classes if supported by the data.Thus G is the maximum number of distinct intercept terms allowed in the model, and is not the actual number of intercept terms estimated in the model.The allocation prior is independent across areal units but correlated in time, and is given by: for t = 2, . . ., N, (11) Temporal autocorrelation is induced by the (Z kt − Z k,t−1 ) 2 component of the penalty, while the (Z kt − G * ) 2 component penalizes class indicators Z kt towards the middle risk class G * .The size of this penalty and hence the amount of smoothing that is imparted on Z is controlled by δ, which is assigned a uniform prior.The default value is m = 10, and full details of this model can be found in Lee and Lawson (2016).

Inference
All models in this package are fitted in a Bayesian setting using Markov chain Monte Carlo simulation.All parameters whose full conditional distributions have a closed form distribution are Gibbs sampled, which includes the regression parameters (β) and the random effects (e.g., φ, etc.) in the Gaussian data models, as well as the variance parameters (e.g., τ 2 , etc.) in all models.The remaining parameters are updated using Metropolis or Metropolis-Hastings steps, and the random effects in the binomial and Poisson data models can be updated via the simple Gaussian random walk Metropolis algorithm or the Metropolis adjusted Langevin algorithm (MALA; Roberts and Rosenthal 1998).The default is to use MALA, but the user can choose simple random walk Metropolis steps by specifying the MALA = FALSE argument in the function call.The regression parameters are updated in blocks of size 10 utilizing MALA updates, although if only a single covariate is incorporated in the model or only an intercept term is included then simple random walks are used as they were found to perform better.The remaining parameters utilize simple Gaussian random walk Metropolis updates.The simple random walk Metropolis updates are automatically tuned in the algorithms to have acceptances rates of between 40%-50% for scalar parameter updates, and between 20%-40% for vector parameters.The MALA updates are also automatically tuned in the software to have acceptance rates between 40%-50%.The overall functions that implement the MCMC algorithms are written in R, while the computationally intensive updating steps are written as computationally efficient C++ routines using the R package Rcpp (Eddelbuettel and Francois 2011).Additionally, the sparsity of the neighborhood matrices W and D are utilized via their triplet forms when updating the random effects within the algorithms, which increases the computational efficiency of the software.Finally, we note that the software runs only on one core, and leaves possibilities of parallelization to the user.
As a note of caution, all conclusions from MCMC-based inference in Bayesian models are only valid if the samples generated are well behaved, that is they are realizations from the target posterior distribution.Thus one of the challenges of fitting Bayesian models using any software is determining when the Markov chains have converged, and as a result how many samples to discard as the burn-in period and then how many more to generate on which to base inference.Convergence can be assessed using many metrics, the simplest of which is by eye, by viewing trace-plots of the parameters that should be stationary and show random fluctuations around a single mean level (see Figure 1 for an example of Markov chains showing no evidence against convergence).In addition to this visual check package CARBayesST presents the convergence diagnostic proposed by Geweke (1992)

Loading the software
CARBayesST is a package for the statistical computing environment R (R Core

Using the software
The software can fit six models: ST.CARlinear(), ST.CARanova(), ST.CARsepspatial(), ST.CARar(), ST.CARadaptive() ST.CARlocalised(), and full details of the arguments required for each function are given in the help files.However, the main arguments common to all the functions that are required for a baseline analysis (for example using default priors) are as follows.
• formula: A formula for the covariate part of the model using the same syntax used in the lm() function.Offsets can be included here using the offset() function.The response and each covariate should be vectors of length KT × 1, where each vector is ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial points for time 2 and so on.
• family: The likelihood model which must be one of "binomial", "Gaussian" or "Poisson".
• trials: This is only needed if family = "binomial", and it is a vector of the same length and in the same order as the response containing the total number of trials for each area and time period.
• W: A K × K symmetric and non-negative neighborhood matrix whose row sums must all be positive.Typically a binary specification is used, where the kjth element w kj equals one if areas (S j , S k ) are spatially close (e.g., share a common border) and is zero otherwise.This matrix can be specified by hand or created from a shapefile and data frame using functionality from the CARBayes and spdep packages.
• burn-in: The number of MCMC samples to discard as the burn-in period.
• n.sample:The number of MCMC samples to generate.
When a model has been fitted in package CARBayesST, the following methods are available for the returned fitted model: • coef(): Returns the estimated (posterior median) regression coefficients.
• fitted(): Returns the fitted values based on posterior medians.
• model.matrix():Returns the design matrix of covariates.
• print(): Prints a summary of the fitted model to the screen, including both parameter summaries and convergence diagnostics for the MCMC run.
Additionally, the CARBayes functions summarise.samples()and summarise.lincomb()can be applied to CARBayesST models to summarize the results.The software updates the user on its progress to the R console, which allows the user to monitor the function's progress.However, using the verbose = FALSE option will disable this feature.Once run, each model fitted with this package consists of a list object with the following components.
• summary.results:A summary table of selected parameters that is presented when using the print() function.The table includes the posterior median (Median) and 95% credible interval (2.5%, 97.5%), the number of samples generated (n.sample), the acceptance rate for the Markov chain (% accept), the effective number of independent samples using the effectiveSize() function from the coda package (n.effective), and the convergence diagnostic proposed by Geweke (1992) and implemented in the coda package (Geweke.diag).This diagnostic takes the form of a Z-score, so that convergence is suggested by the statistic being within the range (−1.96, 1.96).
• samples: A list containing the MCMC samples from the model, where each element in the list is a matrix.The names of these matrix objects correspond to the parameters defined in Section 2 of this paper, and each column contains the set of samples for a single parameter.For example, for ST.CARlinear() the (tau2, rho) elements of the list have columns ordered as (τ 2 int , τ 2 slo ) and (ρ 2 int , ρ 2 slo ) respectively.Similarly, for ST.CARanova() the (tau2, rho) elements of the list have columns ordered as (τ 2 S , τ 2 T , τ 2 I ) (the latter only if interaction = TRUE) and (ρ 2 S , ρ 2 T ) respectively.Finally, each model returns samples from the posterior distribution of the fitted values for each data point (fitted).
• fitted.values:A vector of fitted values based on posterior medians for each area and time period in the same order as the data Y.
• residuals: A matrix of 3 types of residuals in the same order as the response.The 3 columns of this matrix correspond to "response" (raw), "pearson", and "deviance" residuals.
• modelfit: Model fit criteria including the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, and Van der Linde 2002) and its corresponding estimated effective number of parameters (p.d.), the Watanabe-Akaike information criterion (WAIC; Watanabe 2010) and its corresponding estimated number of effective parameters (p.w.), and the log marginal predictive likelihood (LMPL; Congdon 2005).Also included is the log-likelihood.The best fitting model is one that minimizes the DIC and WAIC but maximizes the LMPL.
• accept: The acceptance probabilities for the parameters.
• localised.structure:This element is NULL except for the models ST.CARadaptive() and ST.CARlocalised().For ST.CARadaptive() this element is a list with 2 K × K matrices, Wmn and W99, which summarize the estimated adjacency relationships.Wmn contains the posterior median for each w kj element estimated in the model for adjacent areal units, while W99 contains indicator variables for whether P(w jk < 0.5|Y) > 0.99.For both matrices, elements corresponding to non-adjacent pairs of areas have NA values.
For ST.CARlocalised() this element is a vector of length KT ×1, and gives the posterior median class (Z kt value) that each data point is assigned to.This vector is in the same order as the data Y.
• formula: The formula (as a text string) for the covariate and offset part of the model.
• model: A text string describing the model that has been fitted.
• X: The design matrix of covariates inherited from the formula argument.
The remainder of this paper illustrates the CARBayesST package via a small simulation study to illustrate the correctness of the MCMC algorithms, as well as two worked examples, the latter of which utilize spatio-temporal data to answer important questions in public health and the housing market.Note that slightly different results might be obtained depending on the specific computing environment, in particular the linear algebra library used.

Simulation exercises
This section is split into three parts.The first illustrates how to use the software to fit a model, the second presents a short simulation study to illustrate the correctness of the CARBayesST implementation of a model, while the third describes a comparison of run times for various data sizes.All three exercises are based on the ST.CARanova() model, but similar studies could be done for the other models.We note in passing that the correctness of the CARBayesST implementations of the ST.CARsepspatial() ).Generation of the data is described below, and in what follows we fix β = (0, 0.1), ρ S = ρ T = 0.8, τ 2 S = τ 2 T = τ 2 I = 0.01.

Generating data and fitting a model
Consider a spatial region comprising K = 400 areal units on a regular 20×20 grid and N = 20 consecutive time periods.Such a grid can be constructed using the following code.
Similarly, a binary 20 × 20 temporal neighborhood matrix D can be constructed using the following code.
R> Q.W.inv <-solve(Q.W) R> library("mvtnorm") R> phi <-rmvnorm(n = 1, mean = rep(0, K), sigma = (0.01 * Q.W.inv)) R> phi.long <-rep(phi, N) Here the last line repeats the spatial random effects N times, as the ST.CARanova() model assumes that there is a single set of spatial random effects for all time periods.The temporal random effects under the ST.CARanova() model have the same functional form but depend on D rather than W, and thus a realization can be generated analogously using the code: Again, the final line repeats the temporal random effects for each spatial unit.Next, we generate space-time interactions and a covariate x, both of which are generated independently from Gaussian distributions.
R> x <-rnorm(n = N.all, mean = 0, sd = 1) R> gamma <-rnorm(n = N.all, mean = 0, sd = sqrt(0.01)) Finally, we set the intercept term β 1 = 0, the regression coefficient β 2 = 0.1, and the number of trials for the binomial likelihood in each area and time period being n kt = 50.Then we generate the response variable via the code below.Here LP denotes the linear predictor, which contains an intercept term, a covariate and three sets of random effects (spatial, temporal, and interactions).
R> library("CARBayesST") R> model <-ST.CARanova(formula = Y ~x, family = "binomial", trials = n, + W = W, burnin = 20000, n.sample = 120000, thin = 10) In the code above inference is based on 10,000 MCMC samples, which were generated from a single Markov chain that was run for 120,000 iterations with a 20,000 burn-in period and R> colnames(model$samples$beta) <-c("beta1", "beta2") R> plot(model$samples$beta) The figures show no evidence against convergence, and that the posterior distributions for both parameters are centered close to their true values.A summary of the fitted model can be obtained using the print() function as follows.
R> model Latent structure model -spatial and temporal main effects and an interaction Regression equation -Y ~x The summary is presented in two parts, the first of which describes the model that has been fit.The second summarizes the results, and includes the posterior median (Median) and 95% credible intervals (2.5%, 97.5%) for selected parameters (not the random effects), the convergence diagnostic proposed by Geweke (1992) (Geweke.diag)as a Z-score and the effective number of independent samples (n.effective).Also displayed are the actual number of samples kept from the MCMC run (n.sample) as well as the acceptance rate for each parameter (% accept).Note, parameters that have an acceptance rate of 100% have been Gibbs sampled due to their full conditional distributions being a standard distribution.Finally, the DIC and LMPL overall model fit criteria are displayed, which allows models with different space-time structures to be compared.

Small simulation study
This section illustrates the correctness of the CARBayesST package implementation of the ST.CARanova() model, by simulating 100 data sets using the code presented above and summarizing the bias and 95% coverage probabilities of the estimated model parameters.The results of this simulation study are presented in Table 2, which shows bias and 95% coverage probabilities (mean square error is not presented as we are not seeking to compare two different models) for (β 1 , β 2 , ρ S , ρ T , τ 2 S , τ 2 T , τ 2 I , φ, δ, γ) as well as for the fitted values.For the random effects and fitted values all results are averaged over both the 100 simulated data sets and over all the elements (either K, N or N.all) in each simulated data set.The results show that overall the CARBayesST implementation of the ST.CARanova() model produces largely unbiased parameter estimates, with all parameters except the dependence parameters (ρ S , ρ T ) having negligible biases.The largest bias in absolute size is −0.219 for ρ T , which is not surprising because it is a temporal dependence parameter estimated from data on only 20 time points.Additionally, the table shows that the coverage probabilities for all the parameters are generally close to the nominal 0.95 levels, suggesting that the 95% credible intervals have the correct width.The only exceptions are the dependence parameters which are slightly below 90% coverage.

Timing and data sizes
The final part of this section presents some timings for fitting the ST.CARanova() model to data sets of various sizes.All data sets are generated on a regular lattice with a single covariate and interaction random effects present as illustrated in Section 4.1.The results are presented in a burn-in period of 20,000 and thinning the resulting Markov chains by 10.The timings were carried out on an Apple iMac computer with a 3.5 GHZ Intel Core i7 processor and 32GB 1600 MHz DDR3 memory.The table shows that the example run times range between just over 2 minutes for 1,000 data points to around 3 hours and 45 minutes for 100,000 data points, which shows the increased computational effort required as the number of data points increases.

Example 1: The effect of air pollution on human health
This first example is an ecological regression problem, where the aim is to estimate the effect that air pollution concentrations have on respiratory disease risk.

Data and exploratory analysis
For the purposes of delivering health care Scotland is split into 14 health boards, and this study focuses on the Greater Glasgow and Clyde health board, which contains the city of Glasgow and has a population of around 1.  1.
The pollution data we utilize are yearly average modeled concentrations of particulate matter less than 10 microns (PM 10 ), which come from both anthropogenic (e.g., particles in car exhaust fumes) and natural (e.g., sea salt) sources.These data are estimates on a 1 kilometer square grid produced by a numerical simulation model, and full details can be found in Ricardo-AEA (2015).These 1 kilometer square estimates are spatially misaligned with the irregularly shaped polygonal IGs at which the disease and covariate data are available, and thus simple averaging is used to produce IG level PM 10 estimates.Specifically, the median value of PM 10 over the set of 1 kilometer grid squares having centroids lying within each IG was computed, and if an IG was too small to contain a grid square centroid then the nearest grid square was used as the concentration.
Finally, the data set contains 2 potential confounders that will be included in the model, both of which are proxy measures of socio-economic deprivation (poverty).The main confounder in spatio-temporal air pollution and health studies is smoking rates, but such smoking data are unavailable.However, smoking rates are strongly linked to socio-economic deprivation, and thus existing studies such as Haining, Li, Maheswaran, Blangiardo, Law, Best, and Richardson (2010) control for smoking effects using deprivation based proxy measures.Here we have two measures of socio-economic deprivation, the average property price in each IG and year (in hundreds of thousands), and the proportion of the working age population who are in receipt of job seekers allowance (JSA), the latter being a benefit paid to individuals who are unemployed and seeking employment.
These data are available in the CARBayesdata package in the object pollutionhealthdata, and the package also contains the spatial polygon information for the Greater Glasgow and Clyde health board study region in the object GGHB.IG as a 'SpatialPolygonsDataFrame' object.These data can be loaded using the following commands.
R> set.seed(1234)R> library("CARBayesdata") R> library("sp") R> data("GGHB.IG", package = "CARBayesdata") R> data("pollutionhealthdata", package = "CARBayesdata") The structure of pollutionhealthdata is shown below R> head(pollutionhealthdata) The first column labeled IG is the set of unique identifiers for each IG, while observed and expected are respectively the observed (e.g., Y kt ) and expected (e.g., E kt ) numbers of hospital admissions due to respiratory disease.An exploratory measure of disease risk is the standardized morbidity ratio (SMR), which for the kth IG and tth year is computed as SMR kt = Y kt /E kt .However, due to the natural log link function in the Poisson model, the covariates are related in the model to the natural log of the SMR.Therefore the code below adds the SMR and the natural log of the SMR to the data set and produces a pairs() plot showing the relationship between the variables.
R> pollutionhealthdata$SMR <-with(pollutionhealthdata, observed / expected) R> pollutionhealthdata$logSMR <-log(pollutionhealthdata$SMR) R> par(pty = "s", cex.axis = 1.5, cex.lab = 1.5)R> pairs(pollutionhealthdata[, c(9, 5:7)], pch = 19, cex = 0.5, + lower.panel= NULL, panel = panel.smooth,+ labels = c("ln(SMR)", "PM10", "JSA", "Price (*100,000)")) The pairs plot shown in Figure 2 shows respectively positive and negative relationships between the natural log of SMR and the two deprivation covariates jsa and price, in both cases suggesting that increasing levels of poverty are related to an increased risk of respiratory hospitalization.There also appears to be a weak positive relationship between log(SMR) and PM 10 , while the only relationship that exists between the covariates is a negative nonlinear one between jsa and price.Next, it is of interest to visualize the average spatial pattern in the SMR over all five years, and the data can be appropriately aggregated using the summarise() function from the dplyr package using the code below.The aggregation is done by the second line, while the final line adds the aggregated averages to the GGHB.IG 'SpatialPolygonsDataFrame' object.R> library("dplyr") R> SMR.av <-summarise(group_by(pollutionhealthdata,IG), SMR.mean = + mean(SMR)) R> GGHB.IG@data$SMR <-SMR.av$SMR.meanFinally, the spplot() function from the sp package can be used to draw a map of the average SMR over time using the code below: R> l1 <-list("SpatialPolygonsRescale", layout.north.arrow(),+ offset = c(220000, 647000), scale = 4000) R> l2 <-list("SpatialPolygonsRescale", layout.scale.bar(),offset = + c(225000, 647000), scale = 10000, fill = c("transparent", "black")) R> l3 <-list ("sp.text", c(225000, 649000), "0") R> l4 <-list ("sp.text", c(230000, 649000), "5000 m") R> breakpoints <-seq(min(SMR.av$SMR.mean)-0.1, + max(SMR.av$SMR.mean)+ 0.1, length.out= 11) R> spplot(GGHB.IG, "SMR", sp.layout = list(l1, l2, l3, l4), xlab = "Easting", + ylab = "Northing", scales = list(draw = TRUE), at = breakpoints, + col.regions = terrain.colors(n= length(breakpoints) -1), + par.settings= list(fontsize = list(text = 20))) The first four lines in the code create a north-arrow and scale-bar to add to the map, the fifth line creates the breakpoints for the color scheme into 10 equally sized intervals, while the last line draws the map.The resulting map is shown in Figure 3, where the green shaded areas are low risk (SMR < 1), while the orange to silver areas exhibit elevated risks (SMR > 1).The map shows that the main high-risk areas are in the east-end of Glasgow in the east of the study region, and the Port Glasgow area in the far west of the region on the lower bank of the river Clyde (the white line running north west to south east).The analysis that follows requires us to compute the neighborhood matrix W and a 'listw' object variant of the same spatial information, the latter being used in a hypothesis test for spatial autocorrelation.
Both of these quantities can be computed from the 'SpatialPolygonsDataFrame' object using functionality from the spdep package, and code to achieve this is shown below.

Assessing the presence of spatial autocorrelation
The spatio-temporal models in package CARBayesST allow for spatio-temporal autocorrelation via random effects, which capture the remaining autocorrelation in the disease data after the effects of the known covariates have been accounted for.Therefore, we assess the presence of spatial autocorrelation in our data set by first computing the residuals from a simple over-dispersed Poisson log-linear model that incorporates the covariate effects.This model is fitted using the code: R> formula <-observed ~offset(log(expected)) + jsa + price + pm10 R> model1 <-glm(formula = formula, family = "quasipoisson", + data = pollutionhealthdata) R> resid.glm<-residuals(model1) R> coef(summary(model1)) The results show significant effects of all three covariates on disease risk, as well as substantial over-dispersion with respect to the Poisson equal mean and variance assumption (over-dispersion parameter equal to around 4.40).To quantify the presence of spatial autocorrelation in the residuals from this model we can compute Moran's I statistic (Moran 1950) and conduct a permutation test for each year of data separately.The permutation test has the null hypothesis of no spatial autocorrelation and an alternative hypothesis of positive spatial autocorrelation, and is conducted using the moran.mc()function from the spdep package.The test can be implemented for the first year of residuals ( 2007) using the code below.The estimated Moran's I statistic is 0.10358 and the p value is less than 0.05, suggesting strong evidence of unexplained spatial autocorrelation in the residuals from 2007 after accounting for the covariate effects.Similar results were obtained for the other years and are not shown for brevity.We note that residual temporal autocorrelation could be assessed similarly for each IG, for example by computing the lag − 1 autocorrelation coefficient, but with only 5 time points the resulting estimates would not be reliable.These results show that the assumption of independence is not valid for these data, and that spatio-temporal autocorrelation should be allowed for when estimating the covariate effects.

Spatio-temporal modeling with CARBayesST
We illustrate model fitting in CARBayesST by applying the ST.CARar() model to the data, details of which are given in Section 2. This model has previously been used to account for spatio-temporal autocorrelation in an air pollution and health study, for details see Rushworth et al. (2014).The model can be fitted with the following one-line function call, and we note that all data vectors (response, offset and covariates) have to be ordered so that the first K data points relate to all spatial units at time 1, the next K data points to all spatial units at time 2 and so on.
R> library("CARBayesST") R> model2 <-ST.CARar(formula = formula, family = "poisson", + data = pollutionhealthdata, W = W, burnin = 20000, n.sample = 220000, + thin 10) In the above code the covariate and offset component defined by formula is the same as for the simple Poisson log-linear model fitted earlier, and the neighborhood matrix is binary and defined by whether or not two areas share a common border.The ST.CARar() model is run for 220,000 MCMC samples, the first 20,000 of which are removed by the burn-in period.The samples are then thinned by 10 to reduce the autocorrelation of the Markov chain, resulting in 20,000 samples for inference.A summary of the model results can be visualized using the print() function developed for package CARBayesST, which gives a very similar summary to that produced in the CARBayes package.Here the Y object is NA as there are no missing Y kt observations in this data set.If there had been say m missing values, then the Y component of the list would have contained m columns, with each one containing posterior predictive samples for one of the missing observations.The key interest in this analysis is the effects of the covariates on disease risk, which for Poisson models are typically presented as relative risks.The relative risk for an unit increase in a covariate with regression parameter β s is given by the transformation exp( β s ), and a relative risk of 1.02 corresponds to a 2% increased risk if the covariate increased by .The code below draws the posterior relative risk distributions for a one unit increase in each covariate, which are all realistic increases given the variation observed in the data in Figure 2.

Example 2: Monitoring the state of the housing market
This second example focuses on the state of the housing market, specifically property sales, and aims to quantify its changing trend over time in an era that encompasses the global financial crisis that began in late 2007.

Data and exploratory analysis
The study region is the same as for the first example, namely the set of K = 271 intermediate geographies that make up the Greater Glasgow and Clyde health board.The data also come from the same source (Scottish Statistics; http://statistics.gov.scot/), and include yearly observations of house sales from 2003 to 2013 inclusive.The response variable is the number of property sales Y kt in each IG (indexed by k) and year (indexed by t), and we additionally have the total number of properties n kt in each IG and year that will be used in the model as the offset term.We use the following Poisson log-linear model for these data, Y kt ∼ Poisson(n kt θ kt ), where θ kt is the rate of property sales as a proportion of the total number of properties.We note that we have not used a binomial model here as a single property could sell more than once in a year, meaning that each property does not constitute a Bernoulli trial.Thus θ kt is not strictly the proportion of properties that sell in a year, but is on approximately the same scale for interpretation purposes.
These data are available in the CARBayesdata package in the object salesdata, as is the spatial polygon information for the Greater Glasgow and Clyde health board study region (in the object GGHB.IG).These data can be loaded using the following commands.
R> set.seed(1234)R> library("CARBayesdata") R> library("sp") R> data("GGHB.IG", package = "CARBayesdata") R> data("salesdata", package = "CARBayesdata") The data frame salesdata contains 4 columns, the intermediate geography code (IG), the year the data relates to (year), the number of property sales (sales, Y kt ) and the total number of properties (stock, n kt ).We visualize the temporal trend in this data using the code below, where the first line creates the raw rate of property sales as a proportion of the total number of properties.
The plot shows a clear step-change in property sales between 2007 and 2008, as sales were increasing up to and including 2007, before markedly decreasing in subsequent years.Sales in the last year of 2013 show slight evidence of increasing relative to the previous 4 years, possibly suggesting the beginning of an upturn in the market.Also there appears to be a change in the level of spatial variation from year to year, with larger amounts of spatial variation observed before the global financial crisis.The spatial pattern in the average (over time) rate of property sales as a proportion of the total number of properties is shown in Figure 6, which was created using the code below.
R> library("dplyr") R> salesprop.av<-summarise(group_by(salesdata, IG), + salesprop.mean= mean(salesprop)) R> GGHB.IG@data$sales <-salesprop.av$salesprop.meanR> l1 <-list("SpatialPolygonsRescale", layout.north.arrow(),+ offset = c(220000, 647000), scale = 4000) R> l2 <-list("SpatialPolygonsRescale", layout.scale.bar(),+ offset = c(225000, 647000), scale = 10000, + fill = c("transparent", "black")) The first 3 lines of code create the IG specific temporal averages using the dplyr package, and then adds them to the GGHB.IG 'SpatialPolygonsDataFrame' object.The remaining lines of code produce the map shown in Figure 6, again using code very similar to that in Example 1.The map and its color key shows that the property sales data are skewed to the right, as the color key chosen has unequal groups.Initially an equally-spaced color scheme was created, but that showed little color variation, hence the use of the unequal quantile-based color key.Secondly, the map shows a largely similar pattern to that seen for respiratory disease risk in Figure 3, with areas that exhibit relatively high sales rates largely being the same ones that exhibit relatively low disease risk.Figures 5 and 6 highlight the change in temporal dynamics and the spatial structure in property sales in Glasgow, and we now apply the ST.CARsepspatial() model from CARBayesST to more formally quantify these features.

Quantifying temporal trends and spatial patterns in sales rates
The extent to which the region-wide average level of sales and its spatial variation and spatial structure changes over time can be assessed by applying the model proposed by Napier et al. (2016) to the data, which can be implemented using the ST.CARsepspatial() function.
Before fitting this model we need to create the neighborhood matrix using the following code: R> library("spdep") R> W.nb <-poly2nb(GGHB.IG, row.names= salesprop.av$salesprop.mean)R> W <-nb2mat(W.nb, style = "B") Then the model can be fitted using the code below, where inference is again based on 20,000 post burn-in and thinned MCMC samples.
R> library("CARBayesST") R> formula <-sales ~offset(log(stock)) R> model1 <-ST.CARsepspatial(formula = formula, family = "poisson", + data = salesdata, W = W, burnin = 20000, n.sample = 220000, thin = 10) A summary of the model fit can be obtained using the print() function, and the output is similar to that shown in Example 1 and is not shown for brevity.The model fitted represents the estimated rate of property sales by which is the sum of an overall intercept term β 1 , a space-time effect φ kt with a time period specific variance, and a region-wide temporal trend δ t .The mean and standard deviation of {θ kt } over space for each year is computed by the following code, which produces the posterior median and a 95% credible interval for each quantity for each year.These temporal trends in the average rate of property sales and its level of spatial variation can be plotted by the following code, and the result is displayed in Figure 7.
The posterior median sales rate {θ kt } is computed and then plotted for the 6 odd numbered years using the code below, where the color scale is the same as for Figure 6.The first 3 lines create a 'data.frame'object of estimated sales rates, while the fourth line adds these sales rate data to the 'SpatialPolygonsDataFrame' object.Finally, the last line plots the estimated sales rates for odd numbered years.

Discussion
This paper presented the CARBayesST package, which is the first software package dedicated to fitting spatio-temporal CAR-type models to areal unit data.This paper outlined the class of models that can be implemented by the software together with the Bayesian inferential framework used to fit these models.The key advantage of this package, compared to say implementing the models in BUGS, is its ease of use, which includes fitting models with a one-line function call, specifying the spatial adjacency information via a single matrix, and the ability to fit multiple models addressing different questions about the data in a common software environment.The paper has presented a small simulation study to illustrate the correctness of the software, as well as two fully worked examples based on quantifying the effect of air pollution on human health and the changing nature of the housing market.
Future development of the software will be in two main directions.First, as the literature on spatio-temporal modeling advances we aim to increase the number of spatio-temporal models that can be implemented, providing the user with an even wider set of modeling tools.Second, with the rapid increase in the availability of small-area data, we aim to develop a suite of multivariate space-time models (MVST).The development of MVST methodology for areal unit data is in its infancy, and the ability to jointly examine the spatio-temporal patterns in multiple response variables simultaneously allows one to address questions that cannot be addressed by single variable models.For example, in a public health context it allows one to estimate overall and disease specific spatio-temporal patterns in disease risk, allowing one to see which areas repeatedly signal at high risk for all diseases, and which exhibit elevated risks for only one disease.In the housing context of Example 2, an MVST approach would allow one to extend the analysis carried out by different property types, e.g., flats, terraced houses, etc., which would allow more insight to be gained about the exact nature of the housing market.

Figure 1 :
Figure 1: Posterior distributions of the regression parameters (the true values are β 1 = 0 and β 2 = 0.1).The left panel contains trace-plots while the right panel are density estimates.

Figure 2 :
Figure 2: Scatter plot of the disease, pollution and covariate data.

Figure 3 :
Figure 3: Map showing the average SMR over all five years from 2007 to 2011.

Figure 4 :
Figure 4: Posterior distributions for the covariate effects.

Figure 5 :
Figure 5: Boxplots showing the temporal trend in the raw rate of property sales as a proportion of the total number of properties between 2003 and 2013.

Figure 6 :
Figure 6: Map showing the average (between 2003 to 2013) raw rate of property sales as a proportion of the total number of properties.

Figure 7 :
Figure 7: Posterior median (red) and 95% credible interval (black) for the temporal trend in: (a) region-wide average property sales rates; and (b) spatial standard deviation in property sales rates.In panel (a) the blue dots are the raw sales proportions for each area and year (jittered in the x direction to improve the presentation).

Figure 8 :
Figure 8: Maps showing the changing spatial evolution of the posterior median estimated sales rates {θ kt } for odd numbered years.
This second component is a piecewise constant clustering or intercept component λ Z kt .Thus spatially and temporally adjacent data points (Y kt , Y js ) will be similar (autocorrelated) if they are in the same cluster or intercept, that is if λ Z kt = λ Z js , but exhibit a step-change if they are estimated to be in different clusters, that is if λ Z kt = λ Z js .The piecewise constant intercept or clustering component comprises at most G distinct levels, making this component a piecewise constant intercept term.The G levels are ordered via the prior specification: Gerber and Furrer (2015)unson, Vehtari, and Rubin (2013) to a fitted model object, which uses the geweke.diag()functionfromthecodapackage(Plummer,Best,Cowles,andVines2006).This statistic is in the form of a Z-score, and values between (−1.96, 1.96) are suggestive of convergence.A full discussion of how many samples to generate, burn-in lengths and whether or not to thin the Markov chains are beyond the scope of this paper, and further details can be found in general texts on Bayesian modeling such asRobert and Casello (2010)andGelman, Carlin, Stern, Dunson, Vehtari, and Rubin (2013).Additionally, further details of MCMC algorithms for CAR-type models are given byRue and Held (2005)andGerber and Furrer (2015).
(Bivand and Lewin-Koh 2015)re automatically loaded for use in package CARBayesST by the above call, but a complete spatial analysis beginning with reading in and formatting shapefiles and data, creating the neighborhood matrix W, and plotting the results requires a number of other packages.Thus the worked examples in this paper utilize functionality from the following packages: CARBayes, CARBayesdata, dplyr, maptools(Bivand and Lewin-Koh 2015), MASS, sp and spdep.

Table 3 ,
and relate to fitting the model for a total of 120,000 iterations, with

Table 2 :
Summary of the simulation study undertaken to assess the bias and 95% coverage probabilities of the parameter estimates from the ST.CARanova() model.All results are based on 100 simulated data sets generated as outlined above.

Table 3 :
Summary of the time taken to run the ST.CARanova() model in seconds (in hours, minutes and seconds in brackets) on a regular grid with different square grid sizes (K), numbers of time periods (N ) and total number of data points (N all).
2 million people during the 2007 to 2011 study period.This health board is split into K = 271 intermediate geographies(IG), which are a key geography for the distribution of small-area statistics in Scotland and contain populations of between 2,468 and 9,517 people.The aim of this study is to quantify the impact of particulate matter air pollution concentrations on respiratory ill health, and we have yearly data for N = 5 years (2007 to 2011) for the K = 271 IGs.The disease and covariate data are freely available from Scottish Statistics (http://statistics.gov.scot/), while the particulate matter pollution concentrations are available from the Department for the Environment, Food and Rural Affairs (DEFRA; https://uk-air.defra.gov.uk/data/pcm-data).respiratory disease data are population level counts of the numbers of admissions to hospital in each IG and year with a primary diagnosis of respiratory disease, which corresponds to the International Classification of Disease tenth revision (ICD-10) codes J00-J99 and R09.1.However, the observed numbers of admissions in an IG and year depends on the size and demographic structure (e.g., age and sex profile) of the population living there, which is adjusted for using indirect standardization.This involves computing the number of admissions that would be expected in each IG and year if national age and sex specific admissions rates applied.The observed and expected numbers of respiratory hospital admissions in the kth IG and tth year are denoted by (Y kt , E kt ) respectively, and the Poisson model, Y kt ∼ Poisson(E kt R kt ) is typically used to model these data.Here R kt is the risk, relative to E kt , of disease in IG k and year t, and a value of 1.2 corresponds to a 20% increased risk of disease.Operationally, the E kt is included as an offset term in the model on the natural log scale, that is O kt = ln(E kt ) in Equation The output from the print() function shows that all three covariates exhibit relationships with disease risk, as none of the 95% credible intervals contain zero.Furthermore, the spatial (rho.S) and temporal (rho.T) dependence parameters exhibit relatively high values in the unit interval, suggesting that both spatial and temporal autocorrelation are present in these data after adjusting for the covariate effects.The model model2 is a list, and details of its elements are described in Section 3 of this paper.A list object containing the MCMC samples for each individual parameter and the fitted values is stored in model2$samples, and each element of this list corresponds to a different group of parameters and is stored as a 'mcmc' object from the coda package.Applying the summary() function to this object yields: