spBayesSurv: Fitting Bayesian Spatial Survival Models Using R

Spatial survival analysis has received a great deal of attention over the last 20 years due to the important role that geographical information can play in predicting survival. This paper provides an introduction to a set of programs for implementing some Bayesian spatial survival models in R using the package spBayesSurv. The function survregbayes includes the three most commonly-used semiparametric models: proportional hazards, proportional odds, and accelerated failure time. All manner of censored survival times are simultaneously accommodated including uncensored, interval censored, current-status, left and right censored, and mixtures of these. Left-truncated data are also accommodated. Time-dependent covariates are allowed under the piecewise constant assumption. Both georeferenced and areally observed spatial locations are handled via frailties. Model fit is assessed with conditional Cox-Snell residual plots, and model choice is carried out via the log pseudo marginal likelihood, the deviance information criterion and the Watanabe-Akaike information criterion. The accelerated failure time frailty model with a covariate-dependent baseline is included in the function frailtyGAFT. In addition, the package also provides two marginal survival models: proportional hazards and linear dependent Dirichlet process mixture, where the spatial dependence is modeled via spatial copulas. Note that the package can also handle non-spatial data using non-spatial versions of aforementioned models.


Introduction
Spatial location plays a key role in survival prediction, serving as a proxy for unmeasured regional characteristics such as socioeconomic status, access to health care, pollution, etc. Literature on the spatial analysis of survival data has flourished over the last decade, including the study of leukemia survival (Henderson, Shimakura, and Gorst 2002), childhood mortality (Kneib 2006), asthma (Li and Lin 2006), breast cancer (Banerjee and Dey 2005;Zhou, Hanson, Jara, and Zhang 2015a), political event processes (Darmofal 2009), prostate cancer (Wang, Zhang, and Lawson 2012;Zhou, Hanson, and Zhang 2017), pine trees (Li, Hong, Thapa, and Burkhart 2015), threatened frogs (Zhou, Hanson, and Knapp 2015b), health and pharmaceutical firms (Arbia, Espa, Giuliani, and Micciolo 2016), emergency service response times (Taylor 2017), and many others.
Here we introduce the spBayesSurv  package for fitting various survival models to spatially-referenced survival data. Note that all models included in this package can also be fit without spatial information, including nonparametric models as well as semiparametric proportional hazards (PH), proportional odds (PO), and accelerated failure time (AFT) models. The model parameters and statistical inference are carried out via selftuning adaptive Markov chain Monte Carlo (MCMC) methods; no manual tuning is needed. The R syntax is essentially the same as for existing R survival (Therneau 2015) functions. Sensible, well-tested default priors are used throughout, however, the user can easily implement informative priors if such information is available. The primary goal of this paper is to introduce spBayesSurv and provide extensive examples of its use. Comparisons to other models and R packages can be found in Zhou et al. (2015b), , and .
Section 2 discusses spBayesSurv's implementation of PH, PO, and AFT frailty models for georeferenced (e.g., latitude and longitude are recorded) and areally-referenced (e.g., county of residence recorded) spatial survival data; the functions also work very well for exchangeable or no frailties. The models are centered at a parametric family through a novel transformed Bernstein polynomial prior and the centering family can be tested versus the Bernstein extension via Bayes factors. All manner of censoring is accommodated as well as left-truncated data; left-truncation also allows for the inclusion of time-dependent covariates. The LPML, DIC and WAIC statistics are available for model selection; spike-and-slab variable selection is also implemented.
In Section 3, a generalized AFT model is implemented allowing for continuous stratification. That is, the baseline survival function is itself a function of covariates: baseline survival changes smoothly as a function of continuous predictors; for categorical predictors the usual stratified AFT model is obtained. Note that even for the usual stratified semiparametric AFT model with one discrete predictor (e.g., clinic) it is extremely difficult to obtain inference using frequentist approaches; see Chiou, Kang, and Yan (2015) for a recent development. The model fit in spBayesSurv actually extends discrete stratification to continuous covariates, allowing for very general models to be fit. The generalized AFT model includes the easy computation of Bayes factors for determining which covariates affect baseline survival and whether a parametric baseline is adequate.
Finally, Section 4 offers a spatial implementation of the completely nonparametric linear dependent Dirichlet process mixture (LDDPM) model of De Iorio, Johnson, Müller, and Rosner (2009) for georeferenced data. The LDDPM does not have one simple "linear predictor" as do the models in Sections 2 and 3, and therefore a marginal copula approach was taken to incorporate spatial dependence. A piecewise-constant baseline hazard PH model is also implemented via spatial copula for comparison purposes, i.e., a Bayesian version of the model presented in Li and Lin (2006). Section 5 concludes the paper with a discussion.
Although there are many R packages for implementing survival models, there are only a handful of that allow the inclusion of spatial information and these focus almost exclusively on variants of the PH model. BayesX (Belitz, Brezger, Klein, Kneib, Lang, and Umlauf 2015) is an immensely powerful standalone program for fitting various generalized additive mixed models, including both georeferenced and areally-referenced frailties in the PH model. The package R2BayesX (Umlauf, Adler, Kneib, Lang, and Zeileis 2015) interfaces BayesX with R, but does not appear to include the full functionality of BayesX, e.g., a Bayesian approach for interval-censored data is not included. BayesX uses Gaussian Markov random fields for discrete spatial data. For georeferenced frailties BayesX uses what have been termed "Matern splines," first introduced in an applied context by Kammann and Wand (2003). Several authors have used this approach including Kneib (2006), Hennerfeind, Brezger, and Fahrmeir (2006), and Kneib and Fahrmeir (2007). This approximation was termed a "predictive process" and given a more formal treatment by Banerjee, Gelfand, Finley, and Sang (2008) and Finley, Sang, Banerjee, and Gelfand (2009). The spBayesSurv package utilizes the full-scale approximation (FSA) of Sang and Huang (2012) which extends the predictive process to capture both the large and small spatial scales; see Section 2.1.4.
The package spatsurv (Taylor and Rowlingson 2017) includes an implementation of PH allowing for georeferenced Gaussian process frailties. The frailty process is approximated on a fine grid and the covariance matrix inverted via the discrete Fourier transform on block circulant matrices; see Taylor (2015) for details. Taylor's approach vastly improves computation time over a fully-specified Gaussian process. The package mgcv (Wood 2017) also fits a spatial PH model by including a spatial term through various smoothers such as thin plate spline, Duchon spline and Gaussian process. All the three aforementioned R packages focus on the PH model, whereas the spBayesSurv includes several other spatial frailty models and two marginal copula models (Zhou et al. 2015b).
To set notation, suppose subjects are observed at m distinct spatial locations s 1 , . . . , s m . Let t ij be a random event time associated with the jth subject in s i and x ij be a related pdimensional vector of covariates, i = 1, . . . , m, j = 1, . . . , n i . Then n = m i=1 n i is the total number of subjects under consideration. Assume the survival time t ij lies in the interval (a ij , b ij ), 0 ≤ a ij ≤ b ij ≤ ∞. Here left censored data are of the form (0, b ij ), right censored (a ij , ∞), interval censored (a ij , b ij ) and uncensored values simply have a ij = b ij , i.e., we define (x, x) = {x}. Therefore, the observed data will be D = {(a ij , b ij , x ij , s i ); i = 1, . . . , m, j = 1, . . . , n i }. For areally-observed outcomes, e.g., county-level, there is typically replication (i.e., n i > 1); for georeferenced data, there may or may not be replication. Note although the models are discussed for spatial survival data, non-spatial data are also accommodated. All code below is run in R version 3.3.3 under the platform x86 64-apple-darwin13.4.0 (64-bit).

Models
The function survregbayes supports three commonly-used semiparametric frailty models: AFT, PH, and PO. The AFT model has survival and density functions while the PH model has survival and density functions and the PO model has survival and density functions where β = (β 1 , . . . , β p ) is a vector of regression coefficients, v i is an unobserved frailty associated with s i , and S 0 (t) is the baseline survival with density f 0 (t) corresponding to x ij = 0 and v i = 0. Let Γ(a, b) denote a gamma distribution with mean a/b and N p (µ, Σ) a p-variate normal distribution with mean µ and covariance Σ. The survregbayes function implements the following prior distributions: where TBP L , ICAR, GRF and IID refer to the transformed Bernstein polynomial (TBP) (Chen, Hanson, and Zhang 2014; prior, intrinsic conditionally autoregressive (ICAR) (Besag 1974) prior, Gaussian random field (GRF) prior, and independent Gaussian (IID) prior distributions, respectively. The function argument prior allows users to specify these prior parameters in a list with elements defined as follows: We next briefly introduce these priors but leave details to .

ICAR and IID priors
where e i+ = m j=1 e ij is the number of neighbors of area s i . The induced prior on v under ICAR is improper; the constraint m j=1 v j = 0 is used for identifiability (Banerjee, Carlin, and Gelfand 2014). Note that we assume that every region has at least one neighbor, so the proportionality constant for the improper density of v is (τ −2 ) (m−1)/2 (Lavine and Hodges 2012).

GRF priors
For georeferenced data, it is commonly assumed that v i = v(s i ) arises from a Gaussian random field (GRF) {v(s), s ∈ S} such that v = (v 1 , . . . , v m ) follows a multivariate Gaussian distribution as v ∼ N m (0, τ 2 R), where τ 2 measures the amount of spatial variation across locations and the (i, j) element of R is modeled as R[i, j] = ρ(s i , s j ). Here ρ(·, ·) is a correlation function controlling the spatial dependence of v(s). In survregbayes the powered exponential correlation function ρ(s, s ) = ρ(s, s ; φ) = exp{−(φ s − s ) ν } is used, where φ > 0 is a range parameter controlling the spatial decay over distance, ν ∈ (0, 2] is a pre-specified shape parameter which can be specified via prior$nu, and s − s refers to the distance (e.g., Euclidean, great-circle) between s and s . Therefore, the prior GRF(τ 2 , φ) is defined as where p ij is the (i, j) element of R −1 .

Full-scale approximation
As m increases evaluating R −1 from R becomes computationally impractical. To overcome this computational issue, we consider the FSA (Sang and Huang 2012) due to its capability of capturing both large-and small-scale spatial dependence. Consider a fixed set of "knots" S * = {s * 1 , . . . , s * K } chosen from the study region. These knots are chosen using the function cover.design within the R package fields (Nychka, Furrer, Paige, and Sain 2015), which computes space-filling coverage designs using the swapping algorithm (Johnson, Moore, and Ylvisaker 1990). Let ρ(s, s ) be the correlation between locations s and s . The usual predictive process approach (e.g., Banerjee et al. 2008 is a K ×K correlation matrix at knots S * . However, noting that ρ(s, s ) = ρ l (s, s ) + [ρ(s, s ) − ρ l (s, s )], the predictive process discards entirely the residual part ρ(s, s ) − ρ l (s, s ). In contrast, the FSA approach approximates the correlation function ρ(s, s ) with where ρ s (s, s ) = {ρ(s, s ) − ρ l (s, s )} ∆(s, s ) serves as a sparse approximate of the residual part. Here ∆(s, s ) is a modulating function, which is specified so that ρ s (s, s ) can well capture the local residual spatial dependence while still permitting efficient computation. Motivated by Konomi, Sang, and Mallick (2014), we first partition the total input space into B disjoint blocks, and then specify ∆(s, s ) in a way such that the residuals are independent across input blocks, but the original residual dependence structure within each block is retained. Specifically, the function ∆(s, s ) is taken to be 1 if s and s belong to the same block and 0 otherwise. The approximated correlation function ρ † (s, s ) in Equation 6 provides an exact recovery of the true correlation within each block, and the approximation errors are ρ(s, s ) − ρ l (s, s ) for locations s and s in different blocks. Those errors are expected to be small for most entries because most of these location pairs are farther apart. To determine the blocks, we first use the R function cover.design to choose B ≤ m locations among the m locations forming B blocks, then assign each s i to the block that is closest to s i . Here B does not need to be equal to K. When B = 1, no approximation is applied to the correlation ρ. When B = m, it reduces to the approach of Finley et al. (2009), so the local residual spatial dependence may not be well captured.
Applying the above FSA approach to approximate the correlation function ρ(s, s ), we can approximate the correlation matrix R with where Here, the notation "•" represents the element-wise matrix multiplication. To avoid numerical instability, we add a small nugget effect = 10 −10 when defining R, that is, R = (1 − )ρ mm + I m . It follows from Equation 7 that R can be approximated by Applying the Sherman-Woodbury-Morrison formula for inverse matrices, we can approximate R −1 by In addition, the determinant of R can be approximated by Since the m × m matrix R s is a block matrix, the right-hand sides of Equations 8 and 9 involve only inverses and determinants of K × K low-rank matrices and m × m block diagonal matrices. Thus the computational complexity can be greatly reduced relative to the expensive computational cost of using original correlation function for large value of m. However, for small m, e.g., m < 300, the FSA is usually slower than direct inverse of R due to the complexity of FSA's implementation. Note that K and B can be specified via prior$K and prior$B, respectively.

MCMC
The likelihood function for (w L , θ, β, v) is given by MCMC is carried out through an empirical Bayes approach (Carlin and Louis 2010) coupled with adaptive Metropolis samplers (Haario, Saksman, and Tamminen 2001). Recall that w j = 1/L implies the underlying parametric model with S 0 (t) = S θ (t). Thus, the parametric model provides good starting values for the TBP survival model. Letθ andβ denote the parametric estimates of θ and β, e.g., maximum likelihood estimates, and letV andŜ denote their estimated covariance matrices, respectively. Set z L−1 = (z 1 , . . . , z L−1 ) with z j = log(w j ) − log(w L ). The β, θ, z L−1 , α and φ are all updated using adaptive Metropolis samplers, where the initial proposal variance isŜ for β,V for θ, 0.16I L−1 for z L−1 and 0.16 for α and φ. Each frailty term v i is updated via Metropolis-Hastings, with proposal variance as the conditional prior variance of v i |{v j } j =i ; τ −2 is updated via a Gibbs step from its full conditional. A complete description and derivation of the updating steps are available in .
The function survregbayes sets the following hyperparameters as defaults: β 0 = 0, S 0 = 10 10 I p , θ 0 =θ, V 0 = 10V, a 0 = b 0 = 1, and a τ = b τ = .001. Although the default Γ(0.001, 0.001) prior on τ 2 has been tested to perform well across various simulation scenarios , it still should be used with caution in practice; see Gelman (2006) for general suggestions. In addition, we assume a somewhat informative prior on θ to obviate confounding between θ and w L . For the GRF prior, we set a φ = 2 and b φ = (a φ − 1)/φ 0 so that the prior of φ has mode at φ 0 and the prior mean of 1/φ is 1/φ 0 with infinite variance.

Model diagnostics and comparison
For model diagnostics, we consider a general residual of Cox and Snell (1968), defined as , r(t ij ) has a standard exponential distribution. If the model is "correct," and under the arbitrary censoring, the pairs {r(a ij ), r(b ij )} are approximately a random arbitrarily censored sample from an Exp(1) distribution, and the estimated (Turnbull 1974) integrated hazard plot should be approximately straight with slope 1. Uncertainty in the plot is assessed through several cumulative hazards based on a random posterior sample from [β, θ, w L , v|D]. Note that conditional on frailties, the Cox-Snell residuals considered here are still independent. This is in contrast to typical Cox-Snell plots which only use point estimates yieding dependent residuals under frailty models.
For model comparison, we consider three popular model choice criteria: the deviance information criterion (DIC) (Spiegelhalter, Best, Carlin, and Van Der Linde 2002), the log pseudo marginal likelihood (LPML) (Geisser and Eddy 1979), and the Watanabe-Akaike information criterion (WAIC) Watanabe (2010), where DIC (smaller is better) places emphasis on the relative quality of model fitting, and LPML (larger is better) and WAIC (smaller is better) focus on the predictive performance. These criteria are readily computed from the MCMC output; see  for more details.

Leukemia survival data
A dataset on the survival of acute myeloid leukemia in n = 1, 043 patients (Henderson et al. 2002) is considered, named as LeukSurv in the package. It is of interest to investigate possible spatial variation in survival after accounting for known subject-specific prognostic factors, which include age, sex, white blood cell count (wbc) at diagnosis, and the Townsend score (tpi) for which higher values indicates less affluent areas. Both exact residential locations of all patients and their administrative districts (the boundary file is named as nwengland.bnd in the package) are available, so we can fit both geostatistical and areal models.

PO model with ICAR frailties
If the IID or ICAR frailties are considered, to easily identify the correspondence between frailties and clusters/regions, we program the function survregbayes so that the input dataset should be sorted by the cluster variable before any use. The following code is used to sort the dataset by district and obtain the adjacency matrix E.
The output from applying the summary function to the returned object res1 is given below.
The following code is used to produce trace plots (Figure 1) for β and τ 2 . Note that the mixing for τ 2 is not very satisfactory. This is not surprising, since we are using very vague gamma prior Γ(0.001, 0.001) and the total number of districts is only 24. One may consider to use a more informative prior Γ(1, 1) on τ 2 or run a longer chain with higher thin to improve the mixing.

PO model with GRF frailties
Note that all coordinates are distinct, so we have m = 1043 and n i = 1 in terms of our notation. To use frailtyprior to specify the prior, we need to create an ID variable consisting of 1043 distinct values. The powered exponential correlation function with ν = 1 is used. To specify the number of knots and blocks for the FSA of R, we consider K = 100 and B = 1043. The code below is used to fit a PO model with GRF frailties under above settings. The running time is a bit under three hours.

Variable selection
Let x = (x 1 , . . . , x p ) denote the p-vector of covariates in general. The most direct approach is to multiply β by a latent Bernoulli variable γ for = 1, . . . , p, where γ = 1 indicates the presence of covariate x in the model, and then assume an appropriate prior on (β, γ), where γ = (γ 1 , . . . , γ p ) . Following Kuo and Mallick (1998)  (2014), we consider below independent priors γ 1 , . . . , γ p iid ∼ Bern(0.5) and β ∼ N p (0, gn(X X) −1 ), where X is the usual design matrix, but with mean-centered covariates, i.e., 1 n X = 0 p , and g is chosen by picking a number M such that a random e x β is less than M with probability q, i.e., approximately g = log M /Φ −1 (q) 2 /p. The function survregbayes sets M = 10 and q = 0.9 as the defaults. For other choices, one can specify M and q via prior$M and prior$q, respectively. The MCMC procedure is described in .
To perform variable selection for the leukemia survival data, we simply need to add the argument selection=TRUE to the function survregbayes. A part of the output from summary is also shown. The model with age, wbc and tpi has the highest proportion (89.8%), and thus can be served as the final model.

Parametric vs. semiparametric
Many authors have found parametric models to fit as well or better than competing semiparametric models (Cox and Oakes 1984, p. 123;Nardi and Schemper 2003). The semiparametric -or more accurately richly parametric -formulation of the AFT, PH and PO models presented here have their baseline survival functions centered at a parametric family S θ (t). Note that z J−1 = 0 implies S 0 (t) = S θ (t). Therefore, testing H 0 : z J−1 = 0 versus H 1 : z J−1 = 0 leads to the comparison of the semiparametric model with the underlying parametric model. Let BF 10 be the Bayes factor between H 1 and H 0 .  proposed to estimate BF 10 by a large-sample approximation to the generalized Savage-Dickey density ratio (Verdinelli and Wasserman 1995). Adapting their approach BF 10 is estimated where p(0|α) = Γ(αJ)/[J α Γ(α)] J is the prior density of z J−1 evaluated at z J−1 = 0,α is the posterior mean of α, N p (·; m, Σ) denotes a p-variable normal density with mean m and covariance Σ, andm andΣ are posterior mean and covariance of z J−1 .
The Bayes factor BF 10 under the semiparametric PO model with ICAR frailties can be obtained using the code below (here the object res1 is obtained in Section 2.4).
The function survregbayes also supports the efficient fitting of parametric frailty models with loglogistic, lognormal or Weibull baseline functions. In parametric models, the prior for θ can be set to be relatively vague. Setting a 0 at any negative value will force the α to be fixed at the value specified in the argument state. For example, setting prior <-list(a0 = -1) and state = list(alpha = 1) will fix α = 1 throughout the MCMC; setting prior = list(a0 = -1) and state = list(alpha = Inf) will fit a parametric model. The following code fits a parametric loglogistic PO model with ICAR frailties to the leukemia survival data. The LPML is -5950, much worse than the value under the semiparametric PO model.

Left-truncation and time-dependent covariates
The survival time t ij is left-truncated at u ij ≥ 0 if u ij is the time when the ijth subject is first observed. Left-truncation often occurs when age is used as the time scale. Given the observed left-truncated data {(u ij , a ij , b ij , x ij , s i )}, where a ij ≥ u ij , the likelihood function in Equation 10 becomes Note that the left censored data under left-truncation are of the form (u ij , b ij ). Allowing for left-truncation allows the semiparametric AFT, PH and PO models to be easily extended to handle time-dependent covariates. Following Kneib (2006) and Hanson, Johnson, and Laud (2009), assume the covariate vector x ij (t) is a step function that changes at o ij ordered times t ij,1 < . . . < t ij,o ij ≤ a ij , i.e., where t ij,1 = u ij and t ij,o ij +1 = ∞. Assuming one of PH, PO, or AFT holds conditionally on each interval, the survival function for the ijth individual at time a ij is Thus one can replace the observation (u ij , a ij , b ij , x ij (t), s i ) by a set of new o ij observations (t ij,1 , t ij,2 , ∞, x ij,1 , s i ), (t ij,2 , t ij,3 , ∞, x ij,2 , s i ), . . ., (t ij,o ij , a ij , b ij , x ij,o ij , s i ). This way we get a new left-truncated data set of size m i=1 n i j=1 o ij . Then the likelihood function becomes Note that the derivations above still hold for time-dependent covariates without left-truncation (i.e., u ij = 0 for all i and j).

PBC data
We use the primary biliary cirrhosis (PBC) dataset (available in the package survival as pbc) as an example to show how to incorporate time-dependent covariates in the function survregbayes. Although this is not a spatial dataset, spatial frailties can be added similarly as in Section 2.4. The following code is copied from Therneau, Crowson, and Atkinson (2017) to create the data frame with time-dependent covariates.
The end time point tstop together with endpt are formulated as interval censored data using type = "interval2" of Surv. This format is more general than the former one, as one can easily incorporate interval censored data.

The model
The generalized accelerated failure time (GAFT) frailty model ) generalizes the AFT model in Equation 1 to allow the baseline survival function S 0 (t) to depend on certain covariates, say a q-dimensional vector z ij which is usually a subset of x ij . Specifically, the GAFT frailty model is given by or equivalently, wherex ij = (1, x ij ) includes an intercept,β = (β 0 , β ) is a vector of corresponding coefficients, ij is a heteroscedastic error term independent of v i , and P (e β 0 + ij > t|z ij ) = S 0,z ij (t). Note the regression coefficients β here are defined differently with those in Equation 1. Here we assume where G z is a probability measure defined on R for every z ∈ X ; this defines a model for the entire collection of probability measures G X = {G z : z ∈ X } so that each element is allowed to smoothly change with the covariates z. The frailtyGAFT function considers the following prior distributions:β where LDTFP L refers to the linear dependent tailfree process prior (LDTFP) prior as described in . The function argument prior allows users to specify these prior parameters in a list with elements defined as follows: The LDTFP prior considered in  is centered at a normal distribution Φ σ with mean 0 and variance σ 2 , that is, E(G z ) = Φ σ for every z ∈ X . Define the function , where x is the ceiling function, the smallest integer greater than or equal to x. Further define probability p z (k) for k = 1, . . . , 2 L as where Y j+1,2k−1 (z) = 1 + exp{−z γ j,k } −1 and Y j+1,2k (z) = 1 − Y j+1,2k−1 (z) for j = 0, . . . , L−1, k = 1, . . . , 2 j , wherez = (1, z ) includes an intercept, and γ j,k = (γ j,k,0 , . . . , γ j,k,q ) is a vector of coefficients. Note there are 2 L − 1 regression coefficient vectors γ = {γ j,k }, e.g., for L = 3, γ = {γ 0,1 , γ 1,1 , γ 1,2 , γ 2,1 , γ 2,2 , γ 2,3 , γ 2,4 }. For a fixed integer L > 0, the random density associated with LDTFP L (α, σ 2 ) is defined as where Z is the n × (q + 1) design matrix with mean-centered covariatesz ij s. Furthermore, the LDTFP is specified by setting γ 0,1 ≡ 0, such that for every z ∈ X , G z is almost surely a median-zero probability measure.
The function frailtyGAFT sets the following hyperparameters as defaults: m 0 = 0, S 0 = 10 5 I p+1 , a 0 = b 0 = 1, a τ = b τ = 1, and a σ = 2 +σ 4 0 /(100v 0 ), b σ =σ 2 0 (a σ − 1), wherê σ 2 0 andv 0 are the estimates of σ 2 and its asymptotic variance from fitting the parametric lognormal AFT model, respectively. Note here we assume a somewhat informative prior on σ 2 so that its mean isσ 2 0 and variance is 100v 0 . For the GRF prior, we again set a φ = 2 and b φ = (a φ − 1)/φ 0 so that the prior of φ has mode at φ 0 and the prior mean of 1/φ is 1/φ 0 with infinite variance. Here φ 0 satisfies ρ(s , s ; φ 0 ) = 0.001, where s − s = max ij s i − s j . Note by default frailtyGAFT standardizes each covariate by subtracting the sample mean and dividing the sample standard deviation. Therefore, the user-specified hyperparameters should be based on the model with scaled covariates unless the argument scale.designX = FALSE is added.
Suppose we wish to test H 0 : Υ j = 0 versus H 1 : Υ j = 0, for fixed j ∈ {1, . . . , q}. Following , the Bayes factor between hypotheses H 1 and H 0 can be approximated bŷ where N p (·; m, S) denotes a p-variate normal density with mean m and covariance matrix S, andm j andŜ j are the sample mean and covariance for Υ j .

Leukemia survival data
The code below is used to fit the GAFT model with ICAR frailties for the leukemia survival data. As suggested by , the gamma prior Γ(a 0 = 5, b 0 = 1) is used for α. We include all four covariates in modeling the baseline survival function.
R> set.seed(1) R> mcmc <-list(nburn = 5000, nsave = 2000, nskip = 4, ndisplay = 1000) R> prior <-list(maxL = 4, a0 = 5, b0 = 1) R> ptm <-proc.time() R> res1 <-frailtyGAFT(formula = Surv(time, cens)~age + sex + wbc + tpi + + baseline(age, sex, wbc, tpi) + frailtyprior("car", district), + data = d, mcmc = mcmc, prior = prior, Proximity = E) R> (sfit1 <-summary(res1)) ## Output below is partial   The Bayes factors for testing age and wbc effects on LDTFP are 16 and 28, respectively, indicating that the baseline survival function under the AFT model depends on age and wbc, and thus GAFT should be considered. The trace plots, survival curves and frailty map ( Figure 5) can be obtained using the code similarly as in Section 2.4. The only difference for plotting survival curves is that we need to specify the baseline covariates by including the argument xtfnewdata = xpred into the plot function. Note that the mixing for covariate effects is okay but not great due to the non-smoothness of Polya trees. In this case, we need to run a longer chain with much higher thinning as suggested in .

Survival models via spatial copulas
In environmental studies, survival times (e.g. time to water pollution) often present a strong spatial dependence after adjusting for available risk factors, making frailty models extremely difficult to fit because of the strong posterior dependency among frailties. The spatial copula approach (Bárdossy 2006) offers an appealing way to describe spatial dependence among survival times separately from their univariate distributions, thus leads to more efficient posterior sampling algorithms. In addition, the regression coefficients have population-level interpretations under copula models. However, the copula approach can be very slow in the presence of high censoring rate due to the imputation of centered survival times.
Currently the package only supports spatial copula models for georeferenced (without replication, i.e., n i = 1), right-censored spatial data. Suppose subjects are observed at n distinct spatial locations s 1 , . . . , s n . Let t i be a random event time associated with the subject at s i and x i be a related p-dimensional vector of covariates, i = 1, . . . , n. For right-censored data, we only observe t o i and a censoring indicator δ i for each subject, where δ i equals 1 if t o i = t i and equals 0 if t i is censored at t o i . Therefore, the observed data will be D = {(t o i , δ i , x i , s i ); i = 1, . . . , n}. Note although the models below are developed for spatial survival data, non-spatial data are also accommodated.
In the context of survival models, the idea of spatial copula approach is to first assume that the survival time t i at location s i marginally follows a model S x i (t), then model the joint distribution of (t 1 , . . . , t n ) as P (t 1 ≤ a 1 , . . . , t n ≤ a n ) = C(F x 1 (a 1 ), . . . , F xn (a n )), where F x i (t) = 1 − S x i (t) is the cumulative distribution function and the function C is an n-copula used to capture spatial dependence.
The current package assumes a spatial version of the Gaussian copula (Li 2010), defined as where Φ n (·, . . . , ·; R) denotes the distribution function of N n (0, R). To allow for a nugget effect, we consider R[i, j] = θ 1 ρ(s i , s j ; θ 2 )+(1−θ 1 )I(s i = s j ), where ρ(s i , s j ; θ 2 ) = exp{−θ 2 s i − s j }. Here θ 1 ∈ [0, 1], also known as a "partial sill" in Waller and Gotway (2004), is a scale parameter measuring a local maximum correlation, and θ 2 controls the spatial decay over distance. Note that all the diagonal elements of R are ones, so it is also a correlation matrix. Under the above spatial Gaussian copula, the likelihood function based on upon the complete data {(t i , x i , s i ), i = 1, . . . , n} is is the density function corresponding to S x i (t). We next discuss two marginal spatial survival models for S x i (t) that are accommodated in the package. Note that for large n, the FSA introduced in Section 2.1 (with replaced by 1 − θ 1 ) can be applied.

Proportional hazards model via spatial copulas
Assume that t i |x i marginally follows the proportional hazards (PH) model with cdf and density where β is a p × 1 vector of regression coefficients, λ 0 (t) is the baseline hazard function and Λ 0 (t) = t 0 λ 0 (s)ds is the cumulative baseline hazard function. The piecewise exponential model provides a flexible framework to deal with the baseline hazard (e.g., Walker and Mallick 1997). We partition the time period R + into M intervals, say I k = (d k−1 , d k ], k = 1, . . . , M , where d 0 = 0 and d M = ∞. Specifically, we set d k to be the k M th quantile of the empirical distribution of the observed survival times for k = 1, . . . , M − 1. The baseline hazard is then assumed to be constant within each interval, i.e., where h k s are unknown hazard values. Consequently, the cumulative baseline hazard function can be written as where M (t) = min{k : d k ≥ t} and ∆ k (t) = min{d k , t} − d k−1 . After incorporating spatial dependence via the copula in Equation 12, the spCopulaCoxph function considers the following prior distributions: The spCopulaCoxph function sets the following default hyperparameter values: M = 10, r 0 = 1, h =ĥ, β 0 = 0, S 0 = 10 5 I p , θ 0 = (θ 1a , θ 1b , θ 2a , θ 2b ) = (1, 1, 1, 1), whereĥ is the maximum likelihood estimate of the rate parameter from fitting an exponential PH model. A function indeptCoxph is also provided to fit the non-spatial standard PH model with above baseline and prior settings. The function argument prior allows users to specify these prior parameters in a list with elements defined as follows:

Bayesian nonparametric survival model via spatial copulas
We assume that y i = log t i given x i marginally follows a LDDPM model (De Iorio et al. 2009) with cdf, where Φ(·) is the cdf of the standard normal, and G follows the Dirichlet Process (DP) prior. This Bayesian nonparametric model treats the conditional distribution F x as a function-valued parameter and allows its variance, skewness, modality and other features to flexibly vary with the x covariates. After incorporating spatial dependence via the copula in Equation 12, the function spCopulaDDP assumes the following prior distributions: The following default hyperparameters are considered in spCopulaDDP: 1, 1, 1), m 0 =β, S 0 =Σ, Σ 0 = 30Σ, and κ 0 = 7, wherê β andσ 2 are the maximum likelihood estimates of β and σ 2 from fitting the log-normal accelerated failure time model log(t i ) = x i β + σ i , i ∼ N (0, 1), andΣ is the asymptotic covariance estimate forβ. A function anovaDDP is also provided to fit the non-spatial LDDPM model in Equation 14 with above prior settings. The function argument prior allows users to specify these prior parameters in a list with elements defined as follows: element N a0 b0 m0 S0 k0 Sig0 theta0 symbol N a 0 b 0 m 0 S 0 κ 0 Σ 0 θ 0

Leukemia survival data
PH model with spatial copula The following code is used to fit the piecewise exponential PH model in Equation 13 with the Gaussian spatial copula in Equation 12 using M = 20 and default priors. We consider K = 100 and B = 1043 for the number of knots and blocks in the FSA of R. The total running time is 15445 seconds.

LDDPM model with spatial copula
The following code is used to fit the LDDPM model in Equation 14 with the Gaussian spatial copula in Equation 12 using N = 10 and default priors. For the FSA, K = 100 and B = 1043 are used. The total running time is 20056 seconds. Note there is no summary output as before, as we are fitting a nonparametric model. The trace plots, survival curves, and map of z i s can be obtained using the same code used for the PH copula model.

Conclusions
There is a wealth of R packages for non-spatial survival data, starting with survival, included with all base installs of R. The survival package fits (discretely) stratified semiparametric PH models to right-censored data with exchangeable gamma frailties, as well as left-truncated data, time-dependent covariates, etc. Parametric log-logistic, Weibull and log-normal AFT models can also be fit by this package. From there, there are many packages for various models and types of censoring; a partial review discussing several available R packages is given by Zhou and Hanson (2015); also see . In comparison there are very few R packages for spatially correlated survival data, with the notable exceptions of R2BayesX and spatsurv, both of which focus on PH exclusively. The spBayesSurv package allows the routine fitting of several popular semiparametric and nonparametric models to spatial survival data.
spBayesSurv can also handle non-spatial survival data using either exchangeable Gaussian or no frailty models. Another unintroduced function is survregbayes2 which implements the Polya tree based PH, PO, and AFT models of Hanson (2006) and Zhao, Hanson, and Carlin (2009) for areally-referenced data. As pointed out in these papers, MCMC mixing for Polya tree models can be highly problematic when the true baseline survival function is very different from the parametric family that centers the Polya tree; the TBP prior provides much improved MCMC mixing with essentially the same quality of fit as Polya trees. Another function very recently added function is SuperSurvRegBayes, which provides Bayes factors for testing among PO, PH, and AFT, as well as three other survival models Zhang, Hanson, and Zhou (2018).
Future additions to spBayesSurv include spatial copula (both georeferenced and areal) versions of the PH, PO, and AFT models using TBP priors, as well as continuously-stratified proportional hazards and proportional odds models. An extension of all semiparametric models to additive linear structure, which is already incorporated into BayesX, is also planned. Finally, computational efficiency can be gained by replacing some of the adaptive MCMC updates with gradient-based updates for the semiparametric models, e.g. the IWLS updates implemented in BayesX for the PH model (Hennerfeind et al. 2006).