spatsurv : An R Package for Bayesian Inference with Spatial Survival Models

Survival methods are used for the statistical modelling of time-to-event data. Survival data are characterized by a set of complete records, in which the time of the event is known; and a set of censored records, in which the event was known to have occurred in an interval. When survival data are spatially referenced, the spatial variation in survival times may be of scientiﬁc interest. In this article, we introduce a new R package, spatsurv , for inference with spatially referenced survival data. The speciﬁc type of model ﬁtted by this package is a parametric proportional hazards model in which the spatially correlated frailties are modelled by a log-Gaussian stochastic process. The package is extensible in that it allows the user to easily create new models for the baseline hazard function and spatial covariance function. The package implements an advanced adaptive Markov chain Monte Carlo algorithm to deliver Bayesian inference with minimal input from the user. A particular feature of the new package is the ability to handle large datasets via the use of auxiliary frailties on a regular grid and the technique of circulant embedding for fast matrix computations. We demonstrate the new package on a real-life dataset.


Introduction
Statistical methods for the analysis of survival data are not only applicable in the medical context, but also in many other areas of science and engineering. When survival times are spatially-referenced, some evidence of clustering of high or low times might be apparent on a visual inspection of the data. The question naturally arises as to whether these observed spatial survival patterns can be explained by incorporating appropriate covariates into the model or whether, in order to obtain reliable inferences for model parameters of interest, it is necessary to explicitly model the unexplained spatial variation. This article introduces the spatsurv package for Bayesian inference with spatially referenced survival data, which is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/ package=spatsurv (R Core Team 2017). Methodological developments in the analysis of spatial survival data have included the following. In Henderson, Shimakura, and Gorst (2002), the authors use a proportional hazards (PH) framework based on Gamma frailties and the Breslow estimate of the hazard function (Breslow 1974) to model the survival times of individuals diagnosed with leukemia in the north west of England. In Li and Ryan (2002), the authors propose the use of log-Gaussian frailties in a semi-parametric framework. In Diva, Dey, and Banerjee (2008), the authors consider both PH and proportional odds (PO) models for the SEER (surveillance, epidemiology and end results) cancer data (Howlader, Noone, Krapcho, Garshell, Neyman, Altekruse, Kosary, Yu, Ruhl, Tatalovich, Cho, Mariotto, DR, and EJ 2013); the baseline hazard function of their parametric PH model is based on the Weibull survival model. In each of the above cases, the authors used a Markov chain Monte Carlo (MCMC) algorithm to produce samples from a posterior, enabling them to perform Bayesian inference for each class of models. Li and Ryan (2002) also investigated the Laplace approximation as a means of approximate posterior inference.
To our knowledge, there are few R packages for dealing with spatial survival data. One recent addition to CRAN is the package spBayesSurv (Zhou and Hanson 2016), which implements functions for fitting several Bayesian survival models including spatial copula linear dependent Dirichlet process mixture models, anova Dirichlet process mixtures and a marginal spatial proportional hazards model. The R package INLA (Rue, Martino, and Chopin 2009;Lindgren, Rue, and Lindström 2011) can also be used to fit survival models. Our package spatsurv uses parametric models for the baseline hazard function and correlated log-Gaussian frailties to model spatial dependence. We have chosen to implement Markov chain Monte Carlo (MCMC) inferential algorithms because although they are typically slower than approximate methods, based on the Laplace approximation for example, they in principle deliver unbiased, joint inference for all model parameters and are relatively easily extensible to wider model classes with additional hierarchies.
The software package BayesX (Belitz, Brezger, Klein, Kneib, Lang, and Umlauf 2015;Adler, Kneib, Lang, Umlauf, and Zeileis 2013) and associated interface to R, R2BayesX (Umlauf, Adler, Kneib, Lang, and Zeileis 2015) also fits spatial survival models. BayesX comprises a set of tuned C++ functions for the implementation of structured additive regression models (STAR), including a flexible parametric survival model using penalized splines to model the baseline hazard. STAR models offer a flexible parametrization to the linear predictor, and a particular advantage of the BayesX package in the context of the present article is its capability for handling time-varying coefficients, which is not currently implemented in the spatsurv package. The package BayesX also has facilities for handling large datasets through reduced rank methodology (Kammann and Wand 2003;Hennerfeind, Brezger, and Fahrmeir 2006). As Park and Liang (2012) state, while methods using a low-dimensional approximation to the latent Gaussian process are indeed effective at reducing the computation time, the dimension of the approximation can still be very high for large datasets, meaning these methods will not work well. There are also other issues with low-rank methods including the question of how to choose an appropriate rank, the specification of priors and the physical interpretation of spatial dependence.
In contrast, the package spatsurv implements a new methodology for handling large spatial datasets that (i) delivers O(n) inference for spatial survival models, where n is the number of observations and (ii) simultaneously solves the spatial prediction problem on a fine regular grid at a cost of O(m log m), where m is the number of prediction points; both of which would incur cubic complexity using the standard method, see Section 4 for an example and (Taylor 2015) for further details. The package spatsurv implements a combination of advanced adaptive MCMC techniques that require little input from the user and Fourier methods for the fast computation of matrix operations. To illustrate the massive reduction in computation time using the proposed method, we estimate that for the real example in Section 4 (6708 observations and 16384 prediction points), the standard O(n 3 ) method would take approximately five months to analyze, while our method can be run in little over three hours; moreover, the standard method would incur a massive additional cost in order to predict the spatial process at unsampled locations, whereas with our method, these are a by-product of the sampling scheme, and moreover the user has access to an exact sample from their joint posterior density. Our method is full rank and has the advantage that spatial dependence can be specified through a covariance function, which has interpretational benefits. Lastly, our package is written fully in the R programming language, so from the point of view of extensibility, users without specialist knowledge of another programming language will be able to use and modify spatsurv code more easily; examples of this are in Appendices A and B. To the authors' knowledge, this article is the first published walk-through analysis of a spatial survival dataset in the R programming language.
The article is organized as follows. In Section 2, we provide a general introduction to survival models, spatial survival models, the form of the likelihood function and Bayesian statistical inference. In Section 3 we give a tour of spatsurv functionality including the preliminaries: formatting data, simulation of spatial survival data, plotting, specifying priors and running the MCMC and also post-processing: MCMC diagnostics, plotting the posterior baseline hazard/cumulative hazard, plotting the posterior covariance function, prediction and the computation of Monte Carlo expectations over the joint posterior. The article concludes with a discussion in Section 6, before detailing some of the more technical aspects in the appendices including user-defined covariance functions in Appendix A and user-defined baseline hazard functions in Appendix B.

Survival models
In this section we give a brief overview of the analysis of survival data. We will make frequent reference to medical data, but the reader should note that with the appropriate translation of scenarios and terminology, what is to follow is applicable in duration modelling or reliability analysis etc.
Survival analysis (Cox and Oakes 1984;Klein and Moeschberger 2003;Klein, Ibrahim, Scheike, Van Houwelingen, and Van Houwelingen 2013) is concerned with the modelling and prediction of time-to-event data: for each individual, we observe the time of an event. The defining characteristic of survival data relates to the concept of censoring: we do not observe the survival time of all individuals, rather we might only know that a certain individual was alive the last time they were seen by a doctor, for example. The observed time might therefore correspond to an event (e.g., death or disease progression), or it might correspond to the last time the patient was recorded as alive. In general, the situation is a little more complex than this example, which is known as right censoring: sometimes our data are left censored, meaning the the event happened prior to the patient's first examination by the doctor; or interval censored, where we know that the event happened somewhere in an interval of time, between two visits to the doctor for example. In this article, we make the common assumption that the censoring process occurs independently of the process governing the survival times.
Let T be a random variable, the occurrence time of an event of interest. The three (interrelated) quantities of main interest in survival analysis are the density function, which gives the probability density function of survival times, denoted f (t); the survival function, which gives the probability that an event happened after some time t, S(t) = P(T > t); and the hazard function, which is the instantaneous failure rate at some time t conditional on survival up to that time, Of additional interest is the cumulative hazard, H(t) = t 0 h(s)ds and lifetime distribution function, F (t) = 1 − S(t). The following relationships hold, (1)

The spatial parametric proportional hazards model
In this section, we introduce the spatial parametric proportional hazards model for which inference is implemented in the package spatsurv.
In this article, we will concentrate on the case where the hazard function takes the following form: where t i is the observed time for the ith individual, X i is a vector of covariates for the ith observation and the parameters of this model, ψ = (β, ω, η), correspond respectively to the covariate effects, the parameters of the baseline hazard and the parameters of the covariance function of a spatially continuous stationary latent Gaussian field Y , of which Y i is the value of the field at the location of observation i. We will parameterize the latent field Y in such a way that E[exp(Y )] = 1. In the case of a Gaussian field whose marginal variance is σ 2 , this is easily achieved by setting the mean of Y to be −σ 2 /2. We do this partly to avoid identifiability issues, but also to give a direct and useful interpretation for exp(Y ), as a multiplicative scaling on the hazard function.
Examples of a suitable spatial covariance function might be the Matérn or Exponential models. The latter specifies the covariance between any two objects of distance d apart to be σ 2 exp{−d/φ}, where σ 2 is the marginal variance of the field and φ is the 'spatial decay' parameter (the larger φ, the longer the range of spatial dependence there is in the field).
We now give two examples of baseline hazard functions derived from parametric survival models. The baseline hazard function derived from the exponential survival model has form, The parameter ω = θ is called the rate parameter. The baseline hazard function derived from the Weibull survival model has form, this is the parametrization used by the package spatsurv, i.e., ω = (α, λ). Note that the standard R functions dweibull, rweibull etc., use a different parametrization.
The density function, hazard function, survival function and cumulative hazard function all depend on the parameters ξ, hence we use the notation f (t; ξ i ), h(t; ξ i ), S(t; ξ i ) and H(t; ξ i ) to represent these functions respectively. The density function for a parametric PH spatial survival model can be derived using the right-hand expression in Equation 1 and expanding definitions: this is the density function for the ith individual. It is straightforward to see that,

The likelihood for a survival model
Suppose we have n observations and again let t i be the data from the ith individual, represented as a vector, i ), in the case of an interval censored observation. We use likelihood-based Bayesian inference to describe what information the data provide about the parameters, ξ = {ψ, Y 1 , . . . , Y n }, and functions of these parameters. Conditional on Y 1 , . . . , Y n , we will assume that observations are independent. The likelihood in this case splits into contributing components from left, right and interval censored data as well as uncensored data:

Inference
The package spatsurv uses a Markov chain Monte Carlo algorithm to perform Bayesian inference for the parametric proportional hazards model. The idea is to use MCMC to draw samples from the posterior: π(ξ|data) ∝ π(data|ξ)π(ξ), whence we can perform Monte-Carlo inference for E π(ξ|data) [g(ξ)], for functions g for which these expectations exist. MCMC provides an extremely flexible and intuitive basis on which to perform inference, allowing us to easily generate plots showing quantiles of quantities (such as the hazard, baseline hazard, survival function) of interest.
The package spatsurv implements an advanced adaptive MCMC algorithm fully described in Taylor (2015); an abbreviated description can be found in Appendix C. It remains for the user to specify their prior distributions on the parameters ψ: the MCMC algorithm works with a transformed version of Y , for which there already exists a sensible prior, again see Taylor (2015) for details.

Introducing the package spatsurv
In this section we give a tour of the main functions in the package spatsurv.

Formatting data
In encoding survival data, the package spatsurv makes use of the Surv class of objects from the survival package (Therneau 2015;Therneau and Grambsch 2000). Specifically, the type of data must be "right", "left" or "interval", though negative survival times are not allowed. In a Surv object of type = "interval", we adopt the convention that times in the first column of the object are the event or censoring times for uncensored, right-censored or left-censored observations: the second column is only used for storing the right endpoints of interval-censored observations. An example is given below.

Simulating data
We first simulate some data from a spatial parametric PH model using MCMC. Note that simulating spatial survival data using MCMC is much faster than analyzing a set of spatial survival data; the simulations here took around a minute on a desktop PC.
We begin by creating a number of objects in which to store the various model parameters.
An alternative way of choosing a spatial covariance function is given through the covmodel function, which calls CovarianceFct from the package RandomFields (Schlather, Malinowski, Menck, Oesting, and Strokorb 2015), e.g., a Matérn covariance function with roughness parameter 1, can be obtained with covmodel("matern", pars = 1) in place of ExponentialCovFct(). Note that for covariance functions defined using the function covmodel, any additional parameters (in this case the roughness parameter) are not estimated, rather they are fixed (in this case at 1).

Code
Name The call weibullHaz() returns a list inheriting class "basehazardspec" which provides functions to evaluate the baseline hazard and its derivatives. Details of the built-in baseline hazard models are given in Table 1.
The package spatsurv also provides an extensible base on which to build new models for the baseline hazard and spatial covariance functions, see Appendix A and Appendix B for details on how to do this.
The function simsurv is used for simulating spatial parametric proportional hazards data. This function requires the user to supply a design matrix, X, together with a column vector of covariate effects, beta; in conjunction with the chosen hazard and covariance functions and their parameters as defined above. These inputs enable us to construct the hazard, cumulative hazard, density function etc., and hence we can simulate a set of survival times from the PH model again using MCMC. By default, the spatial location of the observations will be the unit square; this behavior can be changed by specifying the coords argument explicitly. The dataset we generate below corresponds to 300 individuals between the age of 5 and 50; the probability of each individual being male (X[, "sex"] == 1) is 0.5 and the probability that the individual has cancer (X[, "cancer"] == 1) is 1/5.
R> dat <-simsurv(X = cbind(age = runif(n, 5, 50), sex = rbinom(n, 1, 0.5), + cancer = rbinom(n, 1, 0.2)), + beta = c(0.0296, 0.0261, 0.035), dist = DIST, omega = OMEGA, + cov.parameters = COVPARS, cov.model = COVMODEL, + mcmc.control = mcmcpars(nits = 110000, burn = 10000, thin = 100)) In the call above, the MCMC chain is run for 110000 iterations with a 10000 iteration burn-in, retaining every 100th sample, this is set using the mcmc.control argument. The function simsurv uses an exponentially-distributed independence proposal kernel to produce samples from the target density: the density function for a spatial parametric PH model. The acceptance probability for this scheme is printed at the end of the run, values close to 1 are optimal. Note that at this stage, there is no censoring. Since the candidate survival times have been generated using MCMC, before proceeding, the user should check that the chains have converged and are mixing sufficiently well. The chains are stored in dat$T. We recommend using traceplots and the lag 1 autocorrelation to check for mixing and convergence: R> plot(apply(dat$T, 2, function(x) acf(x, plot = FALSE)$acf[2]), + xlab = "Subject Index", ylab = "Lag 1 autocorrelation") The first line gives an example trace plot for the 178th individual and the second produces a plot of the lag-1 autocorrelation for each chain; these plots are shown in Figure 1. Strictly the user should check mixing and convergence in all chains, but for large n, a practical solution would be to check 10-20 random traceplots and then use the plot of lag-1 autocorrelations to justify the sample from the posterior being reasonable; in the latter, if most of the points plotted are in the range −0.05 to 0.05, then this is a very good sample.
Having checked for good mixing and convergence, we can proceed to extract the simulated survival times (data$survtimes), generate some candidate right-censoring times independently of the survival times, censtimes, and finally using the function gencens, produce an object of class Surv that can be used in the analysis: R> survtimes <-dat$survtimes R> censtimes <-rexp(n, 1 / mean(survtimes)) R> survdat <-gencens(survtimes, censtimes) By default, the function gencens simply compares each survival time with the candidate censoring time and declares the observation as censored if the censoring time is less than the simulated survival time.
Lastly, we construct an appropriate object to store the spatial survival data: an sp object (Pebesma and Bivand 2005;Bivand, Pebesma, and Gomez-Rubio 2013) of class SpatialPointsDataFrame.

Plotting data
The package spatsurv includes a basic visualization tool for spatially referenced survival data.
R> plotsurv(spp = spatdat, ss = spatdat$ss, maxcex = 3) The output is shown in Figure 2. The parameter maxcex controls how large the plotted points are, acting as a global scale for both the uncensored and censored times, since the size of each dot or plus is proportional to the observed time. The user can control the plotting symbols and colors for the events and the censored observations.

Specifying priors
Before we can run the main inferential algorithm in the function survspat, we must first define prior density functions for the parameters in the model: β, ω and η. As mentioned above, the user does not set a prior for Y ; this is because we do not work with Y directly, rather with Γ, where Y = −σ 2 /2 + Σ 1/2 σ,φ Γ and we assume a priori that Γ ∼ N(0, 1), where 1 is the identity matrix and Σ 1/2 σ,φ is the Cholesky decomposition of the matrix of the covariance function evaluated at each of the coordinates of the observations. We declare priors for β, ω and η as follows: R> betaprior <-betapriorGauss(mean = 0, sd = 10) R> omegaprior <-omegapriorGauss(mean = 0, sd = 10) R> etaprior <-etapriorGauss(mean = c(0, -2), sd = 0.5) Each of the functions betapriorGauss, omegapriorGauss and etapriorGauss constructs an independent Gaussian prior for the variable in question (see below). These functions accept vectors or scalars for setting the mean and sd parameters as they would be processed by the function dnorm. In the first line, the code betapriorGauss(mean = 0, sd = 10) means that each β parameter will have a mean of zero and standard deviation of 10 independently.
Note that the priors for ω and for η are in fact priors for the log of these parameters. This is because σ, φ, α and λ are all positive, whereas our MCMC algorithm needs to operate on the whole of the real line. As to which parameters are being sampled on the log (or indeed other) scales, for ω and η, this depends respectively on the choice of baseline hazard distribution and covariance function. The transformation used is specified in the definitions of each of these functions, see Appendix A and Appendix B for further details; in brief all of the parameters in Table 1 are positive and so all of these are in practice sampled on the log-scale.
Note also that although within the MCMC algorithm, ω, η and Y are sampled on a transformed scale, for reporting back to the user at the end of the run, these are back-transformed for ease of interpretation.
Before proceeding to analyze the data, we first wrap the priors in an object of class mcmcPriors, constructed as follows, R> priors <-mcmcPriors(betaprior = betaprior, omegaprior = omegaprior, this can now be passed to the main inferential function, survspat.
The function mcmcPriors defines independent priors for β, ω and η and the package spatsurv only provides functionality for the built-in Gaussian priors discussed above. However, the choice of prior is extensible by the user by creating functions similar to the functions betapriorGauss, omegapriorGauss, etapriorGauss, indepGaussianprior and lastly, derivindepGaussianprior: the first three of which provide a mechanism for storing and retrieving the parameters of the priors; the fourth, a function for evaluating the log of the prior for a given set of parameter values; and the fifth, a function for evaluating the first and second derivatives of the log of the prior.

Running the MCMC algorithm
Having chosen a distribution on which the hazard function is to be based and specified our priors, we are in a position to be able to perform Bayesian inference for the model defined by Equation 5. All that remains is to choose the covariates to be included; our model here is ss age + sex + cancer, note that unlike lm this formula will not include an intercept term. The number of iterations, burn-in and thinning parameters are set using the mcmc.control argument.
R> mod <-survspat(formula = ss~age + sex + cancer, data = spatdat, + dist = DIST, cov.model = COVMODEL, + mcmc.control = mcmcpars(nits = 500000, burn = 10000, thin = 490), + priors = priors) This produces an object of class mcmcspatsurv, for which there exist various methods, discussed below. Note that this may take several hours to run; the progress bar indicates how far into the burn-in or run stage the algorithm is currently at.

MCMC diagnostics
Before proceeding to interpret any of the model outputs, we must first look for evidence of satisfactory convergence and mixing in the MCMC chain. The chains for β, ω, η and Y are stored in mod$betasamp, mod$omegasamp, mod$etasamp and mod$Ysamp respectively For the parameters β, ω and η, the function mcmcplot from the package mcmcplots (Curtis 2015) can be used to check trace and autocorrelation plots of the chain: R> require("mcmcplots") R> mcmcplot(cbind(mod$betasamp, mod$omegasamp, mod$etasamp)) This opens up a web browser and displays the diagnostic plots in an easily navigable way, see Figure 3. For parameters sampled on the log scale, it is sometimes preferable to look at plots of the log of that parameter, for example by using mcmcplot(log(mod$omegasamp)).
Examining the resulting autocorrelation plots shows that the algorithm was producing near independent samples from the posterior with the exception of the parameters σ and φ; and these latter two parameters still mixed reasonably well, with the autocorrelation dropping to around zero around the 8th and 10th lag respectively.
In a similar way to how we conducted diagnostic checks for the simulated data in Section 3.2, we can also look at a plot of the lag-1 autocorrelations for each of the Y s. A further diagnostic check is given by examining a plot of the log-posterior over the saved iterations in the chain, see the second line of code below. Note that this latter plot includes the very first iteration of the chain (so is of length the number of stored iterations plus 1), to provide evidence that the chain has left the transient phase and has reached stationarity.
R> frailtylag1(mod) R> plot(mod$tarrec, ylab = "log-Posterior", type = "s") See Figure 4 for the output of these commands. The left hand plot shows that the lag-1 autocorrelation in the Y s is very low and the right hand plot shows that the MCMC algorithm q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q has found a mode and is exploring it -in this case, the initial value of the log-target appears to be close to the mode of the posterior, more commonly there would be a jump from the initial value, the chain settling at some other value. What is important with this diagnostic plot is that there is no long-term trend evident, which is the case here.

Post-processing
Now that we have provided evidence that the MCMC chain has satisfactorily converged, we can proceed to producing posterior summaries of the quantities of interest. The package spatsurv has a print method for objects of class mcmcspatsurv, such as the object mod:

Plots of the prior and posterior
For inferential purposes, it is of interest to qualify the influence of the choice of prior on the resulting parameter estimates. One way of achieving this is to overlay plots of the prior and posterior densities:

R> priorposterior(mod)
the result is shown in Figure 5. These plots show the prior density function on its original scale (in this case, on the log-scale for ω and η) as a red line with a histogram of the posterior samples overlaid. For each parameter, we would hope to see a big difference between the prior and posterior, which would indicate that the data are informative about that parameter; in this case, it can be seen that the parameters β and ω and σ are well identified by the data whereas, as is typical in geostatistical inference, the spatial decay parameter, φ, is less well identified: see for example Zhang (2004).

Posterior baseline hazard and spatial covariance function
We can plot the posterior baseline hazard, h 0 , and the posterior spatial covariance function using the following commands:

R> baselinehazard(mod) R> posteriorcov(mod)
this produces the output in Figure 6. Note that the posterior cumulative hazard function can be obtained using baselinehazard(mod, cumulative = TRUE). To plot the posterior correlation function, type posteriorcov(mod, corr = TRUE).

Prediction
The package spatsurv enables the user to extract and examine the hazard, survival and density curves for each individual in the analysis. For example, for individual number 260, we would use the following: R> predict(mod, type = "hazard", indx = 260) R> predict(mod, type = "survival", indx = 260) R> predict(mod, type = "density", indx = 260) The result is shown in Figure 7.
The other useful predict type options include "densityquantile", which returns the Monte Carlo mean quantiles of the density function for each individual.
R> dq <-predict(mod, type = "densityquantile") Note that for some distributions, the quantiles of the density function are not available in an analytically convenient form.

Monte Carlo expectations
Lastly, it is of interest to be able to compute Monte Carlo expectations over the posterior: the package spatsurv provides a function MCE to facilitate this. Such expectations take the general form: where for example β (i) is the ith sample of β drawn from the chain. Note the dependency on Y i in the function g here. The function MCE takes as input the mcmcspatsurv object and also a function which has arguments beta, omega, eta and Y. An example of such a function is provided by the returned value of the hazardexceedance function, detailed below.
Writing the hazard function as, it can be seen that the frailty for individual i is composed of what we can explain by using covariates, exp{X i β}, and what we cannot explain, exp{Y i }, both of which act multiplicatively on the baseline hazard. It is often of scientific interest to quantify the unexplained variation, taking into account the uncertainty in estimation. One way of achieving this is to compute posterior exceedance probabilities, that is P[exp(Y ) > c] for some threshold c. attr(fun, "threshold") <-threshold + attr(fun, "direction") <-direction + return(fun) + } The hazardexceedance function, shown above returns an R function taking the appropriate arguments, which can be passed to the MCE function, which will compute the Monte Carlo expectation required: R> func <-hazardexceedance(c(1.5, 2, 3)) R> mchaz <-MCE(mod, func) R> mchaz[, 1:10]

A large dataset example: Fire service response times
The purpose of the previous example was to illustrate basic functionality of the spatsurv package. In this section we present an analysis of a large dataset: the London Fire Brigade (LFB) response time data, explored more fully in Taylor (2017). The data analyzed in this section is available from the London Datastore (London Fire and Emergency Planning Authority 2015). Here we present an analysis of the data from 2009. We restrict our attention to fires occurring in dwellings, as most deaths occur in this type of property. Our response variable is the time for the first fire engine to arrive on the scene of a fire at a dwelling and we exclude observations where this response time was not recorded. There is no censoring in the resulting dataset. This is a large dataset containing 6708 observations and a spatial analysis using the model and methods of the previous section would take prohibitively long.
The package spatsurv includes methods for computationally efficient inference with large datasets, described in detail in Taylor (2015). Our model is a slight modification of Equation 2 with respect to the frailties Y i . The package spatsurv implements a shared frailty model on an auxiliary grid of cells on which we wish to predict the response times. The model for the hazard takes the form: where m is the number of prediction grid cells. The main advantages of this model are that they allow us to use standard covariance models (such as the exponential or Matérn models) for the spatial dependence between points and spatial prediction of the process Y is a byproduct of the Monte Carlo sampling procedure.
We begin by reading in the data, which is stored as a SpatialPointsDataFrame.
R> library("spatsurv") R> library("spatstat") R> library("sp") R> library("survival") R> library("rgeos") R> library("rgdal") R> library("leaflet") R> library("fields") R> library("raster") R> library("lubridate") R> data("fstimes") Our model will include harmonic regression terms to describe the response time as a function of the time of day. Here we set up the required sine and cosine terms in our data frame: R> for(i in 1:4) { R> fstimes@data[paste("s", i, sep="")] <-+ sin(i * 2 * pi * fstimes$timenumeric / 24) R> fstimes@data[paste("c", i, sep="")] <-+ cos(i * 2 * pi * fstimes$timenumeric / 24) R> } We might expect the response time to a call to the fire service to depend on the proximity of the nearest fire station to the location of the fire. However, it is possible the engines from the nearest station are already engaged in responding to another call. In this analysis we therefore use (scaled) fire station intensity as measure of proximity of fire stations to fire locations.
R> data("fs") R> fs <-fs[fs$Description=="Fire Station", ] R> fscoords <-cbind(fs$Easting,fs$Northing) R> chull <-convexhull.xy(rbind(coordinates(fstimes), fscoords)) R> win <-expandwinPerfect(chull, TRUE, 1.2) R> fsppp <-ppp(x=fscoords[, 1], fscoords[, 2], window=win) R> fsintens <-density.ppp(fsppp) R> fsintens <-raster(fsintens) R> proj4string(fsintens) <-CRS("+init=epsg:27700") R> fstimes$fsintens <-raster::extract(fsintens, fstimes) * 100000000 We next set up the model and priors: R> FORM <-S~fsintens + s1 + c1 + s2 + c2 + s3 + c3 + s4 + c4 R> betaprior <-betapriorGauss(mean = 0, sd = 100) R> omegaprior <-omegapriorGauss(mean = 0, sd = 10) R> etaprior <-etapriorGauss(mean = log(c(1, 1000)), sd = 0.5) R> priors <-mcmcPriors(betaprior = betaprior, omegaprior = omegaprior, + etaprior = etaprior, call = indepGaussianprior, + derivative = derivindepGaussianprior) Lastly, we call the MCMC algorithm to produce samples from the posterior. Here we use the BsplineHaz function to model the baseline hazard with a cubic basis spline and the inference.control function to specify gridded = TRUE which selects the model in Equation 6; the cellwidth option is used to control the height and width of the square grid cells. The package spatsurv will by default choose the optimum number of computational cells for a given cell width subject to the restriction that the dimension of the grid is a power of 2 in each direction. Smaller choices of the cellwidth parameter obviously lead to finer grids and hence a larger computation time, but exactly how much time an MCMC sampler will take to run will depend on the power of the PC in question. We anticipate that to a great extent, the cell width employed in practice will depend on the available computational resource (seeking to make the grid as fine as possible given time constraints). There is an optimum cell width, which minimizes the size of the computational grid and hence the computational wastage; the function getOptCellwidth computes this optimum for a given initial starting value and the function showGrid can be used to visualize the computational grid for a given cell width. The optimal cell width computed with getOptCellwidth can be passed to the function survspat; if entering manually, this optimal cell width should be rounded upwards.
R> mod <-survspat(formula = FORM, data = fstimes, + dist = BsplineHaz(fstimes$S[, 1]), cov.model = ExponentialCovFct(), + mcmc.control = mcmcpars(nits = 500000, burn = 10000, thin = 490), + priors = priors, + control = inference.control(gridded = TRUE, cellwidth = 1000)) Computation time for this run, performed on a 3.40GHz Intel Core i7-4770 desktop computer, was 3.1 hours; the O(n 3 ) method of the previous section would have taken around five months to run for this dataset and model. Before proceeding to summarize the results, we checked for convergence using traceplots and a plot of the log-posterior. Having established convergence and satisfactory mixing, we can then summarize the results from our model, we discuss here the production of two plots that are of interest in this example.
We first show how to produce a plot of the effect of time of day on the relative risk. Our model includes harmonic regression terms, so we first construct a vector of times of length 100, tseq, at which we want to compute the predicted relative risk. The next step is to create a function, getpred, to reproduce the harmonic functional at these time points for a given vector of parameters, representing the sampled harmonic coefficients. Finally, applying this function to the appropriate subset of retained β samples yields a matrix, timetrend, of dimension 100 × 1000 from which we can extract quantiles of the functionals evaluated at each time point and plot them. The result is Figure 8. Because a longer 'survival time' in our model is regarded as bad (the engine did not arrive quickly), the intervals of time where the confidence band for the relative risk is below 1 are where the delay in response time is significantly lower than average.
The second plot of interest is a plot of the relative risk over space, computed using the function hazardexceedance with option direction = "lower" to compute the probability that the relative risk is below the thresholds specified: R> func <-hazardexceedance(c(0.9, 0.8, 0.7), direction = "lower") R> mchaz <-MCE(mod, func) We can plot these relative risks on a map by creating a SpatialPolygonsDataFrame object corresponding to the inferential grid with the exceedance probabilities stored as an entry in the data frame, exceed. Alternatively, the results can be stored in a raster brick or a SpatialPixelsDataFrame object, depending on what type of grid is returned by the func-tion getgrid. Finally, we can plot the exceedance probabilities using the wrapper function, spplot1, to features from the leaflet package (Cheng and Xie 2016), see Figure 9. Areas of high probability in this plot are where the relative risk of an engine arriving on the scene of a fire, adjusted for time of day and proximity of fire stations, is such that there may be cause for concern about the provision of services.

Polygonal data
The package spatsurv also handles survival data where the locations of individuals are only known to the regional level, such as the SEER cancer data (Howlader et al. 2013). Note that it is not possible to redistribute the SEER data, so we have not included an example analysis in the present article, though this would largely proceed as in the examples above but with the call to survspat as detailed below. In this chunk of code, FORM is a model formula; cancerdata is a data.frame containing the covariate and survival data; DIST is the assumed distribution for the baseline hazard function; COVMODEL is the spatial covariance model; priors are the priors, as set up in the first example above.
R> mod <-survspat(formula = FORM, data = cancerdata, dist = DIST, + cov.model = COVMODEL, mcmc.control = mcmcpars(nits = 500000, + burn = 10000, thin = 490), priors = priors, shape = stateshp, + ids = list(shpid = "FIPS", dataid = "STCOUNTY")) The penultimate argument shape is assigned an object stateshp; which can be either a spatialPolygonsDataFrame or a spatialPointsDataFrame. In either case, as part of the MCMC algorithm, the command coordinates(shape) will be called in order to compute inter-region (or more technically, inter-point) distances. These inter-region distances are later passed to the covariance function, so that in the case of a polygonal shape, the covariance between regions is a function of the distance between the centroids of those regions. The reason we have the option of accepting a SpatialPointsDataFrame object is so that it is possible to use weighted centroids of the polygons to define the distance between two regions; an example would be population weighted centroids. The package spatsurv does not include any functionality for computing weighted centroids, but if they are available they may thus be passed to the inferential algorithm, survspat.
The argument ids tells the program how to match the case data, data, to the shape argument; it is assumed that there is a variable in each of these containing the names (or regional identifiers) of the polygons of interest. This argument is a list object with two entries: shpid is the name of the variable in the shape object with the polygon (or point) names; and dataid is the name of the variable in the object cancerdata containing the polygon (or point) names at the individual level. In the case of the SEER cancer data (not presented here), "FIPS" would be the Federal Information Processing Standard code assigned to polygonal regions labelling counties within states and "STCOUNTY" is the corresponding variable in the SEER cancer data at the individual level.

Discussion
In this article, we have introduced a new R package, spatsurv, for the analysis of spatially referenced survival data using parametric proportional hazards models. Our model for the spatially-correlated frailties has assumed log-Gaussian form and the package is extensible in the sense that it allows the user to construct their own baseline hazard and spatial covariance functions. The package functions are able to cope with right, left or interval censored survival data. For reference purposes, we have provided a statistical introduction to spatial survival models and have demonstrated how to simulate and analyze data using advanced Markov chain Monte Carlo methods. Our package provides simple functions for assessing model convergence and post-processing tools for visualizing key quantities of interest. We have also introduced a model and computational method for handling large spatial survival datasets, and demonstrate the performance of this method on a real-life example.
Developing reliable and efficient MCMC code for Bayesian inference with spatial survival models is a challenging task. We hope that the spatsurv package will prove useful to those in the statistical/epidemiological and other communities who seek an off-the-shelf MCMC algorithm they can straightforwardly apply to their data.

A. User-defined covariance functions
In order to define a new covariance function for use with spatsurv, the user must provide a list inheriting class c("covmodel","fromUserFunction"). An example of how to create such a list is shown in the function ExponentialCovFct below. The essential elements of this list are: npar, an integer specifying the number of parameters for the model (note typically this should be a small number, e.g., 2 or 3); parnames, names for the parameters -note that at least one of the parameters must be named sigma and have the interpretation that σ 2 is the marginal variance of the latent field; itrans, a transformation applied to the working parameters returning a vector the same length as the parameters; trans, the inverse of the $itrans function, this is the scale on which the MCMC algorithm performs sampling (in this case, the log-scale) and again this function must return a vector the same length as the parameters; and eval, a function accepting a vector input of distances, u, and parameters, pars, which evaluates the covariance function for each element of u.

B. User-defined hazard functions
In this section, we give details on how to define new baseline hazard functions to be used with the package spatsurv. This involves defining a function that returns a list whose elements are themselves functions; the returned list must inherit the class "basehazardspec". The package spatsurv calls these functions at various points during an MCMC run, or while postprocessing. The example code below is from the weibullHaz function, this particular function has no arguments. We now discuss each of these functions individually.
The distinfo function is used to provide basic distribution specific information to other spatsurv functions. The user is required to provide the following information in the returned list: npars, the number of parameters in this distribution; parnames, the names of the parameters; trans, the transformation scale on which the priors will be provided; itrans, the inverse transformation function that will be applied to the parameters before the hazard, and other functions are evaluated; jacobian, the derivative of the inverse transformation function with respect to each of the parameters; and hessian, the second derivatives of the inverse transformation function with respect to each of the parameters -note that currently the package spatsurv only allows the use of functions where the parameters are transformed independently.
The basehazard function is used to evaluate the baseline hazard function for the distribution of interest. It returns a function that accepts as input a vector of times, t and returns a vector.
The gradbasehazard function is used to evaluate the gradient of the baseline hazard function with respect to the parameters, this typically returns a vector. It returns a function that accepts as input a vector of times, t, and returns a matrix.
The hessbasehazard function is used to evaluate the Hessian of the baseline hazard function. It returns a function that accepts as input a vector of times, t and returns a list of hessian matrices corresponding to each t.
The cumbasehazard function is used to evaluate the cumulative baseline hazard function for the distribution of interest. It returns a function that accepts as input a vector of times, t and returns a vector.
The gradcumbasehazard function is used to evaluate the gradient of the cumulative baseline hazard function with respect to the parameters, this typically returns a vector. It returns a function that accepts as input a vector of times, t, and returns a matrix.
The hesscumbasehazard function is used to evaluate the Hessian of the cumulative baseline hazard function. It returns a function that accepts as input a vector of times, t and returns a list of hessian matrices corresponding to each t.
The densityquantile function is used to return quantiles of the density function. This is not required for running the MCMC, merely for use in post-processing with the predict function where type="densityquantile". In the case of the Weibull model for the baseline hazard, it can be shown that the qth quantile is: Note that quantiles of the density function are not always analytically tractable, in which case the densityquantile function should not return anything, instead issuing an error, for example, stop("Quantiles of the density function are not available"), or something similar.
The constant c is present because we adapt the value of h in our chain using a method described in Andrieu and Thoms (2008) (see Taylor, Davies, Rowlingson, and Diggle (2013) for details) to achieve an approximately optimal acceptance rate of 0.574 to suit the Langevin kernels. This acceptance rate is too high for the random walk, so we set c = 0.4 which is roughly the ratio of the approximate optimal acceptance rates for random walk and Langevin kernels (0.234/0.574).
In the package spatsurv, we use a combination of maximum likelihood estimation and ad hoc methods to obtain initial estimates of β,ω,η and Γ; conditional on these, the matrices Σ β,ω and Σ Γ are obtained from the second derivative of the posterior with respect to those parameters (Taylor 2015). We initialize the MCMC run from these estimates apart from Γ, which is initialized at Γ = 0.

D. Simulation study
We performed a simulation study to check that our code was able to estimate model parameters correctly. This study consisted of 16 simulated datasets (indexed by the ID column in Table 2), and we ran the survspat function on each dataset three times. In the first 8 datasets the data were simulated assuming an exponential baseline hazard and in the last 8 scenarios we used a Weibull baseline hazard. When it came to analyzing the data, we assumed the functional form of the baseline hazard was known (i.e., we assumed an exponential model for datasets 1-8 and a Weibull model for datasets 9-16). We simulated data using an exponential covariance function for the spatial random effects and again assumed this was known when analyzing the data. We assumed that survival time depended on two covariates: age and sex the former being treated as a continuous variable and the latter a binary.
The true and estimated parameters from each of the three runs for each of the 16 datasets are shown in Table 2. We ran each MCMC sampler for 1,000,000 iterations with a burn-in of 10,000 iterations and retaining every 990th sample. For each dataset, we assumed N(0, 10 2 ) priors for β and log ω; N(0, 0.5) priors for log σ; and N(−3, 0.4) for log φ. The results show that with the exception of dataset 12 we obtain very good estimates of parameters; the columns with title 'IN' have a checkmark where the credible interval contains the true parameter value, though note that due to the presence of the spatial random effects in the model, we do not necessarily expect to obtain the true parameters back when we analyze these datasets.