General Semiparametric Shared Frailty Model Estimation and Simulation with frailtySurv

The R package frailtySurv for simulating and fitting semi-parametric shared frailty models is introduced. frailtySurv implements semi-parametric consistent estimators for a variety of frailty distributions, including gamma, log-normal, inverse Gaussian and power variance function, and provides consistent estimators of the standard errors of the parameters' estimators. The parameters' estimators are asymptotically normally distributed, and therefore statistical inference based on the results of this package, such as hypothesis testing and confidence intervals, can be performed using the normal distribution. Extensive simulations demonstrate the flexibility and correct implementation of the estimator. Two case studies performed with publicly-available datasets demonstrate applicability of the package. In the Diabetic Retinopathy Study, the onset of blindness is clustered by patient, and in a large hard drive failure dataset, failure times are thought to be clustered by the hard drive manufacturer and model.


Introduction
The semi-parametric Cox proportional hazards (PH) regression model was developed by Sir David Cox (Cox 1972) and is by far the most popular model for survival analysis. The model defines a hazard function, which is the rate of an event occurring at any given time, given the observation is still at risk, as a function of the observed covariates. When data consist of independent and identically distributed observations, the parameters of the Cox PH model are estimated using the partial likelihood (Cox 1975) and the Breslow estimator (Breslow 1974).
Often, the assumption of independent and identically distributed observations is violated. In clinical data, it is typical for survival times to be clustered or depend on some unobserved covariates. This can be due to geographical clustering, subjects sharing common genes, or some other predisposition that cannot be observed directly. Survival times can also be clustered by subject when there are multiple observations per subject with common baseline hazard. For example, the Diabetic Retinopathy Study was conducted to determine the time to the onset of blindness in high risk diabetic patients and to evaluate the effectiveness of laser treatment. The treatment was administered to one randomly-selected eye in each patient, leaving the other eye untreated. Obviously, the two eyes' measurements of each patient are clustered by patient due to unmeasured patient-specific effects.
Clustered survival times are not limited to clinical data. Computer components often exhibit clustering due to different materials and manufacturing processes. The failure rate of magnetic storage devices is of particular interest since component failure can result in data loss. A large backup storage provider may utilize tens of thousands of hard drives consisting of hundreds of different hard drive models. In evaluating the time until a hard drive becomes inoperable, it is important to consider operating conditions as well as the hard drive model. Hard drive survival times depend on the model since commercial grade models may be built out of better materials and designed to have longer lifetimes than consumer grade models. The above two examples are used in Section 5 to demonstrate the usage of the frailtySurv package.
Clayton (1978) accounted for cluster-specific unobserved effects by introducing a random effect term into the proportional hazards model, which later became known as the shared frailty model. A shared frailty model includes a latent random variable, the frailty, which comprises the unobservable dependency between members of a cluster. The frailty has a multiplicative effect on the hazard, and given the observed covariates and unobserved frailty, the survival times within a cluster are assumed independent.
Under the shared frailty model, the hazard function at time t of observation j of cluster i is given by

λ_ij(t | Z_ij, ω_i) = ω_i λ_0(t) exp(β^T Z_ij), j = 1, ..., m_i, i = 1, ..., n,

where ω_i is an unobservable frailty variate of cluster i, λ_0(t) is the unknown common baseline hazard function, β is the unknown regression coefficient vector, and Z_ij is the observed vector of covariates of observation j in cluster i. The frailty variates ω_1, ..., ω_n are independent and identically distributed with known density f(·; θ) and unknown parameter θ.
There are currently several estimation techniques with available R packages (R Core Team 2015) for fitting a shared frailty model, shown in Table 1. In a parametric model, the baseline hazard function is of known parametric form, with several unknown parameters. Parameter estimation of parametric models is performed by the maximum marginal likelihood (MML) approach (Duchateau and Janssen 2007; Wienke 2010). The parfm package (Munda, Rotolo, and Legrand 2012) implements several parametric frailty models. In a semi-parametric model, the baseline hazard function is left unspecified, a highly important feature, as in practice the shape of the baseline hazard function is often unknown. Under the semi-parametric setting, the most downloaded packages, survival (Therneau 2015a) and coxme (Therneau 2015b), implement the penalized partial likelihood (PPL). frailtypack parameter estimates are obtained by nonlinear least squares (NLS), with the hazard function and cumulative hazard function modeled by cubic M-splines and integrated M-splines, respectively (Rondeau, Mazroui, and Gonzalez 2012). Since the frailty term is a latent variable, expectation maximization (EM) is also a natural estimation strategy for semi-parametric models, implemented by phmm (Donohue and Xu 2013). More recently, a hierarchical-likelihood (h-likelihood, or HL) method (Do Ha, Lee, and Song 2001) has been used to fit hierarchical shared frailty models, implemented by frailtyHL (Do Ha, Noh, and Lee 2012). Both R packages R2BayesX (Brezger, Kneib, and Lang 2003) and gss (Gu 2014) can fit a shared frailty model but support only Gaussian random effects, with the baseline hazard function estimated by penalized splines (see also the overview of frailty software in Hirsch and Wienke 2012).
This work introduces the frailtySurv R package, an implementation of Gorfine, Zucker, and Hsu (2006) and Zucker, Gorfine, and Hsu (2008), wherein an estimation procedure for semiparametric shared frailty models with general frailty distributions was proposed. Gorfine et al. (2006) addresses some limitations of other existing methods. Specifically, all other available semi-parametric packages can only be applied with gamma, log-normal (LN), and log-t (LT) frailty distributions. In contrast, the semi-parametric estimation procedure used in frailtySurv supports general frailty distributions with finite moments, and the current version of frailtySurv implements gamma, log-normal, inverse Gaussian (IG), and power variance function (PVF) frailty distributions. Additionally, the asymptotic properties of most of the semi-parametric estimators in Table 1 are not known. In contrast, the regression coefficients' estimators, the frailty distribution parameter estimator, and the baseline hazard estimator of frailtySurv are backed by a rigorous large-sample theory (Gorfine et al. 2006; Zucker et al. 2008). In particular, these estimators are consistent and asymptotically normally distributed. A consistent covariance-matrix estimator of the regression coefficients' estimators and the frailty distribution parameter's estimator is provided by Gorfine et al. (2006) and Zucker et al. (2008), and also implemented by frailtySurv. Alternatively, frailtySurv can perform variance estimation through a weighted bootstrap procedure.
While some of the packages in Table 1 contain synthetic and/or real-world survival datasets, none of them contain functions to simulate clustered data. There exist several other packages capable of simulating survival data, such as the rmultime function in the MST package (Su, Calhoun, and Fan 2015), the genSurv function in survMisc (Dardis 2015), and survsim (Moriña and Navarro 2014), an R package dedicated to simulating survival data. These functions support only a few frailty distributions. frailtySurv contains a rich set of simulation functions, described in Section 2, capable of generating clustered survival data under a wide variety of conditions. The simulation functions in frailtySurv are used in several simulated experiments to verify empirically that the estimators are implemented correctly (bug-free) and are unbiased.
The rest of this paper is organized as follows. Sections 2 and 3 describe the data generation and model estimation functions of frailtySurv, respectively. Section 4 demonstrates simulation capabilities and results. Section 5 is a case study of two publicly available datasets, including high-risk patients from the Diabetic Retinopathy Study and a large hard drive failure dataset. Finally, Section 6 concludes the paper. The currently supported frailty distributions are described in Appendix A, full simulation results are presented in Appendix B, and Appendix C contains an empirical analysis of runtime and accuracy.

Data generation
The genfrail function in frailtySurv can generate clustered survival times under a wide variety of conditions. The survival function at time t of the jth observation of cluster i, given time-independent covariate vector Z_ij and frailty variate ω_i, is given by

S(t | Z_ij, ω_i) = exp{−Λ_0(t) ω_i exp(β^T Z_ij)},

where Λ_0(t) = ∫_0^t λ_0(u) du is the unspecified cumulative baseline hazard function. In the following sections we describe in detail the various options for setting each component of the above conditional survival function.

Covariates
Covariates can be sampled marginally from normal, uniform, or discrete uniform distributions, as specified by the covar.distr parameter, with distribution parameters given by the covar.param parameter. The regression coefficient vector β is passed to genfrail through the beta parameter. User-supplied covariates can also be passed as the covar.matrix parameter. There is no limit to the covariates' vector dimension; however, the estimation procedure requires the number of clusters to be much larger than the number of covariates. These options are demonstrated in Section 2.7.

Baseline hazard
There are three ways the baseline hazard can be specified to generate survival data: as the inverse cumulative baseline hazard function Λ_0^{-1}, the cumulative baseline hazard function Λ_0, or the baseline hazard function λ_0. If the cumulative baseline hazard function can be directly inverted, then failure times can be computed by

T°_ij = Λ_0^{-1}( −log U_ij / (ω_i exp(β^T Z_ij)) ),   (3)

where U_ij ~ U(0, 1) and T°_ij is the failure time of member j of cluster i. Consequently, if Λ_0^{-1} is provided as parameter Lambda_0_inv in genfrail, then survival times are determined by Eq. (3). This is the most efficient way to generate survival data.
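As a concrete illustration of Eq. (3), the following base-R sketch (not the genfrail internals) generates clustered failure times for a Weibull-type cumulative baseline hazard Λ_0(t) = (ct)^d with gamma frailties; all names here are illustrative:

```r
# Generate clustered failure times via the inverse cumulative baseline hazard.
# Assumes Lambda_0(t) = (c0 * t)^d, so Lambda_0^{-1}(s) = s^(1/d) / c0.
set.seed(1)
n <- 200; m <- 2                              # n clusters of size m
c0 <- 0.01; d <- 4.6
Lambda0_inv <- function(s) s^(1 / d) / c0
beta <- log(2)
omega <- rgamma(n, shape = 1 / 2, scale = 2)  # gamma frailty: mean 1, variance 2
Z <- matrix(rnorm(n * m), n, m)               # one covariate per observation
U <- matrix(runif(n * m), n, m)
# Eq. (3): T = Lambda0_inv( -log(U) / (omega * exp(beta * Z)) )
Tfail <- Lambda0_inv(-log(U) / (omega * exp(beta * Z)))
stopifnot(all(Tfail > 0))
```

Each row of Tfail holds the failure times of one cluster, sharing the same frailty ω_i.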
When Λ_0 cannot be inverted, one can use a univariate root-finding algorithm to solve

Λ_0(T°_ij) ω_i exp(β^T Z_ij) + log U_ij = 0   (4)

for the failure time T°_ij. Alternatively, taking the logarithm and solving

log Λ_0(T°_ij) + log ω_i + β^T Z_ij − log(−log U_ij) = 0   (5)

yields greater numerical stability. Therefore, genfrail solves Eq. (5) when Λ_0 is provided as parameter Lambda_0, using the R function uniroot, which is based on Brent's algorithm (Brent 2013).
If neither Λ_0^{-1} nor Λ_0 is provided to genfrail, then the baseline hazard function λ_0 must be passed as parameter lambda_0. In this case,

Λ_0(t) = ∫_0^t λ_0(u) du   (6)

is evaluated numerically using the integrate function in the stats package, which implements adaptive quadrature, and Eq. (5) is solved numerically for T°_ij. This approach is the most computationally expensive, since numerical integration must be performed for each observation ij and at each iteration of the root-finding algorithm.
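To make the two root-finding routes concrete, here is a small base-R sketch (again, not the genfrail implementation) that recovers the same failure time once from a closed-form Λ_0 via uniroot, and once from λ_0 via integrate inside the root search; the numeric values are arbitrary:

```r
# Lambda_0(t) = (c*t)^d with c = 0.01, d = 4.6, and its derivative lambda_0.
Lambda0 <- function(t) (0.01 * t)^4.6
lambda0 <- function(t) 4.6 * 0.01 * (0.01 * t)^3.6
# Right-hand side -log(U) / (omega * exp(beta * Z)) for one observation:
target <- -log(0.5) / (1.2 * exp(0.3))
# Root finding on the closed-form cumulative baseline hazard:
t1 <- uniroot(function(t) Lambda0(t) - target, c(1e-8, 1e4))$root
# Root finding with Lambda_0 obtained by numerical integration of lambda_0:
Lambda0_num <- function(t) integrate(lambda0, 0, t)$value
t2 <- uniroot(function(t) Lambda0_num(t) - target, c(1e-8, 1e4))$root
stopifnot(abs(t1 - t2) < 1e-2)
```

The second route performs a quadrature per root-finding iteration, which is why specifying lambda_0 is the slowest option.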
Section 2.7 demonstrates generating data using each of the above methods, which all generate failure times in the range [0, ∞). The computational complexity of each method is O (n) under the assumption that a constant amount of work needs to be performed for each observation. Despite this, the constant amount of work per observation varies greatly depending on how the baseline hazard is specified. Using the inverse cumulative baseline hazard, there exists an analytic solution for each observation and only arithmetic operations are required. Specifying the cumulative baseline hazard requires root finding for each observation, and specifying the baseline hazard requires both root finding and numerical integration for each observation. Since the time to perform root finding and numerical integration is not a function of n, the complexity remains linear in each case. Appendix C.1 contains benchmark simulations that compare the timings of each method.

Shared frailty
Shared frailty variates ω 1 , . . . , ω n are generated according to the specified frailty distribution, through parameters frailty and theta of genfrail, respectively. The available distributions are gamma with mean 1 and variance θ; PVF with mean 1 and variance 1 − θ; log-normal with mean exp(θ/2) and variance exp(2θ) − exp(θ); and inverse Gaussian with mean 1 and variance θ. genfrail can also generate frailty variates from a positive stable (PS) distribution, although estimation is not supported due to the PS having an infinite mean. The supported frailty distributions are described in detail in Appendix A. Specifying parameters that induce a degenerate frailty distribution, or passing frailty = "none", will generate non-clustered data. Hierarchical clustering is currently not supported by frailtySurv.
The dependence between two cluster members can be measured by Kendall's tau, denoted here by κ to avoid confusion with τ, the end of the follow-up period. It is given by

κ = 4 ∫_0^∞ s L(s) L''(s) ds − 1,

where L is the Laplace transform of the frailty distribution and L^(m), m = 1, 2, ..., denotes its mth derivative. If the failure times of the two cluster members are independent, κ = 0. Figure 1 shows the densities of the supported distributions for various values of κ. Note that gamma, IG, and PS are special cases of the PVF. For gamma, LN, and IG, κ = 0 when θ = 0, and for PVF, κ = 0 when θ = 1. Also, for gamma and LN, lim_{θ→∞} κ = 1; for IG, lim_{θ→∞} κ = 1/2; for PVF, lim_{θ→0} κ = 1/3; and for PS, κ = 1 − θ.
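As a quick sanity check of this formula, the snippet below evaluates κ numerically for the gamma frailty, whose Laplace transform is L(s) = (1 + θs)^{−1/θ}; the closed form in that case is κ = θ/(θ + 2):

```r
# Numerical check of kappa = 4 * integral(s * L(s) * L''(s)) - 1 for gamma frailty.
theta <- 2
L  <- function(s) (1 + theta * s)^(-1 / theta)
L2 <- function(s) (1 + theta) * (1 + theta * s)^(-1 / theta - 2)  # second derivative
kappa <- 4 * integrate(function(s) s * L(s) * L2(s), 0, Inf)$value - 1
# Closed form for the gamma frailty: theta / (theta + 2)
stopifnot(abs(kappa - theta / (theta + 2)) < 1e-6)
```

With θ = 2 this gives κ = 1/2, i.e., strong within-cluster dependence.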

Cluster sizes
In practice, the cluster sizes m_i, i = 1, ..., n, can be fixed or may vary. For example, in the Diabetic Retinopathy Study, two failure times are observed for each subject, corresponding to the left and right eye. Hence, observations are clustered by subject, and each cluster has exactly two members. If instead the observations were clustered by geographical location, the cluster sizes would vary, e.g., according to a discrete power law distribution. genfrail is able to generate data with fixed or varying cluster sizes.
For fixed cluster size, the cluster size parameter K of genfrail is simply an integer. Alternatively, the cluster sizes may be specified by passing a length-N vector of the desired cluster sizes to parameter K. To generate varied cluster sizes, K is the name of the distribution to generate from, and K.param specifies the distribution parameters.
Cluster sizes can be generated from a k-truncated Poisson distribution with K = "poisson" (Geyer 2014). The truncated Poisson is used to ensure there are no zero-sized clusters and to enforce a minimum cluster size. The expected cluster size is

E[m_i] = (λ − Σ_{j=1}^{k} j e^{−λ} λ^j / j!) / (1 − Σ_{j=0}^{k} e^{−λ} λ^j / j!),

where λ is the shape parameter and k is the truncation point, such that min{m_1, ..., m_n} > k. The typical case is k = 0, a zero-truncated Poisson, for which the expected cluster size reduces to λ/(1 − e^{−λ}). For example, with λ = 2 and k = 0, the expected cluster size equals 2.313. The parameters of the k-truncated Poisson are specified as K.param = c(lambda, k) in genfrail.
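A minimal sketch (independent of genfrail) of zero-truncated Poisson cluster sizes by rejection sampling, checking the empirical mean against λ/(1 − e^{−λ}):

```r
# Zero-truncated Poisson cluster sizes (k = 0) via rejection sampling.
set.seed(1)
lambda <- 2; k <- 0
rtpois <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rpois(n, lambda)
    out <- c(out, x[x > k])      # reject draws at or below the truncation point
  }
  out[seq_len(n)]
}
sizes <- rtpois(1e5)
expected <- lambda / (1 - exp(-lambda))   # = 2.313 for lambda = 2
stopifnot(min(sizes) > 0, abs(mean(sizes) - expected) < 0.02)
```

Rejection sampling is wasteful for large k; an inverse-CDF sampler over the truncated support would scale better, but the idea is the same.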
A truncated discrete Pareto (or zeta) distribution can also be used to generate cluster sizes with K = "pareto". Accurately fitting and generating from a discrete power-law distribution is generally difficult, and genfrail uses a truncated discrete Pareto to avoid some of the pitfalls described in Clauset, Shalizi, and Newman (2009). The probability mass function is

P(m_i = m) = m^{−s} / Σ_{j=l+1}^{u} j^{−s}, s > 1, u > l, m = l + 1, ..., u,

where s is a scaling parameter, l is the noninclusive lower bound, and u is the inclusive upper bound. With large enough u and s > 1, the distribution behaves similarly to the untruncated discrete Pareto distribution, whose expected value is (1/ζ(s)) Σ_{j=1}^{∞} 1/j^{s−1} = ζ(s − 1)/ζ(s), where ζ(s) is the Riemann zeta function. The distribution parameters are specified as K.param = c(s, u, l).

Finally, a discrete uniform distribution can be specified by K = "uniform" in genfrail. The respective parameters to K.param are c(l, u), where l is the noninclusive lower bound and u is the inclusive upper bound. As with the truncated zeta, the support is {l + 1, ..., u}, and each cluster size is selected uniformly from this set of values. Since the lower bound is noninclusive, the expected cluster size equals (1 + l + u)/2.
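Because the support {l + 1, ..., u} is finite, sampling from the truncated discrete Pareto reduces to sampling from an explicitly normalized probability mass function; a sketch (with arbitrary parameter values, not genfrail code):

```r
# Truncated discrete Pareto: P(m) proportional to m^(-s) on {l+1, ..., u}.
set.seed(1)
s <- 3; l <- 1; u <- 100
support <- (l + 1):u
pmf <- support^(-s) / sum(support^(-s))   # normalize over the truncated support
sizes <- sample(support, 1e5, replace = TRUE, prob = pmf)
stopifnot(all(sizes > l), all(sizes <= u))
# Power-law behavior: mass concentrates on the smallest cluster sizes.
stopifnot(mean(sizes == 2) > mean(sizes == 3))
```

For s = 3 the smallest admissible size, 2, already carries the majority of the probability mass, which is typical of power-law cluster sizes.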

Censoring
The observed times T_ij and failure indicators δ_ij are determined by the failure times T°_ij and right-censoring times C_ij. The observed time of observation ij is given by

T_ij = min(T°_ij, C_ij),

and the failure indicator is given by

δ_ij = I(T°_ij ≤ C_ij).

Currently, only right-censoring is supported by frailtySurv. The censoring distribution is specified by the parameters censor.distr and censor.param for the distribution name and parameters' vector, respectively. A normal distribution is used by default. A log-normal censoring distribution is specified by censor.distr = "lognormal" and censor.param = c(mu, sigma), where mu is the mean and sigma is the standard deviation of the censoring distribution. Lastly, a uniform censoring distribution can be specified by censor.distr = "uniform" and censor.param = c(lower, upper) for the lower and upper bounds of the interval, respectively.
Sometimes a particular censoring rate is desired. Typically, the censoring distribution parameters are varied to obtain a desired censoring rate. genfrail can avoid this effort on behalf of the user by letting the desired censoring rate be specified instead. In this case, the appropriate parameters for the censoring distribution are determined to achieve the desired censoring rate, given the generated failure times.
Let F and G be the cumulative distribution functions of the failure and censoring times, respectively. Then, the censoring rate equals

Pr(δ_11 = 0) = E[I(T°_11 > C_11)] = ∫_0^∞ G(t) dF(t),

where observation 11 stands for any random observation from the population. The above quantity can be estimated by

(1/N) Σ_{ij} G(T°_ij),

where the empirical cumulative distribution function F̂ of the generated failure times replaces F, and N denotes the total number of observations. To obtain a particular censoring rate 0 < R < 1 as a function of the parameters of G, one can solve

(1/N) Σ_{ij} G(T°_ij) = R.   (14)

For example, if G is the normal cumulative distribution function with mean µ and variance σ², then σ² should be pre-specified (otherwise the problem is non-identifiable), and Eq. (14) is solved for µ. This method works with any empirical distribution of failure times. genfrail uses this approach to achieve a desired censoring rate, specified by censor.rate, with normal, log-normal, or uniform censoring distributions. Lastly, user-supplied censoring times can be passed through the censor.time parameter, which must be a vector of length N * K, where N is the number of clusters and K is the size of each cluster. Because of this, censor.time cannot be used with variable-sized clusters.
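The approach can be sketched in a few lines of base R; the Weibull failure times below are only a stand-in for generated data, and the parameter names are illustrative:

```r
# Choose the mean mu of a normal censoring distribution (sd fixed) so that the
# expected censoring rate (1/N) * sum_ij G(T_ij; mu, sigma) equals R, per Eq. (14).
set.seed(1)
Tfail <- rweibull(500, shape = 2, scale = 100)   # stand-in failure times
R <- 0.4; sigma <- 30
cens.rate <- function(mu) mean(pnorm(Tfail, mean = mu, sd = sigma)) - R
mu <- uniroot(cens.rate, c(-500, 1000))$root
# Draw censoring times and check the realized censoring proportion:
C <- rnorm(500, mean = mu, sd = sigma)
stopifnot(abs(mean(C < Tfail) - R) < 0.08)
```

The same one-dimensional root search works for any censoring family with a single free parameter once the others are pinned down.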

Rounding
In some applications the observed times are rounded, and tied failure times can occur. For example, the age at onset of certain diseases is often recorded in years of age, rounded to the nearest integer. To simulate tied data, the simulated observed times may optionally be rounded to the nearest multiple of a base B by

Ṫ_ij = B · round(T_ij / B).

If B = 1, the observed times are simply rounded to the nearest integer. The value of B is specified by the parameter round.base of genfrail, with the default being the non-rounded setting.
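The rounding rule is a one-liner; a small sketch (the helper name is illustrative, not the genfrail internals):

```r
# Round observed times to the nearest multiple of a base B.
round.base <- function(t, B) B * round(t / B)
stopifnot(round.base(7.3, 1) == 7,    # B = 1: nearest integer
          round.base(7.3, 5) == 5,    # nearest multiple of 5
          round.base(8.1, 5) == 10)
```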

Examples
The best way to see how genfrail works is through examples. R and frailtySurv versions are given by the following commands.
In the above example, the baseline hazard function was specified by the lambda_0 parameter. The same dataset can be generated more efficiently using the Lambda_0 parameter if the cumulative baseline hazard function is known. This is accomplished by integrating Eq. (16) to get the cumulative baseline hazard function and passing this function as an argument to Lambda_0 when calling genfrail:

> set.seed(2015)
> dat.cbh <- genfrail(N = 300, K = 2, beta = c(log(2), log(3)),
+                     frailty = "gamma", theta = 2,
+                     Lambda_0 = function(t, c = 0.01, d = 4.6)
+                       (c * t)^d)
> head(dat.cbh, 3)

The cumulative baseline hazard in Eq. (17) is invertible, and it would be even more efficient to specify Λ_0^{-1} directly. This avoids the numerical integration required by Eq. (6) and the root finding required by Eq. (5). The inverse function is passed to genfrail as the Lambda_0_inv parameter, again producing the same data when the same seed is used:

> set.seed(2015)
> dat.inv <- genfrail(N = 300, K = 2, beta = c(log(2), log(3)),
+                     frailty = "gamma", theta = 2,
+                     Lambda_0_inv = function(t, c = 0.01, d = 4.6)
+                       t^(1 / d) / c)

A different frailty distribution can be specified while ensuring an expected censoring rate by using the censor.rate parameter. For example, consider a PVF(0.3) frailty distribution while maintaining the 40% censoring rate of the previous example. The censoring distribution parameters are determined by genfrail as described in Section 2.5 by specifying censor.rate = 0.4. This avoids the need to manually adjust the censoring distribution to achieve a particular censoring rate.

Model estimation
The fitfrail function in frailtySurv estimates the regression coefficient vector β, the frailty distribution's parameter θ, and the non-parametric cumulative baseline hazard Λ 0 . The observed data consist of {T ij , Z ij , δ ij } for i = 1, . . . , n and j = 1, . . . , m i , where the n clusters are independent. fitfrail takes a complete observation approach, and observations with missing values are ignored with a warning.
There are two estimation strategies that can be used. The log-likelihood can be maximized directly, by using control parameter fitmethod = "loglik", or a system of score equations can be solved with control parameter fitmethod = "score". Both methods have comparable computational requirements and yield comparable results. In both methods, the estimation procedure consists of a doubly-nested loop, with an outer loop that evaluates the objective function and gradients and an inner loop that estimates the piecewise constant hazard, performing numerical integration at each time step if necessary. As a result, the estimator implemented in frailtySurv has computational complexity on the order of O(n²).

Log-likelihood
The full likelihood can be written as

L(β, θ; Λ_0) = Π_{i=1}^{n} ∫_0^∞ [ Π_{j=1}^{m_i} {ω_i λ_0(T_ij) e^{β^T Z_ij}}^{δ_ij} exp{−ω_i Λ_0(T_ij) e^{β^T Z_ij}} ] f(ω_i; θ) dω_i,

where τ is the end of the follow-up period and f is the frailty's density function. Evidently, to obtain estimators of β and θ based on the log-likelihood, an estimator of Λ_0, denoted by Λ̂_0, is required. For given values of β and θ, Λ_0 is estimated by a step function with jumps at the ordered observed failure times τ_k, k = 1, ..., K, defined by

ΔΛ̂_0(τ_k) = d_k / Σ_{i=1}^{n} ψ_i(τ_{k−1}) Σ_{j=1}^{m_i} Y_ij(τ_k) e^{β^T Z_ij},   (21)

where d_k is the number of failures at time τ_k, Y_ij(t) is the at-risk indicator of observation ij, and ψ_i is the conditional expected frailty of cluster i given the observed data. For the detailed derivation of the above baseline hazard estimator, the reader is referred to Gorfine et al. (2006). The estimator of the cumulative baseline hazard at time τ_k is given by

Λ̂_0(τ_k) = Σ_{q: τ_q ≤ τ_k} ΔΛ̂_0(τ_q),   (22)

and, at each τ_k, the cumulative baseline hazard estimator is a function of Λ̂_0(t) with t < τ_k, so it is computed recursively over the ordered failure times. Then, for obtaining estimates of β and θ, Λ̂_0 is substituted into the log-likelihood ℓ(β, θ, Λ̂_0).
In summary, the estimation procedure of Gorfine et al. (2006) consists of the following steps:
Step 1. Use standard Cox regression software to obtain initial estimates of β, and set the initial value of θ to its value under within-cluster independence or under very weak dependence (see also the discussion at the end of Section 3.2).
Step 2. Use the current values of β and θ to estimate Λ_0 based on the estimation procedure defined by Eq. (21).
Step 3. Using the current value of Λ̂_0, obtain updated estimates of β and θ by maximizing the log-likelihood.
Step 4. Iterate between Steps 2 and 3 until convergence.
For frailty distributions with no closed-form Laplace transform, the integral over the frailty distribution is evaluated numerically. This adds considerable overhead to each iteration of the estimation procedure, since the integrations must be performed within the baseline hazard estimator that is required for estimating β and θ.
With control parameter fitmethod = "loglik", the log-likelihood is the objective function maximized directly with respect to γ = (β^T, θ)^T, for any given Λ̂_0, by optim in the stats package using the L-BFGS-B algorithm (Byrd, Lu, Nocedal, and Zhu 1995). Box constraints specify bounds on the frailty distribution parameter, typically θ ∈ (0, ∞), except for PVF, which has θ ∈ (0, 1). Convergence is determined by the relative reduction in the objective function through the reltol control parameter. By default, this is 10^{−6}.

Score equations
Instead of maximizing the log-likelihood, one can solve the score equations. The score functions U_β and U_θ, with respect to β and θ respectively, are given in Gorfine et al. (2006). The score equations are U(β, θ, Λ̂_0) = (U_β^T, U_θ)^T = 0, and the estimator of γ = (β^T, θ)^T is defined as the value of (β^T, θ)^T that solves the score equations for any given Λ̂_0. Specifically, the only change required in the above summary of the estimation procedure is to replace Step 3 with the following:
Step 3'. Using the current value of Λ̂_0, estimate β and θ by solving U(β, θ, Λ̂_0) = 0.
frailtySurv uses Newton's method implemented by the nleqslv package to solve the system of equations. Convergence is reached when the relative reduction of each parameter estimate or absolute value of each normalized score is below the threshold specified by reltol or abstol, respectively. The default is a relative reduction of γ less than 10 −6 , i.e., reltol = 1e-6.
As an example, in the following lines of code and output we consider again the data generated in Section 2. The results are comparable to the fitted model in Section 3.1. The score equations can usually be solved in fewer iterations than maximizing the likelihood, although solving the system of equations requires more work in each iteration. For this reason, maximizing the likelihood is typically more computationally efficient for large datasets when a permissive convergence criterion is specified.
> fit.score <- fitfrail(Surv(time, status) ~ Z1 + Z2 + cluster(family),
+                       dat, frailty = "gamma", fitmethod = "score")
> fit.score
Call: fitfrail(formula = Surv(time, status) ~ Z1 + Z2 + cluster(family),
    dat = dat, frailty = "gamma", fitmethod = "score")

L-BFGS-B, used for maximizing the log-likelihood, allows for (possibly open-ended) box constraints. In contrast, Newton's method, used for solving the system of score equations, does not support box constraints and therefore risks converging to a degenerate parameter value. In this case, it is more important to have a sensible starting value. In both estimation methods, the regression coefficient vector β is initialized to the estimates given by coxph with no shared frailty. The frailty distribution parameter is initialized such that the dependence between members in each cluster is small, i.e., with κ ≈ 0.3.

Baseline hazard
The estimated cumulative baseline hazard defined by Eq. (22) is accessible from the resulting model object through the fit$Lambda member, which provides a data.frame with the estimates at each observed failure time, or the fit$Lambda.fun member, which defines a scalar R function that takes a time argument and returns the estimated cumulative baseline hazard. The estimated survival curve or cumulative baseline hazard can also be summarized by the summary.fitfrail function, which returns a data.frame. In the example below, the n.risk column contains the number of observations still at risk at time t−, and the n.event column contains the number of failures from the previous listed time up to time t+. The output is similar to that of summary.survfit in the survival package.

Standard errors
There are two ways the standard errors can be obtained for a fitted model. The covariance matrix of γ̂, the estimators of the regression coefficients and the frailty parameter, can be obtained explicitly from the sandwich-type consistent estimator described in Gorfine et al. (2006) and Zucker et al. (2008). The covariance matrix is calculated by applying the vcov function to the object returned by fitfrail. Optionally, standard errors can also be obtained in the call to fitfrail by passing se = TRUE.
frailtySurv can also estimate standard errors through a weighted bootstrap approach, in which the variances of both γ̂ and Λ̂_0 are determined. The weighted bootstrap procedure applies independent and identically distributed positive random weights to each cluster. This is in contrast to a nonparametric bootstrap, wherein each bootstrap sample consists of a random sample of clusters drawn with replacement. The resampling procedure of the nonparametric bootstrap usually yields an increased number of ties compared to the original data, which sometimes causes convergence problems. Therefore, we adopt the weighted bootstrap approach, which does not change the number of tied observations in the original data. The weighted bootstrap is summarized as follows.
1. Sample n random values {v*_i, i = 1, . . . , n} from an exponential distribution with mean 1. Standardize the values by their empirical mean to obtain weights v_1, . . . , v_n.
2. Refit the model with the contribution of each cluster i weighted by v_i, yielding one set of weighted-bootstrap parameter estimates.
3. Repeat Steps 1-2 B times and take the empirical variance (and covariance) of the B parameter estimates to obtain the weighted bootstrap variance (and covariance).
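The steps above can be sketched with a toy estimator standing in for the frailtySurv fit (a weighted mean of per-cluster summaries); the point is the weight generation and the empirical variance over replicates, not the estimator itself:

```r
# Toy weighted bootstrap: exponential weights standardized to mean 1.
set.seed(1)
n <- 50; B <- 200
cluster.means <- rnorm(n, mean = 1)      # one summary statistic per cluster
boot <- replicate(B, {
  v <- rexp(n); v <- v / mean(v)         # Step 1: i.i.d. weights, mean 1
  weighted.mean(cluster.means, w = v)    # Step 2: "refit" with cluster weights
})
se.boot <- sd(boot)                      # Step 3: empirical standard error
# For the sample mean the bootstrap SE should approximate 1/sqrt(n):
stopifnot(se.boot > 0, abs(se.boot - 1 / sqrt(n)) < 0.1)
```

In frailtySurv the refit in Step 2 is the full fitfrail estimation with cluster weights, so the bootstrap cost is roughly B times one fit.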
For smaller datasets, this process is generally more time-consuming than the explicit estimator. If the parallel package is available, all available cores are used to obtain the bootstrap parameter estimates in parallel. Without the parallel package, vcov runs in serial. In the preceding example, the full covariance matrix of (γ̂, Λ̂_0) is obtained. If only certain time points of the estimated cumulative baseline hazard function are desired, these can be specified by the Lambda.times parameter. Since calls to vcov.fitfrail are typically computationally expensive, the results are cached when the same arguments are provided.

Control parameters
Control parameters provided to fitfrail determine the speed, accuracy, and type of estimates returned. The default control parameters to fitfrail are given by calling the function fitfrail.control(). This returns a named list with the following members.
fitmethod: Parameter estimation procedure. Either "score" to solve the system of score equations or "loglik" to estimate using the log-likelihood. Default is "loglik".
maxit: The maximum number of iterations before terminating the estimation procedure. Default is 100.
int.reltol: Relative tolerance for numerical integration convergence. Default is 1.
int.maxit: The maximum number of function evaluations in numerical integration. Default is 1000.
verbose: If verbose = TRUE, the parameter estimates and log-likelihood are printed at each iteration. Default is FALSE.
The parameters int.abstol, int.reltol, and int.maxit are only used for frailty distributions that require numerical integration, as they specify convergence criteria of the numerical integration in the estimation procedure's inner loop. These control parameters can be adjusted to obtain a speed-accuracy tradeoff, whereby lower int.abstol and int.reltol (and higher int.maxit) yield more accurate numerical integration at the expense of more work performed in the inner loop of the estimation procedure.
The abstol, reltol, and maxit parameters specify convergence criteria of the outer loop of the estimation procedure. Similar to the numerical integration convergence parameters, these can also be adjusted to obtain a speed-accuracy tradeoff using either estimation procedure (fitmethod = "loglik" or fitmethod = "score"). If fitmethod = "loglik", convergence is reached when the absolute or relative reduction in log-likelihood is less than abstol or reltol, respectively. Using fitmethod = "score" and specifying abstol > 0 (with reltol = 0), convergence is reached when the absolute value of each score equation is below abstol. Alternatively, using fitmethod = "score" and specifying reltol > 0 (with abstol = 0), convergence is reached when the relative reduction of parameter estimates γ is below reltol. Note that with fitmethod = "score", abstol and reltol correspond to parameters ftol and xtol of nleqslv::nleqslv, respectively. The default convergence criteria were chosen to yield approximately the same results with either estimation strategy.

Model object
The model object returned by fitfrail contains the estimated regression coefficient vector, the estimated frailty distribution parameter, and the estimated cumulative baseline hazard. Specifically,
beta: Estimated regression coefficient vector, named by the input data columns.
theta: Estimated frailty distribution parameter.
Lambda: data.frame with the cumulative baseline hazard at the observed failure times.
Lambda.all: data.frame with the cumulative baseline hazard at all observed times.
Lambda.fun: Scalar R function that returns the cumulative baseline hazard at any time point.
The model object also contains some standard attributes, such as call for the function call. If se = TRUE was passed to fitfrail, then the model object will also contain members se.beta and se.theta for the standard error of the regression coefficients' vector and frailty parameter estimates, respectively.
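The Lambda.fun member can be evaluated at arbitrary time points. Conceptually, it behaves like a right-continuous step function over the observed failure times, similar to what base R's stepfun produces; the times and cumulative hazard values below are made up for illustration:

```r
# Stand-in for Lambda.fun: a right-continuous step function built from
# hypothetical failure times and cumulative baseline hazard values.
times  <- c(10, 25, 40)
Lambda <- c(0.05, 0.12, 0.30)
Lambda.fun <- stepfun(times, c(0, Lambda))

Lambda.fun(5)   # 0, before the first observed failure
Lambda.fun(30)  # 0.12, the cumulative hazard carried forward from t = 25
```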

Simulation
As an empirical validation of the implementation, and to demonstrate its flexibility, several simulations were conducted. The simfrail function can be used to run a variety of simulation settings.

Case study
To demonstrate the applicability of frailtySurv, results are obtained for two different datasets. The first is a clinical dataset, for which several benchmark results exist. The second is a hard drive failure dataset from a large cloud backup storage provider. Both datasets are provided with frailtySurv as data("drs") and data("hdfail"), respectively.

Diabetic Retinopathy Study
The Diabetic Retinopathy Study (DRS) was performed to determine whether the onset of blindness in 197 high-risk diabetic patients could be delayed by laser treatment (DRS 1976). The regression coefficient for the binary treated variable is estimated as −0.918 with an estimated standard error of 0.198, which indicates a 60% decrease in the hazard with treatment. The p-value for testing the null hypothesis that the treatment has no effect, against a two-sided alternative, equals 3.5 × 10^−6 (calculated by 2*pnorm(-0.918/0.198)). The parameter trace can be plotted to determine the path taken by the optimization procedure, as follows:
> plot(fit.drs, type = "trace")
In the cumulative baseline hazard plot function, the seed is used to generate the weights in the bootstrap procedure. Individual failures are shown by the rug plot directly above the time axis. Note that any other confidence level can be specified via the CI parameter of plot.fitfrail. Subsequent calls to vcov.fitfrail with the same arguments will use a cached value and avoid repeating the computationally-expensive bootstrap or sandwich variance estimation procedures.
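The hazard ratio and p-value reported above follow directly from the estimates; as a quick base-R check:

```r
# Treated-eye coefficient and standard error reported by fitfrail
beta.hat <- -0.918
se.beta  <- 0.198

# Hazard ratio: exp(-0.918) ~ 0.40, i.e., about a 60% decrease in hazard
exp(beta.hat)

# Two-sided Wald test p-value, ~3.5e-06
2 * pnorm(-abs(beta.hat / se.beta))
```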
For comparison, the following results were obtained with coxph from the survival package, which is based on the PPL approach:
> library("survival")
> coxph(Surv(time, status)

Hard drive failure
A dataset of hard drive monitoring statistics and failures was analyzed. Daily snapshots of a large backup storage provider, taken over two years, were made publicly available 5 . On each day, the Self-Monitoring, Analysis, and Reporting Technology (SMART) statistics of operational drives were recorded. When a hard drive was no longer operational, it was marked as a failure and removed from the subsequent daily snapshots. New hard drives were also continuously added to the population. In total, there are over 52,000 unique hard drives with approximately two years of follow-up, and 2885 (5.5%) failures.
The data must be pre-processed in order to extract the SMART statistics and failure time of each unique hard drive. Occasionally, a hard drive stops reporting SMART statistics up to several days before failing; in these cases, the most recent SMART statistics recorded before failure are used. The script for pre-processing is publicly available 6 . Although there are 40 SMART statistics altogether, many (older) drives only report a partial list. The current study is restricted to the covariates described in Table 2, which are present for all but one hard drive in the dataset.
The hard drive lifetimes are thought to be clustered by model and manufacturer. There are 85 unique models ranging in capacity from 80 gigabytes to 6 terabytes. The cluster sizes loosely follow a power-law distribution, with anywhere from 1 to over 15,000 hard drives of a particular model.
Bootstrapped standard errors for the regression coefficients and frailty distribution parameter are given by
> set.seed(2015)
> COV <- vcov(fit.hd, boot = TRUE)
> se <- sqrt(diag(COV)[c("temp", "rer", "rsc", "psc", "theta.1")])
> se
      temp        rer        rsc        psc    theta.1
0.03095664 0.62533725 0.18956662 0.36142850 0.32433275
Significance of the regression coefficient estimates is given by their corresponding p-values,
> pvalues <- pnorm(abs(c(fit.hd$beta, fit.hd$theta)) / se, lower.tail = FALSE) * 2
> pvalues
        temp          rer          rsc          psc      theta.1
6.400162e-01 2.087038e-01 1.861996e-06 1.429667e-11 3.690627e-06
Only the estimated regression coefficients of the reallocated sector count (rsc) and pending sector count (psc) are statistically significant at the 0.05 level. Generally, SMART statistics are thought to be relatively weak predictors of hard drive failure (Pinheiro, Weber, and Barroso 2007). A hard drive is about twice as likely to fail with at least one previous bad sector (given by rsc > 0), while the hazard increases by a factor of 11 in the presence of bad sectors waiting to be remapped. The estimated baseline hazard with 95% CI is also plotted, up to 6 years. This time span includes all but one hard drive, which failed after 15 years (model: WDC WD800BB).
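Since the coefficient estimates themselves are not reprinted here, the quoted hazard ratios can be recovered from the reported p-values and standard errors by inverting the Wald statistic; a back-of-the-envelope check in base R:

```r
# Reported two-sided p-values and bootstrapped standard errors for
# rsc (reallocated sector count) and psc (pending sector count)
p.rsc <- 1.861996e-06; se.rsc <- 0.18956662
p.psc <- 1.429667e-11; se.psc <- 0.36142850

# Invert the Wald statistic to recover |beta|, then exponentiate
hr.rsc <- exp(qnorm(p.rsc / 2, lower.tail = FALSE) * se.rsc)  # ~2.5, "about twice"
hr.psc <- exp(qnorm(p.psc / 2, lower.tail = FALSE) * se.psc)  # ~11.5, "factor of 11"
```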
> plot(fit.hd, type = "cumhaz", CI = 0.95, end = 365 * 6)

Discussion
frailtySurv provides a suite of functions for generating clustered survival data, fitting shared frailty models under a wide range of frailty distributions, and visualizing the output. The semi-parametric estimators have better asymptotic properties than most existing implementations, including consistency and asymptotic normality, which penalized partial likelihood estimation lacks. Moreover, this is the first R package that implements semi-parametric estimators with inverse Gaussian and PVF frailty models. The complete set of supported frailty distributions, including implementation details, is described in Appendix A. The flexibility and robustness of the data generation and model fitting functions are demonstrated in Appendix B through a series of simulations.
The main limitation of frailtySurv is its computational complexity, which is approximately an order of magnitude greater than that of PPL. Despite this, critical sections of code have been optimized to provide reasonable performance for small- and medium-sized datasets. Specifically, frailtySurv caches computationally-expensive results, parallelizes independent computations, and makes extensive use of natively-compiled C++ functions through the Rcpp R package. To mitigate the relatively higher computational complexity, control parameters allow fine-grained control over numerical integration and outer loop convergence, leading to a speed-accuracy tradeoff in parameter estimation.
The runtime performance and speed-accuracy tradeoff of core frailtySurv functions are examined empirically in Appendix C. These simulations confirm the O(n) complexity of genfrail and the O(n^2) complexity of fitfrail using either log-likelihood maximization or normalized score equations. Frailty distributions without analytic Laplace transforms incur the additional overhead of numerical integration inside the doubly-nested loop, although the growth in runtime is comparable to that of distributions not requiring numerical integration. Covariance matrix estimation also has O(n^2) complexity, dominated by memory management and matrix operations. To trade speed for accuracy, the convergence criteria of the outer estimation loop and of the numerical integration (for LN and IG frailty) can be specified through parameters to fitfrail. Accuracy of the regression coefficient and frailty distribution parameter estimates, as measured by the residuals, decreases as the absolute and relative reduction criteria in the outer loop are relaxed (Figure 6 in Appendix C). The simulations also indicate a clear reduction in runtime as the numerical integration criteria are relaxed, without a significant loss in accuracy (Figure 7 in Appendix C).
Choosing a proper frailty distribution is a challenging problem, although extensive simulation studies suggest that misspecification of the frailty distribution does not substantially affect the bias and efficiency of the regression coefficient estimators, despite the observation that a different frailty distribution could lead to appreciably different association structures (Glidden and Vittinghoff 2004; Gorfine, De-Picciotto, and Hsu 2012). There are several existing works on tests and graphical procedures for checking the dependence structure of clusters of size two (Glidden 1999; Shih and Louis 1995; Cui and Sun 2004; Glidden 2007). However, implementation of these procedures requires substantial extension of the current package, which will be considered in a separate work.

A. Frailty distributions
All the frailty distributions used in frailtySurv have support ω ∈ (0, ∞). Identifiability problems are avoided by constraining the parameters when necessary. The gamma and PVF distributions have closed-form analytic expressions for the Laplace transform, while the log-normal and inverse Gaussian Laplace transforms must be evaluated numerically. Analytic derivatives of the gamma and PVF Laplace transforms were determined using the Ryacas R package (Goedman, Grothendieck, Højsgaard, and Pinkus 2011), and the resulting symbolic expressions were verified against numerical results. All the frailty distribution functions have both R and C++ implementations; the C++ implementations are used in parameter estimation. The Rcpp R package provides an interface to compiled native code (Eddelbuettel, François, Allaire, Chambers, Bates, and Ushey 2011). Numerical integration is performed by h-adaptive cubature (multi-dimensional integration over hypercubes), provided by the cubature C library (Johnson 2013), which implements the algorithms described in Genz and Malik (1980) and Berntsen, Espelid, and Genz (1991).
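For the log-normal, for example, the Laplace transform has no closed form and must be evaluated numerically. The following base-R sketch uses integrate (frailtySurv itself uses the cubature library) and assumes the ω = e^X, X ~ N(0, θ) parameterization:

```r
# Log-normal frailty Laplace transform L(s) = E[exp(-s * e^X)], X ~ N(0, theta),
# evaluated by one-dimensional numerical integration for illustration.
theta <- 1
lt.lognormal <- function(s) {
  integrate(function(x) exp(-s * exp(x)) * dnorm(x, 0, sqrt(theta)),
            lower = -Inf, upper = Inf)$value
}

lt.lognormal(0)  # 1: any Laplace transform equals 1 at s = 0
lt.lognormal(1)  # strictly between 0 and 1
```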
For the gamma, log-normal, and inverse Gaussian distributions, there is a positive relationship between the distribution parameter θ and the strength of dependence between cluster members: as θ increases, the intra-cluster dependence between failure times increases. The opposite is true for the PVF: as θ increases, the dependence between the failure times of a cluster's members decreases.
For frailty distributions with closed-form Laplace transforms, frailty variates are generated using a modified Newton-Raphson algorithm for numerical transform inversion (Ridout 2009). Note that while frailtySurv can generate survival data from a positive stable (PS) frailty distribution, with Laplace transform L(s) = exp(−αs^α/α) where 0 < α < 1, it cannot estimate the parameters of this model since the PS distribution has infinite mean. Frailty values from a log-normal distribution are generated in the usual way, by exponentiating normal variates, and inverse Gaussian variates are generated using a transformation method from the statmod package (Smyth, Hu, Dunn, Phipson, and Chen 2015).

A.1. Gamma
The gamma distribution, denoted by Gamma(θ^−1) ≡ Gamma(θ^−1, θ^−1), has mean 1 and variance θ. The frailtySurv package uses a one-parameter gamma distribution with shape and rate parameters both equal to θ^−1, so the density function becomes
f(ω) = ω^(1/θ − 1) exp(−ω/θ) / (Γ(1/θ) θ^(1/θ)).
The special case θ = 0 is the degenerate distribution in which ω ≡ 1, i.e., there is no unobserved frailty effect. Integrals in the log-likelihood function of Eq. (20) can be solved using the Laplace transform L(s) = (1 + θs)^(−1/θ) and its derivatives, given by
L^(m)(s) = (−1)^m L(s) ∏_{j=0}^{m−1} (1 + jθ) / (1 + θs)^m, m = 1, 2, ...,
where L^(0) = L. The first and second derivatives of the Laplace transform with respect to θ are also required for estimation. Due to their length, these expressions are omitted; see the internal functions deriv_lt_dgamma_r and deriv_deriv_lt_dgamma_r for the explicit expressions.
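This parameterization can be checked empirically in base R: sampling from a gamma with shape and rate both θ^−1 (note rgamma's rate argument) yields frailties with mean 1 and variance θ, and a Monte Carlo estimate of the Laplace transform matches (1 + θs)^(−1/θ):

```r
# Empirical check of the one-parameter gamma frailty: shape = rate = 1/theta
# gives mean 1 and variance theta, with Laplace transform (1 + theta*s)^(-1/theta).
set.seed(2015)
theta <- 0.5
w <- rgamma(1e5, shape = 1 / theta, rate = 1 / theta)

mean(w)                       # ~1
var(w)                        # ~0.5
mean(exp(-2 * w))             # Monte Carlo estimate of L(2)
(1 + theta * 2)^(-1 / theta)  # analytic L(2) = 0.25
```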

B. Simulation results
All simulations were run with 200 repetitions, n = 300, fixed cluster size with m = 2 members within each cluster, covariates sampled from U (0, 1), regression coefficient vector β = (log 2, log 3) T , 30% censorship rate, and Λ 0 as in Eq. (17) with c = 0.01 and d = 4.6, unless otherwise specified. The same seed is used for each configuration. Function calls are omitted for brevity and can be seen in the code repository 7 .

B.1. Benchmark simulation
As a benchmark simulation, we consider gamma frailty with Gamma(2). The results are summarized as follows. The true and estimated cumulative baseline hazard functions, with 95% point-wise confidence intervals, are shown in the following plot. The subsequent plot indicates that as the number of clusters n increases, the bias and the variance of the estimators converge to zero, as expected.

B.3. Discrete observation times
Data generation allows for failure times to be rounded with respect to a specified base. The observed follow-up times were rounded to the nearest multiple of 10. The following simulation results indicate that even under the setting of ties, the empirical bias is reasonably small, and the empirical coverage rates of the confidence intervals are reasonably close to the nominal level.

B.4. Oscillating baseline hazard
Consider a baseline hazard function with an oscillating component, where a = 2, b = 0.1, c = 0.01, and d = 4.6. Such an oscillatory component may be atypical of survival data, but it demonstrates the flexibility of frailtySurv's data generation and parameter estimation capabilities, as evident in the following simulation results.

B.6. Poisson cluster sizes
Up until now, the cluster sizes have been held constant. Varying cluster sizes are typical in, e.g., geographical clustering and family studies. Consider the case in which the family size is randomly sampled from a zero-truncated Poisson distribution with mean family size 2.313. The following simulation results use PVF(0.3). The results are very good in terms of bias and the confidence intervals' coverage rates.

C. Performance analysis
Runtime was measured by the R function system.time, which measures the CPU time to evaluate an expression. All runs used 100 clusters of size 2, covariates sampled from U (0, 1), regression coefficient vector β = (log 2, log 3) T , N (130, 15) censorship distribution, Λ 0 as in Eq. (17) with c = 0.01 and d = 4.6, and 100 repetitions of each configuration, unless otherwise specified. The benchmark simulations were performed using a cluster of Red Hat 6.5 compute nodes, each with 2x2.6 GHz Intel Sandy Bridge (8 core) processors and 64 GB memory.

C.1. Core functions
The runtimes of the frailtySurv functions genfrail, fitfrail, and vcov.fitfrail were determined for increasing values of n, ranging from 50 to 200 in increments of 10. For each function, the runtime was determined for each of the four frailty distributions and each estimation procedure, where applicable. The bootstrap covariance runtime, i.e., vcov.fitfrail with boot = TRUE, was not analyzed since it consists primarily of repetitions of the parameter estimation function fitfrail. The resulting runtimes are shown in Figures 2, 3, and 4, respectively. Figure 2 shows the runtime of genfrail, which is linear in n, i.e., O(n), although the slope varies greatly depending on how the baseline hazard is specified. This is due to the amount of work that must be performed per observation. Specifying the cumulative baseline hazard or inverse cumulative baseline hazard to genfrail results in nearly-constant runtime. The linear increase in runtime is most apparent when the baseline hazard is specified, since both root finding and numerical integration must be performed for each observation. The cumulative baseline hazard requires only root finding, and the inverse cumulative baseline hazard has an analytic solution.
The runtimes of fitfrail using each estimation procedure and frailty distribution are shown in Figure 3. Both estimation procedures (log-likelihood reduction and normalized score equations) are on the order of O(n^2) due to the doubly-nested loop. This complexity is more apparent for the log-normal and inverse Gaussian frailty distributions, which both have the additional overhead of numerical integration. The gamma and PVF frailty distributions have analytic Laplace transforms, so numerical integration is not performed in the inner loop of the cumulative baseline hazard estimation.
Finally, the runtimes of vcov.fitfrail for each frailty distribution are shown in Figure 4. This function is also on the order of O(n^2), and the sandwich variance estimation procedure is dominated by memory management and the matrix operations used to compute the Jacobian. As a result, the runtimes for frailty distributions that require numerical integration (LN and IG) are only marginally larger than those that do not (gamma and PVF).

Figure 5: Comparison of frailty model estimation runtimes using frailtySurv::fitfrail, survival::coxph, and frailtypack::frailtyPenal for increasing values of n. fitfrail uses fitmethod = "score", coxph uses default parameters, and frailtyPenal uses n.knots = 10 and kappa = 2.

C.2. Comparison to other packages
The runtime of fitfrail using fitmethod = "score" is compared to coxph and frailtyPenal for increasing values of n. The runtimes for gamma and log-normal frailty distributions are determined, since this is the largest intersection of frailty distributions that all three functions support. The resulting runtimes are shown in Figure 5. Each estimation procedure exhibits quadratic complexity on a different scale. coxph remains roughly an order of magnitude quicker than fitfrail and frailtyPenal for log-normal frailty. frailtyPenal exhibits a large difference in performance between frailty distributions.

C.3. Speed-accuracy tradeoff
A speed-accuracy tradeoff is achieved by varying the convergence control parameters of the outer loop estimation procedure and numerical integration in the inner loop. The abstol and reltol parameters control the outer loop convergence, and int.abstol and int.reltol control the convergence of the adaptive cubature numerical integration in the inner loop. Speed is measured by the runtime of fitfrail, and accuracy is measured by the estimated parameter residuals.
In this set of simulations, the scalar regression coefficient β = log 2 is used. Frailty distribution parameters are chosen such that κ = 0.3 (θ = 0.857 for gamma, θ = 1.172 for LN, θ = 2.035 for IG, and θ = 0.083 for PVF). With an N(130, 15) right-censorship distribution, this results in censoring rates of 0.25 for gamma and PVF, 0.16 for LN, and 0.30 for IG. Both the runtime and the residuals for β and θ are reported using the "score" fit method.
Figure 6 shows the speed and accuracy curves for increasing values of abstol and reltol on a log scale, taking values in {10^−9, ..., 10^0}. The runtime and residuals remain approximately constant up to 10^−3 for both abstol and reltol. Beyond 10^−3, the tradeoff between runtime and accuracy is apparent, especially for frailty distributions requiring numerical integration. As the convergence criterion is relaxed, runtime decreases and a bias is introduced into the parameter estimates. Runtimes for gamma and PVF frailty are nearly identical, as these frailty distributions do not require numerical integration.
Figure 7 shows the speed and accuracy curves obtained by varying int.abstol and int.reltol over the same set of values for the LN and IG frailty distributions. On this scale, the decrease in runtime is approximately linear, while the residuals of the regression coefficient and frailty distribution parameter do not change significantly. We verified that the same behavior occurs using fitmethod = "loglik". This suggests that the estimation procedure is robust to low-precision numerical integration, with which significantly faster runtimes can be achieved. A strategy for parameter estimation on larger datasets might then be to first fit a model with a high value of int.abstol or int.reltol and iteratively decrease the numerical integration convergence criteria until the parameter estimates no longer change.

Figure 6: Speed and accuracy curves obtained by varying the outer loop convergence control parameters, abstol and reltol. As abstol is varied, reltol is set to 0 (i.e., it is ignored), and vice versa. β̂ − β and θ̂ − θ are the residuals of the regression coefficient and frailty distribution parameter estimates, respectively. Shaded areas cover the 95% confidence intervals.

Figure 7: Speed and accuracy curves obtained by varying the inner loop numerical integration convergence control parameters, int.abstol and int.reltol. As int.abstol is varied, int.reltol is set to 0 (i.e., it is ignored), and vice versa. β̂ − β and θ̂ − θ are the residuals of the regression coefficient and frailty distribution parameter estimates, respectively. Shaded areas cover the 95% confidence intervals.