Fitting Accelerated Failure Time Models in Routine Survival Analysis with R Package aftgee

Accelerated failure time (AFT) models are alternatives to relative risk models which are used extensively to examine the covariate eﬀects on event times in censored data regression. Nevertheless, AFT models have been much less utilized in practice due to lack of reliable computing methods and software. This paper describes an R package aft-gee that implements recently developed inference procedures for AFT models with both the rank-based approach and the least squares approach. For the rank-based approach, the package allows various weight choices and uses an induced smoothing procedure that leads to much more eﬃcient computation than the linear programming method. With the rank-based estimator as an initial value, the generalized estimating equation approach is used as an extension of the least squares approach to the multivariate case. Additional sampling weights are incorporated to handle missing data needed as in case-cohort studies or general sampling schemes. A simulated dataset and two real life examples from biomedical research are employed to illustrate the usage of the package.


Introduction
The linear regression model is the most commonly used regression model in data analysis for uncensored data. When survival data are right-censored, two of the most frequently used regression models are the relative risk model (Cox 1972) and the accelerate failure time (AFT) model (e.g., Kalbfleisch and Prentice 2002, Chapter 4). The AFT model is appealing because it is analogous to the classical linear regression approach, directly linking the expected failure time to covariates. The AFT model with an unspecified error distribution is known as the semiparametric AFT model, which has been studied extensively and is an alternative to the relative risk model with an unspecified baseline hazard function. Two methods for fitting such models have been popular. One is the rank-based approach motivated by inverting the weighted log-rank test (Prentice 1978). Its asymptotic properties have been rigorously studied by Tsiatis (1990) and Ying (1993). The other method is an extension of the least squares principle, such as the Buckley-James (BJ) estimator (Buckley and James 1979). The theoretical properties of the BJ estimator were investigated in Ritov (1990) and Lai and Ying (1991). Due to lack of efficient and reliable computing algorithms, both approaches have not been widely used in practice until recently (Jin, Lin, Wei, and Ying 2003;Jin, Lin, and Ying 2006b,c). Our R package aftgee (Chiou, Kang, and Yan 2014c) aims to provide an easy access to AFT models with both methods based on the recent methodological developments. Package aftgee is available from the Comprehensive R Archive Network (CRAN) at http: //CRAN.R-project.org/package=aftgee.
Several packages for AFT models are available for the R environment (R Core Team 2014). For parametric AFT models, where the error distribution is parametrically specified, one can use survreg in package survival (Therneau 2014), psm in package rms (Harrel 2014) or aftreg in package eha (Brostöm 2014). Misspecified error distributions in parametric AFT modeling may lead to bias in estimation and false conclusion under the presents of censoring. For semiparametric AFT models with unspecified error distribution, one can use bj in package rms (Harrel 2014) or lss in package lss (Jin and Huang 2007). Function bj provides the BJ estimator but it has several limitations: it computes the variance estimator based on non-censored observations only which, although this has been reported to behave well in simulation studies, lacks theoretical justification (Wei 1992); its convergence is slow and not guaranteed; and it is only implemented for univariate failure time data. Package lss provides a rank-based estimator with Gehan's weight obtained from a linear programming approach (Jin et al. 2003) and a least squares estimator with an iterative algorithm starting from the rankbased estimator (Jin et al. 2006b). The variance estimators for both methods are bootstrap based with validity theoretically justified. Nevertheless, there are several features lss fall short. Its rank-based estimator is limited to Gehan's weight which may not be the optimal weight (Tsiatis 1990). The linear programming approach used for the rank-based estimator is computationally very intensive, which also affects the least squares estimator through the initial estimator. The bootstrap based variance estimation is very time consuming. Although easily fixable, the package does not support user-specified initial values for the least squares estimator. For clustered failure times, it operates with working independence and disregards the within-cluster dependence, which may lead to efficiency loss especially when the withincluster dependence is strong (Chiou, Kang, Kim, and Yan 2014a).
Our package aftgee overcomes the aforementioned limitations in existing implementations and provides a set of comprehensive tools for semiparametric AFT models in practical survival analysis. For the rank-based estimator with Gehan's weight, we implemented the induced smoothing approach which is much faster than the linear programming approach without loss in accuracy Wang 2005, 2007). The induced smoothing approach has been extended to work with any general weight (in addition to Gehan's weight;Chiou, Kang, and Yan 2013). Our efficient sandwich variance estimators provide much faster alternatives to the full bootstrap variance estimation (Chiou, Kang, and Yan 2014b). With the fast rank-based estimators as initial estimators, we implemented an iterative least squares procedure method that extends generalized estimating equations (GEE) to clustered censored data (Chiou et al. 2014a). The resulting estimator is robust to misspecification of the working covariance matrix, and the efficiency is higher when the working covariance structure is closer to the truth. Furthermore, these methodologies are generalized to incorporate additional sampling weights for handing missing data and various sampling schemes (Chiou, Kang, and Yan 2014d). Because of these features, the aftgee package is appealing to analysts who would like to fit AFT models in their routine analysis of survival data.
The rest of the article is organized as follows. In the next section, we introduce the notations and model formulation for the univariate AFT model. The multivariate extension is presented in Section 3. Incorporation of sampling weight with application to case-cohort data is extended in Section 4. Detailed usages are described in Section 5. A simulated dataset, an univariate example and a multivariate example are used for illustration in Section 6. Conclusion and some remarks are summarized in Section 7.

Univariate AFT model
For i = 1, . . . , n, let T i , C i and X i be the log-transformed failure time, censoring time and the p × 1 covariate vector for the ith subject. It is assumed that T i is conditionally independent of C i given X i . An univariate semiparametric AFT model has the form where β is an unknown p × 1 vector of regression parameters, i 's are independent and identically distributed random variables with an unspecified distribution. It is also assumed that i 's are independent of X i . In the presence of right censoring, the observed data are independent copies of (Y i , ∆ i , X i ), i = 1, . . . , n, where Y i = min(T i , C i ), ∆ i = I(T i < C i ), and I(·) is the indicator function.

Rank-based estimator
The regression parameters can be estimated by solving the rank-based weighted estimating equation where e i (β) = Y i − X i β and ϕ i (β) is a possibly data-dependent nonnegative weight function with values between 0 and 1. LetF e i (β) (t) be the estimated cumulative distribution function based on the censored residual e i (β)'s. Some common choices of ϕ i (β) are 1, (Prentice 1978), Gehan (Gehan 1965), Prentice-Wilcoxon (Prentice 1978) and the more general G ρ class (Harrington and Fleming 1982), respectively. The Kaplan-Meier estimator is typically used to obtainF e i (β) (t). The solution of Equation 1,β n,ϕ , is consistent to the true parameter, β 0 , and is asymptotically normal (Tsiatis 1990;Ying 1993). Noting that Equation 1 with Gehan's weight is the gradient of an objective function, Jin et al. (2003) used a linear programming approach to obtain the estimator, which is computationally demanding, especially for larger datasets and for obtaining variance estimators through bootstrap. In our implementation, we used the Barzilai-Borwein spectral method implemented in package BB (Varadhan and Gilbert 2009) to solve Equation 1 directly.
A computationally more efficient approach is the induced smoothing procedure of Wang (2005, 2007). The idea is to replace the nonsmooth estimating equations with a smoothed version, whose solutions are asymptotically equivalent to those of the former. Define an independent p × 1 standard normal random vector Z and a p × p matrix Γ n such that Γ 2 n = Σ n where Σ n is a symmetric positive definite matrix. The induced smoothing procedure replaces U n,ϕ (β) with E Z [U n,ϕ (β + n −1/2 Γ n Z)], where the expectation is taken with respect to Z. A choice of Σ n is the identity matrix (Brown and Wang 2007). Some other forms of Σ n might be considered but the differences are minimal in practice (Chiou et al. 2014b).
With Gehan's weight, the denominator of the ratio in Equation 1 gets canceled, and the resulting smooth estimating equation is where r 2 ij = (X i − X j ) Σ n (X i − X j ) and Φ(·) is the standard normal cumulative distribution function. The estimating equation in Equation 2 is monotone and continuously differentiable with respect to β, hence, its root can be found with standard numerical methods such as the Barzilai-Borwein spectral method. The solution to Equation 2, denoted byβ n,G , is consistent to β 0 and has the same asymptotic distribution as that obtained from the nonsmooth version Wang 2005, 2007). Its variance can be approximated from a resampling procedure similar to that in Jin et al. (2003). Even with the smoothed equations, the resampling procedure can still be very time consuming. Alternative variance estimation procedures, such as those proposed by Chiou et al. (2014b), are recommended.
Deriving the smoothed estimating equations with general weights is challenging because E Z [U n,ϕ (β + n −1/2 Γ n Z)] involves the expectation of the ratio of two random quantities. In addition, ϕ i (β) also depends on β. Holding ϕ i (β) evaluated at some initial estimator b, we propose an approximation which replaces the expectation of the ratio with the ratio of the expectations of the two terms: where κ ij (β) = [e j (β) − e i (β)]/r ij . The asymptotic equivalence between Equation 3 and the smooth version of Equation 1 for the log-rank weight is established in Chiou et al. (2013). For general weights, the regression parameters can be estimated from an iterative induced smoothing procedure with the following steps: 1. Obtain an initial estimateβ (0) n,ϕ = b n of β and initialize with m = 1.  n,ϕ,q | < t for all q = 1, . . . , p, whereβ (m) n,ϕ,q is the qth component ofβ (m) n,ϕ and t is a prefixed tolerance.

Updateβ
A simple choice of the initial estimator is the easy-to-compute Gehan's estimator,β n,G .
Since estimating Equation 3 is not necessarily monotone in β, it might cause numerical problems in solving the estimating equations. Inspired by a discussion in Jin et al. (2003), a reweighted induced smoothing strategy based on monotone estimating functions is considered (Chiou et al. 2013): where φ i (β) = ϕ i (β)/ n j=1 I[e j (β) ≥ e i (β)]. Fixing the weight φ i (b) evaluated at b and applying induced smoothing on Equation 4 lead tõ This is the same as Equation 2 except for the weight φ i (b) which is free from β. For an initial estimator b of β, an estimatorβ n,φ can be obtained from the iterative procedure with U n,ϕ (b, β) replaced byŨ n,φ (b, β). Using the arguments in Jin et al. (2003), the consistency and asymptotic normality of the resulting estimators can be established (Chiou et al. 2013). The equations within each iteration can be solved with package BB. The convergence is usually fast with the initial Gehan's estimator. Variance estimation can be done with the full resampling method (Jin et al. 2003) or a fast sandwich variance estimator (Chiou et al. 2014b(Chiou et al. , 2013.

Least squares approach
The other method for fitting AFT model is the least squares approach. With survival data from right censoring, Buckley and James (1979) replaced each response T i with the condi- where the expectation is evaluated at regression coefficients β. In particular, The theoretical properties of the BJ estimator have been studied by Ritov (1990) and Lai and Ying (1991). The method, however, is rarely used in practice due to numerical challenges. Jin, Lin, Wei, and Ying (2006a) proposed a more practical solution that generalizes the BJ estimator. Given an initial estimator b n of β, the least squares estimator is the solution of the following estimating equation The solution to U n,ls (β, β) = 0 is the BJ estimator. The advantage for fixing the initial value b n is to avoid numerical complexity caused by solving Equation 6 which is neither continuous nor monotone in β. Jin et al. (2006a) If the initial estimator b n is consistent and asymptotically normal, thenβ (m) n,ls is also consistent and asymptotically normal for every m (Jin et al. 2006b). A good candidate for the initial estimator is the induced smoothing Gehan estimator. The variance of the resulting estimator can be approximated by a resampling procedure (Jin et al. 2006b).

Multivariate AFT model
Suppose now we have a random sample of n independent clusters with K i margin in the ith cluster. For i = 1, . . . , n and k = 1, . . . , K i , let T ik , C ik and X ik be the log-transformed failure time, censoring time and the p × 1 covariate vector for margin k in cluster i. Further define the censored failure time and censoring indicator to be Y ik = min(T ik , C ik ) and ∆ ik = I(T ik < C ik ). The multivariate AFT model takes the following form where β is an unknown p × 1 vector of regression parameters and the error terms, i = { i1 , . . . , iK i } are independent and identically distributed random variables with an unspecified distribution throughout clusters. In the presence of censoring, the observed data consists of copies of

Rank-based estimator
The regression parameters in Equation 7 can be estimated from the following rank-based weighted estimating equation where e ik (β) = Y ik − X ik β and ϕ ik (β) is a possibly data-dependent nonnegative weight function. If K i = 1 for all i = 1, . . . n, Equation 8 will reduce to Equation 1. This estimating equation also yields a consistent estimator for β 0 (Jin et al. 2006a). Applying the aforementioned induced smoothing technique, the smoothed version of Equation 8 with Gehan's weight isŨ where r 2 ikjl = (X ik −X jl ) Σ n (X ik −X jl ). The consistency and asymptotic properties continue to hold (Johnson and Strawderman 2009). The multivariate version of Equations 3 and 5 arẽ (β)). The same iterative procedure as for the univariate case can be used and the asymptotic properties of the resulting estimator,β n,φ , continue to hold.

GEE approach
Depending on the set up of the design matrix X i , the multivariate AFT model accommodates margin-specific regression coefficients, identical regression coefficients across margins, and their mix. For instance, a covariate could be appropriate for one margin but unsuitable for the other; a covariate could have a different effect in the regression model of different margins. The error term i 's are assumed to be independent and identically distributed across clusters, but within a cluster, the components of i1 , . . . , iK i do not need to follow a common distribution and may be correlated. We generalize the GEE approach to multivariate AFT modeling which accounts for multivariate dependence through working correlation structures to improve efficiency (Chiou et al. 2014a).
Define Ω −1 i α(b) to be an K i × K i nonsingular working weight matrix that may involve additional working parameters α that depends on an initial value b of β 0 . For i = 1, . . . , n and k = 1, . . . , A generalization of Equation 6 can be expressed as whereX = n i=1 X i /n. Given α and b, the solution to Equation 12 has a closed form whereȲ(b) = n iŶ i (b)/n. The GEE estimator, denoted byβ n,GEE , can be obtained from an iterative procedure: 1. Obtain an initial estimateβ The iteration proceeds with the aid of function geese in package geepack (Højsgaard, Halekoh, and Yan 2014;Halekoh, Højsgaard, and Yan 2006). The estimator reduces to the least squares estimator of Jin et al. (2006a) when the working weight matrices Ω i 's are the identity matrices. We refer to Chiou et al. (2014a) for more details. The working parameter estimateα n does not affect the consistency of the GEE estimator, but may affect its efficiency. Higher efficiency can be achieved if Ω i is closer to the covariance matrix ofŶ i (b) and even an imperfect working weight still improves the efficiency (Chiou et al. 2014a). The variance of the estimator can again be estimated by resampling procedures.

Incorporating sampling weight
So far, we have assumed that we have full access to the whole data (Y ik , ∆ ik , X ik ), i = 1, . . . , n, k = 1, . . . , K i . In many cases, however, the full cohort data may not be available. For example, a case-cohort design (Prentice 1978) is known to be cost-effective when the proportion of the event of interest is rare or covariates are expensive to measure. Under this design, covariates are measured for all the subjects who experienced the event of interest by the end of the observation period but only for a subset of subjects who did not. This design is a special case of the more general stratified case-cohort design where a subcohort is selected via a stratified random sampling from S mutually exclusive strata in the original full cohort. For both designs, covariates are measured only for those who were selected into the sample. Thus, the observed data from each design are not complete and statistical methods which do not account for this missingness in covariates could result in biased estimates. One typical method employed to adjust for biases is to weight a complete observation by the inverse of the inclusion probability.
Suppose a simple random sampling was used at the cluster level. Let ψ is be the strata indicator (ψ is = 1 if the ith cluster is in the sth stratum and ψ is = 0 otherwise) and ξ is be the sampling indicator (ξ i = 1 if the ith cluster is sampled and ξ i = 0 otherwise). The casecohort weight, h i , for cluster i is h i = S s=1 ξ i ψ is /p n,s , where p n,s is the inclusion probability for the sth stratum for s = 1, . . . , S. The weight-adjusted version of Equations 10 and 11 are, respectively, Note that if we sample all subjects within each strata, then Equations 13 and 14 reduce to Equations 10 and 11, respectively. The variance estimation can be obtained via resampling procedures or fast sandwich variance estimators similar to the unweighted versions (Chiou et al. 2014b,d).

Package implementation
The two major functions in package aftgee are aftsrr for the rank-based approach and aftgee for the least squares or GEE approach. The synopsis of aftsrr is: aftsrr(formula, data, subset, id = NULL, contrasts = NULL, strata = NULL, weights = NULL, rankWeights = "gehan", method = "sm", variance = "ISMB", B = 100, SigmaInit = NULL, control = aftgee.control()) The required arguments are formula and data. Argument formula specifies the model to be fit with the variables coming with data. The formula is the same as the argument of function survreg in package survival, with response created from Surv. The 'Surv' object consists of two columns, where the first column is the survival time or censored time and the second column is the censoring indicator, indicating right censored data. Since ranks are invariant to location shift, the intercept cannot be estimated and the estimation will ignore the intercept term whether it is specified or not. Clusters are defined by vector id. The weights argument is a vector containing sampling weights (h i ) as described in Section 4. When data arise from a stratified design, a vector of integers that specifies the stratification is indicated in strata. The length of the arguments id, weights and strata needs to be the same as the number of observations. The rank weight, controlled by argument rkWeight, includes the aforementioned log-rank weight ("logrank"), Gehan's weight ("gehan"), Prentice-Wilcoxon weight ("PW") and general G ρ class weight ("GP"). Argument method determines the type of weighted estimating equations to be used. When method = "nonsm", regression parameters are estimated by directly solving the nonsmooth estimating Equations 1 or 8. When method = "sm" and rkWeight = "gehan", the induced smoothing estimating Equations 2 or 9 are used. For the non-Gehan's weights, method = "sm" and method = "monosm" apply the iterative procedure with the smooth estimating Equations 3 and 5, respectively. The initial values for the variance estimator, or the Σ n in the smoothing progress, are determined by sigmainit. The identity matrix is used for sigmainit, if it is left unspecified.
Given a point estimate, variance estimates can be obtained from several approaches which are specified by argument variance. A straightforward but computationally inefficient variance estimator is the multiplier bootstrap approach ("MB"). A more efficient method is to consider sandwich variance estimators (Chiou et al. 2014b). Suppose the variance of the estimator has a sandwich form, Σ = A −1 V (A −1 ) where V is the asymptotic variance of the estimating function and A is the slope matrix. Chiou et al. (2014b) proposed to estimate V by either a closed-form formulation (CF) or through bootstrap the estimating equations (MB). The bootstrapping estimate of V is much less demanding than the full multiplier bootstrap, because it only involves evaluations of estimating equations instead of solving them. On the other hand, to estimate the slope matrix A, Chiou et al. (2014b) proposed three methods based on the induced smoothing approach (IS), smoothed Huang's approach (sH) motivated by Huang (2002) or Zeng and Lin's approach (ZL) by Zeng and Lin (2008). Combinations between estimating V and A yield six sandwich estimators, "ISCF", "ISCF", "ZLCF", "ZLMB", "sHCF", "sHMB" for variance. When a bootstrap is needed, the bootstrap size is controlled by B with default value 100.
The convergence for the procedure is controlled by relative tolerance. The iteration stops and the output is given when the tolerance is met or iteration reaches the pre-specified maximum iteration number. The default relative tolerance is set at 0.001 and the default maximum number of iterations is set to 50.
aftgee.control(maxiter = 50, reltol = 0.001, trace = FALSE) The maximum number of iterations is controlled by maxiter and relative convergence tolerance is controlled by reltol. A logical value, trace, is used to determine whether to print the output for each iteration.
The least squares estimator can be obtained by calling aftgee with the following arguments aftgee(formula, data, subset, id = NULL, contrasts = NULL, weights = NULL, margin = NULL, corstr = "independence", binit = "srrgehan", B = 100, control = aftgee.control()) Most of the arguments and the convergence criterion of aftsrr are shared by aftgee. With aftgee, the intercept, if included, is estimated by the mean of the estimated cumulative distribution function based on the censored residual computed from the slope estimator.
The margin argument is a vector with the same length as data. It is used to specify the marginal distribution within clusters. Identical marginal distributions are assumed with unspecified margin. A character string, corstr, is used to specify the working correlation structure, as offered by package geepack. Four working correlation structures are independence ("independence"), exchangeable ("exchangeable"), autoregressive model of order one ("ar1") and unstructured ("unstructured"). The default is "independence". The initial value is specified by binit with default "srrgehan" giving the induced smooth rank-based estimator with Gehan's weight. Alternatively, although not recommended, the simple linear regression with censored observations ignored ("lm"), can also be used for faster results.
In the uncensored case, aftgee with independent working correlation structure will return an ordinary least squares estimate. In the multivariate case, efficiency can be improved in aftgee when the working correlation structure is close to the true correlation even in the absent of censoring. A more detailed multivariate illustration is presented in a kidney cather data in Section 6.3.
We next fit the simulated data with the least squares approach. For the parametric approach, survreg is used with dist = "lognormal". For the semiparametric approach, the BJ estimator (bj from package rms), the lss estimator and the aftgee estimator are considered.
R> rbind(bj = sqrt(diag(vcov(ls.bj))), lss = c(NA, ls.lss$sd), + gee = sqrt(diag(vcov(ls.gee))), sur = sqrt(diag(vcov(ls.sur)))[ Estimation for both the lss estimator and the aftgee estimator are based on a rank-based initial value that is invariant to the intercept. Once the slope estimator is obtained, the lss estimator left out the intercept whereas the bj and aftgee approach estimated the intercept by the mean of the estimated cumulative distribution function based on the censored residual computed from the slope estimator. The semiparametric methods from bj, lss and aftgee provide fairly close point estimates. In terms of timing, the lss estimator took the longest with more than six minutes. For further investigation, the estimation performance is assessed via bias and mean squared error with a full scale simulation. Table 1 summarizes the results for 1000 replicates.
The performance of bj, lss and aftgee are similar in terms of the biases and mean squared errors. As expected, when the error distribution is misspecified in the parametric model, the survreg approach produced a biased estimate.

National Wilms' tumor study
We next illustrate the usage of the arguments weights and rkWeight in aftsrr with cohort studies conducted by the national Wilms' tumor study group (NWTSG;D'Angio et al. 1989;Green et al. 1998). The dataset is available in the survival package as nwtco. The interest of the study is to assess the relationship between the likability of central lab histology measurement (histol) and days to tumor relapse (edrel). In addition to the likability of central lab histology measurement (1 = unfavorable, 0 = favorable), we also include patient's age (age) in years as covariate. There are two study groups (study), NWTSG-3 and NWTSG-4, denoting the third and the fourth Wilms' tumor studies. Patients are further categorized into four stages (stage) with stage 4 being the latest and most severest. The dataset consists of 4028 patients, among which, 571 patients experienced tumor relapse (rel = 1) and 3457 patients did not (rel = 0).
To take advantage of the full cohort data, we fit the full-cohort data with aftsrr using two types of sandwich variance estimators ("ISMB" and "ISCF").
With the same dataset, we next demonstrate incorporating weights via a case-cohort design. Define cases and controls as those who experience the event of interest by the end of the study period and who do not, respectively. In nwtco, there are 571 cases who experienced the relapse of tumor and 3457 controls who did not experience the relapse of tumor. The case-cohort sample is the union of all the cases and the sub-cohort sample selected via a simple random sampling. The case-cohort sample of the data had 1154 subjects, including all 571 cases and 583 controls. This gave sampling weights 1 and 5.93 for the cases and controls, respectively. The following codes give a summary of the case-cohort weight, h i . For the case-cohort design, we also demonstrate the usage of different rank weights in a rankbased approach; we considered the Gehan's, log-rank and PW weights. For the log-rank and PW weights, the monotone function approach was used. Jin and Huang (2007)'s lss was not considered in this analysis because it does not have the capability of handling general rank weights and sampling weights. Standard errors are estimated with the efficient sandwich variance estimator, ZLMB. Commands for these estimators are presented below followed by a summary.

Kidney catheter data
A bivariate failure time example from a kidney catheter study is used to illustrate the least squares approach feature of aftgee. The kidney catheter data are available in the survival package as kidney (McGilchrist and Aisbett 1991). The interest of the study is to examine the time to infection from the point of catheter insertion for patients using portable dialysis equipment. The catheter is removed when the infection occurs and reinserted after some pre-determined time. If catheters are removed for reasons other than infection, then the time to infection is treated as censored. The data contain 38 patients, each having exactly two insertions, where two observations on time to infection was recorded. The two covariates considered are patient's age (age in years) and gender (sex = 0 if male, sex = 1 if female).
We first fit bivariate AFT models with identical error margins and identical regression coefficients for the two margins. Since it is reasonable to expect some correlation between the two recurrence times for a given patient, we can model this in the least squares approach with some dependent working covariance structure. We will first fit the least squares approach with a working independent covariance structure and then with an exchangeable working structure. For both least squares approaches, we use the induced smoothing rank-based estimator with Gehan's weight as the initial estimator. The standard errors are estimated by the multiplier resampling method with bootstrap size 100.

Call
The coefficient of sex is found to be significantly different from zero for both models. This suggests that female patients tend to have longer recurrence times to infection. The efficiency gain is expected on average but, unfortunately, this dataset does not show an efficiency gain.
In addition to the common marginal error distribution and common coefficient assumption, we also consider the case where the marginal error distributions and regression coefficients are different. In this case, we need to specify margin and construct the corresponding block diagonal design matrix. After the block diagonal design matrix is constructed, least squares estimators with both independent covariance working structure and exchangeable working structure are fitted. For each model, we use the smooth Gehan estimator as the initial value.
R> kidney$margin <-as.factor(rep(1:2, 38)) R> fit2.ind <-aftgee(Surv(time, status)~age:margin + sex:margin + + margin -1, id = id, margin = margin, data = kidney) R> fit2.ex <-aftgee(Surv(time, status)~age:margin + sex:margin + + margin -1, id = id, margin = margin, data = kidney, corstr = "ex") R> summary(fit2.ind) Call: aftgee(formula = Surv(time, status)~age:margin + sex:margin + margin -1, data = kidney, id = id, margin = margin)  This model allows hypothesis testing of equal coefficiencts for each covariate across the two margins from Wald-type tests with covariance matrix estimates. However, the covariates of age and sex are found to be not significantly different across the two margins, with p values of 0.87 and 0.06, respectively, under the exchangeable structure. The coefficients of sex for both margins and the two working structures are found to be significantly different from zero. These results coincide with those from the common margin model. The efficiency gain from the exchangeable structure in the margin-specific case is also absent. This is probably because there is not much strength to borrow with distinctive marginal fits.

Conclusion
Package aftgee provides an easy access to fitting semiparametric AFT models for possibly clustered failure times with both rank-based approaches and the least squares approach. For rank-based approaches, we implemented the induced smoothing method with Gehan's weight and extended it to allow arbitrary rank weight. The method is much faster than those based on linear programming. Computationally efficient sandwich variance estimators are provided for all the estimators, and additional sampling weight can be incorporated for various sampling schemes. Our least squares approach uses rank-based estimators as initial estimators in an iterative estimation procedure. For clustered data, we exploited withincluster dependence through working correlation structure in a GEE framework which enhances efficiency when within-cluster dependence is strong. The implementation is fast and reliable, making it possible for AFT models to be much more widely applied in routine survival analysis.
Our package can be expanded in several directions. The current version allows weights for handling missing data in the rank-based approach, similar weights can also be made available to our GEE approach. For the rank-based approach with clustered data, Wang and Fu (2011) considered estimating equations that can be decomposed into between-and withincluster estimating equations for better efficiency. An implementation of this method would be desirable. To account for measurement errors in covariates, package simexaft (He, Xiong, and Yi 2012) implemented a simulation-extrapolation approach for AFT models. Such an approach can be extended to the semiparametric AFT model. Furthermore, our methods can also be extended to accommodate survival data other than with right censoring.