Censored Quantile Regression Redux

Quantile regression for censored survival (duration) data oﬀers a more ﬂexible alternative to the Cox proportional hazard model for some applications. We describe three estimation methods for such applications that have been recently incorporated into the R package quantreg : the Powell (1986) estimator for ﬁxed censoring, and two methods for random censoring, one introduced by Portnoy (2003), and the other by Peng and Huang (2008). The Portnoy and Peng-Huang estimators can be viewed, respectively, as generalizations to regression of the Kaplan-Meier and Nelson-Aalen estimators of univariate quantiles for censored observations. Some asymptotic and simulation comparisons are made to highlight advantages and disadvantages of the three methods.


Introduction
initiated an "era of econometric perestroika" for the censored regression model, liberating it from the oppressive Gaussian specification that had prevailed since its introduction by Tobin (1958) in the midst of the cold war. Given the linear latent variable model, with u i assumed to be iid with distribution function F , Powell noted that if censoring values, C i , are observed for all i = 1, · · · , n and we observe Y i = max{C i , T i } then the conditional quantile functions, can be consistently estimated, setting ρ τ (u) = u(τ − I(u < 0), by, provided that the design matrix, X = (x i ), contains an intercept to absorb the τ dependent contribution F −1 (τ ). This observation follows immediately from the monotonicity of the mapping T i → Y i , and the fact that for any monotonically increasing function, h, and scalar random variable Z, P (Z ≤ z) = P (h(Z) ≤ h(z)). The result generalizes nicely to a variety of non-iid latent variable settings; in particular to other linear conditional quantile latent variable models, permitting linear scale shift and other more general forms of heterogeneity in the covariate effects. Right censoring, as is more typical of duration modeling applications, is easily accommodated by replacing max by min above. Often, in econometric applications the C i 's take a constant value as in the original tobit model where C i = 0, or in wage equation top-coding, but this is not essential. What is necessary -and we shall see that this is not without its unfortunate consequences -is that the C i 's are known for all observations. Following Powell, we will refer to this situation as fixed censoring.
Random censoring, in contrast, refers to situations in which censoring values, C i , are only observed for the censored observations. In effect, we observe only the event times, Y i and a censoring indicator, δ i , taking the value one if the observation is uncensored and zero if the observation is censored. Random censoring has received much less attention in the econometric literature, and it is not difficult to conjecture why. Analysis of randomly censored data requires that censoring times are independent of event times, or, in regression settings, that they are independent conditional on covariates. This assumption is frequently implausible in econometric applications where censoring is due to endogenous influences. In biostatistics, where random censoring is more often considered, the dominant empirical strategy has been the Cox proportional hazard model. However, there has also been a recognition that the proportionality assumption underlying the Cox model is sometimes inappropriate, necessitating stratification of the baseline hazard or some other weakening of the proportional hazards condition. Much more flexible models can be constructed by modeling conditional quantiles of the event time distribution. For uncensored survival data this approach has been explored by Koenker and Geling (2001), but censoring poses some new challenges. Fitzenberger and Wilke (2006) provide a valuable survey of applications of censored quantile regression methods in econometric duration modeling.
An early alternative approach to Powell, suggested by Lindgren (1997), simply bins the data in covariate space and computes local Kaplin Meier estimates in each bin. The obvious difficulty with this approach is that the binning quickly becomes impractical as the number of covariates grows. Portnoy (2003) proposed an ingenious method of recursively estimating linear conditional quantile functions from censored survival data and established consistency and √ n-convergence of the proposed estimators. Portnoy's method can be regarded as a generalization to regression of the Kaplan Meier estimator. Recently, Peng and Huang (2008) have proposed a closely related method. Rather than building on the linkage to Kaplan-Meier, they instead develop an approach linked to the Nelson-Aalen estimator of the cumulative hazard function. The main advantage of the latter approach is that it enables them to employ counting process methods to establish a martingale property for their estimating equation from which a more complete asymptotic theory for the estimator flows.
The main objective of this paper is to describe an implementation of all the foregoing methods appearing in recent versions of my quantreg package for R. This package seeks to provide a comprehensive implementation of quantile regression methods for the R (R Development Core Team 2008) language. The package is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=quantreg. It incorporates both linear and nonlinear in parameters methods as well as non-parametric additive model fitting techniques. The new censored quantile regression methods are accessible through the new fitting function crq, which extends the functionality of the existing functions rq, nlrq and rqss that are used/rearrage/ for fitting linear, nonlinear, and nonparametric models respectively. After a brief overview of the implementation, we will consider the three new methods in turn and provide some comparisons and offer some advice on their strengths and weaknesses.

Overview
Model fitting in R typically proceeds by specifying a formula describing the model, a data frame containing the data, and possibly some some further fitting options. For censored quantile regression these arguments are passed to the function crq. Formulae are specified for the two random censoring methods using the function Surv from the package survival (Therneau and Lumley 2008).
The accelerated failure time model, with u i iid with distribution function F is a common model for survival data. When the data are uncensored the model can be simply estimated by least squares, or using quantile regression as in Koenker and Geling (2001). The latter approach offers some distinct advantages since it permits the researcher to focus attention on narrow slices of the conditional survival distribution. In Koenker and Geling (2001) where the interest is in mortality of medflies it was particularly valuable to focus attention on the upper tail of the lifetime distribution where it was found that there was a crossover in gender survival prospects at advanced ages. It is difficult, even impossible, to see such effects in some classical survival models where attention typically focuses on covariate effects on mean survival prospects. For further details on quantile regression methods and their implementation in R (Koenker 2005) and the vignette available with the package quantreg (Koenker 2008). For censored data, and parametric choice of F , the model can be easily estimated by maximum likelihood. Relaxing the parametric restriction and the iid error assumption leads naturally to the censored quantile regression model, The choice of the log transformation, although traditional, is entirely arbitrary and may be replaced by any monotone transformation. In applications with random censoring such models can be estimated in R using crq using the formula,

Surv(log(y), delta)~x
where delta denotes the vector of censoring indicators. For fixed censoring of the type considered by Powell, formulae take the form, Curv(log(y), c, type= "left")~x Here, Curv is a slightly modified version of Surv designed to accommodate the provision of the censoring times instead of the censoring indicators to the fitting routine. The type argument indicates whether the censoring is from the left, as in the classical Tobit model, or from the right as in the case of top coding. Other arguments can be supplied to fitting function including: taus a list of quantiles to be estimated, data a data frame where the formula variables reside, etc. The argument method is used to specify one of three currently available methods: "Powell" for the Powell estimator, "Portnoy" for Portnoy' censored quantile regression estimator, and "PengHuang" for Peng and Huang's version of the censored quantile regression estimator. Partial argument matching in R permits these strings to be abbreviated to the shortest distinguishable substrings: "Pow", "Por" and "Pen". Further arguments can be specified to the specific fitting routines, notably start to specify a initial value for the coefficients for the Powell method, and grid to specify the evaluation grid for the random censoring methods.
Given fixed censoring data it is always possible to fit random censoring models, and we will argue that this may often be advantageous, but since the Powell estimator requires censoring times for all observations, it can generally not be applied to randomly censored data. We will focus in the remainder of the paper on the case of right censoring but it should be understood that all of the methods discussed can be adapted to left censoring as well.
Applications involving interval censoring are the subject of active current research and we hope to incorporate new methods when they become available.

The Powell estimator
Given censoring times C i and event times Y i ≤ C i with associated covariate vectors x i ∈ R p , the Powell estimator minimizes, The piecewise linear form of the response function poses some real computational challenges. Unlike the uncensored quantile regression problem, the objective function, R τ (b) is no longer convex, so local optimization methods like steepest descent may terminate at a local minimum that is not the global minimum. Fitzenberger (1996) describes an algorithm that adapts the classical Barrodale and Roberts (1974) simplex algorithm for 1 regression to this end. In effect, Fitzenberger's algorithm is steepest descent: due to the piecewise linear form of the objective function solutions can be characterized by an exact fit to p observations, so careful computation of the directional derivatives at successive "basic" solutions in the directions obtained by deleting one of the p points from the "basis" ensures convergence to a local optimum.
Fitzenberger and Winker (2007) investigate a modified version of this BRCENS algorithm that employs a threshold accepting outer loop somewhat like simulated annealing to improve the chances of converging to the global optimum. Ironically, it is far from obvious that this more diligent search for the global Powell solution is justified. Simulations by Fitzenberger and Winker, and supported by my own simulations, suggest that in many censored regression problems the global optimizer performs much worse than its more myopic counterparts. Starting the BRCENS iterations at β = 0 or some other plausible value and taking steepest descent steps acts as a shrinkage technique, thereby avoiding embarrassing globally optimal points further away. In the quantreg implementation the default starting value is the naive rq estimate ignoring the censoring; this has the dubious advantage that it retains the usual equivariance properties of the conventional quantile regression estimators.
In simulations, where exhaustive search for the R τ minimizer is feasible, the global optimizer is prone to find, at least occasionally, solutions that are absurdly far from the parameters used to generate the data, and at least from a mean squared error perspective these realizations wreck havoc with performance. Asymptotic theory assures us that this is only an evanescent "finite sample problem," but such assurances may not offer much consolation to the applied researcher who generally lacks the patience to let data accumulate in asymptopia. Fortunately, other methods may offer some rather unexpected advantages.
The function crq implements a new fortran version of the algorithm described in Fitzenberger (1996) for the method "Powell". This version is considerably simpler than the original BRCENS version and more modular. I have also included an implementation of an exhaustive global search algorithm that pivots through all n p basic solutions and chooses the one that minimizes the Powell objective function. This option is selected by specifying the option start = "global", but it should be recognized that for problem with even a moderately large sample size the resulting search becomes impractical. It would be quite easy to embed the current implementation into a global optimization method such as the anneal function of the R package subselect (Orestes Cerdeira, Duarte Silva, Cadima, and Minhoto 2007) but we have not (yet) done this.

Random censoring
In one-sample settings with random censoring the Kaplan-Meier product-limit estimator is known to be an efficient estimation technique and can be interpreted as a nonparametric maximum likelihood estimator (see e.g., Andersen, Borgan, Gill, and Keiding 1991). In the simplest case, without tied event times, the Kaplan-Meier estimator of the survival function, S(t) can be written as, where y (i) 's denote the ordered event times, and the δ (i) 's denote the associated censoring indicators. Efron (1967) interpretedŜ as shifting mass of the censored observations to the right, distributing it in accordance with the subsequent uncensored event times. Portnoy (2003) observed that quantiles of the Kaplan-Meier distribution function,F (t) = 1 −Ŝ(t) could be expressed as solutions to a weighted quantile optimization problem in which weight associated with censored observations was split into two pieces. A part of the mass associated with each censored observation is left in its initial position at the censoring time, and the remainder is shifted to right, in effect to +∞.

Kaplan-Meier quantiles as argmins
To see this, recall that in one-sample settings without censoring the ordinary sample quantiles can be expressed as,ξ to obtain the step function,ξ It is helpful to view this as parametric in τ : as τ increases from 0, y (1) is the solution until we reach, τ = 1/n, at which point y (2) is also a minimizer, and so on.
When there are censored observations we can proceed in a similar fashion, except that when we encounter a τ i such thatξ(τ i ) = y (i) and δ (i) = 0, we split the censored observation into two pieces: one piece remains at its original position, y (i) , and receives weight at all subsequent τ , and the other piece is shifted to y ∞ = +∞ and gets weight 1 − w i (τ ). This reweighting assures thatξ(t) is constant in an open neighborhood of any such τ i , and the remaining mass, the 1 − w i part of each censored observation, gets distributed appropriately. The crucial insight is simply that the quantiles only depend on how much mass is below and how much is above -shifting part of the censored mass to +∞ ensures that all the subsequent uncensored observations receive their fair share of the "credit" for each of the censored points.
Thus, denoting the index set of the censored observations encountered up to τ by K(τ ), the quantiles of the Kaplan-Meier distribution,F can be expressed as a solution to the problem: The advantage of this formulation is that it generalizes nicely to the regression setting where the scalar ξ is replaced by the inner product x i β. Portnoy (2003) describes in detail an algorithm for the regression analogue of this problem. There are several complications in the regression setting that do not arise in the one-sample context; the most important of these is the possibility that censored observations that are "crossed" by estimated quantile regression process and thus have negative residuals, may return to the optimal basis and have zero residuals for some subsequent τ . This cannot happen in the one-sample setting by the monotonicity of the Kaplan-Meier estimator, but may occur using the reweighting due to the weaker nature of the monotonicity condition in the p-dimensional regression setting. Portnoy describes an effective way to deal with these pivoting anomalies as well as discussing complications due to an excess of censored observations in the upper tail that limit range τ ∈ [0, 1] for which the model is inestimable. The latter is a familiar problem even in the one-sample setting where censored observations above the largest uncensored observation imply a "defective" Kaplan-Meier survival function.

Portnoy's censored quantile regression estimator
Portnoy provided a Fortran implementation of his estimator based on a "pivoting" method that is similar to that described in Koenker and D'Orey (1987), but adapted to the "recrossing" problems alluded to above. Starting τ near zero, at each step it is possible to evaluate the length of the interval of τ 's for which the current solution to the weighted quantile regression problem: remains optimal, the problem is then updated and resolved at the upper bound of the interval, and iteration proceeds until τ = 1 is reached or the process is halted because there are only non-reweighted censored observations with positive residuals remaining. Portnoy has also suggested an alternative approach in which the process is evaluated on a grid of τ ∈ [0, 1]. In large samples the latter approach is generally preferred since the inherent accuracy of the estimatedβ(τ ) process is O p (1/ √ n) making the evaluation of the process at O p (n log n) points using the pivoting method rather excessive. The algorithm written by Steve Portnoy was originally made available in the R package crq, prepared in collaboration with Tereza Neocleous and myself. The functionality of this package has now been folded into the quantreg package.
To illustrate this technique we estimate the model appearing in Portnoy (2003, Section 6.3), adapted from Hosmer and Lemeshow (1999), using the R code fragment: R> require("quantreg") R> data("uis") R> fit <-crq(Surv(log(TIME), CENSOR)~ND1 + ND2 + IV3 + + TREAT + FRAC + RACE + AGE * SITE, data = uis, method = "Por") R> Sfit <-summary(fit, 1:19/20) R> PHit <-coxph(Surv(TIME, CENSOR)~ND1 + ND2 + IV3 + + TREAT + FRAC + RACE + AGE * SITE, data = uis) R> plot(Sfit, CoxPHit = PHit) We begin by loading the quantreg package, if it is not already loaded, and then loading the Hosmer and Lemeshow data. The model formula in the call to crq specifies that the logarithm of the "time to relapse" of subjects in a drug treatment program depends on the number of prior treatments, ND1 and ND2; the treatment indicator, TREAT taking the value 1 for subjects taking the "long" course, and 0 for subjects taking the "short" course; an indicator for prior intravenous drug use, IV3; a compliance variable, FRAC; subject's race; and the main and interaction effects of subjects age and site of treatment. The object fit produced by the call to crq evaluates, by default, the Portnoy estimator on an equally spaced grid with increments of about 0.006, for this sample of size 575. The function summary computes bootstrapped standard errors for the quantile regression estimates. In this example this step generates several warning messages indicating that estimation of the bootstrapped samples result in a "premature stop." This is quite common and occurs whenever excessive censoring prevents estimation of the upper conditional quantiles. In the usual terminology of survival analysis this results in a "defective" estimate of the survival distribution. To compare with the Cox proportional hazard model, we estimate the same model with the survival package's function coxph. This enables us to compare the fitted models in the coefficient plots appearing in Figure 1.
The solid blue line in these plots is the point estimate of the respective quantile regression fits, and the lighter blue region indicates a 95% confidence region. The solid (horizontal) black line in some of the plots indicates a null effect. The red line in each of the plots indicates the estimated conditional quantile "effects" implied by the estimated Cox model, see Koenker and Geling (2001) and Portnoy (2003) for further details on how this is done. A feature of the Cox model is that all of the red lines are proportional to one another; they are forced to all have the same shape determined by the estimate of the baseline hazard function. This shape is quite consistent with the quantile regression estimates for some of the covariate effects, but for the treatment and compliance effects the estimates are quite disparate.
Because the baseline hazard function is non-negative, another feature of the Cox estimates is that they must lie entirely above the horizontal "effect equals zero" axis, or entirely below it. Thus, covariates must either increase hazard over the whole time scale, or decrease it; the model forbids the possibility that treatments may increase hazard for a time and then decrease them. Such crossovers are, however, sometimes quite plausible, and an advantage of the quantile regression approach is that they are more easily revealed. An interesting example of this phenomenon is the cross-over in gender mortality rates discussed in Koenker and Geling (2001).
Given the fitted crq object the conditional quantile function can be estimated at any setting of the covariates and plotted using something similar to the following code: R> formula <-~ND1 + ND2 + IV3 + TREAT + FRAC + RACE + AGE * SITE -1 R> X <-data.frame(model.matrix(formula, data = uis)) R> newd <-as.list(apply(X, 2, median)) R> pred <-predict(fit, newdata=newd, type = "stepfun") R> plot(pred, xlab = expression(tau), ylab = expression(Q(tau)), + do.points = FALSE, main = "Quantiles at Median Covariate Values") R> plot(rearrange(pred), add = TRUE, do.points = FALSE, + col.vert = "red", col.hor = "red") R> legend(0.15, 7, c("Raw", "Rearranged"), lty = 1:2, + col = c("black", "red")) We first construct a data frame representing the variables of the model formula and then compute medians of these variables to represent the setting of the covariates at which we wish to predict. The function predict takes the fitted object and the new data newd and returns a step function representing the predicted quantile function. If the covariate setting is chosen to be the means of the covariates,x, then the predicted quantile function is guaranteed to be monotone increasing, (Koenker 2005, Theorem 2.5) but at other settings there can be violations of monotonicity. This eventuality appears in the present example in the extremes of the plotted function in Figure 1 where the estimated function is least precisely estimated, and in some nearly invisible smaller violations occurring in the central region of the plot. A simple and theoretically attractive way of dealing with these violations has been recently introduced by Chernozhukov, Fernández-Val, and Galichon (2006). Their procedure has been embodied in the quantreg function rearrange as used in the plotting command above. Peng and Huang (2008) have recently suggested an alternative approach to censored quantile regression for censored survival data based on the well-known Nelson-Aalen estimator of the cumulative hazard function. To motivate the Peng and Huang estimator it is useful to briefly review the standard counting process development of the Nelson-Aalen estimator. As above, let Y i = min{T i , C i } denote observed event times, and δ i = I(T i < C i ) the censoring indicators. The random variables T i and C i are assumed to be independent with distribution functions F and G, respectively. The distribution function, F , is assumed to be absolutely continuous with density f with respect to Lebesgue measure. Define the counting processes

Nelson-Aalen quantiles as argmins
and the corresponding aggregated processes R(t) = R i (t) and N (t) = N i (t). The cumulative hazard function, has increments Λ(s + h) = Λ(s) ≈ λ(s)h, so it is natural to estimate this quantity by the number of uncensored events occurring in the interval [s, s + h] divided by the number of subjects at risk at time s, that is by (N (s + h) − N (s))/R(s). Summing over all of [0, t], we then have,Λ In principle, dN (s) could accommodate both discrete and continuous components, but here we need only concern ourselves with the discrete component, ∆N (s) = N (s) − N (s−), which denotes the number of uncensored events occurring precisely at time s. Thus, we can express the Nelson-Aalen estimator in somewhat more concrete notation aŝ Given the estimator,Λ(t), a natural estimator of the survival function would seem to be exp(−Λ(t)), but further reflection suggests that this is only really appropriate ifΛ were absolutely continuous. Alternatively, noting that we can write, Then following Fleming and Harrington (1991), we can define recursively the estimator, which is recognizable as the Kaplan-Meier estimator.
The close relationship between the Nelson-Aalen and Kaplan-Meier estimators is not surprising; indeed both have some claim to the status of nonparametric maximum likelihood estimators (see e.g., Andersen et al. 1991, Section IV.1.5). The martingale structure of the Nelson-Aalen estimator motivates the Peng and Huang approach to censored quantile regression, which we now briefly sketch.

Peng and Huang's censored quantile regression estimator
As above, let Y i = T i ∧ C i be a random event time and δ i = I(T i < C i ) be the associated censoring indicator. Denote, is a martingale process for t ≥ 0. Adopting the accelerated failure time version of the quantile regression model, the martingale property, EM i (t) = 0 implies that, Rewriting the Λ i term as, where H(u) = − log(1 − u) for u ∈ [0, 1), yields the estimating equation, The integral can now be approximated on a grid, 0 = τ 0 < τ 1 < · · · < τ J < 1, as, yielding Peng and Huang's final estimating equation, Since the left hand side is not continuous an exact root may not exist. Peng and Huang consider "generalized solutions" citing Fygenson and Ritov (1994), who define a generalized estimating equation, W (β), as a monotone nondecreasing field, if for any β and ξ in R p , ξ W (β + xξ) is monotone nondecreasing in the scalar x. But this is precisely the condition that W be the subgradient of a convex function. Setting r i (b) = log(Y i ) − x i b, this convex function for the Peng and Huang problem takes the form Theorem 1. Fix τ , and define the n-vectors α = (α i (τ )), δ = (δ i ) and z = (log(Y i )). The problem (Q) is equivalent to the linear programming problem: and its dual, where X 1 denotes the submatrix of X with m rows corresponding to uncensored observations, and z 1 denotes the associated subvector of z.
Proof: Note that N i (exp(x i b)) = I(r i (b) ≤ 0)δ i , and consequently splitting r i (b) into positive, u i , and negative, v i , parts yields (P). The formal dual is then, or equivalently, setting a = α − d, But the latter formulation implies that a i = 0 for all i such that δ i = 0, so the dual problem can be reduced to focus only on the dual variables associated with the uncensored observations, which yields (D), after partitioning.
Remark: The dual formulation shows that solutions to the Peng and Huang problem must interpolate p uncensored observations. See the discussion in (Koenker 2005, Section 6.2). This contrasts with both the Powell and Portnoy methods for which solutions also correspond to p-element subset solutions, but solutions may include censored as well as uncensored observations.
Implementation of the Peng and Huang estimator in the quantreg package requires that the process be evaluated on a prespecified grid. (There is no known "pivoting" form of the algorithm.) At each τ of the grid, the problem (D) is solved using a Fortran implementation of the Frisch-Newton algorithm described in Portnoy and Koenker (1997). This requires only a rather minor modification of the standard quantile regression procedure, replacing the usual right hand side of the dual equality constraints by the expression X (δ − α). In the case that δ i ≡ 1 so there is no censoring, this new right hand side reduces to approximately its original form (1 − τ )X 1 n . This reduction is exact in the one-sample setting. Repeating the model fitting and prediction exercises described above using method = "PengHuang" rather than method = "Portnoy" yields very similar results, a finding that is perhaps not very surprising in view of the similarity of the underlying Kaplan-Meier and Nelson-Aalen foundations of the two methods.
To see in a little more detail how the two methods compare we consider a small simulation experiment. Survival times are generated by the AFT model, with the u = log(e) iid and e standard exponential; x 1 ∼ U [0, 1] and x 2 is independent, Bernoulli with probability one-half. Censoring times are generated as U [0, 3.8] if x 2 = 0 and U [0.1, 3.8] otherwise. This configuration yields roughly 25% censoring. We consider 3 sample sizes n = 100, 400, 1600, and 8 distinct grid spacings, parameterized by γ = .2, .3, · · · , .9 with grid spacing h = 1/(n γ + 6). Figure 3 presents scatterplots of the Portnoy and Peng-Huang estimatesβ 2 (0.6) − β 2 (0.6) for this experiment. The estimators behave very similarly, but for finer grids (larger values of γ) the correlation is clearly stronger.

Some one-sample asymptotics
It is instructive to compare the performance of various quantile estimators in the simplest censored one-sample problem as a prelude to some simulation comparisons of estimator performance for the general regression setting.
In contrast, the asymptotic theory of the quantiles of the Kaplan-Meier estimator is slightly more complicated. Using the δ-method one can show, √ n(θ KM − θ) ; N (0, Avar(Ŝ(θ))/f 2 (θ)) where (see e.g., Andersen et al. 1991), . Since the Powell estimator makes use of more sample information than does the Kaplan Meier estimator it might be thought that it would be more efficient. This isn't true.
Proof: Consider Thus, not only is the use of the uncensored C i 's unable to improve upon the Kaplan-Meier estimator, it actually results in a deterioration in performance. Further reflection suggests why our initial expectation of an improvement was misguided: in parametric likelihood based settings a sufficiency argument shows that the C i for the uncensored observations are ancillary. From a Bayesian perspective, the likelihood principle implies that they cannot be informative (see e.g., Berger and Wolpert 1984).
Having come this far it is worthwhile to consider a few other suggestions that have appeared in the literature regarding the use the uncensored C i 's. Leurgans (1987) considered the weighted estimator of the censored survival function, that uses all the C i 's. Conditioning on the C i 's, it can be shown that E(Ŝ L (t)|C) = S(t), and that the conditional variance is Averaging this expression gives the unconditional variance which converges to and consequently quantiles based on this estimator behave (asymptotically) just like those produced by the Powell estimator. A remarkable feature of this development is that it reveals that replacing the empirical weighting by 1−Ĝ(t) by the true value 1−G(t), yields even worse asymptotic performance, since in that event the limiting variance is It gets even curioser: if instead of replacing 1 −Ĝ by the true 1 − G, we instead replace it by an even worse estimator, the Kaplan-Meier estimator of the survival distribution of the C i 's, Wang and Li (2005) show that the resulting weighted estimator is even better. Indeed, the resulting weighted estimator achieves the same asymptotic variance as the Kaplan-Meier estimator given above, so the performance of the three versions of the weighted estimator becomes successively better as the estimator of the weights becomes worse!
To evaluate the reliability of these rather perverse asymptotic conclusions we conclude this section by reporting the results of a small scale simulation experiment comparing the finite sample performance of several estimates of the median in a censored one-sample setting. For this exercise we take T as standard lognormal, and C as exponential with rate parameter 0.25. We consider 6 estimators of the median of the lognormal: the (infeasible) sample median, the Kaplan-Meier median, the Nelson-Aalen (Fleming-Harrington) median, the Powell median, the Leurgans median, and finally the Leurgans median modified to employ the true rather than the estimated weights.
The simulation results conform quite closely to the predictions of the theory. The Kaplan-Meier and Nelson-Aalen estimators perform essentially the same, sacrificing about 15% efficiency relative to the (unattainable) sample median. This is about half the proportion (30%) of censored observations in the simulation model. The Powell and Leurgans estimators also perform very similarly as predicted by the theory, sacrificing about 10% efficiency compared to the Kaplan-Meier-Nelson-Aalen. The worst of the lot is the omniscient weighted estimator that sacrifices another 20% efficiency. Beware of oracles bearing nuisance parameters!

A censored quantile regression simulation experiment
In this final section we report on a small simulation experiment intended to compare the performance of the Powell, Portnoy and Peng-Huang estimators of the censored quantile re-gression model. We consider four generating mechanisms for the data: two for generating event times and two for generating censoring times. Typical scatter plots of the four mechanisms with n = 100 observations are illustrated in Figures 4 and 5, censored points are plotted as open circles and uncensored points as filled circles.
Event times are generated either from the iid error linear model, or from the heteroscedastic model Censoring times are either constant, or generated from the linear model, In each case the x i 's are iid U [0, 2], and u i and v i 's are iid N (0, 1). Parameters were selected so that the proportion of censored observations was roughly 30% in all cases: β = (5, 1), σ = c(0.39, 0.09, 0.3), κ = 6.5, and γ = (5.5, .75).
We compare four estimators of the parameters of the conditional median function for the two iid error models: the Portnoy and Peng-Huang estimators, the Powell estimator as implemented by the Fitzenberger algorithm, and finally the Gaussian maximum likelihood estimator for the conditional mean function, which in these cases happens to be identical to the conditional median function.
Tables 2 and 3 report mean bias, median absolute error and root mean squared error measures of performance for both the intercept and slope parameters for each of these estimators for three sample sizes. The Gaussian MLE is obviously most advantageous in these settings, but it is also noteworthy that the Portnoy and Peng-Huang estimators outperform the Powell estimator by a modest margin. Bias is generally negligible for all of the estimators in these iid Gaussian settings, so the MAE and RMSE entries can be interpreted essentially as measures of the dispersion of the respective estimators. The relative efficiencies of the estimators are quite consistent with the evidence from the one sample results reported in the previous section showing that the Portnoy and Peng-Huang estimators perform very similarly and exhibit a modest advantage over Powell. This advantage is somewhat smaller for the variable censoring model than for constant censoring, a finding that seems somewhat counter-intuitive. If one maintains the iid error assumption, but alters the form of the Gaussian error distribution then the superiority of the Gaussian MLE evaporates. For example, in simulations of a variant of the foregoing models in which Student t 3 errors were used, the Gaussian MLE exhibits considerable larger variability than the other estimators as expected from regression robustness considerations, but also exhibits substantial bias as well. See Tables 6 and 7 for  details.   Tables 4 and 5       conditional quantile functions, the other using a quadratic specification. Similarly, linear and quadratic specifications are compared for the Peng-Huang estimator. Note that while the conditional median function for our simulation model is linear, all the other conditional quantile functions are quadratic in the covariate x, so we might expect the misspecification of those functions by the linear model to cause difficulties for the Portnoy and Peng-Huang estimators. Consequently, for these models we must make some choice about how to evaluate and compare quadratic and linear specifications. For this purpose we have adopted the conventional strategy of evaluating the quadratic at the mean of the covariate, x.
The Gaussian MLE is severely biased in the quadratic settings since it assumes homoscedastic Gaussian error and the model is decidedly heteroscedastic. The Powell estimator performs quite well under both configurations. The differences between the Portnoy and Peng-Huang estimators are, as expected, almost negligible. However, the comparison of their linear and quadratic specfications is quite revealing. For both estimators bias is reduced by employing the (correct) quadratic specification, but this improvement is small and comes at a rather more substantial cost of variance inflation. Thus, from both MAE and RMSE perspectives the linear specification is preferable even though it suffers from a somewhat larger bias effect.  Finally, comparing performance of the Powell estimator with those of Portnoy and Peng-Huang we see that for constant censoring the Powell estimator maintains a slight edge, while for the variable censoring model Powell performs slightly worse. In view of the one-sample results reported in Table 1 this is somewhat surprising, one might have expected to see more of an advantage for the Portnoy and Peng-Huang methods. This merits further theoretical investigation that lies beyond the scope of the present paper.

Conclusion
Censored data poses a diverse set of challenges in a wide range of applications. As was immediately apparent from the work of Powell (1984Powell ( , 1986 quantile regression offers some distinct advantages over mean regression methods when there is censoring; departures from Gaussian conditions, or any deviation from identically distributed error, induce bias for leastsquares based estimators. In contrast quantile regression estimation is easily adapted to fixed censoring of the type considered by Powell due to the "equivariance of quantiles to    Table 7: Comparison of performance for the iid t 3 error, variable censoring configuration. monotone transformations." Non-convexity of the Powell objective function can create some computational difficulties, however. Local optima abound and global optimization is far from being a panacea. In our experience, local optimization of the Powell objective via steepest descent, starting at the naive quantile regression estimator performs quite well. Recently, Portnoy (2003) and Peng and Huang (2008) have introduced new approaches to quantile regression for randomly censored observations. These approaches may be interpreted as regression generalizations of the Kaplan-Meier and Nelson-Aalen survival function estimators, respectively. Although it is difficult to compute asymptotic relative efficiencies for the three estimators we have considered in general regression settings, asymptotics for the simplest one-sample instance suggests that there is a modest efficiency advantage of the new methods over the Powell estimator. This conclusion is supported (weakly) by simulation evidence. The martingale representation of the Peng-Huang estimating equation provides a more direct approach to the asymptotic theory for their estimator, but the simulation evidence suggests that performance of Portnoy's estimator is quite similar.
Software implementations of all three censored quantile regression estimators for the R language are available in the quantreg package of Koenker (2008) using the function crq. Extensions to other forms of censoring and more general models remains an active topic of research and will be incorporated into subsequent releases of the package.