Weighted Cox Regression Using the R Package coxphw

Cox’s regression model for the analysis of survival data relies on the proportional hazards assumption. However, this assumption is often violated in practice and as a con-sequence the average relative risk may be under- or overestimated. Weighted estimation of Cox regression is a parsimonious alternative which supplies well interpretable average eﬀects also in case of non-proportional hazards. We provide the R package coxphw implementing weighted Cox regression. By means of two biomedical examples appropriate analyses in the presence of non-proportional hazards are exempliﬁed and advantages of weighted Cox regression are discussed. Moreover, using package coxphw , time-dependent eﬀects can be conveniently estimated by including interactions of covariates with arbitrary functions of time.


Introduction
When analyzing survival data Cox's proportional hazards regression (Cox 1972) continues to be one of the most popular methods of analysis. However, its assumption of proportional hazards, i.e., the assumption that the effects of all covariates do not change over time, is often violated in practice. Consequently, the relative risk for a covariate exhibiting non-proportional hazards obtained by Cox regression may be under-or overestimated. But various methods of analysis are available to avoid this bias. Choosing an appropriate method may depend on If one assumes that a covariate of primary interest has different short-term and long-term effects, one could express these effects by estimating piecewise constant HRs for consecutive periods using separate models. However, such an analysis is based on the assumptions of constant HRs within each period and a sudden change at the cutpoint between two periods. Both assumptions are usually unrealistic. But such a piecewise constant model might sometimes be preferred over more complex approaches because of its simplicity.
More generally, Cox regression can be extended to accommodate different types of nonproportional hazards by including an interaction of a covariate with time. This is technically realized by a time-dependent covariate, which is a product of the value of a covariate with a pre-specified function of time t, γ(t). The function γ(t) can be chosen such that a time-dependent effect of a particular shape results. With γ(t) = t, a linear time-dependent effect will be obtained, while γ(t) = log(t) will yield a log-linear time-dependent effect. More flexible time-dependent effects can be obtained, if products of the covariate with the elements of a data-derived vector-valued function of time are used (Lehr and Schemper 2007;Royston and Altman 1994;Heinzl and Kaider 1997). Significant regression coefficients for these interaction terms suggest that the respective covariate does not exhibit proportional hazards. A multiple Cox regression extended in this way may even include several interactions of various covariates with time. This option of analysis is the most flexible one, however, it is only useful with larger sample sizes and if a concise description of the time-dependent effect is of interest. Furthermore, it will not always be possible to draw clear decisions from such models, in particular, if the sign of the corresponding regression coefficient changes during follow-up. For example, in a medical context, the purpose of a model with survival data is often to decide which therapy most likely will lead to longer survival. Using an extended Cox regression model with a time-by-covariate interaction to account for non-proportional hazards may not be suitable to answer this question.
Finally, weighted Cox regression is a parsimonious option to account for time-dependent effects applicable also for small samples sizes (Schemper 1992;Sasieni 1993). Schemper, Wakounig, and Heinze (2009) suggested to weight the contributions of each event time to the partial likelihood according to the product of the survivor function and the inverse cumulative probability of follow-up at that time. Therefore, at each event time, their weighting function is proportional to the expected number of subjects at risk if censoring had not occurred. By applying these weights, weighted Cox regression yields estimates of average HRs which themselves approximate the odds of concordance, defined for two treatment groups A and B as OC = P(T A < T B )/P(T B < T A ), where T A and T B denote survival times from two subjects randomly selected from these groups. Odds of concordance can be conveniently transformed into concordance probabilities c = P(T A < T B ) = OC /(OC + 1), which are intuitive effect size measures. They permit a clear-cut answer to the question which treatment or level of a prognostic factor is preferable, independent from the assumption of proportional hazards.
Here we present the R package coxphw (Heinze, Ploner, and Dunkler 2018) which implements weighted estimation in Cox regression providing unbiased average HR estimates irrespective of proportionality of hazards. Moreover, our package provides options to estimate timedependent effects conveniently by including interactions of covariates with arbitrary functions of time, with or without making use of the weighting option. Two biomedical examples are introduced in Section 2 motivating and exemplifying the application of weighted Cox regression. A brief review of the weighted Cox regression methodology is provided in Section 3, followed by the description of the R package coxphw in Section 4. A small simulation study illustrates the beneficial effect of weighted Cox regression (Section 5) before in Section 6 weighted and standard Cox regression are applied to the examples and differences in estimates and interpretation are discussed. Final remarks follow in Section 7.

Gastric cancer study
The Gastrointestinal Tumor Study Group (1982) presented results from a clinical trial on survival of 90 patients with locally advanced, non-resectable gastric carcinoma comparing treatment by a combination of chemotherapy and radiation to chemotherapy alone (variable radiation). The aim of this randomized study was to determine if radiation is beneficial to the survival of patients with gastric carcinoma under chemotherapy. During a median followup time of 4.34 years 79 (87.78%) events were recorded; in the groups with and without radiation 42 (93.33%) and 37 (82.22%) patients died, respectively. The data set is available in the R package coxphw.  Crossing cumulative survival curves as visualized in Figure 1A are typical for situations of non-proportional hazards. In this example, one may hypothesize that radiation has a harmful effect, but only within the first two years. Generally, plots of scaled Schoenfeld residuals versus (functions of) time t can be used to detect non-proportionality (Grambsch and Therneau 1994) ( Figure 1B). Following Grambsch and Therneau (1994), the locally weighted scatter-plot smoother (LOWESS) applied to the scaled Schoenfeld residuals approximates the time-dependent log relative hazardβ(t) of treatment with radiation (radiation = 1) versus treatment without radiation (radiation = 0, reference). In this plot, we have used the Kaplan-Meier estimate of the empirical distribution functionF (t) for the x-axis instead of time t itself, which has the advantage that observed events are evenly distributed over the axis. Consequently, the LOWESS is not disproportionally influenced by periods during which the occurrence of events is rare. In case of proportional hazards, one would expect the LOWESS to follow a horizontal line. Therefore, in Figure 1B any departure from a horizontal line indicates a violation of the proportional hazards assumption of radiation. Such a departure can be assessed by testing, in a linear regression analysis of scaled Schoenfeld residuals on time, whether the slope estimate is different from zero (Harrell 2001, p. 487).
For radiation this test, which is also implemented in the cox.zph function of the survival package (Therneau 2017), gives a highly significant p value < 0.001. Generally, such a test may fail to reveal departures from proportionality in case of non-monotone, e.g., U-shaped, trends. From Figure 1C, a plot of scaled Schoenfeld residuals versus untransformed times, we learn that the effect of radiation may change slightly faster during earlier times.
Ignoring the time-dependent effect and applying a Cox regression (using the coxph function of the R package survival) gives a HR for treatment with radiation of 1.15 (95% CI: 0.74-1.81; p = 0.537) compared to treatment without radiation. However, this effect estimate is biased if the true HR is time-dependent, and likely also biased with respect to the true average HR. The amount of this latter bias depends on the combination of the time-dependency and the censoring pattern in this study. In Section 6.1 we will revisit this example and discuss other, more appropriate methods of analysis which will take the time-dependent effect of radiation correctly into account.

Biofeedback therapy study
In this small clinical trial comprising 33 patients suffering from aspiration after head and neck surgery the effect of biofeedback therapy (bfb) on time until success of therapy, i.e., complete swallowing rehabilitation, was evaluated (Denk and Kaider 1997). Patients were randomized into two groups: 19 (57.58%) patients received videoendoscopic biofeedback therapy, i.e., the patients and therapists could visually control swallowing maneuvers on a monitor (bfb = 1) and 14 (42.42%) received the standard therapy (bfb = 0). The median therapy duration was 25 days; during follow-up 13 (68.42%) events of successful swallowing rehabilitation were recorded in the biofeedback group and 10 (71.43%) events were observed in the group without biofeedback. The therapy was started as soon as the healing process after surgery was finished. The time elapsing from surgery to start of therapy was considered an important covariate determining success of therapy. Because of the skewed distribution of this time interval, a log 2 -transformation was applied prior to analysis (variable log2heal). The data set is available in the R package coxphw.
Kaplan-Meier analysis provides some graphical evidence that an apparent early advantage of biofeedback therapy may vanish with ongoing duration of therapy ( Figure 2A). Hence, the assumption of proportional hazards could be violated for bfb. Time-dependent effects of bfb and log2heal could be modeled by an extended Cox regression including additional time-dependent covariates, i.e., products of bfb and log2heal with suitable functions of time. However, small sample size and small number of events restrict the number of degrees of freedom which can be used during model development considerably and hence impose restrictions on the complexity of the fitted model (Steyerberg, Eijkemans, Harrell, and Habbema 2000).
In Section 6.2 we shall discuss how an average log hazard ratio of bfb can be estimated and how the estimate can be transformed into a simple effect size measure for clear-cut therapy decisions.

Weighted Cox regression
In a sample of N (1 ≤ i ≤ N ) subjects m distinct and uncensored survival times t j (m ≤ N , 1 ≤ j ≤ m) are observed. For each subject i we assume that a covariate (row) vector x i of k (r = 1, . . . , k) covariates is recorded. The set of subjects without the event and uncensored prior to t j , i.e., the risk set, is denoted by R j . Using a Cox regression a vector of coefficients is estimated by setting the first derivative of the log partial likelihood (the score vector) to zero, and by solving the resulting estimating equations numerically, usually using the iterative Newton-Raphson algorithm. In weighted Cox regression a weight function w(t j ) is introduced to weight the contributions to the partial likelihood at the m uncensored failure times differently. Thus, weighted regression coefficientsβ are derived by solving the estimating equations If the weights w(t j ) are set to 1, then a standard Cox regression is obtained. Early versions of weighted Cox regression were proposed by Schemper (1992) and Sasieni (1993). These suggestions used as weights either the observed number of individuals at risk R(t j ) or the empirical survivor function estimate at these timesŜ(t j ), extending the tests by Breslow (1970) and Prentice (1978), respectively, to a multi-covariate situation as does the Cox model to Mantel's logrank test (Mantel 1966). In these suggestions, weighted Cox regression was proposed as a robust alternative to the standard Cox estimator, reducing the influence of a violated proportional hazards assumption, but also of outlying survival times on parameter estimates. Xu and O'Quigley (2000) showed that without censoring standard Cox regression estimates a time-averaged effect also in case of non-proportional hazards. Let G(t j ) denote the cumulative probability of follow-up until t j , i.e., the survivor function of the potential follow-up time distribution. Then, applying the weighting function w(t j ) = G(t j ) −1 to the score equations yields time-averaged regression effects also for censored samples (Xu and O'Quigley 2000).
Finally, Schemper et al. (2009) suggested the weighting function w(t j ) = S(t j )G(t j ) −1 . These weights are proportional to N E (t j ), the expected number of subjects at risk at each t j if there had been no censoring, and this greatly enhances interpretability of the resulting hazard ratio estimates, denoted as "average hazard ratios" by Schemper et al. (2009). Specifically, these authors have shown that in a two-group comparison, average hazard ratios approximate the The approximation extends to the general situation of a continuous covariate x, where a generalized concordance probability c can be defined as where T i and T e the survival times of randomly chosen subjects with covariate values x i and x e , respectively. Generalized odds of concordance follow directly as c /(1 − c ). The definition of c also includes the concordance probability c of a two-group comparison, setting x i = 1 and x e = 0. The approximation of the odds of concordance with average HRs is valid independent of the type of non-proportionality and the censoring pattern. Thus, c is useful if the analyst is not interested in a detailed description of a time-dependent effect, but in a summary measure suitable for decision making. If proportional hazards can be safely assumed, an estimate of c (and of c) can also be derived from standard Cox regression by (Dunkler et al. 2010) c = exp(β)/(1 + exp(β)).
Weighted Cox regression models can be estimated by plugging estimates for S(t j ) and G(t j ) into the weighting function, such asŜ(t j ), the left-continuous Kaplan-Meier estimate of the survivor function at time t j , andĜ(t j ), the probability of still being followed-up at t j , estimated by the Kaplan-Meier method with reversed meaning of the status indicator (Schemper and Smith 1996). Inference for estimates of weighted Cox regression can be based on the Lin (1991) and Sasieni (1993) sandwich estimateÂ −1BÂ−1 with −Â and −B denoting the sum of contributions to the second derivative of the log likelihood, weighted byŵ(t j ) and w(t j ) 2 , respectively. This estimate is independent from the scaling of the weights and in case of no weighting it is algebraically equal to the usual variance estimate based on inversion of the Fisher information matrix. This variance estimate is theoretically valid only in case of proportional hazards and without model misspecification. Therefore, and since application of weighted Cox regression usually implies a violated proportional hazards assumption, the robust covariance estimate should be used (Lin and Wei 1989;Therneau and Grambsch 2000, Chapter 7.2). An estimate of the covariance matrix can also be obtained by using the Jackknife, where the regression coefficients are computed leaving out each individual in turn.
The resulting matrix of "difference inβ" residualsD is then used to compute the covariance matrix asV =D D . The Jackknife variance estimate is valid also in case of misspecification.
Unequal weighting of the contributions to the likelihood usually increases the variability of estimates, in particular in situations of proportional hazards. However, empirical studies have shown that this increase appears to be moderate (Wakounig, Heinze, and Schemper 2015), and this finding is further supported by our simulation study presented in Section 5.

The R package coxphw
The R package coxphw implements weighted estimation of Cox regression. The most important arguments of the package's main function coxphw are: • formula is an R formula with a survival object as returned by the Surv function from the survival package (Therneau 2017) as the response on the left of the~operator, and the model terms on the right. In order to specify time-dependent effects, the model terms can also contain products of covariates with functions of the survival time variable (e.g., Surv(t, status)~x + log(t):x), which the formula parser of coxphw will identify and treat appropriately.
• data is a data-frame containing the variables given in formula.
• template determines which type of (weighted) estimation of Cox regression is requested. The analyst can choose between "AHR" for estimation of average HRs (Schemper et al. 2009), "ARE" for estimation of average regression effects (Xu and O'Quigley 2000) and "PH" for unweighted Cox proportional hazards regression. "AHR" is the default type of estimation.
• robust requests the robust (Lin-Wei) covariance matrix used for any significance testing, if set to TRUE. This is the recommended choice and the default.
• jack requests that the covariance matrix is estimated by the Jackknife. Default is set to FALSE.
• trunc.weights can be used to truncate the weights such that disproportional influence of some outlying event times is avoided. Weight truncation can increase the precision of estimates, but introduces a mild bias. trunc.weights specifies the quantile of the observed weight distribution at which weights are to be truncated. Before using this option, it is recommended to inspect the values of the weights as a function of time, using the plot function (see below). For mild truncation, a value of 0.95 is recommended. Default is 1, indicating that no truncation is applied.
• alpha is the significance level of confidence limits (confidence level 1 − α) with 0.05 as default.
• betafix can be used to hold some of the regression coefficients constant at pre-defined values, in particular for use with time-dependent effects. A vector with one element for each regression coefficient is expected. While an NA element will lead to estimation of the corresponding effect, a numeric element will hold the related regression coefficient constant at this value and prevent its estimation. The default value is betafix = NULL, requesting unrestricted estimation of all regression coefficients. In Section 6.2 an application of this option will be presented. betafix has a similar meaning and can be used alternatively to specifying an offset term in the model formula. However, while offset is restricted to time-fixed effects, betafix enables the user to hold time-varying regression coefficients constant.
The package's two workhorse functions have been implemented in Fortran 90. The first function, WEIGHTEDCOX, applies Newton-Raphson iteration to arrive at the regression coefficients (Thisted 1988, p. 164). The second function, LIKE, computes the weighted first and second derivatives of the log partial likelihood, which are the main ingredients of the Newton-Raphson iteration. This latter function uses efficient updating of these quantities over the risk sets. coxphw provides some possibilities to fine-tune the iterative estimation procedure, which are explained in the help pages. coxphw uses Breslow and Crowley (1974)'s tie-handling method. A call to coxphw generates an object of class 'coxphw', and corresponding summary, print, coef, vcov and confint methods are available. A plot method can be used to visualize the weighting function. Further utility functions contained in the coxphw package are: • predict, which computes potentially non-linear or time-dependent effect estimates at specified values of a variable and has an accompanying plot method for visualizing time-dependent or non-linear hazard ratios, • concord, which computes the generalized concordance probabilitiesĉ for each covariate. This function can also be applied to an object of class 'coxph'.

Simulation
We conducted a small simulation study to demonstrate similarity of standard and weighted Cox regression under proportional hazards, and the advantages of weighted Cox regression in case of non-proportional hazards. For simplicity, we simulated studies with one binary explanatory variable x, x i ∼ Bernoulli(0.5), i = 1, . . . , N . In the proportional hazards scenario ( Figure 3PH), we simulated survival times following a Cox-Weibull distribution (Bender, Augustin, and Blettner 2005), by first generating uniformly distributed random numbers u ∼ U [0, 1] and then computing survival times as t = [− log(u)/0.11 exp(β x)] 1/1.22 with β = 0.2007. In the non-proportional hazards scenario ( Figure 3NPH), we assumed that survival times of subjects with x i = 0 followed a Cox-exponential distribution, obtained by t = − log(u)/0.3565, while the survival times of those with x i = 1 followed a Cox-Gompertz distribution, obtained by t = (1/1.6) log[1 − 1.6 log(u)/0.0228]. This setting was chosen so that both scenarios have underlying concordance probabilities of 0.55, which were numerically determined by P(T 1 < T 0 ) = f 1 (t)S 0 (t)dt, with f 1 (t) denoting the density of survival times in group of subjects with x = 1 and S 0 (t) denoting the survivor function in the group with x = 0 (Dunkler et al. 2010). In each scenario we simulated 2000 data sets with N = 1000 observations each. To quantify the effect of censoring, we simulated both administrative and exponential censoring. Administrative censoring was implemented by censoring all survival times t at a pre-defined maximum follow-up time τ if t > τ . We iteratively determined τ to obtain proportions of administrative censored times of 10% and 15%. For exponential censoring a Cox-exponentially distributed potential follow-up time was simulated as y = − log(u)/λ, where u ∼ U [0, 1] and λ = 0.0602 for the proportional hazards scenario and λ = 0.122 for the non-proportional hazards scenario, determining these values such that exponential censoring rates of 25% were obtained. The observable survival times were then defined as t * = min(t, y). For each simulated data set we generated six versions; a data set with A) no censoring, B) 25% exponential censoring, C) 10% administrative censoring, D) both B and C, E) 15% administrative censoring, F) both B and E. With uncensored data the concordance probabilityĉ = P(T i < T e |x i = 0, x e = 1) is algebraically equal to the nonparametric two-sample test statistic of Mann and Whitney (1947) and to the area under the receiver operating characteristic curve (Hanley and McNeil 1982).
From Table 1 we learn that in the proportional hazards scenario from both Cox regression and weighted Cox regression unbiased estimates of concordance probabilities can be derived, and Cox regression, which explicitly exploits the proportional hazards assumption, gives slightly more efficient estimates. However, in case of non-proportional hazards the Cox regression coefficients vary with the type and the magnitude of censoring, while weighted Cox regression coefficients remain fairly constant. When deriving estimates of concordance probabilities from the regression coefficients, we see that there is almost no systematic bias induced by weighted Cox regression, while Cox regression grossly underestimates c. While exponential censoring alone does not systematically impact the weighted Cox estimates, administrative censoring limits the time range covered by the data to a maximum value of τ and thus restricts the averaging of the time-dependent hazard ratio to the domain t ∈ [0, τ ]. In this case, no estimation procedure can yield unbiased estimates without making strong and untestable assumptions about the regression coefficients beyond τ .

Gastric cancer study
In Section 2.1 we have seen that the effect of radiation on survival in the gastric cancer study suggests a time-dependent effect, i.e., an initially harmful effect of radiation decreasing over time. In the following, we describe the results according to various methods, focusing on the capabilities of package coxphw.

Ignoring non-proportional hazards
Ignoring non-proportionality a standard Cox regression can be estimated using coxphw by the following command: R> coxphw(Surv(yrs, status)~radiation, data = gastric, template = "PH") or, equivalently, using the R package survival: R> coxph(Surv(yrs, status)~radiation + cluster(id), data = gastric, + method = "breslow") In coxphw, the robust variance estimate is computed by default to account for misspecifications such as lack of proportional hazards. In coxph the robust variance is invoked by adding the term cluster(id) where id is a unique subject identifier. In coxphw by default each line is assumed a distinct "cluster" for the computation of the robust covariance, but an id argument is available to specify other clusters of observations if needed. Computation of the robust variance could also be entirely switched off by setting robust = FALSE. The estimated HR of radiation, obtained by both equivalent function calls, is 1.15 (95% CI: 0.74-1.81; p = 0.537). However, as can be seen in Figure 1B the sign of the time-dependent log HRβ(t) of radiation changes during follow-up and consequently the non-proportional hazards of radiation should probably not be ignored. Since some survival times are censored, we must consider that the coefficient obtained by this standard Cox regression is possibly a biased estimate of the average population effect.

Estimating piecewise constant hazard ratios
As the cumulative survival probabilities ( Figure 1A) and the Schoenfeld residuals ( Figures 1B  and 1C) suggest differing short-term and long-term effects of radiation, separate modeling of different time periods might be a more suitable method of analysis. In this example, we assume that no prior knowledge of the time-dependent effect is available which could guide the choice of an appropriate cutpoint. Thus, time periods are separated such that an equal number of events are observed before and after the cutpoint. In our example, this cutpoint is at one year. Two separate Cox regression models are now fitted, describing the effect of radiation during the first year, and after the first year. In the model for the first year any survival times longer than one year are censored at this point. Hence, the data set for this model consists of 90 patients experiencing 39 (43.33%) events during follow-up with 25 (55.56%) events in the radiation group and 14 (31.11%) events in the group without radiation. A plot of scaled Schoenfeld residuals (see accompanying R example code) still shows some non-proportional hazards but the test for non-proportionality now yields a p value of 0.103. For the second time period only subjects alive at one year and with a follow-up longer than one year are included, leading to a sample of 51 patients with 40 events (78.43%); with 12 (60.00%) and 28 (90.32%) events occurring in the groups with and without radiation, respectively. The test for non-proportionality gives a p value of 0.446.
The first Cox regression model reveals a significant harmful effect of radiation on survival (HR 2.40; 95% CI: 1.27-4.55; p = 0.007) up to one year. By contrast, the second model suggests a non-significant protective effect of radiation (HR 0.55; 95% CI: 0.28-1.08; p = 0.081) after one year.

Including a time-by-covariate interaction
Extending a Cox regression by an interaction of radiation with time, a single model could be used to estimate the effect of radiation in the two time periods. Using coxphw this model can be estimated very conveniently by first defining a function of time which represents the piecewise constant estimation, and secondly, by including a product term of radiation and this function of time in the formula of a call to coxphw. The function using a simple timevarying indicator to separate two periods after one year is defined by R> fun <-function(t) as.numeric(t > 1) The product of radiation and fun(t) is specified as a model term in the formula argument of coxphw: R> coxphw(Surv(yrs, status)~radiation + fun(yrs):radiation, + data = gastric, template = "PH")) The coxphw function can recognize that the name of the time variable (yrs) appears in the product term. As a consequence, the function will compute fun(yrs) at each event time and generate a time-dependent product term, which for each subject is updated in each risk set. This way of specifying time-by-covariate interactions is even more convenient than that used by coxph, which uses a special argument, tt, to define such interactions: R> coxph(Surv(yrs, status)~radiation + tt(radiation) + cluster(id), + tt = function(x, t, ...) x * (t > 1), data = gastric, + method = "breslow") coxphw will yield the following estimates, which are identical to those obtained by the call to coxph: The HR estimate (exp(coef)) of radiation, i.e., the effect of radiation compared to therapy without radiation, in the first period is 2.405, as in this period, fun(yrs) equals 0. The HR of fun(yrs):radiation has to be interpreted as the ratio of the two HRs of both time periods of radiation: the HR of the second period divided by the HR of the first period. Thus, the HR of the second period follows as the product of the two HRs, 2.405 · 0.227 = 0.546, or, equivalently, as the exponentiated sum of the two regression coefficients, exp(0.8774 − 1.4826) = exp(−0.6052) = 0.546.
An extended Cox regression may include interactions of covariates with any function of time γ(t) and thus is able to accommodate time-dependent effects. The simplest function of time γ(t) is γ(t) = t, which leads to the estimation of an effect linearly changing with time.
R> fit1est <-predict(fit1, type = "slice.time", x = "yrs", z = "radiation", + newx = seq(from = 0.1, to = 3, by = 0.1)) R> plot(fit1est, addci = TRUE) Often a log-linear function of time γ(t) = log(t) is more suitable to model time-dependent effects, in particular if the HR changes faster during early follow-up times compared to later times. Such a log-linear time-dependent effect can be requested by the following command: R> coxphw(Surv(yrs, status)~radiation + log(yrs):radiation, + data = gastric, template = "PH") In our example, this specification of the time-dependent effect (Wald test of model: p = 0.373) does not fit as well as the linear one (p = 0.011). Although the LOWESS curve in Figure 1C would suggest that the linear time-dependent effect is not ideal, it still yields a better model fit than the log-linear time-dependent effect, which assumes a more rapid change shortly after start of follow-up. Time-dependent effects agreeing more closely with the LOWESS curve could be obtained by modeling more complex functions of time γ(t) derived in a datadependent way. For example, restricted cubic splines (Heinzl and Kaider 1997) could be applied using coxph with the rcs argument. However, in the gastric cancer study with its relatively small sample size the danger of overfitting the data outweights the possible merits of a more complex modeling approach.

Weighted Cox regression
Weighted estimation of Cox regression gives a properly population-averaged estimate of the effect of radiation on survival also in case of non-proportional hazards. Using coxphw such a population-averaged HR can be estimated by the following statement: R> fit2 <-coxphw(Surv(yrs, status)~radiation, data = gastric, + template = "AHR") R> summary(fit2) coxphw(formula = Surv(yrs, status)~radiation, data = gastric, template = "AHR")  probabilityĉ can be derived, which in our example is computed as 0.6136. This means, that with a probability of 61.36% (95% CI: 49.86-71.72%), a patient treated with radiation is expected (at baseline) to die earlier than a patient treated without radiation.
If a few "long survivors" are present in a data set, weights used in weighted Cox regression may exhibit a very skew distribution. Such extremely unequal weighting will not bias the estimates, but may result in low precision of the estimates, i.e., wide confidence intervals. Truncation of weights at a suitable value, e.g., their 95th percentile, may greatly increase precision while introducing only a small amount of bias (Dunkler et al. 2010). The distribution of weights can be reviewed by using the plot method.

R> plot(fit2)
In this example (Figure 5), the censoring weights are very small during the first four years and increase sharply afterwards. However, the normalized total weights have a fairly narrow range from 0.31 to 1.74. Hence, truncation of weights using the trunc.weights argument coxphw may not seem necessary. If the weights were truncated at the, e.g., 95th percentile, the range of the normalized total weights would barely change to [0.31, 1.65], and this would yield very similar average HR estimates (HR 1.588; 95% CI: 0.995-2.533; p = 0.053) as without truncation of weights.  of a patient with and without radiation is similar. Modeling the time-dependent effect in more detail reveals clear changes over time. Average HR as an effect size measure and the generalized concordance probabilityĉ are suitable to support clear-cut treatment decisions. In our example, results point towards better prognosis under chemotherapy alone, but the difference in survival between the two groups fails to reach significance at the 5% level.

Biofeedback therapy study
In Section 2.2 we saw that biofeedback therapy had a time-dependent effect on full swallowing rehabilitation when adjusted for healing time. As the sign of the time-dependent log HR,β(t), changes over time ( Figure 1B), the time-dependency of the effect of bfb should not be ignored by using a standard Cox regression. Applying it nonetheless, the HR of biofeedback therapy compared to standard therapy adjusted for log2heal is estimated as 1.31 (95% CI: 0.67-2.58; p = 0.434).
Since the sample is small and both variables bfb and log2heal exhibit non-proportional hazards neither stratification nor explicit modeling of the time-dependent effects are appropriate. Denk and Kaider (1997) restricted the follow-up time to 40 days and applied a Cox regression for bfb adjusted for log2heal. Now that weighted Cox regression is available, the data could be re-analyzed in order to derive the average HR of bfb adjusted for log2heal and a corresponding adjusted estimate of the generalized concordance probabilityĉ . Unlike the original analysis of Denk and Kaider (1997), this would greatly support decisions on the best therapy for patients to achieve fast swallowing rehabilitation after the healing process has been completed.
In this example, it is not just that both covariates exhibit time-dependent effects, but log2heal is also a strong confounder of bfb. Misspecification of the effect of one variable may thus lead to bias in the estimate of the other variable. For example, if we aim at estimating an average HR of bfb we are implicitly assuming a constant effect of that variable, which is not correct and hence may induce bias in the estimation of the regression coefficients of log2heal and its interaction with time. As a remedy, we propose the following two-stage estimation procedure which ensures that the effect estimate of log2heal is approximately unbiased and still allows for an unbiased assessment of the average HR of bfb: Stage 1. The time-dependent effect of log2heal is estimated while appropriately taking into account the non-proportional hazards of bfb. One way to achieve this is to estimate an extended Cox regression model which includes the interaction of log2heal with time, and is stratified by bfb.
Stage 2. Weighted Cox regression is employed to estimate the average HR of bfb while holding the regression coefficients of log2heal and the interaction of log2heal with time fixed at their respective values estimated at stage 1.
R> summary(coxphw(Surv(thdur, success)~bfb + log2heal + + log(thdur):log2heal, data = biofeedback, template = "AHR", + betafix = c(NA, coef(stage1))) This call to coxphw, setting the first element of betafix to NA, specifies that the first regression coefficient (bfb) should be estimated without restrictions. The values of the other two regression coefficients are held constant at the values extracted from the model of stage 1 using coef(stage1). This call of coxphw will give the following output:  Weights applied in weighted Cox estimation can be checked using the plot function ( Figure 6). We see that the variability of the normalized total weights is sufficiently small in this example.

Concluding remarks and summary
The choice of an adequate method to analyze survival data in the presence of non-proportional hazards will always depend on the type of research question. If researchers are interested in prediction of survival time, then they should explicitly model a time-dependent effect, probably in a data-dependent fashion, but observing general rules of statistical modeling (Harrell, Lee, and Mark 1996). However, for models to be used in etiologic research, i.e., to estimate causal effects of treatments or risk factors, a single number such as the average HR or the generalized concordance probability c will often be more suitable and better interpretable. These measures allow non-parametric comparisons of life-expectancy between groups of subjects, which are also easier to communicate to non-statisticians.
The average HR (and the related generalized concordance probability c ) can be estimated by weighted Cox regression using the weighting function w(t) = S(t j )G(t j ) −1 (Schemper et al. 2009). Other, earlier proposed weighting schemes do not necessarily lead to average HRs that can be transformed to concordance probabilities (Schemper 1992;Sasieni 1993;Xu and O'Quigley 2000). Unweighted Cox regression also provides a summary effect, but in case of non-proportional hazards its estimate is dependent on the censoring pattern. Since censoring is a property of a study and usually not of a population, estimates from standard Cox regression are not generalizable if censoring and time-dependence of effects are substantial. In case of proportional hazards, (unweighted) Cox regression will provide the most efficient estimates, but weighted Cox regression can still be safely applied as it will not introduce bias.
In Tables 3 and 4 we summarize methods of analysis in the presence of non-proportional hazards. In this paper, we introduced the R package coxphw implementing weighted Cox regression. This package significantly extends the R toolbox for survival analysis. Beyond the estimation of static weighted Cox regression models it enables the inclusion of interactions of covariates with functions of time for unweighted and weighted Cox regression models by using an intuitive syntax in the formula argument. Regression coefficients of covariates can be fixed at specified values, even if these covariates are assumed to have time-dependent effects. As exemplified in the biofeedback therapy study, this allows to implement a two-stage estimation process in which an average hazard ratio is estimated for one covariate while accounting for a time-dependent effect of another covariate.
The R package coxphw is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=coxphw or at http://cemsiis.meduniwien.ac. at/en/kb/science-research/software/. The latter link also points to a SAS (SAS Institute Inc. 2013) macro WCM enabling weighted estimation of Cox regression.  Table 3: Various methods of analysis of survival data with non-proportional hazards in R.
An appropriate analysis may depend on prior knowledge of possible time-dependent effects, the aims of the analysis, the sample size, and number of events. coxph is implemented in the package survival (Therneau 2017); coxphw in coxphw; and concreg in concreg (Heinze et al. 2016). Package rms (Harrell 2017) has to be loaded before applying restricted cubic splines using coxph. DF, degrees of freedom;Ĝ(t j ), estimate of the follow-up distribution obtained by Kaplan-Meier estimation with reversed meaning of the status indicator at time t j ; HR, hazard ratio; PH, proportional hazards; RCS, restricted cubic splines;Ŝ(t j ), left-continuous Kaplan-Meier estimate of the survivor function at time t j .

Method Advantages & disadvantages
Ignoring non-PH: Cox regression + simple and efficient, if HR is always < 1 or > 1 − estimates depending on type of non-PH and on censoring pattern + provides an average effect that is independent of the observed censoring pattern − "average regression effect" is not interpretable as log odds of concordance − "average regression effect" is not suitable for prediction Including time-by-covariate interactions: Data-driven specification (e.g., RCS) + allows parsimonious, but flexible estimation of time-dependent effects − requires sufficient sample size and number of events − results may differ between alternative methods Table 4: Advantages and disadvantages of various methods to analyze survival data with non-proportional hazards. c , generalized concordance probability;Ĝ(t j ), estimate of the follow-up distribution obtained by Kaplan-Meier estimation with reversed meaning of the status indicator at time t j ; HR, hazard ratio; PH, proportional hazards; RCS, restricted cubic splines;Ŝ(t j ), left-continuous Kaplan-Meier estimate of the survivor function at time t j .