Conﬁdence Band for the Diﬀerences between Two Direct Adjusted Survival Curves

This paper focuses on comparing the survival outcomes of two treatments while ad-justing for multiple demographic and clinical factors. We wish to answer the question: “At what times do the survival outcomes of two treatments diﬀer?”. Previous work by Zhang and Klein (2001) addressed this issue by considering a stratiﬁed Cox model and by constructing the conﬁdence band for the diﬀerences between survival functions of treatments, for a given subject. In this study, we utilize direct adjusted survivals of treatments, so that treatment comparison can be performed on a general basis. The conﬁdence band for the diﬀerences between direct adjusted survivals was constructed using the simulation method. We developed two SAS macros that compute the band. The ﬁrst macro was developed for data sets consisting of ﬁxed covariates only. The second macro was developed based on the piecewise proportional hazards Cox model and allows for time-dependent variables. We illustrate both macros by analyzing the survival data of a cohort of lymphoma patients.


Introduction
One fundamental question in many clinical studies is to evaluate survival outcomes of different treatments. In controlled clinical trials where treatment groups are balanced in covariates (such as demographics and clinical characteristics), the standard analytical method is to present Kaplan-Meier survival curves by treatment group and perform the log-rank test. In observational studies where the covariates are not balanced over treatment groups, a regression model suitable for censored data would be needed to control for effects of these covariates. In this situation the multi-variable Cox model is a common choice. This model assumes a constant hazard ratio between two levels of a covariate. When the Cox model is used to analyze treatment-related survival data, the estimation of treatment specific survival is problematic. Several methods have been discussed. One possible method is to set covariate values equal to the means across the sample and then predict survivals of treatments in the fitted Cox model. The underlying rationale behind this method was that the survival of a treatment based on mean covariate values tended to be close to the average survival of the treatment. However, this rationale is actually a flawed one (Nieto and Coresh 1996) and interpretation of this type of survival is awkward when discrete covariates are present. Makuch (1982) and Gail and Byar (1986) advocate the direct adjustment method based on a Cox model. In this method the survival estimates of a treatment for all subjects in the reference population are computed and averaged. Zhang, Loberiza, Klein, and Zhang (2007) studied the direct adjusted survival curves based on a Cox model stratified by treatment groups. Such a model allows treatment effects to vary over time. Zhang et al. (2007) provided a SAS macro that implemented the pointwise comparison between treatments. The propensity-score-based approach has been studied by some statisticians and a review of this method can be found in MacKenzie, Brown, Likosky, Wu, and Grunkemeier (2012).
When treatment effects change over time, it may be crucial to evaluate at what times the outcomes of two treatments differ. Such a question can be answered by constructing a confidence band in an interval [t 1 , t 2 ] for the differences between two treatment-specific survivals. Lin, Fleming, and Wei (1994) proposed the simulation method to construct a confidence band for a survival function based on a Cox model. This method has recently been recognized to be an example of the general wild bootstrap which was first applied to regression analysis by Wu (1986). In the wild bootstrap method the residuals are multiplied by independent random variates with mean 0 and variance 1 and the subsequent variance estimator protects against error variance heteroscedasticity. We refer the reader to the recent paper by Beyersmann, Termini, and Pauly (2013) for a detailed description of the wild bootstrap method. Zhang and Klein (2001) applied the wild bootstrap method to the problem of treatment comparison based on the stratified Cox model. They predicted the survival functions of treatments for a given set of covariate values, and developed the simulation-based confidence band for differences between two survival functions. The band helps researchers to assess when the outcomes of two treatments differ for a given subject. This paper focuses on the comparison of direct adjusted survivals of treatments, based on a stratified Cox model. We aim at obtaining a confidence band for the differences between two direct adjusted survivals, using the simulation method. This study extends the previous work by Zhang and Klein (2001). Usage of direct adjusted survivals for the band allows for treatment comparison on a general basis. It provides a decision making tool for physicians and patients who want to know whether a short term survival benefit exists for one treatment or at which time point one treatment starts to have a survival advantage. We developed two SAS macros, SURVBAND and TIMEBAND, that compute the bands. The first macro was developed for data sets consisting of fixed covariates only. The second macro was developed based on the piecewise proportional hazards Cox model allowing for time-dependent variables. We illustrate both macros by analyzing the survival data of a cohort of lymphoma patients. The macros were written using the SAS/IML language (SAS Institute Inc. 2008) and the PHREG procedure in SAS/STAT (SAS Institute Inc. 2009). Both products, SAS/STAT and SAS/IML, need to be installed in order to run the macros.
The remainder of the paper is organized as follows. In Section 2 we briefly describe the theoretical foundations and present the formulas of the method. Section 3 presents two SAS macros and explains the macro parameters. A simulation study was conducted to assess the performance of the simulation-based band. The simulation results are provided in Section 4. In Section 5 we apply both macros to a real data set on survival of lymphoma patients. Final discussions are given in Section 6.

Notations
The sample can be summarized as {X ij , ∆ ij , Z ij } for i = 1, . . . , K and j = 1, . . . , n i , where the first subscript refers to the treatment group. For the jth subject in the ith treatment group, X ij is the observed time, ∆ ij is the event indicator, and Z ij is the vector of covariates. We consider a sample with two treatments (K = 2), and the sample size is given by n = n 1 + n 2 .
Assume that the Cox model, when stratifying on treatment group, is valid to describe the hazard functions. Given z, the conditional hazard function of ith treatment at time t is where λ 0i (t) is an arbitrary nonnegative function, β is a vector of regression coefficients, and z is a vector of covariates. Let Λ 0i (t) be the cumulative baseline hazard function, Λ 0i (t) = t 0 λ 0i (s)ds. The prediction of survival functions is a complex issue when time-dependent covariates are present. A detailed discussion of time-dependent covariates is given in Section 3.2. For simplicity, we only present the formulas for fixed covariates in Sections 2.1 and 2.2. For a given set of covariate values z, the survival function for the ith treatment is given by The direct adjusted survival of the ith treatment averages the survival estimates of the ith treatment for all subjects in the sample, In the following we introduce additional relevant notations. We define the counting process that counts occurrence of the event by time t by N ij (t) = I(X ij ≤ t, ∆ ij = 1) and the at-risk process by Y ij (t) = I(X ij ≥ t). The martingale process is given by We also define the following notations, where for a column vector a, a ⊗0 = 1, a ⊗1 = a, a ⊗2 = aa . Let β be the maximum likelihood estimate of the partial likelihood. Σ −1 is the covariance matrix for the limiting distribution of √ n( β − β). Σ denotes the observed information matrix. Λ 0i denotes the Breslow estimator of the cumulative hazard function for the ith stratum. The explicit formulas of these estimators can be found in Zhang and Klein (2001, pp. 245-246). The survival function for the ith treatment, given z, can be estimated by

Comparison of direct adjusted survivals
Treatment effect assessment is performed in terms of finding the confidence band forS 1 (t) − S 2 (t) over the interval [t 1 , t 2 ]. The hypothesis to be tested is versus the two-sided alternative. According to Zhang et al. (2007),S 1 (t) andS 2 (t) can be estimated by Let A general test statistic for testing H 0 is given by where K(t) is a positive weight function. Possible choices of K(t) include the size of the risk set and is the Kaplan-Meier estimate of the survival function of the whole cohort. When the size of the risk set or S(t) p is used as the weight function, the test statistic is adjusted to emphasize the survival differences during the early time period. In the remainder of this section as well as in the SAS macro developed, we let K(t) ≡ 1 and focus on the test statistic Following the standard asymptotic result for the Cox model (Andersen and Gill 1982) and the previous work on survival comparison (Zhang and Klein 2001), W (t) is asymptotically equivalent to . Based on the martingale central limit theorem, W (t) converges weakly to a zero mean Gaussian process. The pointwise variance of W (t) can be consistently estimated by where The process W (t) does not have independent increments, so the limiting distribution is difficult to describe analytically. We use the wild bootstrap resampling method of Lin et al. (1994) to approximate the limiting distribution. Note that the martingale M ij (t) in W (t) has mean 0 and variance E[N ij (t)]. Let G ij 's (i = 1, 2; j = 1, . . . , n i ) be independent standard normal variates. We replace the martingale M ij (t) with G ij N ij (t) and plug in the estimated regression coefficients and covariance matrix, leading to the process We treat the G ij 's as random and {X ij , ∆ ij , Z ij } (i = 1, 2; j = 1, . . . , n i ) as fixed. Following the proof by Lin et al. (1994), the limiting distribution of W (t) is the same as that of W (t).
To approximate the limiting distribution of W (t), a large number of realizations of W (t) can be generated. To construct the 100 It is useful to report the p value for testing H 0 . Bajorunaite and Klein (2007) proposed the simulation-based approach for the two-sample test of the cumulative incidence function. They provided a procedure for finding the p value for the test. We consider a similar procedure to derive the p value for testing H 0 : 1. Evaluate the test statistic on the sample.

Fixed covariates only
The majority of clinical studies collect only fixed covariates such as demographics and baseline clinical characteristics. The formulas described in the previous section are only applicable to data of fixed covariates and were implemented in the SAS macro SURVBAND. The macro should be loaded to the current SAS session using the %include statement. Suppose that the file containing the macro is named band.sas, and is saved in the local directory. The following statement should appear in a SAS program to load the macro.
%include "band.sas"; The name of the macro is SURVBAND and has the following parameters: %macro SURVBAND(inputdata, time, event, group, covlist, starttime, stoptime, alpha, nsimu, seed, outdata); There are nine macro parameters. A full explanation of the macro parameters, as well as the data preparation required, is given in the following.
(1) inputdata: This parameter specifies the input SAS data set name. This SAS data set needs to contain the observed time variable, the event indicator, the group variable with values 1, . . . , K indicating K treatment groups, as well as the covariates as regressors in the Cox model.
(2) time: This indicates the name of the variable of observed time. Either the time-to-event or the follow-up time is provided in this variable.
(3) event: The event indicator takes values 0 or 1. The value is 1 if time-to-event is observed, and it is 0 if the follow-up time is observed.
(4) group: The group variable must take values in {1, . . . , K} to distinguish the K treatment groups in the sample. A K-strata Cox model will be constructed which does not require constant hazards ratio between any two treatments.
(5) covlist: This parameter specifies the variables that directly enter as regressors in the stratified Cox model. Continuous variables can be directly included. Categorical factors need to be recoded as binary variables. For example, the race factor, containing three levels, White, Black, and Other, has to be recoded as two binary variables. If White is used as reference, one should create the binary variables black, which takes value 1 for Black race and 0 otherwise, and other, which takes value 1 for other race, 0 otherwise. The variables in the list should be separated by white space. Please note that proportionality tests should be performed beforehand to guarantee all variables pass the tests.
(6) starttime: This parameter together with the next one specifies the time interval [t 1 , t 2 ] for which the band will be generated. Determination of t 1 and t 2 should guarantee large enough risk sets as well as a stable comparison between treatments. Subramanian and Zhang (2013) recommended setting t 1 to be slightly greater than the maximum of all first event times from the treatment groups. The comparison of treatments after such a time point has a valid precision estimation. We followed this guideline in implementing the macro under the two-sample framework. The larger value of the first event times from two treatment groups is the default value for t 1 . The macro sets t 1 to this value if a smaller value is specified for t 1 by the user.
(7) stoptime: This parameter specifies t 2 . Subramanian and Zhang (2013) suggested t 2 to be slightly smaller than the minimum of all last event times from the treatment groups. However, such a time may not guarantee a sufficiently large risk set if events are observed for the majority of the study cohort. When analyzing the real data set, Zhang and Klein (2001) chose t 2 so that both treatment groups had at least ten subjects remaining in the study. We believe this is a more practical guideline for determining t 2 .
(8) alpha: This parameter controls the confidence level for the band of the differences between two direct adjusted survival curves. The significance level α needs to be supplied for a 100(1 − α)% confidence band. For example, one should set α to be 0.05 to produce a 95% confidence band. It should be noted that one may want to set a Bonferroni adjusted α value in case of multiple comparisons. Suppose that there are three treatment groups and one wishes to produce bands for all three pairwise comparisons, then the Bonferroni adjusted α value is 0.0167. It is well-known that Bonferroni adjustment is conservative and easily produces unnecessarily wide confidence intervals or bands. Due to this reason, one should carefully decide on the set of pairwise comparisons. For example, assume that a study involves one control group and three treatment groups. If study interest centers on the three pairwise comparisons between treatments and the control, one can set Bonferroni adjusted α to be 0.0167. Although confidence bands for all pairwise comparisons will be included in the output data set, only the bands for comparisons between treatments and the control shall be fully examined and interpreted.
(9) nsimu: This parameter sets the number of simulation realizations that will be generated for determining the confidence band. Bajorunaite and Klein (2007) evaluated the number of realizations in a simulation study and recommended 1000 simulated samples. Their finding was that a larger number does not improve accuracy but prolongs the computational time. A couple of thousands should be sufficient to acquire an accurate band. A size of this level will not invoke a substantial computational time even with a large sample.
(10) seed: This parameter has to be a large positive integer and is used as the seed number of the random number generation algorithm. Random number generation routinely utilizes the congruential generator to generate pseudo-random numbers. A seed number has to be supplied as the initial value of the iterative algorithm. When the value zero is supplied, SAS uses the current computer time to create a large positive integer as the seed. As a result, running the program at some other time will lead to a different set of pseudo-random numbers, and a slightly different critical value.
We also wrote an R (R Core Team 2016) function that implements the same method as the SAS macro SURVBAND. We hope that this R function will offer flexibility to users who prefer to work in the R environment. We posted this function, together with instructions and the example data file, at the following link http://www.umc.edu/faculty/xu_zhang/.

One time-dependent variable with piecewise proportional hazards
Unlike demographics, patients' clinical characteristics may change over time. In addition, treatment regimen may have to be adjusted due to, for example, exhibition of toxicity symptoms. The covariates which have changing values over time are known as time-dependent covariates. The distinction between internal and external time-dependent covariates was introduced for convenience when discussing survival function prediction. The aforementioned changing clinical characteristics or treatment regimen belong to the internal category. Survival function prediction with internal time-dependent covariate is subtle because observing the covariate value indicates that the patient is alive. Survival function prediction is normally discussed with external time-dependent covariates, which are characterized by the independence between the covariate processes and the survival status of the subject. For example, longitudinally monitored air pollution level is a typical external time-dependent covariate. It would be practically meaningful to predict the probability of asthma onset for a senior given a process of air pollution level.
One important application of external time-dependent covariates is to evaluate and solve the problem of nonproportionality in the Cox analysis. A commonly used method for testing proportionality of a fixed covariate Z is to test the significance of Z(t) = Z × f (t). Here, Z(t) is an intentionally created external time-dependent covariate and common choices for f (t) include t and logarithm of t. Significance of Z(t) would indicate violation of proportionality. One method to deal with nonproportionality on Z is to search for a suitable time cut point c so that the hazard ratios in the time interval before and after c are two constants, leading to the piecewise proportional hazards model. In general, more than one cut point on time can be employed but this increases complexity in modeling and interpretation. For the single cut point c, the coding technique for the piecewise proportional hazards is to create two external time-dependent covariates based on Z, The next step is to include both Z 1 (t) and Z 2 (t) in the Cox regressor. This is a useful method of addressing the issue of nonproportionality. Therefore, we developed the second macro of band generation based on the piecewise proportional hazards model in which the effect of one covariate is modeled as two constant hazard ratios. The macro is saved in the file timeband.sas with the macro name TIMEBAND.
(1) timecov: The covariate that is modeled as two time-dependent variables and has two constant hazard ratios. It should be noted that this covariate should not be included in covlist.
(2) cutpoint: The time cut point c used to create the time-dependent variables.
In Section 5.2 the macro is illustrated with a real example of survival data of lymphoma patients.

Simulation study
A simple simulation study was conducted to examine the actual performance of the simulationbased confidence band given in Section 2.2. We considered a sample size of 100 and two Setting Baseline hazard functions % of replications where H 0 was rejected (1) λ 1 (t) = λ 2 (t) = 0.1 0.062 (2) λ 1 (t) = 0.1, λ 2 (t) = 0.15 0.332 (3) λ 1 (t) = 0.1, λ 2 (t) = 0.2 0.665 Table 1: Proportion of replications where the bands did not cover the zero reference line. treatment groups. The treatment assignment was determined by a Bernoulli random variable with p = 0.5 so that the two groups were roughly of equal size. The baseline hazard functions associated with two treatments were constant. A total of three settings were simulated: (1) λ 1 (t) = λ 2 (t) = 0.1, We considered the same covariate setting that contained a standard normal covariate and a discrete covariate with value 0/1. Distribution of the standard normal covariate was independent from treatment assignment while for the discrete covariate value 1 appeared more frequently in treatment 2 group (65%) than treatment 1 group (40%). Censoring variables were generated from exponential distributions with varying hazard rates for different settings to control the censoring rate to be about 30%.
There were 1000 replications in each setting. Since the range of event times varied dramatically across samples, we chose to use a sample-based interval [t 1 , t 2 ] for simulation. In each sample, t 1 was set to be the smallest time when events have occurred in both treatment groups. The largest event time was used as t 2 . The 95% confidence band was constructed based on 1000 simulated processes. We examined if the band fully covered the zero reference line in [t 1 , t 2 ]. Rejection of the null hypothesisS 1 (t) −S 2 (t) = 0 was concluded if zero was outside of the band for some subintervals contained in [t 1 , t 2 ]. Table 1 depicts the proportion of replications where the null hypothesis was rejected in the three simulation settings.
The results demonstrated good performance of the simulation-based band. The proportion of rejection should be interpreted as the realized power of the test using a 5% significance level. In Setting (1), the proportion of rejection is 0.062, close to the nominal type I error rate 0.05. Reasonable proportions of rejection are observed in Settings (2) and (3) where the two treatment arms are associated with different baseline hazard functions. In addition, as the treatment-specific hazard functions become more different, the observed proportion of rejection increases.

Real data example
Fifty-eight patients with primary central nervous system (CNS) lymphoma received chemotherapy for blood-brain barrier disruption (BBBD) at Oregon Health Sciences University from 1982 to 1992. 39 patients received initial chemotherapy for BBBD without prior radiation, while 19 patients were administered cranial radiation prior to the referral for chemotherapy. The survival data of these 58 patients were originally analyzed by Dahlborg, Henner, Crossen, Tableman, Petrillo, Braziel, and Neuwelt (1996). They presented the Kaplan-Meier curves of two treatment groups and evaluated the log-rank statistic. The curves and test result supported the conclusion that the initial chemotherapy without prior radiation is associated with a significantly lower chance of mortality. Although Dahlborg et al. (1996) reported low Karnofsky score and being male as worse prognostic factors, the investigators did not present the adjusted survival curves of the two treatments. Tableman and Kim (2003) reported the data set and conducted a Cox analysis to assess the treatment effect as well as the effects of a few clinical covariates. In Tableman and Kim (2003), based on a 5% significance level, the final Cox model included the treatment variable, the Karnofsky performance score, sex, dichotomized age split on 60, as well as the variable for interaction between sex and age. Although there was no evidence for violation of proportionality between the two treatments, we still considered the Cox model stratified by treatment group, so that we could capture dynamic changes in the treatment effect over time. Other covariates, Karnofsky performance score, sex, age, interaction between sex and age, consisted of the parametric part of the Cox model.

Data analysis using the first macro
A text file cns.txt, provided along with the paper, contains the data including survival time, vital status, treatment variable and three covariates. Suppose that this text file is saved in the local directory. The following SAS syntax performs the following two steps: First, the data step reads in the external text file and creates a SAS data set. Second, the PHREG procedure fits the aforementioned stratified Cox model. data cns; infile "cns.txt"; input time death treatment KFS male age60 sexage; run; proc PHREG; model time*death(0)=KFS male age60 sexage /rl; strata treatment; run; The SAS data set cns has the following variables: (1) time: the survival time in years from the first BBBD; (2) death: the death event indicator; (3) treatment: the treatment group variable, 1 for chemotherapy without prior radiation, 2 for chemotherapy with prior radiation; (4) KPS: the Karnofsky performance score ranging from 40 to 100; (5) male: the male sex indicator, 1 for male, 0 for female; (6) age60: the age variable, 1 if age is at least 60, 0 if age is less than 60; (6) sexage: the variable for interaction between sex and age, 1 for male aged 60 or older, 0 otherwise. The PHREG procedure implements the Cox regression, producing the partial-likelihood-based estimates of regression coefficients. The rl option in the model statement adds the hazard ratio estimates to the output. The hazard ratio estimates, as well as the 95% confidence intervals, can be found in Table 2 distribution of risk factors. The main issue discussed in this paper is to construct a band for the differences between direct adjusted survivals, so that one can identify when the survival outcomes differ. We decided to generate the band up to 2 years. This upper limit was chosen because only a few events occurred after 2 years and in the two treatment groups 17 and 4 patients remained under study at this time point. The sizes of the treatment groups are reasonable so that the treatment comparison is reliable. Below is the syntax to load the macro and invoke it: %include "band.sas"; %SURVBAND(cns, time, death, treatment, KFS male age60 sexage, 0, 2, 0.05, 2000, 785243, survout); The variables time, surv1 and surv2 in the output data set allows one to draw the direct adjusted survivals of the two treatments. One has to load the values into an Excel sheet, the R environment for statistical computing, or an other software package, to plot the curves. Some additional values have to be added to set proper x-axis ranges for the curves. The variable time in the output data set starts from the first event time and ends with the largest event time. One should let a curve start from time zero and extend to the largest censoring time of the given treatment group. We depicted the direct adjusted curves of two treatments in Figure 1.
The time-varying treatment effect can be examined from the band of the differences between direct adjusted survivals. The above syntax required simulation of 2000 processes. The critical value for constructing the 95% band was 2.729, which was printed in the output. The output data set contains the variables for the differences (diff1_2) and the lower, upper limits of the 95% confidence band (diff1_2_low, diff1_2_up). Figure 2 presents the differences, the 95% confidence interval as well as the 95% confidence band. Note that the the lower limit of the time interval was automatically adjusted to 0.06, though 0 was specified as the lower limit. The macro automatically checks the lower limit of the time interval and sets it to the larger value of two first event times of the two treatment groups if a smaller value is specified. Therefore, the band for the CNS lymphoma example was produced for the interval [0.06, 2.00]. In Figure 2, one can observe a monotone increase of the difference in survival probability, between treatment without radiation and treatment with radiation. The lower limit of the band rises beyond the zero reference line about at the time point 1.4 years, which confirms a long-term survival benefit for chemotherapy without prior radiation.

Data analysis using the second macro
We further used this data example to illustrate the second macro developed for the Cox model containing time-dependent variables. We modeled KPS as time-dependent variables with 1.5 Figure 1: Direct adjusted survival curves and 95% log-log transformed confidence intervals for chemotherapy without prior radiation and chemotherapy with prior radiation. Figure 2: The 95% confidence band for the differences in direct adjusted survival between chemotherapy without prior radiation and chemotherapy with prior radiation. 2000 realizations of the process were generated and yielded the critical value 2.729 for the band. years as the cut point. Note that this variable passed the proportionality test but was modeled to have piecewise proportional hazards for illustrative purpose only. More specifically, two time-dependent variables were created based on KPS, The following statements should be written in the PHREG procedure to create the variables KPS1 and KPS2, proc PHREG; model time*death (0) This model produced the hazard ratios 0.96 (95% CI 0.94-0.99) and 0.99 (95% CI 0.94-1.06) for KPS1 and KPS2, respectively. These two hazard ratios are the effect of KPS prior to and after 1.5 years, respectively. The estimated effects of other covariates remained almost the same as the estimation result from the Cox model treating KPS as fixed. The macro TIMEBAND can produce the direct adjusted survival of treatment based on the above Cox model, as well as the band for differences between two direct adjusted survival functions. For this example, write the following statements to call the macro, %include "timeband.sas"; %TIMEBAND(cns, time, death, treatment, male age60 sexage, KPS, 1.5, 0, 2, 0.05, 2000, 487012, survout); The critical value yielded from running this macro was 2.792, slightly larger than that in the model with only fixed covariates. This is reasonable because usage of two variables for KPS increased the variability in the estimated survival when the proportionality does hold for different values in KPS over the whole time range. The band yielded from the macro is depicted in Figure 3. The similar pattern is observed, i.e., that the band starts to be beyond the zero reference line at about 1.4 years.

Final discussions
The SAS macros presented in this paper can be used to compute the confidence bands for differences between direct adjusted survival curves based on stratified Cox models. The bands provide information regarding at what times the survival outcomes of two treatments differ. The macros are quite efficient. The run time can be expected to be less than one minute to generate 2000 realizations even for a large sample. The simulation method can be extended to compare the differences of direct adjusted survival curves based on other models, such as Aalen's additive model (Aalen 1989). The direct adjustment method can be utilized to compute a quantity for a general reference population. Our macro averages the survivals of a treatment across all subjects in a sample, which is the routine reference population. Laubender and Bender (2010) discussed usage of a subgroup of a sample as a reference population, so that one can estimate the beneficial effect when patients receiving inferior treatment would switch to the better treatment.
When there are more than two treatment arms, the Bonferroni method can be used to adjust for multiple pairwise comparisons. The Bonferroni method is conservative because it ignores the correlation between tests. The sharper method that accounts for correlation among tests was considered by Logan, Wang, and Zhang (2005). Define W ij (t) as the difference between the direct adjusted survivals of treatment i and treatment j, and form a vector W = (W 12 , . . . , W K−1,K ). The vector W would converge to a multivariate normal process with mean 0 and some covariance matrix. To control the familywise error, one may set the critical value as the 1 − α percentile of the distribution of max |W ij (t)|. One can use the wild bootstrap method to obtain realizations of the vector and then find the appropriate percentile. This method is not implemented in the macros because we focus on the two-sample problem in this paper. The reader can revise the source code to implement this method.