FHtest: An R Package for the Comparison of Survival Curves with Censored Data

The Fleming-Harrington class for right-censored data was first introduced by Harrington and Fleming (1982). This class is widely used in survival analysis studies and it is a subset of the so-called weighted logrank test statistics. Recently, Oller and Gómez (2012) proposed an extension of this class for interval-censored data. This paper introduces the R package FHtest, which implements the Fleming-Harrington class for right-censored and interval-censored survival data. It provides an integrated approach for performing two-sample, k-sample and trend tests based on either counting process theory, likelihood theory, or permutation distributions. In this paper, we summarize the main aspects of the theory framework and present several examples with R codes to illustrate the usage of the main functions of FHtest.


Introduction
Survival times are encountered in many areas of research: for example, the time from cancer diagnosis until death; the time until first failure of a new machine; the sensory shelf life of yogurt, or the time an unemployed worker needs to find a new job.One common feature of all these examples is the possible presence of censored data, that is, incomplete information on the times of interest.Several types of censored data are distinguished: • Right-censored data: the (unknown) survival time lies above a certain value.
• Left-censored data: the survival time lies below a certain value.
• Interval-censored data: the survival time lies in an interval between two values.
• Double-censored data: survival times are both left-and right-censored.
As with other areas in statistics, common interests in survival analysis comprehend the estimation of distribution functions, hypothesis testing, or the fit of regression models.However, due to the presence of censored data, survival time data require specific methods that take into account that only partial information is available.
In R (R Core Team 2017), there are plenty of packages that provide functions for the analysis of survival times including the survival package (Therneau 2017), which is one of the recommended packages of R. The CRAN task view on survival analysis (Allignol and Latouche 2017) gives an exhaustive overview on all these packages available on CRAN (https: //CRAN.R-project.org) or on Bioconductor (http://www.bioconductor.org/).The paragraph in the CRAN task view on hypothesis testing to compare survival functions lists several packages that offer functions for this purpose, such as function survdiff of the survival package.However, none of these packages included functions that implemented weighted logrank tests for interval-censored data based on the Fleming-Harrington class (Harrington and Fleming 1982).This lack of R functions for this specific type of statistical tests to compare survival functions was the principal motivation for the development of the package FHtest (Oller and Langohr 2017).The package implements the Fleming-Harrington class for right-censored and interval-censored survival data and provides an integrated approach for performing twosample, k-sample and trend tests based on either counting process theory, likelihood theory, or permutation distributions.
Several differences exist between functions provided by the FHtest package and those of other packages.For example, function survdiff of the survival package implements the Fleming-Harrington class with right-censored data and the counting process approach.Function survdiff, however, is limited to the one-parameter subclass instead of the whole class of two parameters.Recently, package survMisc (Dardis 2016) contains function comp, which implements the whole class of two parameters.However, neither survdiff nor comp consider one-sided alternatives, perform the permutation approach, or work with interval-censored data.Function ictest of the interval package (Fay and Shaw 2010) is programmed for interval-censored data, implements both the permutation and the score vector approach, and allows for one-sided alternatives.However, only two test statistics from the Fleming-Harrington class, the logrank and the Wilcoxon-Peto statistics, can be used.This paper introduces the FHtest package which is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=FHtest and it is organized around four sections: In Section 2, the theoretical background of the nonparametric tests of the Fleming-Harrington class for right and interval-censored data is set forth.Following, in Sections 3 and 4, the four main functions of the FHtest package are presented and applied to real data examples, respectively.Section 5 concludes the paper with some final remarks on the current state of the package and possible enhancements.

Comparison of survival curves with censored data
We denote by T the time from a well-defined start point until the occurrence of an event of interest, E. T is, hence, a positive random variable.One of the functions characterizing its distribution is the survival function, S, which corresponds to the probability of surviving a time t: S(t Given the longitudinal nature of studies on survival times, a common feature of these are cen-sored data.These arise whenever the occurrence of E is not observed exactly, but only partial information is given.If E does not occur during the follow-up of a study, the corresponding time is right-censored.That is, it is only known that T is larger than the study duration, but not by how much.Contrary to that, left censoring is present if T is below a certain value, but it is not known by how much.If E occurs during a time interval, T is said to be intervalcensored.The general convention is to consider the intervals left-open, that is T ∈ (L, R].Both left and right-censored data can be considered particular cases of interval-censored data with L = 0 and R = ∞, respectively.Alternatively, intervals can also be considered to be closed intervals, that is T ∈ [L, R].Under this convention, exact observation can also be considered a particular case of interval-censored data, namely if L = R.In the current paper, we do not need to choose between both conventions because the FHtest package can handle both types of interval-censored data.For a more thorough introduction on survival times and censored data, see, e.g., Section 1.4 in Gómez, Calle, Oller, and Langohr (2009).
In R, the nonparametric estimation of S(t) in the presence of right-censored data by means of the Kaplan-Meier estimator (Kaplan and Meier 1958) can be accomplished by using the function survfit of the survival package (Therneau 2017).For the estimation of the survival function with interval-censored data using the Turnbull estimator (Turnbull 1976), one may use the function icfit of the interval package (Fay and Shaw 2010) or different functions of the Icens package (Gentleman and Vandal 2017).
When dealing with samples from two or more different populations, one is generally interested in comparing the corresponding distributions by testing one of the following hypotheses of interest: • Two-sample test: (1) Alternatively, H 1 can also consider the case of a one-sided alternative.
• k-sample test: • Increasing trend test: Alternatively, H 1 can also consider the case of a decreasing alternative.
The R package FHtest provides functions for each of the three hypotheses above in the presence of right-or interval-censored data based on the Fleming-Harrington class.At this point, it is important to note that this family is geared to detect alternative hypotheses where the hazards between groups differ but do not cross.Under quite general conditions, the basic test statistic U = U 1 , . . ., U k is asymptotically normal with zero mean and covariance matrix V .The explicit form of U and V is given below and depends on the type of censoring and the asymptotic approach.
The test statistic for the hypothesis given in (1) is where v 22 is the second value in the diagonal of V and the asymptotic distribution of W 1 is the standard normal distribution.The hypothesis in the k-sample problem (2) can be tested using the test statistic where V − is the generalized inverse of V and the asymptotic distribution of W 2 is a χ 2 k−1 distribution.The increasing trend alternative (3) can be tested with the statistic where a = (a 1 , . . ., a k ) , a 1 < . . .< a k , are relevant covariate values for each group distribution.The asymptotic distribution of W 3 is the standard normal distribution.A one-tailed test in the left (or right) direction may show evidence of an increasing (or decreasing) trend.
In the following, we briefly describe the test statistics of the Fleming-Harrington class for rightcensored (RC) and interval-censored (IC) survival data and derive the distribution approaches.

Right-censored data: Counting process approach
The Fleming-Harrington class for right-censored data was first introduced by Harrington and Fleming (1982).This class of test statistics is a subset of the so-called weighted logrank test statistics.Fleming and Harrington (2005) give a theoretical framework for these statistics based on martingale and counting process methods.The Fleming-Harrington class has received much attention in the literature and many extensions have been proposed.For instance, Gillen and Emerson (2007) showed that these statistics are non-transitive under general alternatives and proposed an extension to avoid this drawback.More recently, Garès, Andrieu, Dupuy, and Savy (2015) proposed another extension which is designed to have good power against both late effects and proportional hazards alternatives.
Consider a random sample of possibly right-censored data [L i , R i ], i = 1, . . ., n, with either L i = R i (uncensored data) or R i = +∞ (right-censored data).Let t 1 < . . .< t m be the m distinct uncensored times and suppose that there are n − m right-censored times.Then, the components of the weighted logrank test statistic U RC are of the form where d r stands for the number of deaths at time t r and n r gives the number of individuals at risk at time t r , that is, the number of individuals alive just prior to time t r .Both d rj and n rj correspond to the same definitions but restricted to observations in the jth group.
The weights suggested by Fleming-Harrington are where Ŝ(t) is the nonparametric estimation of the survival function obtained with the socalled Kaplan-Meier estimator.When (ρ, λ) = (0, 0) and (ρ, λ) = (1, 0), the statistic U RC corresponds to the logrank and Prentice-Wilcoxon test statistics, respectively.The choice of the weights has to be made prior to the examination of the data taking into account that they should provide the greatest statistical power, which in turns depends on how the null is believed to be violated.For instance, in a given clinical trial, if one wanted to assess whether the effect of a treatment or therapy on the survival is stronger at the earlier phases of the therapy, λ = 0 should be chosen, with increasing values of ρ emphasizing stronger early differences.If there were clinical reasons to believe that the effect of the therapy would be more pronounced towards the middle or the end of the follow-up period, it would make sense to choose ρ = λ > 0 or ρ = 0, respectively, with increasing values of λ emphasizing stronger middle or late differences.Statistical power may not be the only consideration in determining the weighting scheme, clinical aspects should also be contemplated.For instance, in cases where an intervention is thought to not alter the quality of the patient's life, upweighting later differences in the survival curves might be important.
The asymptotic null distribution of ( 7) is derived through a martingale central limit theorem.Under the null hypothesis, U RC is asymptotically normal with zero mean and variancecovariance matrix V RCC given by

Right-censored data: Permutation approach
As shown in Abd-Elfattah and Butler (2007), the statistic (7) can also be written in a linear form where z i = Z 1i , . . ., Z Ki is a covariate vector of group indicators (Z ji equals 1 or zero according to whether or not the ith individual is in the jth group) and c i is a score value associated with each individual: The permutation approach can be applied easily to this linear form.The idea is that if the null hypothesis is true and the underlying censoring process is identical across groups, the labels on the scores c i are exchangeable.The permutation distribution of U RC is then obtained by permuting the labels and recomputing the test statistic for all the possible rearranged labels.The p values from the permutation distribution may be calculated either exactly using complete enumeration or a network algorithm, or by an approximation method like Monte Carlo resampling, or asymptotically using a version of the central limit theorem for exchangeable random variables (Sen 2006).This last method yields a normal approximation with zero permutation expectation and variance-covariance matrix The permutation approach is, in fact, a conditional approach since the distribution of the test statistic is computed conditioned on the observed data.It is not obvious whether the permutation approach gives power properties similar to an unconditional approach.With right-censored data, Heimann and Neuhaus (1998) show that the permutation version of the logrank test and the unconditional version are asymptotically equivalent even under unequal censoring.A more recent contribution concerning this subject is given in Wang, Lagakos, and Gray (2010).Oller and Gómez (2012) proposed a new class of test statistics for interval-censored data that extends the Fleming-Harrington class for right-censored data given in ( 7).The asymptotic behavior of the proposed tests was derived using a permutation approach and an observed Fisher information approach.

Interval-censored data: Permutation approach
Consider a random sample of interval-censored data (L i , R i ], let t 1 < . . .< t m denote the unique ordered elements of {L i , R i ; i = 1, . . ., n}, and let Ŝ(t) be the nonparametric estimation of the survival function obtained with the so-called Turnbull estimator.Then, the components of U IC are of the form where ) stands for the estimated number of deaths at time t r and n IC r = n Ŝ(t r −) gives the estimated number of individuals at risk at time t r .The computation of d IC rj and n IC rj is equivalent to d IC r and n IC r replacing Ŝ by Ŝj = 1 Ŝi where N j is the sample size of the jth group and Ŝi is an estimate of the individual survival function as defined by Fay and Shih (1998).
The weights proposed by the authors are There are two reasons for these unusual weights.First, these weights follow from the score vector approach given in Section 2.4 for interval-censored data.Second, the equivalence between the weighted logrank form given in (11) and the linear form given in ( 12) is based on these weights.Further insights on these weights can be obtained from the following results.When (ρ, λ) = (0, 0) and (ρ, λ) = (1, 0), the statistic U IC reduces to the logrank and Wilcoxon-Peto test statistics originally proposed in Peto and Peto (1972); for further details, see also Fay (1996Fay ( , 1999)).Asymptotically, as Ŝ is nearly continuous at the point t r , the weight function w This class U IC can also be written in a linear form (9) where Under the assumption that the underlying censoring process is identical across groups, this linear form of U IC allows the use of the permutation distribution (with zero expectation and variance-covariance matrix V ICP ).The approach is the same as explained in Section 2.2.
Herein, it is considered that the observed intervals are half open intervals, however the definitions above are easily modifiable if we observe closed or open intervals.It is also worth mentioning that the methodology for interval-censored data can also be used for right-censored data as a particular case.In this case, both ( 10) and ( 12) would give similar but not identical results.

Interval-censored data: Score vector approach
When we restrict the Fleming-Harrington class U IC to the subclass where λ = 0 (the so called G ρ family), each test statistic (for a given value of ρ) can be seen as an efficient score test in a linear transformation model.We have to assume discrete or grouped continuous intervalcensored data: L, R ∈ {t 0 , t 1 , . . ., t m } where 0 = t 0 < t 1 < . . .< t m < t m+1 = +∞.We also assume a linear transformation model g(T i ) = −z i β + i where g is an unknown increasing function and i has survival function where Lik(β, θ) is the likelihood function and θ = (θ r ) m r=1 are nuisance parameters such that θ r = g(t r ).
Under discrete interval-censored data and the null hypothesis, the asymptotic behavior of the score vector U IC follows from maximum likelihood theory.This approach needs that none of the estimated θ approach the boundary of the parameter space, i.e., 1 > Ŝ(t 1 ) > . . .> Ŝ(t m ) > 0. The distribution is asymptotically normal with zero mean and variance-covariance matrix V ICS .The explicit formula for V ICS is not presented here.Instead, see Gómez and Oller (2008) for details.
We note that U RC and V RCC can also be derived by the score vector approach when λ = 0.In this case, however, a partial likelihood is used instead of a full likelihood.For this reason, though U RC and V RCC or alternatively U IC and V ICS may be used to analyze right-censored data, both approaches do not exactly coincide.

Installation and dependencies
The FHtest package can easily be installed from any CRAN repository by executing the R command

R> install.packages("FHtest")
To use FHtest, several R packages need to be available.As shown on the corresponding web page on CRAN (https://CRAN.R-project.org/package=FHtest),FHtest depends on the packages interval (Fay and Shaw 2010) and KMsurv (Yan, Klein, and Moeschberger 2012), and it imports functions from survival (Therneau 2017), perm (Fay and Shaw 2010), and MASS (Venables and Ripley 2002).The FHtest package is partially constructed over the interval package, which itself depends on the packages Icens (Gentleman and Vandal 2017) and MLEcens (Maathuis 2013): functions in both packages use similar structures in the input arguments and output values; some functions in FHtest are modifications of functions in interval and use the same internal procedures.The KMsurv package gives right-censored data sets that are used for illustration purposes.Both the survival and the MASS packages are recommended packages of R and are, hence, included in all binary distributions of R. Finally, the package perm is from the same authors as interval and it is installed together with this package.
Three of the packages -interval, MLEcens, KMsurv -will be automatically installed together with FHtest, if this has not been already done before.By contrast, package Icens is not available at the CRAN repositories, but at the Bioconductor web site (http://www.bioconductor.org/).For this reason, it has to be installed separately, for example, by executing the following instructions in R: R> source("http://bioconductor.org/biocLite.R") R> biocLite("Icens") Once all packages are installed, package FHtest can be loaded and used:

The main functions
The four main functions of the FHtest package correspond to the censoring schemes and approaches exposed in Sections 2.1 through 2.4: • FHtestrcc: The Fleming-Harrington test for right-censored data based on counting processes.
• FHtestrcp: The Fleming-Harrington test for right-censored data based on permutations.
• FHtesticp: The Fleming-Harrington test for interval-censored data based on permutations.
• FHtestics: The Fleming-Harrington test for interval-censored data based on a score vector distribution.
There are several features that are common to all four functions.First, the (censored) data and the group covariate can be specified in two different ways.One way is to assign two vectors, L and R, that define the censoring interval, and a group vector that gives the group covariate.The additional arguments Lin and Rin determine whether the endpoints are included or not.Another way is to use the common formula notation by including a 'Surv' object of the survival package; see the help page of function Surv for more details.
Second, all functions allow for two-sample, k-sample and trend tests with hypotheses and alternatives as discussed in Section 2. The group vector determines the type of test: a twosample test when group has two unique values, otherwise a k-sample test when group is a character or factor vector, and a trend test when group is numeric.By default, the common argument alternative is set to "different", which corresponds to a two-sided alternative.
For a one-sided alternative as in (3), the user can choose alternative = "increasing" for an increasing trend test or alternative = "decreasing" for a decreasing trend test.
Third, the four functions have two argumentsrho and lambda -corresponding to the parameters ρ and λ of the weights of the Fleming-Harrington test statistic shown in (8).The only exception is function FHtestics, which does not allow values of lambda different from 0 as explained in Section 2.4.The default values are in all cases rho = 0 and lambda = 0.
Finally, the outputs of all functions have the same structure indicating, among others, the type of test, the censoring scheme, the values of rho and lambda, and the alternative.In addition, the four functions, actually, return lists that can be stored as R objects each with its own class according to the function in use.
The arguments shared by the functions FHtestrcp and FHtesticp, but not available for FHtestrcc and FHtestics, have to do with the permutation approach.For example, argument method can be used to choose the way the p values are computed.There are four methods, the last two of which are only available for the two-sample test: "pclt" to use the permutational central limit theorem (Sen 2006); "exact.mc" to use an exact method via Monte Carlo; "exact.network"for an exact method using a network algorithm (see, e.g., Agresti, Mehta, and Patel 1990), and "exact.ce"for an exact method using complete enumeration, which is appropriate for very small sample sizes.The default value is "pclt".
More details on all these aspects will be shown and explained in Section 4.

Data sets provided by package FHtest
Two data sets with interval-censored data are provided by package FHtest: duser and illust3.
The data frame duser contains the data of 940 injection drug users on five variables that come from the data base of the patients of the detoxification unit of the university hospital Germans Trias i Pujol in Badalona (Spain).The following output shows the first six observations: R> data("duser", package = "FHtest") R> head(duser) Variables left and right are, respectively, the left and right endpoints of the intervalcensored time (in months) from the start of injection drug use until seroconversion, which can be used as a proxy of HIV infection.Whenever left is 0, a left-censored time until seroconversion is given, whereas right-censored times are identified by a right endpoint of 9999.If both endpoints coincide, the time to seroconversion can be considered exact.Their frequencies are as follows: R> sum(duser$left == 0) [1] 530 R> sum(duser$right == 9999) [1] 343 R> sum(duser$left == duser$right) [1] 2 Hence, 940 − 530 − 343 − 2 = 65 observations are strictly interval-censored.
The three remaining variables of the data frame are calendar period (zper; a value between 1 and 4), sex (zgen; 0: male; 1: female), and age at first injection drug use.For more information on the data set, see, e.g., Gómez, Calle, Egea, and Muga (2000) and Oller and Gómez (2012).
The second data set, illust3, is a data frame with 1607 observations on 3 variables.These data come from an AIDS clinical trial that started enrollment in 1987 and was designed to study the benefits of Zidovudine therapy in patients in the early stage of HIV infection (Volberding, Lagakos, Grimes, Stein, Rooney, Meng, Fischl, Collier, Phair, Hirsch, Hardy, Balfour, and Reichman 1995;Oller, Gómez, and Calle 2004).Herein, the interval-censored variable is the time (in months) from treatment start until either AIDS onset or death.The third variable is the treatment group (1, deferred therapy; 2, 500 mg/day dosage; 3, 1500 mg/day dosage).The first six observations of illust3 are the following: R> data("illust3", package = "FHtest") R> head(illust3) Hence, 1607 − 326 − 672 − 149 = 460 observations are strictly interval-censored.

Examples with right-censored data
The data sets ovarian and bmt of the R packages survival (Therneau 2017)  The data come from a randomized clinical trial that compared two different treatments, in the following called Treatment 1 (rx = 1) and Treatment 2 (rx = 2), whose survival functions are shown in the left panel of Figure 1.According to that figure, there was a better survival at short term among patients with Treatment 2, but differences diminished after 1.5 years.
The data of the data frame bmt come from a study on the survival of 137 patients with leukemia.The complete data frame has 22 variables, among which are the disease-free survival time, that is, the time to either relapse or death (t2), the corresponding censoring indicator (d3; 1: relapse or death, 0: censored survival time), and the disease group (group): acute lymphoblastic leukemia (ALL), low risk acute myeloid leukemia (AML), and high risk AML.See the first six observations of these variables below: R> data("bmt", package = "KMsurv") R> dim(bmt) [1] 137 22 R> head(bmt[, c("group", "t2", "d3")]) group t2 d3 1 1 2081 0 2 1 1602 0 3 1 1496 0 4 1 1462 0 5 1 1433 0 6 1 1377 0 The estimated survival functions of the three groups are presented in the right panel of Figure 1.On average, the disease-free survival times were largest among low risk AML patients and smallest among high risk AML patients.
For further information on both data sets, see the corresponding help pages in R and the articles cited therein.
In the remainder of Section 4.1, the use of functions FHtestrcc and FHtestrcp for twosample, k-sample, and trend tests will be illustrated.Notice that for all tests used in this section, either function could be used.

Two-sample tests
For the first example, consider the logrank test with null and alternative hypothesis as shown in (1) in order to compare both treatments of the ovarian data set with respect to survival  Compared with the output of function survdiff, the information on the observed and (under H 0 ) expected number of events provided by function FHtestrcc is the same.What differs between both is that FHtestrecc's output is somewhat more detailed.For example, information is given on the values of the parameters (ρ and λ) of the weights of the Fleming-Harrington test statistic in (8) or on the alternative hypothesis H 1 .The test statistic does also differ since FHtestrecc provides the one presented in (4), which follows (asymptotically) a standard normal distribution, whereas survdiff uses its quadratic value, whose asymptotic distribution is the χ 2 distribution.The advantage of using the former that the alternative hypothesis may be one-sided.
As explained in Section 3.2, with all four functions of the FHtest package, the (censored) survival times can also be specified with two vectors, L and R, instead of using a 'Surv' object.In the case of right and uncensored data, vector L must contain the observed times, whereas vector R must contain either the uncensored survival times or infinity in the case of right-censored survival times.Hence, the following instruction returns the same result as before: R> with(ovarian, FHtestrcc(futime, futime / fustat, rx)) The logrank test used in the previous example puts the same weights on all addends in ( 7), but a researcher's interest could actually be to detect either short or long term differences between survival functions.In the second case, for example, more weight needs to be put on latter differences between observed and expected number of events.This is accomplished in the following example with λ = 1 while ρ remains 0. For this example, function FHtestrcp is used, which computes the p value using the permutation approach: R> FHtestrcp(Surv(futime, fustat) ~rx, data = ovarian, lambda = 1) Two-sample test for right-censored data Parameters: rho=0, lambda=1 Distribution: permutation approach (asymptotic approximation using central limit theorem) Data: Surv(futime, fustat) by rx N O-E rx=1 13 -0.00447rx=2 13 0.00447 Statistic Z= 0, p-value= 0.992 Alternative hypothesis: survival functions not equal The command syntax used is the same as with function FHtestrcc with the only difference that parameter λ was set to 1.The reasons for the very large p value is two-fold: on the one hand, differences between both survival functions at long term are rather small (see Figure 1), and, on the other hand, both samples are comprised by only 13 patients each.
Compared with function FHtestrcc's output, the most remarkable difference is the table on the difference between observed and expected events in both samples.Here, columns (O-E)ˆ2/E and (O-E)ˆ2/V are not provided since the p value is computed based on permutations (see Equations 9 and 10) and not on the (asymptotic) counting process distribution of the test statistic in (7).

k-sample tests
For the illustration of the k-sample test in (2), we will use the bmt data set and for the sake of a clearer output, we first put labels to each of the three study groups: R> bmt$group <-factor(bmt$group, labels = c("ALL", "AML low risk", + "AML high risk")) R> table(bmt$group) ALL AML low risk AML high risk 38 54 45 The corresponding survival functions are presented in the right panel of Figure 1.The estimation of the median survival is as follows: R> survfit(Surv(t2, d3) ~group, data = bmt) The use of function FHtestrcc for the k-sample test is apparently the same as before, but it is important to remark that the grouping variable must be either a factor or a character variable.Otherwise, a trend test is performed.
Concerning the output of the k-sample test, it can be seen that the test statistic used, the one in ( 5), follows a χ 2 distribution with k − 1 = 2 degrees of freedom.It also shows a very small p value, that is, the null hypothesis can be rejected and we can assume that there are differences between the three risk groups with respect to disease-free survival after a bone marrow transplant.Moreover, we can observe that, contrary to the output of function survfit, the three rows of the output's table are ordered alphabetically and not according to the internal coding of the factor group.

Trend tests
In the above situation or when dealing with an ordinal grouping variable, a trend test may be, actually, of greater relevance than the k-sample test.We reorder the factor group such that the risk group with best disease-free survival, low risk AML, is internally coded 1 and the one with worst disease-free survival, high risk AML, is coded 3: R> bmt$group <-relevel(bmt$group, ref = "AML low risk") R> levels(bmt$group) [1] "AML low risk" "ALL" "AML high risk" Hence, the null and alternative hypothesis of the trend test of interest are Contrary to the k-sample test before, in the case of the trend test, the grouping variable must be a numeric variable.For this reason, in the example below, function as.numeric is applied to factor group in order to use its internal numeric codes.It is also important to specify the value of argument alternative, the direction of the alternative hypothesis, which in ( 14) is decreasing: R> FHtestrcp(Surv(t2, d3)~as.numeric(group),data = bmt, rho = 1, lambda = 1, + alternative = "decreasing", method = "exact.mc") Trend FH test for right-censored data Parameters: rho=1, lambda=1 Distribution: permutation approach (exact method using Monte Carlo with 999 replications) Data: Surv(t2, d3) by as.numeric(group) N O-E as.numeric(group)=1 54 -2.633 as.numeric(group)=2 38 0.769 as.numeric(group)=3 45 1.864 Statistic= 3, p-value= 0.001 99 percent confidence interval on p-value: 0.00000 0.00529 Alternative hypothesis: decreasing survival functions (higher group implies earlier event times) Among other aspects, the output shows in its first line that the trend test has been applied, and in its last line it indicates the direction of the alternative hypothesis.The function makes use of test statistic (6).The order of the rows of the output's table follows the internal coding of the grouping variable.Therein, a negative sign in column O-E indicates less events observed than expected (under the null hypothesis), i.e., better survival than expected.Hence, the increase of the differences O-E from the first row to the last is in agreement with the alternative hypothesis of decreasing survival functions.
This example, actually, not only illustrates the accomplishment of the trend test with rightcensored data, but also the use of an exact method via Monte Carlo to compute the p value.For this purpose, the value of FHtestrcp's argument method was set to "exact.mc".As a result, in addition to the p value, the 99% confidence interval for the p value is also returned.
Both, the number of the Monte Carlo replications (999) and the confidence level for the estimation of the p value, can be modified by changing the default values of argument permcontrol.
To see which other characteristics of the permutation tests can be controlled through this argument, see the help page of (auxiliary) function permControl of the perm package, which is called by function FHtestrcp.
Further examples for the functions FHtestrcc and FHtestrcp can be found and studied by calling example using the function names as topic argument: R> example("FHtestrcc") R> example("FHtestrcp")

Examples with interval-censored data
For the illustration of functions FHtesticp and FHtestics, the data sets duser and illust3 of the FHtest package, which are presented in Section 3.3, will be used.As shown therein, This recodification is also preferable for the nonparametric estimation of the survival functions by means of the Turnbull estimator, since no upper limit for the survival times is assumed.

Two-sample tests
For the two-sample test, we will use the data set duser, which contains the times from first injection drug use until HIV seroconversion among 940 injection drug users (IDUs).The left panel of Figure 2 shows the estimation of the corresponding survival functions of both male and female IDUs.According to this graph, which is drawn with the plot method for 'icfit' objects from the interval package, the probability for seroconversion during the first three to four years is higher among female IDUs, after which the survival functions are nearly the same with the only difference that beyond 13 years, no more seroconversion is observed among male IDUs.
In order to test the null hypothesis of equal survival functions among male and female IDUs, in the following two examples, we will use function FHtesticp, which implements the permutation approach.In the first example, we carry out the logrank test (ρ = λ = 0), use the permutational central limit theorem to compute the p value and choose the two-sided alternative hypothesis as shown in (1).In addition, we use arguments that are only available with functions FHtesticp and FHtestics: The arguments Lin and Rin are both set to TRUE to indicate that the censoring intervals are closed intervals, which is meaningful since time is measured in months.Moreover, the argument icontrol is used to increase the maximum number of iterations used internally by the EM algorithm to estimate the survival functions.This is necessary here because convergence problems arise when using the default value of 10000 iterations.The output is very similar to the one of function FHtestrcp with the title being the only difference.Here, we can see that the data set consists of 759 men (zgen = 0) and 181 women and that the p value is 0.0694.What the output does not show is that the computational cost for this example is high.The execution of the previous command, with 940 interval-censored data, takes more than 40 seconds under OS X Yosemite with an Intel processor of 2.5 GHz and the following session info: R> sessionInfo() R version 3.3.3(2017-03-06) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X Yosemite 10.10.5 If we used function FHtestics instead, the computational cost would roughly be the same.That is, the high computation time is not due to the use of the permutational central limit theorem, but due to the use of interval-censored data, in particular due to the necessary nonparametric estimation of the common survival function.
For this reason and in order to reduce computation times, if we are interested in carrying out further hypothesis tests with the same data, it is recommended to save the 'FHtesticp' object and use its element fit a posteriori as the value of FHtesticp's argument icFIT.This is done in the following example, where we put more weight on early differences and use a one-sided alternative that could be motivated by some prior knowledge: decreasing survival functions (group=1 has earlier event times) In this example, the use of the stored 'FHtesticp' object has greatly reduced the computational time to less than one second.

k-sample tests
The data set illust3, which contains the data from an AIDS clinical trial, will be used for the illustration of the k-sample test with interval-censored data.Since the group indicator is a numeric variable, we first put the corresponding labels to each of the three study groups, which are roughly of the same sample size: R> illust3$group <-factor(illust3$group, labels = c("Deferred therapy", + "500 mg/day dosage", "1500 mg/day dosage")) R> table(illust3$group) Deferred therapy 500 mg/day dosage 1500 mg/day dosage 541 538 528 The Turnbull estimates of the survival functions for time from treatment start until either AIDS onset or death are shown in the right panel of Figure 2. Therein, we can see that the estimated survival functions are nearly identical throughout the first year under treatment and then start to split: The survival function of the deferred therapy lies below the other two.These remain the same for nearly four years, after which the probabilities for disease-free survival are somewhat larger among the 1500 mg/day dosage group.
To test, whether these difference allow us to assume that there are really differences between the three treatments with respect to disease-free survival, we now use function FHtestics, which implements the score vector approach set forth in Section 2.4.The null and alternative hypotheses are the ones of the k-sample test as shown in (2) and for this reason, the grouping variable used in the following command syntax has to be a factor or a character vector.By setting rho = 3, we put more weight on early differences than to differences at long term, and since disease-free survival is given in months, arguments Lin and Rin are, again, set to TRUE.Like before, when we applied the k-sample test to right-censored data, the value of test statistic (5) is computed, which asymptotically follows a χ 2 with 2 degrees of freedom.This value is fairly large and, consequently, the p value is quite small.

Trend tests
We also apply the trend test to this data set using the same null but the opposite alternative hypotheses as in ( 14): H 0 : S 1 (t) = S 2 (t) = S 3 (t) ∀t > 0 vs.
H 1 : S 1 (t) ≤ S 2 (t) ≤ S 3 (t) ∀t > 0 with, at least, one inequality.(15) Here, S 1 is the survival function that corresponds to the deferred therapy, S 3 the one of the 1500 mg/day dosage treatment, and since H 1 states increasing disease-free survival functions, the argument alternative will now be set to "increasing".As with the trend test with right-censored data, the grouping variable must now be a numeric variable: increasing survival functions (higher group implies later event times) In both examples with function FHtestics, we either used equal weights for all differences between observed and expected events or put more weight on short-term differences.If, contrary to that, we wanted to put more weight on late differences, we would need to use function FHtesticp, because function FHtestics does not allow for values of lambda different from 0.
Further examples for the functions FHtesticp and FHtestics can be found and studied by executing the following instructions: R> example("FHtesticp") R> example("FHtestics")

Final remarks
In this paper we have presented the R package FHtest, whose functions perform tests for right-and interval-censored survival data based on the Fleming-Harrington class.We have shown that the FHtest package is a flexible and user-friendly tool that integrates and extends methods in other R packages and we have given several detailed examples of the use of each function.
The FHtest package has some limitations.For example, it covers left-and double-censored data as particular cases of interval-censored data, however, it does not cover neither truncated data nor doubly-censored (i.e., both the time origin and the event of interest are censored) data.Another intrinsic drawback is that weighted logrank tests are not appropriate for crossing hazards alternatives.
In the future, we plan to perform a simulation study to compare the four FHtest methods.With right-censored data and for both small and large samples, we are interested in studying the differences between distinct approaches: (i) the RC counting process versus the IC score vector method; (ii) the RC permutation versus the IC permutation method; and (iii) the RC counting process versus the RC permutation method.With interval-censored data, we are also interested in studying the differences between both IC approaches: the IC permutation versus the IC score vector method.
λ and, consequently, the interpretation of the parameters ρ and λ reproduces the interpretation in the original right-censored family.

Figure 2 :
Figure 2: Left panel: Survival functions of time from first injection drug use until HIV seroconversion among injection drug users.Right panel: Survival functions of time from treatment start until either AIDS onset or death among HIV-infected patients.Shaded rectangles represent the intervals within which the Turnbull estimate of S(t) is not identified.

R>
FHtestics(Surv(left, right, type = "interval2")  ~as.numeric(group), + data = illust3, Lin = TRUE, Rin = TRUE, alternative = "increasing") Trend FH test for interval-censored data Parameters: rho=0, lambda=0 Distribution: score vector approach Data: Surv(left, right, type = "interval2") by as.numeric(group) N O-E as.numeric(group)=1 541 51.4 as.numeric(group)=2 538 -8.0 as.numeric(group)=3 528 -43.4 Statistic Z= -4.2, p-value= 1.19e-05 Alternative hypothesis: Herein, the code for right-censored observations is where right equals 999 and identical values of left and right like in observation 5 indicate uncensored times.Contrary to data frame duser, there are no left-censored times, but there are right-censored times with a zero left endpoint, which, actually, can be considered missing since they do not contribute any information.The numbers of exact times, right-censored times, and missing data are: (Yan et al. 2012) al. 2012), respectively, will be used as examples for the use of the functions FHtestrcc and