Regression Modeling for Recurrent Events Possibly with an Informative Terminal Event Using R Package reReg

Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of event of interest during follow-up. The R package reReg offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly with the presence of an informative terminal event. The regression framework is a general scale-change model which encompasses the popular Cox-type model, the accelerated rate model, and the accelerated mean model as special cases. Informative censoring is accommodated through a subject-specific frailty without any need for parametric specification. Different regression models are allowed for the recurrent event process and the terminal event. Also included are visualization and simulation tools.


Introduction
Recurrent event data arise when a study subject can experience a sequence of nonfatal events such as hospital admissions, repeated infection episodes, and tumor recurrences, during followup.A simple approach is to perform survival analysis with the first event only; however, this approach discards information on subsequent events and might not properly characterize the covariate effects on the recurrent event process (e.g., Claggett, Pocock, Wei, Pfeffer, McMurray, and Solomon 2018).Thus, approaches that address the sequential feature of the recurrent event times without information loss have attracted considerable attention (e.g., Cook and Lawless 2007;Amorim and Cai 2015;Charles-Nelson, Katsahian, and Schramm 2019).
A few R packages offer nonparametric methods for recurrent events.The survfit() function from the survival package (Therneau 2021) can compute the Nelson-Aalen estimator (Lawless and Nadeau 1995) of the cumulative intensity function of the recurrent event process, with the standard errors obtained from an infinitesimal Jackknife approach (Efron 1982).This allows users to perform two-sample tests by visually checking if the confidence intervals of the Nelson-Aalen estimates overlap.The reda package (Wang, Fu, Chiou, and Yan 2021) also provides an implementation of the Nelson-Aalen estimator via the mcf() function, with different options for variance estimation and the pseudo-score test (Cook, Lawless, and Nadeau 1996) for comparing Nelson-Aalen estimate from different groups.
Many regression models have been implemented via R packages.The popular Andersen-Gill (AG) model (Andersen and Gill 1982) can be fit with the coxph() function from the survival package as a generalization of the Cox relative risk model, with possibly time-varying coefficients.The cph() function from the rms package (Harrell Jr. 2018) extends the AG model to allow for time-dependent covariates.Both the coxph() and cph() provide an option to calculate a robust sandwich variance estimator of Lin, Wei, Yang, and Ying (2000) to account for the within-subject dependence.The tpr() function from the tpr package (Yan 2019) fits a temporal process regression that models the recurrent event process at each time point marginally (Fine, Yan, and Kosorok 2004).The condGEE package (Clement 2013) models the recurrent gap time process using methods of generalized estimating equation (Clement and Strawderman 2009).A different approach to account for the association between the recurrent events is to introduce frailty variables in the model.The rateReg() function from the reda package implements a gamma frailty model (Fu, Luo, and Qu 2016;Ma, Qu, and Fu 2021), whose estimating procedure requires a parametric assumption on the frailty variable and the recurrent event process.All these approaches require the censoring time to be independent of the recurrent event process given the covariates.
Recurrent events can be terminated by an informative censoring or a terminal event.The frailtypack package (Rondeau, Mazroui, and González 2012) provides a collection of frailty models to relax the conditional independent censoring assumption and jointly model the recurrent event process and the terminal event.Although these frailty models allow for a positive or negative association between the two outcome process, the validity of the inferences relies on a correct parametric assumption on the frailty variable.For this reason, frailty models that account for informative censoring without imposing parametric assumptions are particularly appealing.Such models can be formulated as, for example, Cox-type models (Wang, Qin, and Chiang 2001;Huang and Wang 2004;Huang, Qin, and Wang 2010) or accelerated mean models (Xu, Chiou, Huang, Wang, and Yan 2017).The regression parameters are estimated without information about the frailty variable.Both types of models are special cases of the generalized scale-change model (Xu, Chiou, Yan, Marr, and Huang 2020).An efficient implementation of the models have been made available in the reReg (Chiou and Huang 2022) package, which has not been widely known to users who need them.
The reReg package provides a comprehensive collection of practical and easy-to-use tools for exploratory and regression analyses of recurrent events.Features include event plots and Nelson-Aalen estimators to visualize recurrent events and terminal events for each subject with possibly multiple event types through ggplot2 objects (Wickham 2009).The package also provides regression model based on the general semiparametric scale-change model for the recurrent event process (Xu et al. 2020) that covers some of the most commonly used forms, including the Cox-type model, the accelerated rate model, and the accelerated mean model, as special cases.In the presence of terminal events, the package allows users to choose to jointly model the recurrent event process with the terminal event via a shared frailty, or to treat the terminal events as nuisances.In the former case, a general scale-change model or a special case of it can be specified for the hazard function of the terminal event.No parametric assumption on the latent frailty variable is needed.Standard errors of the estimates are obtained through efficient resampling procedures, which can run parallel on computers with multicores.These features make the reReg package appealing in facilitating recurrent event analyses in many application fields.
The rest of the article is organized as follows.Notation used to describe the underlying and observed recurrent event data structure is introduced in Section 2. The methodological framework and the estimating procedure for modeling the recurrent event process are described in Section 3. The structure of the package is presented in Section 4. Illustrations are provided in Sections 5 and 6.Section 7 concludes with a few remarks.

Notation for recurrent events
Let N (t) be the number of recurrent events occurring over the interval [0, t] and D be the failure time of interest subjects to right censoring by C. Define the composite censoring time Y = min(D, C, τ ) and the failure event indicator ∆ = I{D ≤ min(C, τ )}, where τ is the maximum follow-up time.We assume the recurrent event process N (•) is observed up to Y .Let X be a p-dimensional covariate vector.Consider a random sample of n subjects, the observed data are independent and identically distributed (iid) copies of {Y i , ∆ i , X i , N i (t), 0 ≤ t ≤ Y i }, where the subscript i denotes the index of a subject for i = 1, . . ., n.Let m i = N i (Y i ) be the number of recurrent events the ith subject experienced before time Y i , then the jump times of N i (t) give the observed recurrent event times t i1 , . . ., t im i when m i > 0. Thus, the observed data can also be expressed as iid copies of {Y i , ∆ i , X i , m i , (t i1 , . . ., t im i )}.The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects.The built-in dataset, simDat, is a simulated example of a recurrent event data: R> library("reReg") R> subset(simDat, id %in% c(5, 7, 12)) The simDat data set consists of 200 hypothetical subjects, whose event times and covariate information were generated by the simGSC() function described in Section 5.6.The data set consists of seven variables.The id variable is used to denote the subject identification.The t.start and t.stop variables mark the start and end of each time interval, respectively.The event variable is the recurrent event indicator that indicates whether a recurrent event was observed (event = 1) at the end of the time interval.The status variable is the failure event indicator (∆) that indicates whether the recurrent event process was terminated by a terminal event (status = 1).Finally, the x1 variable is a binary covariate and the x2 variable is a continuous covariate.
The simDat data presents the recurrent event in person time, where the beginning of time is zero.Thus, for the ith subject, the endpoint of the time intervals, i.e., t.stop, represents a recurrent event time (t ij ) when event = 1 or a censoring time (Y i ) when event = 0.For example, in simDat, Subject 5 did not experience a recurrent event before the terminal event at 1.170 while Subject 7 experienced a recurrent event at 0.469 before the terminal event at 0.832.On the other hand, Subject 12 experienced two recurrent events at 3.222 and 13.561 before the non-terminal event at 60.We demonstrate the usage of the reReg package for data sets in forms similar to simDat in Section 5.

Nonparametric estimation of rate function
Statistical methods for recurrent event data often involve modeling the recurrent event process via its intensity function or rate function.Define the history of the recurrent event process at time t as H(t) = {N (s); 0 ≤ s < t}, t > 0, the intensity function for N (t) is On the contrary, the rate function for N (t) is unconditional on the history and is defined as The intensity function completely specifies a recurrent event process while the rate function only specifies the population average of the occurrence rate.The latter gives more direct interpretations for explaining covariate effects under weaker assumptions and is generally preferred in practice.The cumulative rate function, defined as is also the expected number of recurrent events occurring in [0, t].
Under independence censoring, a nonparametric estimate of Λ(t) known as the Nelson-Aalen estimator is (Lawless and Nadeau 1995) b which is also known as the mean cumulative function (MCF).The MCF presents the average number of recurrent events per subject observed by time t while adjusting for the risk set.
In the case when all subjects remain at risk of recurrent events throughout the study, i.e., , Equation 1 reduces to the cumulative sample mean function introduced in Chapter 1 of Cook and Lawless (2007).
When censoring is not independent of the recurrent event process, nonparametric estimators proposed by Wang et al. (2001); Wang and Chiang (2002) that account for informative censoring via a latent frailty variable can be considered.Let Z be a non-negative subject-specific latent frailty variable such that N (•) is conditionally independent of Y given Z.The multiplicative intensity model assumes that N (t) is a non-stationary Poisson process with the intensity function Zλ 0 (t), where the baseline intensity function λ 0 (t) is a continuous function.Conditioning on Z, the independent increments assumption of the Poisson process implies that the rate function is the intensity function.Thus, the rate function and the cumulative rate function can be expressed as λ(t) = E{Zλ 0 (t)} = µ z λ 0 (t) and Λ(t) = µ z Λ 0 (t), respectively, where µ z = E(Z) and Λ 0 (t) = R t 0 λ 0 (u)du.The multiplicative model structure implies that the event occurrence rate is inflated (or deflated) by the frailty variable Z.
To estimate Λ(t), Wang et al. (2001) argued that, given (Y i , Z i , m i ), the event times {t i1 , . . ., t im i } are the order statistics of independent copies of random variables with the density function and ) is a truncated density function of f (t) with right-truncation on Y i and does not depends on Z, the conditional likelihood based on {t i1 , . . ., t im i } is in the form of .
The conditional likelihood L c is maximized at the nonparametric maximum likelihood estimator (NPMLE) (Wang, Jewell, and Tsai 1986) b where {s (ℓ) } are the ordered and distinct values of is the number of events occurring at s (ℓ) , and R (ℓ) is the number of events satisfying The estimating procedure does not require any distributional assumption on Z, making it more appealing than conventional parametric approaches under informative censoring.This estimator can be plotted by applying the generic function plot() to a reReg object constructed as an intercept-only-model of (6) as illustrated in Section 5.4.

A joint Cox-type model
The AG model is commonly considered when the interest is in evaluating the multiplicative covariate effect on the rate function of the recurrent event process.The AG model specifies the rate/intensity function of the recurrent event process N (t) as where λ 0 (•) is an unspecified baseline rate function, and β is a p-dimensional regression parameter.The estimate of β can be obtained through the partial likelihood approach (Cox 1975) under the Poisson assumption, where the time increments between events are conditionally independent given covariates.When the correlation among recurrent events is not induced by the covariates, Pepe and Cai (1993) and Lin et al. (2000) relaxed the Poisson assumption and proposed a robust sandwich variance estimator.Applying the AG model is straightforward as these inference procedures are implemented in major software packages.In the R environment, the AG model with the robust sandwich variance estimator can be fitted by the coxph() function from the survival package (Therneau 2021) with the subjects' identification specified via the cluster option.The same procedure can also be called conveniently with the reReg() function as illustrated in Section 5.4.
The AG model and many of its extension require the non-informative censoring assumption, that assumes the recurrent events and the censoring times are independent given covariates.However, the non-informative censoring assumption could be violated when the recurrent event process is terminated by informative dropouts or failure events.Wang et al. (2001) extends the AG model to accommodate informative censoring via the use of a frailty variable.Specifically, Wang et al. (2001) assumes the Cox-type model where Z is a non-negative subject-specific latent frailty variable.When covariate effects on both the recurrent event process and the failure time are of interest, Huang and Wang (2004) extends Model 4 to the joint model where h 0 (•) is the baseline hazard function for the failure time, θ is a p-dimensional regression parameter.The frailty variable, Z, has a proportional effect on both λ(t) and h(t), thus inducing a positive association between the recurrent event process N (•) and the censoring events (D, C).The proportionality assumption implicates a wide range of situations.For example, patients who survived longer tend to have fewer cardiovascular-related hospitalizations (Rogers, Yaroshinsky, Pocock, Stokar, and Pogoda 2016).For model identifiability, we assume Λ 0 (τ ) = R τ 0 λ 0 (u)du = 1 and E(Z|X) = E(Z) = µ Z for some constant µ Z .In contrast to many shared-frailty models that require a parametric assumption, the distribution of Z is left unspecified throughout the estimating procedure.The joint model (5) presents a common phenomenon where both λ(t) and h(t) can be both inflated (or deflated) by the shared subject-specific frailty Z.For example, if the recurrent events are recurrent infections after transplant and the failure event is death, then it is reasonably to presume that patients with higher rate of recurrent events are potentially more vulnerable and are more likely to experience a failure event.Models 4 and 5 are available in reReg by calling the reReg() function with argument model = "cox" and model = "cox|cox", respectively.
The general strategy in the estimating procedure for Model 5 is to first estimate the regression parameter in the rate model and the baseline cumulative rate function, then apply the borrowing-strength method of Huang and Wang (2004) to incorporate recurrent event information to estimate the regression parameters in the hazard model and the baseline cumulative hazard function.In the following, we outline the estimating procedure implemented in reReg that is used to estimate the parameters in Model 5.The parameter β in the rate model of the Model 5 can be estimated by solving the following estimating equation: 2, and 1 is a vector of 1's.Denote the estimator of β by b β n .Under Model 5, the relationship The score function for θ can be derived directly from the partial likelihood under the hazard model specified in (5); however, it cannot be evaluated because the frailty variable Z is unobservable.The borrowing-strength method of Huang and Wang (2004) proposed to plug in b Z i for Z i in the score function.The score functions, with Z i and b Z i , attain the same convergence function and the zero-crossing of the latter serves as an estimator for θ.Following the arguments in Huang and Wang (2004), the estimating equation used to estimate θ is Given the estimates, b β n and b θ n , the baseline cumulative hazard function can be estimated by b Detailed derivations of estimating equations as well as the asymptotic properties of the estimators can be found in Huang and Wang (2004).
The nonparametric bootstrap method for clustered data is adopted to estimate the standard errors of the estimators.The bootstrap samples are formed by resampling the subjects with replacement of the same size from the original data.The above estimating procedures are then applied to each bootstrap sample to provide one draw of the bootstrap estimate.With a large number of replicates, the asymptotic variance matrices are estimated by the sample variance of the bootstrap estimates.The borrowing-strength approach yields stable performance in most of our simulation studies, but numerical issues occasionally arise when the majority of b Z i 's are zero or when extremely large values are observed in b Z i .In the latter case, extremely large values of b Z i are more likely to occur when evaluating b Λ 0n (•) at small Y i 's where the number of events satisfying s (ℓ) ≤ Y i is small.We observe that such numerical issues are more frequent in the bootstrap procedure.In an attempt to enhance numerical stability, we provide an option for a heuristic adjustment (Wang, Ma, and Yan 2013;Chiou, Xu, Yan, and Huang 2018) A warning message will be issued if a convergence issue is detected in reReg().When a subset of bootstrap estimates do not converge, the asymptotic variance matrices are estimated by the sample variance of the converged bootstrap estimates.

A generalized joint frailty scale-change model
The Cox-type models in Equations 4 and 5 assumes that the covariates have proportionate effects on the rate function of the recurrent event process and the hazard function of the failure event over time.When the proportionality assumption is not met or when the Coxtype models cannot properly characterize the covariate effects the reReg package provides a general class of semiparametric scale-change models that includes the Cox-type models as special cases.In the same spirit of the joint model in Equation 5, we consider the generalized joint frailty scale-change model of the form where the p-dimensional regression coefficients (α, η) and (β, θ) correspond to the shape and the size parameters, respectively.We impose the same model identifiability assumption as in Model 5 and assume Λ 0 (τ ) = R τ 0 λ 0 (u)du = 1 and E(Z|X) = E(Z) = µ Z for some constant µ Z , without making a full parametric assumption on Z.We further assume λ 0 (t) and h 0 (t) are not in the Weibull class, i.e., λ 0 (t) ̸ ∝ t q and h 0 (t) ̸ ∝ t r for some q and r.The Weibull class assumption is also assumed for models that focus only on the rate model or the hazard model of the joint model in Equation 6(e.g., Chen and Jewell 2001;Sun and Su 2008;Xu et al. 2020).
The flexible formulation of (6) includes several popular semiparametric models as special cases, some of which have been recognized by different authors.For example, in the absence of the shape parameters, i.e., α = η = 0, Model 6 reduces to the joint frailty Cox-type model in (5).On the other hand, in the absence of the size parameters, i.e., β = θ = 0, Model 6 reduces to the joint frailty accelerated rate model.When the shape and size parameters are assumed equal, i.e., α = β and η = θ, Model 6 reduces to the joint frailty accelerated mean model considered in Xu et al. (2017).Setting η = θ = 0, Model 6 reduces to the generalized scale-change model considered in Xu et al. (2020), which also includes different types of rate models as special cases.The Weibull class assumption excludes the scenario when the three submodels (the Cox-type model, the accelerated rate model, and the accelerated mean model) coincide, which results in the identifiability between the shape and the size parameters.On the other hand, the Weibull class assumption is not needed for the submodels because they only involve one type of regression coefficient.A possible approach for checking the Weibull class assumption is described in Xu et al. (2020), which is motivated by the fact that, under Figure 1: Illustration of the covariate effects on the rate function in Model 6 when α < 0, β < 0, and X > 0. An accelerated rate model modifies the time-scale of λ 0 (t) to λ 1 (t).An accelerated mean model modifies λ 0 (t) to λ 2 (t) in a way that the cumulative mean function of λ 2 (t) is a time-scale change of that of λ 0 (t).A Cox-type model modifies the scale of λ 0 (t) to λ 3 (t).The proposed model modifies the time-scale and the magnitude of λ 0 (t) simultaneously to λ 4 (t), either through the Weibull model, the proposed model reduces to the Cox-type model of Wang et al. (2001), and log Λ 0 (t) is linear in log(t).
The joint model ( 6) allows two types of covariate effects: the shape parameters (α and η) impose a scale-change effect that alters the time scale and the size parameters (β and θ) impose a multiplicative effect that modifies the risk and hazard.In addition, the differences of the shape and size parameters, β − α and θ − η, modify the cumulative risk and hazard, respectively.Figure 1 illustrates the different covariate effects on the rate function where λ i (t), i = 0, . . ., 4, are hypothetical rate functions under the different model specifications of (6).A one-unit increase in the covariate would change λ 0 (t) to λ 1 (t) by modifying the time scale by a factor of e α and then change λ 1 (t) to λ 4 (t) by modifying the magnitude by a factor of e β ; or change λ 0 (t) to λ 3 (t) by modifying the magnitude by a factor of e β and then change λ 3 (t) to λ 4 (t) by modifying the time scale by a factor of e α .If X is a treatment indicator, then e β characterizes the risk ratio between the treatment group (X = 1) at time t and the control group (X = 0) at time te α .When α = β, the combined changes in time scale and in magnitude are in such a way that the resulting cumulative mean function has a time scale modification.Similar arguments can be made when interpreting η and θ on the hazard function.
The estimating procedure for the generalized joint frailty scale-change model is motivated by those used in Huang and Wang (2004) and Xu et al. (2020).Let a be a p-dimensional vector and define the transformations Following the estimating procedure proposed in Xu et al. (2020), the shape parameter, α, can be estimated by solving the following estimating equation where ϕ i (α), i = 1, . . ., n, is a weight function.Some common choices of ϕ(α) are 1 and } corresponding to the log-rank weight and the Gehan weight, respectively.Both the log-rank weight and the Gehan weight are implemented in reReg().

With b
α n as the estimator of α, the baseline cumulative rate function can be estimated by , γ can be estimated by solving the following estimating equation for ψ: Then β can be estimated by b The estimating function, S 2n (ψ), is monotone and continuously differentiable with respect to ψ, and its root can be easily obtained.The detailed derivation of the estimating procedure and the asymptotic properties for α, β, and Λ 0 (•) can be found in Xu et al. (2020).
With the estimated parameters from the rate model, the aforementioned borrowing-strength procedure can be extended to estimate the parameters in the hazard model.Conditioning on which implies the individual frailty value Z i can be estimated by .
The following estimating equations used to estimate η and θ in the hazard model are generalized from these proposed in Chen and Jewell (2001) to accommodate informative censoring through Z: where φ i (η, θ), i = 1, . . ., n, is a weight function that plays a similar role as ϕ(α) in S 1n (α).
The weight function φ i (η, θ) = 1 and φ i (η, θ) = Several equation solvers are available in the package to find the root of the estimating equations.The default solver is the derivative-free Barzilai-Borwein spectral method (Barzilai and Borwein 1988).This method is an iterative procedure that updates the solution guided by a scalar spectral steplength and a linear search direction.We found that choosing good initial values for (η, θ) improves the convergence performance when solving for S 3n (η, θ) = and S 4n (η, θ) = 0, while it is less sensitive to the choice of the initial value when solving for In addition to the nonparametric bootstrap approach, an efficient resampling-based sandwich estimator proposed in Xu et al. (2020) is implemented to estimate the asymptotic variance for α and β in Model 6.The estimating equation to obtain the regression parameters in the special cases of Model 6 can be constructed by modifying the above estimating equations.For example, the estimating equation used in estimating β in the Model 4 and the rate model in Model 5 can be constructed by setting α = 0 in S 2n (ψ) = 0.When the rate model is an accelerated rate model (β = 0), the estimating procedure only consists of solving S 1n (α) = 0. On the other hand, when the rate model is an accelerated mean model (α = β), the estimating equation is in the form of S 2n (α) = 0, i.e., 1 Similarly, the estimating equation for the different form of the hazard model can be constructed by modifying S 3n (η, θ) with the corresponding relationship between η and θ.For example, when the hazard function is in the Cox-type form as in the Model 5, the estimating equation used to estimate θ can be constructed by S 3n (0, θ).On the other hand, when the hazard function is an accelerated rate model (θ = 0) or an accelerated failure time model (η = θ), the estimating equations can be constructed by using S 3n (η, 0) or S 3n (η, η), respectively.The estimation of the individual frailty and the baseline functions in the special cases of Model 6 can be modified similarly.Since the estimation of the regression parameters in the hazard model only depends the parameters in the rate model through the estimate of the individual frailty, the above estimating procedure can be adopted to scenarios where the rate model and the hazard model do not have the same form.
The structure of Model 6 facilitates model selection among the submodels via hypothesis testing of model parameters.For example, as described in Xu et al. (2020), the Cox-type proportional assumption in the rate model can be tested through H 0 : α = 0 versus H 1 : α ̸ = under Model 6.In this case, the test statistic can be constructed with where Σ(b α n ) −1 is the estimated covariance matrix of n 1/2 (b α n − α).Xu et al. (2020) has shown that T cox converges weakly to a Chi-square distribution with p degrees of freedom.
Similarly, test statistics β n can be used to test the null hypotheses H 0 : γ = 0 and H 0 : β = 0, respectively.

Package structure
The reReg package provides useful functions for data exploration, nonparametric estimation, regression analysis, and recurrent event data simulation.Most functions in the reReg package are built around recurrent event objects, which are created by the Recur() function imported from the reda package.The recurrent event object plays a similar role as the survival object created by the Surv() function in the survival package as both are used as a response variable in a model formula for the respective packages.However, the recurrent event object is designed to track the study subjects' recurrent times, in addition to failure time and censoring status.
The main function for regression modeling in recurrent event analysis is reReg().The implementation of reReg() is based on a series of recently developed methods that accommodate informative censoring via a unspecified frailty variable and allows users to either treat the terminal time as a nuisance or incorporate it in a joint model framework.Once a regression model is fitted with reReg(), the S3 methods, including coef(), vcov(), and summary() can be applied to a reReg object to easily extract relevant regression results.The S3 plot method can be applied to obtain the baseline cumulative rate/hazard function and the predicted cumulative rate/hazard functions function.The plot method utilizes the ggplot2 plotting environment (Wickham 2009) for visualization to allow extensive flexibility and customization.When an intercept-only model is specified with reReg(), the proposed method provides nonparametric estimation based on the NPMLE of Wang et al. (2001).Furthermore, the reReg() function includes some of the common recurrent event models such as the AG model as alternative options.
Besides fitting regression models, simple descriptive statistics and event plot of the recurrent event process can be obtained by directly apply the S3 methods summary() and plot() to the recurrent event object, created by Recur().For a more sophisticated event plot, the recurrent event object can then be passed to the plotEvents() function in a model formula.For convenience, and as a counterpart to the NPMLE of Wang et al. (2001) in the absence of informative censoring, the function mcf() is imported from the reda package to compute the MCF estimates.The reReg package allows users to simulate recurrent event data from the generalized frailty joint scale-change model in Equation 6 via the simGSC() function.The simGSC() function provides a great flexibility in the specification of the baseline rate/function functions, regression parameters, covariate information, censoring distribution, and frailty distribution.Table 1 summarizes the functions mentioned above as well as their compatible S3 methods.These functions are then illustrated with simulated data in Section 5 and are applied to real data in Section 6.The event times are specified via the time argument as a vector that represents the time of recurrent events and censoring, or as a list of time intervals that contains the starting time and the ending time of the interval.In the latter, the intervals are assumed to be open on the left and closed on the right, where the right end points are the time of recurrent events and censoring.When the events are recorded in patient times, where each patient enters the study at time zero as in the simDat data, the time argument can be specified with time = t.stop or with time = t.start%to% t.stop.The infix operator %to% is used to create a list of two elements containing the endpoints of the time intervals.When the patients have different starting time, the different starting time can be specified via the origin argument.Alongside the event times, optional arguments including id, event, and terminal, are used to represent the subject's identification, the recurrent event indicator, and the failure event indicator, respectively.In addition to numeric values, Date and difftime are also allowed.Finally, the check argument specifies how to perform validation checks.

Creating a recurrent event object
The following code creates a Recur object from the simDat data.R> (reObj <-with(simDat, Recur(t.start%to% t.stop, id, event, status))) The Recur object is an S4 class object and the show() method for the Recur object presents recurrent events in intervals.The endpoint of the last interval within each subject indicates the failure time and its censoring status; * and + indicate whether the recurrent process is censored by a terminal event or non-terminal event, respectively.For concise print, the maximum number of intervals printed per subject is limited to three controlled by options("reda.Recur.maxPrint").For example, in simDat, subjects 1 to 4 experienced more than three events, of which only the first two and the last intervals are displayed.The generic method function summary can be applied to an Recur to produce simple descriptive statistics.

Creating event plots
An easy way to glance at recurrent event data is by plotting the event plots, which can be created by several approaches in the reReg package.One way to create a unstratified event plot is to apply the generic function plot() function to Recur objects.The plot method for Recur objects internally calls the plotEvent() function to create event plots, which is created in the ggplot2 environment (Wickham 2009) to allow extensive flexibility and customization.The synopsis of plotEvents() is: R> args(plotEvents) ## function (formula, data, result = c("increasing", "decreasing", ## "asis"), calendarTime = FALSE, control = list(), ..

.) ## NULL
The required argument for plotEvent() is formula, which contains an Recur object and categorical covariates separated by ~, where the categorical covariates are used to stratify the event plots.Plotting the Recur object by plot() is only available in reReg and is equivalent to calling the plotEvent() function with an intercept-only-model.The data argument is an optional data frame contains the variables occurring in the formula.The result argument is an optional character string indicating whether to arrange the subjects by their follow-up times in an ascending order (default), descending order, or plot as-is.The calendarTime argument is a logical value indicating whether to create the event plot in calendar time or patient time.When plotting in calendar time, the result argument sorts the subjects by the chronological order of terminal events.The plotEvent() also includes convenient graphical parameters for plot annotation that can be specified in the control argument.The full list of these parameters with their default values are displayed below.The parameters xlab, ylab, and main are adopted from the base R graphics package for modifying labels to the x-axis, the y-axis, and the main title, respectively.The parameters terminal.nameand recurrent.nameare character labels displayed in the legend.The parameter recurrent.typespecifies the recurrent event types to be displayed in the legend when more than one recurrent event types is specified in the Recur object.The parameters legend.positionand base_size are adopted from the ggplot2 theme to control the location of the legend box and the base font size in pts, respectively.The remaining parameters control the event plot appearance, including the size (cex) and transparency (alpha) of the symbols, the width (width) and color (bar.color) of the event bars, and the color, shape, and stroke of the recurrent event, terminal event, and non-terminal event symbols (recurrent.color,recurrent.shape,recurrent.stroke,etc.).The following code demonstrates the plot method for the Recur object, with the resulting unstratified event plot displayed in Figure 2a.

R> plot(reObj)
The above plot method is an alias for the following plotEvents() call.
R> plotEvents(Recur(t.start%to% t.stop, id, event, status) ~1, + data = simDat) In the event plot, the gray horizontal lines represent each subjects' follow-up times, • marks the recurrent events, and ▲ marks terminal events.With the default setting, the event plot is arranged so that subjects with longer follow-up times are presented on the top.Stratifying the event plot could provide new insights into the different covariate effects.For example, an event plot stratified by the binary variable x1 is presented in Figure 2b, which is created by specifying a formula with the Recur object as the response variable and x1 as a covariate in plotEvents().
R> plotEvents(Recur(t.start%to% t.stop, id, event, status) ~x1, + data = simDat) In this case, subjects with x1 = 1 seems more likely to experience recurrent events and terminal event sooner.The plotEvents() can accommodate multiple recurrent types specified via the event argument in the Recur object.To illustrate this feature, we perturb the event indicator so that the value of event indicates the index of the different recurrent event.The following command is used to create the stratified event plot with different recurrent event types presented in Figure 2c.In this case, subjects with later censoring times are presented on the top and the date string are printed on the axis label of the event plot.

Plotting mean cumulative functions
The MCF is useful in describing and comparing the average number of events of an individual and between groups.Thus, it provides additional insights into the longitudinal patterns of the recurrent process.To create the NPMLE of Wang et al. (1986) under the informative censoring assumption, one can fit an intercept-only model with reReg(), then applying the generic function plot() to the reReg object.The following code illustrates this feature with output displayed in Figure 3a.R> basebind(g1, g2, legend.title= "X1", legend.labels= 0:1) Under the independent censoring assumption, the reReg package imports the mcf() function from the reda package to compute the MCFs.The usage of the mcf() function is similar to that of the plotEvent(), as they both allows the recurrent event process to be specified in a model formula.When the overall sample MCF is of interest, the Nelson-Aalen estimate in Equation 1 can be created by applying the generic function plot() to the Recur object with an additional argument mcf = TRUE.This then internally calls the mcf() function with an intercept-only-model.
The following command plots the Nelson-Aalen estimate in Figure 3c, where the 95% confidence interval is enabled by additionally setting mcf.conf.int= TRUE.

Fitting regression models
The main function for fitting semiparametric regression models in the reReg package is the reReg() function.When covariates are specified in the reReg() function, the reReg() function fits Model 6 using the following arguments R> args(reReg) ## function (formula, data, subset, model As in the plotEvents(), the arguments formula and data are used to specify the model formula and the optional data frame, respectively.The argument model is a character string used to specified the model type.The possible model types are cox, ar, am, and gsc, corresponding to the Cox-type model, the accelerated rate model, the accelerated mean model, and the generalized scale-change model, respectively.When the interest is the covariate effects on the risk of recurrent events and the terminal event is treated as nuisances, model = "cox" and model = "gsc" give the Cox-type model ( 4) and the generalized scale-change rate model considered in Wang et al. (2001) and Xu et al. (2020), respectively.When the recurrent event process and the terminal events are modeled jointly, the types of rate function  For inference, the variance estimates are obtained from the nonparametric bootstrap approach with the argument se = "boot".When fitting the generalized scale-change rate model of Xu et al. (2020), an efficient resampling-based sandwich estimator is also available via se = "sand".The number of bootstrap or resampling is controlled by the argument B. When B = 0 (default), the variance estimation procedure will not be performed and only the point estimates will be returned, which can be useful when the variance estimation is time consuming.The argument control is a list specifying changes to the algorithm control parameters.The full list of control parameters is: R> args(reReg.control)## function (eqType = c("logrank", "gehan", "gehan_s"), solver = c("BB::dfsane", ## "BB::BBsolve", "BB::BBoptim", "optim"), tol = 1e-07, init = list(alpha = 0, ## beta = 0, eta = 0, theta = 0), boot.parallel= FALSE, boot.parCl= NULL, ## maxit1 = 100, maxit2 = 10, trace = FALSE, numAdj = 1e-07) ## NULL The argument eqType is a character string used to specify the type estimating equation is used in the estimating procedure.The available options are the log-rank-type ("logrank"), the Gehan-type ("gehan"), or the induced smoothing Gehan-type ("gehan_s") estimating equation is used in the estimating procedure.The default value is "logrank" corresponding to setting ϕ i (α) = 1 in S 1n (α) and φ i (η, θ) = 1 in S 3n (η, θ) and S 4n (η, θ).The argument solver is a character string used to specify the equation solver used in solving the estimating equations.The default equation solver ("BB::dfsane") uses the derivative-free Barzila-Borwein spectral approach for solving nonlinear equations implemented in dfsane() from the package BB (Varadhan and Gilbert 2009).Setting solver = "BB::BBsolve" calls the wrapper function BBsolve() in the BB package to locate a root with different Barzilai-Borwein steplengths, non-monotonicity parameters, and initialization approaches.Based on our observation, the "BB::BBsolve" algorithm generally exhibited more reliable convergence but the solver = "BB::dfsane" algorithm provides a better balance between convergence and speed.Alternative options are solver = "BB::BBoptim" and solver = "optim" that attempt to identify roots by minimizing the ℓ 2 -norm of the estimating functions.The options solver = "BB::BBoptim" and solver = "optim" call the BBoptim() function from the package BB and the base function optim(), respectively.The argument tol is the absolute tolerance used in the convergence criteria, while the arguments arguments maxit1 and maxit2 control the maximum numbers of iterations in the risk model and the hazard model, respectively.The argument numAdj specifies the ϵ used in the heuristic adjustment.The argument init is a list of initial values used in the root-finding algorithms.The list members alpha, beta, eta, and theta correspond to the parameters α, β, η, and θ in Model 6, respectively.The default values for these initial values are zeros.
In an attempt to overcome the computational burden in bootstrap variance estimation, parallel computing techniques based on methods in the parallel package will be applied when boot.parallel = TRUE.The number of CPU cores used in the parallel computing is controlled by the argument boot.parCl,whose default value is half of the total number of CPU cores available on the current host identified by parallel::detectCores() %/% 2.
To illustrate the usage of the reReg() function, we first fit Cox-type rate model (4) to the simDat data with the variance estimate obtained from the nonparametric bootstrap approach with 200 bootstrap replicates in the following.
R> fm <-Recur(t.stop,id, event, status) ~x1 + x2 R> set.seed(0)R> system.time(fit1<-reReg(fm, data = simDat, model = "cox", B = 200)) ## user system elapsed ## 0.972 0.008 0.982 In this example, the implemented method finished in a reasonable computing time, which is based on a Linux machine with Core i7-6700@3.40GHz processor and without parallel computing in the variance estimation.The summary of the model is presented below.The summary suggests statistically significant negative effects of both covariates on the rate function of the recurrent event process.This indicates that subjects with larger x1 and x2 values are likely to experience less frequently throughout the follow-up.In this case, we assumed the primary interest is the covariate effects on the risk of recurrent events and treat the terminal events as nuisances.When it is of interest to analyze the recurrent events and the terminal events simultaneously, we fit the joint Cox-type model ( 5) and present the summary as below: The top panel of the summary is identical to that from fitting the Cox-type rate model (4) because the rate models in Models 4 and 5 are identical and the parameters therein are obtained from the same estimating equation.This reiterates the implemented estimation procedure is in a two-step fashion.When the terminal events are modeled as in fit2, reReg() additionally compute the parameters in the hazard model with the results presented in the lower panel of the summary.The summary exhibits statistically significant positive covariate effects on the hazard function, indicating that subjects with larger x1 and x2 values are in higher risk of terminal events.
The estimates of the baseline cumulative functions, e.g., Λ 0 (•) and H 0 (•), can be plotted by applying the generic function plot() to the reReg object.The following shows the S3 method for plotting a reReg object.

..) ## NULL
When a joint model is specified in reReg(), the baseline argument provides options to display only the baseline cumulative rate function with baseline = "rate", only the baseline cumulative hazard function with baseline = "hazard", or both the baseline cumulative rate function and the baseline cumulative hazard function with baseline = "both" (default).Even though the identifiability assumption, Λ 0 (τ ) = 1, is used in the estimating procedure, we scaled b Λ 0n (•) by µ Z so the values represent the expected number of recurrent events.The smooth argument is a logical variable specifying whether the baseline cumulative functions will be smoothed by the monotonic increasing P -spline (Pya and Wood 2015) implemented in the scam package (Pya 2020).
The plot() method also allows user to plot the predicted cumulative functions given covariates.As in many prediction models, the plot method allows users to specify the covariates in a data frame via the argument newdata.When newdata is not specified (or is NULL), newdata is set to zeros and the baseline cumulative functions are produced.When newdata has more than one rows, each unique row will be treated as an observation and the corresponding predicted cumulative functions will be plotted.The frailty argument specifies the frailty value to be used in constructing the predicted cumulative functions; b µ Z is used when frailty is NULL.The showName is a logical variable indicating whether to label the objects' name at the end of the curves.The objects' names are determined by the row names of the newdata.The control is an optional list containing graphical parameters including xlab, ylab, main, etc.As the plot method for Recur objects, a ggplot2 object is returned to allow easy customization.
The baseline cumulative rate function and the baseline cumulative hazard function of the joint frailty Cox model is plotted below and presented in Figure 4a.

R> plot(fit2)
To illustrate the feature in predicting cumulative functions given covariates, we consider two hypothetical subjects, one with x1 = 0 and the other with x1 = 1.Holding the value of x2 at the average, the predicted cumulative functions are presented in Figures 4b using the

Other common recurrent event models
Under the non-informative censoring assumption, the AG model with robust variance estimation can be called via reReg() with model = "cox.LWYY": The joint frailty scale-change model ( 6) also includes many models as special cases under the non-informative censoring assumption when the frailty is degenerated, i.e., Z = 1.Some of these special cases include the Cox-type models of Pepe and Cai (1993); Lin et al. (2000); Ghosh and Lin (2002), accelerated rate model Chen and Wang (2000); Ghosh (2004), accelerated means regression Ghosh and Lin (2003), and a general class of accelerated means model Sun and Su (2008).Different estimating procedures are proposed in each method.Of these, setting model = "cox.LWYY", model = "cox.GL", and model = "am.GL", calls the methods proposed by Lin et al. (2000), Ghosh and Lin (2002), and Ghosh and Lin (2003), respectively.

Simulating recurrent event data
The reReg package allows users to generate simulated data from the generalized frailty joint scale-change model ( 6) via function simGSC().The arguments of simGSC() are presented below.
R> args(simGSC) ## function (n, summary = FALSE, para, xmat, censoring, frailty, ## tau, origin, Lam0, Haz0) ## NULL The only required argument is the number of observation, represented by n.The remaining arguments are optional and will be assigned default values when not specified.The argument summary is a logical value indicating whether descriptive statistics of the simulated data will be printed after the data generation.The argument para is a list of regression parameters.The list members alpha, beta, eta, and theta are p-dimensional numerical vectors corresponding to regression parameters, α, β, η, and θ in Model 6, respectively.The xmat argument is the n× p design matrix.The censoring and frailty arguments are n-dimensional numerical vectors that specify the independent censoring time, C, and the frailty variable, Z, respectively.The argument tau is the maximum follow-up time τ and the argument origin the time origin for each subject.Finally, the arguments Lam0 and Haz0 are single argument functions used to specify the baseline cumulative rate function, Λ 0 (t), and the baseline cumulative hazard function, H 0 (t), respectively.
At the default setting, the simGSC() function assumes p = 2 and the regression parameters to be α = η = (0, 0) ⊤ , β = (−1, −1) ⊤ , and θ = (1, 1) ⊤ .When the xmat and the censoring arguments are not specified, the simGSC() function assumes X i is a two-dimensional vector X i = (X i1 , X i2 ), i = 1, . . ., n, where X i1 is a Bernoulli variable with rate 0.5 and X i2 is a standard normal variable.With the default xmat, the censoring time C is generated from an independent uniform distribution in [0, 2τ X i1 + 2Z 2 τ (1 − X i1 )].Thus, the censoring distribution is covariate dependent and is informative when Z is not a constant.On the other hand, when xmat is specified by the users, the censoring time C is generated from an independent uniform distribution [0, 2τ ].When the frailty argument is not specified, the frailty variable Z is generated from a gamma distribution with a unit mean and a variance of 0.25.The default values for tau and origin are 60 and 0, respectively.When arguments Lam0 and Haz0 are left unspecified, the simGSC() function uses Λ 0 (t) = 2 log(1 + t) and H 0 (t) = log(1+t)/5, respectively.This is equivalent to setting Lam0 = function(x) 2 * log(1 + x) and Haz0 = function(x) log(1 + x) / 5.In summary, the default specifications generate the recurrent events and the terminal events from the model: The simGSC() function generates simulated data from the above specification and returns a data.frame in the same format as the built-in data set, simDat.Specifically, simDat was generated using the default settings of simGSC().The following example illustrates the computational performance of the implemented method.We simulated data from simGSC() under the default settings.Each data is then fitted with the joint Cox-type model (Huang and Wang 2004), the joint accelerated mean model (Xu et al. 2017), and the general scale-change model (Xu et al. 2020), corresponding to model = "cox|cox", model = "am|am", and model = "gsc", respectively.
The following summarizes the average computing times in Table 2.
Other than the sample size, several factors could impact the computing speed, such as the number of recurrent events and covariates.A comprehensive illustration of the computational performance of the implemented method under different scenarios might worth further investigation.Nonetheless, Table 2 suggests that analyses can be made available within seconds or minutes using the reReg package.

Application to colorectal cancer data
We demonstrate the usage of the reReg package to the colorectal dataset from the frailtypack package (Rondeau et al. 2012).The dataset consists of a random selection of 150 patients with advanced colorectal cancer enrolled in a randomized phase III clinical trial FFCD 2000-05 conducted between 2002 and 2007 (Ducreux, Malka, Mendiboure, Etienne, Texereau, Auby, Rougier, Gasmi, Castaing, Abbas, Pierre, Gargot, Azzedine, Lombard-Bohas, Geoffroy, Denis, Pignon, Bedenne, and Bouché 2011).The enrolled patients were randomized to receive either sequential or combination chemotherapy.One of the primary interests is to compare the two treatment groups in disease progression and overall survival.In this study, the recurrent events are identified as the appearances of new lesions, and the terminal event is death.The structure of the colorectal dataset is presented below.
R> fm <-Recur(time0 %to% time1, id, new.lesions,state) ~treatment R> plotEvents(fm, data = colorectal, + recurrent.name= "New lesions", terminal.name= "Death") Additional graphical arguments are included to improve the readability of the event plot.Overall, patients in the sequential treatment group seem to have more new lesions early, while the number of deaths appears to spread out.On the other hand, the MCF estimates presented in Figure 5b suggest that the MCF estimates for the two treatment groups are not significantly different, as indicated by the overlapping 95% confidence intervals.To evaluate the treatment effect in regression models, we first fit the joint Cox-type model ( 5) using the sequential treatment group as the reference group.In addition to the treatment effect, we also include baseline covariates including age (age), World Health Organization (WHO) performance status (who.PS), and previous resection indicator (prev.resection).The age variable is categorized into three groups; < 60 years, 60-69 years, and > 69 years.
The WHO performance status is used to quantify patients' ability to carry out daily tasks.The available WHO performance status scores are 0, 1, and 2, where fully active patients who can carry out all activities without restriction have a score of 0. The following results show that, though not statistically significant, patients in the combination treatment group were more likely to experience fewer new lesions ( β = −0.240,p = 0.436) and had a lower risk of death ( θ = −0.088,p = 0.810) than those in the sequential treatment group.

Discussion
The reReg package provides a comprehensive toolkit for analyzing recurrent event data.It allows easy access to create event plots and MCF plots for exploratory data analysis and a generalized joint frailty scale-change model for regression analysis.The implemented methods accommodate informative censoring via the use of a subject-specific frailty variable.In contrast to existing frailty models, the implemented estimation procedure does not require distributional information on the frailty variable.Using the borrowing-strength approach in the estimating procedure, our model allows users to specify any combination of the sub-models between the recurrent event process and the terminal events when they are fitted jointly.
Future work will be devoted to improving computational efficiency.In particular, we plan to generalize the efficient resampling-based sandwich estimator to all the sub-models.In addition, we plan to apply the induced smoothing method (e.g., Brown and Wang 2007;Chiou, Kang, and Yan 2015) to facilitate numerical reliability.Including different types of recurrent event models such as the additive rate model and the semiparametric transformation model would be of interest too.When longitudinal outcomes at recurrent events are available, we plan to expand the current version of the package to adopt time-dependent covariates (e.g., Huang et al. 2010).Another interesting extension is to use multiple frailty variables to allow different frailty effects.
MCF stratified by treatment.

Figure 5 :
Figure 5: Event plot and MCF plot stratified by treatment type.

Table 1 :
An overview of the main functions in reReg.
(Wang et al. 2021)on imported from the reda package(Wang et al. 2021)creates recurrent event objects needed for reReg.The function interface of Recur() is R> args(reda::Recur)## function (time, id, event, terminal, origin, check = c("hard",  ##"soft", "none"), ...)## NULL The different types of recurrent events are marked in different colors in Figure2c.The plotEvents() can also create event plots in calendar time by setting calendarTime = TRUE.For illustration, we create a new simulated data based on simDat with time intervals shift proportionally to x2.We further convert the time intervals to a Date class.The construction of the new data and the stratified event plot in Figure2dis included in the following The reported computing times are without variance estimation and are averaged over 50 Monte Carlo runs.

Table 2 :
Average runtime (in seconds) under different sample sizes.
## $ gap.time : num 0.71 1.28 ...There are 150 sampled patients with 77 in the sequential treatment group.Key variables needed to create a recurrent event object with Recur() are subject identification (id), start and end of time intervals (time0, time1), recurrent event indicator (new.lesions), and terminal event indicator (state).The recurrent event object and its summary are printed with the following codes.