Semi-Parametric Joint Modeling of Survival and Longitudinal Data: The R Package JSM

This paper is devoted to the R package JSM, which performs joint statistical modeling of survival and longitudinal data. In biomedical studies it has been increasingly common to collect both baseline and longitudinal covariates along with a possibly censored survival time. Instead of analyzing the survival and longitudinal outcomes separately, joint modeling approaches have attracted substantial attention in the recent literature and have been shown to correct biases from separate modeling approaches and enhance information. Most existing approaches adopt a linear mixed effects model for the longitudinal component and the Cox proportional hazards model for the survival component. We extend the Cox model to a more general class of transformation models for the survival process, where the baseline hazard function is completely unspecified, leading to semi-parametric survival models. We also offer a non-parametric multiplicative random effects model for the longitudinal process in JSM in addition to the linear mixed effects model. In this paper, we present the joint modeling framework implemented in JSM, as well as the standard error estimation methods, and illustrate the package with two real data examples: a liver cirrhosis dataset and a Mayo Clinic primary biliary cirrhosis dataset.


Introduction
In medical studies it is common to record important longitudinal measurements (e.g., biomarkers) up to a possibly censored event time (or survival time), along with additional baseline covariates collected at the onset of the study. The objectives of investigations based on this kind of data often are to characterize the effects of the longitudinal biomarkers as well as the baseline covariates on the survival time, to understand the pattern of changes of the longitudinal biomarkers, and to study the joint evolution of the longitudinal and survival processes. Examples that appear frequently in the literature are HIV (human immunodeficiency virus) clinical trials, where the CD4 (cluster of differentiation 4) counts are collected at baseline and subsequent clinical visits and the event of interest is progression to AIDS (acquired immunodeficiency syndrome) or death. Many existing methods are well-suited to analyzing the two outcomes separately; however, difficulties arise when: (1) the longitudinal measurements are only available intermittently or may be contaminated by measurement errors, and (2) occurrence of the event, such as death, results in the patient's informative dropout from the longitudinal study. To overcome these difficulties and provide valid inference, jointly modeling the survival and longitudinal processes is necessary and has become a mainstream research topic.
Under the joint modeling framework, the hazard for the survival time and the model for the longitudinal process are assumed to depend jointly on some latent random effects. Many different methods have been proposed for parameter estimation, e.g., the two-stage method (Tsiatis, Degruttola, and Wulfsohn 1995), the Bayesian approach (Faucett and Thomas 1996; Xu and Zeger 2001; Brown and Ibrahim 2003), the maximum likelihood approach (Wulfsohn and Tsiatis 1997; Song, Davidian, and Tsiatis 2002; Hsieh, Tseng, and Wang 2006) and the conditional score approach (Tsiatis and Davidian 2001). For the maximum likelihood approach, computations are typically performed by the EM (expectation-maximization) algorithm (Dempster, Laird, and Rubin 1977). Several excellent review papers, e.g., Tsiatis and Davidian (2004), Yu, Law, Taylor, and Sandler (2004), and Gould et al. (2015), are available. The edited volume by Verbeke and Davidian (2008), the book by Rizopoulos (2012), a special issue in Statistical Methods in Medical Research by Rizopoulos and Lesaffre (2014), and a recent book by Elashoff, Li, and Li (2016) provide additional perspectives on joint modeling approaches.
Parametric assumptions on the baseline hazard functions of the survival times are common in the joint modeling literature to facilitate likelihood inference. These lead to tractable computation and are practical when the assumptions can be verified. The joint modeling implementations in SAS (SAS Institute Inc. 2013) and WinBUGS (Lunn, Thomas, Best, and Spiegelhalter 2000; Spiegelhalter, Thomas, Best, and Lunn 2003), as discussed by Guo and Carlin (2004), are based on such parametric assumptions, where the baseline hazard functions are assumed to follow Weibull distributions. To avoid the strong model assumptions and the extra effort of goodness-of-fit tests, some later software tools approximate the baseline hazard function by piecewise constant or spline functions, e.g., the SAS macro JMFit (Zhang, Chen, Ibrahim, Boye, and Shen 2016), the R (R Core Team 2020) packages JM (Rizopoulos 2010), JMbayes (Rizopoulos 2016), lcmm (Proust-Lima, Philipps, and Liquet 2017; Proust-Lima, Philipps, Diakite, and Liquet 2019) and frailtypack (Rondeau, Mazroui, and Gonzalez 2012; Król, Mauguen, Mazroui, Laurent, Michiels, and Rondeau 2017), as well as the Stata (StataCorp 2019) module stjm (Crowther, Abrams, and Lambert 2013). Both piecewise constant and spline function approximations are in the spirit of non-parametrics, and the survival component can indeed be modeled non-parametrically if the number of knots is allowed to increase with the sample size when selected by a model selection criterion. However, as currently implemented in these software packages, the tuning parameters (e.g., the number and locations of the knots) for the survival component are fixed and not data adaptive. Therefore, the approximations correspond to flexible parametric models, which have their merits but may contain non-ignorable bias.
Some other software packages, e.g., R packages joineR (Philipson, Sousa, Diggle, Williamson, Kolamunnage-Dona, and Henderson 2020) and joineRML (Hickey, Philipson, Jorgensen, and Kolamunnage-Dona 2018), adopted the Cox proportional hazards model (Cox 1972) for the survival times, leaving the baseline hazard completely unspecified. However, joineR and joineRML estimate the standard errors of the regression parameters through the bootstrap method of Efron (1994), which is computationally intensive and may suffer from convergence problems. The package JM also has an option to apply a completely unspecified baseline hazard, but it underestimates the standard errors of the regression parameters in that case.
In light of this and the difficulty of choosing a parametric baseline hazard function, we follow the semi-parametric spirit of the Cox model: we focus on joint modeling with semi-parametric models for the survival time and provide valid and computationally efficient standard error estimates in the JSM package (Xu, Hadjipantelis, and Wang 2020), which is freely available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=JSM. The acronym JSM stands for "joint statistical modeling" and reflects the fact that the package explores many different joint models. The semi-parametric models are not restricted to the Cox proportional hazards model and can be chosen from a class of transformation models presented in Section 2. For the longitudinal process, both a linear mixed effects model and a non-parametric multiplicative random effects model (Ding and Wang 2008) are incorporated. Since it might be difficult to detect the time trends in the linear mixed effects model, JSM, just like JM, also allows users to employ B-spline basis functions as an option to model the time trend of the longitudinal data. This adds flexibility to an otherwise parametric approach for the longitudinal component. An additional feature of JSM is that it includes two ways to link the longitudinal and survival processes: one where the longitudinal process serves as a covariate of the survival outcome, and another where the longitudinal and survival data each have their own model but are linked by sharing common random effects. This results in four different types of joint models between the survival and longitudinal components. Details are given in Section 2, where model fitting and estimation are presented. Precision estimation for the procedures is presented in Section 3, where two standard error estimation approaches are described.
These are the profile likelihood method of Murphy and Van der Vaart (1999, 2000) and the method of numerically differentiating the profile Fisher score by Xu, Baines, and Wang (2014). Details about the implementation of the algorithms in JSM and two illustrative data examples (a liver cirrhosis dataset and a primary biliary cirrhosis dataset) are provided in Sections 4 and 5, respectively. Finally, Section 6 concludes with a short summary and discussion.

Model setup
In Wulfsohn and Tsiatis (1997), the longitudinal and survival outcomes are modeled with a linear mixed effects model and a Cox proportional hazards model, respectively. We adopt a more flexible framework here as described below.
For subject i (i = 1, 2, . . . , n), let T_i be the event time, which may be right censored by the censoring time C_i. Thus, instead of observing T_i, we only observe V_i = min(T_i, C_i) and the censoring indicator Δ_i = I(T_i ≤ C_i). Let Y_i(t) denote the longitudinal outcome for subject i evaluated at time t, which is the sum of a "true" underlying value m_i(t) and a measurement error. The longitudinal measurements are recorded intermittently. Two models for m_i(t) are incorporated. The first is the frequently used linear mixed effects model (1), where X_i(t) and Z_i(t) are vectors of observed covariates for the fixed and random effects, respectively. Each of the covariates in X_i(t) and Z_i(t) can be either time-independent or time-dependent. The vector β denotes the unknown regression coefficients for the fixed effects, while the vector b_i denotes the random effects. Model (1) is an additive model in which users specify the covariates X_i and Z_i they prefer. We note here that users can choose to model both sets of covariates through B-spline bases, with the knots either fixed by the user or chosen through a model selection criterion. Thus, model (1) can be adapted into a non-parametric approach.
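In standard notation (our reconstruction; t_ij denotes the jth measurement time for subject i, j = 1, . . . , n_i, and ε_ij the measurement error), the linear mixed effects model (1) reads:

```latex
Y_i(t_{ij}) \;=\; m_i(t_{ij}) + \varepsilon_{ij}
           \;=\; X_i(t_{ij})^\top \beta + Z_i(t_{ij})^\top b_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2).
```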
The second model incorporated in JSM is the non-parametric multiplicative random effects model (2) proposed in Ding and Wang (2008), where B(t) = (B_1(t), . . . , B_L(t)) is a vector of B-spline basis functions and γ is the corresponding regression coefficient vector. The number of basis functions in B(t) (i.e., L) is not fixed but is instead chosen by some model selection criterion, e.g., AIC (Akaike information criterion) or BIC (Bayesian information criterion). This makes model (2) "non-parametric" in the sense that the mean function of m_i(t) is modeled non-parametrically. Note that the b_i in (2) is different from the b_i in (1): the former is a single scalar random effect rather than a vector of random effects, which dramatically lowers the computational cost. In model (2), B(t)γ is the population mean, and each subject is assumed to have a longitudinal profile proportional to it, with b_i capturing the variation. Such a model may seem restrictive at first glance but is applicable in many real life examples and has the advantage that it allows for non-parametric fixed effects. For instance, it is applicable when the major mode of variation is along the population mean function and the first functional principal component (Yao, Müller, and Wang 2005) explains a high proportion of the total variation of the data. This is illustrated by the primary biliary cirrhosis data in Section 5, where the first functional principal component explains about 80% of the total variation. To make the model more broadly applicable, we also allow baseline covariates to be included as columns of B(t) in addition to the B-spline basis functions.
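Analogously, the multiplicative random effects model (2) takes the form (our reconstruction, same notation as above):

```latex
Y_i(t_{ij}) \;=\; m_i(t_{ij}) + \varepsilon_{ij},
\qquad m_i(t) \;=\; b_i \, B(t)^\top \gamma,
```

where b_i is a scalar random effect centered at one, so that B(t)^⊤γ plays the role of the population mean curve.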
To complete the model specification of the longitudinal processes, we need to impose proper distributional assumptions on the b_i in (1) and (2). Assuming normality is a standard choice which provides computational advantages, as described in Section 2.3. Although this is a parametric assumption, simulation studies reported in Song et al. (2002) suggest that the maximum likelihood estimates in these joint models are remarkably robust against departures from normal random effects. A theoretical explanation for this robustness was later provided in Hsieh et al. (2006). Therefore, we assume b_i ∼ N(0, Σ_b) in (1) and b_i ∼ N(1, σ_b²) in (2) in our JSM package. For the survival processes, the Cox proportional hazards model is used most frequently, with hazard functions depending on a vector W_i(t) of observed covariates that can include both baseline covariates, which are constant over time t, and other longitudinal covariates observed continuously without error, such as interactions between treatments and time effects. In reality, the proportional hazards assumption may be violated. For example, it is a common phenomenon in clinical studies that patients under a more aggressive treatment may suffer an elevated risk of death in the beginning but benefit in the long term if they manage to tolerate the treatment. In such cases, the proportional hazards assumption does not hold and more flexible models for the survival processes are needed. The transformation model proposed by Dabrowska and Doksum (1988) and studied in Zeng and Lin (2007) is a general class of models which includes both the Cox proportional hazards model and the proportional odds model (Bennett 1983) as special cases, so we adopt this general approach in JSM. We make a remark here about the terminology "transformation model".
There are several transformation models in the literature, but we adopt the original name proposed in Dabrowska and Doksum (1988), which is the most commonly used designation in the survival literature. For a strictly increasing and continuously differentiable function G(·), the cumulative hazard functions in the transformation model (Zeng and Lin 2007) are modeled as in (3). Common choices for G(·) include the Box-Cox transformations and the logarithmic transformations. In our JSM package, we focus on the logarithmic transformations (4), where ρ = 0 corresponds to G(x) = x, yielding the Cox proportional hazards model, and ρ = 1 corresponds to G(x) = log(1 + x), yielding the proportional odds model. For general ρ > 0, it yields the "proportional γ odds model" (Dabrowska and Doksum 1988); here we use ρ instead of γ since γ has already been used in (2). Note that we leave the baseline hazard λ_0(·) completely unspecified, as discussed in Section 1, so the resulting survival model is a very flexible semi-parametric model.
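Written out, the transformation model for the cumulative hazard and the logarithmic transformation family take the following standard forms (our reconstruction of (3) and (4) from the surrounding text, with φ the regression coefficients for W_i(t) and α the association parameter for m_i(t)):

```latex
\Lambda_i(t) \;=\; G\!\left( \int_0^t \lambda_0(s)\,
      \exp\{ W_i(s)^\top \varphi + \alpha\, m_i(s) \}\, \mathrm{d}s \right),
\qquad
G_\rho(x) \;=\;
\begin{cases}
  \dfrac{\log(1 + \rho x)}{\rho}, & \rho > 0, \\[6pt]
  x, & \rho = 0,
\end{cases}
```

so that ρ = 0 recovers the Cox proportional hazards model and ρ = 1 the proportional odds model, as stated above.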
Lastly, we enlarge the scope of JSM by introducing another type of dependency between the survival and longitudinal outcomes, which differs from model (3). In (3), the entire longitudinal process (free of error) enters the survival model as a covariate. In JSM we also incorporate the dependency where the survival and longitudinal outcomes only share the same random effects, with (5) and (6) corresponding to longitudinal models (1) and (2), respectively. Such a shared random effects model is quite common and has been adopted by Henderson, Diggle, and Dobson (2000) among others. The choice between (3) and (5) (or (6)) depends on the scientific question of interest and is left to the user.

Likelihood formulation
In the JSM package, we focus on the maximum likelihood approach and follow the model setup in Section 2.1. In the following, we illustrate only one of the four joint models discussed there: the likelihood based on the joint model combining (1) and (3) and the corresponding EM algorithm are derived as an illustrative example; those based on the other joint models follow analogously.
Let O = {(V_i, Δ_i, Y_i) : i = 1, . . . , n} and O augmented with the random effects {b_i : i = 1, . . . , n} denote the observed and complete data, respectively. The parameter to be estimated is ψ = (θ, Λ_0), with θ = (φ, α, β, σ_e², Σ_b) being the finite dimensional (p-dimensional) parameter and Λ_0 the non-parametric cumulative baseline hazard. To derive the joint likelihood function, the following common assumptions are made: (i) the measurement errors ε_ij and the random effects b_i are independent; (ii) the censoring time and the longitudinal measurement times are both non-informative; (iii) the random effects b_i fully capture the association between the longitudinal and survival outcomes (i.e., the longitudinal and survival outcomes are conditionally independent given b_i). The joint likelihood for the observed data can then be expressed as in (7). Direct maximization of (7) is computationally challenging due to the integral (possibly multidimensional) and the infinite dimensional parameter Λ_0. Fortunately, by treating the random effects b_i as "missing data", the EM algorithm can be readily applied to obtain the maximum likelihood estimates (MLEs). Note that the likelihood approach leads to non-parametric maximum likelihood estimates (NPMLEs; Kiefer and Wolfowitz 1956) for the parameters, where Λ̂_0, the estimate of Λ_0, is a step function with jumps only at the uncensored survival times. Details of the algorithm are provided in the next section.
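Under assumptions (i)-(iii), the observed-data likelihood (7) factorizes over subjects into longitudinal and survival contributions linked through the random effects; in standard form (our notation), it reads:

```latex
L_n(\psi \mid \mathcal{O})
  \;=\; \prod_{i=1}^n \int
  \Bigl[ \prod_{j=1}^{n_i} f\bigl(Y_i(t_{ij}) \mid b_i; \theta\bigr) \Bigr]
  \, f(V_i, \Delta_i \mid b_i; \psi)
  \, f(b_i; \theta)\, \mathrm{d}b_i,
```

where the survival density f(V_i, Δ_i | b_i; ψ) involves the unspecified baseline hazard λ_0, which is what makes direct maximization difficult.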

EM algorithm
Since it was first systematically introduced by Dempster et al. (1977), the EM algorithm has been extensively used to find the MLEs in statistical models involving unobserved latent variables. When a general class of transformation models is assumed for the survival outcomes, the EM algorithm is not as straightforward as that described in Wulfsohn and Tsiatis (1997), where the Cox proportional hazards model is assumed and the Breslow-type form suggested by Breslow (1972) is available for estimating Λ_0 in the M-step without relying on any numerical maximization method. The transformation G(·) prohibits the simple Breslow-type estimator for Λ_0 in the M-step; instead, a high dimensional non-linear optimization would be needed to estimate Λ_0. To overcome this difficulty, we adopt the clever idea of Zeng and Lin (2007) and introduce an artificial random effect that is treated as "missing data" in the EM algorithm.
To elaborate on this idea, let exp{−G(x)} be the Laplace transform of some density function π(x); such a function exists for the class of logarithmic transformations defined in (4). The likelihood contribution in (8) can then be rewritten as an integral over an artificial random effect ξ_i with density π(·), where the integrand is the likelihood function under the Cox proportional hazards model with conditional hazard function ξ_i λ_0(t) exp{W_i(t)φ + α m_i(t)}. Treating both b_i and ξ_i as "missing data", the complete-data joint log-likelihood can be written, up to a constant, in closed form, where N = Σ_{i=1}^n n_i is the total number of longitudinal measurements. Let K be the number of distinct uncensored survival times ({V_i : Δ_i = 1}) with t_1 < t_2 < . . . < t_K the corresponding order statistics; then the NPMLE of Λ_0 is a step function with jumps at {t_1, t_2, . . . , t_K}. Replacing Λ_0(·) by the jump sizes {Λ_1, Λ_2, . . . , Λ_K}, the target function Q(ψ|ψ̃) to be evaluated in the E-step and maximized in the M-step can be written as in (11), with ψ̃ = (θ̃, Λ̃_0) being the current parameter estimate. By maximizing Q(ψ|ψ̃) w.r.t. Λ_k, we obtain the NPMLE in (12), which is a Breslow-type estimate. Moreover, it is straightforward to obtain closed-form updates for the remaining variance parameters. However, there are no explicit expressions for the NPMLEs of β, φ and α, so one-step Newton-Raphson updates within each EM iteration are needed. The score functions for the Newton-Raphson updates are computed by taking the derivative of (11) w.r.t. the corresponding parameters and are not shown here.

In the E-step, the required conditional expectations are obtained by first integrating out the artificial random effect ξ_i analytically, by plugging in (13), leaving integrations over b_i only. This suggests that by introducing an artificial random effect ξ_i, we greatly simplify the M-step while not complicating the E-step. We note that the integrations over b_i in the E-step do not have explicit solutions and have to be computed numerically. Fortunately, assuming a normal distribution for b_i enables the application of the (adaptive) Gauss-Hermite quadrature, which provides more numerical precision and efficiency than other numerical integration methods, e.g., Monte Carlo integration and the Laplace approximation.
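As a self-contained illustration of the Gauss-Hermite idea (a generic sketch, not JSM's internal code; JSM lists statmod, which provides quadrature routines, among its dependencies), the snippet below computes the nodes and weights via the Golub-Welsch eigenvalue method in base R and uses them to approximate an expectation over a normal random effect:

```r
# Gauss-Hermite nodes and weights via the Golub-Welsch algorithm
# (weight function exp(-x^2)); base R only, no packages required.
gauss_hermite <- function(n) {
  off <- sqrt(seq_len(n - 1) / 2)          # off-diagonal of the Jacobi matrix
  J <- matrix(0, n, n)
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values,                   # eigenvalues are the nodes
       weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Approximate E[g(b)] for b ~ N(mu, sigma^2) via a change of variables:
# E[g(b)] = (1/sqrt(pi)) * sum_k w_k * g(mu + sqrt(2) * sigma * x_k)
gh_expectation <- function(g, mu, sigma, n = 9) {
  q <- gauss_hermite(n)
  sum(q$weights * g(mu + sqrt(2) * sigma * q$nodes)) / sqrt(pi)
}

# With 9 nodes the rule is exact for polynomials up to degree 17, so the
# second moment E[b^2] = mu^2 + sigma^2 is recovered essentially exactly:
gh_expectation(function(b) b^2, mu = 1, sigma = 0.5)  # mu^2 + sigma^2 = 1.25
```

The adaptive version used by JSM additionally recenters and rescales the nodes at each EM iteration around the current conditional distribution of b_i, which is why fewer quadrature points suffice.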

Standard error estimation
For the standard error (SE) estimation under semi-parametric joint modeling settings, i.e., when the baseline hazard function λ_0(t) is left completely unspecified, Xu et al. (2014) proposed two new methods and systematically examined their performance along with the profile likelihood method of Murphy and Van der Vaart (1999, 2000) and the bootstrap method of Efron (1994). According to their results, numerically differentiating the profile Fisher score with Richardson extrapolation (PRES) is recommended as the best choice. In our JSM package, we use the PRES method as the default but also provide two alternative approaches: the PFDS method (numerically differentiating the profile Fisher score with forward differences) of Xu et al. (2014) and the PLFD method (numerically computing the second derivative of the profile likelihood function with forward differences) of Murphy and Van der Vaart (1999). The bootstrap method, being computationally intensive, is not implemented in JSM, but users can easily construct their own bootstrap SE estimators using our code. Details of the methods are described below.

PLFD: Numerically differentiating the profile likelihood function
Our interest is mainly in inference for the finite dimensional parameter θ. The PLFD method obtains the observed-data information matrix I_o of θ at θ* (here ψ* = (θ*, Λ*_0) denotes the NPMLE of ψ) by taking the second derivative of the profile likelihood function with forward differences. The profile likelihood function is obtained by profiling Λ_0 out of L_n defined in (7): for a given θ, Λ_0 is replaced by Λ̂_0(θ), i.e., the estimate of Λ_0 given any θ. Second differences of the profile log-likelihood along the coordinate vectors e_i, with increment δ > 0 used by the forward difference, then yield I_o, and the variance-covariance matrix estimate of θ at θ* is obtained by inverting I_o.
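Concretely, writing pl_n(θ) = max_{Λ_0} log L_n(θ, Λ_0 | O) for the profile log-likelihood, the forward-difference approximation used by PLFD can be sketched as (our notation):

```latex
\bigl(I_o\bigr)_{ij} \;\approx\;
 -\frac{\,pl_n(\theta^* + \delta e_i + \delta e_j)
        - pl_n(\theta^* + \delta e_i)
        - pl_n(\theta^* + \delta e_j)
        + pl_n(\theta^*)\,}{\delta^2},
\qquad
\widehat{\mathrm{Cov}}(\hat\theta) \;=\; I_o^{-1}.
```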

PRES/PFDS: Numerically differentiating the profile Fisher score
The key idea of the PRES and PFDS methods is that the observed-data information matrix can be approximated by numerically differentiating the Fisher score derived from (11). Note that this score is a (p + K)-dimensional vector, where K is the number of distinct uncensored survival times, which is usually large and increases with the sample size. Luckily, differentiating the entire Fisher score vector is unnecessary: Xu et al. (2014) proposed a profile version, the profile Fisher score (15), which is a p-dimensional vector with Λ̂_k(θ) corresponding to the Λ̂_k defined in (12). The step-by-step algorithm of the PRES and PFDS methods is then: (1) compute the NPMLE θ* and the profile Fisher score defined by (15); (2) for l = 1, 2, 3, 4, evaluate the profile Fisher score at perturbed parameter values as in (16); (3) compute the ith row of I_o from these evaluations by a finite-difference formula (with denominator 12δ in the Richardson extrapolation version).
In step 3, the difference between the PRES and PFDS methods is that PFDS adopts the forward differencing method to estimate each derivative, while PRES adopts the more stable Richardson extrapolation method. The PRES method costs more computing time but provides more reliable SE estimates, according to the numerical experiments in Xu et al. (2014).
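The numerical ingredient that distinguishes PFDS from PRES can be illustrated on a scalar function (a generic sketch, not JSM's internal code): forward differencing has error of order δ, while the four-point Richardson-extrapolated rule with denominator 12δ has error of order δ⁴.

```r
# Forward-difference derivative: truncation error O(delta)
fd_forward <- function(f, x, delta) (f(x + delta) - f(x)) / delta

# Richardson-extrapolated central difference: truncation error O(delta^4);
# note the 12 * delta denominator, as in step 3 of the PRES algorithm
fd_richardson <- function(f, x, delta) {
  (f(x - 2 * delta) - 8 * f(x - delta) +
   8 * f(x + delta) - f(x + 2 * delta)) / (12 * delta)
}

delta <- 1e-3
abs(fd_forward(sin, 1, delta) - cos(1))     # error around 4e-4
abs(fd_richardson(sin, 1, delta) - cos(1))  # error below 1e-10
```

The same trade-off carries over to differentiating the profile Fisher score: PRES needs four score evaluations per coordinate instead of one, hence the extra computing time.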

JSM: Implementation details
The JSM package is an add-on package for R with two model-fitting functions named jmodelTM() and jmodelMult(). It requires the additional R packages nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2020), Rcpp (Eddelbuettel and François 2011), RcppEigen (Bates and Eddelbuettel 2013), splines (R Core Team 2020), statmod (Giner and Smyth 2016) and survival (Therneau and Grambsch 2000). For the longitudinal outcome, jmodelTM() fits the linear mixed effects model in (1), while jmodelMult() fits the non-parametric multiplicative random effects model in (2), as described in Section 2.1. For the survival outcome, both jmodelTM() and jmodelMult() accommodate the transformation models with the class of logarithmic transformations defined by (4).
The model argument of jmodelTM() and jmodelMult() specifies the dependency between the survival and longitudinal outcomes adopted by the joint model. If model = 1, the entire longitudinal outcome enters the survival model as a covariate as described by (3); otherwise, the survival model only shares the same random effects with the longitudinal model, as in (5) (for jmodelTM()) and (6) (for jmodelMult()). In addition, the rho argument allows users to choose the ρ in the logarithmic transformations (4) to specify the survival model they would like to fit. For example, the default rho = 0 indicates that a Cox proportional hazards model is fitted, whereas a proportional odds model is fitted with rho = 1. Sophisticated users can also apply their own model selection method to choose the value of rho using the code in JSM; see the example in Section 5.1.
The control argument of jmodelTM() and jmodelMult() is a list of control values for the estimation algorithm with components: tol.P, tol.L, max.iter, SE.method, delta and nknot. Section 2.3 describes the EM algorithm, an iterative algorithm, used to obtain the parameter estimates. Naturally, a convergence criterion is needed to implement the algorithm, and in JSM we claim convergence whenever the relative change between successive EM iterations, either in the parameter estimates ψ^(t) or in the observed-data log-likelihood ℓ_n(ψ) = log L_n(ψ|O) defined by (7), is sufficiently small, where ψ^(t) denotes the estimated parameters in the tth EM iteration. The default values of tol.P and tol.L are 10^-3 and 10^-6, respectively, which are standard choices for claiming convergence of iterative algorithms. The component max.iter states the maximum number of iterations for the EM algorithm, with a default value of 250. The method used to compute the standard error estimates is specified via SE.method, which can be set to "PRES" (the default), "PFDS" or "PLFD" (refer to Section 3 for detailed explanations). Moreover, the increment δ used for numerical differentiation (as in (14) and (17)) can be customized through the component delta. By default, delta = 10^-5 if SE.method = "PRES" and delta = 10^-3 otherwise, according to the simulation studies performed in Xu et al. (2014).
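One common formulation of such a two-part stopping rule, with tol.P governing the relative change in the parameters and tol.L that of the log-likelihood (shown as an illustration in our notation; the package documentation gives the exact form used), is:

```latex
\max_j \frac{\bigl|\psi_j^{(t)} - \psi_j^{(t-1)}\bigr|}
            {\bigl|\psi_j^{(t-1)}\bigr| + \texttt{tol.L}} \;<\; \texttt{tol.P}
\qquad \text{or} \qquad
\frac{\bigl|\ell_n(\psi^{(t)}) - \ell_n(\psi^{(t-1)})\bigr|}
     {\bigl|\ell_n(\psi^{(t-1)})\bigr| + \texttt{tol.L}} \;<\; \texttt{tol.L}.
```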
On the other hand, nknot stands for the number of quadrature points used in the Gauss-Hermite rule for numerical integration. As mentioned at the end of Section 2.3, the integrations involved in the E-step of the EM algorithm are computed by applying the Gauss-Hermite quadrature rule. In JSM, we implement the adaptive Gauss-Hermite rule (Pinheiro and Bates 1995) which centers and scales the quadrature points in each EM iteration according to the conditional distribution of the random effects with current parameter ψ (t) , i.e., f (b i |Y i ; ψ (t) ). Although it seems to demand more computational effort, the adaptive rule is actually more efficient since it requires fewer quadrature points to reach the same level of precision. The default values of nknot differ between jmodelTM() and jmodelMult(). For jmodelTM(), nknot = 9 for a one-dimensional random effect; nknot = 7 for a two-dimensional random effect; nknot = 5 for a higher-dimensional random effect. For jmodelMult(), nknot = 11. The reason why we assign a larger default value to nknot for jmodelMult() is that the model assumes a multiplicative random effect which requires more quadrature points to reach a satisfactory level of accuracy. These default values are selected based on a numerical study regarding the performance of the model-fitting functions with different values of nknot, which is provided in Appendix A.
The following commonly used methods are currently available for the objects returned by jmodelTM() and jmodelMult(): AIC() and BIC() for computing AIC and BIC values; confint() for returning the confidence intervals of the parameter estimates; fitted() and ranef() for extracting the fitted values and random effects; residuals() for computing the residuals; summary() for outputting model-fitting details; vcov() for returning the variance-covariance matrix of the parameter estimates. Although the residuals() method is provided, we do not encourage its usage for diagnostic or model-assessment purposes under the joint modeling setting. An example in Section 5.1 explains the reason.
The JSM package also provides a data preprocessing function called dataPreprocess() to prepare the data in a format that can be used to fit the joint models. An example is given in Section 5.1.

Real data analysis using JSM
The usage of jmodelTM() and jmodelMult() is illustrated through two examples.

Example I: Liver cirrhosis data analysis
Consider a randomized controlled trial involving 488 liver cirrhosis patients who were randomly allocated to prednisone (251) or placebo (237). The patients were followed until death or the end of the study, and variables such as the prothrombin index (a measure of liver function) were recorded at study entry and at several follow-up visits during the trial, which differed among patients. Therefore, both survival and longitudinal data were collected, and the main purpose is to investigate the development of the prothrombin level over time as well as its relationship with the survival outcome. The data, available in JSM under the name liver, were reported in Andersen, Borgan, Gill, and Keiding (1993) and have been analyzed by Henderson, Diggle, and Dobson (2002). Figure 1 summarizes the data.
R> library("JSM")
R> liver.unq <- liver[!duplicated(liver$ID), ]
R> liver.pns <- liver[liver$Trt == "prednisone", ]
R> liver.pcb <- liver[liver$Trt == "placebo", ]
R> times.pns <- sort(unique(liver.pns$obstime))
R> times.pcb <- sort(unique(liver.pcb$obstime))
R> means.pns <- tapply(liver.pns$proth, liver.pns$obstime, mean)
R> means.pcb <- tapply(liver.pcb$proth, liver.pcb$obstime, mean)
R> par(mfrow = c(1, 2))
R> plot(lowess(times.pns, means.pns, f = 0.3), type = "l", col = "blue",
+   ylab = "Prothrombin index", xlab = "Time (years)", ylim = c(50, 150),
+   main = "Smoothed mean prothrombin index", lwd = 2)
R> grid()
R> lines(lowess(times.pcb, means.pcb, f = 0.3), col = "red", lty = 2,
+   lwd = 2)
R> legend("topright", c("Prednisone", "Placebo"), col = c("blue", "red"),
+   lty = 1:2, lwd = 2, bty = "n")
R> plot(survfit(Surv(Time, death) ~ Trt, data = liver.unq), lty = c(2, 1),
+   col = c("red", "blue"), mark.time = FALSE, ylab = "Survival",
+   xlab = "Time (years)", main = "Kaplan-Meier survival curves", lwd = 2)
R> grid()
R> legend("topright", legend = c("Prednisone", "Placebo"), lty = 1:2,
+   col = c("blue", "red"), lwd = 2, bty = "n")

We observe from the left plot of Figure 1 that the average prothrombin level increases over time in both groups and that the trend is approximately linear. Therefore, we fit a linear mixed effects model for the mean prothrombin response Y, assuming a linear trend in time with a different intercept and slope for each treatment group. For the survival component we first adopt the proportional hazards model with a single treatment effect, as was done in Andersen et al. (1993) and Henderson et al.
(2002), and then fit the joint model. We start by fitting the linear mixed effects model and the proportional hazards model separately:

R> fitLME <- lme(proth ~ Trt * obstime, random = ~ obstime | ID,
+   data = liver)
R> fitCOX <- coxph(Surv(start, stop, event) ~ Trt, data = liver, x = TRUE)

Note that the variables start and stop denote the limits of the time intervals between visits, and event denotes whether death occurred by the end of each interval. When the user does not have the longitudinal and survival data in a single dataset, dataPreprocess() can help to merge the data and generate the corresponding start, stop, and event columns. For example, liver.long contains only the longitudinal data for the liver cirrhosis patients, with one row per longitudinal measurement, while liver.surv contains only the survival data, with one row per patient. Then liver.join, returned by the following code, can be used in the same way as the liver data which we are using:

R> liver.join <- dataPreprocess(liver.long, liver.surv, "ID", "obstime",
+   "Time", "death")

The argument x = TRUE is required in the call to coxph() to include the design matrix of the fitted model in the returned object fitCOX. The objects fitLME and fitCOX are then supplied to jmodelTM() to fit the joint model:

R> fitJT.ph <- jmodelTM(fitLME, fitCOX, liver, timeVarY = "obstime")
R> summary(fitJT.ph)

Call:
jmodelTM(fitLME = fitLME, fitCOX = fitCOX, data = liver, timeVarY = "obstime")

Note that jmodelTM() requires the argument timeVarY to specify the name of the time variable used in the linear mixed effects model. The results suggest that the linear time trend of the longitudinal process is not statistically significant, since the p value of β_3 is 0.158. This seems contradictory to the strong linear upward trend of the prothrombin level in the left plot of Figure 1, but it is in fact a perfect example of the bias incurred by a marginal, i.e., separate, analysis of the longitudinal data.
This bias, as also pointed out by Henderson et al. (2002), arises because the observed mean profiles are averages over the patients who remain alive at each time point: the observed upward trend reflects not a substantive increase in mean prothrombin level, but the progressive loss of high-risk patients with low prothrombin levels as time goes on. The joint analysis corrects this bias by leveraging information from the survival component; the fitted mean profiles for both groups now appear quite flat in the left plot of Figure 2 (generated by the R code below), which shows the observed and fitted mean longitudinal profiles.
R> fittedVal <- fitted(fitJT.ph)
R> fitted.pns <- fittedVal[liver$Trt == "prednisone"]
R> fitted.pcb <- fittedVal[liver$Trt == "placebo"]
R> residVal <- residuals(fitJT.ph)
R> par(mfrow = c(1, 2))
R> plot(lowess(times.pns, means.pns, f = 0.3), type = "l", col = "blue",
+    ylab = "Prothrombin index", xlab = "Time (years)", ylim = c(50, 150),
+    main = "Smoothed mean prothrombin index", lwd = 2)
R> grid()
R> lines(lowess(liver.pns$obstime, fitted.pns, f = 0.3),
+    lwd = 2, col = "deepskyblue")
R> lines(lowess(liver.pcb$obstime, fitted.pcb, f = 0.3),
+    lwd = 2, lty = 2, col = "lightsalmon")
R> plot(fittedVal, residVal, pch = 20, col = "gray", xlab = "Fitted Values",
+    ylab = "Residuals", main = "Residuals vs. Fitted Values")
R> grid()
R> lines(lowess(fittedVal, residVal, f = 0.5), lwd = 2, col = "red")
R> abline(h = 10, lty = 5, lwd = 2)

Caution about residual plots. The right plot of Figure 2 shows the residuals versus fitted values for the prothrombin index, a display commonly used to check for misspecification of the mean structure. If the mean structure is correctly specified, the residuals should behave randomly around zero. A caveat in the joint modeling setting, however, is the bias induced by the informative dropout inherent in the longitudinal data when the event time is death. It is therefore not surprising that the residual plot shows a systematic pattern. In other words, systematic behavior of the residuals does not necessarily indicate lack of fit under the joint modeling setting, and model diagnostics based on residual plots are not encouraged.

Our next analysis fits a joint model with a proportional odds model for the survival outcome instead of a proportional hazards model. This is motivated by the fact that the two survival functions in the right plot of Figure 1 cross each other in the right tail, suggesting that the proportional odds model might fit these data better than the proportional hazards model.
Thus, we adopt the following survival model:

Λ(t | b_i, Trt_i) = log[ 1 + ∫_0^t λ_0(s) exp{φ Trt_i + α m_i(s)} ds ],

and the same model as in (18) for the longitudinal outcome. This is achieved by specifying rho = 1 in jmodelTM(); the corresponding R code, together with the AIC values, is given below. fitJT.po has a smaller AIC value than fitJT.ph (30128.2 versus 30130.2), suggesting that under the Akaike information criterion the proportional odds model is preferred over the proportional hazards model for the survival outcome in the joint model for the liver cirrhosis data.
R> fitJT.po <- jmodelTM(fitLME, fitCOX, liver, rho = 1, timeVarY = "obstime")
R> AIC(fitJT.ph); AIC(fitJT.po)
[1] 30130.18
[1] 30128.19

In addition to the proportional odds model, we also tried the logarithmic transformation model (4) with other values of ρ. Since a very large sample size would be needed to estimate the optimal transformation parameter ρ, and the exact optimal ρ is not necessary in practice, we conduct a simple grid search over the candidate ρ values.
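Such a grid search can be sketched as follows; the candidate grid is an illustrative assumption, not necessarily the grid used to produce Figure 3, while jmodelTM() and AIC() are used exactly as in the calls above:

```r
# Hedged sketch of the grid search over the transformation parameter rho;
# the candidate values below are illustrative.
rho.grid <- c(0, 0.5, 1, 1.5, 2)
aic.vals <- sapply(rho.grid, function(r)
  AIC(jmodelTM(fitLME, fitCOX, liver, rho = r, timeVarY = "obstime")))
rho.grid[which.min(aic.vals)]  # rho = 1 minimizes AIC for the liver data
```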
The results (Figure 3) strongly suggest that the value ρ = 1 offers the best model (i.e., the proportional odds model) according to AIC. While interpreting the covariate effects on the hazard function under the logarithmic transformation is straightforward when ρ = 0 or ρ = 1, the interpretation under more general values of ρ is less transparent. Below we provide an interpretation.
Following the idea of Dabrowska and Doksum (1988), we define the "odds function"

Γ_T(t) = {S_T(t)^{-ρ} − 1} / ρ,

where S_T denotes the survival function of the event time T. When ρ is a positive integer, ρΓ_T(t) is the odds that ρ independent individuals (with the same covariate values) would not all survive to time t. A simple derivation shows that our transformation model with transformation parameter ρ is equivalent to

Γ_T(t | b_i, Trt_i) = ∫_0^t λ_0(s) exp{φ Trt_i + α m_i(s)} ds,

so that the regression parameters under our transformation model can be interpreted in the same way as those under the proportional hazards model, the only difference being that the covariate effects act on the "odds function" instead of on the hazard function.
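Writing S_T for the survival function and Γ_T(t) = {S_T(t)^{-ρ} − 1}/ρ for the odds function, the equivalence can be checked in two lines, assuming (as in the ρ = 1 case displayed earlier) that model (4) uses the logarithmic transformation G_ρ(x) = log(1 + ρx)/ρ:

\begin{aligned}
S_T(t \mid b_i, \mathrm{Trt}_i)^{-\rho}
  &= \exp\{\rho\,\Lambda(t \mid b_i, \mathrm{Trt}_i)\}
   = 1 + \rho \int_0^t \lambda_0(s)\exp\{\phi\,\mathrm{Trt}_i + \alpha m_i(s)\}\,ds,\\
\Gamma_T(t \mid b_i, \mathrm{Trt}_i)
  &= \frac{S_T(t \mid b_i, \mathrm{Trt}_i)^{-\rho} - 1}{\rho}
   = \int_0^t \lambda_0(s)\exp\{\phi\,\mathrm{Trt}_i + \alpha m_i(s)\}\,ds.
\end{aligned}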
Therefore, if the data analysis has an emphasis on interpretation, the researcher can choose ρ as the value that results in the smallest AIC value among all possible integers. For the liver cirrhosis data, this leads to ρ = 1, i.e., the proportional odds model.

R> summary(fitJT.po)
Call:
jmodelTM(fitLME = fitLME, fitCOX = fitCOX, data = liver, rho = 1, timeVarY = "obstime")

For the longitudinal process, the results indicate that the clinical trial was not well randomized, as the prednisone group had a higher baseline prothrombin index than the placebo group: β̂2 = 6.998 with p value < 0.05 and 95% confidence interval [3.201, 10.796]. The p values of β3 and β4 are 0.309 and 0.757, respectively, suggesting that the mean prothrombin index did not vary significantly over time in either treatment group. For the survival process, the p value of φ is 0.127, indicating a lack of efficacy of prednisone. Lastly, α̂ = −0.055 with p value < 0.05 and 95% confidence interval [−0.065, −0.044] provides evidence that a high prothrombin index is associated with long-term survival. Note that the confidence intervals reported above are obtained by running confint(fitJT.po).

Example II: Mayo Clinic primary biliary cirrhosis data analysis
Here we consider a data set collected by the Mayo Clinic from a randomized controlled trial on PBC (primary biliary cirrhosis) patients conducted over a ten-year interval, from 1974 to 1984. PBC is a rare chronic liver disease that develops slowly but eventually leads to cirrhosis and even death. PBC patients usually display abnormalities in blood values related to liver function, such as serum bilirubin, albumin, and alkaline phosphatase; what triggers PBC, however, remains unclear. A total of 312 PBC patients were included in the trial, with 158 of them given the drug D-penicillamine and the other 154 given a placebo. Detailed descriptions of the clinical background and the variables recorded in the trial can be found in Fleming and Harrington (2011). Several baseline covariates, e.g., age at first diagnosis, gender, and presence of some physical symptoms, were recorded at study entry; longitudinal covariates such as the blood values mentioned above were collected at irregular subsequent visits until death or the end of the trial. Here, we focus on serum bilirubin, the most significant biomarker, and include the drug type as the only time-independent covariate, as in Ding and Wang (2008).
R> library("lattice")
R> xyplot(log(serBilir) ~ obstime | drug, group = ID, data = pbc,
+    type = "l", xlab = "Years", ylab = "log(serum bilirubin)",
+    col = "darkgray")

Figure 4 (generated by the R code above) shows the observed log-transformed serum bilirubin trajectories for all 312 patients. The variation across trajectories is clearly large, so the non-parametric multiplicative random effects model described by (2) may be more suitable than a traditional parametric model. Further support for the multiplicative random effects, based on functional principal component analysis (Yao et al. 2005), was provided in Ding and Wang (2008), and we refer the reader to that paper for more details.
We proceed by fitting a joint model in which the mean log-transformed serum bilirubin Y follows a non-parametric multiplicative random effects model with two internal knots and a quadratic B-spline basis. To show that baseline covariates can also be incorporated, drug is included in the model for Y. For the survival model we adopt the proportional hazards model, as it has been shown in the literature to fit these data well. The R code corresponding to the model is provided below. Note that fitLME.pbc is needed as an input of jmodelMult() only to specify B(t) in model (2), not to fit a linear mixed effects model for the longitudinal outcome. Supplying the Boundary.knots argument is recommended, and it should cover the longest follow-up period among all subjects in the trial.
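The model-fitting call is not reproduced in this excerpt. A hedged sketch is given below; the knot placements, boundary knots, random-effects specification, and the counting-process columns of pbc are illustrative assumptions and may differ from the exact code behind the paper's results:

```r
library("splines")  # for the B-spline basis bs()

# Sketch: quadratic B-spline basis in time with two internal knots.
# The knot locations (2, 6) and boundary knots (0, 15) are assumptions.
fitLME.pbc <- lme(log(serBilir) ~ drug + bs(obstime, degree = 2,
                    knots = c(2, 6), Boundary.knots = c(0, 15)),
                  random = ~ 1 | ID, data = pbc)
fitCOX.pbc <- coxph(Surv(start, stop, event) ~ drug, data = pbc, x = TRUE)

# jmodelMult() uses fitLME.pbc only to extract B(t) for the multiplicative
# random effects model, not to reuse the linear mixed effects fit itself.
fitJM.pbc <- jmodelMult(fitLME.pbc, fitCOX.pbc, pbc, timeVarY = "obstime")
summary(fitJM.pbc)
```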
For the longitudinal outcome, we plot the expected log-transformed serum bilirubin trajectories of six randomly selected patients in Figure 6 (R code given below). It demonstrates that the

Figure 6: Expected log-transformed serum bilirubin trajectories (solid line) of six randomly selected patients. The dots represent the observed log-transformed serum bilirubin.
shape of the longitudinal trajectories is relatively well captured by the fitted joint model, given that only a one-dimensional random effect is assumed in the model. Had a linear mixed effects model been fitted, at least a random intercept and a random slope would be needed to capture the feature that some of the individual trajectories increase over time while others do not. Although there appears to be some lack of fit, one should bear in mind that this could be due to the bias issue discussed in Example I, where the effect is more prominent for the fitted mean trajectory. For individual trajectories the effect of bias due to informative dropout is less prominent but still present.
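A plot of this kind can be produced along the following lines. This is a hedged sketch, not the code behind Figure 6: the patient IDs and panel layout are arbitrary, fitJM.pbc is assumed to be a fitted joint model returned by jmodelMult(), and the availability of a fitted() method for such objects (as used earlier for jmodelTM() fits) is assumed:

```r
# Hedged sketch: observed (dots) vs. fitted (solid line) log(serBilir)
# for six patients; the IDs below are arbitrary examples.
ids <- c(2, 25, 81, 130, 220, 268)
pbc.fit <- fitted(fitJM.pbc)  # assumed fitted() method for jmodelMult fits

par(mfrow = c(2, 3))
for (i in ids) {
  sel <- pbc$ID == i
  plot(pbc$obstime[sel], log(pbc$serBilir[sel]), pch = 20,
       xlab = "Time (years)", ylab = "log(serBilir)",
       main = paste("Patient", i))
  lines(pbc$obstime[sel], pbc.fit[sel], lwd = 2)
}
```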
For the survival outcome, the estimated α is 1.094; the extremely small p value indicates a strong association between serum bilirubin and time of death. While the estimated φ is 0.074 with p value 0.680, suggesting no direct effect of D-penicillamine on survival after adjusting for serum bilirubin, there is an indirect effect through the association of D-penicillamine with serum bilirubin, as shown by the estimated γ7 and γ10: D-penicillamine is associated with lower serum bilirubin, which is in turn associated with improved survival.

Summary
Joint modeling has emerged as a key tool in data analysis, as evidenced by the addition of several recent software tools. We briefly mentioned a number of software tools in Section 1 and here present them in more detail. The R packages JM and joineR, as well as the Stata module stjm, can be considered the first generation of software tools for joint modeling, allowing the joint analysis of a single continuous longitudinal outcome and a single survival outcome based on a mixed effects model for the longitudinal outcome and a relative risk model for the survival outcome. Each has its own characteristics, e.g., JM and joineR can model competing risks survival outcomes, while JM and stjm offer various association structures to link the survival and longitudinal processes. More recent software tools explore the functionality of joint modeling from different angles. For instance, the R package JMbayes fits joint models under a Bayesian approach using Markov chain Monte Carlo algorithms, and the R package joineRML extends the joineR package to the joint modeling of a survival outcome and multiple continuous longitudinal outcomes via a Monte Carlo EM algorithm. Beyond these tools originally designed for joint modeling, some tools designed for other purposes have also been extended to fit joint models of survival and longitudinal outcomes. For example, the R package frailtypack has added two new functions: longiPenal() (joint model for longitudinal data and a terminal event) and trivPenal() (trivariate joint model for longitudinal data, recurrent events, and a terminal event).
To summarize, this paper presents the R package JSM, which complements existing packages by offering a variety of joint models of survival and longitudinal data, most of which are not available elsewhere. The choice for the survival process is much broader compared to existing packages that typically are confined to the proportional hazards formulation. Package JSM is able to fit a whole class of transformation models, including the Cox proportional hazards model and the proportional odds model as special cases. Two different ways to model the longitudinal component (i.e., the linear mixed effects model and the non-parametric multiplicative random effects model) further distinguish JSM from existing packages. Moreover, a number of the package's internal functionalities are written in C++ in the background to improve computational speed. Last but not least, JSM provides reliable standard error estimates for valid statistical inference.
The JSM package is being actively curated and developed further to include additional features. Possible future developments are more sophisticated model selection methods, the capability to handle categorical and multiple longitudinal outcomes, and the incorporation of more complicated censoring patterns of the survival outcome.

A. Effect of the number of quadrature points
Here we present a numerical study exploring the performance of the two model-fitting functions, jmodelTM() and jmodelMult(), run with different numbers of quadrature points. The results, obtained on an Intel Core i7 4.0 GHz iMac with 16 GB DDR3 memory, are summarized in Tables 1 and 2. Note that nknot is always odd, as an odd number of quadrature points provides a higher degree of precision. From Table 1, we observe that the parameter estimates and log-likelihood values are quite stable under varying nknot. Specifically, the model fits with nknot = 7 and nknot = 11 yield exactly the same parameter estimates. Table 2 indicates that, although jmodelMult() is not as stable as jmodelTM(), the fits with nknot = 11 and nknot = 15 yield very similar results.
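The refits underlying Tables 1 and 2 can be sketched as follows; the assumption that the number of quadrature points is passed as a control-list element named nknot (matching the tables) should be checked against the package documentation:

```r
# Hedged sketch: refit the liver joint model with different numbers of
# quadrature points; "control = list(nknot = ...)" is an assumed interface.
for (k in c(7, 9, 11)) {
  fit <- jmodelTM(fitLME, fitCOX, liver, timeVarY = "obstime",
                  control = list(nknot = k))
  cat("nknot =", k, " AIC =", AIC(fit), "\n")
}
```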