Estimation of extended mixed models using latent classes and latent processes: the R package lcmm

The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can be possibly left-truncated, right-censored and defined in a competing setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the stability of the parameters and of the likelihood, and on the negativity of the second derivatives. The package also provides various post-fit functions, including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic predictions of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package: it introduces each family of models, the estimation technique and some implementation details, and gives examples through a dataset on cognitive aging.


Introduction
The linear mixed model (Laird and Ware 1982; Verbeke and Molenberghs 2000; Hedeker and Gibbons 2006; Fitzmaurice, Davidian, Verbeke, and Molenberghs 2009) has become a standard statistical method to analyze the change over time of a longitudinal Gaussian outcome and assess the effect of covariates on it. Yet, longitudinal data collected in cohort studies may be too complex to enter the framework of the linear mixed model:
- the longitudinal outcomes are not necessarily Gaussian but possibly binary, ordinal, or continuous but asymmetric (e.g., absence/presence of symptoms, psychological scales);
- not one but several longitudinal outcomes may be collected, especially when the interest is in a biological or psychological process that cannot be measured directly (e.g., quality of life, cognition, immune response);
- the longitudinal process may be altered by the occurrence of one or multiple times-to-event (e.g., death, onset of disease, disease progression);
- a non-observed heterogeneity may exist in the population (e.g., responders/non-responders, patterns of subjects linked to an unknown behaviour/disease risk/set of genes).

arXiv:1503.00890v1 [stat.CO] 3 Mar 2015
The study of cognitive decline in the elderly gathers all these complexities together. Indeed, cognition, which is the central longitudinal process in aging, is not directly observed. It is measured by one or multiple psychometric tests collected repeatedly at cohort visits. These tests are not necessarily Gaussian variables; they can be sumscores, items or ratios, and are usually bounded with asymmetric distributions or even ordinal (Proust-Lima, Dartigues, and Jacqmin-Gadda 2011). In addition, cognitive decline is strongly associated with the onset of dementia and with death, so that it may be necessary to jointly model these processes (Proust-Lima, Joly, and Jacqmin-Gadda 2009). Finally, a population of elderly subjects usually mixes groups of subjects with different types of cognitive trajectory (e.g., normal and pathological aging toward dementia) (Proust and Jacqmin-Gadda 2005).
The aim of the R package lcmm is to provide estimation functions that address these different extensions of the linear mixed model. The package is named lcmm in reference to the Latent Class Mixed Models as all the estimation functions can handle heterogeneity through latent classes of trajectory. However, beyond latent class mixed models, the package provides estimation functions for mixed models involving latent processes, and joint models.
The estimation relies on maximum likelihood theory, using a modified Marquardt iterative algorithm (Marquardt 1963), a Newton-Raphson-like algorithm (Fletcher 1987), with strict convergence criteria. Post-fit functions are also provided in order to assess the goodness-of-fit of the models and their predictive accuracy, as well as to compute outputs and average or individual predictions from the models. In its current version (V1.7.2), lcmm includes four main estimation functions: hlme, lcmm, multlcmm and Jointlcmm.
The paper is organized as follows. Section 2 describes the different mixed models implemented in the package. Section 3 details the common estimation process. Section 4 provides a list of post-fit analyses and outputs. Section 5 describes the implementation of the functions. Section 6 provides a series of examples based on the paquid dataset available in the package, and Section 7 concludes.

Extended Mixed Models
The package estimates models from the linear mixed model theory. The first three subsections detail the standard linear mixed model, its extension to other types of longitudinal outcomes and its extension to the multivariate setting. The fourth subsection is dedicated to the extension to heterogeneous populations with the latent class mixed model theory that applies to all the models estimated within lcmm. Finally, the joint latent class mixed model for jointly analyzing a longitudinal marker and a time-to-event is introduced.

The Linear Mixed Model
For each subject i in a sample of N subjects, let us consider a vector of n_i repeated measures Y_i = (Y_i1, ..., Y_ij, ..., Y_in_i)^T, where Y_ij is the outcome value at occasion j, measured at time t_ij. We distinguish the time of measurement t_ij from the occasion j because an asset of the linear mixed model is that the times and the number of measurements can vary from one subject to another. This makes it possible, for example, to include subjects with intermittent missing data and/or dropout, or to consider the actual individual time of measurement rather than the planned visit, which in some applications can greatly differ.
Following Laird and Ware (1982), we define the linear mixed model as follows:

Y_ij = X_Li(t_ij)^T β + Z_i(t_ij)^T u_i + w_i(t_ij) + ε_ij    (1)

where X_Li(t_ij) and Z_i(t_ij) are two vectors of covariates at time t_ij of respective lengths p and q.
The vector X Li (t ij ) is associated with the vector of fixed effects β and Z i (t ij ), which includes typically functions of time t ij , is associated with the vector of random-effects u i . Shapes of trajectories considered in X Li and Z i can be of any type (polynomial (Proust and Jacqmin-Gadda 2005), specifically designed to fit the trajectory (Proust-Lima and Taylor 2009), or approximated using a basis of splines).
The vector u_i of q random effects has a zero-mean multivariate normal distribution with variance-covariance matrix B, where B is an unspecified matrix. The measurement errors ε_ij are independent Gaussian errors with variance σ². Finally, (w_i(t))_{t∈R} is a zero-mean Gaussian stochastic process, e.g., a Brownian motion with covariance cov(w_i(t), w_i(s)) = σ_w² min(t, s), or a stationary process with covariance cov(w_i(t), w_i(s)) = σ_w² exp(−ρ|t − s|). The vector of parameters to estimate is (β^T, vec(B)^T, σ_w, ρ, σ)^T, where vec(B) is the vector of parameters involved in modelling the symmetric positive definite matrix B. In the package, vec(B) consists of the q standard errors of the random effects in the case of a diagonal matrix B, or of the q(q+1)/2 parameters of the Cholesky transformation of B otherwise.
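As an illustration of this Cholesky parameterization, the following sketch (in Python, for illustration only; the package itself implements this in Fortran90) builds B from its q(q+1)/2 unconstrained parameters. The column-major filling order is an assumption made for the example, not necessarily the package's internal convention.

```python
# Sketch of the Cholesky parameterization of the random-effect
# variance-covariance matrix B: the q(q+1)/2 unconstrained parameters
# fill a lower-triangular matrix L, and B = L L^T is symmetric
# positive semi-definite by construction.

def cholesky_to_B(vec_b, q):
    """Build B = L L^T from q(q+1)/2 Cholesky parameters (column-major fill)."""
    assert len(vec_b) == q * (q + 1) // 2
    L = [[0.0] * q for _ in range(q)]
    idx = 0
    for j in range(q):            # fill the lower triangle column by column
        for i in range(j, q):
            L[i][j] = vec_b[idx]
            idx += 1
    # B = L L^T, symmetric by construction
    return [[sum(L[i][k] * L[j][k] for k in range(q)) for j in range(q)]
            for i in range(q)]

B = cholesky_to_B([1.0, 0.5, 2.0], q=2)
```

Whatever the values of the unconstrained parameters, the resulting B is a valid covariance matrix, which is why the iterative algorithm can update them freely.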

The Latent Process Mixed Model
The linear mixed model applies to longitudinal markers that are continuous, have Gaussian random deviations (random-effects, correlated errors and measurement errors) and it assumes that the covariate effects are constant (β) along the entire range of the marker values. In practice, these assumptions do not hold for many longitudinal outcomes, especially psychological scales. The generalized linear mixed model extends the theory to binary, ordinal or Poisson longitudinal outcomes (Hedeker and Gibbons 2006;Fitzmaurice et al. 2009). In order to study non Gaussian longitudinal markers, we chose another direction by defining a family of mixed models called the latent process mixed models (Proust, Jacqmin-Gadda, Taylor, Ganiayre, and Commenges 2006;Proust-Lima, Amieva, and Jacqmin-Gadda 2013). Coming from the latent variable framework, this approach consists in separating the structural model that describes the quantity of interest (a latent process) according to time and covariates from the measurement model that links the quantity of interest to the observations.
The latent process Λ_i(t) is defined in continuous time according to a standard linear mixed model without measurement error:

Λ_i(t) = X_Li(t)^T β + Z_i(t)^T u_i + w_i(t)    (2)

where X_Li(t), Z_i(t), β, u_i and w_i(t) are defined as in section 2.1.
In order to take into account different types of longitudinal markers, a flexible nonlinear measurement model is defined between the latent process Λ_i(t_ij) and the observed value Y_ij at the measurement time t_ij:

H^{-1}(Y_ij; η) = Λ_i(t_ij) + ε_ij = Ỹ_ij    (3)

where ε_ij are independent Gaussian measurement errors with variance σ², H is a parameterized link function and Ỹ_ij denotes the noisy latent process at time t_ij.
For a quantitative marker, H^{-1} is a monotonic increasing continuous function. The following transformations are currently implemented:
- the linear transformation, which reduces to the Gaussian framework of the linear mixed model;
- the rescaled cumulative distribution function (CDF) of a Beta(η*_1, η*_2) distribution applied to the marker rescaled in (0, 1) as (Y − min(Y) + ε_Y)/(max(Y) − min(Y) + 2ε_Y), with the constant ε_Y > 0 and min(Y) and max(Y) the (theoretical or observed) minimum and maximum values of Y; B(η*_1, η*_2) denotes the complete Beta function, and to ensure the positiveness of the canonical parameters η*_1 and η*_2 and for computational reasons, the Beta distribution is parameterized through unconstrained parameters (η_1, η_2);
- a basis of quadratic I-splines with m knots, i.e., a linear combination with nonnegative coefficients (to ensure monotonicity) of the basis of I-splines (Ramsay 1988).
For an ordinal or binary marker (with M levels), equation (3) reduces to a cumulative probit model for the noisy latent process, the link function being defined by M − 1 increasing threshold parameters.
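The cumulative probit measurement model can be sketched numerically as follows. The threshold convention used here (P(Y ≤ m) given by the standard normal CDF at threshold minus latent process value) is an illustration; the exact parameterization of the thresholds η in lcmm may differ.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def level_probs(lam, thresholds):
    """P(Y = m | Lambda) for a cumulative probit model with M levels and
    M-1 increasing thresholds eta_1 < ... < eta_{M-1}.
    Illustrative convention: P(Y <= m | Lambda) = Phi(eta_{m+1} - Lambda)."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [Phi(cuts[m + 1] - lam) - Phi(cuts[m] - lam)
            for m in range(len(cuts) - 1)]

# 4 levels, 3 thresholds, latent process value 0.2
p = level_probs(lam=0.2, thresholds=[-1.0, 0.0, 1.5])
```

By construction the M level probabilities are positive and sum to one for any value of the latent process.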
Latent process mixed models need two constraints to be identifiable: one on the location of the latent process, handled by constraining the intercept β_0 = 0, and one on the scale of the latent process, handled by constraining σ² = 1. The vector of parameters to estimate is thus (β^T, vec(B)^T, σ_w, ρ, η^T)^T, where vec(B) is defined as in section 2.1.

The Latent Process Mixed Model for multivariate longitudinal markers
The concept of separation between the structural model for the underlying quantity of interest and the measurement model for its observations applies naturally to the case where multiple longitudinal markers of the quantity of interest are observed. It consists in assuming that the underlying latent process (Λ i (t)) defined in section 2.2 generated K longitudinal markers instead of a unique one. In this case, the latent process can be seen as the common factor underlying the markers (Proust et al. 2006;Proust-Lima et al. 2013).
In this multivariate mixed model, the structural model for (Λ i (t)) according to time and covariates is exactly the same as defined in (2) but the measurement model defined in (3) is extended to the multivariate setting in order to take into account the specific relationship between the underlying latent process and each longitudinal marker.
Let Y_kij denote the measure of marker k (k = 1, ..., K) for subject i (i = 1, ..., N) at occasion j (j = 1, ..., n_ki), with corresponding measurement time t_kij. For more flexibility, the number of repeated measurements n_ki and the times of measurement can differ according to the subject and to the marker.
The measurement model now takes into account two aspects of the measure:
- the measurement error is accounted for through the definition of the intermediate variable, the noisy marker-specific latent process:

Ỹ_kij = Λ_i(t_kij) + X_Yi(t_kij)^T γ_k + b_ki + ε_kij    (4)

where X_Yi(t_kij) are covariates with marker-specific effects γ_k, called contrasts as Σ_{k=1}^K γ_k = 0. As in Item Response Theory with Differential Item Functioning (Clauser and Mazor 1998), they capture a differential marker functioning that could induce a measurement bias if not taken into account. The marker-specific random intercept b_ki ~ N(0, σ²_{α_k}) also captures a systematic subject-specific deviation that would not be captured by covariates, and ε_kij ~ N(0, σ²_k) are independent random measurement errors.
- the marker-specific nonlinear relationship with the underlying quantity of interest is modelled through the marker-specific link function H_k:

Y_kij = H_k(Ỹ_kij; η_k)    (5)

where each H_k is defined using a rescaled Beta CDF, I-splines or a linear transformation, as detailed in section 2.2.
As in the univariate version of the latent process mixed model, two constraints are required to obtain an identified model. In the multivariate setting, the dimension of the latent process is constrained by the intercept β 0 = 0 (for the location) and the variance of the random intercept Var(u i0 ) = 1 (for the scale) rather than the standard error of one marker-specific residual error. As a consequence, a random intercept is required and no mean intercept is allowed in the structural model defined in (2).
The vector of parameters to estimate is now (β^T, vec(B)^T, σ_w, ρ, (γ_k^T, σ_{α_k}, σ_k, η_k^T)_{k=1,...,K})^T.

The Latent Class Linear Mixed Model
The linear mixed model assumes that the population of N subjects is homogeneous and described at the population level by a unique profile X Li (t) T β. In contrast, the latent class mixed model consists in assuming that the population is heterogeneous and constituted of G latent classes of subjects characterized by G mean profiles of trajectories.
Each subject belongs to one and only one latent class, so that the latent class membership is defined by a discrete random variable c_i that equals g if subject i belongs to latent class g (g = 1, ..., G). The variable c_i is latent; its probability is described using a multinomial logistic model according to covariates X_ci:

π_ig = P(c_i = g | X_ci) = exp(ξ_0g + X_ci^T ξ_1g) / Σ_{l=1}^G exp(ξ_0l + X_ci^T ξ_1l)    (6)

where ξ_0g is the intercept for class g and ξ_1g is the q_1-vector of class-specific parameters associated with the q_1-vector of time-independent covariates X_ci. For identifiability, the scalar ξ_0G = 0 and the vector ξ_1G = 0. When no covariate predicts the latent class membership, this model reduces to a class-specific probability.
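The class-membership model in (6) is a standard multinomial logistic model with the last class as reference; a minimal numerical sketch (with a stabilized softmax, an implementation detail not taken from the paper):

```python
import math

def class_probs(xi0, xi1, x_ci):
    """pi_ig = exp(xi_0g + x_ci' xi_1g) / sum_l exp(xi_0l + x_ci' xi_1l),
    with the last class as reference (xi_0G = 0, xi_1G = 0).
    xi0: intercepts for classes 1..G-1; xi1: coefficient vectors for 1..G-1."""
    linpred = [x0 + sum(b * x for b, x in zip(beta, x_ci))
               for x0, beta in zip(xi0, xi1)]
    linpred.append(0.0)                  # reference class G
    m = max(linpred)                     # stabilize the softmax numerically
    expo = [math.exp(v - m) for v in linpred]
    s = sum(expo)
    return [e / s for e in expo]

# G = 3 classes, one time-independent covariate (values are illustrative)
pi = class_probs(xi0=[0.5, -0.2], xi1=[[1.0], [0.3]], x_ci=[0.7])
```

Setting all ξ parameters to zero for the reference class fixes the scale of the model; the G probabilities then sum to one for any covariate value.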
The G mean profiles are defined according to time and covariates through latent-class-specific mixed models. The difference with a standard linear mixed model is that both the fixed effects and the distribution of the random effects can be class-specific. For a Gaussian outcome, the linear mixed model defined in (1) becomes, for class g:

Y_ij | c_i = g  =  X_L1i(t_ij)^T β + X_L2i(t_ij)^T υ_g + Z_i(t_ij)^T u_ig + w_i(t_ij) + ε_ij    (7)

where X_Li(t_ij), previously defined, is split into X_L1i(t_ij) with fixed effects β common over classes and X_L2i(t_ij) with class-specific fixed effects υ_g. The vector Z_i(t_ij) is still associated with the individual random effects u_i | c_i = g, called u_ig in equation (7), whose distribution is now class-specific: in class g, they have a zero-mean multivariate normal distribution with variance-covariance matrix ω_g² B, where B is an unspecified variance-covariance matrix and ω_g is a proportional coefficient (ω_G = 1 for identifiability) allowing for a class-specific intensity of individual variability. The autocorrelated process w_i(t) and the errors of measurement ε_ij are the same as in section 2.1.
This extension of the linear mixed model also applies to the latent process mixed models described in sections 2.2 and 2.3, by replacing the structural model in (2) by:

Λ_i(t) | c_i = g  =  X_L1i(t)^T β + X_L2i(t)^T υ_g + Z_i(t)^T u_ig + w_i(t)    (8)

The location constraint for this model becomes that the mean intercept in the last class is constrained to 0. The scale constraint remains unchanged. The measurement models remain the same, by assuming that the heterogeneity in the population only affects the underlying latent process of interest. The vector of parameters to estimate defined in sections 2.1, 2.2 and 2.3 now also includes ((ξ_0g, ξ_1g^T)_{g=1,...,G−1}, (υ_g^T)_{g=1,...,G−1}, (ω_g)_{g=1,...,G−1}).

The Joint Latent Class Mixed Model
The linear mixed model assumes that the missing data are missing at random (Little 1995), that is, that the probability that a value of Y is missing is explained by the observed data (observed outcomes Y and covariates X). When this assumption does not hold, the longitudinal process and the missing data process can be simultaneously modelled in a so-called joint model. More generally, the longitudinal process is often associated with a survival process (e.g., disease onset, death, disease progression), and the joint model captures this correlation to provide valid inference.
Among joint models, one can distinguish two families: the shared random-effect models (see Rizopoulos 2012) in which functions of the random-effects from the linear mixed model are included in the survival model, and the joint latent class model (Lin, Turnbull, McCulloch, and Slate 2002;Proust-Lima, Sène, Taylor, and Jacqmin-Gadda 2014b). The latter is a direct extension of the latent class mixed model described in section 2.4.
It consists in assuming that each of the G latent classes of subjects is characterized by a class-specific linear mixed model for the longitudinal process and a class-specific survival model for the survival process. As such, it is constituted of three submodels: the multinomial logistic model defined in (6), the class-specific linear mixed model defined in (7) or the class-specific latent process mixed model defined in (8) and (3), and finally the class-specific survival model defined below.
Let T*_i denote the time-to-event of interest, T̃_i the censoring time, T_i = min(T*_i, T̃_i) the observed time and E_i = 1_{T*_i ≤ T̃_i} the event indicator. In latent class g, the risk of event is described using a proportional-hazard model:

λ_i(t | c_i = g) = λ_0g(t) exp(X_Si1^T ν + X_Si2^T δ_g)    (9)

where X_Si1 and X_Si2 are vectors of covariates respectively associated with the vector of parameters ν common over classes and with the class-specific parameters δ_g. The class-specific baseline hazard is defined according to a vector of parameters ζ_g. It can be stratified on the latent class structure (λ_0g(t) = λ_0(t; ζ_g)) or proportional in each latent class (λ_0g(t) = λ_0(t; ζ*) e^{ζ_g}, with e^{ζ_g} the proportional factor and ζ_G = 0). A series of parametric baseline risk functions parameterized by a vector ζ are considered:
- Weibull, specified either by λ_0(t; ζ) = ζ_1 ζ_2 t^{ζ_2 − 1} or by λ_0(t; ζ) = ζ_1 ζ_2 (ζ_1 t)^{ζ_2 − 1}, depending on the transformation used to ensure the positivity of the parameters;
- piecewise constant, specified by λ_0(t; ζ) = Σ_{l=1}^{n_z − 1} ζ_l 1_{t ∈ [t_l, t_{l+1})}, with n_z the number of knots;
- cubic M-splines, specified by λ_0(t; ζ) = Σ_{l=1}^{n_z + 2} ζ_l M_l(t), with n_z the number of knots and (M_l(t))_{l=1,...,n_z+2} the basis of cubic M-splines.
In these three families of baseline risk functions, parameters are restricted to be positive. This was ensured in practice by a square transformation or an exponential transformation (see paragraph 4.4).
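The two Weibull specifications above can be checked numerically: each hazard integrates to its closed-form cumulative hazard. A small sketch (the association of each specification with the `logscale` option follows the Jointlcmm section; the parameter values are illustrative):

```python
def weibull_hazard(t, zeta1, zeta2, logscale=False):
    """Weibull baseline hazard in the two specifications used by the package:
      logscale=True :  lambda0(t) = zeta1 * zeta2 * t**(zeta2 - 1)
      logscale=False:  lambda0(t) = zeta1 * zeta2 * (zeta1 * t)**(zeta2 - 1)
    """
    if logscale:
        return zeta1 * zeta2 * t ** (zeta2 - 1)
    return zeta1 * zeta2 * (zeta1 * t) ** (zeta2 - 1)

def weibull_cumhazard(t, zeta1, zeta2, logscale=False):
    """Closed-form cumulative hazard matching each specification."""
    return zeta1 * t ** zeta2 if logscale else (zeta1 * t) ** zeta2

# numerical check: the midpoint-rule integral of the hazard over [0, T]
# matches the closed-form cumulative hazard
n, T = 50000, 2.0
h = T / n
approx = sum(weibull_hazard((i + 0.5) * h, 1.3, 1.8) * h for i in range(n))
assert abs(approx - weibull_cumhazard(T, 1.3, 1.8)) < 1e-4
```

The same consistency between hazard and cumulative hazard underlies the survival terms of the joint likelihood.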
Instead of a unique cause of event, multiple causes can be considered in a competing setting (Proust-Lima, Dartigues, and Jacqmin-Gadda 2014a). This is achieved by denoting T*_ip the time to the event of cause p (p = 1, ..., P) and T̃_i the time to censoring, so that T_i = min(T̃_i, T*_i1, ..., T*_iP) is observed with indicator E_i = p if the event of nature p occurred first, or E_i = 0 if the subject was censored before any occurrence. In this case, the cause-specific and class-specific proportional-hazard model is:

λ_ip(t | c_i = g) = λ_0gp(t) exp(X_Si1^T ν_p + X_Si2^T δ_gp)    (10)

where the covariates X_Si2 and the effects ν_p and δ_gp can be cause-specific, as well as the baseline risk functions λ_0gp. The cause-specific baseline risk functions can be stratified on the latent classes or proportional across latent classes, as when a single cause is modelled.

Estimation
These extended mixed models are estimated within the maximum likelihood framework. For each model, we denote θ_G the entire vector of parameters involved, as the estimation is performed at a fixed number G of latent classes (G = 1 in the homogeneous case). The log-likelihood is L(θ_G) = Σ_{i=1}^N log(L_i(θ_G)), with L_i the individual contribution to the likelihood of the model considered.

Linear mixed model
The individual contribution to the likelihood of a linear mixed model as defined in section 2.1 is:

L_i(θ_1) = φ_i(Y_i)    (11)

with φ_i the density function of a multivariate normal distribution with mean μ_i = X_Li β and variance V_i = Z_i B Z_i^T + R_i + Σ_i, where X_Li and Z_i are the design matrices with row j vectors X_Li(t_ij)^T and Z_i(t_ij)^T, Σ_i = σ² I_{n_i} with I_n the identity matrix of size n, and R_i the variance-covariance matrix of the stochastic process (w_i(t)). For example, for elements j and j' and a Brownian motion, R_i(j, j') = σ_w² min(t_ij, t_ij').
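As a toy numerical illustration of φ_i, consider n_i = 2 observations and a random intercept only (q = 1, no stochastic process), so that V_i = Z_i B Z_i^T + σ² I_2 and the bivariate normal log-density can be written out explicitly. All values are illustrative:

```python
import math

# Random intercept only: B = [sigma_u^2], Z_i = (1, 1)^T,
# V_i = sigma_u^2 * J_2 + sigma^2 * I_2  (J_2: 2x2 matrix of ones)
sigma_u2, sigma2 = 1.5, 0.5
V = [[sigma_u2 + sigma2, sigma_u2],
     [sigma_u2, sigma_u2 + sigma2]]

def mvn_logpdf_2d(y, mu, V):
    """Log-density of a bivariate normal, written out explicitly."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[V[1][1] / det, -V[0][1] / det],
           [-V[1][0] / det, V[0][0] / det]]
    r = [y[0] - mu[0], y[1] - mu[1]]
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

ll = mvn_logpdf_2d(y=[1.2, 0.9], mu=[1.0, 1.0], V=V)
```

The within-subject correlation induced by the random intercept appears here as the off-diagonal terms σ_u² of V_i.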

Latent process mixed model
For continuous link functions, the individual contribution to the likelihood of a latent process mixed model as defined in section 2.2 is:

L_i(θ_G) = φ_i(H^{-1}(Y_i; η)) × Π_{j=1}^{n_i} J(Y_ij; η)    (12)

where φ_i is the same density function of a multivariate normal variable as defined in equation (11), and J is the Jacobian of the inverse of the link function, that is the derivative of the linear transformation, of the rescaled Beta CDF or of the quadratic I-splines transformation.
For ordinal and binary link functions (with M levels), the individual contribution to the likelihood of a latent process mixed model as defined in section 2.2 is defined conditionally on the random effects and, as such, no stochastic process is considered for the moment (∀t, w_i(t) = 0):

L_i(θ_G) = ∫ Π_{j=1}^{n_i} P(Y_ij | u_i; θ_G) φ_u(u_i) du_i    (13)

where φ_u is the density function of a zero-mean multivariate normal variable with variance-covariance matrix B, and P(Y_ij = m | u_i; θ_G) is given by the cumulative probit model as a difference of values of Φ, the CDF of a standard Gaussian variable, at consecutive thresholds. In the presence of random effects, the integral over the random-effects distribution in (13) needs to be evaluated numerically. This is done using either univariate Gauss-Hermite quadrature with 30 points in the presence of a unique random effect, or the multivariate Gauss-Hermite quadrature implemented by Genz and Keister (1996) otherwise.
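The Gauss-Hermite approximation of the integral in (13) can be sketched with a hard-coded 5-point rule (a toy version; the package uses up to 30 points, or sparse multivariate rules). The probit-normal integral has a closed form, which provides a check:

```python
import math

# 5-point Gauss-Hermite rule for the weight exp(-x^2)
GH5 = [(0.0, 0.9453087205),
       (0.9585724646, 0.3936193232), (-0.9585724646, 0.3936193232),
       (2.0201828705, 0.0199532421), (-2.0201828705, 0.0199532421)]

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expect_probit(a, sigma):
    """E[Phi(a + u)] for u ~ N(0, sigma^2): change of variable
    u = sqrt(2)*sigma*x turns the integral into a Hermite-weighted sum."""
    s = sum(w * Phi(a + math.sqrt(2.0) * sigma * x) for x, w in GH5)
    return s / math.sqrt(math.pi)

# closed form for comparison: E[Phi(a + u)] = Phi(a / sqrt(1 + sigma^2))
exact = Phi(0.3 / math.sqrt(1.0 + 0.8 ** 2))
approx = expect_probit(0.3, 0.8)
```

Even with 5 nodes the approximation is already accurate for this smooth integrand; the number of nodes trades accuracy for computation time as the random-effect dimension grows.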

Latent process mixed model for multivariate longitudinal markers
Currently, only continuous link functions are considered in the multivariate version of the latent process mixed model defined in section 2.3. In this case, the individual contribution to the likelihood is:

L_i(θ_G) = φ_i((H_k^{-1}(Y_kij; η_k))_{k,j}) × Π_{k=1}^{K} Π_{j=1}^{n_ik} J_k(Y_kij; η_k)    (14)

where J_k is the Jacobian of the link function H_k^{-1}, and φ_i is the density function of a multivariate normal variable whose mean and variance follow from (2), (4) and (5). In these definitions, the matrices Z_ik, X_Lik and X_Yik have row vectors Z_i(t_kij), X_Li(t_kij) and X_Yi(t_kij) for j = 1, ..., n_ik; J_n denotes the n × n matrix of elements 1, and I_n the n × n identity matrix.

Latent class linear mixed model
The individual contribution to the likelihood of a latent class linear mixed model as defined in section 2.4 is:

L_i(θ_G) = Σ_{g=1}^{G} π_ig φ_ig(Y_i)    (15)

where π_ig is given in (6) and φ_ig is the density function of a multivariate normal distribution with mean μ_ig = X_L1i β + X_L2i υ_g and variance V_ig = Z_i B_g Z_i^T + R_i + Σ_i, with B_g = ω_g² B and X_L.i the matrix with row j vector X_L.i(t_ij)^T.
Individual contributions to latent process mixed models with latent classes for one or multiple longitudinal markers are obtained by replacing φ ig in (15) by the individual contribution given in (12), (13) or (14) with appropriate class-specific parameters.
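The finite-mixture structure of (15) can be illustrated with a deliberately simplified version: one univariate Gaussian measurement per subject (the model's φ_ig is multivariate; this is only a toy stand-in):

```python
import math

def norm_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_loglik(ys, pis, mus, var):
    """Toy version of the mixture likelihood: for each subject,
    L_i = sum_g pi_g * phi_g(Y_i), here with a univariate normal phi_g
    and class-membership probabilities common to all subjects."""
    total = 0.0
    for y in ys:
        Li = sum(pi * norm_pdf(y, mu, var) for pi, mu in zip(pis, mus))
        total += math.log(Li)
    return total

# two latent classes with mean profiles 0 and 2 (illustrative values)
ll = mixture_loglik([0.1, 2.2, -0.4], pis=[0.6, 0.4], mus=[0.0, 2.0], var=1.0)
```

The class-specific densities are summed, weighted by the membership probabilities, before taking the log, which is what makes the log-likelihood non-quadratic and potentially multimodal.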

Joint latent class mixed model
The individual contribution to the likelihood of a joint latent class mixed model as defined in section 2.5, for a P-cause right-censored time-to-event, is:

L_i(θ_G) = Σ_{g=1}^{G} π_ig φ_ig(Y_i) [Π_{p=1}^{P} λ_p(T_i | c_i = g; θ_G)^{1_{E_i = p}}] exp(−Σ_{p=1}^{P} A_p(T_i | c_i = g; θ_G))    (16)

where π_ig and φ_ig are defined as in (15), λ_p(t | c_i = g; θ_G) is the cause-p-specific instantaneous hazard defined in (10) and A_p(t | c_i = g; θ_G) is the corresponding cumulative hazard.
With curvilinear longitudinal outcomes, φ ig is replaced by the individual contribution given in (12) with appropriate class-specific parameters.
In the case of a left-truncated time-to-event with delayed entry at time T_0i, the contribution for the truncated data becomes L_i(θ_G)/S_i(T_0i; θ_G), with the marginal survival function S_i(T_0i; θ_G) = Σ_{g=1}^{G} π_ig S(T_0i | c_i = g; θ_G).

Iterative Marquardt Algorithm
The log-likelihoods of models based on the mixed model theory can be maximized using algorithms of the EM family (e.g., Verbeke and Lesaffre (1996); Muthén and Shedden (1999); Xu and Hedeker (2001) for latent class mixed models) or of the Newton-Raphson family (e.g., Proust and Jacqmin-Gadda (2005) for latent class mixed models). In our work, whatever the type of model, we chose the latter, using an extended Marquardt algorithm, because of the better convergence rate and speed found in previous analyses.
In the extended Marquardt algorithm, the vector of parameters θ_G is updated until convergence using the following equation at iteration l + 1:

θ_G^{(l+1)} = θ_G^{(l)} − δ (H̃^{(l)})^{-1} ∇(L(θ_G^{(l)}))

The step δ equals 1 by default but is internally modified to ensure that the log-likelihood is improved at each iteration. The matrix H̃ is a diagonal-inflated Hessian ensuring positive-definiteness: if necessary, the diagonal terms H̃_ii are inflated as H̃_ii = H_ii + λ[(1 − η)|H_ii| + η tr(H)], where H is the Hessian matrix with diagonal terms H_ii, and λ and η are initially fixed at 0.01, reduced if H̃ is positive definite and increased if not. ∇(L(θ_G^{(l)})) is the gradient of the log-likelihood at iteration l. First derivatives are computed by central finite differences with steps 2 × max(10^{-7}, 10^{-4}|θ_Gv|) for parameter v. Second derivatives are computed by forward finite differences with steps max(10^{-7}, 10^{-4}|θ_Gu|) and max(10^{-7}, 10^{-4}|θ_Gv|) for parameters u and v.
Three convergence criteria are used:
- one based on parameter stability: Σ_{v=1}^{n_θ} (θ_Gv^{(l)} − θ_Gv^{(l−1)})² ≤ a;
- one based on log-likelihood stability: |L^{(l)} − L^{(l−1)}| ≤ b;
- one based on the size of the derivatives: ∇(L(θ_G^{(l)}))^T (H^{(l)})^{-1} ∇(L(θ_G^{(l)})) / n_θ ≤ d.
The default values are a = b = d = 10^{-4}. These thresholds might seem relatively large, but the three convergence criteria must be simultaneously satisfied, and the criterion based on the derivatives is so stringent that it ensures a good convergence even at d = 10^{-4}. A drawback of other algorithms is that they may converge according to likelihood or parameter stability only; in complex settings such as latent class or joint latent class mixed models, the log-likelihood can be relatively flat in some areas of the parameter space, so that likelihood or parameter stability does not systematically ensure convergence to an actual maximum.
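The update rule and the three convergence criteria can be sketched on a toy smooth function. This is not the package's Fortran90 implementation: the diagonal inflation is simplified to λ|H_ii| (dropping the η tr(H) term), the step δ is fixed at 1, and the 2 × 2 Newton system is solved in closed form.

```python
import math

def grad_hess(f, th):
    """Finite-difference gradient (central) and Hessian (forward),
    with the step sizes described in the text."""
    n = len(th)
    eps = [max(1e-7, 1e-4 * abs(t)) for t in th]
    g = []
    for v in range(n):
        up, dn = th[:], th[:]
        up[v] += eps[v]; dn[v] -= eps[v]
        g.append((f(up) - f(dn)) / (2 * eps[v]))
    H = [[0.0] * n for _ in range(n)]
    f0 = f(th)
    for u in range(n):
        for v in range(n):
            tu, tv, tuv = th[:], th[:], th[:]
            tu[u] += eps[u]; tv[v] += eps[v]
            tuv[u] += eps[u]; tuv[v] += eps[v]
            H[u][v] = (f(tuv) - f(tu) - f(tv) + f0) / (eps[u] * eps[v])
    return g, H

def marquardt_min(f, th, a=1e-4, b=1e-4, d=1e-4, maxiter=100):
    """Marquardt-type minimization of a 2-parameter function with the
    three convergence criteria (parameters, objective, derivatives)."""
    fold = f(th)
    for _ in range(maxiter):
        g, H = grad_hess(f, th)
        lam = 0.01
        while True:
            # inflate the diagonal until the 2x2 system is well-posed
            Hd = [[H[i][j] + (lam * abs(H[i][i]) if i == j else 0.0)
                   for j in range(2)] for i in range(2)]
            det = Hd[0][0] * Hd[1][1] - Hd[0][1] * Hd[1][0]
            if det > 1e-12 and Hd[0][0] > 0:
                break
            lam *= 10
        step = [(Hd[1][1] * g[0] - Hd[0][1] * g[1]) / det,
                (Hd[0][0] * g[1] - Hd[1][0] * g[0]) / det]
        new = [t - s for t, s in zip(th, step)]
        fnew = f(new)
        crit_par = sum((x - t) ** 2 for x, t in zip(new, th))
        crit_obj = abs(fnew - fold)
        th, fold = new, fnew
        if crit_par <= a and crit_obj <= b and sum(x * x for x in g) <= d:
            break
    return th

f = lambda t: math.cosh(t[0] - 1.0) + math.cosh(t[1] + 0.5)  # minimum at (1, -0.5)
sol = marquardt_min(f, [0.2, -0.1])
```

On this convex toy objective the algorithm converges in a few iterations; all three criteria must hold simultaneously before the loop stops.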
An estimate of the variance-covariance matrix of the maximum likelihood estimates V (θ G ) is provided by the inverse of the Hessian matrix.

Implementation
The package currently includes 4 estimation functions, whose calls are described below. Linear mixed models and latent class linear mixed models are estimated with hlme. Univariate latent process mixed models, possibly including latent classes, are estimated with lcmm. Multivariate latent process mixed models, possibly including latent classes, are estimated with multlcmm, and joint latent class mixed models are estimated with Jointlcmm. These 4 estimation functions rely on estimation programs (log-likelihood computation, optimization algorithm) written in Fortran90. The print function systematically provides a brief summary of the estimation; the summary function further provides the table of maximum likelihood estimates with their estimated standard errors.

hlme call
The call of hlme is

hlme(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, cor = NULL, data, B, convB = 0.0001, convL = 0.0001, convG = 0.0001, prior, maxiter = 500, subset = NULL, na.action = 1)

Argument fixed defines the two-sided formula for the linear regression at the population level, with the dependent variable (Y) on the left-hand side and the combination of covariates with fixed effects (X_L1 and X_L2) on the right-hand side. Argument random defines the one-sided formula with the covariates having a random effect (Z). Argument subject provides the name of the identification variable for the random effects. Argument ng indicates the number of latent classes G. When G > 1, mixture indicates a one-sided formula with the subset of covariates having a class-specific effect (X_L2), and classmb provides the optional covariates explaining the latent class membership (X_c). An optional argument prior provides a vector of a priori known class memberships when relevant (very rare).
Argument idiag indicates whether the variance-covariance matrix of the random effects (B) is diagonal (TRUE) or unstructured (FALSE by default); the boolean nwg indicates whether class-specific proportional coefficients ω_g (g = 1, ..., G − 1) for the matrix B should be estimated; and cor indicates the nature of the optional zero-mean Gaussian stochastic process, either a Brownian motion with cor = BM(time) or a stationary process with cor = AR(time), with time the time variable (by default, none is included). Argument data provides the name of the dataframe containing the data in the longitudinal format, that is with n_i rows per subject (or max_k(n_ik) for multlcmm).
Optional subset provides the vector of rows to select in the dataframe and na.action is an indicator for the management of missing data which are omitted by default.
Argument B provides the optional vector of initial values. Arguments convB, convL, convG indicate the thresholds for convergence criteria on the parameters, the log-likelihood and the derivatives respectively. Finally, argument maxiter indicates the maximum number of iterations in the optimization algorithm.

lcmm call
The call of lcmm has the same structure as that of hlme. It is

lcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, cor = NULL, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior, range = NULL, subset = NULL, na.action = 1)

Most of the arguments are detailed in section 4.1. Arguments are added to specify the link function (H or H^{-1}) in equation (3). Argument link indicates the nature of the link function: link = "linear" for a linear transformation, link = "beta" for a rescaled Beta CDF, link = "thresholds" for the cumulative probit model, or link = "splines" for an I-splines transformation. In the case of a splines transformation, the number and location of the knots (5 knots placed at quantiles of Y by default) can be specified by link = "X-type-splines", where X is the total number of knots (X > 2) and type is equi, quant or manual for knots placed at regular intervals, at the percentiles of the distribution of Y, or entered manually. In the latter case, argument intnodes provides the vector of internal knots. Optional argument epsY provides the constant used to rescale Y with the rescaled Beta CDF. Optional range indicates the range of Y that should be considered when it differs from the one observed in the data (splines and Beta CDF transformations only). Finally, nsim indicates the number of equidistant values within the range of Y at which the estimated link function should be computed in output.

multlcmm call
The call of multlcmm uses the same structure as those of hlme and lcmm. It is

multlcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, randomY = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, cor = NULL, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior, range = NULL, subset = NULL, na.action = 1)

To account for the multivariate nature of the model estimated by multlcmm, the left-hand side of the fixed formula includes the sum of all the dependent variable names, and the right-hand side can now include covariates with a mean effect on the common factor or with marker-specific effects in addition to the mean effect (with contrast(X) instead of X). When the family of link functions is not the same for all dependent variables, a vector of link function names is provided in link. Argument intnodes now provides the vector of internal knots for all the splines link functions involving knots entered manually. Argument range possibly indicates the range of the dependent variables with splines or Beta CDF transformations when it differs from the one observed in the data.
The boolean argument randomY is the only specific argument of multlcmm. It indicates whether marker-specific random intercepts (b ki ) should be considered in the measurement model defined in (4).

Jointlcmm call
The call of Jointlcmm is Jointlcmm(fixed, mixture, random, subject, classmb, ng=1, idiag=FALSE, nwg=FALSE, survival, hazard="Weibull", hazardtype="Specific", hazardnodes=NULL,TimeDepVar=NULL, link=NULL, intnodes=NULL, epsY=0.5, range=NULL, cor=NULL, data, B, convB=1e-4, convL=1e-4, convG=1e-4, maxiter=100, nsim=100, prior, logscale=FALSE, subset=NULL, na.action=1) Most arguments of Jointlcmm call are the same as defined in hlme or lcmm calls. Arguments defining the class-specific survival model are added. Argument survival is a two-sided formula that defines the structure of the survival model. The left-hand side includes a Surv object as defined in survival package (Therneau 2013). The right-hand side indicates the covariates involved in the survival model with mixture(X) when covariate X has a class-specific effect, cause(X) when X has a different effect on all the causes of event or cause1(X) when X has an effect on cause of type 1 (similar functions for causes 2 to P ). Argument hazard indicates the family of the baseline risk functions or the vector of families of the cause-specific baseline risk functions in the presence of competing events. The program includes "Weibull" for 2-parameter Weibull hazards, "piecewise" for piecewise constant hazards or "splines" for hazard approximated using M-splines. By default, "piecewise" and "splines" consider 5 regular knots within the range of event times. The first knot is at the minimum time of entry and the last knot is at the maximum observed time. The number and locations of the knots can be specified by indicating "X-type-piecewise" or "X-type-splines" where X is the total number of knots (X> 2) and type is equi, quant or manual for knots placed at regular intervals, at the percentiles of the event times distribution or knots entered manually. In the latter case, argument hazardnodes provides the corresponding vector of internal knots. 
Argument hazardtype indicates whether the baseline risk functions are stratified on the latent classes (hazardtype="Specific") or proportional across latent classes (hazardtype="PH") or common over latent classes (hazardtype="Common").
Two parameterizations are implemented to ensure the positivity of the parameters of the baseline risk functions: logscale=TRUE uses the exponential transformation and logscale=FALSE uses the square transformation. For Weibull, these parameterizations also imply two different specifications of the baseline hazard: λ_0(t; ζ) = ζ_1 ζ_2 t^(ζ_2 − 1) with logscale=TRUE (noting ζ* the vector of unconstrained parameters to estimate, ζ = exp(ζ*)), and λ_0(t; ζ) = ζ_1 ζ_2 (ζ_1 t)^(ζ_2 − 1) with logscale=FALSE (ζ = (ζ*)^2). Indeed, depending on the range of times to events, one specification or the other may be better suited to ensure convergence of the program. The optional argument nsim indicates the number of points within the range of event times at which the estimated baseline risk functions and cumulative risk functions are computed in output.
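As a purely numerical illustration of these two parameterizations (a sketch, not package code; the unconstrained values are arbitrary), one can check that both transformations yield positive hazards whatever the unconstrained parameters:

```r
# Illustration of the two Weibull baseline-risk parameterizations described
# above; zeta.star is an arbitrary vector of unconstrained parameters.
weibull_hazard <- function(t, zeta.star, logscale) {
  if (logscale) {
    zeta <- exp(zeta.star)              # logscale=TRUE: exponential transform
    zeta[1] * zeta[2] * t^(zeta[2] - 1)
  } else {
    zeta <- zeta.star^2                 # logscale=FALSE: square transform
    zeta[1] * zeta[2] * (zeta[1] * t)^(zeta[2] - 1)
  }
}

# Both forms give a strictly positive hazard at any time t > 0
h1 <- weibull_hazard(t = 2, zeta.star = c(-0.5, 0.3), logscale = TRUE)
h2 <- weibull_hazard(t = 2, zeta.star = c(-0.5, 0.3), logscale = FALSE)
```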

Initial values
Iterative estimation algorithms need to be initialized with a set of initial values for the vector of parameters θ. In the four estimation functions, default initial values are generated when argument B is not given in input. Table 1 lists the default initial values used in all the functions of the lcmm package when there is only one latent class (the default).
In the presence of at least two latent classes, initial values are crucial for the correct convergence of the program, so that specific attention should be paid to this section. Indeed, in mixture modelling, the log-likelihood may have multiple maxima and algorithms based on maximisation of the likelihood may converge to local maxima (Redner and Walker 1984). This means that convergence towards the global maximum of the log-likelihood is not ensured when running the algorithm only once. To ensure convergence to the global maximum, we thus strongly recommend running each model several times from different sets of initial values (typically from a grid of initial values) entered manually. We are aware that this step is difficult to apprehend, but we chose at the beginning not to provide an automatic grid search and are currently working on alternative strategies.

Table 1: Automatic choice of initial values for the iterative estimation process when G = 1. Ȳ, med(Y) and min(Y) respectively denote the mean, median and minimum of the dependent variable, (T_i, E_i) the couple of survival data, and n_z the number of knots for the baseline risk functions. [The table lists the default initial values for the linear mixed model parameters in hlme, lcmm, multlcmm and Jointlcmm; for the link function parameters η_k (k = 1, ..., K) in lcmm, multlcmm and Jointlcmm, e.g., (Ȳ, 1) for link="linear", (-2, 0.1, ..., 0.1) for link="Splines", and values for link="Beta"; and for the survival or cause-specific model parameters in Jointlcmm (p = 1, ..., P).]
For an easier discovery of the program, we propose an automatic choice of the set of initial values. However, this automatic choice does not ensure convergence towards the global maximum and may be inappropriate for a specific analysis; it should thus only be used as a first attempt. For a model with G > 1, this automatic procedure consists in deriving the initial values of the parameters from the estimation obtained under the assumption that G = 1. For each class-specific parameter generically called θ_g here, the initial values are derived from the corresponding estimated parameter and its standard error under the G = 1 assumption. There are exceptions for ξ_0g and ξ_1g (for g = 1, ..., G − 1) set to 0, ω_g (for g = 1, ..., G − 1) set to 1, and the proportional coefficients over classes for hazardtype="PH" in Jointlcmm set to g 2 for g = 1, ..., G − 1. This automatic choice of initial values implies that when ng>1 and B=NULL, two models are actually estimated (the model with 1 class in addition to the one called), which can substantially increase the estimation time.
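Although the package leaves the grid of initial values to the user, the underlying principle can be illustrated on a generic example unrelated to the lcmm internals: the log-likelihood of a two-component Gaussian mixture is maximised with optim from a small manual grid of starting points, and the best solution is kept:

```r
# Generic sketch of why several sets of initial values matter in mixture
# modelling: run the optimizer from a manual grid of starts, keep the best.
set.seed(123)
y <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 4, sd = 1))

negloglik <- function(par, y) {
  p <- plogis(par[1])                    # mixing proportion mapped into (0,1)
  -sum(log(p * dnorm(y, par[2], 1) + (1 - p) * dnorm(y, par[3], 1)))
}

starts <- list(c(0, -2, 2), c(0, 5, -1), c(1, 0, 4))   # manual grid of starts
fits <- lapply(starts, function(b0) optim(b0, negloglik, y = y))
vals <- sapply(fits, function(f) f$value)
best <- fits[[which.min(vals)]]          # retain the highest log-likelihood
```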

Post-fit computations
A series of post-fit analyses and computations is available in the package. In the following, the symbol hat (ˆ) denotes the value of a parameter/vector/matrix/function computed at the maximum likelihood estimates θ̂_G.

Maximum Likelihood Estimates
This subsection applies to the four estimation functions. The table of maximum likelihood estimates, along with their estimated standard errors, is given by function summary. The vector of estimates is directly given by function estimates or in output value best.
The estimated variance-covariance matrix of the maximum likelihood estimates is given in function VarCov and in output value V. In the latter, the upper triangular matrix is given as a vector.
The parameters of the variance-covariance matrix of the random-effects are not directly estimated although they are provided in the summaries. The Cholesky parameters used for the estimation are available in output vector cholesky or in function estimates. Estimated standard-errors of the parameters of the variance-covariance matrix are computed in function VarCovRE.
Function WaldMult provides univariate and multivariate Wald tests for combinations of parameters from hlme, lcmm, multlcmm or Jointlcmm objects.

Posterior classification
In models involving latent classes, a posterior classification of the subjects in each latent class can be made. It is based on the posterior computation of the class-membership probabilities. It is used to characterize the classification of the subjects as well as to evaluate the goodness-of-fit of the model (Proust-Lima et al. 2014b).

Class-membership posterior probabilities and classification
The posterior class-membership probabilities are computed using the Bayes theorem as the probability of belonging to a latent class given the information collected. In a longitudinal model, they are defined for subject i and latent class g as

π̂_ig = P(c_i = g | Y_i; θ̂_G) = P(c_i = g | X_ci; θ̂_G) f(Y_i | c_i = g; θ̂_G) / Σ_{l=1}^{G} P(c_i = l | X_ci; θ̂_G) f(Y_i | c_i = l; θ̂_G).

In a joint latent class model, the complete information also includes the time-to-event, so that for subject i and latent class g the posterior class-membership probability can also be defined as

π̂_ig = P(c_i = g | Y_i, T_i, E_i; θ̂_G) ∝ P(c_i = g | X_ci; θ̂_G) f(Y_i | c_i = g; θ̂_G) λ(T_i | c_i = g; θ̂_G)^{E_i} S_i(T_i | c_i = g; θ̂_G),

normalized over the G latent classes. A posterior classification can be obtained from these posterior probabilities by assigning each subject to the latent class in which he has the highest posterior class-membership probability (ĉ_i = argmax_g(π̂_ig)). In hlme, lcmm and multlcmm objects, the output table pprob provides the posterior probabilities π̂_ig along with the posterior classification.
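The Bayes computation above can be sketched numerically for one subject (arbitrary prior probabilities and class-specific densities; not package code):

```r
# Posterior class-membership probabilities for one subject and G = 3 classes
prior <- c(0.5, 0.3, 0.2)                 # P(c_i = g | X_ci; theta), arbitrary
dens  <- c(0.001, 0.010, 0.002)           # f(Y_i | c_i = g; theta), arbitrary

post <- prior * dens / sum(prior * dens)  # Bayes theorem: posterior pi_ig
chat <- which.max(post)                   # posterior classification c_i
```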

Posterior classification
The posterior classification can be used to assess the goodness-of-fit of the model (for the selection of the number of latent classes for instance) and the discrimination of the latent classes. Many indicators can be derived from it (Proust-Lima et al. 2014b). The package lcmm provides two indicators in the function postprob: -the proportion of subjects classified in each latent class with a posterior probability above 0.7, 0.8 and 0.9, which indicates the proportion of subjects not ambiguously classified in each latent class;
-the posterior classification table, as defined in Table 2, which reports the mean of the posterior probabilities of belonging to each latent class among the subjects classified a posteriori in each latent class. A perfect classification would provide ones on the diagonal and zeros elsewhere. In practice, high diagonal terms indicate a good discrimination of the population.
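The construction of this classification table can be sketched from a matrix of posterior probabilities (arbitrary values; one row per subject):

```r
# Posterior probabilities for 4 subjects and 3 latent classes (arbitrary)
post <- rbind(c(0.8, 0.1, 0.1),
              c(0.7, 0.2, 0.1),
              c(0.1, 0.6, 0.3),
              c(0.2, 0.2, 0.6))
chat <- apply(post, 1, which.max)          # posterior classification

# Classification table: mean posterior probabilities among the subjects
# classified a posteriori in each latent class (rows = final class)
tab <- t(sapply(1:3, function(g) colMeans(post[chat == g, , drop = FALSE])))
```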

Table 2: Posterior classification table: mean of the posterior probabilities of belonging to each latent class among the subjects of each final (posterior) class.

Longitudinal predictions and residuals
The four estimation functions rely on the linear mixed model theory and, as such, empirical Bayes estimates and longitudinal predictions are naturally derived.

Empirical Bayes Estimates of the random-effects
Empirical Bayes estimates of the random-effects u_i are provided in output of the four estimation functions in the output table predRE.
For a standard linear mixed model defined in equation (1), these empirical Bayes estimates are û_i = B̂ Z_i^T V̂_i^{-1} (Y_i − X_i β̂). In the latent process mixed models defined by equations (2) and (3) (for the univariate case) or (4) (for the multivariate case), the empirical Bayes estimates are computed only when continuous link functions are assumed. In these cases, the random-effects are predicted in the latent process scale by û_i = B̂ Z_i^T V̂_i^{-1} (Ŷ_i − X_Li β̂), where Ŷ_i is the vector of transformed marker values Ŷ_ij = H^{-1}(Y_ij; η̂) for j = 1, ..., n_i in the univariate case, or of transformed marker values Ŷ_kij = H_k^{-1}(Y_kij; η̂_k) with k = 1, ..., K and j = 1, ..., n_ki in the multivariate case.
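The formula above can be illustrated with a small matrix computation (arbitrary values for one subject with three visits and correlated random intercept and slope; a sketch, not package code):

```r
# Empirical Bayes estimates u_i = B Z' V^{-1} (Y - X beta) for one subject
Z <- cbind(1, c(0, 1, 2))                  # design matrix of the random effects
X <- cbind(1, c(0, 1, 2))                  # design matrix of the fixed effects
beta <- c(10, -0.5)                        # fixed effects (arbitrary)
B <- matrix(c(1, 0.2, 0.2, 0.5), 2, 2)     # Var(u_i) (arbitrary)
sigma2 <- 0.5                              # residual variance (arbitrary)
V <- Z %*% B %*% t(Z) + sigma2 * diag(3)   # marginal Var(Y_i)
Y <- c(10.5, 9.0, 9.5)                     # observed outcomes (arbitrary)

u_hat <- B %*% t(Z) %*% solve(V) %*% (Y - X %*% beta)
```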
In models involving latent classes, class-specific empirical Bayes estimates û_ig are computed conditionally on each latent class g (with the class-specific variance inflation ω̂_g^2 for latent process mixed models with continuous link functions, as defined in (8)). Marginal empirical Bayes estimates are then obtained by averaging over the latent classes, û_i = Σ_{g=1}^{G} π̂_ig û_ig.

Longitudinal predictions and residuals
Predictions and residuals of the linear mixed model are computed in the four estimation functions and provided in output table pred. Both subject-specific and marginal predictions/residuals are computed.
For hlme and Jointlcmm, marginal and subject-specific predictions are respectively Ŷ_i^(M) = X_i β̂ and Ŷ_i^(SS) = X_i β̂ + Z_i û_i. For G > 1, class-specific marginal and subject-specific predictions are computed conditionally on each latent class g. To compute residuals, class-specific marginal and subject-specific predictions are averaged over the latent classes weighted by the posterior class-membership probabilities, Ŷ_ij = Σ_{g=1}^{G} π̂_ig Ŷ_ijg, and the corresponding residuals are R_ij = Y_ij − Ŷ_ij. For lcmm and multlcmm, the exact same predictions are computed and provide marginal and subject-specific predictions in the latent process scale (Ŷ_ij for lcmm and Ŷ_kij for multlcmm). Note that in these two functions, variable obs in table pred contains the transformed data Ỹ.

By default (option which="residuals"), function plot provides graphs of the marginal and subject-specific residuals. With option which="fit", function plot provides graphs of the class-specific marginal and subject-specific mean evolutions with time, together with the observed class-specific mean evolution and its 95% confidence bounds; time is split into time intervals provided in input. When G > 1, the class-specific mean evolutions are weighted by the class-membership probabilities.

Predictions in the natural scale of the markers (Ŷ_ij or Ŷ_kij) can also be computed. When G = 1, they are obtained by numerical integration of H(ỹ_ij; η̂) for a lcmm object, or of H_k(ỹ_kij; η̂_k) for a multlcmm object, over the multivariate Gaussian distribution of ỹ_i at the maximum likelihood. In these formulae, a Newton algorithm is used to compute the H or H_k values. When G > 1, the same method is used conditionally on each latent class g. Numerical integrations are managed by a Monte Carlo method.
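The averaging of class-specific predictions over the latent classes, weighted by the posterior class-membership probabilities, can be sketched as follows (arbitrary values, one subject, two classes, three visits; not package code):

```r
# Class-specific predictions at the 3 visits (arbitrary values)
pred_class <- rbind(c(28, 27, 26),     # class-1 predictions
                    c(20, 17, 14))     # class-2 predictions
post <- c(0.9, 0.1)                    # posterior probabilities pi_i1, pi_i2
obs  <- c(27.0, 26.0, 25.5)            # observed outcomes (arbitrary)

pred <- as.vector(post %*% pred_class) # weighted average over classes
res  <- obs - pred                     # corresponding residuals R_ij
```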

Special case of longitudinal predictions in the marker scale for latent process mixed models
For lcmm and thresholds link functions, fitY computes the marginal longitudinal predictions in the scale of the marker (Ŷ_ij) as the expectation over the ordinal levels of the marker, each level probability being expressed in terms of Φ, the standard Gaussian cumulative distribution function, evaluated at the estimated thresholds.

Predicted mean trajectory according to a profile of covariates
The predicted mean trajectory of the markers Y according to a hypothetical profile of covariates can be computed (and represented). This is provided by function predictY and the plot function applied to predictY objects.
The computations are exactly the same as described previously, except that the longitudinal predictions are computed for a hypothetical (new) subject from a table containing the hypothetical covariate information required in X_Li, Z_i, X_ci, X_Si and referred to as X. The predicted mean vector of the marker, E(Y | X = x, θ̂_G), is computed as the (class-specific) marginal vector of predictions for hlme and Jointlcmm. For lcmm and multlcmm objects, the predicted mean vector of values for the latent process is computed by function predictL, and the predicted mean vector of values for the markers is computed by function predictY.
In the latter case, two numerical integrations can be specified: either a MonteCarlo method or a Gauss-Hermite technique. The latter neglects the correlation between the repeated measurements.
Instead of the mean trajectory at the point estimate θ̂_G, the posterior prediction distribution can be approximated by a Monte Carlo method, by recomputing the predicted trajectory for a large number of draws from the asymptotic distribution of the parameters, N(θ̂_G, V̂(θ̂_G)). Then the 2.5%, 50% and 97.5% percentiles provide the median prediction and its 95% confidence interval.
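This Monte Carlo approximation can be sketched on a toy linear predictor with arbitrary point estimates and asymptotic variance (not package code):

```r
# Monte Carlo approximation of the posterior prediction distribution
set.seed(1)
theta_hat <- c(10, -0.5)                       # point estimates (arbitrary)
V_hat <- matrix(c(0.04, 0, 0, 0.01), 2, 2)     # asymptotic variance (arbitrary)
L <- chol(V_hat)                               # for multivariate normal draws
times <- seq(0, 3, by = 0.5)

draws <- replicate(2000, {
  theta <- theta_hat + as.vector(t(L) %*% rnorm(2))  # draw from N(theta, V)
  theta[1] + theta[2] * times                        # recomputed trajectory
})
# 2.5%, 50% and 97.5% percentiles at each time point
bands <- apply(draws, 1, quantile, probs = c(0.025, 0.5, 0.975))
```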

Link functions
This section is specific to lcmm and multlcmm.

Predicted link functions
Table estimlink provides the (inverse of the) link functions computed for a vector of marker values at the maximum likelihood estimates η̂ in outputs of lcmm and multlcmm. These estimated link functions can be plotted using function plot with option which="link" or which="linkfunction". Function predictlink further computes the 50%, 2.5% and 97.5% percentiles of the posterior distribution of the estimated (inverse) link functions using a Monte Carlo method (a large set of draws from the asymptotic distribution of the parameters). This function can also be used to compute the estimated link function at specific marker values.

Discrete log-likelihood and derived criteria
In the case of ordinal outcomes with a large number of levels, the latent process mixed model estimated in function lcmm with continuous nonlinear link functions constitutes an approximation of the cumulative probit mixed model (much easier to estimate as it does not involve any numerical integration over the random-effects). However, it is important to assess whether the approximation is acceptable. This can be done using the discrete log-likelihood and the derived information criteria, discrete AIC (Proust-Lima et al. 2013) and UACV (Commenges, Proust-Lima, Samieri, and Liquet 2015), which are computed with respect to the counting measure instead of the Lebesgue measure. These measures are computed in the summary and in output of lcmm.

Prediction of the event
This section is specific to Jointlcmm function.

Profile of survival functions according to covariates
Class-specific baseline risks are plotted in plot function with option which="baselinerisk" or which="hazard". Class-specific survival functions in the category of reference can also be plotted with plot and option which="survival" when there is a unique cause of event.
Otherwise, predicted cumulative incidences of a specific cause of event can be computed with cuminc and plotted with the associated plot.cuminc function for any profile of covariates given in input. Again, a Monte Carlo method is implemented to provide the 2.5%, 50% and 97.5% of the posterior distribution of the predicted cumulative incidences.

Individual dynamic predictions
Individual dynamic predictions as developed in Proust-Lima et al. (2014b) can be computed with dynpred and plotted with plot.dynpred. They consist in the predicted probability of event (of cause p if there are multiple causes) in a window of time (s, s + t], computed for any subject according to his/her own longitudinal information collected up to time s, that is Y_i^(s) = {Y_ij : j = 1, ..., n_i, t_ij ≤ s}, X_Si = {X_Si1, X_Si2} and X_ci. For cause p (p = 1, ..., P), it is

P(s < T_i ≤ s + t, E_i = p | T_i > s, Y_i^(s), X_Si, X_ci; θ̂_G) = Σ_{g=1}^{G} P(c_i = g | X_ci; θ̂_G) f(Y_i^(s) | c_i = g; θ̂_G) [F_p(s + t | c_i = g; θ̂_G) − F_p(s | c_i = g; θ̂_G)] / Σ_{g=1}^{G} P(c_i = g | X_ci; θ̂_G) f(Y_i^(s) | c_i = g; θ̂_G) S_i(s | X_Si, c_i = g; θ̂_G),

where the density of the longitudinal outcomes f(Y_i^(s) | c_i = g; θ̂_G) in class g, the class-specific membership probability P(c_i = g | X_ci; θ̂_G) and the class-specific survival function S_i(s | X_Si, c_i = g; θ̂_G) are defined as in section 3.1.5, and F_p denotes the class-specific cumulative incidence. With a unique cause of event, the class-specific cumulative incidence is F(t | c_i = g; θ̂_G) = 1 − S_i(t | X_Si, c_i = g; θ̂_G). With multiple causes of event (P > 1), the class-specific cause-specific cumulative incidence is

F_p(t | c_i = g; θ̂_G) = ∫_0^t λ_p(u | c_i = g; θ̂_G) exp(− Σ_{q=1}^{P} A_q(u | c_i = g; θ̂_G)) du,

with λ_p(t | c_i = g; θ̂_G) the cause-p-specific instantaneous hazard defined in (10) and A_p(t | c_i = g; θ̂_G) the corresponding cumulative hazard. When P > 1, the cause-specific cumulative incidence requires the numerical computation of the integral; this is achieved by a 50-point Gauss-Legendre quadrature.
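The dynamic prediction can be sketched numerically with a single cause of event, two latent classes and class-specific Weibull survival functions (all values arbitrary; the class weights stand in for the product of the prior probability and the longitudinal density; not package code):

```r
# Individual dynamic prediction in the window (s, s + h] with G = 2 classes
S <- function(t, zeta) exp(-(zeta[1] * t)^zeta[2])   # class-specific survival
zeta <- list(c(0.02, 2), c(0.05, 2.5))               # Weibull parameters
w <- c(0.6, 0.4)     # P(c_i = g) * f(Y_i^(s) | c_i = g): weights (arbitrary)
s <- 80; h <- 5      # landmark time and horizon

num <- sum(sapply(1:2, function(g) w[g] * (S(s, zeta[[g]]) - S(s + h, zeta[[g]]))))
den <- sum(sapply(1:2, function(g) w[g] * S(s, zeta[[g]])))
p_event <- num / den   # P(s < T <= s + h | T > s, longitudinal data up to s)
```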
Individual dynamic predictions can be computed for any subject (included or not in the dataset used for estimating the model). Times s and t are respectively called the landmark time and the horizon of prediction. Computation in dynpred is performed for any vectors of landmark times and of horizons.
Individual predictions are computed either at the point estimate θ̂_G, or the posterior distribution is approximated using a Monte Carlo method with a large number of draws, in which case the 2.5%, 50% and 97.5% percentiles provide the median prediction and its 95% confidence interval.

Assessment of predictive accuracy
Predictive accuracy of dynamic predictions based on Jointlcmm objects can be assessed using the prognostic information criterion EPOCE (Commenges, Liquet, and Proust-Lima 2012) implemented in the function epoce. Predictive accuracy is computed from a vector of landmark times on the subjects still at risk at the landmark time and the biomarker history up to the landmark time.
On external data, the function provides the Mean Prognostic Observed Log-Likelihood (MPOL) for each landmark time. When applied on the same dataset as used for the estimation, the function provides for each landmark time both the MPOL and the Cross-Validated Observed Log-Likelihood (CVPOL). The latter corrects the MPOL estimate for the possible over-optimism by approximated cross-validation. Further details on these estimators can be found in Commenges et al. (2012) and Proust-Lima et al. (2014b).
Predictive accuracy of two models can also be compared using Diffepoce which computes the difference in EPOCE estimators along with a 95% tracking interval. Functions epoce and Diffepoce include a plot functionality. Further predictive accuracy measures can be computed by using timeROC (Blanche, Proust-Lima, Loubère, Berr, Dartigues, and Jacqmin-Gadda 2014) on individual dynamic predictions computed by dynpred from a Jointlcmm object.

Examples
This section provides a series of examples on the paquid dataset which is provided in the lcmm R package. The first steps are to load lcmm and paquid data.

Paquid Data
The paquid dataset consists of a random subsample of 500 subjects (identified by ID) from the Paquid prospective cohort study (Letenneur, Commenges, Dartigues, and Barberger-Gateau 1994), which aimed at investigating cerebral and functional aging in southwestern France. Repeated measures of 3 cognitive tests (MMSE, IST, BVRT), physical dependency (HIER, a 4-level factor) and depressive symptomatology (CESD) were collected over a maximum period of 20 years, along with age at the visit (age), age at dementia diagnosis or last visit (agedem) and dementia diagnosis (dem). Three time-independent socio-demographic variables are provided: education (CEP), gender (male) and age at entry in the cohort (age_init). In some cases, due to the very asymmetric distribution of MMSE, a normalized version of MMSE (normMMSE) obtained with package NormPsy (Philipps, Amieva, Andrieu, Dufouil, Berr, Dartigues, Jacqmin-Gadda, and Proust-Lima 2014) will be analyzed instead of crude MMSE scores. For computation and interpretation purposes, age will usually be replaced by age65, which is age minus 65, divided by 10. The centering around 65 makes the interpretation of the intercepts easier and the division by 10 reduces numerical problems due to too large ages in quadratic models (and thus too small effects or variances of random-effects).

hlme
The latent class linear mixed models implemented in hlme are illustrated through the study of the quadratic trajectories of normMMSE with age65, adjusted for CEP and assuming correlated random-effects for the functions of age65. The next lines estimate the corresponding standard linear mixed model (1 latent class) in which CEP is in interaction with the age functions:

R> m1a <- hlme(normMMSE ~ age65 + I(age65^2) + CEP*age65 + I(age65^2)*CEP,
+ random =~ age65 + I(age65^2), subject = 'ID', data = paquid, ng = 1)
R> summary(m1a)

Heterogenous linear mixed model fitted by maximum likelihood method hlme(fixed = normMMSE~age65 + I(age65^2) + CEP * age65 + I(age65^2) * CEP, random =~age65 + I(age65^2), subject = "ID", ng = 1, data = paquid)

The first part of the summary provides information about the dataset: the number of subjects, observations, observations deleted (by default, missing observations are deleted), number of latent classes and number of parameters. Then, it details the convergence process with the number of iterations, the convergence criteria and the most important piece of information, which is whether the model converged correctly ("convergence criteria satisfied"). The next block provides the maximum log-likelihood, the Akaike information criterion and the Bayesian information criterion. Finally, tables of estimates are given with the estimated parameters, their estimated standard errors, the Wald test statistics (with Normal approximation) and the corresponding p-values. For the random-effect distribution, the estimated matrix of covariance of the random-effects is displayed (see section 5.1 for details). Finally, the residual standard error is given along with its estimated standard error.
Goodness-of-fit of the model can be assessed by displaying the residuals as in figure 1 and the mean predictions of the model as in figure 2, according to the time variable given in var.time:

R> plot(m2) # Figure 1
R> plot(m2, which="fit", var.time="age65", bty="l", ylab="normMMSE",
+ xlab="(age-65)/10", lwd=2) # Figure 2 (left panel)
R> plot(m2, which="fit", var.time="age65", bty="l", ylab="normMMSE",
+ xlab="(age-65)/10", lwd=2, marg=FALSE) # Figure 2 (right panel)

Class-specific predictions can be computed for any data contained in a dataframe, as soon as all the covariates specified in the model are included in the dataframe. In the next lines, such a dataframe is created by generating a vector of age values between 65 and 95 and defining CEP at 1 or 0. The predictions are computed with predictY and plotted with the associated plot functionality or using standard R tools, as illustrated below and in figure 3.
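A sketch of the dataframe construction described above; the predictY and plot calls, which require the fitted object m2 and the lcmm package, are left as comments:

```r
# Hypothetical covariate profile: ages 65 to 95, with and without education
datnew <- data.frame(age = seq(65, 95, length = 100))
datnew$age65 <- (datnew$age - 65) / 10   # same coding as in the models
datnew$CEP <- 0                          # profile without education
# pred0 <- predictY(m2, datnew, var.time = "age65")
datnew$CEP <- 1                          # profile with education
# pred1 <- predictY(m2, datnew, var.time = "age65")
# plot(pred1)                            # or standard R plotting tools
```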

lcmm
The latent process mixed models implemented in lcmm are illustrated through the study of the linear trajectory of depressive symptoms (as measured by the CES-D scale) with age65, adjusted for male and assuming correlated random-effects for the intercept and age65. The next lines estimate the corresponding latent process mixed model with different link functions:

R> mlin <- lcmm(CESD ~ age65*male, random =~ age65, subject = 'ID', data = paquid)
R> mbeta <- lcmm(CESD ~ age65*male, random =~ age65, subject = 'ID', data = paquid,
+ link = 'beta')
R> mspl <- lcmm(CESD ~ age65*male, random =~ age65, subject = 'ID', data = paquid,
+ link = 'splines')
R> mspl5q <- lcmm(CESD ~ age65*male, random =~ age65, subject = 'ID', data = paquid,
+ link = '5-quant-splines')

Objects mlin, mbeta, mspl and mspl5q are latent process mixed models that assume the exact same trajectory for the underlying latent process, but respectively a linear, Beta CDF, I-splines with 5 equidistant knots (the default with link='splines') and I-splines with 5 knots at percentiles link function. We note that mlin reduces to a standard linear mixed model (link='linear' by default); the only difference with a hlme object is the parameterization for the intercept and the residual standard error, which are considered as rescaling parameters.
CES-D is an ordinal scale with more than 50 levels so it might be estimated with a cumulative probit mixed model even if it is rarely done in practice because of the very high induced number of parameters as well as the substantial additional numerical complexity.
Due to the numerical integration required at each evaluation of the log-likelihood when assuming a threshold link function, the estimation of a cumulative probit mixed model can be very long. We thus recommend estimating first the model without random-effects to obtain satisfactory initial values for the thresholds before any inclusion of random-effects, as shown in the next lines:

R> mord0 <- lcmm(CESD ~ age65*male, random =~ -1, subject = 'ID', data = paquid,
+ link = 'thresholds')
R> binit <- NULL
R> binit[1:6] <- mspl$best[1:6]
R> binit[7:56] <- mord0$best[4:53]
R> mord <- lcmm(CESD ~ age65*male, random =~ age65, subject = 'ID', data = paquid,
+ link = 'thresholds', B = binit)

We note here that mord takes a lot of time to be estimated (it can take more than 1 hour depending on the computer). The output of a lcmm object is very similar to the one of a hlme object, as shown for mspl for example:

multlcmm
The latent process mixed models for multivariate longitudinal data as implemented in multlcmm are illustrated through the study of the quadratic trajectory with time of the global cognitive level defined as the common factor underlying 3 psychometric tests: MMSE, BVRT and IST.
Here the timescale is the years since entry in the cohort and the model is adjusted for age at entry. To further investigate the effect of gender, both an effect on the common factor and differential effects (contrasts) on each marker are included. Correlated random-effects on the time functions are considered, as well as a Brownian motion with cor=BM(time) and a marker-specific random intercept with randomY=T. The next lines estimate the corresponding model:

R> paquid$time <- (paquid$age - paquid$age_init)
R> paquid$age0_centered <- paquid$age_init - 75
R> mult <- multlcmm(MMSE + IST + BVRT ~ age0_centered + male + contrast(male) +
+ time + I(time^2/10), random =~ time + I(time^2/10), subject = 'ID',
+ data = paquid, randomY = T, cor = BM(time), link = c('beta','beta','beta'))
R> summary(mult)

General latent class mixed model fitted by maximum likelihood method multlcmm(fixed = MMSE + IST + BVRT~age0_centered + male + contrast(male) + time + I(time^2/10), random =~time + I(time^2/10), subject = "ID", randomY = T, link = c("beta", "beta", "beta"), cor = BM(time), data = paquid)

In this example, squared time was divided by 10 to avoid very small numbers for the corresponding fixed effect and variance of the random-effect. The summary has the same appearance as the summaries of hlme or lcmm objects. In addition to the fixed effects on the latent process (common factor), the marker-specific contrasts are given and their global significance is tested with a multivariate Wald test. Then the estimated variance-covariance of the random-effects and the standard error of the marker-specific random intercept ("Standard error of the random-effect") are given, along with the standard error of the independent Gaussian error ("Residual standard error"). Finally, the marker-specific link function parameters are provided.

Estimated link functions
The percentage of variance explained by the common latent process (that is, Var(Λ(t))/Var(Y_k(t)) at a given time t) can be computed with function VarExpl. We note that VarExpl is also available for lcmm, hlme and Jointlcmm objects; in these cases, it computes the percentage of variance not explained by the measurement error.
Here is the call for the explained variance at time 0 and time 5 in model mult. In this example, the latent process explains between 26% and 42% of the total variance of the markers at time 0, and between 80% and 89% after 5 years.
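The quantity computed by VarExpl can be sketched under assumed variance components (arbitrary values; here a random intercept and slope for the latent process, a marker-specific random intercept variance tau2 and a measurement error variance sigma2; a sketch, not the package implementation):

```r
# Share of the marker variance explained by the latent process at time t
var_explained <- function(t, B, tau2, sigma2) {
  z <- c(1, t)                                  # design of the random effects
  var_latent <- as.numeric(t(z) %*% B %*% z)    # Var(Lambda(t))
  var_latent / (var_latent + tau2 + sigma2)     # Var(Lambda(t)) / Var(Y_k(t))
}

B <- matrix(c(0.5, 0.1, 0.1, 0.3), 2, 2)        # Var(u_i) (arbitrary)
v0 <- var_explained(0, B, tau2 = 0.4, sigma2 = 0.6)  # at time 0
v5 <- var_explained(5, B, tau2 = 0.4, sigma2 = 0.6)  # after 5 years
```

As in the paquid illustration, the share of variance explained by the latent process typically grows with time as the between-subject variability accumulates.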
The model was estimated with a unique latent class, but the exact same function also applies with a higher number of latent classes specified in input of multlcmm, according to the same syntax as explained in section 6.2. All the functions described in the hlme and lcmm sections (i.e., sections 6.2 and 6.3) apply to a multlcmm object.

Jointlcmm
The joint latent class mixed models implemented in Jointlcmm are illustrated through the study of the trajectories of normMMSE with age and the associated risk of dementia. Indeed, cognitive change over time and the risk of dementia are two processes that are closely linked. Their joint study is useful both to better understand the natural history of cognitive aging and dementia, and to provide dynamic tools to evaluate the individual risk of dementia based on observed repeated cognitive measures. For the sake of simplicity, and because this illustration only aims at explaining the function implementation, we do not take into account the competing risk of death, although this could be done with the package and neglecting death in the dementia context may lead to biased estimates of dementia incidence.
The summary of the selected 4-class joint model is given below:

R> summary(mj4b)
Joint latent class model for quantitative outcome and competing risks fitted by maximum likelihood method Jointlcmm(fixed = normMMSE~age65 + I(age65^2) + CEP, mixture =~age65 + I(age65^2), random =~age65 + I(age65^2), subject = "ID", ng = 4, survival = Surv(age_init, agedem, dem)~CEP + male, hazard = "Weibull", data = paquidS) The summary of a Jointlcmm object is very similar to the summaries of hlme or lcmm objects (depending on whether a link function was assumed in Jointlcmm). The main difference is that in addition to estimates of the multinomial model and of the mixed model, estimates from the survival model are also given. The summary also provides the statistic of a score test for the conditional independence assumption (see Jacqmin-Gadda, Proust-Lima, Taylor, and Commenges (2010); Proust-Lima et al. (2014b) for more details). We note here that the conditional independence assumption between the longitudinal and survival processes given the latent classes is rejected although the statistic of the test was much lower with 4 classes than with 3 or 2.
Post-fit functions plot and predictY (along with its plot functionality) described in section 6.2 are also available for Jointlcmm objects. plot provides the longitudinal residuals (with option which="residuals") and the comparison between observed and predicted longitudinal data (with option which="fit"), as shown below with subject-specific predictions (and in figure 7):

R> plot(mj4b, which="fit", var.time="age65", marg=F, break.times=10, bty="l",
+ ylab="normMMSE", xlab="Age in decades from 65 years")

Class-specific predicted longitudinal trajectories can be computed using predictY for a given covariate profile and plotted with plot. The baseline risk and baseline survival functions can also be plotted with function plot and options which="hazard" and which="survival".
As for other objects of package lcmm, the classification can be summarized with function postprob. The classification provided in the classification table is relatively good, with a mean posterior probability in each class above 71% and up to 88.7%. We note that here two posterior classifications are provided: the main one based on all the information, and the classification based only on the longitudinal information. The corresponding classifications, along with the posterior class-membership probabilities, are provided in output values pprob and pprobY.
One objective of joint models may be the dynamic prediction of the event. Function epoce computes the predictive ability of the models using the expected prognostic observed cross-entropy (EPOCE) at different landmark times. When comparing different models, epoce objects can be plotted to visualize the predictive abilities at different landmark times. The difference in predictive ability between two models can also be computed with function Diffepoce and plotted with the associated plot function. Here is an example of code:

Surprisingly, in this example, joint models for normMMSE change and incidence of dementia do not give a better predictive ability (that is, a lower EPOCE) than the simple survival model for dementia (with 1 class), although these joint models provided a better goodness-of-fit to the data (with a lower BIC). The differences in EPOCE also suggest that the model with 3 classes performs worse than the 2-class model before age 77.
Finally, individual dynamic predictions of the event can be computed using function dynpred. For a specific subject whose data are provided in input, the probabilities of occurrence of the event from the landmark times indicated in landmark and at the horizons indicated in horizon are computed from the estimated model, using the longitudinal information collected up to the landmark times (for G>1). We give here the example (including the graphs in figure 10) for a subject from the estimation data, but the exact same computation could have been done for any individual (not included in the estimation data). For illustration purposes only, we computed these predictions from the 4-class model, although this was not necessarily the best candidate highlighted by the EPOCE measure. We considered two landmark ages (80 and 90) and computed the probability of dementia for horizons of 1, 3, 5, 8 and 9 years:

R> paq72 <- paquid[which(paquid$ID==72),]
R> dynp <- dynpred(mj4b, paq72, landmark=c(80,90), horizon=c(1,3,5,8,9),
+ var.time="age65", fun.time=function(x) 10*x+65, draws=TRUE)
R> plot(dynp, landmark=80, ylim=c(55,85,0,1), col=1, ylab="normMMSE",
+ main="At landmark age 80", xlab="age in years", pch=20)

Figure 9: Cross-validated EPOCE (expected prognostic observed cross-entropy) estimates for joint models with 1 to 4 classes (on the left) and difference in EPOCE estimates between joint models with 2 and 3 latent classes or 3 and 4 classes (on the right).

Concluding Remarks
The package lcmm provides a series of functions that extend the linear mixed model to various settings, including specific types of nonlinear mixed models and multivariate mixed models, but also latent class mixed models and joint models. Although initially motivated by the analysis of cognitive data in aging cohort studies, such as the one available in the dataset paquid, the functions also apply to many other settings. In particular, the latent process mixed model is designed for the longitudinal analysis of scales that usually have asymmetric distributions with possibly ceiling effects, floor effects and unequal interval scaling, as is the case, for instance, for the CES-D scale for depressive symptoms.
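As a sketch of such an analysis of the CES-D (assuming, as in the earlier examples, that the rescaled time variable age65 has been added to the paquid data), a latent process mixed model with a flexible splines link function to handle the skewed distribution might be specified as:

R> ## latent process mixed model for CES-D with a quantile-splines link
R> ## to accommodate the floor effect and unequal interval scaling
R> mces <- lcmm(CESD ~ age65 + male, random=~ age65, subject='ID',
+              link='5-quant-splines', data=paquid)
R> summary(mces)

The covariates and the number of knots are illustrative; alternative link functions such as 'beta' or 'linear' could be compared using the discrete AIC provided by the package.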
To our knowledge, no other program estimates latent process mixed models or joint latent class mixed models. However, several programs handle latent class mixed models. The best known software is Mplus (Muthén and Muthén 2001). Other free programs include the SAS macros HETMIXED and HETNLMIXED (https://ibiostat.be/software/longitudinal#MixedNonLin), which are numerically limited, and the free Fortran90 programs HETMIXLIN and HETMIXSURV, which may be faster but are not user-friendly (Komarek 2009). Traditionally, when the interest lies in the classification of trajectories, exploratory methods such as latent class growth analysis (Nagin 1999) are usually preferred. They have become very popular in psychology (Bongers, Koot, van der Ende, and Verhulst 2004) and more recently in public health (Gill, Gahbauer, Han, and Allore 2010). It should be stressed that latent class growth analysis, as implemented in SAS with proc TRAJ (Jones, Nagin, and Roeder 2001), is a specific case of the latent class mixed model in which no random effects are included. As such, it assumes that, given a specific latent class, the repeated measures of the same subject are independent. Although of possible interest in an exploratory analysis, the assumption that repeated measures are independent given a restricted number of latent groups is very strict and unlikely to hold, so that valid inference based on this approach is usually impossible.
Further developments of lcmm will include parallel computation. Indeed, although computationally efficient Fortran90 code is used for the estimation procedure, the computation may still be long and could benefit from parallelization, as was done in the original Fortran90 executable HETMIXSURV available online (Proust-Lima et al. 2014a). The other direction of improvement will consist in exploring and implementing techniques to run the same model from different sets of initial values, and to update a vector of initial values from nested models.