jmcm : An R Package for Joint Mean-Covariance Modeling of Longitudinal Data

Longitudinal studies commonly arise in fields such as psychology, social science, economics and medical research. It is of great importance to understand the dynamics of the mean function and of the covariance and/or correlation matrices of repeated measurements. However, high-dimensionality (HD) and positive-definiteness (PD) constraints are two major stumbling blocks in the modeling of covariance and correlation matrices. Cholesky-type decomposition based methods have proved effective in dealing with the HD and PD problems, but they had not yet been implemented in statistical software, making them difficult for practitioners to use. In this paper, we first review recently developed Cholesky decomposition based methods for joint modeling of mean and covariance structures, namely the modified Cholesky decomposition (MCD), the alternative Cholesky decomposition (ACD) and the hyperspherical parameterization of the Cholesky factor (HPC). We then introduce our newly developed R package jmcm, which currently handles longitudinal data following a Gaussian distribution using the MCD, ACD and HPC methods. The use of the package jmcm is illustrated, and the three methods are compared, through the analysis of two real datasets.


Introduction
A longitudinal study usually involves repeated observations of the same variables over a long period of time and is often used in psychology, sociology and medical research. The covariance matrix plays a prominent role in analyzing data from longitudinal studies since the components of collected measurements within the same subject are not independent. A good covariance modeling approach improves statistical inference of the mean of interest and the covariance structure itself may be of scientific interest in some circumstances (Diggle and Verbyla 1998).
However, modeling of covariance structure is challenging because the estimated covariance matrices in general should be positive definite and there are many parameters in covariance matrices. To overcome these two obstacles, Pourahmadi (1999) advocated a data-driven joint mean-covariance modeling method based on a modified Cholesky decomposition (MCD) of the marginal within-subject covariance matrix. The decomposition leads to a reparameterization where entries can be interpreted in terms of innovation variances and autoregressive coefficients. See Pan and Mackenzie (2003) for a related discussion. Another Cholesky-type decomposition (ACD) proposed by Chen and Dunson (2003) can be understood as modeling certain innovation variances and moving average parameters, and the method is compared with MCD in detail by Pourahmadi (2007). These two Cholesky-type approaches demonstrate parsimonious and effective strategies, but their corresponding variance functions cannot be directly interpreted as those of the repeated observations. Therefore additional efforts are needed for interpreting the variance and covariance functions. More recently, Zhang, Leng, and Tang (2015) considered a regression approach based on the standard Cholesky decomposition of the correlation matrix and the hyperspherical parameterization (HPC) of its Cholesky factor, where parameters are directly interpretable with respect to variance and correlation. A brief review of these approaches is presented in the following sections. Note this paper is not an exhaustive survey, and many other covariance structure modeling methods are also commonly used in the literature, see Fan, Liao, and Liu (2016) for a more general overview on the estimation of large covariance and precision matrices.
Software for analyzing longitudinal data using some conventional approaches has been implemented in the R environment for statistical computing and graphics (R Core Team 2017) for many years. For instance, several packages provide functions for determining maximum likelihood estimates of the parameters in a linear mixed-effect model (LMM) that incorporates both fixed effects and random effects in the linear predictor, such as the lme function in package nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2017) and the lmer function in package lme4 (Bates, Mächler, Bolker, and Walker 2015). Similar commercial statistical programs are also available for LMM, such as PROC MIXED in SAS (SAS Institute Inc. 2013), MIXED in SPSS (SPSS Inc. 2015) and fitlme in MATLAB (The MathWorks Inc. 2015). The method of generalized estimating equations (GEE; Liang and Zeger 1986) is widely used as it focuses on models for the mean of the correlated observations without fully specifying the joint distribution of the responses. Several implementations of GEE are available through packages gee (Carey, Lumley, and Ripley 2015) and geepack (Halekoh, Højsgaard, and Yan 2006). The Gaussian copula model provides a flexible general framework for marginal regression analysis of continuous, discrete and categorical responses, and is available through package gcmr (Varin 2012, 2017). However, all of these procedures are based on specific model assumptions, such as the existence of an expectation or homogeneous variances, and are not intuitive for a joint mean-covariance modeling framework. In this paper we focus on joint mean-covariance modeling for both balanced and unbalanced longitudinal data, and we present a user-friendly R package jmcm (Pan and Pan 2017), freely available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=jmcm, that can be used to handle such joint models.
For efficiency, the core part of package jmcm is implemented in compiled C++ code using Rcpp (Eddelbuettel and François 2011; Eddelbuettel 2013) and RcppArmadillo (Eddelbuettel and Sanderson 2014) for numerical linear algebra. All the implemented R functions are well documented with examples. The main objective of this paper is to introduce the joint mean-covariance modeling approaches in package jmcm to a wide audience of statisticians and practitioners who need to analyze longitudinal data.
The rest of the paper is organized as follows. In Section 2 we present the joint mean-covariance modeling methods and discuss the different choices of covariance structure modeling strategies mentioned above. Furthermore, we consider maximum likelihood estimation for each type of model, with particular emphasis on numerical optimization techniques. Section 3 provides a detailed overview of the implementation of the methods introduced in Section 2 using the package jmcm and gives a brief illustration of the computational tools. Section 4 concludes the paper with further discussion.

Joint mean-covariance models
Denote the longitudinal measurements by y_i = (y_{i1}, y_{i2}, ..., y_{im_i})^T (i = 1, 2, ..., n), collected from n subjects and measured at time points t_i = (t_{i1}, t_{i2}, ..., t_{im_i})^T. Here we assume the number of measurements m_i and the times t_i are subject specific, so that unbalanced longitudinal datasets with observations taken at irregular time points can be modeled. The basic linear model used in the joint mean-covariance modeling framework of longitudinal data analysis describes the distribution of the vector-valued random response y_i, which is assumed to be multivariate normal,

y_i ~ N(µ_i, Σ_i),

where µ_i = (µ_{i1}, µ_{i2}, ..., µ_{im_i})^T is an m_i × 1 vector and Σ_i is an m_i × m_i within-subject covariance matrix. The mean µ_i of y_i is usually modeled by a linear regression,

µ_i = X_i β,

where X_i denotes an m_i × (p + 1) model matrix with an intercept in the first column followed by covariates of the ith subject, and β is a (p + 1) × 1 regression coefficient vector. The subject-specific within-subject covariance matrix Σ_i may be modeled similarly based on different decomposition approaches, discussed in detail in the following sections.
Estimates of the joint mean-covariance model parameters θ, comprising θ_1 = β in the mean model and the remaining parameters θ_2 in the covariance matrices, can be obtained by maximum likelihood (ML) estimation. In particular, a maximum likelihood estimator (MLE) of the unknown parameter vector is defined as any solution θ̂_n of

θ̂_n = argmin_θ f(θ),  where  f(θ) = −2l(θ) = Σ_{i=1}^n { log|Σ_i| + (y_i − µ_i)^T Σ_i^{−1} (y_i − µ_i) }

is minus twice the log-likelihood function without the constant term. Note that this is the form of the log-likelihood implemented by default in the package, though the value of the full log-likelihood including the constant term can easily be obtained by setting a specific option before model fitting. After obtaining the score functions U(θ) = (U(θ_1)^T, U(θ_2)^T)^T from f(θ) = −2l(θ) by direct calculation, we estimate θ via the iterative quasi-Newton (BFGS) method that solves the score equations. More specifically, we apply the following quasi-Newton algorithm.
1. Select an initial value θ^(0) = ((θ_1^(0))^T, (θ_2^(0))^T)^T and set the iteration number k = 0. A convenient initial value for θ_2 corresponds to a diagonal covariance matrix, and will be discussed for each choice of covariance structure model later.

2. Compute the gradient U^(0) = U(θ^(0)) and initialize the inverse Hessian approximation, e.g., H^(0) = I.

3. Update the search direction (Newton step) d^(k) = −H^(k) U^(k) and compute the step size λ̃ by performing an approximate line minimization of f(θ^(k) + λ d^(k)) over λ.

4. Update θ^(k+1) = θ^(k) + λ̃ d^(k) and then compute the new gradient U^(k+1) = U(θ^(k+1)).

5. Compute the differences s^(k) = θ^(k+1) − θ^(k) and v^(k) = U^(k+1) − U^(k) and update the inverse Hessian with the BFGS updating formula

H^(k+1) = H^(k) + (s^(k) s^(k)T)/(s^(k)T v^(k)) − (H^(k) v^(k))(H^(k) v^(k))^T/(v^(k)T H^(k) v^(k)) + (v^(k)T H^(k) v^(k)) u u^T,

where u is defined as the vector

u = s^(k)/(s^(k)T v^(k)) − H^(k) v^(k)/(v^(k)T H^(k) v^(k)).

6. Set k = k + 1 and repeat Steps 3 to 5 until a pre-specified criterion is met.
See Press, Teukolsky, Vetterling, and Flannery (2007) for a more detailed discussion of the BFGS optimization algorithm with line-search and its implementations. Note that currently only the BFGS algorithm is implemented in the package since it proves to be one of the best quasi-Newton methods for solving smooth unconstrained optimization problems and works very well in our problem. Other quasi-Newton algorithms will also be implemented as alternatives in the future. In practice, we find that estimation of the parameters θ of the joint mean-covariance model can be further improved by solving for the parameters one by one with the other parameters fixed in the optimization, and this will be discussed in more detail in the following sections.
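To make the steps above concrete, here is a minimal, self-contained sketch of BFGS with a backtracking (Armijo) line search applied to a toy quadratic objective. It is only an illustration of the algorithm, not jmcm's implementation (which is written in C++); the objective, tolerances and constants are hypothetical.

```python
def bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    """Minimal BFGS with backtracking line search (illustration only)."""
    n = len(x0)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    H = [row[:] for row in I]          # inverse-Hessian approximation, H^(0) = I
    x, g = list(x0), grad(x0)
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        # Step 3: search direction d = -H U, then approximate line minimization
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(g[i] * d[i] for i in range(n))
        lam = 1.0
        while f([x[i] + lam * d[i] for i in range(n)]) > f(x) + 1e-4 * lam * slope:
            lam *= 0.5
        # Step 4: update theta and the gradient
        x_new = [x[i] + lam * d[i] for i in range(n)]
        g_new = grad(x_new)
        # Step 5: BFGS update of the inverse Hessian from the differences
        s = [x_new[i] - x[i] for i in range(n)]
        y = [g_new[i] - g[i] for i in range(n)]
        sy = sum(s[i] * y[i] for i in range(n))
        if sy > 1e-12:
            rho = 1.0 / sy
            # H <- (I - rho s y') H (I - rho y s') + rho s s'
            A = [[I[i][j] - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
            AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]
            H = [[sum(AH[i][k] * (I[k][j] - rho * y[k] * s[j]) for k in range(n))
                  + rho * s[i] * s[j] for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x

# Toy objective with minimum at (1, -0.5)
f = lambda v: (v[0] - 1) ** 2 + 2 * (v[1] + 0.5) ** 2
grad = lambda v: [2 * (v[0] - 1), 4 * (v[1] + 0.5)]
print(bfgs(f, grad, [0.0, 0.0]))
```

Here the curvature condition s^T v > 0 is checked before the update so that H stays positive definite, which keeps d a descent direction.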

Modified Cholesky decomposition (MCD)
The two major obstacles in modeling covariance matrices are high-dimensionality (HD) and positive-definiteness (PD). The HD problem can be largely reduced by introducing regression techniques, and the PD constraint can be removed by employing the Cholesky decomposition in covariance structure modeling (Pourahmadi 2013). The standard Cholesky decomposition of an m_i × m_i positive definite covariance matrix has the form

Σ_i = C_i C_i^T,

where C_i = (c_ijk) is a lower triangular matrix with positive diagonal elements whose entries are difficult to interpret (Pinheiro and Bates 1996). Statistical interpretation becomes much easier if C_i is reduced to a unit lower triangular matrix by post- or pre-multiplying with the inverse of

D_i = diag(c_{i11}, c_{i22}, ..., c_{im_i m_i}).

Defining modified Cholesky decomposition (MCD)
The first case, post-multiplying C_i by the inverse of D_i, leads to the modified Cholesky decomposition (MCD), which keeps D_i inside (Zhang and Leng 2012):

Σ_i = L_i D_i^2 L_i^T  with  L_i = C_i D_i^{−1},

or, in the more commonly used form (Pourahmadi 1999),

T_i Σ_i T_i^T = D_i^2  with  T_i = L_i^{−1},

where L_i can be considered a standardized version of C_i, obtained by dividing each column by its corresponding diagonal entry (Maadooliat, Pourahmadi, and Huang 2013).
The below-diagonal entries of T_i are the negatives of the so-called generalized autoregressive parameters (GARPs), φ_ijk, in the autoregressive model for the measurements on subject i,

y_ij = µ_ij + Σ_{k=1}^{j−1} φ_ijk (y_ik − µ_ik) + ε_ij,

and the diagonal entries of D_i^2 are the innovation variances σ_ij^2 = VAR(ε_ij); see Pourahmadi (1999) for details. It is helpful to invert this relation, starting from y_i1 − µ_i1 = ε_i1 and repeatedly substituting for y_it in terms of ε_it, to obtain

y_i − µ_i = L_i ε_i,  where  L_i = T_i^{−1} = (φ̃_ijk),

so that the entries on the jth row of L_i can be interpreted as regression parameters when y_ij is regressed on the present and past innovations ε_ij, ε_i(j−1), ..., ε_i1. Setting φ̃_ijj = 1 and φ̃_ijk = 0 for j < k, one can then show that

COV(y_is, y_it) = Σ_{k=1}^{min(s,t)} φ̃_isk φ̃_itk σ_ik^2,  1 ≤ s, t ≤ m_i.

Thus the correlation coefficient between y_is and y_it depends on both the φ̃_ijk's and the σ_ij^2's.
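As a numerical illustration of the decomposition itself (not of jmcm's internals), the following sketch factors a small hypothetical covariance matrix, forms L_i = C_i D_i^{−1} and T_i = L_i^{−1}, reads off the GARPs and innovation variances, and checks that T_i Σ_i T_i^T is diagonal. All numbers are made up for illustration.

```python
def cholesky(a):
    """Standard Cholesky factor C (lower triangular) of a PD matrix."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = s ** 0.5 if i == j else s / c[j][j]
    return c

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def unit_lower_inv(l):
    """Inverse of a unit lower triangular matrix by forward substitution."""
    n = len(l)
    t = [[float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            t[i][j] = -sum(l[i][k] * t[k][j] for k in range(j, i))
    return t

sigma = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]    # toy covariance
C = cholesky(sigma)
d = [C[j][j] for j in range(3)]
L = [[C[i][j] / d[j] for j in range(3)] for i in range(3)]     # L = C D^{-1}
T = unit_lower_inv(L)                                          # T = L^{-1}
garp = {(j, k): -T[j][k] for j in range(3) for k in range(j)}  # GARPs phi_jk
innovation_var = [x * x for x in d]                            # diagonal of D^2
# check: T Sigma T' equals diag(innovation_var)
TS = matmul(matmul(T, sigma), [[T[j][i] for j in range(3)] for i in range(3)])
print([round(TS[j][j], 10) for j in range(3)], garp[(1, 0)])  # -> [4.0, 4.0, 4.0] 0.5
```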
Under the regression models log σ_ij^2 = z_ij^T λ and φ_ijk = w_ijk^T γ, minus twice the log-likelihood function, except for a constant, is given by

f(θ) = Σ_{i=1}^n { Σ_{j=1}^{m_i} log σ_ij^2 + ε_i^T D_i^{−2} ε_i },  where  ε_i = T_i (y_i − X_i β).

The maximum likelihood estimating equations for β, λ and γ become

U_1(β) = Σ_{i=1}^n X_i^T Σ_i^{−1} (y_i − X_i β) = 0,
U_2(λ) = (1/2) Σ_{i=1}^n Z_i^T (D_i^{−2} ε_i^2 − 1_{m_i}) = 0,
U_3(γ) = Σ_{i=1}^n G_i^T D_i^{−2} ε_i = 0,

where the matrix G_i, of order m_i × (q + 1), has typical row g_ij^T = Σ_{k=1}^{j−1} (y_ik − µ_ik) w_ijk^T, Z_i = (z_i1, ..., z_im_i)^T and ε_i^2 = (ε_i1^2, ..., ε_im_i^2)^T are the m_i × (d + 1) matrix of covariates and the m_i × 1 vector of squared fitted residuals respectively, and 1_{m_i} is the m_i × 1 vector of 1's.
Since the solutions satisfy these estimating equations and the parameters β, λ, γ are asymptotically independent (Ye and Pan 2006), the three parameters can also be solved for sequentially, one at a time with the other parameters kept fixed. More specifically, we apply the following algorithm.

1. Select initial values β^(0), λ^(0) and γ^(0) and set the iteration number k = 0.

2. With λ^(k) and γ^(k) fixed, update β by solving its estimating equation U_1(β) = 0 directly:

β^(k+1) = ( Σ_{i=1}^n X_i^T Σ_i^{−1} X_i )^{−1} Σ_{i=1}^n X_i^T Σ_i^{−1} y_i,

with Σ_i evaluated at (λ^(k), γ^(k)).

3. With β^(k+1) and γ^(k) fixed, update the search direction for λ as in the quasi-Newton algorithm of Section 2.1.

4. Compute the step size λ̃ by performing an approximate line minimization and update λ^(k+1).

5. With β^(k+1) and λ^(k+1) fixed, update the search direction for γ.

6. Compute the step size λ̃ by performing an approximate line minimization and update γ^(k+1).

7. Set k = k + 1 and repeat Steps 2 to 6 until a pre-specified criterion is met.
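The blockwise (profile) updating idea can be demonstrated on a toy coupled quadratic objective in which each block has a closed-form update given the other, loosely analogous to updating β while the covariance parameters are held fixed. The objective and its coefficients below are hypothetical, chosen only to show that cyclic block updates converge to the joint stationary point.

```python
# Toy objective f(beta, lam) = (beta - 2)^2 + (lam + 1)^2 + 0.1 * beta * lam.
# Each sweep minimizes one block exactly while the other is held fixed.
def profile_fit(n_sweeps=200):
    beta, lam = 0.0, 0.0
    for _ in range(n_sweeps):
        beta = 2.0 - 0.05 * lam        # argmin over beta, lam fixed
        lam = -1.0 - 0.05 * beta       # argmin over lam, beta fixed
    return beta, lam

beta, lam = profile_fit()
print(round(beta, 6), round(lam, 6))  # -> 2.055138 -1.102757
```

At convergence both partial derivatives vanish simultaneously, i.e., the blockwise fixed point is the joint minimizer.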

Defining alternative Cholesky decomposition (ACD)
The second case, pre-multiplying C_i by the inverse of D_i, leads to the alternative Cholesky decomposition (ACD; Chen and Dunson 2003), which keeps D_i outside:

Σ_i = D_i L̃_i L̃_i^T D_i  with  L̃_i = D_i^{−1} C_i,

where L̃_i is obtained from a slightly different standardization of C_i, dividing each row by its corresponding diagonal entry (Maadooliat et al. 2013).
For the statistical interpretation of the below-diagonal entries of L̃_i, note that D_i^{−1}(y_i − µ_i) has L̃_i L̃_i^T as the standard Cholesky decomposition of its covariance matrix, and ε_i = (D_i L̃_i)^{−1}(y_i − µ_i), its vector of innovations, has COV(ε_i) = I_{m_i}. Thus, with L̃_i = (φ̃_ijk) and D_i = diag(σ_i1, ..., σ_im_i), we obtain the variable-order moving average (MA) representation for the standardized residuals,

(y_ij − µ_ij)/σ_ij = Σ_{k=1}^{j} φ̃_ijk ε_ik,  with  φ̃_ijj = 1.

One can then show that, for any 1 ≤ s, t ≤ m_i,

COV(y_is, y_it) = σ_is σ_it Σ_{k=1}^{min(s,t)} φ̃_isk φ̃_itk,

so that the correlation coefficient between y_is and y_it,

CORR(y_is, y_it) = Σ_{k=1}^{min(s,t)} φ̃_isk φ̃_itk / { ( Σ_{k=1}^{s} φ̃_isk^2 ) ( Σ_{k=1}^{t} φ̃_itk^2 ) }^{1/2},

is determined solely by the φ̃_ijk's.
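A small numerical sketch of the ACD, again with a made-up covariance matrix purely for illustration: we factor Σ_i, scale the rows of C_i to obtain L̃_i, verify the reconstruction Σ_i = D_i L̃_i L̃_i^T D_i, and confirm that a correlation coefficient computed from the rows of L̃_i alone matches the one computed directly from Σ_i.

```python
import math

def cholesky(a):
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = s ** 0.5 if i == j else s / c[j][j]
    return c

sigma = [[1.0, 1.0, 1.0], [1.0, 5.0, 5.0], [1.0, 5.0, 14.0]]  # toy covariance
C = cholesky(sigma)
d = [C[j][j] for j in range(3)]
Lt = [[C[i][j] / d[i] for j in range(3)] for i in range(3)]    # row-scaled: D^{-1} C
# Reconstruct Sigma = D Lt Lt' D and check entrywise
recon = [[d[i] * d[j] * sum(Lt[i][k] * Lt[j][k] for k in range(3))
          for j in range(3)] for i in range(3)]
ok = all(abs(recon[i][j] - sigma[i][j]) < 1e-9 for i in range(3) for j in range(3))
# Correlations depend only on the rows of Lt (normalized inner products)
rho12 = (sum(Lt[1][k] * Lt[2][k] for k in range(3))
         / math.sqrt(sum(v * v for v in Lt[1]) * sum(v * v for v in Lt[2])))
print(ok, round(rho12, 6),
      round(sigma[1][2] / math.sqrt(sigma[1][1] * sigma[2][2]), 6))
# -> True 0.597614 0.597614
```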
Under this model, with log σ_ij^2 = z_ij^T λ and φ̃_ijk = w_ijk^T γ, minus twice the log-likelihood function, except for a constant, is given by

f(θ) = Σ_{i=1}^n { Σ_{j=1}^{m_i} log σ_ij^2 + (y_i − X_i β)^T D_i^{−1} (L̃_i L̃_i^T)^{−1} D_i^{−1} (y_i − X_i β) }.

The score functions U_1(β), U_2(λ) and U_3(γ) can be obtained and simplified by direct calculation, where ε_i = (D_i L̃_i)^{−1}(y_i − µ_i), so that ε_i1, ..., ε_im_i are independent standard normal random variables, and I_{m_i} is the m_i × m_i identity matrix.
Since the covariance structures of MCD and ACD are quite close, the initial guesses for β^(0), λ^(0) and γ^(0) in ACD can be obtained using the same approach as described for MCD. We then estimate θ by minimizing f(θ) via the iterative quasi-Newton algorithm, as explained in Section 2.1, with U(θ) replaced by (−2U_1(β)^T, −2U_2(λ)^T, −2U_3(γ)^T)^T.
Since the solutions satisfy these score equations and the parameters λ and γ are not asymptotically orthogonal (Maadooliat et al. 2013), the parameters are instead split into two groups, θ_1 = β and θ_2 = (λ^T, γ^T)^T, which are solved for sequentially, each group with the other kept fixed. More specifically, we apply the following algorithm.

1. Select initial values β^(0), λ^(0) and γ^(0) and set the iteration number k = 0.

2. With θ_2^(k) = ((λ^(k))^T, (γ^(k))^T)^T fixed, update β by solving U_1(β) = 0 directly:

β^(k+1) = ( Σ_{i=1}^n X_i^T Σ_i^{−1} X_i )^{−1} Σ_{i=1}^n X_i^T Σ_i^{−1} y_i.

3. With β^(k+1) fixed, update the search direction for θ_2 as in the quasi-Newton algorithm of Section 2.1.

4. Compute the step size λ̃ by performing an approximate line minimization.

5. Update θ_2^(k+1) and the new gradient.

6. Set k = k + 1 and repeat Steps 2 to 5 until a pre-specified criterion is met.

Hyperspherical parameterization of the Cholesky factor (HPC)
Even though modified Cholesky decomposition (MCD; Pourahmadi 1999) and alternative Cholesky decomposition (ACD; Chen and Dunson 2003) provide parsimonious unconstrained and statistically interpretable parameterizations of a covariance matrix, the innovation variance is not the same as the marginal variances of the repeated measurements within the same subject.

Defining the hyperspherical parameterization of the Cholesky factor (HPC)
It is well known that the variance-correlation decomposition has the form

Σ_i = H_i R_i H_i,

where H_i = diag(σ_i1, σ_i2, ..., σ_im_i), with σ_ij being the standard deviation of the jth measurement for subject i, and R_i = (ρ_ijk) is the correlation matrix of y_i, with ρ_ijk = CORR(y_ij, y_ik) being the correlation between the jth and kth observations of the ith subject. With this decomposition, one can directly model the variances and correlations of the observations separately.
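A short numerical check of the variance-correlation decomposition, with a hypothetical covariance matrix: extract H_i from the diagonal, form R_i = H_i^{−1} Σ_i H_i^{−1}, and verify that R_i has unit diagonal and that H_i R_i H_i recovers Σ_i.

```python
import math

sigma = [[4.0, 2.4, 1.0], [2.4, 9.0, 3.0], [1.0, 3.0, 25.0]]  # toy covariance
n = len(sigma)
h = [math.sqrt(sigma[j][j]) for j in range(n)]                 # H = diag of sds
R = [[sigma[i][j] / (h[i] * h[j]) for j in range(n)] for i in range(n)]
# R has unit diagonal, and H R H recovers Sigma up to rounding
recon = [[h[i] * R[i][j] * h[j] for j in range(n)] for i in range(n)]
ok = all(abs(recon[i][j] - sigma[i][j]) < 1e-12
         for i in range(n) for j in range(n))
print([R[j][j] for j in range(n)], ok)  # -> [1.0, 1.0, 1.0] True
```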
Not surprisingly, developing a regression method for the correlation structure proves to be difficult. Specifically, a correlation matrix must be positive semi-definite and symmetric, with 1's as the main diagonal entries and values between −1 and 1 as the off-diagonal entries. This challenge is mitigated by employing the standard Cholesky decomposition on the correlation matrix R_i and parameterizing its Cholesky factor B_i via hyperspherical coordinates (HPC; Zhang et al. 2015):

R_i = B_i B_i^T,  with  b_i11 = 1,  b_ij1 = c_ij1 (j ≥ 2),  b_ijk = c_ijk Π_{l=1}^{k−1} s_ijl (2 ≤ k ≤ j − 1),  b_ijj = Π_{l=1}^{j−1} s_ijl (j ≥ 2),

where c_ijk = cos(θ_ijk) and s_ijk = sin(θ_ijk).
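The construction of B_i from angles can be sketched as follows; the angles used are arbitrary values in (0, π), chosen purely for illustration. Because each row of B_i has unit Euclidean norm by construction, R_i = B_i B_i^T automatically has 1's on its diagonal.

```python
import math

def hpc_factor(angles):
    """Build the lower triangular Cholesky factor B from hyperspherical angles.

    angles[j] holds the j angles (theta_j1, ..., theta_jj) of row j; row 0 is [].
    """
    n = len(angles)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        prod = 1.0                      # running product of sines
        for k in range(j):
            B[j][k] = prod * math.cos(angles[j][k])
            prod *= math.sin(angles[j][k])
        B[j][j] = prod
    return B

# Hypothetical angles in (0, pi) for a 3 x 3 correlation matrix
B = hpc_factor([[], [1.0], [0.5, 2.0]])
R = [[sum(B[i][k] * B[j][k] for k in range(3)) for j in range(3)]
     for i in range(3)]
print([round(R[j][j], 12) for j in range(3)])  # -> [1.0, 1.0, 1.0]
```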

Maximum likelihood estimation of HPC
The correlation matrix R_i is guaranteed to be positive semi-definite since it is constructed from its standard Cholesky factor B_i, and the angle parameters in B_i are unconstrained except that θ_ijk ∈ [0, π). We are therefore free to model the log-variances and angles through regression using covariates:

log σ_ij^2 = z_ij^T λ,  θ_ijk = w_ijk^T γ.

As for the range of θ_ijk, our experience from data analysis and simulation studies indicates that all the estimated θ_ijk's fall in the range [0, π). A transformation such as the inverse tangent transformation could be applied to ensure that θ_ijk always falls in [0, π), and may be implemented in a future version. Under this model, minus twice the log-likelihood function, except for a constant, is given by

f(θ) = Σ_{i=1}^n { Σ_{j=1}^{m_i} log σ_ij^2 + 2 Σ_{j=1}^{m_i} log b_ijj + r_i^T H_i^{−1} (B_i B_i^T)^{−1} H_i^{−1} r_i },

where r_ij = y_ij − x_ij^T β is the jth element of r_i = y_i − X_i β, the vector of residuals for the ith subject.
The score functions U_1(β), U_2(λ) and U_3(γ) can again be obtained and simplified by direct calculation, where ε_i = (H_i B_i)^{−1}(y_i − µ_i), so that ε_i1, ..., ε_im_i are independent standard normal random variables, and I_{m_i} is the m_i × m_i identity matrix.
Since the solutions satisfy these score equations and the parameters λ and γ are not asymptotically independent (Zhang et al. 2015), the parameters are split into two groups, θ_1 = β and θ_2 = (λ^T, γ^T)^T, which are solved for sequentially, each group with the other kept fixed. More specifically, we apply the following algorithm.

1. Select initial values β^(0), λ^(0) and γ^(0) and set the iteration number k = 0.

2. With θ_2^(k) = ((λ^(k))^T, (γ^(k))^T)^T fixed, update β by solving U_1(β) = 0 directly:

β^(k+1) = ( Σ_{i=1}^n X_i^T Σ_i^{−1} X_i )^{−1} Σ_{i=1}^n X_i^T Σ_i^{−1} y_i.

3. With β^(k+1) fixed, update the search direction for θ_2 as in the quasi-Newton algorithm of Section 2.1.

4. Compute the step size λ̃ by performing an approximate line minimization.

5. Update θ_2^(k+1) and the new gradient.

6. Set k = k + 1 and repeat Steps 2 to 5 until a pre-specified criterion is met.

Comparison of MCD, ACD and HPC
For modeling the covariance and correlation structure, the three discussed Cholesky-type decomposition-based approaches have been demonstrated to be effective in the sense that the estimated covariance and correlation are guaranteed positive (semi-)definite, and the number of parameters is considerably reduced through regression techniques.
It is clear that MCD and ACD are the more closely related pair, since both are constructed by standardizing the Cholesky factor C_i, and the resulting unconstrained parameters have a nice statistical interpretation in terms of innovation variances and autoregressive or moving average parameters, respectively. The main drawback of these two approaches is their reliance on a natural ordering of the responses (as in a time series): without one it is difficult to find a reasonable statistical interpretation, and each ordering may yield a different estimate of the covariance and correlation matrix. A recent application of the Cholesky-based approach to estimating the covariance matrix of multiple stocks within a portfolio, together with a more detailed discussion of the ordering problem, can be found in Dellaportas and Pourahmadi (2012) and Pedeli, Fokianos, and Pourahmadi (2015). Additional effort and extra care are needed in practice when interpreting the corresponding variance and correlation functions. Moreover, owing to the decomposition, the resulting correlation function of MCD depends on both the innovation variances and the autoregressive parameters, so MCD is not robust against misspecification of the innovation variances when the correlation is the main interest (Maadooliat et al. 2013). Note, however, that MCD is the most computationally efficient of the three approaches because its Fisher information matrix is block diagonal (Ye and Pan 2006).
The parameterization of HPC is very attractive because the resulting parameters are unconstrained and directly interpretable with respect to the variances and correlations. The angles in the Cholesky factor of the correlation matrix have a geometric connection with the correlations. However, modeling covariance and correlation using HPC can be computationally expensive, since estimating the Cholesky factor is transformed into the problem of first estimating a matrix of angles. For details see Zhang et al. (2015).

Analysis of a balanced longitudinal dataset
In this section, we provide a first example illustrating how to apply joint mean-covariance models to a balanced longitudinal dataset using jmcm. Kenward (1987) analyzed an experiment in which cattle were randomly assigned to two treatment groups A and B and their weights were recorded. Thirty animals received treatment A and another thirty received treatment B. The animals were weighed n = 11 times over a 133-day period; the first 10 measurements on each animal were made at two-week intervals and the final measurement was made one week later. Since no observation is missing, it is considered a balanced longitudinal dataset. The data is loaded simply using the data() instruction:

R> library("jmcm")
R> data("cattle", package = "jmcm")
R> head(cattle)

We present in Figure 1 the subject-specific longitudinal profiles of the cattle data using the following code:

R> library("lattice")
R> xyplot(weight ~ day | group, group = id, data = cattle, xlab = "days",
+   ylab = "weight", col = 1, type = "l")

and observe that in both groups the variability of the weights seems to increase over time, with a severe weight loss at the final measurement in group B.
Following Pourahmadi (1999), Pan and Mackenzie (2003), Pan and MacKenzie (2006), Pan and MacKenzie (2007) and Zhang et al. (2015), we re-analyzed the group A data using a saturated mean model with the common measurement times rescaled to t = 1, 2, ..., 10, 10.5. The Bayesian information criterion (BIC), which is closely related to the Akaike information criterion (AIC) but penalizes the number of parameters more heavily in order to counter over-fitting, is used as the model selection criterion:

BIC = −(2/n) l̂_max + (p + d + q + 3) log(n)/n,

where p, d and q are the orders of the three polynomials, p + d + q + 3 is the corresponding number of parameters, and l̂_max is the value of the maximized log-likelihood for the given orders. By default, the value of the likelihood does not include the constant term, as defined in Section 2.1, but the full likelihood containing the constant term can easily be used instead by explicitly specifying control = jmcmControl(ignore.const.term = FALSE) in the jmcm function.
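As a sanity check on the criterion (a sketch, under the assumption that the penalty counts the (p + 1) + (d + 1) + (q + 1) = p + d + q + 3 polynomial coefficients), the BIC formula reproduces the value reported below for the MCD fit on the group A data, where l̂_max = −771.0007, the triple is (8, 3, 4) and n = 30 animals:

```python
import math

def bic(lmax, p, d, q, n):
    # BIC = -2 * lmax / n + (number of parameters) * log(n) / n
    return -2.0 * lmax / n + (p + d + q + 3) * math.log(n) / n

# MCD fit on cattle group A: triple (8, 3, 4), l_max = -771.0007, n = 30
print(round(bic(-771.0007, 8, 3, 4, 30), 4))  # -> 53.4408
```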
The basic use of jmcm requires the model formula, the data, the choice of triple (p, d, q) and the covariance structure method. For example, a joint mean-covariance model based on the modified Cholesky decomposition (MCD) is estimated using:

R> cattleA <- subset(cattle, group == "A")
R> fit1 <- jmcm(weight | id | I(day / 14 + 1) ~ 1 | 1, data = cattleA,
+   triple = c(8, 3, 4), cov.method = "mcd")
R> fit1

The R package Formula of Zeileis and Croissant (2010) is used to extract information from a two-sided linear formula object describing both the longitudinal data and the covariates of the model: the response, subject id and observation time point appear on the left of the "~" operator, separated by vertical bars ("|"), and the covariates for the mean model and the innovation variances, also separated by a "|" operator, appear on the right. Here the covariates for both the mean model and the innovation variances are marked as 1, so only time is used to construct the design matrices. Optimal model selection involves identifying the best integer triple (p, d, q), specified by the option triple, giving the degrees of the three polynomial functions for the mean structure, the log innovation variances and the autoregressive coefficients, respectively. To make the model fitting comparable with results reported in the literature, in this paper we focus on fits using polynomials in time; the use of other covariates is also possible and will be demonstrated later. By default, the jmcm function uses the profile likelihood method (i.e., estimating the parameter blocks one by one with the other blocks fixed in each iteration) for better estimation results; alternatively, the non-profile method can be applied by specifying control = jmcmControl(profile = FALSE). When the model estimation has finished, an object of the S4 class 'jmcmMod' is returned, and the basic information is displayed automatically by the corresponding print method.
The getJMCM function can be used to extract various quantities (e.g., the estimated mean vector and covariance matrix) from a fitted joint mean-covariance model. In this example, the globally optimal triple poly(8, 3, 4) reported in Pan and Mackenzie (2003) is fitted, producing l̂_max = −771.0007 and BIC = 53.4408.
Since this is a balanced longitudinal dataset, we produced the sample regressograms and fitted curves for the cattle data using the following function:

R> regressogram(fit1, time = 1:11)

By examining the log innovation variance versus time in Figure 2, it is clear that the curvature pattern is well captured by the fitted polynomial curve. Figure 2 also indicates a good fit for the autoregressive coefficients, as seen from the plot of the autoregressive coefficient versus the time lag between measurements and its fitted curve.
The same triple poly(8, 3, 4), now giving the degrees of the three polynomial functions for the mean structure, the log innovation variances and the moving average coefficients respectively, is used for the ACD-based joint mean-covariance model fit to the cattle data; the covariance structure option is specified as cov.method = "acd". Comparing with the maximized log-likelihood and BIC value of the MCD fit, we clearly see that the ACD method produces a larger log-likelihood l̂_max = −747.6994 and a smaller BIC value of 51.8873.
When the HPC-based joint mean-covariance model approach is applied to the cattle data, the covariance structure option is specified as cov.method = "hpc". The same two-sided linear formula object describes the longitudinal data and the model covariates, but on the right of the "~" operator the covariates now refer to the mean model and the variances, instead of the mean model and the innovation variances as in MCD and ACD. The integer triple poly(p, d, q), specified by the option triple, gives the degrees of the three polynomial functions for the mean structure, the log variances and the angles, respectively. More specifically, the optimal triple poly(8, 2, 2) of the HPC model fit reported in Zhang et al. (2015) can be reproduced using the following command:

R> fit3 <- jmcm(weight | id | I(day / 14 + 1) ~ 1 | 1, data = cattleA,
+   triple = c(8, 2, 2), cov.method = "hpc")
R> fit3

Note that there is no general closed form for the angles. The angles θ_ijk corresponding to the empirical correlation matrix are computed iteratively, row by row, from its Cholesky factor, with the empty product of sines taken as 1. By examining the log variance versus time (left) and the angle versus time lag (right) in Figure 4, it is clear that the curvature patterns in the two sample regressograms are well captured by the two fitted curves, indicating a good model fit.
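The iterative angle computation can be sketched as follows: given an empirical correlation matrix (a hypothetical one here), take its standard Cholesky factor and peel off the angles row by row, dividing each entry by the running product of sines, with the empty product taken as 1.

```python
import math

def cholesky(a):
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = s ** 0.5 if i == j else s / c[j][j]
    return c

def angles_from_corr(R):
    """Angles theta_jk of the Cholesky factor B of a correlation matrix R,
    computed iteratively row by row (the empty sine product is taken as 1)."""
    B = cholesky(R)
    n = len(R)
    theta = [[0.0] * j for j in range(n)]
    for j in range(1, n):
        prod = 1.0                      # running product of sines
        for k in range(j):
            theta[j][k] = math.acos(max(-1.0, min(1.0, B[j][k] / prod)))
            prod *= math.sin(theta[j][k])
    return theta

R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]  # toy correlation matrix
theta = angles_from_corr(R)
print([round(t, 4) for t in theta[2]])
```

Rebuilding B from these angles reproduces the Cholesky factor exactly, which is the consistency the recursion relies on.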
Comparisons between the MCD-, ACD- and HPC-based joint mean-covariance models on the cattle data are made in Table 1 for different choices of the triple, with the execution time (in seconds) measured for each fitted model. We find that the HPC-based model performs best in most cases, with larger log-likelihood and smaller BIC values than the MCD- and ACD-based models, at the cost of a much longer execution time. From Table 1, we also find that MCD and ACD produce quite close results in terms of likelihood and BIC values, while the MCD-based model is the most time-efficient of the three approaches. Our tests were conducted under Windows 10 (64-bit) on a ThinkPad T410 equipped with an Intel Core i5 M 480 @ 2.67 GHz and 4 GB of RAM.

Analysis of an unbalanced longitudinal dataset
In this section, we apply the joint mean-covariance modeling approach to an unbalanced CD4+ cell dataset analyzed by Ye and Pan (2006) and Zhang et al. (2015). The dataset comprises a total of 2376 CD4+ cell counts from 369 HIV-infected men covering a period of approximately eight and a half years. The number of measurements m_i per individual varies from 1 to 12, and the measurement times are not equally spaced. The CD4+ cell data are therefore highly unbalanced; they are included in the package.
When the optimal triple poly(8, 1, 1) of the HPC approach reported in Zhang et al. (2015) is fitted, the BIC value turns out to be 26.7268, with l̂_max = −4892.68, producing the best model among the three covariance and correlation structure modeling methods.
We also compared the MCD-, ACD- and HPC-based joint mean-covariance models on the CD4+ cell data in Table 2, and again find that the HPC-based model provides the better fit.

Conclusion
In this article, we illustrated the capabilities of the package jmcm for the joint mean-covariance modeling of both balanced and unbalanced longitudinal data using three popular covariance and correlation structure modeling approaches. In particular, we provide functions for the estimation of MCD-, ACD- and HPC-based joint mean-covariance models, as well as devices for displaying regressograms and fitted model curves. With these models, the estimated covariance and correlation matrices are guaranteed to be positive (semi-)definite, and the estimation of a high-dimensional covariance or correlation matrix is reduced to solving a series of regression problems. The likelihood-based estimation procedure permits extensions such as regularization-based model selection, so that the package can be compared with other likelihood-based R packages.
However, the package is currently limited to longitudinal data with a multivariate Gaussian distribution. It would be worthwhile to develop methods that are robust to non-normally distributed data by introducing the Cholesky-based covariance structure modeling methods into the GEE model and/or the Gaussian copula model. We plan to update the package jmcm on a regular basis as new statistical procedures for the joint mean-covariance modeling approach become available.