Linear Quantile Mixed Models: The lqmm Package for Laplace Quantile Regression

Inference in quantile analysis has received considerable attention in recent years. Linear quantile mixed models (Geraci and Bottai 2014) represent a flexible statistical tool to analyze data from sampling designs such as multilevel, spatial, panel or longitudinal designs, which induce some form of clustering. In this paper, I show how to estimate conditional quantile functions with random effects using the R package lqmm. Modeling, estimation and inference are discussed in detail using a real data example. A thorough description of the optimization algorithms is also provided.


Introduction
In classical statistics, a common assumption is that sample observations are drawn independently from the same population. However, dependent data arise in many studies. For example, clinical observations, such as blood pressure or insulin measurements, taken repeatedly on the same individuals are likely to be more similar than observations from different individuals; environmental measurements (e.g., air pollution or rainfall measurements) that are taken in the same geographic area will show a substantial degree of spatial correlation. Groups of dependent observations are commonly called clusters.
Mixed-effects models (or mixed models; Pinheiro and Bates 2000; Demidenko 2004) are highly popular and flexible regression models used to analyze the conditional mean of clustered outcome variables. The extent of applications of mixed models is vast, ranging from medical studies, in which responses to an exposure or a treatment show a strong degree of heterogeneity between subjects due to unobserved genetic factors, to agricultural field trials, environmental and wildlife ecology studies, to mention a few.
Mixed models are based on the assumption that predictors affect the conditional distribution of the outcome only through its location parameter (i.e., the mean). Empirical evidence shows that this assumption is inappropriate in a number of real-world applications. For example, the negative effects of maternal smoking during pregnancy or lack of prenatal care have been amply documented. In particular, these factors have been shown to decrease the average weight of infants at birth. However, infants who rank lower in the distribution (i.e., low birthweight infants) are affected by smoking and lack of prenatal care to a greater extent than infants of average weight, and the latter to a greater extent than those who rank higher in the distribution (Abrevaya 2001; Koenker and Hallock 2001; Geraci 2013). In general, individuals who rank differently according to some outcome variable (e.g., blood pressure, body mass index, size of a tumor) might be affected by risk factors (e.g., age, gender, socioeconomic status) to a different extent or even in opposite ways. Quantile regression (QR) is a statistical tool that extends regression for the mean to the analysis of the entire conditional distribution of the outcome variable. Therefore, location, scale and shape of the distribution can be examined through the analysis of conditional quantile models to provide a complete picture of the distributional effects.
The application of QR methods to clustered data is an emerging area of research in statistics. There have been several proposals of QR for dependent data, including Lipsitz, Fitzmaurice, Molenberghs, and Zhao (1997), Koenker (2004), Geraci and Bottai (2007), Reich, Bondell, and Wang (2010), and Canay (2011). Recently, Geraci and Bottai (2014) developed a class of models, called linear quantile mixed models (LQMMs), which extends quantile regression models with random intercepts (Geraci 2005; Geraci and Bottai 2007) to include random slopes, and introduced new computational approaches. These are based on the asymmetric Laplace (AL) likelihood (Hinkley and Revankar 1977), which has a well-known relationship with the L1-norm objective function described by Koenker and Bassett (1978).
Briefly, consider a sample of observations (x_i, y_i), i = 1, …, M, drawn independently from a population with continuous distribution function F_{y_i|x_i}. The latter is assumed to be unknown, with quantile function given by its inverse, Q_{y_i|x_i} ≡ F_{y_i|x_i}^{−1}. In linear QR problems, the goal is to estimate models of the type Q_{y_i|x_i}(τ) = x_i'β^{(τ)}, where τ, 0 < τ < 1, denotes the quantile level of interest. The τth regression quantile is defined as any solution of (Koenker and Bassett 1978)

min_{β^{(τ)}} Σ_{i=1}^{M} ρ_τ(y_i − x_i'β^{(τ)}),   (1)

where ρ_τ(v) = τ max(v, 0) + (1 − τ) max(−v, 0) is the asymmetrically weighted L1 loss function. Nonlinear QR is discussed by Koenker (2005).
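To make the loss function concrete, the following snippet (an illustration of the definition above, not package code) implements ρ_τ in Python and verifies numerically that, in the intercept-only case, minimizing the summed loss over a location parameter recovers the sample τth quantile:

```python
import numpy as np

def rho(v, tau):
    """Asymmetrically weighted L1 loss: rho_tau(v) = tau*max(v,0) + (1-tau)*max(-v,0)."""
    return tau * np.maximum(v, 0.0) + (1.0 - tau) * np.maximum(-v, 0.0)

def quantile_by_loss(y, tau):
    """Minimize sum_i rho_tau(y_i - m) over candidate locations m.
    Restricting candidates to the observed values is enough, since the
    objective is piecewise linear with knots at the data points."""
    objective = lambda m: rho(y - m, tau).sum()
    return min(y, key=objective)

rng = np.random.default_rng(0)
y = rng.normal(size=501)
# The minimizer of the check-function loss coincides with the sample quantile.
for tau in (0.25, 0.5, 0.75):
    m = quantile_by_loss(y, tau)
    assert abs(m - np.quantile(y, tau, method="inverted_cdf")) < 1e-12
```

With an odd sample size and non-integer nτ, the minimizer is unique and equals the ⌈nτ⌉th order statistic, which is exactly what the inverted-CDF quantile definition returns.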
Over the past few years, a number of QR methods based on the AL distribution have appeared in the literature (Koenker and Machado 1999; Yu and Moyeed 2001; Geraci and Bottai 2007; Lee and Neocleous 2010; Farcomeni 2012; Wang 2012). A continuous random variable w ∈ R is said to follow an AL density with parameters (µ, σ, τ), w ∼ AL(µ, σ, τ), if its density can be expressed as

p(w|µ, σ, τ) = [τ(1 − τ)/σ] exp{−ρ_τ((w − µ)/σ)},

where µ ∈ R is the location parameter, σ ∈ R+ is the scale parameter, and 0 < τ < 1 is the skewness parameter. The mean and variance of this distribution are given by, respectively, µ + σ(1 − 2τ)/{τ(1 − τ)} and σ²(1 − 2τ + 2τ²)/{τ²(1 − τ)²} (Yu and Zhang 2005). Note that ρ_τ(·) is the loss function introduced in (1). It is easy to verify that the location parameter µ is the τth quantile of w, i.e., P(w ≤ µ) = τ. Therefore, for a fixed τ, it is convenient to estimate the τth regression quantile from the model

y_i ∼ AL(µ_i, σ, τ),   µ_i = x_i'β^{(τ)}.   (2)

Figure 1: Trellis plot of pituitary-pterygomaxillary fissure distance conditional on age in 10 girls (subjects F1-F10) and 10 boys (subjects M01-M10). A loess smooth is shown in each panel.

The maximum likelihood estimate (MLE) of β^{(τ)} from Equation 2 is equivalent to the solution of (1). Such computational equivalence provided Geraci (2005) with a (quasi-)likelihood framework within which to estimate the conditional quantiles of longitudinal outcomes.
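The quantile property of the AL distribution can be checked numerically; the snippet below (a sketch for illustration, not part of lqmm) encodes the AL density and verifies that it integrates to one and that P(w ≤ µ) = τ:

```python
import numpy as np

def rho(v, tau):
    return tau * np.maximum(v, 0.0) + (1.0 - tau) * np.maximum(-v, 0.0)

def dal(w, mu=0.0, sigma=1.0, tau=0.5):
    """AL density: tau * (1 - tau) / sigma * exp(-rho_tau((w - mu) / sigma))."""
    return tau * (1.0 - tau) / sigma * np.exp(-rho((w - mu) / sigma, tau))

def trapezoid(f, x):
    """Simple trapezoidal rule on a grid."""
    return float(np.sum((f[1:] + f[:-1]) * 0.5 * np.diff(x)))

# Check numerically that the density integrates to one and that
# P(w <= mu) = tau, i.e., the location parameter is the tau-th quantile.
mu, sigma, tau = 1.5, 0.7, 0.25
grid = np.linspace(mu - 60.0, mu + 60.0, 2_000_001)
f = dal(grid, mu, sigma, tau)
assert abs(trapezoid(f, grid) - 1.0) < 1e-6
below = grid <= mu
assert abs(trapezoid(f[below], grid[below]) - tau) < 1e-4
```

The parameter values here are arbitrary; the two identities hold for any admissible (µ, σ, τ), since the mass to the left of µ is τ(1 − τ)/σ · σ/(1 − τ) = τ.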
This paper's focus is on the package lqmm (Geraci 2014) for the R statistical computing environment (R Core Team 2013), which is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=lqmm. In the next section, the dataset that will be used to illustrate LQMM methods is briefly described. The models and the estimation algorithms are introduced in Sections 3 and 4, respectively. Section 5 is dedicated to lqmm methods. Short examples and related R code are given throughout to illustrate individual commands. A more extensive data analysis using LQMMs is offered in Section 6. The notation used in this paper, as well as the labeling adopted in the lqmm package, follows closely that of Geraci and Bottai (2014). Vectors and matrices are denoted in bold, I_k denotes the k × k identity matrix, and ⊕_{i=1}^{n} A_i is the direct sum of the n matrices A_i.

Orthodontic growth data
These data collect repeated measurements of the distance from the center of the pituitary to the pterygomaxillary fissure, two points that are identified on x-ray exposures of the side of the head, in 27 children (16 boys, 11 girls). Measurements were taken by researchers of the University of North Carolina Dental School at four different ages (8, 10, 12 and 14 years), giving 108 observations in total, to study growth patterns by sex. The dataset was reported in Potthoff and Roy (1964) and used for illustration of mixed modeling methods by Pinheiro and Bates (2000). The dataset is available in the package nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2014) as well as in lqmm. I load the former as it provides useful functions for objects of class 'groupedData' and then I obtain the summary of the dataset as follows:

R> library("nlme")
R> data("Orthodont", package = "nlme")
R> Orthodont$Subject <- as.character(Orthodont$Subject)
R> Orthodont <- update(Orthodont, units = list(x = "(years)", y = "(mm)"),
+    order.groups = FALSE)
R> summary(Orthodont)

A trellis plot (Sarkar 2008) of selected individual temporal trajectories in the pituitary-pterygomaxillary fissure distance is shown in Figure 1. This was obtained with the plot method for 'nfnGroupedData' objects provided by package nlme. To simplify some of the examples, only a subset (i.e., girls) is used (see also Pinheiro and Bates 2000, p. 35). The full dataset is analyzed in Section 6.

Linear mixed models
Consider clustered data in the form (x_ij, z_ij, y_ij), for j = 1, …, n_i and i = 1, …, M, with N = Σ_i n_i, where x_ij is the jth row of a known n_i × p matrix X_i, z_ij is the jth row of a known n_i × q matrix Z_i, and y_ij is the jth observation of the ith response vector y_i = (y_i1, …, y_in_i)'.
Mixed effects models represent a common and well-known class of regression models used to analyze data coming from similar designs. A typical formulation of a linear mixed model (LMM) for clustered data is given by

y_i = X_i β + Z_i u_i + ε_i,   i = 1, …, M,

where β and u_i, i = 1, …, M, are, respectively, fixed and random effects associated with p and q model covariates, and the response vector y_i is assumed to follow a multivariate normal distribution characterized by some parameter θ. The dependence among the observations within the ith cluster is induced by the random effect vector u_i, which is shared by all observations within the same cluster. However, the random effects and the within-cluster errors are assumed to be independent for different clusters and to be mutually independent for the same cluster (Pinheiro and Bates 2000).
In matrix form, the model above can be written as

y = Xβ + Zu + ε,   (3)

where y = (y_1', …, y_M')', X = (X_1' | … | X_M')', Z = ⊕_{i=1}^{M} Z_i, and u = (u_1', …, u_M')'. From a conditional point of view, the LMM is a location-shift model. That is, the predictors X = {x_1, …, x_p} and Z = {z_1, …, z_q}, where often X ⊇ Z, are assumed to shift the conditional expectation E(y|u) = Xβ + Zu only, without affecting the response distribution in any other respect. However, a particular marginal model can be derived from (3) (Lee and Nelder 2004). Suppose X = Z and COV(ε) = ψ²I_N. Equation 3 can be rewritten as y = Xβ + ε*, where ε* = Zu + ε, with E(y) = Xβ and COV(ε*) = Z COV(u) Z' + ψ²I_N. The within-cluster random term then confers a specific heteroscedastic structure to the model's error term ε*, despite the iid assumptions on ε.
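The induced marginal covariance structure can be illustrated with a small numerical sketch (hypothetical cluster sizes and variance components, not taken from the data): for a random-intercept model, Z COV(u) Z' + ψ²I_N is block-diagonal with compound-symmetric blocks.

```python
import numpy as np

# Hypothetical setup: M = 3 clusters of sizes (2, 3, 2), random-intercept
# variance psi_u^2 and error variance psi^2.
sizes = [2, 3, 2]
psi_u2, psi2 = 2.0, 0.5
N = sum(sizes)

# Z is the block-diagonal (direct-sum) design of cluster indicators.
Z = np.zeros((N, len(sizes)))
row = 0
for i, n_i in enumerate(sizes):
    Z[row:row + n_i, i] = 1.0
    row += n_i

# Marginal covariance of y: COV(eps*) = Z COV(u) Z' + psi^2 I_N.
V = psi_u2 * Z @ Z.T + psi2 * np.eye(N)

# Within a cluster, the off-diagonal covariance is psi_u^2 (compound
# symmetry); across clusters it is zero.
assert V[0, 1] == psi_u2 and V[0, 2] == 0.0

# The implied intraclass correlation:
icc = psi_u2 / (psi_u2 + psi2)
assert abs(icc - 0.8) < 1e-12
```

The same intraclass correlation formula, ψ²_u/(ψ²_u + ψ²), is used for the fitted models in the sections that follow.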
For example, consider the orthodontic growth data described earlier. Using the package lme4 (Bates, Maechler, Bolker, and Walker 2014), a random-intercept model is fitted by restricted maximum likelihood (REML) to estimate the growth rate in girls. The random effects u_i, i = 1, …, 11, are therefore assumed to be N(0, ψ²_u). Note also that the variable age is centered at 11 years so that the cross-product between intercept and slope is zero.

The estimated intraclass correlation coefficient is 0.87, which suggests that measurements within the same subject are strongly correlated at the mean of the marginal distribution of the response. In other words, 87% of the total variability in the response not explained by the population average and the population age effect is due to unobserved heterogeneity between subjects. These results, however informative, do not tell us what happens in the rest of the conditional distribution. Most importantly, the data show signs of non-normality (Figure 2) for which the assumptions of a location-shift model may prove inappropriate.

Linear quantile mixed models
As seen in Section 1, the convenience of using an AL distribution allows estimating the τth conditional quantile using maximum likelihood methods. Let us assume that the y_i's, i = 1, …, M, conditionally on a q × 1 vector of random effects u_i, are independently distributed according to an unknown continuous distribution F_{y_i|u_i} (therefore the Gaussian assumption for y_i|u_i of a LMM is abandoned). Independence is also assumed for the within-cluster errors, though in principle extensions to allow for within-cluster correlation can be considered. A joint AL model for y_i|u_i is introduced, with location and scale parameters µ_i^{(τ)} and σ^{(τ)}, where θ_x^{(τ)} ∈ R^p is a vector of unknown fixed effects. The τth LQMM is given by

Q_{y_ij|u_i}(τ) = x_ij'θ_x^{(τ)} + z_ij'u_i,   (4)

which can be compactly written in matrix form as µ^{(τ)} = Xθ_x^{(τ)} + Zu. The skewness parameter τ is set a priori and defines the quantile level to be estimated. Also, u_i = (u_i1, …, u_iq)', for i = 1, …, M, is assumed to be a zero-median random vector, independent of the model's error term, with q × q covariance matrix Ψ^{(τ)}. Note that all the parameters are τ-dependent. The random effects vector u depends on τ through Ψ^{(τ)}. The superscript τ will be omitted when this is not a source of confusion.
From the LQMM in Equation 4, the joint density of (y, u) based on M clusters at the τth quantile is given by

p(y, u|θ_x, σ, Ψ) = ∏_{i=1}^{M} p(y_i|u_i; θ_x, σ) p(u_i; Ψ).   (5)

It is worth stressing that, although the conditional distribution F_{y_ij|u_i} is assumed to be unknown, its τth quantile is conveniently estimated as the location parameter µ_ij = x_ij'θ_x + z_ij'u_i of an AL distribution with scale σ^{(τ)} and (given) skewness τ. Since the model's interpretation is conditional, one could also define the parameter θ_x^{(τ)} as the τth quantile of y_ij|u_i = 0.
Following the definition of the joint density in Equation 5, the next step consists in adopting an estimation strategy for the parameters of interest, which prompts considerations on how to deal with the unobserved random effects u. There is a rich literature on mixed model estimation. The typical approach is to integrate the random effects out of the joint distribution and then optimize the integrated (log-)likelihood. The marginal likelihood of a LMM, for example, has a closed-form expression (Pinheiro and Bates 2000), and (restricted) maximum likelihood estimation is carried out using iterative algorithms such as the EM (Dempster, Laird, and Rubin 1977) or Newton-Raphson (Laird and Ware 1982) algorithms. In models where the integrals do not have a closed-form solution, one needs to resort to approximate methods such as marginal and penalized quasi-likelihood methods, Markov chain Monte Carlo methods, and numerical integration. This is the case for generalized linear mixed models (Booth and Hobert 1999; Rabe-Hesketh, Skrondal, and Pickles 2002; Pinheiro and Chao 2006), nonlinear mixed models (Pinheiro and Bates 1995) and models for non-Gaussian continuous responses (Staudenmayer, Lake, and Wand 2009).
The first attempt to fit quantile regression models with random intercepts led to a Monte Carlo EM procedure (Geraci and Bottai 2007), which, however, can be computationally intensive and inefficient. A different approach based on Gaussian quadrature has been proposed by Geraci and Bottai (2014) and implemented in the R package lqmm.
Briefly, the log-likelihood for M clusters, integrated over R^q, is (up to an additive constant) given by

ℓ(θ, σ|y) = Σ_{i=1}^{M} log ∫_{R^q} p(y_i|u_i; θ_x, σ) p(u_i; Ψ) du_i,

where I used the simplified notation σ ≡ σ^{(τ)}. For fitting purposes, the covariance matrix of the random effects is parameterized in terms of an m-dimensional vector of non-redundant parameters θ_z, i.e., Ψ ≡ Ψ(θ_z). The parameter of interest is defined as θ = (θ_x', θ_z')'.
It is immediate to verify that there is a strict relationship between the choice of a distribution for u and the type of quadrature. The assumption of normal random effects, u ∼ N(0, Ψ), which is often introduced in mixed models, leads directly to a Gauss-Hermite quadrature for the approximate AL-based log-likelihood, with nodes v_{k_1,…,k_q} = (v_{k_1}, …, v_{k_q})' and weights w_{k_l}, l = 1, …, q, respectively. The constant K is an integer giving the number of points for each of the q one-dimensional integrals over the real line. As a robust alternative to the Gaussian distribution, it is also possible to consider the symmetric Laplace (double-exponential) distribution, which leads to an approximation similar to Equation 6, but where the nodes and weights are those from a Gauss-Laguerre quadrature rule. In lqmm, the quadrature nodes and weights are obtained using the command gauss.quad from statmod (Smyth, Hu, Dunn, Phipson, and Chen 2013).
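The quadrature step can be sketched in a few lines for the simplest case of a single normal random intercept. The Python snippet below (an illustration only, with made-up parameter values, not the package's implementation) applies the change of variables u = √2 ψ v required by Gauss-Hermite nodes, validates the rule on a smooth integrand with a known answer, and checks that the approximate integrated likelihood stabilizes as K grows:

```python
import numpy as np

def rho(v, tau):
    return tau * np.maximum(v, 0.0) + (1.0 - tau) * np.maximum(-v, 0.0)

def al_density(w, mu, sigma, tau):
    return tau * (1.0 - tau) / sigma * np.exp(-rho((w - mu) / sigma, tau))

# Sanity check of the rule on a smooth integrand with a known answer:
# for U ~ N(0, psi^2), E[exp(U/2)] = exp(psi^2 / 8).
psi = 1.2
v, w = np.polynomial.hermite.hermgauss(20)
mgf = np.sum(w * np.exp(0.5 * np.sqrt(2.0) * psi * v)) / np.sqrt(np.pi)
assert abs(mgf - np.exp(psi**2 / 8.0)) < 1e-10

# One cluster with a single random intercept (q = 1), u ~ N(0, psi^2):
# approximate  int prod_j p_AL(y_j | mu + u, sigma, tau) phi(u; 0, psi^2) du
# by a K-point rule after the change of variables u = sqrt(2) * psi * v.
rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, size=5)
mu, sigma, tau = 1.0, 0.8, 0.5

def integrated_lik(K):
    v, w = np.polynomial.hermite.hermgauss(K)
    u = np.sqrt(2.0) * psi * v
    lik = np.array([al_density(y, mu + ui, sigma, tau).prod() for ui in u])
    return float(np.sum(w * lik) / np.sqrt(np.pi))

# The approximation stabilizes as K grows (the AL kinks slow, but do not
# prevent, convergence).
assert abs(integrated_lik(15) - integrated_lik(25)) < 0.05 * integrated_lik(25)
```

With q random effects the rule is applied coordinate-wise, which is why the grid grows as K^q, as noted in Section 3.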

The main call lqmm
I now illustrate the basic arguments in the main command lqmm. After loading the package lqmm

R> library("lqmm")
Package lqmm (1.5) loaded.
Type citation("lqmm") on how to cite this package

the documentation for lqmm is accessed through help("lqmm"). The arguments of this function can also be displayed on the R console:

R> args(lqmm)
function (fixed, random, group, covariance = "pdDiag", tau = 0.5,
    nK = 7, type = "normal", rule = 1, data = sys.frame(sys.parent()),
    subset, weights, na.action = na.fail, control = list(),
    contrasts = NULL, fit = TRUE)

Let us start with a simple model for the conditional median of distance using the orthodontic growth data. As in Section 3.1, a random intercept is included in the linear predictor to account for the correlation of repeated observations within Subject. The first two arguments, fixed and random, are formula objects that define, respectively, the fixed and the random parts of the linear predictor µ_i^{(τ)}, i = 1, …, M, while the clustering or grouping variable is defined in the argument group. All variables are taken from the optional data frame data. Finally, the quantile level τ is specified using the argument tau (by default, the median):

R> fit.lqmm <- lqmm(fixed = distance ~ age.c, random = ~ 1, group = Subject,
+    tau = 0.5, nK = 7, type = "normal", data = Orthodont.sub)
R> fit.lqmm

Call: lqmm(fixed = distance ~ age.c, random = ~1, group = Subject,
    tau = 0.5, nK = 7, type = "normal", data = Orthodont.sub)

Quantile 0.5

Before entering into the details of the estimation algorithms, let us read the output obtained from the above call to lqmm, which returns an R list of class 'lqmm'. The estimated fixed effects θ̂_x = (22.94, 0.44) show that the median distance at age 11 years in girls is 22.94 mm, while the median slope, or growth rate, is 0.44 mm per year. The random effects have an estimated variance of ψ̂²_u = 2.34 mm². The ICC is given by ψ̂²_u/(ψ̂²_u + ψ̂²) = 2.34/(2.34 + 0.84²) = 0.77.
One of course might ask where the number 0.84 comes from. Recall that a measure of the scale of the residuals obtained after solving (1) can be calculated as (Koenker 2005)

σ̂ = (1/M) Σ_{i=1}^{M} ρ_τ(y_i − x_i'β̂^{(τ)}),

which is also the MLE of the scale parameter of an AL distribution (Yu and Zhang 2005). As seen previously, σ is related to the variance of an AL, which provides a model-based 'residual variance' to derive the ICC above. Given σ̂ = 0.2969 and τ = 0.5, one can also calculate

R> sqrt(varAL(sigma = 0.2969, tau = 0.5))
[1] 0.83976

The estimated parameters are stored in the fitted object fit.lqmm as fit.lqmm$theta (θ̂), fit.lqmm$theta_x (θ̂_x), fit.lqmm$theta_z (θ̂_z), and fit.lqmm$scale (σ̂). The generic function coefficients (or coef) can be used to obtain the fixed effects, while the function VarCorr extracts the matrix Ψ̂, as shown below. The use of VarCorr is recommended since, as mentioned before, Ψ is parametrized in terms of θ_z.
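The formula behind varAL is the AL variance given in Section 1; a short Python sketch (re-implementing that formula for illustration, not calling the package) reproduces both the 0.84 figure and the ICC of 0.77:

```python
import math

def var_al(sigma, tau):
    """Variance of AL(mu, sigma, tau): sigma^2 (1 - 2*tau + 2*tau^2) / (tau^2 (1 - tau)^2)."""
    return sigma**2 * (1.0 - 2.0 * tau + 2.0 * tau**2) / (tau**2 * (1.0 - tau)**2)

# Reproduce sqrt(varAL(sigma = 0.2969, tau = 0.5)) from the text.
sd = math.sqrt(var_al(0.2969, 0.5))
assert abs(sd - 0.83976) < 1e-4

# ...and the model-based ICC reported for the median model.
icc = 2.34 / (2.34 + sd**2)
assert abs(icc - 0.77) < 0.005
```

At τ = 0.5 the variance reduces to 8σ², so the residual standard deviation is simply σ√8 = 0.2969 × 2.8284 ≈ 0.84.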
R> coef(fit.lqmm)

In general, the interpretation of the LQMM parameters is specific to the quantile being estimated. For example, the output below shows that the rate of growth in girls is 0.50 mm per year at the third quartile (τ = 0.75). The random effects have an estimated variance of ψ̂²_u = 2.21 mm², yielding an ICC equal to 0.71. The basic idea of LQMM is that the covariates might exert different effects at different quantiles of the outcome distribution, as assessed in standard quantile regression (Koenker and Bassett 1978), and that the degree of unobserved heterogeneity might also be characterized with τ-specific variance parameters.
R> lqmm(fixed = distance ~ age.c, random = ~ 1, group = Subject,
+    tau = 0.75, nK = 7, type = "normal", data = Orthodont.sub)

Call: lqmm(fixed = distance ~ age.c, random = ~1, group = Subject,
    tau = 0.75, nK = 7, type = "normal", data = Orthodont.sub)

Quantile 0.75

Let us now focus on the arguments type and covariance, which are relevant to the choice of the distribution of the random effects and, ultimately, to numerical integration. The number of quadrature nodes K is specified with nK and is set to 7 by default. Since guidance on how to choose K in LQMMs is given by Geraci and Bottai (2014), here I will not discuss this issue further. However, a general recommendation is to start with low values of K (say, K < 10), as the total size of the quadrature grid, K^q, grows exponentially with the number of random effects. The argument type takes the character string "normal" for Gaussian random effects (i.e., Gauss-Hermite quadrature) or "robust" for Laplace random effects (i.e., Gauss-Laguerre quadrature). However, while the Gauss-Hermite quadrature allows for all types of covariance matrix Ψ implemented in lqmm, the Gauss-Laguerre quadrature can be used for uncorrelated random effects only (see Geraci and Bottai 2014, for further details). Table 1 gives a summary of the options currently available in the package. The types of covariance matrices specified in covariance are named as in nlme. The table also includes the number of unique parameters for each type of matrix, that is, the dimension m of θ_z.
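The parameter counts in Table 1 follow directly from the structure of Ψ; a minimal sketch (my own illustration, assuming the four nlme-style structures named above, not package code) encodes the dimension m for q random effects:

```python
# Number of non-redundant covariance parameters m for q random effects,
# under the nlme-style structure names (assumed for illustration):
# pdIdent  - multiple of the identity (one variance)
# pdDiag   - diagonal (q variances)
# pdCompSymm - compound symmetry (one variance, one covariance)
# pdSymm   - general symmetric (all variances and covariances)
def n_pars(structure, q):
    return {
        "pdIdent": 1,
        "pdDiag": q,
        "pdCompSymm": 2,
        "pdSymm": q * (q + 1) // 2,
    }[structure]

# With a random intercept and a random slope (q = 2), pdSymm has m = 3:
# two variances plus one intercept-slope covariance.
assert n_pars("pdSymm", 2) == 3
assert n_pars("pdDiag", 2) == 2
assert n_pars("pdIdent", 2) == 1
```

The pdSymm count with q = 2 matches the m = 3 figure used for the quartile model fitted later in the paper.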
Another argument related to the choice of the quadrature rule is rule, which introduces integration on sparse grids (Heiss and Winschel 2008) and nested integration rules for Gaussian weights (Genz and Keister 1996) as implemented in the package SparseGrid (Heiss and Winschel 2008;Ypma 2013). This part of the code, which has not yet been extensively tested, aims at introducing computational relief when the size of the quadrature grid is large. Further studies are needed to assess the performance of these approaches.

Estimation algorithms

Optimization control
In lqmm, there are currently two algorithms to minimize the negative integrated log-likelihood in Equation 6. The default is the gradient-search method described by Geraci and Bottai (2014), which makes use of Clarke's derivative of the objective function to find the path of steepest descent. The alternative is Nelder-Mead optimization, as implemented in optim, which belongs to the class of direct search methods.

Gradient-search optimization
The function lqmm.fit.gs executes the gradient-based estimation of θ. It is a wrapper for the C function gradientSd_h and it performs pre- and post-estimation checks. The gradient-search algorithm works as follows. From a current parameter value, the algorithm searches the positive semi-line in the direction of the gradient for a new parameter value at which the likelihood is larger. The algorithm stops when the change in the likelihood is less than a specified tolerance. At iteration t, let s(θ_t) denote Clarke's gradient of the negative approximate log-likelihood, rewritten compactly as ℓ_app(θ_t, σ_0), given σ_0. The minimization steps for θ are:

1. Initialize θ_0, δ_0, σ_0 and set t = 0.
2. Evaluate ℓ_app(θ_t − δ_t s(θ_t), σ_0):
   (i) if the likelihood is improved, accept the move and expand the step, δ_{t+1} = b δ_t;
   (ii) otherwise, reject the move and contract the step, δ_{t+1} = a δ_t.
3. If the change in the likelihood is less than ω_1 > 0, stop; otherwise, set t = t + 1 and go to Step (2).
A check on the convergence of the parameter θ can be introduced in Step (2.ii) by verifying max|δ_t s(θ_t)| < ω_2, where ω_2 > 0 controls the tolerance. By default, this check is not carried out in lqmm, but this behavior can be changed by setting check_theta = TRUE in lqmmControl.
The iterative loop for θ is the 'lower' level of the optimization. The 'upper' level of the algorithm consists in updating the scale parameter: once the algorithm finds a solution for θ, given σ_0, the scale parameter is estimated residually to obtain σ_1. If the change in the parameter is sufficiently small, say |σ_0 − σ_1| < ω_3, with ω_3 > 0, the algorithm stops. Otherwise, another iterative loop for θ is initialized by setting σ = σ_1, and so forth until convergence is achieved for σ as well.
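The two-level scheme can be sketched for a toy one-parameter problem. The Python code below is a deliberately simplified illustration of the expand/contract step logic and the residual scale update, written from the description above rather than from the package's C implementation; it fits the location of an AL likelihood and should recover the sample median at τ = 0.5:

```python
import numpy as np

def rho(v, tau):
    return tau * np.maximum(v, 0.0) + (1.0 - tau) * np.maximum(-v, 0.0)

# Lower loop: move theta along the descent direction with step delta,
# expanding it by b on success and contracting it by a on failure.
# Upper loop: re-estimate sigma residually and repeat until it stabilizes.
def fit_location(y, tau, a=0.5, b=1.5, omega1=1e-8, omega3=1e-6, delta0=1.0):
    theta, sigma = float(np.mean(y)), 1.0
    def neg_ll(th, s):
        return len(y) * np.log(s) + rho((y - th) / s, tau).sum()
    for _ in range(50):                          # upper loop (sigma)
        delta, f = delta0, neg_ll(theta, sigma)
        for _ in range(2000):                    # lower loop (theta)
            # Subgradient of the objective with respect to theta.
            g = ((1.0 - tau) * np.sum(y < theta) - tau * np.sum(y > theta)) / sigma
            cand = theta - delta * np.sign(g)
            f_cand = neg_ll(cand, sigma)
            if f_cand < f:                       # success: accept and expand
                theta = cand
                if f - f_cand < omega1:
                    break
                f, delta = f_cand, b * delta
            else:                                # failure: contract the step
                delta *= a
                if delta < 1e-14:
                    break
        sigma_new = rho(y - theta, tau).mean()   # residual update of the scale
        if abs(sigma - sigma_new) < omega3:
            break
        sigma = sigma_new
    return theta, sigma

rng = np.random.default_rng(2)
y = rng.normal(size=401)
theta_hat, sigma_hat = fit_location(y, tau=0.5)
assert abs(theta_hat - np.median(y)) < 1e-3
```

The roles of a, b, δ_0, ω_1 and ω_3 here mirror, in simplified form, the control parameters discussed next.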
The starting values θ_0 and σ_0 are specified in lqmm.fit.gs's arguments theta_0 and sigma_0, respectively. The starting values for θ_x and σ are provided automatically by lqmm, either using ordinary least-squares estimates or, if startQR = TRUE, L1-norm estimates using lqm (see Section 7). The elements of theta_z are all set equal to one. The initial step δ_0 > 0 is specified in the argument LP_step of lqmmControl. If not provided by the user, this is set equal to the standard deviation of the response variable. Additionally, it is possible to instruct the algorithm to reset δ to its initial value (reset_step = TRUE) at Step (2.ii) of the algorithm above, i.e., δ_{t+1} = δ_0.
The values for the tolerance parameters ω_1, ω_2, and ω_3 can be passed to the arguments LP_tol_ll, LP_tol_theta and UP_tol, respectively. The contraction step factor a ∈ (0, 1) and the expansion step factor b ≥ 1 are specified in the arguments beta and gamma, respectively. Finally, the maximum numbers of iterations for the lower and upper loops are given in LP_max_iter and UP_max_iter, respectively. It is possible to monitor the value of the objective function as the algorithm proceeds by setting verbose = TRUE. The output above shows that setting the starting value for theta_z (which in this case is the third element of theta) to 0.001 causes the algorithm to converge to a different optimum for θ_z, in the vicinity of the starting point itself. Care, therefore, must be taken in defining the initial values. The output also reports the number of iterations at convergence for the upper loop (opt$upp_loop) and that for the last cycle of the lower loop (opt$low_loop). If the algorithm fails to converge, a warning will be produced. By way of example, let us set the maximum number of iterations for θ to a small value, say LP_max_iter = 10:

R> fit.args$control$LP_max_iter <- 10
R> do.call("lqmm.fit.gs", args = fit.args)

The warning message will suggest using less restrictive convergence criteria, while reporting in parentheses those last used. Note that opt$low_loop is equal to −1, which is the value that the function errorHandling interprets as 'algorithm did not converge', as opposed to −2, which is the code for 'algorithm did not start'.

Derivative-free optimization
As mentioned before, the derivative-free algorithm is based on Nelder-Mead optimization routines. The function lqmm.fit.df is a wrapper for the command optim, which in turn minimizes the negative approximate log-likelihood as returned by the C function ll_h_R. It proceeds similarly to the gradient search by alternating a loop for θ and a step to update σ. The parameters LP_tol_ll, LP_max_iter and verbose in lqmmControl are passed to optim via the arguments abstol, maxit and trace, respectively.
Two successive calls to lqmm.fit.df using the list fit.args are shown below: the first with theta_z left equal to 0.001, the second with theta_z changed back to 1. The maximum number of iterations is restored to 500.

The summary and bootstrap functions
Consider first a call to lmer using the orthodontic growth data, in which both random intercepts and slopes are specified in the linear predictor. To estimate a similar model for the quartiles of distance conditional on age.c (age centered at 11 years), the vector c(0.25, 0.5, 0.75) is passed to the argument tau. The resulting 'lqmm' object will contain the estimates of the three quantile models, which are fitted sequentially and independently using the same formulas for fixed and random effects. The covariance model of the random effects, "pdSymm", will have in this case m = 3 unique parameters (two variances, one for the intercept and one for the slope, and one intercept-slope covariance).
R> fit.lqmm <- lqmm(distance ~ age.c, random = ~ age.c, group = Subject,
+    covariance = "pdSymm", tau = c(0.25, 0.5, 0.75), nK = 7,
+    type = "normal", data = Orthodont.sub,
+    control = lqmmControl(method = "df"))

The summary method for 'lqmm' objects produces a summary object that provides standard errors, (1 − α)% confidence intervals and p values for the coefficients and the scale parameter of each quantile model. Inference on parameters is based on the block bootstrap, which is currently the only method implemented in the package. The number of bootstrap replications (default 50) can be specified in the summary method using the additional argument R. The results of the likelihood ratio test and AIC values are also produced.
R> system.time(print(fit.lqmm.s <- summary(fit.lqmm, R = 100, seed = 52)))

Call: lqmm(fixed = distance ~ age.c, random = ~age.c, group = Subject,
    covariance = "pdSymm", tau = c(0.25, 0.5, 0.75), nK = 7,
    type = "normal", data = Orthodont.sub,
    control = lqmmControl(method = "df"))

tau = 0.25

It is interesting to note in the output above that the magnitude of the slope for age increases with increasing quantile level. The elapsed CPU time was about one minute to run 100 replications for three quantiles (approximately 0.04 seconds per sample). The 'non-convergence' warnings

R> warnings()
Warning messages:
1: In errorHandling(OPTIMIZATION$low_loop, "low", control$LP_max_iter, ... :
   Lower loop did not converge in: lqmm. Try increasing max number of
   iterations (500) or tolerance (1e-05)
[...]

may be of less concern if they occur during the bootstrap. This may happen when the algorithm requires a certain number of data points to estimate the specified regression model but one or more bootstrap samples do not provide adequate information because of a particular configuration of their units. Clearly, this situation is more likely to happen when estimating tail quantiles with a number of parameters that is relatively high given the size and design of the dataset. In the first instance, one can assess the summary output by using less stringent optimization parameters. This is shown below by increasing the maximum number of iterations and the tolerance for the previous example. As a result, essentially the same estimates are obtained but with no warnings:

R> fit.lqmm$control$LP_tol_ll <- 1e-3
R> fit.lqmm$control$LP_max_iter <- 1000
R> summary(fit.lqmm, R = 100, seed = 52)

The function boot.lqmm executes the bootstrapping, producing an object of class 'boot.lqmm' that is stored within the summary output (e.g., fit.lqmm.s$B).
The function boot.lqmm can also be applied directly to a fitted 'lqmm' object:

R> fit.boot <- boot(fit.lqmm, R = 100, seed = 52, startQR = FALSE)

Among boot.lqmm's arguments, it is worth calling attention to startQR. If set to TRUE, the fitted parameters in the 'lqmm' object are used as starting values for each bootstrap sample. On the one hand, this may speed up the fitting process. On the other hand, it may also cause the algorithm to converge too often to a similar optimum, which would ultimately result in underestimated standard errors. Finally, all bootstrap estimates are stored in an array of dimension c(R, p + m, nt), where R represents the number of bootstrap replications, p + m the length of theta and nt the length of tau. Estimated parameters for fixed (theta_x) and random (theta_z) effects can be extracted separately from a 'boot.lqmm' object using the generic function extractBoot, while a summary can be produced using the summary method for 'boot.lqmm' objects:

R> extractBoot(fit.lqmm.s$B, "random")
R> extractBoot(fit.boot, "random")
R> summary(fit.lqmm.s$B)
R> summary(fit.boot)

Bootstrap estimates can also be used to perform inference on the difference between regression quantiles. A 95% bootstrap confidence interval for the interquartile regression coefficients can be computed as follows:

R> B <- extractBoot(fit.boot, "fixed")[, "age.c", c(1, 3)]
R> quantile(apply(B, 1, diff), probs = c(0.025, 0.975))

      2.5%      97.5% 
-0.4141825  0.1731508
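The logic of the block (cluster) bootstrap itself is easy to sketch outside the package: resample whole clusters with replacement, recompute the statistic on each replicate, and take percentile quantiles. The Python snippet below uses made-up data and a simple stand-in statistic instead of an LQMM refit:

```python
import numpy as np

# Hypothetical data: M clusters of n_i observations each, with a shared
# cluster-level shift (the source of within-cluster dependence).
rng = np.random.default_rng(3)
M, n_i = 30, 4
clusters = [rng.normal(loc=10.0) + rng.normal(scale=0.5, size=n_i) for _ in range(M)]

def statistic(cluster_list):
    # A grand median as a stand-in for the model refit done by boot.lqmm.
    return float(np.median(np.concatenate(cluster_list)))

R = 200
reps = np.empty(R)
for r in range(R):
    idx = rng.integers(0, M, size=M)                 # resample cluster labels,
    reps[r] = statistic([clusters[i] for i in idx])  # keeping each cluster intact

lo, hi = np.quantile(reps, [0.025, 0.975])
# The percentile interval is expected to cover the original point estimate.
assert lo <= statistic(clusters) <= hi
```

Resampling clusters rather than individual observations preserves the within-cluster dependence in each replicate, which is what makes the resulting standard errors valid for clustered data.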

Prediction functions
Other important functions include ranef.lqmm and the predict and residuals methods for 'lqmm' objects. Here I introduce the first in detail and briefly describe the other two.
The BLP in Equation 8 resembles that obtained from a LMM. It is therefore reasonable to expect that predicted random effects from a mean and a median mixed model should be comparable when the parameter estimates are similar. The code to obtain predicted subject-specific intercepts and slopes of the mean and median models for the orthodontic growth data is shown below. The BLP of the random effects is implemented in the function ranef.lqmm, which takes a 'lqmm' object as the only argument. If more than one quantile model has been fitted, the output of ranef.lqmm will be a named list of predictions, with names given by tau.
Prediction of the response can be carried out using the corresponding predict method. The argument level specifies whether predictions should be returned at the 'population' level (level = 0), that is ŷ(τ) = Xθ̂_x(τ), or at the 'cluster' level (level = 1), that is ŷ(τ) = Xθ̂_x(τ) + Zû(τ), where û(τ) is the BLP of the random effects. This is similar to the argument level in the predict method for 'lme' objects in package nlme (the reader is referred to Geraci and Bottai 2014 for a discussion on the interpretation of the regression coefficients at the population level in LQMM). Analogously, residuals can be calculated at one or the other level by using the corresponding residuals method, as shown below:

R> predict(fit.lqmm, level = 0)
R> residuals(fit.lqmm, level = 0)

By way of example, Figure 3 shows the mean and quartile regression lines predicted for each of the 11 girls (that is, level = 1) in the orthodontic growth data (see Figure 1.14 in Pinheiro and Bates 2000, p. 38). While medians and means tally in magnitude and sign, for some subjects the distance between the first and third quartiles varies by age, which suggests slight heteroscedasticity in the conditional distribution y|u.
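The cluster-level quantities underlying a plot such as Figure 3 can be obtained in the same way; a minimal sketch, with the object name fit.lqmm assumed from the examples above:

```r
## Cluster-level (level = 1) predictions and residuals, i.e., including
## the contribution of the predicted random effects.
## 'fit.lqmm' is assumed to be the fitted 'lqmm' object from above.
yhat.1 <- predict(fit.lqmm, level = 1)
res.1  <- residuals(fit.lqmm, level = 1)
```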

Modeling conditional quantiles
So far individual lqmm commands have been described separately. In this section the focus is on performing a quantile analysis and discussing related modeling choices.
The estimates from Model 1 suggest that at τ = 0.75 the population growth rate in boys is faster than the rate at lower quantiles. In other words, the effort required to stay on the same quantile level between any two time points is greater for those who rank higher in the outcome distribution than for those who rank lower in the distribution (clearly, this does not imply that a subject who ranks, say, 75th at age 8 years will necessarily rank similarly at a later time). However, the difference in slopes between the first and third quartiles amounts to a mere 0.66 mm over a 6-year time period, which may not be clinically relevant. Growth trajectories in girls start from lower levels and proceed at a slower pace than in boys at all quantiles. The estimated variance of the random effects, assumed to be the same for age, sex and age-sex interaction, is very small or zero at τ = 0.25, but larger at higher quartiles. As compared to Models 2-4, Model 1 is the most parsimonious, but it provides the worst AIC values for τ = 0.25 and τ = 0.75.
Model 2, which allows random effects to be correlated, yet imposes the same variance parameters, does not offer an improvement with respect to Model 1, except for τ = 0.25.
Models 3 and 4, in contrast, show substantially lower AIC values, and these are similar between the two models at all quartiles. Note also that, as compared to the first two models, Models 3 and 4 provide a better AIC value for the mean. Furthermore, the AIC for Model 4 (429.52) is lower than that for Model 3 (450.60). This improvement is explained by the heteroscedastic component of the model: without it, Model 4 provides an AIC value of 448.58. The estimates of the fixed slopes for age, in either boys or girls, are approximately constant across quartiles. The estimated variance parameters also indicate that the random intercepts vary considerably between subjects, slightly less for τ = 0.5 as compared to the other two quartiles and the mean. In contrast, there is little or no variation between subject-specific slopes. The conclusion that can be drawn based on the orthodontic growth data is that there is no evidence of unequal growth rates in terms of pituitary-pterygomaxillary fissure distance for either boys or girls ranking below and above the median. There seems to be, however, a reduced level of heterogeneity near the median.
The analysis can be further extended to assess individual growth trajectories. Each panel in Figure 4 plots individual measurements for boys and girls at different ages, together with predicted marginal (that is, level = 0) quartiles using Model 4 (Table 2). Note that the slopes differ by sex but not by subject. This type of plot provides centile curves analogous to those typically used for screening purposes. For example, the growth paths of subjects 'F01' and 'F10' lie below the first quartile at all ages. A plot like the one in Figure 3, on the other hand, can be used for 'conditional' screening (Wei and He 2006). Note that in the conditional plot the measurements for subjects 'F01' and 'F10' rank within the first and third conditional quartiles or very close to them.

Table 2: Fitted models of pituitary-pterygomaxillary fissure distance conditional on age (baseline: 11 years) and sex (baseline: boys) for three quartiles and the mean (LMM). Standard errors are in parentheses.

Quantile regression for independent data
The package lqmm deals mainly with clustered data. However, some functions are also provided to estimate quantile regression models for independent data via maximization of the AL likelihood with a gradient-search algorithm (Bottai, Orsini, and Geraci 2014). See also the package quantreg (Koenker 2013) for estimation based on linear programming algorithms.
The main function lqm has the typical syntax of R regression fitting functions, with some arguments analogous to those in lqmm:
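A hedged sketch of such a call is given below; the simulated data and variable names are illustrative, not taken from the paper:

```r
## Quantile regression for independent data via maximization of the
## AL likelihood. The data here are simulated for illustration only.
library(lqmm)
set.seed(1)
mydata <- data.frame(x = runif(100))
mydata$y <- 1 + 2 * mydata$x + rnorm(100)

## Fit three quartiles at once, as with lqmm's 'tau' argument.
fit <- lqm(y ~ x, tau = c(0.25, 0.5, 0.75), data = mydata)
summary(fit)
```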

Conclusion
The lqmm package implements methods developed by Geraci and Bottai (2014) for conditional quantile estimation with clustered data, originally proposed by Geraci (2005). The R code is written in S3 style, while the main fitting procedures are coded in C. Ongoing methodological work in LQMMs includes developing approaches with a reduced computational burden for both model estimation and standard error calculation. Future extensions of the lqmm package will also provide functions for complex survey estimation and multiple imputation as proposed by Geraci (2013).