SNP_NLMM: A SAS Macro to Implement a Flexible Random Effects Density for Generalized Linear and Nonlinear Mixed Models

Generalized linear and nonlinear mixed models (GMMMs and NLMMs) are commonly used to represent non-Gaussian or nonlinear longitudinal or clustered data. A common assumption is that the random effects are Gaussian. However, this assumption may be unrealistic in some applications, and misspecification of the random effects density may lead to maximum likelihood parameter estimators that are inconsistent, biased, and inefficient. Because testing if the random effects are Gaussian is difficult, previous research has recommended using a flexible random effects density. However, computational limitations have precluded widespread use of flexible random effects densities for GLMMs and NLMMs. We develop a SAS macro, SNP_NLMM, that overcomes the computational challenges to fit GLMMs and NLMMs where the random effects are assumed to follow a smooth density that can be represented by the seminonparametric formulation proposed by Gallant and Nychka (1987). The macro is flexible enough to allow for any density of the response conditional on the random effects and any nonlinear mean trajectory. We demonstrate the SNP_NLMM macro on a GLMM of the disease progression of toenail infection and on a NLMM of intravenous drug concentration over time.


Introduction
Generalized linear mixed models (GLMMs) are commonly used to represent non-Gaussian longitudinal or repeated measures data. Nonlinear mixed models (NLMMs) have been traditionally employed to model pharmacokinetic/pharmacodynamic, growth curve, disease dynamic, or other longitudinal data where the mean trajectory is an arbitrary non-linear function of a vector of parameters.
Let Y ij , i = 1,…, m, j = 1,…, n i , be the j th response on the i th subject or cluster. For both GLMMs and NLMMs we assume that Y ij conditioned on a q-dimensional vector of centered subject-specific random effects b i has density f Y ij |b i (y ij |b i ; β, x ij ), which depends on a p-dimensional vector of fixed effects β and covariates x ij . Common choices for f Y ij |b i are the normal density and the Bernoulli and Poisson probability mass functions. We also assume that the random effects b i , independent across i, have density g b i (b i ; ξ) which may depend on a vector of parameters ξ. If we assume that Y ij are independent across i and, given b i , are independent across j, then the likelihood of θ = (β ⊤ ,ξ ⊤ ) ⊤ is given by (1) where Y is the vector of all responses across all m subjects. For brevity, we write the qdimensional integral using a single integral throughout the remainder of the manuscript.
Typically, one assumes that the random effects density g b i (b i ; ξ) is a q-dimensional normal density. The assumption is usually made for convenience because the normal density is the only density for the random effects supported in the SAS/STAT procedures GLIMMIX and NLMIXED used to estimate GLMMs and NLMMs, respectively (SAS Institute Inc. 2008).
However, in many applications the assumption of Gaussian random effects may be too restrictive, and one stands to lose insight into the data generating process if one assumes a priori that the random effects are Gaussian. Maximum likelihood estimators of the fixed effects, β, are also, in general, not consistent if the random effects density g b i has been misspecified for GLMMs and NLMMs. Furthermore, researchers have noted that there is considerable bias and loss of efficiency in the maximum likelihood parameter estimators assuming Gaussian random effects when either the true random effects density deviates substantially from normality or the variance of the random effects is large and there is moderate misspecification of the random effects (Neuhaus, Hauck, and Kalbfleisch 1992;Hartford and Davidian 2000;Heagerty and Kurland 2001;Agresti, Caffo, and Ohman-Strickland 2004;Litière, Alonso, and Molenberghs 2008).
Several researchers have proposed methods to relax the assumption of Gaussian random effects (or latent traits), and we briefly review the relative merits of the proposed approaches. At one extreme, one can develop consistent and asymptotically normal estimators for the fixed effects in a GLMM or generalized linear latent variable model that do not require the data analyst to correctly posit the density for the random effects or latent traits. For example, Ma and Genton (2010) use semiparametric theory to find the efficient estimating function for parameters in a generalized linear latent variable model, that is, the projection of the score vector onto the complement of the nuisance tangent space (where the density of the latent trait is the infinite dimensional nuisance parameter), and show that the estimating function is unbiased regardless of the true distribution of the latent traits. The general semiparametric approach is explained in Tsiatis (2006). The approach advocated by Ma and Genton (2010) is similar to the conditional likelihood inference proposed by Sartori and Severini (2004) for generalized linear mixed models although the the derivation is completely different. However, these semiparametric approaches treat the random effect/ latent trait density as a nuisance parameter which may be of scientific interest in many applications. Furthermore, deriving the efficient estimating functions for the fixed effects that do not require estimating the random effects density is not trivial and depends on the conditional density f Y ij |b i (·).
Alternatively, the random effects distribution that does not assume any parametric form may also be estimated by maximum likelihood. Several authors have proposed computational methods to find the nonparametric maximum likelihood estimate of the random effects density (Laird 1978;Bock and Aitkin 1981;Follmann and Lambert 1989;Lesperance and Kalbfleisch 1992;Aitkin 1999). Knott and Tzamourani (2007) proposed using the bootstrap to obtain confidence intervals for the estimated nonparametric density. Lindsay (1983) showed that the nonparametric maximum likelihood estimator is discrete and has a limited number of points of support. This is a serious limitation because in many applications we would expect that the distribution of the random effects or latent traits to be continuous and a discrete approximation may not provide sufficient insight into the true data generating mechanism. Furthermore, we may be able to gain substantial efficiency by making the minimal and realistic assumption that the random effects density is continuous. Magder and Zeger (1996) and Knott and Tzamourani (2007) both propose a smooth nonparametric maximum likelihood estimator of the random effects density. The method developed by the former results in a finite mixture of Gaussian densities while the later suggest smoothing the discrete estimate using kernel methods. Both involve ad hoc approaches to determine the amount of smoothing and the approach advocated by Magder and Zeger (1996) can be computationally intensive. Tao, Palta, Yandell, and Newton (1999) nonparametrically estimate the density of a random effect using a predictive recursive algorithm. The estimate is guaranteed to be continuous if the starting density in the algorithm is chosen to be a continuous density, but unfortunately the authors note the sample size must be large to obtain promising results and only implement their method on a onedimensional random effect. Verbeke and Lesaffre (1997) and Litière et al. (2008) propose approximating the random effects distribution using a finite mixture of Gaussian distributions and develop an EM algorithm to fit the model. Like many EM algorithms, their algorithm may be computationally intensive and slow. Ghidey, Lesaffre, and Eilers (2004) similarly assume that the random effects are a Gaussian mixture and propose to estimate the mixing proportions using a penalized log-likelihood. However, their method has only been applied to linear mixed models.
Alternatively, many researchers have developed efficient computational methods to fit models with flexible, parametric classes of random effects densities. For example, prior work has developed computational methods to fit models where the random effects distribution is assumed to belong to the class of t-distributions (Pinheiro, Liu, and Wu 2001;Lee 2006, 2007;Lee and Thompson 2008), skew extensions of the t-and normal distributions (Ghosh, Branco, and Chakraborty 2007;Lin and Lee 2008;Ho and Lin 2010;Huang and Dagne 2010;Labra, Garay, Lachos, and Ortega 2012), normal/independent distributions (a class of thick tailed-distributions which contain the multivariate t, slash, and contaminated normal distributions, see Rosa, Padovani, and Gianola 2003;Lachos, Bandyopadhyay, and Dey 2011), and skew extensions of the normal/independent distributions (Lachos, Dey, and Cancho 2009;Lachos, Ghosh, and Arellano-Valle 2010;Bandyopadhyay, Lachos, Castro, and Dey 2012). Although each of these classes of distributions may be appropriate in certain domain-specific applications, none is likely flexible enough for general use. For example, none of the above classes permit multiple modes.
In another approach given extensive attention in the recent literature, one could assume that the random effects density follows a smooth density that can be represented by the seminonparametric (SNP) formulation proposed by Gallant and Nychka (1987). Although this class of densities does not include all continuous densities, the class is sufficiently flexible to include skewed, multimodal, and think and thin-tailed densities. The density does not contain densities which have kinks, jumps, or oscillations, densities unlikely to occur in common practice. The SNP representation has been previously used for the random effects density in GLMMs (Chen, Zhang, and Davidian 2002) and NLMMs (Davidian and Gallant 1993;Vock, Davidian, Tsiatis, and Muir 2012) as well as used for the latent trait density of generalized linear latent variable models (Irincheeva, Cantoni, and Genton 2012). The SNP representation is flexible enough to cover a broad variety of applications unlike some of the parametric classes discussed above. Furthermore, the SNP representation works for multidimensional random effects and can be implemented as the random effects density for nonlinear mixed models unlike some of its competitors. Finally, by assuming that the random effects density is "smooth" without kinks or oscillations, a minimal but realistic assumption, we may gain efficiency over non-parametric methods. Overall, fitting a GLMM or NLMM assuming the random effects follow the SNP representation is a prudent modeling approach when the random effects density is of scientific interest and there is little subjectarea knowledge to suggest a priori that the random effects are Gaussian. Of course the choice of a method for a flexible random effects density depends on the goal of the analysis. If the random effects density is not of interest to the investigator, then the non-parametric approaches described above may be sufficient.
Previous work has used Fortran to estimate NLMMs and GLMMs where the random effects are assumed to follow the SNP representation which has precluded widespread use of the SNP density. There are two main problems in implementing the SNP density using standard software. First, as noted earlier, standard software only permits Gaussian random effects, so the software must be "tricked" to allow non-Gaussian random effects. Secondly, to obtain maximum likelihood parameter estimates of θ, we must be able to approximate the integral in Equation 1. This can be difficult for complex densities g b i ; as described by Irincheeva et al. (2012) many of the standard approaches for integral approximation work poorly when the random effects follow the SNP representation.
We develop a SAS macro, SNP_NLMM, that overcomes those computational challenges and allows one to fit easily GLMMs and NLMMs where the random effects density is assumed to follow the SNP density. In Section 2, we discuss the SNP density in detail. Section 3 describes the input parameters for the SNP_NLMM macro as well as the output data sets. We describe the computational details used in the macro in Section 4. Section 5 illustrates the use of SNP_NLMM to fit a GLMM of longitudinal measurements on toenail infection. Section 6 describes the results from implementing the macro to estimate a NLMM of intravenous drug concentration over time. We conclude in Section 7.

Seminonparametric density
To motivate the development of the SNP density, we consider a nonlinear mixed model of intravenous drug concentration over time. We will return to this example in detail in Section 6. Let Y ij and t ij be the concentration and time since infusion began, respectively, for the i th subject at the j th measurement. The concentration profile is assumed to follow the model (2) where D i is the infusion rate subject i received; Cl i and V i are the unknown subject-specific clearance and volume distribution parameters; and . Because we assume that Cl i and V i vary across the population, they are subject-specific random effects. Let b i = (log Cl i , log V i ) ⊤ . Throughout the remainder of the manuscript, we write the q-dimensional vector of centered (non-mean-zero) random effects b i as (3) where μ is a q-dimensional vector, R is a q × q lower triangular matrix, and Z i is a qdimensional random effects vector. One would traditionally assume that Z i follows a standard q-dimensional normal density which implies that E(b i ) = μ and var(b i ) = RR ⊤ . We rewrite the likelihood given in Equation 1 in terms of Z i as (4) where r = vech(R) is the non-zero components of R, and, in the example given, Instead of assuming that Z i is distributed as standard normal, we assume that Z i belongs to the smooth class of densities, , described by Gallant and Nychka (1987). These authors give a mathematical description of densities in this class which are sufficiently differentiable to rule out densities with jumps or oscillations but flexible enough to include thick/thintailed, skewed, and multi-modal densities. Mathematically, densities g ∈ may be expressed as a infinite Hermite series, , plus a lower bound on the tails, where P ∞ (z) is an infinite dimensional polynomial and ψ(z) is the standardized form of a known density with a moment generating function. The density ψ(z) is usually taken to be the standard normal density as is the case for this macro. For modeling purposes the lower bound is typically disregarded and the polynomial is truncated. Densities in the truncated class which disregard the lower bound have been referred to as seminonparametric or SNP.
The SNP density with degree of truncation K is given by (5) where ϕ q (·) is the q-dimensional standard normal density, j l ≥0 for l = 1,…, q, and K is the order of the polynomial P K (z). For example, with K = 2 and q = 2, . Note that the subscripts of the coefficients in the polynomial match the powers of the q random effects in the same term.
Because is a polynomial of order K squared times a standard normal density, will be non-negative for any choice of ξ. However, for to be a proper density we must constrain the d-dimensional vector of coefficients a = {a j 1 ,…,j q : j 1 +…+ j q ≤K}, such that integrates to 1. This is equivalent to imposing the constraint a T Aa = 1, where A is the matrix with (j, k) element equal to , j 1 ,…, j q (similarly k 1 ,…, k q ) are the subscripts of the j th (k th ) element of a, and U 1 is distributed as a standard normal (Zhang and Davidian 2001). Because A is positive definite, there is a matrix B so that A = B 2 ; this implies c T c = 1 where c = Ba. Rather than impose the constraint c T c = 1, we can use a polar coordinate transformation to write c using only d−1 parameters where c 1 = sin(ξ 1 ), c 2 = cos(ξ 1 ) sin(ξ 2 ),…,c d−1 = cos(ξ 1 )… cos(ξ d−2 ) sin(ξ d−1 ), c d = cos(ξ 1 )…cos(ξ d−1 ), −π/2 ≤ ξ j ≤ π/2 for j = 1,…, d − 1, and ξ = (ξ 1 ,…,ξ d−1 ) ⊤ (Zhang and Davidian 2001).
We note that when K = 0, then P K (z) is just a constant, a 0,…,0 , which must equal 1, and the density simplifies to the standard q-dimensional normal. For K > 0, the SNP density does not force E(Z i ) = 0. Therefore, E(b i ) = μ + RE(Z i ) and var(b i ) = Rvar(Z i )R ⊤ . These moments can easily be computed by noting that the moments of Z i are just linear combinations of moments of a standard normal density.
The above discussion has assumed a fixed K, the value of K controls the departure from the base density, here the standard normal, and, therefore, the flexibility for approximating the true random effects density. K is not the number of components in a mixture or the number of subpopulations in the underlying population. Instead, K plays a similar role to the bandwidth in kernel density estimation with smaller bandwidths and larger values of K capturing more local features of the random effects densities but leading to more variable estimators (Song, Davidian, and Tsiatis 2002). The class of possible densities grows monotonically as K increases, and Irincheeva et al. (2012) relate the number of possible modes to K and q. However, none of the extensive methodological work on the SNP representation has related K to other features of the corresponding class of densities such as the maximum skewness and kurtosis permitted in the class (see, for example, Gallant and Nychka 1987;Davidian and Gallant 1993;Zhang and Davidian 2001;Chen et al. 2002;Song et al. 2002;Doehler and Davidian 2008;Zhang and Davidian 2008;Vock et al. 2012). Although one could a priori fix a choice of K, an objective selection method is often preferred because of the lack of relationship between K and the features of the distribution permitted. Specifically, most work has advocated treating K as a tuning parameter and has suggested fitting models for several values of K and then choosing K based on information criteria and/or visual inspection of the estimated densities (Davidian and Gallant 1993;Zhang and Davidian 2001;Chen et al. 2002;Vock et al. 2012). Prior research cited above has indicated that K need not be larger than two to capture many complicated densities.
We can obtain estimates of θ by using the maximum likelihood estimator. For a given K, the optimization problem is exactly the same as in standard finite dimensional maximum likelihood estimation provided we can approximate the integral in Equation 4 (see Section 4). However, optimization of the likelihood with the SNP density can be highly dependent on starting values for ξ; therefore, it is advisable to maximize the likelihood using a few sets of "good" starting values to ensure that one obtains the global maximum. Our macro uses the following routine advocated by Doehler and Davidian (2008) to generate starting values and parameter estimates. SNP_NLMM obtains initial estimates for β, E(b i ), and var(b i ) by fitting the GLMM or NLMM assuming Gaussian random effects. The likelihood over a grid of values for ξ is evaluated where β is set to the initial estimate and starting values for μ and r are set to the values that result in the same values of E(b i ) and var(b i ) as their initial estimates from assuming Gaussian random effects. The macro then uses a small number of those sets of parameter values with large likelihoods as starting values in the maximization of the likelihood.
• subject: Specifies the subject/cluster variable. Observations that have the same value of subject are assumed to have the same realization of the random effects.
• response: Specifies the response variable of interest.
• prog_stat: Additional programming statements needed to specify the conditional density. The syntax is the same as the NLMIXED procedure. We emphasize that the centered (non-mean zero) random effects must be referred to as b_1 and b_2. The programming statements should be enclosed with the %str() macro command. Typically one would need to specify any parameters that are part of the conditional density including the mean and variance if the conditional density is Gaussian or the probability of success if the conditional density is Bernoulli.
• log_cond_dens: The logarithm of the density of the response given the random effects. We emphasize that the centered (non-mean zero) random effects must be referred to as b_1 and b_2. The statements should be enclosed with the %str() macro command. While having the user specify the conditional density and corresponding programming statements does require greater involvement, this allows for greater flexility.
• start_values: Starting values for the fixed effect parameters, μ and β, as well as variance components, R, assuming Gaussian random effects. The starting values should be given using the syntax of the parms statement in in NLMIXED procedure in SAS, for example beta1 = 5 beta2 = 1. The variance of the random effects has been parameterized using R where var(b i ) = RR ⊤ and R is a lower triangular matrix. The (i, j) non-zero components of R should be referred to as r_ij. The i th component of the vector μ should be referred to as mu_i. The statements should be enclosed with the %str() macro command.
• q: Specifies the dimension of the random effects. Currently only q = 1 and q = 2 are supported.
• Kmax: Specifies the maximum degree of flexibility considered. As noted above, prior research has indicated that K need not be larger than two to approximate most densities encountered in practice. Currently Kmax must be less than or equal to four for one-dimensional random effects and less than or equal to three for twodimensional random effects. The default is Kmax = 2.
• all_K: If all_K = 1, then models with K, the degree of flexibility in the random effects density, equal to zero through Kmax are fit. If all_K = 0, then models with K equal to just zero and Kmax are fit. The default is all_K = 1.
• info_crit: Specify the information criteria, BIC or AIC, that should be used to select the degree of flexibility in the random effects density. The default is info_crit="BIC".
• quad_pt_K0 and similarly for _K1, _K2, _K3, _K4: Specifies the number of quadrature points used for K = 0 and similarly for other values of K. If quad_pt_K0 = 0 the number of quadrature points is adaptively selected. Specifically, the macro continues to increase the number of quadrature points until the relative change in the log likelihood evaluated at the first set of starting values is below a prespecified tolerance. We recommend first running the macro with quad_pt_K0 = 0 to select the number of quadrature points adaptively. Then one can re-run the optimization a couple of times with larger numbers of quadrature points to ensure that the maximum likelihood estimates are not sensitive to the number of quadrature points. This is especially helpful if the number of quadrature points selected adaptively is small. The default is to have the number of quadrature points selected adaptively.
• grid_pt_K1 and similarly for _K2, _K3, _K4: Specifies the number of grid points in each dimension of ξ that should be used in the grid search for starting values. We note that for q = 1 the dimension of ξ equals K and for q = 2 the dimension of ξ equals (K + 1)(K + 2)/2 − 1. For q = 2 and large values of K, the number of grid points can be quite large and the time to evaluate the likelihood at each of these grid points can be extensive. However, we have found that a fine grid is the best way to ensure that the global maximum is obtained. We recommend that the number of grid points in each dimension be at least 9 for q = 1 with K less than 4 and for q = 2 with K less than or equal to 2. For scenarios with larger values of K, we would recommend the number of grid points in each dimension be at least 5-7.
• max_K1 and similarly for _K2, _K3, _K4: Specifies the number of optimizations with different sets of starting values that should be performed to ensure that one obtains the global maximum. The default is max_K1 = 1.

Output datasets
The following datasets are generated from the macro, are stored in the work directory, and are printed as part of the output: • dim_total: Contains the number of observations used, number of observations not used, number of subjects, number of parameters, and number of quadrature points for each value of K considered.
• fitstat_total: Contains the fit statistics AIC and BIC as well as −2 log likelihood for each value K considered.
• parm_total: Contains the parameter estimates and standard errors for each value of K considered.
• parm_final_K: Contains the parameter estimates, standard errors, p values, and confidence intervals for the value of K selected by the information criteria in info_crit.
In addition to the datasets generated, SNP_NLMM plots the estimated density for each value of K considered.

Computational details
We describe here some of the computational issues in implementing the SNP density for GLMMs and NLMMs. This section may be skipped by the reader solely interested in implementing the macro. However, these issues may be of interest to investigators trying to implement other flexible random effects densities in standard software.
As discussed previously, to obtain the maximum likelihood parameter estimator, we must be able to approximate the contribution to the likelihood from each subject, (θ, Y i ), where We have found traditional numerical integration techniques available are either too computationally intensive or not appropriate if either f Y i |Z i of g Z i are non-Gaussian. For example, the first-order approximation (Beal and Sheiner 1988) requires that the conditional density of the response given the random effects to be Gaussian. Also, the Laplace approximation does not perform well if the random effects distribution has multiple modes. Although Gaussian quadrature can be used to approximate a broad array of integrals, quadrature, as currently implemented, has limitations has well. Non-adaptive Gaussian quadrature centers the quadrature points at the mean of the random effects and scales them based on the variance of the random effects. However, the region over which the integrand in Equation 6 contains most of its mass may not be near the mean of the random effects. Therefore, a large number of quadrature points may be needed to achieve a reasonable approximation. When g Z i is the SNP density, we have found the number of quadrature points needed is usually too large to be computationally feasible for q ≥ 2.
To reduce the number of quadrature points needed to obtain a reasonable approximation, Liu and Pierce (1994) suggest centering the quadrature points at the mode of the integrand and scaling by the inverse of the negative Hessian matrix. Because, in the context of random effects models, both the mode and Hessian depend on the values of the fixed effect parameters, Pinheiro and Bates (1995) suggest the following iterative algorithm to obtain estimates for θ. Let Ẑ i,k be the empirical Bayes estimate of Z i given the current (i.e., k th ) estimates of β, μ, r, and ξ, β̂ (k) , μ̂( k) , r̂( k) , and ξ̂( k) , respectively, and Ĝ i,k be the estimated covariance matrix of Ẑ i,k . That is, We can write in terms of the random variable U i and then perform a change of variable to have the integral in Equation 6 taken with respect to U i : (8) This integral can then be approximated by a weighted q-dimensional sum such that (9) where and u k l and w k l , k l = 1,…, T, l = 1,…, q, are the abscissas and weights, respectively, for the one-dimensional Gaussain quadrature rule based on the standard normal kernel with T quadrature points.
Using this approximation to Equation 6, we can then maximize Equation 4 with respect to β, μ, r, and ξ to obtain β̂( k+1) , μ̂( k+1) , r̂( k+1) , and ξ̂( k+1) , respectively. Typically one would iterate, until convergence, between finding the empirical Bayes estimates to re-approximate Equation 6, and then updating β, μ, r, and ξ. In many applications, the number of quadrature points needed for an accurate approximation is small; and, therefore, the dimension of the integral can be large and still be computationally feasible (see Schilling and Bock 2005, who considered numerical integration up to 8 dimensions). We have described (adaptive) Gaussian quadrature in the context of a single-level random effect, but the approach may be extended to multi-level models (Rabe-Hesketh, Skrondal, and Pickles 2005). Additional efficiency may be gained by using spherical quadrature rules although those are not considered as part of SNP_NLMM.
However, adaptive quadrature requires m maximizations at each iteration because the integral in Equation 6 is re-centered and re-scaled about the updated empirical Bayes estimates at each iteration. The maximizations can be too time consuming if g Z i is sufficiently complex as is the case for the SNP density. Adaptive importance sampling, another numerical integration technique which is a stochastic version of adaptive Gaussian quadrature, also requires m maximizations at each iteration and, therefore, suffers from the same limitation (Pinheiro and Bates 1995). A novelty of SNP_NLMM is the computational approach to approximate Equation 6 when g Z i (·) is a flexible and complex random effects density, which we describe below.
Heuristically, we would like to center and scale the integral one time at something reasonably close to Z i , the true value of the random effect, to avoid m maximizations at each iteration. As suggested in Vock et al. (2012), we propose to center and scale the quadrature points at the empirical Bayes estimates and their estimated variance from assuming Gaussian random effects. The approach advocated here is similar to the computational approach taken by Rizopoulos (2012) in the context of joint longitudinal-survival models. Rizopoulos suggested using Gaussian quadrature to approximate the integral over the random effects in the joint model with quadrature points centered and scaled at the empirical Bayes estimates and estimated variance obtained from the model of only the longitudinal data. In our application, we need to be careful as Z i is assumed to have mean zero when Gaussian random effects are assumed, but this need not be the case for a flexible random effects density, such as the SNP density. Let bî ,G be the empirical Bayes estimate of b i , the centered random effect, from assuming Gaussian random effects and Ĝ i,G the estimated variance of the empirical Bayes estimates.
Then we have (10) which we can substitute into Equation 6 and take the integral with respect to U i : (11) As before we can now use Gaussian quadrature to approximate Equation 11; that is, (12) SNP_NLMM uses the NLMIXED procedure in SAS for numerical integration and optimization of the likelihood. Because NLMIXED does not allow the user to center the quadrature points at arbitrary values and does not permit non-Gaussian random effects, to force NLMIXED to fit the GLMM or NLMM with the SNP density using the integration strategy above we must creatively use programming statements. More specifically, if we consider (13) to be the conditional "density" of the response given the random effect u i (which one can specify using the general log likelihood function in the model statement of the NLMIXED procedure), specify a standard q-dimensional normal for the random effects density, and use the noad option to avoid adaptive quadrature, then the NLMIXED procedure will fit the desired GLMM or NLMM with the SNP random effects density using our prosed method for numerical integration. These steps use the "likelihood reformulation" technique discussed by Liu and Yu (2008) to allow for non-Gaussian random effects densities. Overall, we have found that the above strategy provides an excellent approximation of the integral while being far less computationally demanding than either non-adaptive or adaptive Gaussian quadrature.

GLMM example: Toenail infection
As a motivating example, we consider data from a multicenter randomized trial on the effect of two oral treatments on dermatophyte onychonycosis, a toenail infection. Investigators measured the degree of onycholysis, a measure of the separation between the nailplate and nail bed, at baseline and 4, 12, 24, 36, and 48 weeks after starting therapy. A total of 294 patients were enrolled in the study and had a total of 1908 measurements were taken. These data have been previously analyzed by DeBacker, Vroey, Lesaffre, Scheys, and Keyser (1998) and Lesaffre and Spiessens (2001).
One question of interest to the investigators was whether there was a difference in the proportion of patients with moderate to severe onycholysis over time between the two treatments. Let Y ij be the indicator if patient i, i = 1, …,m, has moderate or severe onycholysis at the j th measurement, j = 1, …, n i . Similarly define t ij to be the time since subject i started treatment at j th measurement and a i to be the indicator for whether or not subject i was on treatment A.
In keeping with previous analyses, we posit that conditioned on a subject-specific intercept b 1i , Y ij are independent and follow a logistic regression model with linear predictor b 1i + β 1 t ij + β 2 a i + β 3 t ij a i . That is, we assume (14) where b 1i are independent across i and can be written as b 1i = μ + RZ i for constants μ and R.
Previous research has assumed that the density of Z i is a standard one-dimensional normal density (Lesaffre and Spiessens 2001); however, we assume that the density of Z i can be approximated by the SNP representation with K = 0, 1, and 2 (subsequent analysis with larger values of K, not shown, did not improve the model fit).
The fit statistics and parameter estimates from fitting the model implied by Equation 14 are given as part of the standard output of SNP_NLMM and are re-printed below. We note that expect_b1 and var_b1 refer to E(b 1 ) and var(b 1 ), respectively; the remaining parameter names of interest are specified by the user in the prog_stat statement, above. The parameter estimates are similar across all values of K, although the estimated mean of the random intercept is approximately one standard error smaller for K = 2. The estimate for β 2 is also considerably larger for K = 2 but remains non-significant. Of interest to the investigators was the value of β 3 , the interaction between treatment and time, which remained boarder-line significant across each of the three models.
Each of the information criteria prefer K = 2 which, based on the plot of the estimated density given in Figure 1, indicates a strong departure from normality and the presence of two groups. One group of subjects with the mode of b 1i at approximately −5 has a very low probability of moderate to severe onycholysis across all time points. The predicted probability of having moderate or severe onycholysis at each of the scheduled visit times is less than 1.1 percent for both treatment groups. This mode suggests that those who enter the trial without moderate to severe infection have very little probability of transitioning to a more infected state. However, those in the second group with the mode at approximately 1 experience a clear progression through the disease. The estimated probability of having significant onycholysis at weeks 0, 4, 12, 24, 36, and 48 for a patient taking treatment A with b 1i = 1 is 0.81, 0.71, 0.47, 0.16, 0.04, and 0.01, respectively.
Fitting a flexible random effects model not only alters estimates of parameters such as E(b 1i ), but perhaps more importantly improves our understanding of the underlying data generating process and disease progression.

NLMM example: Pharmacokinetics of argatroban
As an example of a NLMM, we consider a pharmacokinetic study of argatroban, an anticoagulant, reported in Davidian and Giltinan (1995, Chapter 9.5) and introduced in Section 2. In this study, 36 patients received a four-hour intravenous infusion of argatroban at one of nine infusion rates ranging from 1 to 5 μg/kg/min in increments of 0.5 μg/kg/min (four patients at each infusion rate). A 37 th patient had an infusion rate of 4.38 μg/kg/min.
The intravenous concentration of argatroban was recorded at 30,60,90,115,160,200,240,245,250,255,260,275,295,320, and 360 minutes after starting infusion. Let Y ij and t ij be defined as in Section 2. The concentration profile is assumed to follow the model given by and R is a 2 × 2 lower triangular matrix. We assume that the density of Z i can be represented by the density given in Equation 5 for K = 0, 1, and 2.
The fit criteria and parameter estimates from fitting the model implied by Equation 2 assuming K = 0, 1, and 2 are given as part of the standard output of the macro and shown below. We should note that with only 37 patients it is difficult to make any definitive statements on the joint density of log(Cl i ) and log(V i ). Nonetheless, all the information criteria prefer K = 1 which, based on the bivariate density plot given in Figure 2 and as part of the standard output of the macro, shows strong departure from normality with a group of patients having a much lower log clearance than the others. Fixed effect parameter estimates are nearly identical across all K considered although the standard errors are larger for K = 2 indicating that we may be overfitting the data.

Conclusion
For many applications of GLMMs and NLMMs, there is little subject area knowledge that would justify assuming a priori that the random effects are Gaussian. Given that fixed effect parameter estimators will be inconsistent, biased, and inefficient if the random effects density is misspecified, the one advantage to assuming Gaussian random effects density was computational ease. However, advances in computation and the SNP_NLMM macro allow flexible random effect densities to be easily and quickly implemented. The total computation time for all models in the toenail infection example is less than 30 seconds. When q = 2 as in the PK/PD study of argatroban, the grid search for K = 2 can be somewhat time consuming depending on the fineness of the grid and computing speed. But optimization takes less than a couple of minutes for all three models. In addition to improved inference of parameter estimators, one gains an improved understanding of the data generating process which can be crucial for scientific understanding. We hope that the SNP_NLMM macro allows flexible random effect densities to be fit routinely for GLMMs and NLMMs.  Estimated density of the log clearance and volume parameters from Equation 2 with K = 1 concerning the PK/PD study of argatroban.