Smoothing with mixed model software

Smoothing methods that use basis functions with penalization can be formulated as fits in a mixed model framework. One of the major benefits is that software for mixed model analysis can be used for smoothing. We illustrate this for several smoothing models, including additive and varying coefficient models, in both S-PLUS and SAS. Code for each of the illustrations is available on the Internet.


Introduction
Smoothing methodology offers a means by which non-linear relationships can be handled without the restrictions of parametric models. It has become a widely used tool for data analysis and inference, and its integration into complex models and applications is becoming increasingly pervasive.
When fitting models that involve smoothing the analyst has to choose between programming the method herself or using customized software. The latter can be somewhat restrictive. For example, generalized additive models can be handled in either PROC GAM in SAS or gam() in S-PLUS, but varying coefficient models cannot. On the other hand, self-implementation of smoothing models can be time consuming. In this article we demonstrate how mixed model representations of penalized splines can largely alleviate this problem. Most smoothing models in common use (nonparametric regression, kriging, additive models, varying coefficient models, additive mixed models) can be formulated as mixed models. See, for example, Wahba (1978), Speed (1991), Verbyla (1994), O'Connell and Wolfinger (1997), and Brumback, Ruppert and Wand (1999). This allows them to be fitted using software such as PROC MIXED in SAS (Littell et al., 1996) and lme() in S-PLUS (Pinheiro and Bates, 2000). Mixed model software also provides automatic smoothing parameter choice via (restricted) maximum likelihood estimation of variance components. Finally, we note that mixed model representations of smoothers allow for straightforward combination of smoothing with other modelling tools, such as random effects for longitudinal data. Ruppert, Wand and Carroll (2003) provides more background and materials for the class of semiparametric regression models. Wand (2003) is a companion article to this paper and provides more details on the connections between smoothing and mixed models.
We provide S-PLUS and SAS code that illustrates the use of mixed model software to do smoothing for several models. Sections 2-8 treat increasingly sophisticated models, starting with the simple scatterplot smoothing, or nonparametric regression, model and finishing with varying coefficient models. Section 9 treats user specified amounts of smoothing, while Section 10 deals with standard error computation. Extensions to other basis functions and bivariate smoothing are treated in Sections 11 and 12. We close with discussion of generalized models in Section 13, plotting issues in Section 14 and some closing remarks in Section 15.
All of the code given in this article is available in text files on the Internet.

Scatterplot Smoothing
The formulation of penalized spline scatterplot smoothers as mixed model fits is fundamental to the thrust of this paper, so we spend a few paragraphs explaining the connection. The data in each panel of Figure 1 are identical and are smoothed using the truncated line basis, which we use for simplicity of exposition. Other smoother bases can be used instead and are discussed in Section 11; however, the truncated line basis performs adequately in many circumstances. The bar at the base of each panel shows the locations of the knots kappa_1, ..., kappa_K. The underlying model is

   y_i = beta_0 + beta_1 x_i + sum_{k=1}^K u_k (x_i - kappa_k)_+ + eps_i,   (1)

where (x)_+ = max(0, x). Panel (a) is the ordinary least squares fit of this model to the scatterplot; it is quite rough due to the large number of truncated line functions being fit. Panel (b) remedies this through one simple modification: the coefficients are constrained so that

   sum_{k=1}^K u_k^2 <= C for some constant C,   (2)

which shrinks the u_k and leads to the smooth fit shown in Figure 1(b). Treating the u_k as random effects yields the equivalent mixed model formulation

   y = X beta + Z u + eps,   u ~ N(0, sigma_u^2 I),   eps ~ N(0, sigma_eps^2 I).   (3)

Scatterplot smoothers of this type, where the number of basis functions is less than the sample size, go back at least to Parker and Rice (1985), O'Sullivan (1986, 1988), Kelly and Rice (1990) and Gray (1992). More recent references are Eilers and Marx (1996), Hastie (1996) and Ruppert and Carroll (2000), where the names P-splines, penalised splines, pseudosplines and low-rank smoothers have been coined; these are virtually synonymous.
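The shrunken fit can be sketched numerically. The following Python/numpy translation is purely illustrative (the function and variable names are ours, not from the paper's code); it fits the ridge form of the truncated line spline for a given smoothing parameter lam:

```python
import numpy as np

def trunc_line_basis(x, knots):
    """Truncated line basis: Z[i, k] = (x_i - kappa_k)_+."""
    Z = np.subtract.outer(x, knots)
    return Z * (Z > 0)

def penalized_spline_fit(x, y, knots, lam):
    """Penalized linear spline fit:
    minimize ||y - X beta - Z u||^2 + lam * ||u||^2."""
    X = np.column_stack([np.ones_like(x), x])
    Z = trunc_line_basis(x, knots)
    C = np.hstack([X, Z])
    # Penalty matrix: zeros for fixed effects, ones for spline coefficients
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return C @ coef
```

Increasing lam shrinks the truncated line coefficients and smooths the fit, mirroring the move from panel (a) to panel (b).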
The next two subsections explain how (3) can be fit in the S-PLUS and SAS computing environments.

S-PLUS commands
For illustration of scatterplot smoothing we will use the fossil data described by Chaudhuri and Marron (1999). However, we will multiply the response variable (strontium ratio) by 100,000 to make the y-axis more readable.
Read in the fossil data and assign the scatterplot vectors x and y corresponding to the fossil data frame:

x <- fossil$age
y <- 100000*fossil$strontium.ratio

The Z matrix requires a set of knots. For now we take them to be

knots <- seq(94,121,length=25)

Section 3 describes a good default choice of knots for general x. However, it is important to realize that this default is not always appropriate and that a good set of knots may sometimes need to be selected manually.
Compute the mixed model fit using lme().

Default Knot Specification
A reasonable default rule for the knot locations is to place the kth knot at the ((k+1)/(K+2))th sample quantile of the unique x_i, for k = 1, ..., K, with the default number of knots K taken to be the minimum of 35 and one quarter of the number of unique x_i. See Ruppert (2002) for further discussion of default knot specification.
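This quantile rule is easy to code. The sketch below is an illustrative Python version of the rule as stated above (the exact S-PLUS implementation may differ in small details):

```python
import numpy as np

def default_knots(x, num_knots=None):
    """Default knots: sample quantiles of the unique x values,
    placed at probabilities (k+1)/(K+2), k = 1, ..., K."""
    xu = np.unique(x)
    if num_knots is None:
        num_knots = min(len(xu) // 4, 35)
    probs = (np.arange(1, num_knots + 1) + 1) / (num_knots + 2)
    return np.quantile(xu, probs)
```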

SAS code
The following SAS code obtains the default set of knots for a given vector of x values. This algorithm does not produce knots identical to those generated by the S-PLUS algorithm; however, as long as the knots capture the variable's distribution, the smoothing results are quite similar. The algorithm selects a knot at every fifth value and limits the number of knots generated. The option of specifying the number of knots to be selected is also allowed.

Simple Semiparametric Regression

The onions data are taken from Ratkowsky (1983). A detailed semiparametric analysis of the data is given by Young and Bowman (1995). We use the phrase "simple semiparametric" because the model has a parametric component (a location term) and a nonparametric component (a smooth function of planting density). Such models are also commonly referred to as "partially linear" (e.g. Härdle, Liang and Gao, 2000). The special case where the parametric component is binary is sometimes called a "binary offset model". The fitting of this model is a trivial extension of the content of Section 2: add a column to the X matrix corresponding to the offset indicator (the location indicator in the onions example).

SAS code
The following SAS code fits the above simple semiparametric regression model.
sig.eps.hat <- fit$sigma
sig.u.hat <- sig.eps.hat*exp(unlist(fit$modelStruct))

Print a summary of the fixed effects. The last row is the only one that has a direct interpretation and corresponds to the effect of air pollution (non-significant in this case).

print(summary(fit)$tTable)

Additive Mixed Models
The sitka data are listed in Table 1.2 and displayed in Figure 1.3 of Diggle, Liang and Zeger (1995). They correspond to measurements of log-size for 79 Sitka spruce trees grown in normal or ozone-enriched environments.
A useful model for these data is the additive mixed model

   y_ij = U_i + f(days_ij) + gamma ozone_i + eps_ij,

where f is some smooth function, U_i is a random intercept for the ith tree, days_ij is the time of the jth measurement on the ith tree, and ozone_i is an indicator variable corresponding to whether or not the ith tree is grown in an ozone-enriched environment.
Appropriate design matrices are X = [1 days ozone], containing the fixed intercept, linear days term and ozone indicator, and Z, containing the truncated line spline basis in days together with indicator columns for the tree-specific random intercepts U_i.
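The construction of these design matrices can be sketched as follows. This is an illustrative Python version under the design just described (the function name and argument layout are ours):

```python
import numpy as np

def amm_design(days, ozone, tree_id, knots):
    """Design matrices for the additive mixed model sketch:
    fixed part X = [1, days, ozone]; random part Z holds the
    truncated line spline basis in days plus per-tree indicator
    columns for the random intercepts U_i."""
    X = np.column_stack([np.ones_like(days), days, ozone])
    Zspl = np.subtract.outer(days, knots)
    Zspl = Zspl * (Zspl > 0)
    trees = np.unique(tree_id)
    Ztree = (tree_id[:, None] == trees[None, :]).astype(float)
    return X, np.hstack([Zspl, Ztree])
```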

S-PLUS commands
Read in the sitka spruce data and extract the relevant variables from the sitka data frame. Construct the y response vector and X matrix:

y <- log.size
X <- cbind(rep(1,length(y)),days,ozone)

Create the spline component of the Z matrix. The knots for the days variable can be any known vector of knots. In the SAS code below we use the knot vector generated by the S-PLUS code, so the estimates from the S-PLUS and SAS code are identical.
Consider the model corresponding to daily measurements for the years 1984- . Note that the fixed effects component has year=1984 as a reference group; the random effects component, however, does not use a reference group, and all years are on an equal footing.

Varying Coefficient Models
Let x be a predictor variable that, for given values of a modifying predictor z, has a linear relationship with the mean of the response variable y. If (x_i, z_i, y_i), 1 <= i <= n, are the measurements, then a varying coefficient model for these data is

   y_i = beta_1(z_i) + beta_2(z_i) x_i + eps_i.

The model allows the intercept and slope coefficients to be arbitrary smooth functions of z. The penalized linear spline version of this model represents each of beta_1 and beta_2 as a linear spline in z, with the truncated line coefficients penalized.

S-PLUS commands
Varying coefficient models will be demonstrated on the ethanol data set in S-PLUS. Type help(ethanol) to find out more about these data. Extract the data as follows.

z <- ethanol$E
x <- ethanol$C
y <- ethanol$NOx
Set up the design matrices.
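Under the penalized linear spline formulation above, the design matrices have a fixed part [1, z, x, x*z] and a random part holding the truncated line terms (z - kappa_k)_+ and their interactions with x. The following Python sketch is our illustrative construction, not the paper's code:

```python
import numpy as np

def vcm_design(x, z, knots):
    """Design matrices for the varying coefficient model
    y = beta1(z) + beta2(z)*x + eps with linear spline
    coefficient functions in z."""
    B = np.subtract.outer(z, knots)
    B = B * (B > 0)                 # (z - kappa_k)_+
    X = np.column_stack([np.ones_like(z), z, x, x * z])
    Z = np.hstack([B, x[:, None] * B])
    return X, Z
```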

User Specified Smoothing Parameters
In the mixed model representation of smoothers described in Sections 2-8 the amount of smoothing is controlled by the variance components appearing in Cov(u) and Cov(eps). Mixed model software usually defaults to the REML or ML estimates of these variance components, so the amount of smoothing is chosen automatically. However, there are situations where the analyst would like to specify the amount of smoothing. A simple example is a sensitivity analysis for a simple semiparametric model (Section 4), where the sensitivity of the estimated offset coefficient to different amounts of smoothing in the estimate of f requires investigation (e.g. Bowman and Azzalini, 1997). Another is the feature significance methodology described by Chaudhuri and Marron (1999), for example.
In SAS the problem of user specified smoothing parameters is relatively easy to overcome using the PARMS option; see Section 9.3. However, versions of S-PLUS's lme() known to us at the time of writing do not support user specified variance components, so direct computation is required. We will show how this can be done in the scatterplot smoothing situation. Extensions to other models follow relatively straightforwardly.
Recall the setting and notation described in Section 2. For given values of sigma_u^2 and sigma_eps^2, application of maximum likelihood for beta and best prediction (BP) for u is equivalent to solving the penalized least squares problem

   min over (beta, u) of ||y - X beta - Z u||^2 + lambda ||u||^2,   where lambda = sigma_eps^2 / sigma_u^2

(e.g. Robinson, 1991). This is an example of penalized least squares (e.g. Green, 1987).
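The role of lambda in this criterion can be checked numerically. With C = [X Z] and D a diagonal matrix that is zero over the fixed effects and one over the random effects, the solution is a ridge regression, and increasing lambda trades penalty for residual sum of squares. The following Python sketch (made-up data, our own names) illustrates this:

```python
import numpy as np

def pls_solve(C, D, y, lam):
    """Solve min_theta ||y - C theta||^2 + lam * theta' D theta."""
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)
```

As lambda increases, the penalty term theta' D theta can only decrease while the residual sum of squares can only increase, a standard property of penalized least squares.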

Algorithm 1
Inputs: the response vector y, the design matrices X and Z, the penalty matrix D and the smoothing parameter lambda.
Step 1: Form C = [X Z] and obtain a decomposition of C and D that does not depend on lambda (for example, via the singular value decomposition), so that the ridge regression solution can be computed cheaply for any value of lambda.
Step 2: For the given lambda, compute the fitted values yhat = C (C'C + lambda D)^{-1} C'y.
An alternative approach to handling the ridge regressions that arise in penalized spline models is through the QR decomposition (e.g. Golub and Van Loan, 1983; Hastie, 1996). Algorithm A.2 provides another fitting procedure for (12).

S-PLUS commands
We now give S-PLUS commands for Algorithm 1. Read in the fossil data and assign the scatterplot vectors x and y:

fossil <- read.table("fossil.dat",header=T)
x <- fossil$age
y <- 100000*fossil$strontium.ratio

Set the value of the smoothing parameter (variance ratio) alpha, then set up the design matrices, for linear splines in this case.
C.mat <- cbind(X,Z)
D.mat <- diag(c(rep(0,ncol(X)),rep(1,ncol(Z))))

Carry out Steps 1 and 2 of Algorithm 1. Note that if a scatterplot smooth corresponding to a different value of alpha is required then only the last command needs to be re-issued.
A meaningful measure of the amount of smoothing being done is the degrees of freedom of the fit, the trace of the smoother ("hat") matrix C(C'C + lambda D)^{-1}C' (e.g. Hastie and Tibshirani, 1990).

SAS code

User specified smoothing parameter selection may be handled in SAS through the PARMS option, as illustrated in the following SAS code. Notice the use of the PARMS option in the body of the mixed model specification. The example is taken from Section 2.2. Here the variance components are specified whose ratio equals the smoothing parameter value given in Section 9.2. Note that if the degrees of freedom are specified instead, then SAS/IML can be used to implement Algorithm 1 to obtain the corresponding estimate of the smoothing parameter; the last equation in Algorithm 1 can be solved using the nonlinear procedure NLIN.

proc mixed noprofile;
   *noprofile stops the algorithm from profiling;
   *out the variance of the error term;
   model ratio = age / solution outp=paper.yhat;
   random Z1-Z&nk / type=toep(1) s;
   *specifying residual and smoothing term variance components;
   *noiter prevents Newton-Raphson iterative;
   *algorithm from changing variance components;
   *parms (400) (1) / noiter;
   *parms (3.2) (2) / noiter;
   parms (15) (100) / noiter;
run;
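The degrees of freedom can be computed directly from the ridge form. The Python sketch below (our own helper, for illustration) evaluates df(lambda) = trace{C (C'C + lambda D)^{-1} C'}; it equals the number of columns of C when lambda is near zero and shrinks toward the number of unpenalized columns as lambda grows:

```python
import numpy as np

def df_fit(C, D, lam):
    """Degrees of freedom of the smoother:
    trace of the hat matrix C (C'C + lam*D)^{-1} C'."""
    return np.trace(C @ np.linalg.solve(C.T @ C + lam * D, C.T))
```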

Variability Bars
A common embellishment to a scatterplot smooth such as the one shown in Figure 2 is to add variability bars, as shown in Figure 4. The dashed lines in Figure 4 correspond to plus and minus twice the estimated pointwise standard error of the fitted curve.

Other Basis Functions

The truncated polynomial bases used so far can lead to ill-conditioned computations, but with penalization the least squares problem gets replaced by a ridge regression problem, which is usually more numerically stable (e.g. Draper and Smith, 1998).
An alternative to truncated polynomials, with certain attractions, is radial basis functions.
Penalised spline smoothers with radial bases, or radial smoothers, and their relationship to smoothing/thin plate splines and kriging are summarised in French, Kammann and Wand (2001). A useful class of low-rank radial smoothers takes the basis functions to be C(|x - kappa_k|) for a suitable function C, with the penalty chosen so that the fit can be obtained through standard mixed model software. Here C is a so-called generalized covariance function and could be replaced by any of the proper covariance functions used in kriging (e.g. Cressie, 1993; O'Connell and Wolfinger, 1997; Stein, 1999).
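A one-dimensional sketch of such a radial design matrix is given below in Python. This is our illustration of the common low-rank device in which the raw radial basis |x - kappa_k|^3 is linearly transformed by a matrix square root of the knot-to-knot generalized covariance matrix, so that the penalty reduces to a simple ridge penalty; the function name and the SVD-based square root are our assumptions:

```python
import numpy as np

def radial_Z(x, knots, p=3):
    """Low-rank radial smoother design (1-D sketch): raw basis
    |x - kappa_k|^p, transformed so the penalty becomes ||u||^2."""
    ZK = np.abs(np.subtract.outer(x, knots)) ** p
    Omega = np.abs(np.subtract.outer(knots, knots)) ** p
    # SVD-based "square root" of the symmetric generalized
    # covariance matrix; guard against tiny singular values
    U, s, Vt = np.linalg.svd(Omega)
    Omega_inv_half = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ Vt
    return ZK @ Omega_inv_half
```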

SAS code
The following SAS code shows how SAS/IML can be used to extend the fits to other bases.
The choice of the bivariate knots is somewhat more challenging. We have had good experience with knots chosen via an efficient space filling algorithm (e.g. Johnson, Moore and Ylvisaker, 1990; Nychka and Saltzman, 1998). The S-PLUS module FUNFITS (Nychka, Haaland, O'Connell and Ellner, 1998) supports space filling algorithms. Figure 5 shows the result of applying such an algorithm to the (jittered) locations in the example used by Kammann and Wand (2003).

S-PLUS commands
We will now illustrate mixed model-based bivariate smoothing using thin plate splines with lme() in S-PLUS. First, define the function tps.cov() corresponding to the thin plate spline generalised covariance function C(r) = ||r||^2 log ||r||. The function is a bit more complicated so that zero arguments, and both matrix and vector arguments, are handled. Then read in the knots from a file; these were created using a space-filling algorithm.
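The essential computation in tps.cov() can be sketched as follows. This Python version (our own, for illustration) evaluates the bivariate thin plate spline generalized covariance C(r) = r^2 log(r) on an array of distances, with the zero-distance case defined as 0 to avoid log(0):

```python
import numpy as np

def tps_cov(r):
    """Thin plate spline generalized covariance for the bivariate
    case: C(r) = r^2 * log(r), with C(0) defined as 0."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out
```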

Generalized models
The extension to generalized responses, such as binary and count variables, entails generalized mixed models. The most common is the generalized linear mixed model (GLMM), corresponding to a one-parameter exponential family with Gaussian random effects. A very common extension is to allow for quasi-likelihood functions (e.g. Breslow and Clayton, 1993). McCulloch and Searle (2000) provides an excellent overview of GLMMs.
Fitting generalized linear mixed models is much more computationally challenging than the linear case (e.g. McCulloch and Searle, 2000, Chapter 10). The only software known to us for fitting GLMMs with the provision for the general design matrices needed for smoothing is the SAS macro glimmix, although this relies on Laplace approximation of integrals (e.g. Wolfinger and O'Connell, 1993). We have recently learned from John Staudenmayer (University of Massachusetts) that the R version of lme() can be used to emulate glimmix because it allows for weights. Section 13.2 illustrates this for smoothing the scatterplot shown in Figure 6.
For a user specified degrees of freedom, the smoothing reduces to iteratively reweighted ridge regression.
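Such an iteratively reweighted ridge regression can be sketched for a logistic penalized spline as follows. This Python sketch (our own, with made-up names) performs Fisher scoring on the penalized binomial likelihood, solving a weighted ridge problem at each iteration:

```python
import numpy as np

def irls_ridge_logit(C, D, y, lam, iters=25):
    """Iteratively reweighted ridge regression for a logistic
    penalized spline: at each step, solve a weighted ridge
    problem with working response z."""
    theta = np.zeros(C.shape[1])
    for _ in range(iters):
        eta = C @ theta
        mu = 1.0 / (1.0 + np.exp(-eta))
        W = mu * (1.0 - mu)
        z = eta + (y - mu) / np.maximum(W, 1e-10)  # working response
        CtW = C.T * W
        theta = np.linalg.solve(CtW @ C + lam * D, CtW @ z)
    return 1.0 / (1.0 + np.exp(-(C @ theta)))
```

Here lam plays the same role as in the Gaussian case; a larger value gives a smoother fitted probability curve.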