Bayesian Analysis for Penalized Spline Regression Using WinBUGS

Abstract
Penalized splines can be viewed as best linear unbiased predictors (BLUPs) in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. Good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced by employing WinBUGS-specific computational tricks.


Introduction
The virtues of nonparametric regression models have been discussed extensively in the statistics literature. Competing approaches to nonparametric modeling include, but are not limited to, smoothing splines (Eubank 1988; Green and Silverman 1994; Wahba 1990), series-based smoothers (Ogden 1996; Tarter and Lock 1993), kernel methods (Fan and Gijbels 1996; Wand and Jones 1995), regression splines (Friedman 1991; Hansen and Kooperberg 2002; Hastie and Tibshirani 1990), and penalized splines (Eilers and Marx 1996; Ruppert, Wand, and Carroll 2003). The main advantage of nonparametric over parametric models is their flexibility. In the nonparametric framework the shape of the functional relationship between covariates and the dependent variables is determined by the data, whereas in the parametric framework the shape is determined by the model.
In this paper we focus on semiparametric regression models using penalized splines (Ruppert et al. 2003), but the methodology can be extended to other penalized likelihood models. It is becoming more widely appreciated that penalized likelihood models can be viewed as particular cases of Generalized Linear Mixed Models (GLMMs; see Brumback, Ruppert, and Wand 1999; Eilers and Marx 1996; Ruppert et al. 2003). We discuss this in more detail in Section 2. Given this equivalence, statistical software developed for mixed models, such as S-PLUS (Insightful Corp. 2003, function lme) or SAS (SAS Institute Inc. 2004, PROC MIXED and the GLIMMIX macro), can be used for smoothing (Ngo and Wand 2004; Wand 2003). There are at least two potential problems when using such software for inference in mixed models. First, in the case of GLMMs the likelihood of the model is a high dimensional integral over the unobserved random effects that, in general, cannot be computed exactly and has to be approximated. This can have a sizeable effect on parameter estimation, especially on the variance components. Second, confidence intervals are obtained by plugging the estimated parameters in place of the true parameters, thereby ignoring the additional estimation variability. This results in confidence intervals that are too narrow, which could be avoided by using the bootstrap. However, standard software does not have bootstrap capabilities and favors the "plug-in" method.
Bayesian analysis treats all parameters as random, assigns prior distributions to characterize knowledge about parameter values prior to data collection, and uses the joint posterior distribution of parameters given the data as the basis of inference. Often the posterior density is analytically unavailable but can be simulated using Markov Chain Monte Carlo (MCMC). Moreover, the posterior distribution of any explicit function of the model parameters can be obtained as a by-product of the simulation algorithm.
Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. It is this combination that makes Bayesian nonparametric modeling so attractive (Berry, Carroll, and Ruppert 2002; Ruppert et al. 2003).
The goal of this paper is not to discuss Bayesian methodology or nonparametric regression, or to provide novel modeling techniques. Instead, we provide a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS (Spiegelhalter, Thomas, and Best 2003), which has become the standard software for Bayesian analysis. Special attention is given to the choice of spline basis and to the MCMC mixing properties. The R (R Development Core Team 2005) package R2WinBUGS (Sturtz, Ligges, and Gelman 2005) is used to call WinBUGS 1.4 and export results into R. This is especially helpful when studying the frequentist properties of Bayesian inference using simulations.

Low-rank thin-plate splines
The general methodology of semiparametric modeling using the equivalence between penalized splines and mixed models is presented in Ruppert et al. (2003). Consider the regression model y_i = m(x_i) + ε_i, where m(·) is a smooth function. The smooth function could be modeled using natural cubic splines, B-splines, truncated polynomials, radial splines, etc. In Bayesian analysis, the particular choice of basis has important consequences for the mixing properties of the MCMC chains. We will focus on low-rank thin-plate splines, which tend to have very good numerical properties. In particular, the posterior correlation of the thin-plate spline parameters is much smaller than for other bases (e.g., truncated polynomials), which greatly improves mixing.
The low-rank thin-plate spline representation of m(·) is

m(x) = β_0 + β_1 x + Σ_{k=1}^K u_k |x − κ_k|^3,   (1)

where θ = (β_0, β_1, u_1, . . . , u_K)' is the vector of regression coefficients and κ_1 < κ_2 < . . . < κ_K are fixed knots. Following Ruppert (2002) we consider a number of knots that is large enough (typically 5 to 20) to ensure the desired flexibility, with κ_k the sample quantile of the x's corresponding to probability k/(K + 1); the results hold for any other choice of knots. To avoid overfitting, we minimize

Σ_{i=1}^n {y_i − m(x_i)}^2 + (1/λ) θ' D θ,   (2)

where λ is the smoothing parameter and D is a known positive semi-definite penalty matrix. The thin-plate spline penalty matrix is

D = [ 0_{2×2}  0_{2×K} ; 0_{K×2}  Ω_K ],

where the (l, k)th entry of Ω_K is |κ_l − κ_k|^3; this choice penalizes only the coefficients of the |x − κ_k|^3 terms.
Let Y = (y_1, y_2, . . . , y_n)', let X be the matrix with ith row X_i = (1, x_i), and let Z_K be the matrix with ith row Z_{K,i} = (|x_i − κ_1|^3, . . . , |x_i − κ_K|^3), so that the minimization criterion becomes

||Y − Xβ − Z_K u||^2 + (1/λ) u' Ω_K u,

where β = (β_0, β_1)' and u = (u_1, . . . , u_K)'. Define σ_u^2 = λσ^2, consider the vector β as fixed parameters and the vector u as a set of random parameters with E(u) = 0 and cov(u) = σ_u^2 Ω_K^{−1}. If (u', ε')' is a normal random vector and u and ε are independent, then one obtains an equivalent model representation of the penalized spline in the form of a LMM (Brumback et al. 1999). Specifically, the P-spline is equal to the best linear unbiased predictor (BLUP) in the LMM

Y = Xβ + Z_K u + ε,  cov(u) = σ_u^2 Ω_K^{−1},  cov(ε) = σ^2 I_n.

Using the reparametrization b = Ω_K^{1/2} u and the design matrix Z = Z_K Ω_K^{−1/2}, this can be rewritten as

Y = Xβ + Zb + ε,  b ∼ N(0, σ_b^2 I_K),  ε ∼ N(0, σ^2 I_n),   (3)

with σ_b^2 = σ_u^2. The mixed model (3) could be fit in a frequentist framework using BLUP or Penalized Quasi-Likelihood (PQL) estimation. In this paper we adopt a Bayesian inferential perspective, placing priors on the model parameters and simulating their joint posterior distribution.

The Canadian age-income data
Figure 1 is a scatterplot of age versus log(income) for a sample of n = 205 Canadian workers, all of whom were educated to grade 13. These data were used in Ullah (1985), and their source is a 1971 Canadian Census Public Use Tape.

Model and priors
The mean of log(income) as a function of age was modeled using thin-plate splines with K = 20 knots chosen so that the kth knot is the sample quantile of age corresponding to probability k/(K + 1). We used model (3), where y_i and x_i denote the log income and age of the ith worker. The following priors were used:

β_0, β_1 ∼ N(0, 10^6),  τ = σ^{−2} ∼ Gamma(10^{−6}, 10^{−6}),  τ_b = σ_b^{−2} ∼ Gamma(10^{−6}, 10^{−6}),   (4)

where the second parameter of the normal distribution is the variance. In many applications a normal prior distribution centered at zero with a standard deviation equal to 1000 is sufficiently noninformative. If there are reasons to suspect, either from alternative estimation methods or from prior knowledge, that the true parameter lies in another region of the parameter space, then the prior should be adjusted accordingly. The parametrization of the Gamma(a, b) distribution is chosen so that its mean is a/b = 1 and its variance is a/b^2 = 10^6, which corresponds to a = b = 10^{−6}. In Section 8 we discuss several issues related to prior choice for nonparametric smoothing.

WinBUGS program for age-income data
We now describe the WinBUGS program, which follows closely the description of the Bayesian nonparametric model in Equation (3) with the priors defined in (4). We provide the entire program in Appendix A. While this program was designed for the age-income data, it can be used for other penalized spline regression models with minor adjustments. Many features of the program will be repeated in the other examples and changes will be described as needed.

Figure 1: Scatterplot of log(income) versus age for a sample of n = 205 Canadian workers, with posterior median (solid) and 95% credible intervals for the mean regression function.
The likelihood part of the model (3) is specified first. The number of subjects, n, is a constant in the program. The first statement specifies that the ith response (log income of the ith worker) has a normal distribution with mean m_i and precision τ = σ^{−2}. The second statement provides the structure of the conditional mean function, m_i = m(x_i). Here beta[] denotes the 2 × 1 vector β = (β_0, β_1)', the vector of fixed effects parameters. The ith row of the matrix X is (1, x_i), and b[] denotes the K × 1 vector b = (b_1, . . . , b_K)' of random coefficients. Both the matrix X and the matrix Z = Z_K Ω_K^{−1/2} are design matrices obtained outside WinBUGS and entered as data. In Section 9 we discuss an auxiliary R program that calculates these matrices and uses the R2WinBUGS package to call WinBUGS from R. Such programs are especially useful in a simulation study. The formulae for mre110[] and mre1120[] could be shortened using the inner product function inprod. However, depending on the application, computation time can be 5 to 10 times longer when inprod is used.
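The likelihood part described above can be sketched as follows. This is a reconstruction, not the authoritative code: the names response[], beta[], b[], mre110[] and mre1120[] are taken from the text, while mfe[] (the fixed-effects part) and taueps (the precision τ) are assumed names; Appendix A contains the complete program.

```bugs
for (i in 1:n){
    # i-th response is normal with mean m[i] and precision taueps = 1/sigma^2
    response[i] ~ dnorm(m[i], taueps)
    # conditional mean: fixed-effects part plus spline part
    m[i] <- mfe[i] + mre110[i] + mre1120[i]
    mfe[i] <- beta[1]*X[i,1] + beta[2]*X[i,2]
    # spline contributions of knots 1-10 and 11-20, written out
    # explicitly because inprod() can be 5 to 10 times slower
    mre110[i] <- b[1]*Z[i,1] + b[2]*Z[i,2] + b[3]*Z[i,3] + b[4]*Z[i,4] + b[5]*Z[i,5] + b[6]*Z[i,6] + b[7]*Z[i,7] + b[8]*Z[i,8] + b[9]*Z[i,9] + b[10]*Z[i,10]
    mre1120[i] <- b[11]*Z[i,11] + b[12]*Z[i,12] + b[13]*Z[i,13] + b[14]*Z[i,14] + b[15]*Z[i,15] + b[16]*Z[i,16] + b[17]*Z[i,17] + b[18]*Z[i,18] + b[19]*Z[i,19] + b[20]*Z[i,20]
}
```

Splitting the spline part into mre110[] and mre1120[] keeps each logical expression short, which WinBUGS handles faster than a single long sum or an inprod call.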
The distribution of the random coefficients b is represented in WinBUGS as

for (k in 1:num.knots){b[k]~dnorm(0,taub)}

This specifies that the b_k are independent and normally distributed with mean 0 and precision τ_b. Here num.knots is the number of knots (K = 20) and is introduced in WinBUGS as a constant. The prior distributions of model parameters described in Equation (4) are specified in WinBUGS in terms of precisions: the prior normal distributions for the β parameters are expressed using the precision parameter, and Gamma distributions are specified for the precisions τ = σ^{−2} and τ_b = σ_b^{−2}. Note that the code is very short and intuitive, presenting the model specification in natural steps. After writing the program one needs to load the data: the n-dimensional response vector and the design matrices X and Z.
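A sketch of the corresponding prior specification, written in the precision parametrization described above (the names taueps, sigma and sigmab are assumptions consistent with the previous fragment, not necessarily the paper's exact identifiers):

```bugs
for (l in 1:2){ beta[l] ~ dnorm(0, 1.0E-6) }  # N(0, 10^6): precision 10^-6
taueps ~ dgamma(1.0E-6, 1.0E-6)               # tau   = sigma^{-2}
taub ~ dgamma(1.0E-6, 1.0E-6)                 # tau_b = sigma_b^{-2}
sigma <- 1/sqrt(taueps)                       # monitor on the sd scale
sigmab <- 1/sqrt(taub)
```

The deterministic nodes sigma and sigmab are optional but convenient, since posterior summaries are usually easier to interpret on the standard deviation scale than on the precision scale.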

Model inference
Convergence to the posterior distribution was assessed by running several chains from different initial values of the model parameters and visually inspecting the chain histories.
Convergence was attained in less than 1,000 simulations, but we discarded the first 10,000 burn-in simulations. For inference we used 90,000 simulations, which took approximately 6 minutes on a PC (3.6GB RAM, 3.4GHz CPU). Table 1 shows the posterior median and a 95% credible interval for some of the model parameters. We also obtained the posterior distributions of the mean function of the response, m_i = m(x_i). Figure 1 displays the median, 2.5% and 97.5% quantiles of these posterior distributions for each value of the covariate x_i. The greyed area corresponds to pointwise credible intervals for each m(x_i) and is not a joint credible band for the mean function. An important advantage of Bayesian over the typical frequentist analysis is that in the Bayesian case the credible intervals take into account the variability of each parameter and do not use the "plug-in" method. Prediction intervals at an in-sample x value can be obtained very easily by monitoring random variables of the type y_i^* = m(x_i) + ε_i^*, with ε_i^* being independent realizations of the N(0, σ^2) distribution. This can be implemented by adding a few extra lines to the WinBUGS code.
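Such extra lines might look as follows (a sketch; the names epsilonstar[] and ypred[] are our choices and not necessarily those of the original program):

```bugs
for (i in 1:n){
    epsilonstar[i] ~ dnorm(0, taueps)   # epsilon*_i ~ N(0, sigma^2)
    ypred[i] <- m[i] + epsilonstar[i]   # y*_i = m(x_i) + epsilon*_i
}
```

Monitoring ypred[] then yields pointwise prediction intervals directly from the simulated quantiles.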

The wage-union membership data
Figure 2 displays data on wages and union membership for 534 workers described in Berndt (1991). The data were taken from the Statlib website at Carnegie Mellon University, http://lib.stat.cmu.edu/. This data set was analyzed in Ruppert et al. (2003); standard linear, quadratic and cubic logistic regressions are not appropriate in this case. Instead, they model the logit of the union membership probability as a penalized spline, which allows identification of features that are not captured by standard regression techniques. In this section we show how to implement a semiparametric Bernoulli regression in WinBUGS using low-rank thin-plate splines.

Figure 2: Raw data are plotted as pluses, with values of 1 for union membership replaced by 0.5 for graphical purposes. A worker making USD 44.50 per hour was used in the fitting but is not shown, to increase detail.

Generalized P-spline model
Denote by y the binary union membership variable, by x the continuous wage variable (in USD per hour) and by p(x) the union membership probability for a worker with wage x. The logit of p(x) is modeled nonparametrically using a linear (p = 1) penalized spline with K = 20 knots. We used the model

y_i ∼ Bernoulli{p(x_i)},  logit{p(x_i)} = β_0 + β_1 x_i + Σ_{k=1}^K b_k z_ik,   (5)

where z_ik is the (i, k)th entry of the design matrix Z = Z_K Ω_K^{−1/2} defined in Section 2. The prior distributions (6) were analogous to (4): independent N(0, 10^6) priors for β_0 and β_1 and a Gamma(10^{−6}, 10^{−6}) prior for the precision τ_b = σ_b^{−2}.

WinBUGS program for wage-union data
While model (5) is very similar to model (3), the Bayesian analysis implementation in MATLAB, C or other software would be significantly different. Typically, when the model is changed one needs to rewrite the entire code and make sure that all code bugs have been removed. This is a lengthy process that requires a high level of expertise in statistics and MCMC coding.
WinBUGS cuts short this difficult process, thus making Bayesian analysis appealing to a larger audience.
In this case, changing the model from (3) to (5) requires only small changes in the WinBUGS code. Specifically, the two lines specifying the conditional distribution of the response variable are replaced by a Bernoulli specification with a logit link, while the rest of the code remains practically unchanged. Given this very simple change, we do not provide the rest of the code here, but a commented version is included in the accompanying software files. Table 2 shows the posterior median and a 95% credible interval for some of the model parameters. We also obtained the posterior distributions of p_i = p(x_i), and Figure 2 displays the median, 2.5% and 97.5% quantiles of these distributions. The greyed area corresponds to pointwise credible intervals for each p(x_i) and is not a joint credible band. The credible intervals take into account the variability of each parameter. Convergence was attained in less than 1,000 simulations, but we discarded the first 10,000 burn-in simulations. For inference we used 90,000 simulations, which took approximately 80 minutes on a PC (3.6GB RAM, 3.4GHz CPU).
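The two replacement lines mentioned above might look as follows (a sketch reusing the variable names of the age-income program; dbern and the logit link are standard WinBUGS constructs):

```bugs
response[i] ~ dbern(p[i])                       # Bernoulli response
logit(p[i]) <- mfe[i] + mre110[i] + mre1120[i]  # spline on the logit scale
```

These lines go inside the for (i in 1:n) loop; the Gaussian precision taueps and its prior are no longer needed, since the Bernoulli distribution has no separate variance parameter.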

The Sitka spruce data
The mixed model representation of penalized splines allows simple extensions to additive mixed models. As an example we will use data on the growth of Sitka spruces displayed in Figure 1.3 in Diggle, Heagerty, Liang, and Zeger (2002). The data consist of growth measurements of 79 trees over two seasons: 54 trees were grown in an ozone-enriched atmosphere while the remaining 25 comprise the control group.

Additive mixed models
A useful mixed model for the Sitka data is

y_ij = U_i + α w_i + f(x_ij) + ε_ij,   (7)

where y_ij, 1 ≤ i ≤ 79, 1 ≤ j ≤ 13, is the log size of spruce i at the time of measurement j, taken on day x_ij. Here the U_i are independent random intercepts for each tree, w_i is the ozone exposure indicator and the ε_ij are random errors. We model f(·) using low-rank thin-plate splines,

f(x_ij) = β_0 + β_1 x_ij + Σ_{k=1}^K b_k z_{ij,k},   (8)

where the x_ij observations are stacked in one vector and (ij) corresponds to the {13(i − 1) + j}th observation. Here z_{ij,k} is the (13(i − 1) + j, k)th entry of the design matrix Z = Z_K Ω_K^{−1/2} defined in Section 2. The random parameters b_k are assumed independent normal, with σ_b^2 controlling the shrinkage of the thin-plate spline function towards a first degree polynomial.

WinBUGS program for the Sitka spruce data
The WinBUGS program, which implements model (7)-(8), has essentially the same structure as the previous programs. The indexing structure is induced by stacking the vectors of observations corresponding to trees; the kth observation corresponds to the index (i, j) such that k = 13(i − 1) + j. The first line of the program specifies that, conditional on their means, the y_ij are independent with mean µ_ij and precision τ = 1/σ^2. The second line of code specifies the structure of the mean function as the sum of a random intercept U_i, the ozone treatment effect αw_i and a nonparametric mean function f(·). The third line describes the mean function f(·) as a low-rank thin-plate spline.
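These lines can be sketched as follows (a reconstruction: id[k], the vector mapping observation k to its tree, and ozone[k], the stacked ozone indicator, are assumed names; inprod is used for brevity even though, as noted earlier, writing the sums out explicitly is faster):

```bugs
for (k in 1:n){
    y[k] ~ dnorm(mu[k], taueps)                   # conditional likelihood
    mu[k] <- U[id[k]] + alpha*ozone[k] + f[k]     # intercept + treatment + smooth
    f[k] <- beta[1]*X[k,1] + beta[2]*X[k,2] + inprod(b[], Z[k,])
}
for (i in 1:M){ U[i] ~ dnorm(0, tauU) }           # tree random intercepts, M = 79
```

The construct U[id[k]] illustrates the nested indexing discussed below: each observation picks out the random intercept of the tree it belongs to.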
Nested indexing is a powerful feature of WinBUGS and was used here to define the clusters corresponding to trees; the random intercepts are specified in a separate loop over trees, where M = 79 is a constant in the program and tauU is the precision τ_U = 1/σ_U^2 of the random intercepts. The rest of the program is identical to the program for the age and log income data and is omitted. A file containing the commented program and the corresponding R programs is attached. Table 3 shows the posterior median and a 95% credible interval for some of the model parameters. We also obtained the posterior distributions of f(x_ij), and Figure 3 displays the median, 2.5% and 97.5% quantiles of these distributions. The greyed area corresponds to pointwise credible intervals for each f(x_ij) and is not a joint credible band. Convergence was attained in less than 1,000 simulations, but we discarded the first 10,000 burn-in simulations. For inference we used 90,000 simulations, which took approximately 5.5 minutes on a PC (3.6GB RAM, 3.4GHz CPU).

The coronary sinus potassium data
We consider the coronary sinus potassium concentration data measured on 36 dogs published in Grizzle and Allan (1969). The measurements on each dog were taken at 2-minute intervals from 1 to 13 minutes (7 observations per dog). The 36 dogs come from 4 treatment groups.
Four smoothing spline analyses of these data were presented in Wang (1998). Crainiceanu and Ruppert (2004b) present a hierarchical model of curves including a nonparametric overall mean, nonparametric treatment deviations from the overall curve, and nonparametric subject deviations from the treatment curves. In this section we show how to implement such a complex model in WinBUGS.

Longitudinal nonparametric ANOVA model
Denote by y_ij and t_ij the potassium concentration and time for dog i at measurement j (in this example t_ij = 2j − 1, but we keep the presentation more general). Consider the following model for potassium concentration

y_ij = f(t_ij) + f_{g(i)}(t_ij) + f_i(t_ij) + ε_ij,   (9)

where f(·) is the overall curve, f_{g(i)}(·) are the deviations of the treatment groups from the overall curve and f_i(·) are the deviations of the subject curves from the group curves. Here g(i) represents the treatment group index corresponding to subject i. All three functions are modeled as low-rank thin-plate splines:

f(t) = β_0 + β_1 t + Σ_k b_k z_tk,
f_g(t) = I_(g>1) (β_{0,g} + β_{1,g} t + Σ_k c_{g,k} z^(g)_tk),
f_i(t) = δ_{0,i} + δ_{1,i} t + Σ_k d_{i,k} z^(i)_tk,   (10)

where I_(g>1) is the indicator that g > 1, that is, that the treatment group is g = 2, 3 or 4.
Here z tk is the (t, k)th entry of the design matrix for the thin-plate spline random coefficients, Z = Z K Ω −1/2 K corresponding to the overall mean function f (·). Similarly, we defined z (g) tk and z (i) tk as the (t, k)th entries of the design matrices for random coefficients corresponding to the group level f g (·), 1 ≤ g ≤ 4, and subject level curves f i (·), 1 ≤ i ≤ 36. The number of knots can be different for each curve and one can choose, for example, more knots to model the overall curve than each subject specific curve. However, in our example we used the same knots for each curve (K 1 = K 2 = K 3 = 3).
The model also assumes that the b, c, d and δ parameters are mutually independent, with

b_k ∼ N(0, σ_b^2), c_{g,k} ∼ N(0, σ_c^2), d_{i,k} ∼ N(0, σ_d^2), δ_{0,i} ∼ N(0, σ_0^2), δ_{1,i} ∼ N(0, σ_1^2),   (11)

where σ_b^2, σ_c^2 and σ_d^2 control the amount of shrinkage of the overall, group and individual curves respectively, and σ_0^2 and σ_1^2 are the variance components of the subject random intercepts and slopes. We could also add other covariates that enter the model parametrically or nonparametrically, consider different shrinkage parameters for each treatment group, etc. All these model modifications can be done very easily in WinBUGS.
To completely specify the Bayesian nonparametric model one needs to specify prior distributions for all model parameters. The following priors, labeled (12), were used: independent N(0, 10^6) distributions for the fixed effects parameters and Gamma(10^{−6}, 10^{−6}) distributions for each of the precision parameters.

WinBUGS program for the dog data
We provide the entire WinBUGS code for this model in Appendix B; it translates Equation (10) almost literally. The response is organized as a column vector obtained by stacking the information for each dog. Because there are 7 observations for each dog, the observation number k can be written explicitly in terms of (i, j) as k = 7(i − 1) + j. The number of observations is n = 36 × 7 = 252.
We used two n × 1 column vectors with entries dog[k] and group[k], which store the dog and treatment group indexes corresponding to the kth observation.
The first two lines of code in the for loop correspond to Equation (9): dnorm specifies that response[k] has a normal distribution with mean m[k] and precision taueps, and the mean of the response is specified as the sum of f[k], fg[k] and fi[k], the variables for the overall mean, the treatment group deviation from the mean and the individual deviation from the group curves.
The following lines of code in the for loop describe the structure of these curves in terms of splines. We keep the same notation as in the previous sections. Because in this example we use the same knots and covariates, the matrices X and Z do not change across the three types of curves.
The definition of the overall curve f[k] follows exactly the same procedure as the one described in Section 3.2. The definition of fg[k] follows the same pattern but involves two WinBUGS-specific tricks. The first is the use of the step function, described in Section 5.2: here step(group[k]-1.5) equals 1 if the index of the group corresponding to the kth observation is larger than 1.5 and 0 otherwise. This captures the indicator structure of the f_g(·) function in Equation (10). The WinBUGS coding of the distributions of b, c, d and δ follows almost literally the definitions provided in Equation (11). For example, the parameters c_{g,k} are assumed to be independent with distribution N(0, σ_c^2) and the WinBUGS code is c[g,k]~dnorm(0,tauc). Here num.knots, ngroups and ndogs are the number of knots of the spline, the number of treatment groups and the number of dogs, respectively; these are constants and are entered as data in the program. Using the same notation as in Section 3.2, the normal prior distributions described in Equation (12) are specified in terms of precisions: taub, tauc, taud and taueps are the precisions σ_b^{−2}, σ_c^{−2}, σ_d^{−2} and σ^{−2}, respectively, and taudelta[1] and taudelta[2] are the precisions σ_0^{−2} and σ_1^{−2} for the δ-parameters. For inference it is useful to note that the treatment group functions are the sums of the overall mean function and the treatment group deviations from the mean function.

Model inference
This is achieved in WinBUGS by monitoring a new variable fgroup[]. For inference we used 90,000 simulations, which took approximately 4.5 minutes on a PC (3.6GB RAM, 3.4GHz CPU).
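A sketch of how fgroup[] might be defined (the evaluation grid, loop bounds and the names fmean[] and fdev[] are our assumptions, not the paper's exact code; the overall curve and the group deviation are simply added, as described above):

```bugs
for (g in 1:ngroups){
    for (t in 1:ntime){
        # group curve = overall mean curve + group deviation from it
        fgroup[g,t] <- fmean[t] + fdev[g,t]
    }
}
```

Because fgroup[] is a deterministic function of monitored quantities, its posterior quantiles come at no extra cost from the same MCMC output.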

Improving mixing
Mixing is the property of the Markov chain to move rapidly throughout the support of the posterior distribution of the parameters. Improving mixing is very important especially when computation speed is affected by the size of the data set or model complexity. In this section we present a few simple but effective techniques that help improve mixing.
Model parametrization can dramatically affect MCMC mixing even for simple parametric models. Therefore careful consideration should be given to parametrizing the complex semiparametric models considered in this paper. Probably the most important step for improving mixing in this framework is a careful choice of the spline basis. While we have experimented with other spline bases, the low-rank thin-plate splines seem best suited for MCMC sampling in WinBUGS. This is probably due to the reduced posterior correlation between the spline parameters. The truncated polynomial basis provides similar inferences about the mean function, but mixing tends to be very poor, with serious implications for the coverage probabilities of the pointwise confidence intervals.
In our experience with WinBUGS, centering and standardizing the covariates also improve, sometimes dramatically, mixing properties of simulated chains.
Another, less known technique is hierarchical centering (Gelfand, Sahu, and Carlin 1995a,b). Many statistical models contain random effects that are ordered in a natural hierarchy (e.g., observation/site/region). Hierarchical centering of random effects generally has a positive effect on mixing, and we recommend it whenever the model contains a natural hierarchy. The Bayesian smoothing models presented in this paper also contain exchangeable random effects, b, which are not part of a hierarchy and cannot be hierarchically centered.
In Crainiceanu, Ruppert, Stedinger, and Behr (2002) it is shown that even for a simple Poisson-Log Normal model the amount of information has a strong impact on the mixing properties of parameters. A practical recommendation in these cases is to improve mixing, as much as possible, for a subset of parameters of interest. These model specification refinements pay off especially in slow WinBUGS simulations.

Prior specification
Any smoother depends heavily on the choice of smoothing parameter, and for P-splines in a mixed model framework the smoothing parameter is the ratio of two variance components (Ruppert et al. 2003). The smoothness of the fit depends on how these variances are estimated. For example, Crainiceanu and Ruppert (2004a) show that, in finite samples, the (RE)ML estimator of the smoothing parameter is biased towards oversmoothing.
In Bayesian mixed models, the estimates of the variance components are known to be sensitive to the prior specification; see, e.g., Gelman (2004). To study the effect of this sensitivity upon Bayesian P-splines, consider model (3) with one smoothing parameter and homoscedastic errors. In terms of the precision parameters τ_b = 1/σ_b^2 and τ = 1/σ^2, the smoothing parameter is λ = τ/τ_b = σ_b^2/σ^2, and a small (large) λ corresponds to oversmoothing (undersmoothing). It is standard to assume that the fixed effects parameters β_i are a priori independent, with prior distributions either [β_i] ∝ 1 or β_i ∼ N(0, σ_β^2), where σ_β^2 is very large. In our applications we used σ_β^2 = 10^6, which we recommend if x and y have been standardized or at least have standard deviations of order of magnitude one.
As just mentioned, the priors for the precisions τ_b and τ are crucial. We now show how critically the choice of the prior for τ_b may depend upon the scaling of the variables. If τ_b ∼ Gamma(A_b, B_b), then the full conditional of τ_b is

τ_b | Y, β, b, τ ∼ Gamma(A_b + K_m/2, B_b + ||b||^2/2),

so that

E(τ_b | Y, β, b, τ) = (A_b + K_m/2)/(B_b + ||b||^2/2).

Similarly, if τ ∼ Gamma(A, B), then the full conditional of τ is Gamma(A + n/2, B + ||Y − Xβ − Zb||^2/2).
The prior does not influence the posterior distribution of τ_b when A_b and B_b are small compared to K_m/2 and ||b||^2/2, respectively. Since the number of knots is K_m ≥ 1, and in most problems considered K_m ≥ 5, it is safe to choose A_b ≤ 0.01. When B_b ≪ ||b||^2/2 the posterior distribution is practically unaffected by the prior assumptions. As B_b increases relative to ||b||^2/2, the conditional distribution is increasingly affected by the prior assumptions. E(τ_b | Y, β, b, τ) is decreasing in B_b, so large B_b compared to ||b||^2/2 corresponds to undersmoothing. Since the posterior variance of τ_b is also decreasing in B_b, a poor choice of B_b will likely result in underestimating the variability of the smoothing parameter λ = τ/τ_b, causing too narrow confidence intervals for m. The condition B_b ≪ ||b||^2/2 shows that the "noninformativeness" of the gamma prior depends essentially on the scale of the problem, because the size of ||b||^2/2 depends upon the scaling of the x and y variables. If y is rescaled to a_y y and x to a_x x, then the regression function becomes a_y m(a_x x), whose pth derivative is a_y a_x^p m^(p)(a_x x), so that ||b||^2/2 is rescaled by the factor a_y^2 a_x^{2p}. Thus ||b||^2/2 is particularly sensitive to the scaling of x.
A similar discussion holds for τ, but now a large B corresponds to oversmoothing, and τ does not depend on the scaling of x. In applications it is less likely that B is comparable in size to ||Y − Xβ − Zb||^2, because the latter is an estimator of nσ^2. If σ̂^2 is an estimator of σ^2, a good rule of thumb is to use values of B smaller than nσ̂^2/100. This rule should work well when σ̂^2 does not have an extremely large variance.
Alternatives to gamma priors are discussed, for example, by Gelman (2004) and Natarajan and Kass (2000). These have the advantage of requiring less care in the choice of the hyperparameters. However, we find that with reasonable care the conjugate gamma priors can be used in practice. Nonetheless, exploration of other prior families for P-splines would be well worthwhile, though beyond the scope of this paper.

Interface with and processing in R
WinBUGS 1.4 provides a Graphical User Interface (GUI) that is user friendly and provides important information, including the chain histories that can be used to assess mixing. However, the WinBUGS script language is relatively limited and is hard to use for effective simulation studies involving repeated calls to WinBUGS.
R2WinBUGS (Sturtz et al. 2005) is an R package that calls WinBUGS 1.4 and exports results into R. We embedded this package in our own R functions, which also perform data processing. Commented R programs for each model described in this paper are included in the accompanying files, and we present the important parts of the R code here.

Pros and cons
An advantage of WinBUGS is the simple programming that translates the Bayesian model almost literally into code. This saves time by avoiding the usually lengthy implementation of MCMC simulation algorithms; for example, total programming time for one model is approximately 1 to 2 hours. Programs designed by experts for specific problems can be more refined, taking into account properties of the model and using a combination of art and experience to improve mixing and computation time. However, when comparing a WinBUGS program with an expert program in terms of computation speed, programming time also needs to be taken into account.
WinBUGS allows simple model changes to be reflected in simple code changes, which encourages the practitioner or the expert to investigate a much wider spectrum of models. Expert programs are usually restrictive in this sense.
Our recommendation is to start with WinBUGS and implement the model for the specific data set. If it runs in a reasonable time and has good mixing properties, then continue with WinBUGS; otherwise, consider designing an expert program. Even if one decides to use an expert program, we still recommend WinBUGS as a method of checking results. Programming errors and debugging time are also dramatically reduced in WinBUGS.