Intake_epis_food(): An R Function for Fitting a Bivariate Nonlinear Measurement Error Model to Estimate Usual and Energy Intake for Episodically Consumed Foods

We consider a Bayesian analysis using WinBUGS to estimate the distribution of usual intake for episodically consumed foods and energy (calories). The model uses measures of nutrition and energy intakes via a food frequency questionnaire (FFQ) along with repeated 24 hour recalls and adjusting covariates. In order to estimate the usual intake of the food, we phrase usual intake in terms of person-specific random effects, along with day-to-day variability in food and energy consumption. Three levels are incorporated in the model. The first level incorporates information about whether an individual in fact reported consumption of a particular food item. The second level incorporates the amount of intake from those individuals who reported consumption of the food, and the third level incorporates the energy intake. Estimates of posterior means of parameters and distributions of usual intakes are obtained by using Markov chain Monte Carlo calculations. This R function reports to users point estimates and credible intervals for parameters in the model, samples from their posterior distribution, samples from the distribution of usual intake and usual energy intake, trace plots of parameters and summary statistics of usual intake, usual energy intake and energy adjusted usual intake.


Introduction
There are many statistical challenges when modeling food intakes reported on two or more 24 hours recalls. Some of the challenges involve the presence of measurement error because estimating the distribution of usual intake of nutrients and foods in the population involves monitoring and measuring such intakes over time and their associated recall biases. In addition, many consumers report episodically consumed foods, foods that are not typically consumed every day. For example, fish may be reported in a particular day with a positive intake, but on a different day fish intake may equal zero. Consequently, it is difficult to estimate nutrition intake with recall surveys when one of these recalls incorporate excess zero measurements. Data of this nature is often modeled with measurement error models with zero inflated data.
Recently, in nutritional surveillance ) and nutritional epidemiology (Kipnis et al. 2009) two-part methods had been developed for analyzing episodically consumed foods. In the first part of the model, the probability of an episodically consumed food is estimated using logistic regression with a person-specific random effect. Then, in the second part of the model the amount of that episodically consumed food per day is modeled using linear regression on a transformed scale with also a person-specific effect. These two parts are linked allowing that the two person-specific effects are correlated as well as by allowing common covariates in both parts of the model. This method is known as the "NCI method" (http://riskfactor.cancer.gov/diet/usualintakes/). An extension into a three-part method that also incorporates the estimation of the amount of energy intake consumed per day is described in detail by Kipnis et al. (2010). These authors estimated this three-part method using nonlinear mixed effects models with likelihoods computed by adaptive Gaussian quadrature in SAS software. However, computationally it was found to have serious convergence issues within the context of nutritional epidemiology and nutritional surveillance as indicated by Kipnis et al. (2010) and Zhang et al. (2010).
The goal of this article is to present a function for the implementation of this nonlinear mixed effects model. The function Intake_epis_food() allows readers to input their data in R (R Development Core Team 2009) to generate and run the script of the three part model in WinBUGS (Spiegelhalter et al. 1999) and to run and obtain output simulations to R using the package R2WinBUGS (Sturtz et al. 2005). While the most important functions of the package R2WinBUGS are illustrated, we do not provide comprehensive documentation here; instead the reader is referred to the manual and online documentation with that package available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=plink. In the next sections, the function and corresponding algorithms are explained and an example is provided.

Method and parameter inference
The computational details of the Bayesian approach to fit the three-part model for an episodically consumed food and energy model  through Markov Chain Monte Carlo techniques are provided by Zhang et al. (2010) with a brief summary given here. Our model takes into consideration measures of nutrition and energy intakes via a food frequency questionnaire (FFQ) along with two repeated 24 hour recalls (24hr) and adjustment covariates. These two repeated 24hr provide information on the amounts of food and energy consumed by each individual. Consequently, an indicator variable of whether the food was consumed can be generated from the reported amount of food consumed. In addition, because two or more 24hr require distinguishing between and within person random error (Eckert et al. 1997), a type of classical measurement error is needed. Generally nutritional data are skewed, which may require transformations to reach normality and standardization. For person i = 1, …, n, and for the k = 1, 2 repeats of the 24hr, the data are Ỹ ik = (Y i1k , …, Y i3k ) T , where • Y i1k = Indicator of whether the food is consumed.
• Y i2k = Amount of the food consumed as reported by the 24hr, which equals zero if the food is not consumed.
• Y i3k = Amount of energy consumed as reported by the 24hr.
We will use a Box-Cox transformation to account for the skewness in the data. The Box-Cox transformation with transformation parameter λ, is h(y, λ) = (y λ −1)/λ if λ ≠ 0, and h(y, λ) = log(y) if λ = 0. We will allow for the user to specify different transformation parameters λ F and λ E for food and energy intake, respectively.
After Box-Cox transformation, we further standardize and center these transformed variables to have a mean of zero and variance of one. This is useful for making the prior distribution specifications given below to be sensible and allows rapid convergence of the posterior samples. Specifically, let μ λ F and σ λ F be the mean and standard deviation of the transformed non-zero food data h(Y i2k , λ F ), and let μ λ E and σ λ E be the mean and standard deviation of the transformed energy data h(Y i3k , λ E ). Then, our analysis is performed using the following transformations: (1) (2) There are also covariates such as age category, ethnic status and, in many cases the results of reported intakes from a food frequency questionnaire. In principle, the covariates can differ based on the three types of data, so we denote them as V i1 , V i2 , V i3 , which are vector-valued.
To improve linearity and homocedasticity of the model, we follow the recommendations of (i) implementing Box-Cox transformations on all or some covariates like intakes from a food frequency questionnaire (Kipnis et al. 2009), (ii) centering and (iii) scaling on all covariates. Without loss of generality, let λ c represent a vector containing the Box-Cox transformation parameter used for covariates. Let μ λ c and σ λ c be the vector means and standard deviations of the transformed covariates. Then, these transformed, center and scaled covariates are denoted by: Let (U i1 , U i2 , U i3 ) = Normal(0, Σ u ) be random effects associated with consumption, amount (if the food is consumed), and energy. Similarly, for k = 1, 2, (ε i1k , ε i2k , ε i3k ) = Normal(0, Σ ε ) accounts for day-to-day variation. The model for whether there is consumption can be stated as: (7) where Φ(·) is the standard normal distribution function and (W i1k , W i2k , W i3k ) represent their corresponding latent variables as follows. A food being consumed at visit k is equivalent to (8) The food when consumed and energy intake are modeled as The prior distribution of parameters β 1 , β 2 , β 3 is assumed to be multivariate Normal with vector mean 0 and variance covariance Σ u . An inverse Wishart denoted as IW(Ω u , m u ) prior was specified for Σ u . We set Ω u to have an exchangeable correlation structure with starting values with diagonal entries all equal to 1 and correlations 0.5. We have two necessary restrictions on Σ ε : (i) ε i1k and ε i2k are independent; and (ii) var(ε i1k ) = 1, so that β 1 is identifiable, along with the distribution of U i1 . Furthermore, we constrained this covariance matrix using a polar coordinate representation with γ ∈ (−1, 1) and θ ∈ (−π, π). With these considerations Σ ε can be written as: (11) where p 1 = γcos(θ) and p 2 = γsin (θ). The recommended prior distributions for s 22 and s 33 are Uniform(0,3), for γ is Uniform(−1,1) and for θ is Uniform(−π, π) (Zhang et al. 2010).
The corresponding inverse of Σ u is a Wishart distribution denoted as , and the inverse of Σ ε can be written as: (12) where .

Usual intake analysis
Besides the parameters and the random effects (U i1 , U i2 , U i3 ) in this context it is vital to also have the posterior samples of usual intake T Fi , usual energy intake T Ei , and energy adjusted usual intake 1000T Fi /T Ei , all on the original scale. We used the best-power method  for estimating both usual energy intake and usual intake for person i. We first consider energy: where Similarly, a person's usual intake of the dietary component on the original scale is defined as (15) When λ = 0, the back-transformation and second derivative are: Similarly, when λ ≠ 0, the back-transformation and second derivative are:

Using the Intake_epis_food() function in R
Users must install the R2WinBUGS (Sturtz et al. 2005) package in R and WinBUGS (Spiegelhalter et al. 1999) before running Intake_epis_food() function. This function assumes that there are no missing observations. Intake_epis_food() loads the libraries R2WinBUGS, stats and fSeries automatically.
The Intake_epis_food() function incorporates this three-part model, with truncated normal random variables for generating the latent W i1k , and Metropolis-Hastings computations for α 1 , α 2 , α 3 , β 1 , β 2 , β 3 , the elements of Σ u , and Σ ε . This function generates and runs the script of this model in WinBUGS and returns Markov Chain Monte Carlo (MCMC) computation. We emphasize that MCMC computation can either be thought of as a strictly Bayesian computation with ordinary Bayesian inference, or as a means of developing frequentist estimators of the crucial parameters. These MCMC produces estimates which are known to be asymptotically equivalent to maximum likelihood estimators, which are difficult to obtain (Zhang et al. 2010). In Bayesian terms we used the posterior means of the model. Users are mainly interested in estimates of equations (14), (15) as well as the energy-adjusted usual intake 1000T Fi /T Ei .
The Intake_epis_food() function produces seven output files. The first output file contains parameter estimates (posterior means) of α 1 , α 2 , α 3 , β 1 , β 2 , β 3 from equations (8), (9) and (10) as well as Σ μ , , Σ ε , and . The second output file contains summary statistics of Bayesian estimates of equations (14), (15) and the energy-adjusted usual intake 1000T Fi /T Ei . The third output file contains the iterations from the MCMC computations. The fourth output file contains summary statistics of Bayesian estimates for U i1 , U i2 , U i3 . The fifth output file contains trace plots of the intercept and coefficients in equations (8), (9) and (10). The sixth output file contains trace plots of the variance-covariances Σ μ , , Σ ε and . The seventh output file contains density plots of equations (15) and the energy adjusted usual intake 1000T Fi /T Ei .
file.istats : name of the file to save summary statistics of usual food intake, (15), usual energy intake, (14) and usual food intake per 1000 calories (1000T Fi /T Ei ) (default = estimates intake.csv). file.densityp : name of the file to save density plots of usual food intake and usual food intake per 1000 calories. Each one is generated individually and an overlayed plot is generated as well (default = density plots.pdf) bugs.directory : drive and folder location of WinBUGS (default="c:/Program Files / Win-BUGS14/") Once the data is input, Intake_epis_food() invokes WinBUGS from R. WinBUGS requires a file describing the model in equations (1)-(5) in Section 2. This model is included as file intake_program.txt and is loaded in R using its bugs() argument. Although, the default priors of the constrained variance-covariance matrix of random errors are geared to the prestandardization of the data as described in Section 2, the user may change those prior entries within the file intake_program.txt, possibly changing: Constrained variance-covariance matrix of random errors Users may change any of the defaults of Intake_epis_food() function when the function is called to run by replacing those arguments in the last sentence of the file accompanied by this paper. MCMC posterior means of α 1 , α 2 , α 3 , β 1 , β 2 , β 3 , the elements of Σ u , Σ ε , are displayed on the screen of R by Intake_epis_food() and saved automatically in numeral matrices on file names previously listed. See Section 4 for an example.

Application of the Intake_epis_food() function
We simulated data using parameter values identified from the calibration sub-study of the National Institutes of Health (NIH) and the American Association of Retired Persons (AARP) Diet and Health Study (Schatzkin et al. 2001). The NIH-AARP study is a cohort composed of people who resided in one of six states: California, Florida, Pennsylvania, New Jersey, North Carolina, and Louisiana or in two metropolitan areas: Atlanta, Georgia and Detroit, Michigan. From 1995 through 1996, 3.5 million questionnaires were mailed to members of the AARP, aged 50-71 years. The questionnaire included a dietary section as well as some lifestyle questions. Over 500,000 people returned the questionnaire after three mailing waves. This, then, is the largest study of diet and health ever conducted in the USA. In 1996-1997, these participants received a Risk-Factor Questionnaire which asked additional questions about lifestyle and behavior, and in 2004-2006 these participants received another follow-up questionnaire (http://dietandhealth.cancer.gov/history.html).
Participants for the calibration study were randomly selected from the 46,970 subjects who had responded to the first wave as of January 1996. Two thousand participants was the targeted sample size to attempt recruitment calls for a baseline and 24 hours follow up. The sample available to us included 920 men. Because the data are not publicly available, we simulated 920 food intakes that are similar in distribution to the men in the NIH-AARP calibration study.
We used as four covariates: age, body mass index (BMI), consumption of servings of whole grains from the FFQ, and energy intake from whole grains from the FFQ. Therefore, the degrees of freedom of the inverse Wishart were setup as m u = 5. The latter two covariates were transformed by the cube root and the logarithm, respectively.
We present results from Intake_epis_food() for this simulated data example with a sample size of 920 men, modeling whole grains consumption. We used a burn-in of 1, 000 steps followed by 10, 000 MCMC iterations; our Intake_epis_food() function took approximately 6 hours and 16 minutes (Pentium computer (R) D CPU 3.5GHz and 1.99GB of RAM). We used only every 10 th value of the chain. The first output file contains the average over 10,000 MCMC of posterior mean, posterior standard deviation, posteriors: 2.5 th percentile, 25 th percentile, 50 th percentile, 75 th percentile and 97.5 th percentile. The intercepts of the model are α 1 , α 2 , α 3 corresponding to each level of the three-part model. Also, estimates of regression parameters for each covariate appear for each level of the model. This is the notation used for the output: alpha_1 : represents the intercept in equation (8).
We used λ F = 0.32 as the Box-Cox transformation parameter of food intake λ E = 0.23 as the Box-Cox transformation of energy intake. We setup our working directory in R with the command setwd('C:') and using Intake_epis_food() function: Intake_epis_food(data.file.name="Sim_AARP_wg_Men.csv", numcov=4, df=5,lambdaf=0.32,lambdae=0.23,lvar1=999, lvar2=999, lvar3=0.33, lvar4=0, n.iter=11000,n.burnin=1000, n.thin=10, bugs. After calculations, estimates of α 1 , α 2 , α 3 , β 1 , β 2 , β 3 from equations (8), (9) and (10), Σ μ , , Σ ε , are shown as well as summary statistics using a Bayesian approach for three variables are provided immediately in the console screen (a) food usual intake, (b) usual food intake per 1000 calories, and (c) energy usual intake. The following shows the results for whole grains. Posterior density of the mean for whole grains. The solid line is the density estimate for usual intake from 1000 MCMC. The dashed line is the density estimate for usual intake per 1000 calories. Trace plot from 1000 MCMC of intercepts and estimated coefficients. First row shows trace plots of parameters for the intercept, age, body mass index, consumption of whole grains and energy intake from whole grains in equation (8). Second row shows trace plots of parameters for the intercept, age, body mass index, consumption of whole grains and energy intake from whole grains in equation (9). Third row shows trace plots of parameters for the intercept, age, body mass index, consumption of whole grains and energy intake from whole grains in equation (10). Trace plot from 1000 MCMC for each entry estimate of variance-covariance matrix Σ μ j=p=1,2,3. Trace plot from 1000 MCMC for each entry estimate of variance-covariance matrix j=p=1,2,3. Trace plot from 1000 MCMC for each entry estimate of variance-covariance matrix Σ ε j=p=1,2,3. The following parameters are neither estimated nor plotted because they are fixed: Σ ε11 =1 and Σ ε12 =Σ ε21 =0. Trace plot for each entry estimate of variance-covariance matrix j=p=1,2,3.