BANOVA: An R Package for Hierarchical Bayesian ANOVA

In this paper, we develop generalized hierarchical Bayesian ANOVA to assist experimental researchers in the behavioral and social sciences in analyzing the effects of experimentally manipulated within- and between-subjects factors. The method alleviates several limitations of classical ANOVA, which is still commonly employed in those fields. An accompanying R package for hierarchical Bayesian ANOVA is developed. It offers statistical routines and several easy-to-use functions for estimating hierarchical Bayesian ANOVA models that are tailored to the analysis of experimental research. Markov Chain Monte Carlo (MCMC) simulation is used to draw posterior samples of the parameters of each model specified by the user. The core program of all models is written in R and JAGS. After preparing the data in the required format, users simply select an appropriate model and estimate it without any advanced coding. The main aim of the R package is to offer freely accessible resources for hierarchical Bayesian ANOVA analysis, making it easy to use for behavioral researchers.


Introduction
Because of its computational attractiveness and the ease of interpretation of its statistical tests and the corresponding tables of means, ANOVA (Fisher 1921, 1925) is implemented in most statistical packages and continues to garner tremendous popularity in applied research. As it is about to see its centennial, the behavioral sciences especially still rely heavily on ANOVA for the analysis of their data from experiments with human subjects (e.g. Cardinal and Aitken 2005). Yet, the standard approach to ANOVA is based on several assumptions that often do not hold for the data typically collected in those fields of research.
First, ANOVA assumes a continuous, homoscedastic, i.i.d. Normally distributed dependent variable. Categorical variables, however, for which ANOVA has long been known not to be appropriate (Cochran 1940), abound in the behavioral sciences. These include measurements of perceptions, attitudes and intentions on categorical rating and multiple choice scales, and binary and count measures of human attention, memory and decision making (Nunnaly 1967; Thorndike 1971; Lord and Novick 1968). In addition, continuous measures such as response times, which often have high skewness or kurtosis, are commonly used as measures of human behavior. The distributional properties of most of these variables violate the assumptions underlying ANOVA, and extensive research into the effect of these violations has shown that they may lead to excess type-I as well as type-II errors in significance testing (Ito 1980; Tan 1982; Tiku 1971). Transformations of the data, such as the log and square-root (for counts), logit and arcsine (for proportions), rank (for ordered categorical variables), and Box-Cox transformations (for various measures), have been used as a way to render the empirical distribution closer to the Normal (Bartlett 1947; Box and Cox 1964; Draper and Hunter 1969; Conover and Iman 1976). These transformations, however, often do not provide a satisfactory solution, because they may cause the ANOVA tables of means to lose some of their appealing interpretations, while significance levels of the transformed and original data do not necessarily correspond. Modern statistical solutions are, however, available in the form of Generalized Linear Models (GLMs; McCullagh and Nelder 1989). The application of GLMs capitalizes on the fact that ANOVA is a special case of the linear regression model, and GLMs extend it to a wide variety of distributions of the dependent variable in the exponential family.
However, for applied researchers a downside of using GLMs to analyze designed experiments, which involve multiple factors and interactions that need to be represented in the model through dummy variables, is that interpreting the estimated coefficients of these dummy variables is not as easy as interpreting the output of ANOVA, and may often be quite cumbersome.
Second, experiments in the behavioral sciences often employ a combination of between- and within-subjects factors, leading to nested and repeated measurement designs for which established (split-plot and repeated measures) ANOVA procedures are available in most statistical packages. These mixed ANOVA models may have both fixed and random effects (Hartley and Rao 1967; Scheffé 1957), but assume a balanced design, a continuous dependent variable and categorical independent variables. Unbalanced designs, unequally spaced measurements, and continuous covariates such as encountered in ANCOVA violate these assumptions. As a consequence, some experimental behavioral science researchers have resorted to hierarchical linear models (Breslow and Clayton 1993; Longford 1987; Raudenbush 1988, 1999), in particular for the analysis of quasi-experiments. These models, which take on a variety of forms and go under different names in the literature, are special cases of hierarchical Bayes models (Lindley and Smith 1972; Press 2003; Gelman, Carlin, Stern, and Dunson 2013). They allow for more general covariance structures and data hierarchies than repeated measures ANOVA. Gelman (2005) argued the importance of hierarchical Bayes formulations of ANOVA, and showed how the principles of ANOVA are helpful in understanding hierarchical linear models. In addition, hierarchical Bayes models can accommodate non-Normal dependent variables that render the application of classical ANOVA and hierarchical linear models problematic. The Bayesian approach in addition offers a number of theoretical and pragmatic advantages as a framework for inference and testing that have been widely acknowledged (Bernardo and Smith 2000; Press 2003; Savage 1954).
Indeed, in behavioral research several advantages of Bayesian inference are increasingly recognized: it provides inferences based on finite samples, avoids pitfalls of classical hypothesis testing, and may not only reject but also support hypotheses (Kruschke 2013; Rouder, Speckman, Sun, Morey, and Iverson 2009). Hierarchical Bayes models can now be implemented relatively easily using existing statistical software, such as BUGS and JAGS (Lunn, Thomas, Best, and Spiegelhalter 2000).
Nevertheless, applied experimental researchers in the behavioral sciences continue to resort to standard ANOVA in many cases, in spite of these limitations and of the availability of superior modern alternatives, because of its ease of application and interpretation, its widespread availability in standard statistical packages, a lack of familiarity with better alternatives, and/or the effort involved in programming alternative methods or interpreting their output. The present paper attempts to help remedy this undesirable state of affairs by developing an R package for hierarchical Bayes ANOVA that addresses the most salient limitations of classical ANOVA, yet is easy to use and retains many of the familiar features of the output of classical ANOVA. It deals with a wide range of distributions for the dependent variable, with hierarchical data structures and between- and within-subjects design factors, as well as continuous covariates.
The key insight behind the approach is that ANOVA and ANCOVA are special cases of linear regression, and that once an ANOVA model is formulated as a hierarchical linear model, subject-level parameters become incidental and inference focuses entirely on the population-level model, which is where main effects and interactions of within- and between-subjects factors are represented and tested. Assuming that lower-level parameters describing subject heterogeneity follow Normal distributions, it follows that significance tests of main effects and interactions, variance decompositions and tables of means can be computed in a similar way as they would be in standard ANOVA. This allows for the analysis of dependent variables with a wide variety of distributional forms with hierarchical models, while at the same time retaining much of the appealing output of standard ANOVA for experimental data. The underlying estimation methods are Markov Chain Monte Carlo algorithms implemented in the JAGS software. The user of the package needs to input the data and set a few parameters. The package then sets up a JAGS program and analyzes the data with a hierarchical Bayes ANOVA, using MCMC estimation. JAGS was chosen as an interface because, after calling the R package, the JAGS code is available, in addition to the estimation results, for inspection and modification by the behavioral researcher. Importantly, although the underlying models are hierarchical Bayes models, their output is presented in a form that is very familiar to users of standard ANOVA, including (Bayesian) p-values and effect sizes, and tables of means with credible intervals.
The remainder of this paper is organized as follows. In section 2, we discuss the hierarchical Bayesian approach to ANOVA. In section 3, the architecture of the R package is discussed in tutorial form; Bayesian estimation of parameters and other quantities of interest is introduced as well. Section 4 presents examples. The last section concludes and discusses future work.

Bayesian ANOVA Models
We assume data are collected in an experiment in which a sample of subjects has participated and has been subjected to between-subject as well as within-subject experimental manipulations. Repeated measurements of one or more dependent variables are taken on each subject, while continuous or categorical covariates may have been measured. The hierarchical Bayesian approach to ANOVA then consists of two sub-models: level 1, the subject level, and level 2, the population level. The subject-level model represents the effects of within-subject factors and covariates, and the population-level model represents the influence of between-subject factors. The population-level model expresses the ANOVA of interest.
In the subject-level model, each outcome of the dependent variable, $y_i$, with $i$ indexing data points, is assumed to be generated from a particular distribution in the exponential family, $f(y_i \mid \mu_i)$ (and even other distributions can be accommodated). The mean, $\mu_i$, of the distribution depends on the independent variables through a suitable link function $g(\cdot)$ (McCullagh and Nelder 1989). The within-subject factors and their interactions are indexed by $p$ ($p = 1, 2, \ldots, P$). Each index $p$ represents a batch of $J_p$ coefficients $\beta^p_{j,s}$, $j = 1, \ldots, J_p$, where $s = 1, \ldots, S$ indexes subjects. Note that if a subject-level covariate is continuous, $J_p = 1$, so that ANCOVA models are also accommodated (but the formulation here relaxes their "constant slope" assumption). The subject-level model is expressed as a generalized linear regression model, with a design matrix $X$ that contains all within-subject factors and their interactions, as well as a constant term ($p = 0$):

$$y_i \sim f(y_i \mid \mu_i), \qquad \mu_i = g^{-1}(\eta_i), \qquad (1)$$

$$\eta_i = \sum_{p=0}^{P} \sum_{j=1}^{J_p} X^p_{i,j}\, \beta^p_{j,s_i}, \qquad (2)$$

where $s_i$ is the subject index of data point $i$.
The population-level model allows for unobserved heterogeneity among subjects, because the subject-level coefficients $\beta^p_{j,s}$ are assumed to follow a multivariate normal distribution. The between-subject factors and their interactions are indexed by $q$ ($q = 1, 2, \ldots, Q$); $q = 0$ denotes the constant term. Then, using the notation in Gelman (2005), the population-level ANOVA can be written as:

$$\beta^p_{j,s} = \sum_{q=0}^{Q} \theta^{p,q}_{j,k^q_s} + \delta^p_{j,s}. \qquad (3)$$

Each $q$ represents a batch of $K_q$ coefficients $\theta^{p,q}_{j,k}$, $k = 1, \ldots, K_q$, where $k^q_s$ indexes the coefficient in batch $q$ corresponding to the treatment of subject $s$. For example, in the simple case of one 3-level within-subject factor $D$ ($P = 2$, $J_1 = 1$ and $J_2 = 3$) and two 2-level between-subject factors $A$ and $B$, and the $AB$-interaction ($Q = 3$), equations (2) and (3) reduce to a set of equations (4a) to (4d), with the parameter of the last level of each factor set to zero. Here, equation (4b) contains the overall intercept ($\theta^0_1$) and the main effects of the between-subject factors $A$ ($\theta^A_{1,k^A_s}$) and $B$ ($\theta^B_{1,k^B_s}$) and their interaction ($\theta^{AB}_{1,k^{AB}_s}$). Equation (4c) contains the main effect of the (first level of the) within-subject factor $D$ ($\theta^{D^0}_1$), and its two- and three-way interactions with the between-subject factors $A$ and $B$. Similarly, equation (4d) contains the main effect of the second level of $D$, and its interactions with $A$ and $B$.
The population-level ANCOVA model can be expressed as a linear model with a design matrix $Z$ that contains all between-subject factors and their interactions (and a constant term):

$$\beta^p_{j,s} = \sum_{k=1}^{K} Z_{s,k}\, \theta^p_{j,k} + \delta^p_{j,s}, \qquad (5)$$

where $Z_{s,k}$ is an element of $Z$, an $S \times K$ matrix of covariates, and $K$ is the number of parameters. $\theta^p_{j,k}$ is a hyper-parameter that captures the effect of between-subject factor $q$ on the parameter $\beta^p_{j,s}$ of within-subject factor $p$. The error $\delta^p_{j,s}$ is assumed to be normal: $\delta^p_{j,s} \sim N(0, \sigma^{-2}_p)$. Proper but diffuse priors are assumed: $\theta^p_{j,k} \sim N(0, s)$ and $\sigma_p \sim \mathrm{Gamma}(a, b)$, where $s$, $a$, $b$ are hyper-parameters.
The hierarchical Bayes ANOVA model is estimated capitalizing on the fact that it is a special case of hierarchical generalized linear models, that is, using equations (1), (2) and (5). We use effects coding of the factors in the matrices $X$ and $Z$ (Overall, Spiegel, and Cohen 1975). It is important to note that equation (3) is of key interest for inference: it contains the parameters that specify the population-level ANOVA model. It is as if the subject-level coefficients $\beta^p_{j,s}$ are the (normally distributed) "dependent variables" in an ANOVA specified by the between-subject factors in equation (3). Thus, inference focuses on the parameters in equation (3). We first specify the outcome variables that are accommodated in the R package below.
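To make the effects coding concrete, the following base-R sketch (an illustration using R's built-in sum-to-zero contrasts, not the package's internal code; the factor `A` and its levels are hypothetical) shows how a 3-level factor is coded in a design matrix such as X or Z:

```r
# Effects (sum-to-zero) coding of a hypothetical 3-level factor A.
A <- factor(c("a1", "a2", "a3", "a1", "a2", "a3"))
X <- model.matrix(~ A, contrasts.arg = list(A = "contr.sum"))
print(X)
# Each of the first two levels gets its own column; the last level ("a3")
# is coded -1 in every column, so effects sum to zero across levels.
```

With this coding, the coefficient of the last level is implied by the negative sum of the others, which is what makes the tables of means discussed later straightforward to compute.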
Continuous responses: To model continuous data, a normal distribution can be assumed for $y_i$:

$$y_i \sim N(\eta_i, \sigma^{-2}), \qquad (6)$$

where $\eta_i$ is defined in equation (2), and the scale parameter $\sigma \sim \mathrm{Gamma}(\alpha, \beta)$.
To describe data with "outliers" or fatter tails than the normal, the distribution of $y_i$ is assumed to follow a t-distribution, with an unknown number of degrees of freedom $\nu$, which is assumed to follow a Poisson distribution:

$$y_i \sim t(\eta_i, \sigma^{-2}, \nu), \qquad \sigma \sim \mathrm{Gamma}(\alpha, \beta), \qquad \nu \sim \mathrm{Poisson}(\lambda), \qquad (7)$$

where $\alpha$, $\beta$, $\lambda$ are hyper-parameters.
Binary responses: To model data $y_i$ that take on the values 0 and 1, a Bernoulli distribution is assumed:

$$y_i \sim \mathrm{Bernoulli}(\mu_i), \qquad \mathrm{logit}(\mu_i) = \eta_i, \qquad (8)$$

where $\mathrm{logit}(x) = \ln \frac{x}{1-x}$ is the standard logit link function. If the data $y_i$ represent the number of successes in a sequence of $B$ independent Bernoulli experiments, then

$$y_i \sim \mathrm{Binomial}(B, \mu_i), \qquad \mathrm{logit}(\mu_i) = \eta_i. \qquad (9)$$

Count responses: To model count data $y_i$ that can take on values $0, 1, 2, \ldots$, the Poisson distribution is assumed:

$$y_i \sim \mathrm{Poisson}(\mu_i), \qquad \ln(\mu_i) = \eta_i. \qquad (10)$$

Ordered categorical responses: To model data $y_i$ that are ordered categorical and can take on the values $1, \ldots, M$, an ordered logistic model is used (11). The cut-point parameters $c_m$ are constrained: $0 = c_1 < c_2 < \cdots < c_{M-1}$. We assume $c_1 = 0$, and the other cut-points follow a Normal distribution, $c_m \sim N(0, \sigma^2_m)$, with $\sigma^2_m \sim \mathrm{Uniform}(0, 100)$.
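The logit link used by the binary, binomial, and ordered models above can be sketched in a few lines of base R (an illustration only, with a hypothetical value of the linear predictor):

```r
# The logit link logit(x) = ln(x / (1 - x)) and its inverse, which maps
# the linear predictor eta back to a probability in (0, 1).
logit <- function(x) log(x / (1 - x))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

eta <- 0.5            # hypothetical linear predictor value
mu  <- inv_logit(eta) # implied Bernoulli mean, strictly between 0 and 1
```

Note that `inv_logit(0)` is 0.5, so a zero linear predictor corresponds to an even probability.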
Multinomial responses: To model data $y_i$ that are categorical and can take on the values $1, \ldots, K$, a multinomial logistic (MNL) model is used:

$$P(y_i = k) = \frac{\exp(\eta_{i,k})}{\sum_{k'=1}^{K} \exp(\eta_{i,k'})}, \qquad (12)$$

where $\eta_{i,k} = \sum_{p=0}^{P} \sum_{j=1}^{J_p} X^{k,p}_{i,j}\, \beta^p_{j,s_i}$, and $X^{k,p}_{i,j}$ is the design matrix corresponding to each response category $k$ ($k = 1, \ldots, K$) of $y_i$.
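The choice probabilities in equation (12) are the softmax of the category-specific linear predictors, which can be sketched directly in base R (hypothetical values for a case with $K = 3$ categories):

```r
# MNL choice probabilities: softmax of the linear predictors eta_{i,k}.
eta  <- c(0.2, -0.1, 0.4)        # hypothetical eta_{i,k} for K = 3
prob <- exp(eta) / sum(exp(eta)) # P(y_i = k), k = 1, ..., K
```

The probabilities sum to one, and the category with the largest linear predictor receives the largest probability.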

Obtaining the software
The BANOVA package is an add-on package for the statistical software R. It is free and can be downloaded from the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/...). The base of the BANOVA package is implemented in R and JAGS. Thus, an additional system requirement is the JAGS software, which can be freely downloaded from http://mcmc-jags.sourceforge.net. The package also imports two other packages, runjags (Denwood 2013) and coda (Plummer, Best, Cowles, and Vines 2006), in order to connect R and JAGS and to perform convergence diagnostics. Note that these two imported packages do not need to be installed before installing BANOVA: they are automatically attached and loaded when the package BANOVA is loaded. However, the JAGS software must be installed in order to estimate any of the models introduced above. The package automatically detects the location of the JAGS software and connects to it via runjags. Once R and JAGS have been installed, BANOVA can be loaded using the following command:

R> library('BANOVA')

Functionality
BANOVA can fit the Bayesian hierarchical ANOVA models introduced in the previous section. As explained there, the response data may follow a wide variety of distributions, including the Normal, Student's t, Poisson, binomial, and ordered and unordered multinomial distributions. Each of the corresponding models can be fitted by a specific function in the package. The names of these functions have the form 'BANOVA.Bin', where the first part gives the general name and the part after the '.' specifies the form of the likelihood. Currently, seven models are included in the package. The predictor for each Bayesian ANOVA model is specified as a regular R formula, similar to those used by lm() and glm() in R. This means that the summary(), print() and predict() functions can be applied to the object in question after fitting the model. In addition to these common R object functions, the package also includes several other useful functions, such as conv.diag(), table.means() and table.pvalues(). Their use is illustrated in the following sections.

Data input
When the data is in '.csv' or another format, it can be loaded with the R function read.csv() or other import functions. The package expects the imported data to be in long format, where each row corresponds to one trial, replication, or time point per subject. Thus, each subject will have data in multiple rows. Subject ID values must be included in the data (see Figure 1), and the other columns in the data set contain the dependent variable(s), the covariates, and the between- and within-subject experimental factors. The between-subject variables, which are constant within each subject, will have the same value in all rows containing the data for one subject. The class of each of the factors must be one of the following three: "integer", "numeric" or "factor". The function class() in R can be used to check the classes of the factors. For example,

R> class(x)            # displays the class of x, e.g. 'integer'
R> x <- as.factor(x)   # the class of x is changed to 'factor'

For the multinomial response model (equation 12), the data format is somewhat more complex. The within-subject data for each subject must be stored in one element of a 'list' in R. For example, if there are 100 subjects, the list must contain 100 elements, where each element includes multiple rows that hold the values of the within-subject variables. The between-subject data are stored in a separate data frame, in which each row corresponds to one subject. The order of the between-subject data must match the order of the within-subject data. For example, although the 'choicedata' data set of the BANOVA package is already in long format, both the within-subject and between-subject data need to be manipulated further.
The following R code can be used for that purpose:

R> data(choicedata)
R> # generate within-subject data (convert the within-subject variables to a list)
R> dataX <- list()
R> for (i in 1:nrow(choicedata)){
R>   logP <- as.numeric(log(choicedata[i, 3:8]))
R>   dataX[[i]] <- as.data.frame(logP - mean(logP))  # mean-center logP
R> }
R> # generate between-subject data
R> dataZ <- choicedata[, 9:13]

Estimation of the coefficients

As explained above, the hierarchical Bayes ANOVA model is estimated using the fact that it is a special case of a hierarchical generalized linear model, that is, using equations (1), (2) and (5). The conditional posterior distribution, denoted by $\pi(\cdot)$, of the parameters $\beta^p_{j,s}$ is obtained from the likelihood and priors:

$$\pi(\beta^1, \ldots, \beta^P \mid y, \theta^1, \ldots, \theta^P, Z) \propto \prod_i \pi(y_i \mid \beta^1, \ldots, \beta^P, x_i) \prod_{p=1}^{P} \pi(\beta^p \mid \theta^p, Z), \qquad (13)$$

where $\pi(y_i \mid \beta^1, \ldots, \beta^P, x_i)$ is the likelihood determined by equations (1), (2) and (6) to (12); $\beta^p = (\beta^p_{1,s}, \ldots, \beta^p_{J_p,s})$ are the coefficients of factor $p$; $\theta^p = (\theta^{p,1}_{j,k}, \ldots, \theta^{p,Q}_{j,k})$, $j = 1, \ldots, J_p$, $k = 1, \ldots, K_q$, are the population-level parameters; $\pi(\theta^p)$, $p = 1, \ldots, P$, is the prior of the population-level parameters $\theta^p$; and $\pi(\beta^p \mid \theta^p, Z)$, $p = 1, \ldots, P$, is the prior determined by the population model in equation (3). We are interested in the effects of the between-subject factors, captured by $\theta^{p,q}_{j,k}$, $k = 1, \ldots, K_q$, using the notation in equation (3). The conditional posterior distribution of the parameters $\theta^{p,q}_{j,k}$ is:

$$\pi(\theta^1, \ldots, \theta^P \mid \beta^1, \ldots, \beta^P, Z) \propto \prod_{p=1}^{P} \pi(\beta^p \mid \theta^p, Z)\, \pi(\theta^p), \qquad (14)$$

where each $\theta^{p,q}_{j,k}$ is assumed to follow a normal prior.

Starting values and burn-in period
Successful implementation of an MCMC algorithm requires proper starting values and a sufficiently long burn-in period to ensure convergence of the chains. For the burn-in period, typically the first 1,000 to 5,000 draws are discarded. Users can easily adjust that number in the arguments of all BANOVA.*() functions. Because the hierarchical Bayesian ANOVA models usually involve many hyper-parameters, the starting values of all parameters are assigned by the R package. Specifically, random parameters without any constraint are assigned values drawn from a Normal distribution using the rnorm() function (the starting values of $\theta^{p,q}_{j,k}$ in equation (14), for example). Parameters with constraints, for instance the hyper-parameters $\alpha$, $\beta$, $\nu$ in equation (7), are assigned fixed values ($\alpha = 1$, $\beta = 1$, $\nu = 1$). These values can also be found in the corresponding JAGS code.
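This initialization scheme can be sketched as follows (an illustration of the idea, not the package's actual code; the parameter names in the list are hypothetical):

```r
# Unconstrained parameters get Normal starting draws via rnorm();
# constrained hyper-parameters get fixed starting values.
set.seed(1)  # for reproducibility of the sketch
inits <- list(
  theta = rnorm(4),            # unconstrained level-2 coefficients
  alpha = 1, beta = 1, nu = 1  # constrained hyper-parameters, fixed at 1
)
```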

JAGS code
Based on equations (13) and (14), the models are built and estimated in JAGS (Just Another Gibbs Sampler; Plummer 2003). The BANOVA package generates the JAGS code fully automatically and runs it. The JAGS software takes care of the work involved in estimating the model parameters by constructing an MCMC algorithm to sample from the posterior distributions. The JAGS program allows users to write their own models and prior distributions, and frees them from dealing with the implementation details of different models and samplers.
The JAGS code is produced as part of the output of our package, so that users can inspect and modify it. The following R command provides users with the JAGS code generated for the model in question: R> cat(res$JAGSmodel) #res is a list returned from the BANOVA.* function

Convergence diagnostics
A large number of convergence diagnostics is available (e.g. Gill 2007). In the output of the package, two convergence diagnostics are reported: the Geweke diagnostic (Geweke 1991) and the Heidelberger and Welch (1983) diagnostic. Both diagnostics are calculated from a single MCMC chain, which saves computation time and is less cumbersome for the applied user, and both may be applied with any MCMC method. The functions geweke.diag and heidel.diag in the R package coda are incorporated in our package and used to compute the convergence diagnostics. If so desired, the user can apply other diagnostics from the coda package manually.
Geweke's convergence diagnostic is calculated by taking the difference between the means from the first $m_A$ iterations and the last $m_B$ iterations, standardized by its asymptotic standard error, where $m$ is the total number of iterations. If the ratios $m_A/m$ and $m_B/m$ are fixed and $m_A + m_B < m$, then by the central limit theorem the distribution of this diagnostic approaches a standard normal as $m$ tends to infinity. In our package, $m_A = 0.1m$ and $m_B = 0.5m$.
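The idea behind the diagnostic can be sketched in base R (an illustration only, not coda's geweke.diag, which uses spectral-density estimates of the variances; naive segment variances are used here instead):

```r
# Geweke-style z-score: compare the means of the first 10% and last 50%
# of a chain, standardized by a (naive) standard error.
geweke_z <- function(chain, frac1 = 0.1, frac2 = 0.5) {
  m <- length(chain)
  a <- chain[1:floor(frac1 * m)]            # first m_A iterations
  b <- chain[(m - floor(frac2 * m) + 1):m]  # last m_B iterations
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

chain <- rep(c(-1, 1), 2500)  # a perfectly stationary, mean-zero chain
z <- geweke_z(chain)          # both segment means are 0, so z is 0
```

A chain with drift, such as `1:5000`, produces a very large |z| and would be flagged as non-converged.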
The Heidelberger and Welch diagnostic is based on a test statistic to accept or reject the null hypothesis that the Markov chain is from a stationary distribution. The present package reports the Cramer-von Mises statistic to test for stationarity. The hypothesis test is based on Brownian bridge theory under the null hypothesis that the sequence of iterates is from a stationary distribution. The test is iteratively applied to batches of draws from the posterior distributions. If the null hypothesis is rejected, the first 10% of the iterations are discarded and the stationarity test is repeated. If the test fails again, an additional 10% of the iterations are discarded and the test is repeated. The process continues until 50% of the iterations have been discarded; if the test still rejects, the chain fails the diagnostic. Our package uses the function heidel.diag in the coda package with the parameters eps = 0.1 and pvalue = 0.05.
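The iterative discarding scheme can be sketched as follows (an illustration only; `stationarity_test` is a hypothetical placeholder for the Cramer-von Mises test that heidel.diag performs, and each round here drops 10% of the current chain, a simplification of coda's exact bookkeeping):

```r
# Repeatedly test for stationarity, discarding 10% from the front of the
# chain after each failed test, up to a maximum of 50% discarded.
run_heidel_scheme <- function(chain, stationarity_test) {
  for (k in 0:5) {  # test after discarding 0%, 10%, ..., 50%
    if (stationarity_test(chain))
      return(list(pass = TRUE, frac_discarded = k / 10))
    chain <- chain[-seq_len(floor(0.1 * length(chain)))]
  }
  list(pass = FALSE, frac_discarded = 0.5)
}
```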
To obtain the convergence tests, the following R command is used:

R> conv.diag(res)   # res is a fitted object from any of the models

Tables of means
One key output of the package is a table of means for the levels of the factors at both level 1 and level 2. As explained above, we use effects coding to estimate the parameters of categorical variables, with the last level of each factor as the reference level. However, especially when there are multiple factors and interactions, interpretation of the raw parameter estimates is cumbersome. Therefore, the posterior samples of each $\theta^{pq}_{j,k}$ in equation (3) generated by MCMC are used to calculate 'tables of means', similar to those produced by standard ANOVA. The advantage of doing this is that this output is familiar to behavioral researchers and relatively easy to interpret. Because these statistics are computed for each draw from the posterior distribution of the parameters, statistics of their posterior distributions are readily available. Specifically, the package computes statistics of interest such as 95% credible intervals and posterior standard deviations.
Let $\theta^{pq}_{j,k,m}$ denote the posterior sample of $\theta^{pq}_{j,k}$ in the $m$th iteration of the MCMC chain. Then the grand mean is:

$$\hat{\mu}_m = g^{-1}(\theta^{00}_m),$$

where $\theta^{00}_m$ is the $m$th draw of the level-2 intercept corresponding to the level-1 intercept, which is equal to the grand mean. The 95% credible interval is simply provided by the 2.5% and 97.5% quantiles of the posterior distribution of $\{g^{-1}(\theta^{00}_m)\}$, and the posterior standard deviation is also computed from the draws of the posterior distribution. Higher-order tables are computed as illustrated below.
For example, in computing the one-way table of means of a level-1 factor $A$, the posterior mean of its level $j$ is calculated as:

$$\hat{\mu}^A_{j,m} = g^{-1}(\theta^{00}_m + X^{A\prime}_j \theta^{A0}_m),$$

where $X^A_j$ is the effects-coded column vector of factor levels corresponding to level $j$ of factor $A$, in which all other factors and covariates are set to 0, and $\theta^{A0}_m$ is the estimated vector of level-2 intercepts corresponding to level-1 factor $A$.
For another example, in computing the one-way table of means of a level-2 factor $B$, the posterior mean of its level $j$ is calculated as:

$$\hat{\mu}^B_{j,m} = g^{-1}(\theta^{00}_m + Z^{\prime}_{j,B} \theta^{0B}_m),$$

where $Z_{j,B}$ is the effects-coded column vector of factor levels corresponding to level $j$ of factor $B$, in which all other factors and covariates are set to 0, and $\theta^{0B}_m$ is the estimated vector of level-2 coefficients of the effect of factor $B$ on the level-1 intercept.
Continuing the example, the means of the two-way table classified by $A$ and $B$ (level $j$ of factor $A$ and level $k$ of factor $B$) are calculated analogously, with the interaction term $\theta^{AB}_{j,k,m}$ added inside the inverse link, where $\theta^{AB}_{j,k,m}$ is the $m$th draw from the coefficient matrix, the $k$th row of which is a vector of level-2 coefficients representing the effect of factor $B$ on the $j$th level of level-1 factor $A$.
Based on the above formulas, the function table.means() computes the tables of means (currently limited to two-way interactions at each level of the model) and their posterior quantiles:

R> table.means(res)   # res is a fitted object from any of the models
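The underlying computation can be sketched in base R (an illustration with hypothetical posterior draws and the identity link, not the package's code): a one-way table of means for a two-level between-subject factor B is obtained by combining the draws of the level-2 intercept with the effects-coded draws of B's effect.

```r
# Hypothetical posterior draws for the grand intercept theta00 and the
# effect of level 1 of factor B (effects coding: level 1 -> +1, level 2 -> -1).
set.seed(7)
M <- 1000
theta00 <- rnorm(M, 2.0, 0.1)
theta0B <- rnorm(M, 0.5, 0.1)

mu_B1 <- theta00 + 1 * theta0B  # draws of the mean at level 1 of B
mu_B2 <- theta00 - 1 * theta0B  # draws of the mean at level 2 of B

# Table of means with 95% credible intervals from the posterior quantiles.
tab <- rbind(B1 = quantile(mu_B1, c(0.025, 0.5, 0.975)),
             B2 = quantile(mu_B2, c(0.025, 0.5, 0.975)))
```

For a non-identity link, each draw would first be passed through $g^{-1}(\cdot)$ before taking quantiles.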

Table of sums of squares and effect sizes
If the experimental design is balanced, a variance decomposition can be performed for the level-2 equations (3) or (5) to produce information on sums of squares and effect sizes. Both of these sets of statistics are important in interpreting the results of experiments in behavioral research. For this purpose, it is convenient to consider the ANOVA as a regression as in equation (5), so that for $\theta^p_j = (\theta^p_{j,1}, \theta^p_{j,2}, \ldots, \theta^p_{j,K})$ the total sum of squares can be represented as

$$SS(\theta^p_j) = (Z \theta^p_j)'(Z \theta^p_j). \qquad (23)$$

In the package, equation (23) is evaluated at each posterior draw $m$ of $\theta^p_j$. If the design matrix $Z$ is orthogonal, the sum of squares attributable to each of the factors and their interactions can be written in terms of the submatrix $Z_q$ of $Z$ corresponding to each factor $q$ and its coefficients $\theta^{pq}_j = (\theta^{pq}_{j,1}, \ldots, \theta^{pq}_{j,K_q})$, $q = 1, \ldots, Q$ (e.g. Draper and Smith 1998):

$$SS(\theta^{pq}_j) = (Z_q \theta^{pq}_j)'(Z_q \theta^{pq}_j), \qquad (24)$$

where $\theta^{pq}_j$ is the vector of coefficients of the dummy variables corresponding to factor or interaction $q$. Equation (24) is likewise computed for each posterior draw. If the design is not balanced, type-III sums of squares are computed. These reflect the presence of a main effect after the other main effects and interactions are accounted for, and are valid in the presence of significant interactions (Fox 1997).
Effect sizes measure the degree of association between an effect (e.g., a main effect, an interaction, a linear contrast) and the dependent variable, and are interpreted as the proportion of variance in the dependent variable that is attributable to each effect. They are of eminent importance in applied research, where they are used as additional information next to statistical significance levels. There are several measures of effect size (Kirk 1982; Tabachnick and Fidell 1989). In the R package, the most commonly used measure, eta squared ($\eta^2$), is calculated. It is defined as

$$\eta^2 = \frac{SS_{effect}}{SS_{total}},$$

where $SS_{effect}$ is the sum of squares for the effect of interest, and $SS_{total}$ is the total sum of squares for all effects, interactions, and errors in the regression.
For example, the effect size of factor $q$ is $\eta^{pq}_m = SS(\theta^{pq}_{j,m}) / SS(\theta^p_{j,m})$. Because the effect sizes are calculated at each draw of the parameters, their posterior distributions are obtained. In the package, the function BAnova() performs all computations discussed above and outputs a table of sums of squares and effect sizes:

R> BAnova(res)   # res is a fitted object from any of the models
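The decomposition for one posterior draw can be sketched in base R (an illustration with a hypothetical orthogonal, effects-coded design matrix Z and hypothetical coefficient values, not the package's code):

```r
# Orthogonal effects-coded design for two 2-level factors A and B
# (column dot product is zero), and one draw of the coefficients.
Z <- cbind(A = c(1, 1, -1, -1),
           B = c(1, -1, 1, -1))
theta <- c(A = 0.8, B = 0.3)

fitted <- Z %*% theta                      # Z * theta for this draw
SS_A <- sum((Z[, "A"] * theta["A"])^2)     # SS attributable to A
SS_B <- sum((Z[, "B"] * theta["B"])^2)     # SS attributable to B
SS_total <- sum(fitted^2)                  # total SS; equals SS_A + SS_B
                                           # because Z is orthogonal
eta2_A <- SS_A / SS_total                  # eta squared for factor A
```

Repeating this over all posterior draws yields the posterior distribution of each effect size.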

Table of P-values
The package computes Bayesian p-values for the posterior of each factor (Gill 2007), which enables significance testing. The null hypothesis for the test concerning $\theta^{pq}_{j,k}$ in equation (3) is $H_0: \theta^{pq}_{j,k} = 0$ versus $H_1: \theta^{pq}_{j,k} \neq 0$.
If there are coefficients $\theta^{pq}_{j,k_1}, \theta^{pq}_{j,k_2}, \ldots, \theta^{pq}_{j,k_J}$ representing the $J$ levels of a factor with more than two levels, we calculate a single p-value to represent the significance of the differences among all levels, as in standard ANOVA. The null hypothesis is:

$$H_0: \theta^{pq}_{j,k_1} = \theta^{pq}_{j,k_2} = \cdots = \theta^{pq}_{j,k_J} = 0.$$

We compute the Bayesian p-value in this case as follows. Let $\theta^{pq}_{j,k_{min}}$ and $\theta^{pq}_{j,k_{max}}$ denote the coefficients with the smallest and largest posterior mean, and let $P_\theta(\cdot)$ denote the two-sided posterior tail probability of a coefficient relative to zero. Then the p-value is defined as $\min(P_\theta(\theta^{pq}_{j,k_{min}}), P_\theta(\theta^{pq}_{j,k_{max}}))$.
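A two-sided Bayesian p-value of this kind can be sketched in base R from the posterior draws of a single coefficient (an illustration with hypothetical draws, not the package's code):

```r
# Two-sided Bayesian p-value for H0: theta = 0, computed as twice the
# smaller posterior tail probability at zero.
bayes_p <- function(draws) 2 * min(mean(draws > 0), mean(draws < 0))

set.seed(3)
p_null <- bayes_p(rnorm(10000, 0, 1))  # posterior centered at 0: p near 1
p_eff  <- bayes_p(rnorm(10000, 3, 1))  # posterior far from 0: p near 0
```

A coefficient whose posterior mass lies almost entirely on one side of zero thus receives a small p-value, mirroring the classical two-sided test.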
The function table.pvalues() outputs the table of p-values:

R> table.pvalues(res)   # res is a fitted object from any of the models

Applications
In this section, we provide three applications of the BANOVA package to the analysis of previously published experimental studies. The first study, by Etkin and Ratner (2012), investigated how the perceived variety among products, as means to a goal, affects people's motivation to pursue that goal. In this application we illustrate a between-subjects ANOVA, with dependent variables that are, respectively, Normal and ordered categorical. The second study, by Ferraro, Kirmani, and Matherly (2013), examines the effects of conspicuous brand usage on consumers' attitudes toward a brand. In this application, we illustrate hierarchical ANCOVA models, with Normal and t-distributed dependent variables. The third study, by Wedel and Pieters (2014), investigates the effects of color on the rapid gist perception of advertising. In this study, we illustrate the application of a hierarchical ANOVA with both within- and between-subjects factors and a binomial dependent variable.

Application 1: Impact of the variety among means on motivation
In this example we illustrate the application of the BANOVA package to data from a study on goal attainment (Etkin and Ratner 2012). The study investigated how the perceived variety (high vs. low) among products, as means to a subject's goal, affects their motivation to pursue that goal. The hypothesis was that only when progress toward a goal is low does product variety increase motivation to pursue the goal. In the study, one hundred and five subjects were randomly assigned to one of four conditions in a 2 (goal progress: low vs. high) by 2 (variety among means: low vs. high) between-subjects design. The goal in the study was a "fitness goal", and the products used were protein bars; variety was manipulated by asking subjects to think about how the products were similar (low) or different (high); goal progress was primed by asking subjects questions about the frequency of their recent workouts, on low (0, 1, ..., 5 or more) versus high (5 or less, 6, 7, ..., 10) frequency scales. Subjects were asked questions regarding the similarity of the protein bars as a manipulation check, and made a bid for the bars; these serve as dependent variables in the study.
The data can be loaded by the following R command:

R> data(goalstudy)

The structure of the data is shown below:
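The structure of the loaded data frame can be inspected with standard base R functions, for example (str() and head() are base R, not part of BANOVA; the output is omitted here):

```r
R> str(goalstudy)
R> head(goalstudy)
```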
In the first analysis, we consider the log transform of the bid amount (log(bid + 1)) as the dependent variable, assumed to follow a Normal distribution. This analysis comprises a 2 (goal progress: low vs. high) x 2 (variety among means: low vs. high) between-subjects hierarchical Bayesian ANOVA of the bid amount. Since the study involves a between-subjects design, the within-subjects model only includes an intercept. The function BANOVA.Normal() is used to execute the analysis:

R> goalstudy$logbid <- log(goalstudy$bid + 1)
R> app_1 <- BANOVA.Normal(logbid~1, ~goalprogress*varmeans, goalstudy,
+    goalstudy$id, burnin = 5000, sample = 1000, thin = 20)

The posterior means and standard deviations of the hyperparameters are reported from 1,000 target samples, with a thinning factor of 20 to reduce autocorrelation, and with 5,000 samples discarded as the burn-in period, for a total of 25,000 samples. To confirm that the chain has converged after the burn-in, the conv.diag() function outputs Geweke's and the Heidelberg and Welch convergence diagnostics. The results indicate that the chains converged well before the end of the burn-in. The function trace.plot() provides visual diagnostics of convergence; some of the results are shown in Figure 2. The posterior means, standard deviations, 95% credible intervals and Bayesian p-values of the hyperparameters are computed as follows, and the results are shown below. Following standard conventions, we call an effect 'significant' if the 95% posterior credible interval of the parameter does not cover zero.

R> summary(app_1)
Call: BANOVA.Normal(l1_formula = logbid~1, l2_formula = ~goalprogress * varmeans, data = goalstudy, id = goalstudy$id, burnin = 5000, sample = 1000, thin = 20)

Based on the table of p-values and coefficients in the above results, the interaction between variety among means (varmeans) and goal progress (goalprogress) is significant. The table of means, produced with the command below, shows that when goal progress was low (goalprogress = 1), participants bid more for the products when perceived variety was high (varmeans = 2) versus low (varmeans = 1). Conversely, when goal progress was high, participants bid more when perceived variety was low versus high. To predict specific values of the dependent variable, the function predict() in R can be applied to the objects returned by BANOVA.*(). For example, to predict the value of the dependent variable for the 3rd subject in the data set (a situation of low goal progress and high variety among means), or for the entire data set, the following R commands can be used (the results are not shown):

R> table.means(app_1)
R> # predict the mean corresponding to goalprogress:1 and varmeans:2
R> predict(app_1, c(0, 0, 1, 2, 0, 0)) # all variables must have a value,
R> # but only the values of goalprogress and varmeans will be considered
R> # predict all training data
R> predict(app_1, goalstudy)

Since even the log-normal distribution may not describe the bid data very well, the bid amounts could also be analyzed assuming a Poisson distribution (there are only a few noninteger values, which are rounded). The following R commands construct the hierarchical Bayesian ANOVA model and summarize the results shown below.
R> goalstudy$bid <- as.integer(goalstudy$bid + 0.5)
R> app_1a <- BANOVA.Poisson(bid~1, ~goalprogress*varmeans, goalstudy,
+    goalstudy$id, burnin = 5000, sample = 1000, thin = 20)
R> summary(app_1a)

Call: BANOVA.Poisson(l1_formula = bid~1, l2_formula = ~goalprogress * varmeans, data = goalstudy, id = goalstudy$id, burnin = 5000, sample = 1000, thin = 20)

The next analysis is a manipulation check: the perceived similarity of the products is the dependent variable, which is expected to differ between the levels of the varmeans factor. Since it is a seven-point scale variable, an ordered multinomial distribution is used. The 2 (goal progress: low vs. high) x 2 (variety among means: low vs. high) hierarchical Bayesian ANOVA is executed using the function BANOVA.ordMultinomial() in the BANOVA package. Since the study involves a between-subjects design, the within-subjects model only includes an intercept. All between-subjects factors are included in the level-2 model. The analysis is done with the following commands; the results are provided below.
R> app_2 <- BANOVA.ordMultinomial(perceivedsim~1, ~goalprogress*varmeans,
+    goalstudy, goalstudy$id, burnin = 3000, sample = 1000, thin = 5)
R> summary(app_2)

Call: BANOVA.ordMultinomial(l1_formula = perceivedsim~1, l2_formula = ~goalprogress * varmeans, data = goalstudy, id = goalstudy$id, burnin = 3000, sample = 1000, thin = 5)

The chain converged well within the burn-in period (the convergence statistics are not shown here). The posterior means and standard deviations of the hyperparameters are reported from a total of 8,000 samples, with 3,000 discarded as the burn-in period and the remaining 5,000 samples thinned by a factor of 5. From the Bayesian p-values, we can see that only the variety condition (intercept: varmeans) has a significant effect on the perceived similarity of the products, and the table of sums of squares shows that the effect size is relatively large. The table of means is produced with the following command, and the result is provided below. To predict the means corresponding to the first two data points in goalstudy, the predict() command is used, which outputs the probabilities for each category of each data point. Note that, for convenience in the generation of the JAGS code, the program uses a uniform naming scheme for all level-1 and level-2 parameters, which differs from the names in the original data.
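Following the predict() usage illustrated for app_1 above, such a call might look as follows (the subsetting with goalstudy[1:2, ] is our illustration; the output of category probabilities is omitted):

```r
R> # predict the category probabilities for the first two data points
R> predict(app_2, goalstudy[1:2, ])
```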

Application 2: Conspicuous brand usage
We next illustrate the BANOVA package on data from a study that examines consumers' attitudes toward a brand after seeing another consumer conspicuously using it (Ferraro et al. 2013). Conspicuous brand usage occurs when a consumer draws attention to a brand she uses by flaunting it (Ferraro et al. 2013). One hundred fifty-four subjects from an online panel participated in the study. Conspicuousness was manipulated as a between-subjects factor, by exposing subjects to a forty-five-second video in which the conspicuous usage of the brand (Apple iPad) was manipulated (low vs. high conspicuousness). In addition, the so-called self-brand connection was measured: this refers to the extent to which a consumer's own self-concept matches the image she has of a certain brand. Brand attitude was calculated as the average of three seven-point scale questions (dislike/like, unfavorable/favorable and bad/good). The relation between conspicuousness, self-brand connection, and brand attitudes was investigated. The analysis aims to test the hypothesis that conspicuous brand usage has negative effects on attitudes toward the brand only for subjects with a low self-brand connection.
This example illustrates a hierarchical Bayesian ANCOVA model. Brand attitude is considered as a continuous dependent variable, assumed to follow a Normal distribution. Data of this study can be loaded by the following R command:

R> data(ipadstudy)
The data is displayed in the long format including only responses and between-subjects variables.
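The first rows of the data can be inspected, for instance, with base R's head() function (the output is omitted here):

```r
R> head(ipadstudy)
```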
The Bayesian ANOVA uses as a dependent variable the attitude toward the brand, measured by averaging answers on three seven-point scales. It can be executed using the function BANOVA.Normal() in the BANOVA package. Since it is a between-subjects design, the within-subjects model only includes an intercept. The between-subjects covariates include owner, age, gender, selfbrand and the interaction between conspic and selfbrand. The two-level factor conspic is effects coded, and the one-way between-subjects ANCOVA is specified as follows.
R> # mean center covariates
R> ipadstudy$age <- ipadstudy$age - mean(ipadstudy$age)
R> ipadstudy$owner <- ipadstudy$owner - mean(ipadstudy$owner)
R> ipadstudy$gender <- ipadstudy$gender - mean(ipadstudy$gender)
R> app_3 <- BANOVA.Normal(attitude~1, ~owner + age + gender + selfbrand*conspic,
+    ipadstudy, ipadstudy$id, burnin = 5000, sample = 1000, thin = 10)

The posterior means and standard deviations of the hyperparameters are reported from a total of 15,000 samples, with 5,000 discarded as the burn-in period and the remainder thinned by a factor of 10. The chain converged well within the burn-in period (the convergence statistics are not shown here). The table of sums of squares, effect sizes and Bayesian p-values, as well as the posterior means, standard deviations, 95% credible intervals, and Bayesian p-values of the hyperparameters, are computed as follows; the results are shown below.

R> summary(app_3)
Call: BANOVA.Normal(l1_formula = attitude~1, l2_formula = ~owner + age + gender + selfbrand * conspic, data = ipadstudy, id = ipadstudy$id, burnin = 5000, sample = 1000, thin = 10)

Based on the above estimates, conspicuousness, self-brand connection, and the conspicuousness x self-brand connection interaction significantly affect the attitude toward the brand, consistent with Ferraro et al. (2013). Note that since we applied ANCOVA, a type III analysis of variance is used, so that the sum of squares (and effect size) for each effect is computed conditional upon all other effects; the sums of squares thus do not add up to the total sum of squares.
In this application, the distribution of the dependent variable is continuous, but it may not be Normal. The function BANOVA.T() can be applied to construct an ANOVA model in which the response variable is assumed to follow a Student's t distribution. This permits (weakly) robust inference (Bernardo and Giron 1992), as it allows for fatter tails and outliers in the data. The results, shown below, are similar to those above, which supports their robustness to distributional assumptions; they are not discussed further here.
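Assuming that BANOVA.T() takes the same arguments as BANOVA.Normal() (the object name app_4 and the MCMC settings below are our illustration, not from the original study), the robust analysis could be specified as:

```r
R> app_4 <- BANOVA.T(attitude~1, ~owner + age + gender + selfbrand*conspic,
+    ipadstudy, ipadstudy$id, burnin = 5000, sample = 1000, thin = 10)
R> summary(app_4)
```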

Application 3: Gist perception of advertising
We finally illustrate the application of the BANOVA package with a study of the influence of color on advertising gist perception, i.e., the very rapid identification of ads during brief exposures. Specifically, we analyze the effect of color on the perception of the gist of advertising when the advertising exposure is blurred (Wedel and Pieters 2014). In the study, one hundred and sixteen subjects were randomly assigned to one condition of a 5 (blur: normal, low, medium, high, very high) x 2 (color: full color, grayscale) between-participants x 2 (image: typical ads, atypical ads) within-participants mixed design. Participants were exposed to 40 images: 32 full-page ads and 8 editorial pages. There were 8 ads per product category, with 4 typical and 4 atypical ones. Blur was manipulated by processing the advertising images with Gaussian blur filters of different radii. Subjects were asked to identify, after being flashed an image for 100 msec., whether the image was an ad or not.
The data included in the package can be loaded into R using the data() function, i.e., using the following R code:

R> data(colorad)
The structure of colorad is shown below using the head() function. It is in long format including both within-subjects and between-subjects variables.

R> head(colorad)
  id typic  y blurfac color   blur
1  1     0  8       2     1 3.6889
2  1     1  6       2     1 3.6889
3  2     0 12       4     0 4.7875
4  2     1  6       4     0 4.7875
5  3     0 11       2     0 3.6889
6  3     1  9       2     0 3.6889

Here, the within-subjects variable typic is a factor with 2 levels, '0' (typical ads) and '1' (atypical ads). The between-subjects variables are: blur, a numerical variable representing the blur of the image (the log-radius of the Gaussian blur filter used to produce the images); blurfac, a factor variable with the five levels of blur; and color, a factor representing the color of the ads with 2 levels, '0' (full color) and '1' (grayscale). id is the subject identification number. The dependent variable y is the number of times ads were correctly identified as an ad, out of the 16 ads, for each subject at each level of typic.
We are interested in the effects of within-and between-subject factors typic, and color, and the variable blur, as well as their interactions. The factor typic varies within individuals; the factors blur, color and blur x color interaction vary between individuals.
The analysis of this experiment is executed with the function BANOVA.Bin() in the BANOVA package. First, the continuous covariate blur is mean centered. The R code to implement the analysis is shown below.
R> data(colorad)
R> # mean center blur for effect coding
R> colorad$blur <- colorad$blur - mean(colorad$blur)
R> app_5 <- BANOVA.Bin(y~typic, ~color*blur, colorad, colorad$id,
+    as.integer(16), burnin = 3000, sample = 2000, thin = 5)

The posterior means and standard deviations of the hyperparameters are reported from a total of 13,000 samples, with 3,000 discarded as the burn-in period and the remainder thinned by a factor of 5 to reduce autocorrelation. To confirm that the chain has converged after the burn-in, the conv.diag(app_5) command outputs the convergence diagnostics, and trace.plot(app_5) provides visual diagnostics of convergence (the results are not shown here, but they indicate that the chains converged well before the end of the burn-in).
The posterior means, standard deviations, 95% credible intervals and Bayesian p-values of hyper parameters are computed with the following command, and the results are shown below.

R> summary(app_5)
Call: BANOVA.Bin(l1_formula = y~typic, l2_formula = ~color * blur, data = colorad, id = colorad$id, num_trails = as.integer(16), burnin = 3000, sample = 2000, thin = 5)

Based on the above estimates, ad identification is significantly influenced by ad typicality (typic): typical ads are identified as ads more accurately than less typical ones. The accuracy of ad identification is also affected by the degree of blur. The three-factor interaction (blur x color x typic) is also significant, which reveals that color protects the identification of typical ads against blur (Wedel and Pieters 2014).
These results are based on the ANCOVA model with blur as a continuous covariate. To further understand the effects of blur, we can use the discrete blur variable (blurfac) in a two-way ANOVA at the between-subjects level (with the factor typic again within-subjects), using the following command:

R> app_6 <- BANOVA.Bin(y~typic, ~color*blurfac, colorad, colorad$id,
+    as.integer(16), burnin = 20000, sample = 3000, thin = 5)

Since this model involves more parameters, a larger number of burn-in and target samples is used to ensure that all parameters converge: a total of 35,000. The table of sums of squares, effect sizes and p-values, as well as the posterior means, standard deviations, 95% credible intervals and Bayesian p-values of the parameters, are produced with the function summary(); the results are presented below.

R> summary(app_6)
Call: BANOVA.Bin(l1_formula = y~typic, l2_formula = ~color * blurfac, data = colorad, id = colorad$id, num_trails = as.integer(16), burnin = 20000, sample = 3000, thin = 5)

We first inspect the tables of sums of squares, effect sizes and p-values. In these tables, the columns denote between-subjects factors and the rows denote within-subjects factors; the values present the sums of squares and effect sizes of the effects of the between-subjects factors on the within-subjects factors. Again, the accuracy of ad identification is affected by blur, and to a lesser extent by color. From the tables of p-values, ad typicality (the value corresponding to the row name 'typic' and column name '(Intercept)') and the degree of blur (the value corresponding to the row name '(Intercept)' and column name 'blurfac') are again highly significant. There is also support for a main effect of color. The three-factor interaction (blurfac x color x typic) is also significant, which again shows that color protects the identification of typical ads against blur (Wedel and Pieters 2014). The conclusions from the table of estimates are similar to those from the previous model, but this table allows us to inspect the effects of each level of blur, and its interactive effects with color and typicality.
Through the tables of means for all factors and their interactions, we can inspect these effects in more detail. These are produced with the function table.means(). From the above tables, we can see that typical color ads (typic = 0, color = 0) are always identified more accurately than atypical color ads (typic = 1, color = 0). Typical grayscale ads (typic = 0, color = 1, blur = 1,...,5), however, are only identified more accurately than atypical grayscale ads (typic = 1, color = 1, blur = 1,...,5) when there is no blur, or a low level of blur (Wedel and Pieters 2014).

Discussion
This paper has presented the R package BANOVA, which can be used to analyze experimental data with a wide variety of hierarchical Bayesian ANOVA models. The response variable can be Normal, Student's t, binomial, multinomial or Poisson, and the between-subjects model follows the traditional ANOVA or ANCOVA structure, which allows the estimation of sums of squares and effect sizes for each experimental factor. In the behavioral sciences there is an abundance of research that lends itself to the application of these types of analyses. We hope that the availability of user-friendly software in the form of the BANOVA package will stimulate the analysis of these studies under more reasonable assumptions on the distribution of the data and its hierarchical structure. This is expected to reduce both the type I and type II errors made in this research.
There are a number of other R packages that can be used to fit hierarchical Bayes models, including BACC (Geweke 1999), bayesm (Rossi and McCulloch 2006) and MCMCpack (Martin, Quinn, and Park 2011), as well as the WinBUGS, OpenBUGS and JAGS software. However, all of these either require the user to be familiar with Bayesian statistical modeling and BUGS programming, or are not suitable for ANOVA analysis of experimental data. The motivation for the development of the BANOVA package is therefore to overcome these limitations and to offer user-friendly routines for the applied researcher. These were illustrated in three applications: (1) models can be run with only a single function call, (2) only a small number of settings are required to run each model, (3) outputs are provided in summary tables, such as the table of effect sizes, the table of p-values and the tables of means, which are easy to interpret for the applied researcher.
The package utilizes the JAGS software, which offers many advantages from a modeling perspective, while in addition the JAGS code is produced as a by-product of the analysis and available for the applied researcher to inspect and modify. However, a disadvantage of this choice is that this decreases the computation efficiency. Future developments could focus on reprogramming the MCMC code in languages such as C++ or Java. In addition, the package can be extended with other models, for example using different distributions of the dependent variable.