A Fortran 90 Program for Confirmatory Analysis of Variance

There are different confirmatory techniques to compare means, such as hypothesis testing and (Bayesian) model selection. However, there is no software package in which these techniques are available. A Fortran 90 program was written that enables researchers to apply these techniques to their data. Besides traditional hypotheses, like H0: µ1 = µ2 = µ3 and Hu: µ1, µ2, µ3, order-restricted hypotheses, like µ1 > µ2 > µ3 or µ1 > µ2 = µ3 or µ1 > µ2 < µ3, can be evaluated.


Introduction
Often researchers have a theory with respect to the ordering of the means in an experiment. These theories can be written as order-restricted hypotheses (e.g., µ1 > µ2 > µ3) and can be tested with confirmatory methods. In the context of comparing independent means, that is, analysis of variance (ANOVA), three approaches are distinguished. Silvapulle and Sen (2005, pp. 25-42) present the F̄ test. There are two types of F̄ tests, namely the ordered alternative and the ordered null. In the ordered alternative, the classical null (H0) is tested against an order-restricted hypothesis, for example, H1. In the ordered null, an order-restricted hypothesis, like H1, is tested against the classical alternative (Hu). Anraku (1999) introduces the order-restricted information criterion (ORIC). The ORIC can be used to select the best of a set of order-restricted hypotheses, like the hypotheses in (2). Klugkist, Laudy, and Hoijtink (2005) present a Bayesian model selection (BMS) criterion, which can be used in the same context as the ORIC.
For these approaches, user-friendly software is not available. In this paper, software is introduced with which the three confirmatory approaches can be executed.
The model used in this paper and in the software is the ANOVA model:

y_ij = µ_i + ε_ij,   (1)

where i = 1, . . . , k, j = 1, . . . , n_i, y_ij is the jth observation of the dependent variable for group i, which has n_i observations, µ_i is the mean of group i, and ε_ij is an error term. The error terms are independently and normally distributed with expected value 0 and variance σ², that is, ε_ij ∼ N(0, σ²).
Throughout this paper, the following set of hypotheses, concerning an experiment with five group means, is used for illustration:

H0: µ1 = µ2 = µ3 = µ4 = µ5,
H1: µ5 = µ3 > {µ1, µ4} > µ2,
H2: µ3 > µ1 > µ4 = µ5 > µ2,   (2)
H3: µ1, µ2, µ3, µ4, µ5.

The hypotheses H0 and H3 are the classical hypotheses; the other two are order-restricted hypotheses. It is also possible to specify a set of models/hypotheses without the classical null H0: µ1 = . . . = µk and/or the classical alternative Hu: µ1, . . . , µk. We recommend including Hu (when doing model selection) as a safeguard against choosing a weak hypothesis (Kuiper and Hoijtink 2010). Furthermore, one should include H0 only when there is real interest in H0.
Although software for exploratory approaches is widely available - e.g., classical hypothesis testing in SPSS (SPSS Inc. 2006) and model selection using information criteria, like the AIC, in R with the package nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Development Core Team 2009) - this is not the case for confirmatory approaches, that is, for the evaluation of the four hypotheses in (2). The software presented in this paper can evaluate different types of hypotheses, which can be formulated by Aµ ≥ 0, for some matrix A in which each row is a permutation of the k-vector (−1, 1, 0, . . . , 0) and µ = (µ1, . . . , µk)′.
Note that factorial ANOVA models also fit in the framework presented. For instance, a 2×3 design can be represented by (1) with k = 6. Specific hypotheses about the six group means can again be formulated by Aµ ≥ 0 (as explained earlier). In a standard two-way ANOVA, three hypotheses are tested, concerning the presence of a main effect of the first factor, a main effect of the second factor, and an interaction effect. If one or more of these effects are found, further evaluation is required to describe the direction of the effects (making the approach exploratory). In a confirmatory approach, in contrast to an exploratory approach, expected patterns are specified beforehand. For instance, H: {µ1 = µ2 = µ3}, {µ4 < µ5 < µ6} is a prespecified and specific interaction effect. Competing (interaction) effects can also be specified.
In the next section, the F̄ test, the ORIC, and Bayesian model selection will be briefly explained in turn. In Section 3, a practical example is provided and analyzed using each of the three approaches. The appendix contains a user manual for the software.
2. Three confirmatory techniques for comparing means

2.1. Hypothesis testing using the F-bar (F̄) statistic

In classical statistical testing, the hypothesis that all means are equal is tested against the alternative that not all means are equal. This is usually done with an F test in a one-way ANOVA. However, researchers often want to test a certain order restriction, because of a theory with respect to the order of the means in the experiment. See, for example, H1 and H2 in (2).
In Silvapulle and Sen (2005, pp. 25-42), the F̄ statistic is defined as

F̄ = [RSS(H_null) − RSS(H_alt)] / S²,

where RSS(H) is the residual sum of squares under hypothesis H and S² = (n1 + · · · + nk − k)^{-1} Σ_i Σ_j (y_ij − ȳ_i)² is the mean square error, with n1 + · · · + nk − k the error degrees of freedom. This is applied to the two types of test. For each of the two tests, RSS(H_null) and RSS(H_alt) are elaborated on below.
The first test is the ordered alternative: H0: µ1 = . . . = µk is tested against an order restriction of the form H1: Aµ ≥ 0, for some matrix A in which each row is a permutation of the k-vector (−1, 1, 0, . . . , 0) and µ = (µ1, . . . , µk)′. In this test,

RSS(H0) = Σ_i Σ_j (y_ij − ȳ)²,

where ȳ is the overall mean, and

RSS(H1) = min_{Aµ≥0} Σ_i Σ_j (y_ij − µ_i)².

Since Σ_i Σ_j (y_ij − µ_i)² can be rewritten as

Σ_i Σ_j (y_ij − ȳ_i)² + Σ_i n_i (ȳ_i − µ_i)²,

where ȳ_i is the ith group/treatment mean, the minimization comes down to minimizing

q(µ) = Σ_i n_i (ȳ_i − µ_i)²

subject to Aµ ≥ 0, with minimizer the order-restricted estimate

µ̃ = arg min_{Aµ≥0} Σ_i n_i (ȳ_i − µ_i)².   (3)

This constrained minimization problem, where the objective function q(µ) is quadratic in µ and the (equality and inequality) constraints are linear in µ, is a quadratic programming problem. There are efficient computer algorithms for this minimization problem; here the IMSL subroutine QPROG (Visual Numerics 2003, pp. 1307-1310) is used in the Fortran 90 program.

The second test is the ordered null, in which a null of the form H1: Aµ ≥ 0 (as explained earlier) is tested against the alternative with no restrictions on the µ_i, that is, Hu: µ1, . . . , µk. Then RSS(H1) is obtained via the same quadratic program and RSS(Hu) = Σ_i Σ_j (y_ij − ȳ_i)².

As with classical hypothesis testing, p values must be determined. The exact p value for F̄ can be obtained via simulation. In the ANOVA model, the errors are normally distributed, that is, ε_ij ∼ N(0, σ²). Therefore, the simulation consists of the following three steps (Silvapulle and Sen 2005, pp. 32-33 and 40):

1. Generate independent observations z_ij (i = 1, . . . , k and j = 1, . . . , n_i) from the standard normal distribution N(0, 1).
2. Compute F̄ for the generated data.
3. Repeat the previous two steps R_p times. In the program, the default value is R_p = 100,000. Count the number of times the F̄ statistic calculated in Step 2 exceeds the sample value of the F̄ statistic; this number is denoted by M. The p value is then M/R_p.
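The three steps above can be sketched as follows, in Python rather than Fortran, and restricted to the simple order µ1 ≤ . . . ≤ µk, for which the order-restricted estimate in (3) can be computed with the pool-adjacent-violators algorithm instead of QPROG. The function names and the use of Python are illustrative assumptions, not part of the actual program.

```python
import random
import statistics

def pava(means, weights):
    """Pool-adjacent-violators: weighted least-squares fit of a
    non-decreasing sequence to the group means (the order-restricted
    estimate for the simple order mu_1 <= ... <= mu_k)."""
    blocks = []  # each block: [value, weight, number of pooled groups]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # pool adjacent blocks while the order is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, w, c in blocks:
        fitted.extend([m] * c)
    return fitted

def fbar(groups):
    """F-bar statistic for the ordered alternative:
    H0 (equal means) against H1: mu_1 <= ... <= mu_k."""
    k, n = len(groups), [len(g) for g in groups]
    N = sum(n)
    ybar = sum(y for g in groups for y in g) / N
    gmeans = [statistics.fmean(g) for g in groups]
    mtilde = pava(gmeans, n)  # order-restricted estimate, cf. (3)
    rss0 = sum((y - ybar) ** 2 for g in groups for y in g)
    rss1 = sum((y - m) ** 2 for g, m in zip(groups, mtilde) for y in g)
    s2 = sum((y - m) ** 2 for g, m in zip(groups, gmeans) for y in g) / (N - k)
    return (rss0 - rss1) / s2

def fbar_pvalue(groups, reps=2000, seed=123):
    """Steps 1-3: simulate the null distribution of F-bar and
    return the proportion of simulated values exceeding the observed one."""
    rng = random.Random(seed)
    obs = fbar(groups)
    n = [len(g) for g in groups]
    exceed = 0
    for _ in range(reps):
        z = [[rng.gauss(0.0, 1.0) for _ in range(ni)] for ni in n]
        exceed += fbar(z) >= obs
    return exceed / reps
```

For clearly ordered data the observed F̄ is large and the simulated p value is correspondingly small.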
When the p value is smaller than the nominal α level, often set to 0.05, the null hypothesis is rejected. Thus, in the ordered alternative, H0 is rejected when p < α, and in the ordered null, the order-restricted hypothesis is rejected when p < α.

2.2. Model selection using the order-restricted information criterion

Anraku (1999) proposes the order-restricted information criterion (ORIC). It can be used to select the best of a set (M) of models/hypotheses H_m, m ∈ M. The set of hypotheses can contain H0, Hu, and order-restricted hypotheses of the form Aµ ≥ 0, for some matrix A in which each row is a permutation of the k-vector (−1, 1, 0, . . . , 0) and µ = (µ1, . . . , µk)′.

Like other information criteria, the ORIC is based on the log likelihood (log L) and a penalty term (PT): ORIC = −2 log L + 2PT. The hypothesis with the smallest ORIC value is the preferred hypothesis.
The maximum likelihood estimates (mle's) µ̂_m and σ̂²_m for H_m, m ∈ M, are the values of µ and σ², respectively, that maximize the log likelihood for H_m:

log L_m = −(N/2) log(2π σ̂²_m) − N/2,

where σ̂²_m = (1/N) Σ_i Σ_j (y_ij − µ̂_mi)² and N = Σ_i n_i. Because of the order restrictions, the order-restricted mle µ̂_m must be found (Anraku 1999). Since σ̂²_m absorbs the term Σ_i Σ_j (y_ij − µ̂_mi)², the maximization of the likelihood comes down to minimizing log(σ̂²_m), subject to the restrictions µ_i − µ_i′ ≥ 0 imposed by H_m (Silvapulle and Sen 2005, pp. 42-43). This results in the order-restricted mle µ̂_m as defined in (3).
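Once the (order-restricted) mean estimates of a hypothesis are available, the fit part of the ORIC follows directly from the expressions above. A small Python sketch (the function name and the use of Python are illustrative assumptions, not the program's actual code):

```python
import math

def oric(groups, mhat, penalty):
    """ORIC = -2 log L + 2 PT for an ANOVA hypothesis, given the
    (order-restricted) mean estimates mhat (one value per group) and
    the penalty term PT (expected number of distinct parameters)."""
    N = sum(len(g) for g in groups)
    # sigma^2 hat: mean squared deviation from the restricted means
    sigma2 = sum((y - m) ** 2 for g, m in zip(groups, mhat) for y in g) / N
    # maximized log likelihood of the normal ANOVA model
    loglik = -0.5 * N * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * loglik + 2 * penalty
```

For H0 one would pass the grand mean for every group with penalty 2; for Hu the observed group means with penalty k + 1.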
The penalty term of H_m (PT_m) equals the expected number of distinct parameters under H_m:

PT_m = 1 + Σ_{l=1}^{q_m−1} l · LP_l(q_m − 1, V_m),

where V_m is explained below, LP_l(q_m − 1, V_m) is a level probability, that is, the probability that there are l distinct mean values / levels among the q_m − 1 means in the order-restricted mle, and q_m − 1 ≤ k is the number of distinct µ_i's in hypothesis m. The number of distinct parameter values equals q_m because of the unknown variance term. For example, in case k = 5 and the hypothesis is H1: µ5 = µ3 > {µ1, µ4} > µ2, there are 4 distinct mean values: µ5 and µ3 share one distinct value, and µ1, µ4, and µ2 are each a distinct value; together with the unknown variance term, q_1 = 5. Note that the numbers of observations for the four distinct mean values are ñ_1 = n_3 + n_5, ñ_2 = n_1, ñ_3 = n_4, and ñ_4 = n_2. The computation of the level probabilities (corresponding to the restrictions) can be done via simulation (Silvapulle and Sen 2005, pp. 78-81), consisting of five steps (where for convenience the subscript m is left out):

1. Generate Z (of dimension q − 1) from N(0, V), where q − 1 equals the number of distinct µ_i values in the hypothesis. It holds that V = diag{1/ñ_1, . . . , 1/ñ_{q−1}}, where ñ_l is the number of observations of group l (l = 1, . . . , q − 1).

2. Compute Z̃ via (3), that is,

Z̃ = arg min_{µ∈H} Σ_{l=1}^{q−1} ñ_l (Z_l − µ_l)²,

where H is the order-restricted hypothesis.

3. Determine the number of distinct values in Z̃, called levels; denote this by s.

4. Repeat the previous steps R_PT times. In the program, the default value is R_PT = 100,000.

5. Estimate the level probability LP_l(q − 1, V) by the proportion of the R_PT replications in which s equals l.
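These simulation steps can be mimicked for a simple order among the distinct means, where the projection in Step 2 is again pool-adjacent-violators rather than QPROG. A Python sketch under those assumptions (function names are illustrative):

```python
import random

def pava(means, weights):
    """Weighted least-squares projection onto a non-decreasing order."""
    blocks = []  # each block: [value, weight, number of pooled means]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    out = []
    for m, w, c in blocks:
        out.extend([m] * c)
    return out

def penalty_term(n_tilde, reps=20000, seed=42):
    """Steps 1-5: estimate PT = 1 + expected number of levels for a
    simple order among the distinct means, with V = diag(1/n_tilde)."""
    rng = random.Random(seed)
    total_levels = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0 / ni ** 0.5) for ni in n_tilde]  # step 1
        zt = pava(z, n_tilde)                                    # step 2
        s = 1 + sum(a != b for a, b in zip(zt, zt[1:]))          # step 3
        total_levels += s                                        # steps 4-5
    return 1.0 + total_levels / reps
```

For three distinct means with equal group sizes, the level probabilities are 1/3, 1/2, and 1/6, so the estimated penalty should be close to 1 + 11/6 ≈ 2.83.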
The penalty term can thus be seen as the expected number of distinct parameters, that is, the expected number of distinct mean values plus 1 (because of the unknown variance term).

2.3. Bayesian model selection

Klugkist et al. (2005) present the (Bayesian) encompassing prior approach for order-restricted hypotheses in ANOVA. The model selection criterion used is the Bayes factor (Kass and Raftery 1995; Chib 1995), which is the ratio of the marginal likelihoods of two hypotheses, say H_m and H_m′:

BF_mm′ = m(y | H_m) / m(y | H_m′), with m(y | H_m) = ∫∫ f(y | µ, σ²) p(µ, σ² | H_m) dµ dσ²,   (5)

where p(µ, σ² | H_m) denotes the prior distribution of the model parameters under H_m; this prior and the corresponding posterior p(µ, σ² | y, H_m) will be elaborated upon later.
In the encompassing prior approach, a prior p(µ, σ² | H_u) is specified for the unconstrained hypothesis Hu: µ1, . . . , µk. The prior distribution of any hypothesis H_m nested in H_u follows from the encompassing prior, using

p(µ, σ² | H_m) = p(µ, σ² | H_u) I_{µ∈H_m} × [∫∫ p(µ, σ² | H_u) I_{µ∈H_m} dµ dσ²]^{-1},   (6)

where the indicator function I_{µ∈H_m} has the value one if the argument is true, that is, if the parameter values are in accordance with the constraints imposed by H_m, and zero otherwise.
The encompassing prior is specified as follows (Klugkist et al. 2005). All model parameters are a priori considered to be independent, that is, p(µ, σ² | H_u) = p(µ1) × · · · × p(µk) × p(σ²), and the prior distributions for all means are equal, that is, p(µ1) = . . . = p(µk). As will be shown in the sequel, for each parameter a relatively uninformative, conjugate prior is specified, that is, p(µ_i) = N(µ0, τ0²) for i = 1, . . . , k, and p(σ²) = Inv-χ²(1, σ0²).
In sum, the encompassing prior for the ANOVA model is

p(µ, σ² | H_u) = Π_{i=1}^{k} N(µ_i | µ0, τ0²) × Inv-χ²(σ² | 1, σ0²).   (7)

Combination of (6) and (7) gives the prior distribution (up to proportionality) of any order-restricted hypothesis H_m:

p(µ, σ² | H_m) ∝ Π_{i=1}^{k} N(µ_i | µ0, τ0²) × Inv-χ²(σ² | 1, σ0²) × I_{µ∈H_m}.   (8)

In a similar way as in (6), the posterior of any hypothesis H_m is

p(µ, σ² | y, H_m) = p(µ, σ² | y, H_u) I_{µ∈H_m} × [∫∫ p(µ, σ² | y, H_u) I_{µ∈H_m} dµ dσ²]^{-1}.   (9)
The posterior distribution is proportional to the density of the data times the prior distribution, that is,

p(µ, σ² | y, H_m) ∝ f(y | µ, σ²) × p(µ, σ² | H_u) × I_{µ∈H_m}.   (10)

The encompassing posterior p(µ, σ² | y, H_u) is (10) where the indicator function always equals one. Klugkist et al. (2005) have shown that (5), when using the prior in (6) and subsequent posterior in (9), leads to a simple form for the Bayes factor of a nested hypothesis H_m versus the unconstrained hypothesis H_u:

BF_mu = c_m / d_m,   (11)

where c_m and d_m are the last terms (between the large brackets) of (6) and (9), respectively. The inverses of these constants, c_m^{-1} and d_m^{-1}, are the proportions of the encompassing prior and posterior, respectively, in agreement with the constraints of hypothesis H_m. Estimation of these proportions is straightforward using sampling. This means that, in the context of order-restricted ANOVA, Bayes factors can be obtained without the often burdensome estimation of marginal likelihoods.
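The identity (11) can be illustrated with a small Monte Carlo sketch: sample from the encompassing prior and, via a Gibbs sampler for the conjugate model, from the encompassing posterior, and estimate BF_mu as the ratio of the proportions of draws satisfying the constraints. The Python code below is a simplified stand-in for the Fortran implementation; the function names, hyperparameter values, and use of Python are assumptions for illustration (note that for k exchangeable means, the prior proportion of a full ordering is 1/k!).

```python
import random

def bf_mu(groups, constraint, mu0, tau0, sigma0sq,
          reps=20000, burn=500, seed=7):
    """Estimate BF_mu = c_m / d_m as the posterior proportion of draws
    satisfying `constraint`, divided by the prior proportion."""
    rng = random.Random(seed)
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    gmeans = [sum(g) / len(g) for g in groups]

    # prior proportion c_m^{-1}: mu_i drawn independently from N(mu0, tau0^2)
    # (the constraint involves only the means, so sigma^2 can be ignored here)
    c_inv = sum(constraint([rng.gauss(mu0, tau0) for _ in range(k)])
                for _ in range(reps)) / reps

    # posterior proportion d_m^{-1}: Gibbs sampler for the conjugate model
    mu, sigma2 = list(gmeans), 1.0
    hits = 0
    for it in range(reps + burn):
        for i in range(k):
            prec = 1 / tau0 ** 2 + n[i] / sigma2
            mean = (mu0 / tau0 ** 2 + n[i] * gmeans[i] / sigma2) / prec
            mu[i] = rng.gauss(mean, prec ** -0.5)
        sse = sum((y - mu[i]) ** 2 for i, g in enumerate(groups) for y in g)
        # scaled inverse-chi^2 draw with nu = 1 + N degrees of freedom
        sigma2 = (sigma0sq + sse) / rng.gammavariate((1 + N) / 2, 2.0)
        if it >= burn and constraint(mu):
            hits += 1
    d_inv = hits / reps
    return d_inv / c_inv
```

For data with clearly ordered group means, the posterior proportion is near one and BF_mu approaches the reciprocal of the prior proportion (about 6 for a full ordering of three means).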

Specification of the encompassing prior
To complete the specification of the prior distribution, values must be assigned to the hyperparameters µ0, τ0², and σ0². Klugkist and Hoijtink (2007) showed that Bayes factors for hypotheses formulated using inequality constraints among parameters are insensitive to the exact specification, as long as the encompassing prior is relatively vague. However, the results for hypotheses containing equality constraints are sensitive to the choice of τ0². Although we want relatively uninformative priors, that is, a large τ0², too large values result in Bartlett's or Lindley's paradox (cf. Lindley 1957; Bernardo and Smith 1994). Hence, to get reasonable values for the hyperparameters, the prior specification is data based. A Gibbs sample (Smith and Roberts 1993) is drawn from the unconstrained posterior p(µ, σ² | y, H_u) ∝ L(µ, σ² | y) × p(µ, σ² | H_u), where p(µ, σ² | H_u) ∝ 1. Summaries of the posterior sample provide values for the hyperparameters according to the following choices. For σ0², the posterior mean of σ² is used. This provides a value that is reasonable for the data at hand and, with 1 degree of freedom, a posterior that is hardly affected by the prior.
To obtain µ0 and τ0², the information about each of the µ's in the posterior sample is combined as follows. Based on the posterior sample, a credibility interval for each µ_i (i = 1, . . . , k) is determined by µ̄_i ± pv · s_{µ_i}, where µ̄_i and s_{µ_i} are the mean and standard deviation of the sampled values for µ_i, respectively, and pv stands for prior vagueness. The pv value specifies which interval is used (e.g., pv = 2 provides the 95% credibility interval) and allows the user a choice for the amount of vagueness in the encompassing prior. Subsequently, the smallest lower bound (lb) and the largest upper bound (ub) of the k intervals define one broad interval containing all reasonable values for each of the µ's. From this interval, µ0 = (lb + ub)/2 and τ0 = (ub − lb)/2 are specified.
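The construction of µ0 and τ0 from a posterior sample can be written compactly; a Python sketch (illustrative, not the program's actual code):

```python
import statistics

def prior_hyperparameters(mu_samples, pv=2.0):
    """Given a posterior sample of the group means (mu_samples[i] is the
    list of sampled values for mu_i), return (mu0, tau0) from the
    pv-interval construction described above."""
    lbs, ubs = [], []
    for draws in mu_samples:
        m = statistics.fmean(draws)
        s = statistics.stdev(draws)
        lbs.append(m - pv * s)   # lower bound of the pv-interval for mu_i
        ubs.append(m + pv * s)   # upper bound of the pv-interval for mu_i
    lb, ub = min(lbs), max(ubs)  # one broad interval covering all means
    return (lb + ub) / 2, (ub - lb) / 2
```

Larger pv values widen the broad interval and hence produce a vaguer encompassing prior.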
For hypotheses containing equality constraints, the value specified for pv will affect the resulting Bayes factors. Larger pv values provide more support for hypotheses containing equality constraints (i.e., Lindley's paradox). Specification of the pv value by the user also provides the option to investigate prior sensitivity by running the program several times with different values. This is recommended if one or more of the hypotheses contain equality constraints among the means.

Estimation of the Bayes factor
The computation of Bayes factors of order-restricted hypotheses versus the unconstrained hypothesis is straightforward: take a sample from the unconstrained prior (7) and a sample from the unconstrained posterior, that is, (10) with I_{µ∈H_m} = 1 for all µ. Samples are obtained by application of the Gibbs sampler (Smith and Roberts 1993). The proportions of prior and posterior iterations in agreement with the order-restricted hypotheses provide estimates for c_m^{-1} and d_m^{-1}, respectively. However, for a hypothesis containing at least one strict equality (e.g., µ1 = µ2), the direct application of this approach would result in the problematic outcome c_m^{-1} = d_m^{-1} = 0. Therefore, the program evaluates 'about equality' constraints instead, that is, |µ1 − µ2| < δ for a small positive δ. Two options are provided in the software: researchers can investigate 'relevant differences' between means by specifying a non-zero δ, or strict equality constrained hypotheses can be evaluated, in which case δ approaches zero in a stepwise manner. The latter requires an extension of the basic approach and includes constrained sampling. This will be elaborated in the next section.
For hypotheses that only require unconstrained sampling, estimation of Bayes factors in the software is based on a minimum of R_BMS iterations from both prior and posterior. The default value is R_BMS = 500,000. The posterior sample is taken after discarding 1,000 iterations that serve as burn-in. Note that sampling from the unconstrained prior does not require a burn-in period, since all parameters are a priori independent.
For highly constrained hypotheses, the default value of R_BMS = 500,000 iterations may be insufficient to obtain stable estimates of d_m and c_m. Therefore, some additional rules are incorporated in the program when the default setting is chosen: for more than 6 groups, the numbers of iterations from prior and posterior are doubled; for more than 10 groups, they are set at 5 million. Furthermore, another rule is incorporated whether the default setting is used or not: if a minimum of 100 prior 'hits' (iterations in agreement with the constraints) is not reached for each hypothesis in the set, more iterations are added until this is the case. For an elaboration on the investigation of the stability and Monte Carlo errors of Bayes factors computed via (11), see Klugkist and Hoijtink (2007).
Stepwise estimation for small δ

For small values of δ, the estimation as just described would be rather inefficient. Furthermore, for δ = 0 it would give the result c_m^{-1} = d_m^{-1} = 0. Therefore, a procedure is applied in which the unconstrained samples are evaluated with a not too small initial δ value, denoted δ0, followed by a procedure that decreases δ0 in a stepwise way, using δ_r = δ_{r−1}/3, for steps r = 1, . . . , R (see also Klugkist 2008).
The stepwise procedure is based on the following product rule for the Bayes factor of H_m versus the unconstrained H_u:

BF_mu = BF_{m_δ0, u} × BF_{m_δ1, m_δ0} × · · · × BF_{m_δR, m_δR−1}.   (12)

Consider the case where H_m includes both inequality and strict equality (δ = 0) constraints. The notation m_δr is used to denote the constraints of H_m, where the desired value δ = 0 is replaced by a larger value δ_r (r = 0, . . . , R).
The first Bayes factor in (12), BF_{m_δ0, u} = c_{m_δ0} / d_{m_δ0}, requires sampling from the unconstrained prior and posterior and counting the number of iterations in agreement with all constraints in m_δ0, that is, the order restrictions as well as the equalities evaluated with δ0. The second and subsequent steps require constrained sampling, that is, sampling from (8) and (10). Consider c_{m_δ1, m_δ0}^{-1}: it is the proportion of draws, sampled from the prior of H_m with δ replaced by δ0, that are in agreement with H_m with δ replaced by δ1 (δ1 < δ0). Similarly, d_{m_δ1, m_δ0}^{-1} is based on a sample from the posterior constrained to the area that complies with the constraints of m_δ0. Constrained sampling is also done by application of the Gibbs sampler, with inverse probability sampling to obtain samples in agreement with the constraints (see Gelfand, Smith, and Lee 1992).
In each step, H_m is evaluated with a smaller δ_r value. The stepwise Bayes factors represent the change in the estimated BF_mu as a consequence of the decrease in δ_r. At a certain point, a further decrease of δ_r no longer changes the Bayes factor, that is, for large enough R, BF_{m_δR, m_δR−1} → 1 (Berger and Delampady 1987). This implies that a good approximation of BF_mu with exact equalities is obtained.
To obtain an efficient estimation procedure, it is important to start with a large enough δ0. In the software, the starting value is prior based and equals τ0/2 (τ0 for more than 8 groups), unless this value is smaller than the user-specified δ, in which case δ is evaluated directly. Sampling from the unconstrained prior and posterior (the first step) is as explained in the previous section. In each subsequent step (r = 1, . . . , R), samples are drawn from the constrained priors and posteriors after a burn-in of 100 iterations. The number of iterations from each constrained prior is minimally 500,000. If necessary, up to 1 million iterations are done until the number of hits reaches 500. If after 1 million iterations the number of hits is still below 100, more samples are drawn until 100 hits are obtained. From each constrained posterior, samples are drawn until 500 hits are obtained, with a maximum of 1 million iterations.
The number of steps, R, is determined by one of two stopping rules: convergence of the estimate of the final Bayes factor is assumed if two subsequent BF_{m_δr, m_δr−1} values deviate less than 0.05 from 1. When δ ≠ 0, the last step of the procedure is performed as soon as δ_r ≤ δ (if δ_r < δ, in the final step it is set at δ).

Interpretation of the results
The software estimates Bayes factors of each order-restricted hypothesis versus the unconstrained hypothesis using (11) or (12). The Bayes factor for the comparison of two order-restricted hypotheses, say H_m and H_m′, can then be computed using

BF_mm′ = BF_mu / BF_m′u.

A Bayes factor provides the amount of support for one hypothesis compared to another. If, for instance, BF_mm′ = 6, the support for H_m is 6 times as large as that for H_m′. Likewise, BF_mm′ = 0.5 shows that the support for H_m is half as large as the support for H_m′.
Furthermore, the software provides posterior model probabilities (pmp's), representing the relative support for each hypothesis in a finite set of hypotheses (M). To obtain pmp's from Bayes factors, prior model probabilities must be specified, representing the degree of belief in each hypothesis before observing the data. A usual, objective, choice is equal prior probabilities for all hypotheses, that is, p(H_m) = 1/M for m ∈ M, where M denotes the total number of hypotheses. This prior specification, which is also adopted in the software, leads to the following equation:

pmp(H_m) = BF_mu / Σ_{m′∈M} BF_m′u.

Posterior model probabilities can be computed including or excluding the unconstrained hypothesis. As an example, consider H0, H1, and H2 from (2). In the software, the user can specify just these 3 hypotheses (i.e., M = 3), and the pmp's are then computed excluding the unconstrained hypothesis:

pmp(H_m) = BF_mu / (BF_0u + BF_1u + BF_2u), for m = 0, 1, 2.

Alternatively, one can also explicitly add the unconstrained hypothesis as a hypothesis of interest (M = 4). The resulting pmp values, denoting the unconstrained hypothesis by H3, are

pmp(H_m) = BF_mu / (BF_0u + BF_1u + BF_2u + BF_3u), for m ∈ M and with BF_3u = 1.
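With equal prior model probabilities, the conversion from Bayes factors to pmp's is a one-liner; a Python sketch (the function name is illustrative):

```python
def pmps(bf_vs_unconstrained):
    """Posterior model probabilities under equal prior model probabilities:
    pmp(H_m) = BF_mu / sum of all BF_m'u in the set."""
    total = sum(bf_vs_unconstrained)
    return [bf / total for bf in bf_vs_unconstrained]
```

To include the unconstrained hypothesis as a hypothesis of interest, one would simply append BF = 1 to the list of Bayes factors before the call.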

3. Example based on Lucas (2003)
The three approaches for confirmatory ANOVA will be illustrated using data from the research of Lucas (2003). In this study, the interest lies in the amount of influence a leader has on his/her group members. The experiment contained five experimental groups: (1) a group with a randomly selected male leader, (2) a group with a randomly selected female leader, (3) a group with a male leader selected on ability, (4) a group with a female leader selected on ability, and (5) a group with a female leader selected on ability after institutionalization of female leadership. The institutionalization was done by showing the participants a film in which female leadership is normal and females do well as leaders. The resulting group means and standard deviations of the influence of the leader are shown in Table 1.
The research question of Lucas (2003) is: "Can institutionalization of female leadership reduce the influence gap between women and men by legitimating structures of female leadership?" The expectations of Lucas (2003) are as follows.
Male leaders (groups 1 and 3) have higher influence over participants than female leaders (groups 2 and 4, respectively), ceteris paribus.
Leaders appointed on ability (groups 3 and 4) have higher influence over participants than leaders appointed randomly (groups 1 and 2, respectively), ceteris paribus.
Institutionalized female leaders selected on ability (group 5) have higher influence over participants than 'normal' female leaders selected on ability (group 4), and than randomly selected male leaders (group 1).
Institutionalized female leaders selected on ability (group 5) have (almost) the same influence over participants as male leaders appointed on ability (group 3).
These expectations can be represented by the hypothesis H1: µ5 = µ3 > {µ1, µ4} > µ2, where µ_i represents the mean influence of the leader in group i.
Another hypothesis of interest could be the following. Leaders chosen on the basis of ability score higher than leaders selected at random (so, group 3 scores higher than group 1, and group 4 scores higher than group 2).
Male leaders selected at random (group 1) have a higher influence than female leaders selected on competence (group 4).
There is no difference in influence between female leaders selected on competence in case of institutionalization (group 5) and in the 'normal' case (group 4). These statements can be represented by H2: µ3 > µ1 > µ4 = µ5 > µ2.
In addition, the traditional null and alternative hypotheses, H0 and H3 from (2), will be used to illustrate the F̄ test, the order-restricted information criterion, and Bayesian model selection.

Results using ORIC
The ORIC consists of a fit/likelihood part and a complexity/penalty part. In Table 3 the likelihood, penalty term, and ORIC values are given for H0, H1, H2, and H3. The penalty for H0 equals 2, because there are two distinct parameters: all means are equal, which represents one distinct parameter, plus the parameter σ². Analogously, the penalty for H3 is 6: there are five distinct means (since there are no restrictions among the means) and one variance parameter. Hypotheses H1 and H2 also involve five means, but these hypotheses contain inequality constraints, as opposed to H0 and H3. The penalties of these order-restricted hypotheses are calculated as explained earlier; their values are 3.20 and 3.13, respectively. The hypothesis with the smallest ORIC value is the preferred hypothesis. Thus, in the example, H1 is preferred according to the ORIC.

Results using BMS
The results using BMS are presented in Table 4. For each hypothesis, the Bayes factor comparing that hypothesis with the unconstrained hypothesis (H3) is presented for three different values of pv. The equality constraints are evaluated as strict equalities, that is, δ = 0 is approximated using the stepwise approach explained in Section 2.3.
The priors using pv = 2 and pv = 3 differ only with respect to the variance of the normal distributions, with values 2.32 and 3.27, respectively.
Irrespective of the choice made for the prior, H 1 is clearly the most supported order-restricted hypothesis. The corresponding posterior probabilities (using equal prior model probabilities) for the four hypotheses are 0.00, 0.97, 0.02 and 0.01, respectively (for each pv value). These results again lead to the conclusion that H 1 is the best of the four models/hypotheses considered, as was also concluded when using the ORIC. Comparison of the three prior specifications shows that, for the hypotheses at hand, the results are not very sensitive to the specification of the prior.

A. ConfirmatoryANOVA.exe user manual
This user manual describes and illustrates the options available in ConfirmatoryANOVA.exe (published along with this manuscript and also available at http://www.fss.uu.nl/ms/Kuiper/). The program is written in Fortran 90 using the Intel Visual Fortran Compiler 9.1 for Windows. This compiler uses IMSL 5.0.
To make the program more user-friendly, an interface was made with C# in Microsoft Visual Studio 2005 (Appendix B). The interface calls the exe file made in Fortran 90 and creates the appropriate input file needed for the Fortran 90 program. The input for the Fortran 90 program is described in this appendix. Those interested in only the interface are referred to Appendix B.
ConfirmatoryANOVA.exe is free to use. However, when results obtained with this program are published, please refer to this paper. The following methods can be performed in the program: the F̄ test, the order-restricted information criterion (ORIC), and Bayesian model selection (BMS).

A.1. Modification input files
Regardless of which analysis is performed, two text files have to be modified (such that they apply to your data), namely Input.txt and Data.txt.
It should be noted that the names of the text files are fixed and cannot be changed, and that these files have to be text files (also known as ASCII files).
The format of these files should not be changed, that is, do not add empty lines and do not delete lines containing labels.
The data in Data.txt should be complete, that is, missing data are not allowed.

First half of Input.txt
First of all, you must indicate which analyses should be performed. This is done in Input.txt. A certain analysis will be performed if a 1 is filled in on the line below the name of that analysis; it will not be performed when a 0 is filled in.
When the ORIC and the F̄ test should be performed and BMS should not, the first half of Input.txt should look as follows (when using the default values for the seed value and numbers of iterations):

Seed value and number of iterations (>0) for Fbar test, ORIC, and BMS
123 100000 100000 500000
Perform F bar test, ORIC, BMS (1 = yes, 0 = no)
1 1 0

We will come back to the seed value and the numbers of iterations in Section A.5.

Data.txt
Second of all, group membership and the corresponding data must be given in Data.txt: the first column contains the group numbers and the second column the corresponding data points y_ij. The order of the group numbers and data points does not matter, as long as the group number corresponds to the data point in the same row. For example, part of a Data.txt file could look as follows:

1 2.97
...
5 1.38
...
5 4.58
...

The group numbers do not need to be sequential. For example, suppose you have an SPSS or Excel file with data for 10 groups and in the current analysis you want to compare only 5 groups (which are not the first five). In that case, you can just copy the appropriate data from the SPSS or Excel file to a text file without adjusting the group numbers. In the software, the group numbers will be made sequential and the output will be given for these adjusted sequential group numbers. For example, if you have data with group numbers 1, 4, 5, 6, and 8 (whether the data are in order or not), these will become group numbers 1, 2, 3, 4, and 5, respectively. Note that, in specifying the restrictions (see the next sections), you need to use the adjusted sequential group numbers.
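The renumbering the program applies internally can be mimicked as follows (a Python sketch; the function names are illustrative assumptions):

```python
def make_sequential(rows):
    """Renumber arbitrary group labels to 1, 2, 3, ... (in sorted order),
    as the program does internally; rows are (group, value) pairs."""
    new = {g: i + 1 for i, g in enumerate(sorted({g for g, _ in rows}))}
    return [(new[g], y) for g, y in rows]

def data_txt_lines(rows):
    """Format (group, value) pairs as lines for Data.txt."""
    return [f"{g} {y}" for g, y in rows]
```

With group labels 1, 4, 5, 6, and 8, the mapping becomes 1→1, 4→2, 5→3, 6→4, 8→5, matching the behavior described above.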
From the data the number of groups and the number of observations per group are determined.

A.2. Basic elements of writing constraints
In performing an F̄ test, determining the ORIC, or doing BMS, all the hypotheses of interest, like H0 to H3 in (2), must be given explicitly. Note that it is also possible to specify a set of hypotheses without the classical null H0: µ1 = . . . = µk and/or alternative Hu: µ1, . . . , µk. However, we recommend including the alternative Hu (when doing model selection), since it can be used to protect against choosing a weak hypothesis (Kuiper and Hoijtink 2010). Note that one should include H0 only when there is real interest in H0.
When using the ORIC or doing BMS, several models/hypotheses are compared to each other. In the F̄ test, the order-restricted hypotheses (like H1 and H2 in (2)) are tested against H0 and Hu. If the classical null and/or the alternative are included in the set of hypotheses, the classical null will also be tested against the classical alternative.
The basic elements for writing down the hypotheses of interest are:

1. Representation of an equality sign (=)
Suppose the hypothesis of interest is µ5 = µ3, that is, µ5 = µ3, µ1, µ2, µ4. The ordering of the group numbers in this restriction is represented by: 5 3 1 2 4. The restriction is represented by: 1 1 0 0 0, where the 1s indicate that means 5 and 3 belong to set 1 and are equal to each other, and where a 0 indicates that the corresponding mean is unrestricted. N.B. in a restriction, the first set is always labeled as 1, the second as 2 (and so on).

2. Representation of a greater-than sign (>)
Suppose the hypothesis of interest is µ1 > µ3, that is, µ1 > µ3, µ2, µ4, µ5. The ordering of the group numbers in this restriction is represented by: 1 3 2 4 5. The restriction is represented by: 1 -3 0 0 0, where the -3 means that mean 3 is smaller than mean 1. Thus, it represents µ3 < µ1, which is equal to µ1 > µ3. Here again, the 1 indicates that mean 1 belongs to set 1. Because of the inequality restriction between means 1 and 3, mean 3 belongs to set 2 (the importance of this will be made clear in the next section).

A.3. Combinations of basic elements
Often the hypothesis of interest can be represented with fewer restrictions than when using only the basic elements. The following shortcuts can be used:

1. µ5 = µ3 = µ1, µ2 = µ4
Because of the equality constraints ("="), means 5, 3, and 1 belong to set 1, and means 2 and 4 belong to set 2. Thus, this hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1  1  2  2

2. µ5 = µ3 = µ1 > µ2 = µ4
Because of the equality constraints, means 5, 3, and 1 belong to set 1, and means 2 and 4 belong to set 2. Because of the constraint between means 1 and 2, mean 2 implicitly belongs to set 2. This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1  1 -3  2
The 1s indicate the equality constraints between means 5, 3, and 1, the -3 represents the inequality constraint ">" between means 1 and 2, and the 2 indicates the equality constraint between means 2 and 4 (because mean 2 implicitly belongs to set 2). N.B. the restrictions µ5 = µ1, µ5 > µ2, µ3 > µ2, µ5 > µ4, µ3 > µ4, and µ1 > µ4 do not have to be stated explicitly; they hold because µ5 = µ3, µ3 = µ1, µ1 > µ2, and µ2 = µ4.
3. µ5 = µ3 > µ1 > µ2 = µ4
This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1 -3 -3  3
The 1s indicate the equality constraint between means 5 and 3. The -3s represent the inequality constraints ">" between means 3 and 1 and between means 1 and 2. Note that mean 1 implicitly belongs to set 2 and mean 2 implicitly to set 3. Therefore, the equality constraint between means 2 and 4 is represented by the 3, because means 2 and 4 belong to set 3.
4. µ5 = µ3 > µ1 > µ2 = µ4
Likewise, this hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1 -3 -1  3
Here, a -1 is used instead of a -3 for the inequality constraint between means 1 and 2.

5. µ5 = µ3 > µ1 > µ2, µ4
This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1 -3 -3  0
Note that mean 4 is a free parameter, that is, a mean that is not restricted at all. Free parameters are denoted by a "0" and no set number is assigned to them (directly or indirectly). Here, as in the previous two examples, mean 2 belongs to set 3 and mean 4 belongs to another set; however, this is not set 4. Another example is given next.
6. µ5 = µ3, µ4, µ1 > µ2
This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  4  1  2
(Order) Restrictions:              1  1  0  2 -3
The 1s indicate the equality constraint between means 5 and 3. Note that, as mentioned in the previous example, no set number is assigned to free parameters. So, mean 4 belongs to a different set than all the other means, but no set number is assigned; a 0 is filled in. Mean 1 belongs to the next set, that is, set 2. The -3 represents the inequality constraint ">" between means 1 and 2. Note that mean 2 implicitly belongs to set 3.
7. µ5 = µ3 > µ1 = µ2 > µ4
This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1 -3  2 -3
The 1s indicate the equality constraint between means 5 and 3, and the -3s represent the inequality constraints ">" between means 3 and 1 and between means 2 and 4. Note that mean 1 implicitly belongs to set 2. Therefore, the equality constraint between means 1 and 2 is represented by the 2.
8. µ5 = µ3 > µ1, µ2 > µ4
This hypothesis can be represented by the following ordering of the group numbers and corresponding restriction:
Ordering of means in restriction:  5  3  1  2  4
(Order) Restrictions:              1  1 -3  3 -3

In the program ConfirmatoryANOVA.exe, error messages are built in to detect wrongly stated hypotheses. However, a wrongly stated hypothesis is sometimes not detected, because it represents another existing hypothesis. For example, when in example 7 a 3 is accidentally given instead of a 2, another existing hypothesis is stated: the restriction 1 1 -3 3 -3 represents the hypothesis µ5 = µ3 > µ1, µ2 > µ4. So, care must be taken in writing down the hypothesis of interest.
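The encoding rules illustrated above can be summarized operationally: a positive entry is an explicit set label (means sharing a label are equal), a negative entry means "smaller than the preceding restricted mean" and implicitly opens the next set, and a 0 marks a free mean. The following Python sketch is our own illustration of this notation (it is not part of the Fortran program, and the function name is hypothetical):

```python
def decode(order, codes):
    """Decode one restriction row.

    order: group numbers as in 'Ordering of means in restriction'.
    codes: the '(Order) Restrictions' row; positive = set label,
           negative = smaller than the preceding restricted mean,
           0 = free (unrestricted) mean.
    Returns (sets, greater, free): equality sets, '>' pairs, free means.
    """
    sets, greater, free = {}, [], []
    prev_g = prev_label = None          # previous restricted mean, its set
    for g, c in zip(order, codes):
        if c == 0:                      # free parameter: no set label
            free.append(g)
            continue
        # a negative entry implicitly opens the next set; its magnitude
        # is not interpreted (e.g., -1 and -3 are equivalent)
        label = c if c > 0 else prev_label + 1
        if c < 0:                       # preceding mean > this mean
            greater.append((prev_g, g))
        sets.setdefault(label, []).append(g)
        prev_g, prev_label = g, label
    return sets, greater, free

# example 3 above: mu5 = mu3 > mu1 > mu2 = mu4
sets, greater, free = decode([5, 3, 1, 2, 4], [1, 1, -3, -3, 3])
# sets -> {1: [5, 3], 2: [1], 3: [2, 4]}, greater -> [(3, 1), (1, 2)]
```

Running it on the row 1 1 -3 3 -3 from the warning above reproduces µ5 = µ3 > µ1, µ2 > µ4: the inequality pairs are (3, 1) and (2, 4), and sets 2 and 3 are not ordered relative to each other.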
A.4. About equality constraints

The BMS approach in the program also provides the option to specify about equality constraints (δ > 0). Consider the hypothesis µ1 = µ2 = µ3 and the same hypothesis specified as the two separate restrictions µ1 = µ2 and µ2 = µ3. In the about equality case, the second specification is evaluated using |µ1 − µ2| < δ and |µ2 − µ3| < δ, whereas the first adds a third constraint: |µ1 − µ3| < δ. The results for the two specifications may therefore differ, so careful consideration of the formulation of hypotheses is important.
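The underlying reason such results can differ is that, unlike exact equality, about equality is not transitive. A minimal illustration (with hypothetical values, not taken from the program):

```python
def about_equal(m1, m2, delta):
    """About equality constraint |m1 - m2| < delta."""
    return abs(m1 - m2) < delta

# with delta = 1: mu1 ~ mu2 and mu2 ~ mu3 both hold, yet mu1 ~ mu3
# fails, so the extra constraint |mu1 - mu3| < delta is not redundant
mu1, mu2, mu3, delta = 0.0, 0.9, 1.8, 1.0
pairwise = about_equal(mu1, mu2, delta) and about_equal(mu2, mu3, delta)
chain = pairwise and about_equal(mu1, mu3, delta)
```

Here `pairwise` is satisfied while `chain` is not, which is exactly why the two hypothesis formulations can be evaluated differently under δ > 0.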

A.5. Set the seed value and number of iterations
The calculation of the p value of the F̄ test, the penalty of the ORIC (i.e., PT), and the BMS results are sampling-based. For example, when generating data from a normal distribution (in order to determine the p value of the F̄ test or the penalty of the ORIC), a seed value is needed. When using the same seed value, the same data will be 'sampled'. By using another seed value in a rerun of the same problem, one can also see how stable the results are. Thus, the p value of the F̄ test, the penalty of the ORIC (i.e., PT), and the results of BMS can differ for various seed values.
In case a result is not stable, the number of iterations needs to be set higher. In the F̄ test, the p value depends on the number of iterations R_p. When using the ORIC, the penalty depends on the number of iterations R_PT. When doing BMS, the Gibbs sampler is used, which is based on a minimum of R_BMS iterations. These values can also be set in the input, namely in the second line of Input.txt (see Section A.1.1). The default values of the number of iterations are: R_p = 100,000, R_PT = 100,000, and R_BMS = 500,000.
Note that the higher the number of iterations, the higher the computing time. If one lowers the number of iterations (in order to lower the computing time), one must be aware that this probably affects the stability of the results. Furthermore, when the initial number of iterations for BMS (i.e., R_BMS) is lowered, the computing time is not necessarily decreased, because of the requirement of a minimum of 100 prior hits.
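To make the role of the seed and the number of iterations concrete, here is a toy Monte Carlo sketch (our own illustration, not the program's actual algorithm): with the same seed the estimate is exactly reproducible, and a larger R gives a more stable estimate.

```python
import random

def mc_tail_prob(threshold, seed, R):
    """Estimate P(Z^2 >= threshold) for Z ~ N(0, 1) by simulation."""
    rng = random.Random(seed)           # fixed seed => reproducible draws
    hits = sum(rng.gauss(0.0, 1.0) ** 2 >= threshold for _ in range(R))
    return hits / R

# same seed, same problem -> identical result; another seed gives a
# (slightly) different estimate, which shows how stable the result is
p1 = mc_tail_prob(3.84, seed=123, R=100_000)
p2 = mc_tail_prob(3.84, seed=123, R=100_000)
p3 = mc_tail_prob(3.84, seed=456, R=100_000)
```

Here `p1 == p2` always holds, while `p3` will typically differ from `p1` in the third or fourth decimal; increasing R shrinks that seed-to-seed variation, at the cost of computing time.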

A.6. Error messages
In the program ConfirmatoryANOVA.exe, error messages are built in to detect wrongly stated hypotheses. However, not all wrongly stated hypotheses are detected, since a stated hypothesis can represent another existing hypothesis, as is made clear in Section A.3.
It is also possible to state other input wrongly, for example, by giving the wrong number of restrictions. When such a mistake is made, an informative warning will be given.
However, it is possible to make a mistake that we have not foreseen. In that case, check the input and compare it to the data. If you cannot solve the problem, send the input and data file to r.m.kuiper@uu.nl.

A.7. Modification of the second half of Input.txt
For all three methods (i.e., the F̄ test, the ORIC, and BMS), all hypotheses of interest must be given explicitly. In case BMS is performed, two additional specifications need to be made: the desired δ (for exact equalities specify δ = 0; for an about equality, any positive number can be specified) and the prior vagueness pv (default recommendation pv = 2; any positive number may be specified). This must be done in the second half of Input.txt. If the hypotheses of interest are the set of hypotheses of Lucas stated in (2), Input.txt has the format shown in Table 5 (with comments in angle brackets).

A.8. Save and close
When you have modified Input.txt (such that it applies to your data), you should save and close it.
A.9. Run ConfirmatoryANOVA.exe

When ConfirmatoryANOVA.exe is run, the output file Output.txt will be created in the folder you are working in.

Output.txt
Output.txt gives the results of the requested analyses. In Section B.5, the output is given when using the interface for the Lucas example described in this paper (provided that all three analyses, that is, the F̄ test, the ORIC, and BMS, are performed).
In case the interface is not used, the output will be a bit different, namely in two ways: 1) the hypotheses of interest will be displayed in the way they are filled in in Input.txt.

B. User manual of ConfirmatoryANOVA.exe with interface

B.1. Read, write or copy data

First of all, the data should be entered. Press the Data button in the ConfirmatoryANOVA.exe form (see Figure 1) to go to the DataInput form (see Figure 2). The data can be entered manually, by copying them from a file (say, an SPSS or Excel file), or by reading them from a text (i.e., .txt) file. See also Section A.1.2 for a description of the data format. When the data are read from a file or, otherwise, after clicking on the OK button, the data are validated. In case of invalid data (e.g., when no number is entered or when a row has only a group number or only a data point), the corresponding lines are made red. The invalid data should be corrected (by adjusting the data in the textbox/field or by adjusting the .txt file and rereading the adjusted file) or deleted (e.g., when all lines should be deleted, by clicking on the Clear Invalid Data button). After the adjustments, press the OK button. The data will be validated again. Note that, in case of deleting data, the number of observations per group (which is shown in the ConfirmatoryANOVA.exe form) will be adjusted automatically.
In case the data are valid, you return to the ConfirmatoryANOVA.exe form (see Figure 1).

B.2. Specify methods
From the data (and group membership), the number of groups and number of observations per group are determined. Then, you must denote which analyses should be performed. A certain analysis will be performed if the corresponding checkbox is checked.
In case BMS is performed, two additional specifications need to be made (in a popup panel): the desired δ and the prior vagueness pv. It holds that δ ≥ 0 and pv > 0. When specifying exact equality restrictions in BMS, δ must be set to 0. When specifying about equality restrictions in BMS, δ must be set to a positive value. In the latter case, one should carefully specify the restrictions (see Section A.4). The default recommendation for pv is pv = 2.
For all three methods, the hypotheses of interest must be given explicitly (in the panel that then appears). Specifying the order-restricted hypotheses is explained in the next section. When you do not want to specify any order-restricted hypothesis, you should uncheck the Add (order-restricted) hypothesis/-es checkbox (see Figure 1). One should also specify whether one wants to evaluate the classical null hypothesis (i.e., H0: µ1 = ... = µk) and the classical alternative (i.e., Ha: µ1, ..., µk), here also called the unconstrained hypothesis (see Figure 1).
As discussed in Section A.5, the values for the seed value and the number of iterations can be specified. This is done in the Settings form (see Figure 3) appearing when pressing the Settings button.
More details on specifying the restrictions are given next.

B.3. Specifying the order-restricted hypotheses
An (order-restricted) hypothesis, say H1, can be specified in the panel that appears when clicking on the Add button in the Edit Hypothesis 1 panel (see Figure 4). Then fill in the hypothesis, that is, fill in the group numbers and the corresponding constraints between the means of these groups. Note that a hypothesis can consist of multiple 'restrictions'. To add another restriction to, say, H1, you must press the Add button in the Edit restriction(s) in H1 panel (see Figure 4).
As mentioned before, the numbers in a certain restriction always consist of the numbers 1 to "the number of groups" (in the example, 5), and each number is used precisely once. It should be noted that, in case of 5 means, if you want to evaluate H1: µ3 > µ1 > µ4, you should fill in H1: µ3 > µ1 > µ4, µ2, µ5 or, better, "3 > 1 > 4, 2, 5". If an entry is a number greater than "the number of groups" or is not an integer, the corresponding textbox is made red and an error message is given in the ConfirmatoryANOVA.exe form. The check that every number is used only once is done after pressing the Run button. In that case, an error message will be given in the Progress Report panel and in Error.txt (see also the next subsection).
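The check described above amounts to requiring that the ordering in a restriction is a permutation of 1, ..., k. A minimal sketch of such a check (a hypothetical helper of ours, not the program's code):

```python
def valid_ordering(order, k):
    """True if 'order' uses each group number 1..k exactly once."""
    return sorted(order) == list(range(1, k + 1))

# "3 > 1 > 4, 2, 5" has ordering 3 1 4 2 5, which is valid for k = 5;
# repeating a number (or exceeding k) makes the ordering invalid
ok = valid_ordering([3, 1, 4, 2, 5], 5)
bad = valid_ordering([3, 1, 4, 2, 2], 5)
```

Here `ok` is True and `bad` is False; an entry such as 6 with k = 5 would likewise be rejected.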
One can also specify the classical H0 and/or the classical Ha as a hypothesis of interest. Note that one should include H0 only when there is real interest in H0 (Kuiper and Hoijtink 2010). We recommend including Ha (when doing model selection) as a safeguard against choosing a weak hypothesis (Kuiper and Hoijtink 2010). H0 and Ha do not need to be specified explicitly; one can just check the corresponding checkboxes (see Figure 1 and Figure 4).

B.4. Error messages
Before pressing the Run button, one should check whether the requirements are met. When not all requirements are met and you press the Run button, a popup appears with the text "Not all requirements are met yet.". The requirements are that the data are valid, that the specification of δ and pv (if needed) is correct, and that the hypotheses (if any) are specified correctly. In case a requirement is met, the corresponding checkbox is checked; otherwise, it is not checked and an error message is given in the ConfirmatoryANOVA.exe form.
Furthermore, error messages are built in to detect other wrongly stated input (e.g., not using every number precisely once in an (order) restriction). A popup will appear with the text "Error in input. ... This run will be stopped.". These error messages are given in the Progress Report panel in the ConfirmatoryANOVA.exe form (see Figure 1 and Figure 4) and in Error.txt (which will be created in the folder where ConfirmatoryANOVA.exe is saved). In most cases the methods will not be performed and no output will be given.
When, for some reason, the error is not detected, a popup will appear with the text "The program has stopped. No methods are performed. Check input. . . .". In that case, no output will be given. One should look at the input again, especially the input needed for the method during which the error occurred. By looking at the Progress Report panel, one can get a better idea of the method during which the error occurred. It should be noted that the methods are performed in a fixed order: the F̄ test is performed first, then the ORIC, and finally BMS.