GFD : An R Package for the Analysis of General Factorial Designs

Factorial designs are widely used tools for modeling statistical experiments in all kinds of disciplines, e.g., biology, psychology, econometrics and medicine. For testing null hypotheses in this framework, ANOVA methods are widely used. However, the corresponding F tests are only valid for normally distributed data with equal variances, two assumptions which are often not met in practice. The R package GFD provides an implementation of the Wald-type statistic (WTS), the ANOVA-type statistic (ATS) and a studentized permutation version of the WTS. Both the WTS and the permuted WTS do not require normally distributed data or variance homogeneity, whereas the ATS assumes normality. All methods are available for general crossed or nested designs and all main and interaction eﬀects can be plotted. Additionally, the package is equipped with an optional graphical user interface to facilitate application for a wide range of users. We illustrate the implemented methods for a range of diﬀerent designs.


Introduction
Originated in the agricultural sciences factorial designs are widely used tools for modeling statistical experiments in a variety of disciplines, e.g., biology, econometrics, medicine, ecology or psychology.For testing null hypotheses formulated in terms of means, analysis-of-variance (ANOVA) methods are well known, and preferred for making statistical inference.ANOVA methods are implemented in R within the function aov in the R package stats (R Core Team 2017).The anova function in this package as well as Anova in the car package (Fox and Weisberg 2011) provide clearly arranged ANOVA tables for fitted models.The corresponding F tests, however, are only valid under the assumption of normally distributed errors and equal variances across the different treatment groups.These assumptions are hard to verify in practice and often not met.A violation usually inflates the type-I or -II errors of the F statistics.
The accuracy of the F tests depends on the actual data distributions, sample size allocations, and the degree of variance heteroscedasticity.For normally distributed errors, several procedures for heteroscedastic data have been proposed, e.g., the generalized Welch-James test (Johansen 1980), the approximate degrees of freedom test (Zhang 2012) or the ANOVA-type test proposed by Brunner, Dette, and Munk (1997), see also Bathke, Schabenberger, Tobias, and Madden (2009).These tests control the type-1 error level in heteroscedastic designs quite accurately, but are in general not asymptotically exact for non-normal data.In comparison to that, the Wald-type statistic, see Equation 2 below, is asymptotically exact in general factorial designs without assuming variance homogeneity or normally distributed error terms.It is well known, however, that the Wald-type statistic requires large sample sizes to control the pre-assigned type-I error, see e.g., Vallejo, Fernández, and Livacic-Rojas (2010).Its small sample behavior may be improved by applying an adequate permutation procedure, see Pauly, Brunner, and Konietschke (2015) for the theoretical background.The only comparable test included in the R function oneway.test is the Welch (1951) test for heteroscedastic one-way layouts.Furthermore, an ANOVA-type test based on ranks is also implemented in the R package asbio (Aho 2017) within the functions BDM and BDM.2way for nonparametric one-and two-way layouts, respectively.
For a user friendly application of these rather robust methods in statistical data sciences, the R package GFD has been developed.The use of the main function GFD as well as its output are very similar to the aov function from the R package stats or the Anova function from the R package car (Fox and Weisberg 2011).Its application provides a descriptive overview of the data as well as the complete ANOVA-tables according to the formula input, which allows the modeling of arbitrary high-way layouts.Hereby the Wald-type statistic, a permuted version thereof as well as the ANOVA-type statistic for these general factorial designs are implemented.Both the Wald-type statistic as well as the permutation test neither assume normality nor homogeneous variances, while the ANOVA-type statistic assumes normality.Furthermore, all main and interaction effects can be plotted along with (1 − α) confidence intervals.In addition, the package is equipped with a graphical user interface (GUI) to facilitate application for a wide audience of statisticians, practitioners, and educational purposes.The package is freely available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=GFD.
The paper is organized as follows: In Section 2 we describe the statistical model and the tests used in this setting.In Section 3 we provide various examples for different settings which are statistically evaluated with the R package GFD.Finally, we discuss the results in Section 4 and provide an outlook to future work.
Throughout the paper we use the following notation: We denote by P a = I a − 1 a J a the a-dimensional centering matrix, I a is the a-dimensional unit matrix and J a denotes the a × a matrix of 1's, i.e., J a = 1 a 1 a , where 1 a = (1, . . ., 1) is the a-dimensional column vector of 1's.

Statistical model and inference methods
In order to cover different factorial designs, we consider the following general linear model where k = 1, . . ., n i is the experimental unit within class i = 1, . . ., a.Note that different sample sizes n i are admitted.For each fixed i the error terms ε ik are independent and identically distributed with E(ε i1 ) = 0 and VAR(ε i1 ) = σ 2 i > 0. Note that we neither assume normality of the error terms nor variance homoscedasticity.In this setting, a higher way factorial structure with crossed or nested factors can be achieved by splitting up the index i into sub-indices i 1 , i 2 , . . ., i p .In our notation, the components i = 1, . . ., a can be considered as a lexicographic order of the factor level combinations.
In this framework we like to test general linear null hypotheses H µ 0 : Hµ = 0 about the mean vector µ = (µ 1 , . . ., µ a ) .Here H denotes an adequate hypothesis contrast matrix of interest. Let ) denote the vector of group means and let In order to test the null hypotheses formulated above in this general framework, we consider two generalizations of the two-sample Welch t statistic: The Wald-type statistic (WTS) as discussed, e.g., in Pauly et al. (2015), and the ANOVA-type statistic (ATS) from Brunner et al. (1997).The WTS is given by (2) Here, M + denotes the Moore-Penrose inverse of a matrix M .It is well known that under rather weak assumptions the WTS has asymptotically a central χ 2 f distribution with f = rank(H) degrees of freedom under H µ 0 : Hµ = 0.However, the WTS requires large sample sizes to get a satisfactory approximation by using the quantiles of the limiting χ 2 distribution (Akritas, Arnold, and Brunner 1997;Akritas and Brunner 1997;Vallejo et al. 2010;Pauly et al. 2015).
A second generalization of the two-sample Welch statistic is the ANOVA-type statistic (ATS) defined as where T = H (HH ) − H.Following Brunner et al. (1997) the distribution of the ATS can be approximated by an F ( f , f0 )-distribution such that the first two moments coincide, i.e., by choosing Here D denotes the matrix of diagonal elements of T and Λ = diag((n 1 − 1) −1 , . . ., (n a − 1) −1 ) (Brunner et al. 1997;Brunner and Puri 2001).Note that in the two-sample case this approximation coincides with the Satterthwaite-t-approximation.However, the ATS is in general asymptotically exact only for normally distributed error terms.
Another possibility is to improve the small sample behavior of the WTS by applying a permutation procedure (Pauly et al. 2015).To describe this procedure in detail, let . ., a the diagonal matrix of empirical variances ( σ π i ) 2 under this permutation.Then, the permuted Wald-type statistic (WTPS) is given by which is the WTS as defined in Equation 2 calculated with the permuted observations.Now, a permutation test is achieved by the following steps: 1. Fix the data Y and compute the WTS Q N .
2. Permute the data randomly and obtain the value of Q π N .Safe this in A 1 .
4. Compute the p value by the (approximative) conditional permutation distribution (i.e., the empirical distribution of A 1 , . . ., A J ) as Instead of computing the p value for making statistical inference, the original WTS Q N can be compared with the (1 − α) quantile of the conditional distribution of Q π N given the data Y , i.e., the empirical quantile of A 1 , . . ., A J .Pauly et al. (2015) have shown that this algorithm yields a valid permutation approach and consistent level α test, i.e., the conditional distribution of the WTPS always approximates the null distribution of Q N .The test controls the preassigned level α under the null hypothesis and is even finitely exact if the pooled data is exchangeable under the hypothesis.Note that in the special case of a one-way layout the WTPS reduces to the permutation test for means of Chung and Romano (2013).The default value for the number of permutation runs in the R package GFD is nperm = J = 10, 000.
For practical recommendations we briefly summarize the main properties of the three considered tests from Pauly et al. (2015): Mathematically, only the WTS and WTPS provide valid asymptotic procedures for general factorial designs.Nevertheless, simulation studies demonstrate that the ATS controls the α level for finite samples rather satisfactory.In case of non-normal data, however, the test tends to be conservative, which leads to loss of power.The WTS, in contrast, is quite liberal for small to moderate sample sizes.The WTPS is a rather accurate procedure even for non-normal data.When data is very skewed and heteroscedastic, the test tends to be liberal and to over-reject the hypothesis, in particular when the larger sample has the smaller variance (so called negative pairing).Its liberality is, however, not as pronounced as for the WTS.
Note that in comparison the coin package (Hothorn, Hornik, van de Wiel, and Zeileis 2008), which contains permutation tests for two-and multiple-sample problems, does not, e.g., handle heteroscedastic shift models.In our more general situation we allow for different variances and/or different distributions among the different groups.Furthermore, the Welch test from the function oneway.test is also only an approximation for normally distributed models that is known to perform worse than the ATS and the WTPS, see e.g., Vallejo et al. (2010) and Pauly et al. (2015).Remark further, that the ANOVA-type tests from the R package asbio (Aho 2017) are based on ranks and test different null hypotheses formulated in terms of distribution functions instead of means.
For the calculation of the confidence intervals, we have used the corresponding quantiles of the t distribution.

Two-sample tests
A special case of model ( 1) is the heteroscedastic two-sample case, i.e., a = 2.This results in the extended Behrens-Fisher model which is usually analyzed using a Welch's t test in the statistic Its distribution is approximated by a t ν distribution with estimated Satterthwaite-Welch degree of freedom ν to account for variance heterogeneity.Another possibility to approximate the distribution of T N as defined in Equation 3 is to employ the studentized permutation distribution of T N , and to carry out the test as a permutation test as proposed by Janssen (1997Janssen ( , 2005)).
Note that the Wald-type statistic Q N , as well as the ATS A N are the square of T N in the two-sample case.Furthermore, both the statistics Q N and A N are identical in this setup; and the second degree of freedom f0 of the ATS is identical to the Satterthwaite-Welch degree of freedom.The first degree of freedom f is equal to 1, by definition.Thus the ATS test is essentially Welch's t test and the WTPS test is in fact Janssen's permutation test.

Examples
In this section, we provide examples demonstrating how different factorial designs can be analyzed using the GFD package.The function GFD returns an object of class 'GFD' from which the user may obtain plots and summaries of the results using plot(), print() and summary() methods, respectively.Here, print() returns a short summary of the results, i.e., the values of the test statistics along with degrees of freedom and corresponding p values whereas summary() also displays some descriptive statistics such as the means and variances for the different factor level combinations.Plotting is based on plotrix (Lemon 2006).For two-and higher-way layouts, the factors for plotting can be additionally specified in the plot call, see the examples below.
GFD(formula, data = NULL, nperm = 10000, alpha = 0.05) Note that the test statistics for the main effects considered in Section 2 are not changed by whether or not an additional interaction term is specified in formula since the tests are determined by the choice of the hypothesis matrix H.Only crossed and hierarchical (nested) designs are implemented -a mixture of both is up to date not available.Figure 2: Graphical user interfaces for plotting: The left GUI is for the one-way layout (no choice of factors possible), the other one is for a higher-way layout.An example for plotting interactions is given in the right panel.
Furthermore, the GFD package is equipped with an optional GUI, based on RGtk2 (Lawrence and Temple Lang 2010), which will be explained in detail in the next section.

Graphical user interface
The GUI is started in R with the command calculateGUI().Note that the GUI depends on RGtk2 and will only work if RGtk2 is installed.The user can specify the data location (either directly or via the "load data" button), the formula, the number of permutations and the significance level α, see Figure 1.Additionally, one can specify whether or not headers are included in the data file, and which separator and character symbols are used for decimals in the data file.The GUI also provides a plotting option, which generates a new window for specifying the factors to be plotted (in higher-way layouts) along with a few plotting parameters, see Figure 2. Note that four-and higher way interactions cannot be plotted due to the increasing complexity of the plots.

Two-sample tests
As an example of a two-sample problem we consider a subset of the weightgain data set (Hand, Daly, McConway, Lunn, and Ostrowski 1993) from the HSAUR package (Everitt and Hothorn 2017).The data contains information on the weight gain (in grams) of rats which were randomized to one of four diets, distinguished by the amount of protein (high and low) and the source of protein (beef and cereal).For our purposes, we first restrict our analysis to the high protein group.
The data may also be analyzed using the GUI, see Figure 3 for an example.The corresponding plot of the effect is given in Figure 4.

One-way layout
In a one-way layout, we are interested in the effect of factor A, i.e., we wish to test the null hypothesis An example for such a model is the data set on startup costs of companies, which was selected from the Business Opportunities Handbook, see Cengage College (2008).The data represent business startup costs in thousands of dollars for five different kinds of shops.
R> library("GFD") R> data("startup", package = "GFD") R> set.seed( 456  This example nicely demonstrates the liberal behavior of the WTS (p value = 0.0046) as well as the conservative behavior of the ATS (p value = 0.055).The WTPS, in contrast, is somewhere in between with a p value of 0.0246.

Two-way layout
In a two-way crossed design, with i = 1, . . ., a; j = 1, . . ., b; k = 1, . . ., n ij , one is interested in tests for the main effects of the factors A and B as well as for an interaction of the two, i.e., H 0 (A) : or formulated with suitable contrast matrices: We will again consider the weightgain data set from package HSAUR.This time, however, we are interested in analyzing both factors, i.e., amount and source of protein.
Figure 6 shows plots for the main effect of the factor type as well as the interaction between both factors.

Three-way layout
For the three-way example, we consider a data set on pizza delivery times (Mackisack 1994).

Nested design
A nested design is covered by the model where factor B is nested within the levels of factor A. As an example, we consider the curdies data set (Quinn, Lake, and Schreiber 1996) included in the GFD package.The aim of the study was to describe basic patterns of variation in a small flatworm, Dugesia, in the Curdies River, Western Victoria.Therefore, worms were sampled at two different seasons and three different sites within each season.For our analyses we consider both factors as fixed (e.g., some sites may only be accessed in summer).The R code for analyzing this nested design is given in the following: R> library("GFD") R> data("curdies", package = "GFD") R> set.seed( 987 In this setting, both WTS and WTPS detect a significant influence of the season whereas the ATS, again, only shows a borderline significance at 5% level.The effect of the site is not significant.A plot for the nested effect is given in Figure 8.

Conclusion and future work
The R package GFD implements a broad range of semi-parametric methods for the analysis of general factorial designs, i.e., linear models without the assumption of normality and/or homoscedastic variances across the treatment groups.Three different methods are implemented: Wald-type statistic Q N , ANOVA-type statistic A N as well as a permutation approach proposed by Pauly et al. (2015).All methods can be used to test general hypotheses among the main and interaction effects.In particular, nested designs can be analyzed using GFD.From a practical point of view we recommend the WTPS procedure since it has been found in Pauly et al. (2015) to posses both good finite type-I error rate control and power behavior.The ATS and WTS, in comparison, are slightly conservative or rather liberal, respectively.Confidence interval plots are available for all effects of interest -except of four-and higher-way interactions.
A graphical user interface (GUI) has been implemented which allows a convenient use of the software in industry, academia, and educational purposes.We plan to update the GFD package on a regular basis with new procedures available for the analysis of general designs.So far, ANOVA-based methods are implemented, and an adjustment of the treatment effects for covariates is not possible.Furthermore, tests and simultaneous confidence intervals for multiple comparisons based on the permutation approach are not yet available.The extension of the implemented methods to covariates and multiple comparisons and their implementation will be part of future research.

Figure 1 :
Figure 1: The GUI for tests in general factorial designs: The user can specify the data location, the formula, the number of permutations and the significance level α.One can additionally choose to plot the results.

Figure 3 :Figure 4 :
Figure 3: Graphical user interface with formula for the weightgain data set.

Figure 5 :
Figure 5: Mean startup costs for the five different companies in the startup data example.

Figure 6 :
Figure 6: Plots of the interaction of factors source and type in the weight gain data (left) and for factor source alone (right).

Figure 7 :
Figure 7: Plots of the three-way interaction (upper panel) and the two-way interaction between factors Coke and Crust (lower panel).

Figure 8 :
Figure 8: Plot for the effects in the nested design.The sites are nested within seasons.