Mixture Experiments in R Using mixexp

This article discusses the design and analysis of mixture experiments with R and illustrates the use of the recent package mixexp. This package provides functions for creating mixture designs composed of extreme vertices and edge and face centroids in constrained mixture regions where components are subject to upper, lower and linear constraints. These designs cannot be created by other R packages. mixexp also provides functions for graphical display of models fit to data from mixture experiments that cannot be created with other R packages.


Introduction
A mixture experiment occurs when the response variable, y, is a function of the relative proportions of components (x i , i = 1, . . . q) in a mixture, rather than a function of the total amount of each component. Since the proportions of components in a mixture must add to one, i.e., q i=1 x i = 1.0, the experimental region for mixture experiments is constrained, and traditional factorial or response surface designs are not appropriate. A comprehensive reference on the designs and methods of data analysis useful for mixture experiments is the book by Cornell (2002). Mixture experiments are widely used today in formulation experiments, blending experiments, and marketing choice experiments where the goal is to determine the most preferred attribute composition of a product at a given price (Ragavarao, Wiley, and Chitturi 2011).  Piepel (1997) gives an extensive review of commercial software with mixture experiment capabilities.
The lm funcion in base R (R Core Team 2016) can be used to fit models to mixture experiments. There are functions in R packages AlgDesign (Wheeler 2014) and qualityTools (Roth 2016) that allow the user to create Simplex Lattice Designs and Simplex Centroid Designs for unconstrained mixture experiments. The package mixexp (Lawson 2016) adds the ability to create designs comprised of extreme vertices of constrained experimental regions, possibly augmented with edge and face centroids, and other interior points in the simplex.
The R package qualityTools allows a user to visualize the relationship between the response and mixture components by making ternary contour plots and 3D wireframe plots of Scheffé mixture model in three components over the unconstrained simplex region. The package mixexp expands these capabilities by allowing the user to: 1) make contour plots of a greater variety of equations, 2) create contour plots within constrained regions bounded by pseudo components, 3) make contour plots of models fit to more than three components by holding one or more component(s) constant and, 4) create response trace plots along the Cox (1971) or Piepel (1982) directions through constrained regions in higher dimensional spaces.
The purpose of this article is to show how to design and analyze mixture experiments in R, and to simultaneously illustrate the basic features of the package mixexp. Details of the arguments and options for all the functions in mixexp can be found in Lawson (2016).

Models for the analysis of mixture experiments
Due to the fact that mixture components must sum to 1, the linear model that is normally written as y = β 0 + q i=1 β i x i + is redundant for data from mixture designs. Scheffé (1958) wrote the linear model for mixtures as: (1) The coefficients in Equation 1 have physical meaning. β i is the expected response at the vertex where x i = 1.0. Scheffé (1958) wrote the quadratic model for mixture experiments as: are popular alternatives to the quadratic model.
For experiments involving q mixture components and p process variables, so called mixtureprocess variable experiments, a model can be created by crossing a mixture component model,  like Equation 2, with a standard linear or response surface model in the process variables. For example, crossing a Scheffé quadratic model in q mixture components (x i , i = 1, . . . q) with a linear plus linear by linear interactions model in p process variables (z j , j = 1 . . . p) results in Equation 5.
However, the number of coefficients in Equation 5 increases rapidly with the number of mixture components and process variables. Kowalski, Cornell, and Vining (2000) proposed a more parsimonious second order model given in Equation 6. This model assumes there is no linear effect of the process variables on the nonlinear mixture component blending. Equation 6 is often adequate for mixture-process experiments.

Standard mixture designs in unconstrained regions
In this section we illustrate the use of mixexp functions for creating standard designs for mixture experiments. Scheffé (1958) proposed the simplex-lattice designs denoted SLD(q,k) based on the interpretation of the coefficients for the Scheffé linear, quadratic and cubic models for mixture experiments. The mixexp function SLD creates these designs. Below is an example of creating a SLD(4,2) design. The q in SLD(q,k) represents the number of mixture components and k represents the number of levels of each component. A k-level design supports fitting a Scheffé model of order k. Cornell (2002) described alternate designs called a simplex-centroid designs that can be used to collect data appropriate for fitting the Scheffé special-cubic model. The mixexp function SCD creates these designs. Below is an example of creating a simplex-centroid design with three mixture components and storing it in the data frame des.

R> des <-SCD(3) R> DesignPoints(des)
When there are only three mixture components, the mixexp function DesignPoints can be used to display the design graphically in the simplex experimental region. The result of the call to DesignPoints above is displayed in Figure 1.

Mixture designs in constrained regions
In many mixture experiments each component in the mixture can only be varied within specific lower and upper constraints. In this case, the experimental region is an irregular hyperpolyhedron rather than a simplex, and the Simplex Lattice and Simplex Centroid designs are not appropriate. In constrained experimental regions, McLean and Anderson (1966) recommend experimental designs that consist of all the extreme vertices of the constrained region, possibly augmented by edge and facet centroids. Piepel (1988) published Fortran code for generating extreme vertices and various dimensional centroids of linearly constrained regions. The function crvtave in mixexp calls this Fortran code to create a design containing the extreme vertices and specified centroids. The function Xvert is a front end for crvtave. It takes as inputs vectors of upper and lower constraints for each mixture component, and bounds and coefficients for linear constraint equations. Next, it sets up the matrix defined by Piepel (1988), then calls the crvtave function to create the design. It returns a data frame with the vertices, requested edge and facet centroids, and the overall centroid. For example an experimental design consisting of the extreme vertices augmented by the edge and overall centroids defined by the constraint equations: can be produced by the commands below.  Snee (1979) recommended that linear constraints be normalized by dividing all constants in the equation by the largest constant for numerical stability. This was not necessary in this example, since the largest constant in the linear constraint equations was 1. The code to create the design and the results are shown below. When there are constraints on several mixture components, the number of extreme vertices may become very large. Snee and Marquardt (1976) discussed an experiment in product development that involved eight mixture components with the following constraints.
The function Xvert finds 182 vertices for this constrained region. It was only feasible for the experimenters to complete a maximum of 20 experiments in order to detect which mixture components had large effects. Snee and Marquardt (1976) created a design by selecting a D-optimal subset of 16 vertices augmented by 4 overall centroids. A D-optimal set of sixteen vertices can be found using the optFederov function in the package AlgDesign in conjunction with the Xvert function. In the code below, a candidate data frame, cand (consisting of extreme vertices of the constrained region), is created using the Xvert function. Next, the run numbers are removed, and the reduced data frame is stored in candm. Finally the optFederov function is used to create the D-optimal subset that is stored in ScreeMix.

Augmenting designs with interior points
The designs produced by the SLD, SCD or Xvert functions include design points around the perimeter of the experimental region, possibly augmented by the centroid. Sometimes it is desirable to have additional interior points for checking the fit of a model. The mixexp function Fillv can add interior points to a design, created by SLD, SCD or Xvert, by averaging all possible pairs of design points. Lawson and Erjavec (2001) describe an experiment where an extreme vertices design was used to study the fixed carbon resulting from a coke briquette composed of a mixture of x 1 = calcinate, x 2 = tar free solids, and x 3 = tar solids, and baked at a constant temperature. Constraints on the mixture are shown below.
The following code illustrates the use of the Xvert function to create the extreme vertices design called coke, and the use of the Fillv function to add additional interior points to the design coke to create the design cokef. Xvert automatically created the graph of the design points in coke on the left side of Figure 3. The DesignPoints function was used to display the design cokef graphically on the right side of

Designs for mixture experiments with process variables
To create a design for fitting the fully crossed mixture-process model (Equation 5 ), a SLD(q,2) design can be crossed with a 2 p design in the process variables. With three mixture components and two process variables, this design can be visualized in two ways in Figure 4.
This design can be created by merging (with the R merge function) a mixture design created with the SLD function with a factorial design created with the the R expand.grid function as shown in the code below Figure 4.

Fitting mixture experiment models with R
Scheffé mixture models like Equations 1-4 and more complicated models like Equation 5 can be fit to the data from mixture experiments using the lm function. For example the code below illustrates fitting the special-cubic model (Equation 4) to the data from silicon wafer etching experiment described by Myers and Montgomery (2002).
When fitting models that do not include an intercept, Cornell (2002) points out that the R 2 statistic from a standard least-squares model fit will be inflated, due to the fact that the total sum of squares is not corrected for the mean. This gives the impression that the model fits better than it actually does. By including the intercept and leaving out one of the linear terms (i.e., x1), the lm function will produce the correct R 2 , but the intercept reported in the model summary will actually be the coefficient for the term left out (i.e.,β I 0 =β N I 1 ) where I is the model including the intercept without x1, and N I is the no-intercept model that includes x1. The coefficients for the remaining linear terms in the model that includes an intercept (I), are actually the differences of the the coefficients for those terms minus the coefficient for the term left out (i.e.,β I i =β N I i −β N I 1 ). The estimated variances of the coefficients of linear terms in the model including an intercept (I) will likewise be functions of the variances and covariances of the estimated coefficients in the no-intercept model (N I). As an alternative to using lm to fit involving mixture components, the mixexp function MixModel fits the models 1-6 (shown in Section 2) and prints the correct coefficients, standard errors, and R 2 . An example is shown below with the data from Myers and Montgomery (2002)'s silicon wafer etching experiment.
R> MixModel(frame = etch, "erate", mixcomps = c("x1", "x2", "x3"), model = 4) The output is shown below which matches the results in Myers and Montgomery (2002 To illustrate the analysis of a mixture experiment with process variables, consider the problem discussed by Sahni, Piepel, and Naes (2009). They studied a process to produce low-fat mayonnaise. The product was a mixture of three components x 1 = stabilizer, x 2 = starch 1, and x 3 = starch 2. The response they were interested in was the viscosity of the final product that was influenced not only by the ingredients, but also by two process variables: z 1 =heat exchanger temperature and z 2 = the flow rate through the system. The goal was to achieve a viscosity of 3657 at the lowest cost. The constraints on the mixture components are shown below: The data from the crossed extreme-vertices design in the mixture components and a 2 2 factorial design (augmented by a center point) in the process variables is shown in Table 1. This data set is stored in the data frame MPV which is part of the daewr package (Lawson 2015b) that contains the data frames and functions from the book Design and Analysis of Experiments with R (Lawson 2015a). Equation 5 can be fit to this data using the commands below the table.

Contour plots
The function contourPlot3 in the R package qualityTools can be used to make ternary contour plots in the unconstrained mixture-simplex design space. However, if there are constraints on the mixture components, it might not show much detail in the constrained region.  Table 4.1. In this code, the design is created by the Xvert function in mixexp, and the R function lm is used to fit the quadratic mixture model 2. The object lm object quadm is the first argument to the ModelPlot function.

Actual Component Space
The ModelPlot function in mixexp can also make mixture-contour plots of equations involving more than three components by holding some components fixed at values chosen by the user. For example, consider Cornell (2002)'s application of blending four chemical pesticides for control of mites. The design was a simplex-centroid design that is created by the SCD function in the code below. The special-cubic model (Equation 4) was fit using the MixModel function. In the first time through the loop, the ModelPlot function is called with the argument slice = list(x4 = 0.0) that holds x 4 =Dibrom constant at 0.0. This produced the contour plot on the left side of Figure 6. In the second time through the loop, ModelPlot function is called with the argument slice = list(x4 = 0.4) that fixes x 4 =Dibrom at 0.4 and produces the plot in the right side of Figure 6.

Effect plots
When there are more than three components in a mixture, another way of representing the fitted surface is by plotting the predicted responseŷ(x) along each component axis on the same graph (Cornell 2002). Cox (1971)  R> ModelEff(nfac = 4, mod = 4, dir = 2, ufunc = mitemdM, + dimensions = list("x1", "x2", "x3", "x4")) The argument nfac specifies the number of components in the model and the argument mod = 4 specifies that a special cubic model will be supplied, which is given in the lm object specified by ufunc=mitemdM. The argument dimensions gives the names (in quotes) of the mixture components in the lm object mitemdM. Finally, the dir argument specifies whether the plot should be made on the Piepel = 1 direction or Cox = 2 direction. The lm object can be created by the MixModel function, or by the R function lm, as long as the terms in the model are specified in the same order they would be specified by MixModel. The resulting plot is shown in Figure 8.
In this plot it can be seen that decreasing the proportions of either x 2 = Omite or x 3 = Kelthane in the mixture causes the average percentage of mites remaining (relative to the number before spraying) to decrease from more than 20 percent to near 0 percent. This plot also shows that the optimal proportion of x 4 = Dibrom for reducing the average percentage of mites is near its value at the centroid of the design, and that changing the proportion of x 1 = Vendex in the mixture has the smallest effect on mite control. Snee and Marquardt (1976)  the statistical significance of their coefficients since the physical interpretation of these coefficients is the predicted responses at the pure component mixtures (which may be out of the constrained experimental region). However, when the response trace along a component axis is a flat horizontal line, it indicates that changing the proportion of that component has little effect on the response. The data for the 16-run screening experiment in eight components discussed in Section 3.2, and described by Snee and Marquardt (1976), is included in the data frame SneeMq in the mixexp package. The code below produces the plot shown in Figure 9.
The plot shows the response trace for x 6 to be essentially flat, and Snee and Marquardt (1976) concluded the amount of this component in the mixture had little effect on the response. The plot also shows that the response traces for x 2 and x 3 nearly overlap. In addition, the traces for x 1 and x 4 and the traces for x 5 , x 7 and x 8 nearly overlap. With these graphical clues, Snee and Marquardt (1976) found that within that the standard errors of the fitted coefficients β 2 ≈ β 3 , β 1 ≈ β 4 , and β 5 ≈ β 7 ≈ β 8 . This means that the effect of changing the proportions of x 2 or x 3 ; or x 1 or x 4 ; or x 5 , x 7 or x 8 is essentially the same. Based on this observation, Snee and Marquardt (1976) concluded the model could be reduced to three dimensions by defining x 1 = (x 2 + x 3 )/(1 − x 6 ), x 2 = (x 1 + x 4 )/(1 − x 6 ), and x 3 = (x 5 + x 7 + x 8 )/(1 − x 6 ). For models 1, 2, and 4 in Section 2, there is a simpler function EffPlot that creates response traces along the Cox or Piepel component axes when there is a data frame containing the design with the mixture components x1xnfac as the first nfac columns and a response y as the last column. This is the format of SneeMq. Therefore, the plot in Figure 9 can be produced with the following code.
R> data("SneeMq") R> Effplot(des = SneeMq, mod = 1, dir = 1) There is no need to include the lower and upper constraints or a user function as arguments to EffPlot since this function finds the constraints in the data frame and fits the model internally.
When there are process variables in a mixture experiment, ModelEff can be used to make response traces along the Cox or Piepel component axes, with the process variables held constant. For example, the code on the next page illustrates making plots at the low and high values of the coded process variable z = (RP M − 65)/20 in the experiment (reported by Gallant, Prickett, Cesarec, and Bruck (2008)) to determine the effects of mixture components and a process variable on the burning rate of composite rocket propellants.

Discussion and future directions
Design and analysis of mixture experiments in R is enhanced by functions in the mixexp package. This package provides functions for creating mixture designs in unconstrained regions. It also uses Piepel (1988)'s Fortran code to create designs in linearly constrained regions. In conjunction with the Fillv function, which creates interior points, and the AlgDesign package an even wider variety of mixture designs such as screening designs, mixture-process variables designs, and blocked mixture experiments can be created. Model fitting for mixture experiments can be accomplished with the R lm function. mixexp also provides functions for ternary contour plots and response trace plots of models fit to mixture experiments. This enhances the graphical capabilities for mixture experiments available in R.
When process variables are included in mixture experiments, it is often convenient to run the experiments in a split-plot arrangement. For example, if it is time consuming to make the mixtures, it might be convenient to make a large batch of the first mixture of components in the design, and then subject parts of this large batch to different combination of levels of the process variables while the next mixture batch is being made.
When a mixture-process experiment is performed in this way, a split-plot arrangement results and a mixed model should be fit to the data. This can be done using the lmer function in the lme4 package (Bates, Mächler, Bolker, and Walker 2015). Goos and Donev (2007) proposed efficient designs for mixture-process experiments in split-plot arrangements. While there is no ability to create their designs in R, Goos and Donev (2007) have provided Fortran code to create these designs. Future plans are to use this Fortran code by calling it with a function in mixexp similar to the way that Piepel (1988)'s code is called for creating designs in constrained regions.