HE Plots for Repeated Measures Designs

Hypothesis Error (HE) plots, introduced in Friendly (2007), provide graphical methods to visualize hypothesis tests in multivariate linear models, by displaying hypothesis and error covariation as ellipsoids and providing visual representations of eﬀect size and signiﬁcance. These methods are implemented in the heplots package for R (Fox, Friendly, and Monette 2007) and SAS (Friendly 2006), and apply generally to designs with ﬁxed-eﬀect factors (MANOVA), quantitative regressors (multivariate multiple regression) and combined cases (MANCOVA). This paper describes the extension of these methods to repeated measures designs in which the multivariate responses represent the outcomes on one or more “within-subject” factors. This extension is illustrated using the heplots package for R . Examples describe one-sample proﬁle analysis, designs with multiple between-S and within-S factors, and doubly-multivariate designs, with multivariate responses observed on multiple occasions.


Introduction
Hypothesis Error (HE) plots, introduced in Friendly (2007), provide graphical methods to visualize hypothesis tests in multivariate linear models, by displaying hypothesis and error covariation as ellipsoids and providing visual representations of effect size and significance. The heplots package (Fox et al. 2007) for R (R Development Core Team 2010) implements these methods for the general class of the multivariate linear model (MVLM) including fixed-effect factors (MANOVA), quantitative regressors (multivariate multiple regression (MMREG)) and combined cases (MANCOVA). Here, we describe the extension of these methods to repeated measures designs in which the multivariate responses represent the outcomes on one or more "within-subject" factors. This vignette also appears in the Journal of Statistical Software (Friendly 2010).

Multivariate linear models: Notation
To set notation, we express the MVLM as where, Y ≡ (y 1 , y 2 , . . . , y p ) is the matrix of responses for n subjects on p variables, X is the design matrix for q regressors, B is the q × p matrix of regression coefficients or model parameters and U is the n × p matrix of errors, with vec(U ) ∼ N p (0, I n ⊗ Σ), where ⊗ is the Kronecker product.
A convenient feature of the MVLM for general multivariate responses is that all tests of linear hypotheses (for null effects) can be represented in the form of a general linear test, where L is a matrix of constants whose rows specify h linear combinations or contrasts of the parameters to be tested simultaneously by a multivariate test. In R all such tests can be carried out using the functions Anova() and linearHypothesis() in the car package. 1 For any such hypothesis of the form Eqn.
(2), the analogs of the univariate sums of squares for hypothesis (SS H ) and error (SS E ) are the p × p sum of squares and crossproducts (SSP) matrices given by (Timm 1975, Ch. 3,5): and where U = Y − X B is the matrix of residuals. Multivariate test statistics (Wilks' Λ, Pillai trace, Hotelling-Lawley trace, Roy's maximum root) for testing Eqn.
(2) are based on the s = min(p, h) non-zero latent roots of HE −1 and attempt to capture how "large" H is, relative to E in s dimensions. All of these statistics have transformations to F statistics giving either exact or approximate null hypothesis F distributions. The corresponding latent vectors give a set of s orthogonal linear combinations of the responses that produce maximal univariate F statistics for the hypothesis in Eqn.
(2); we refer to these as the canonical discriminant dimensions.
In a univariate, fixed-effects linear model, it is common to provide F tests for each term in the model, summarized in an analysis-of-variance (ANOVA) table. The hypothesis sums of squares, SS H , for these tests can be expressed as differences in the error sums of squares, SS E , for nested models. For example, dropping each term in the model in turn and contrasting the resulting residual sum of squares with that for the full model produces so-called Type-III tests; adding terms to the model sequentially produces so-called Type-I tests; and testing each term after all terms in the model with the exception of those to which it is marginal produces so-called Type-II tests. Closely analogous MANOVA tables can be formed similarly by taking differences in error sum of squares and products matrices (E) for such nested models. Type I tests are sensible only in special circumstances; in balanced designs, Type II and Type III tests are equivalent. Regardless, the methods illustrated in this paper apply to any multivariate linear hypothesis.

Data ellipses and ellipsoids
In what follows, we make extensive use of ellipses (or ellipsoids in 3+D) to represent joint variation among two or more variables, so we define this here. The data ellipse (or covariance ellipse), described by Dempster (1969) and Monette (1990), is a device for visualizing the relationship between two variables, Y 1 and Y 2 . Let D 2 M (y) = (y − y) T S −1 (y − y) represent the squared Mahalanobis distance of the point y = (y 1 , y 2 ) T from the centroid of the data y = (Y 1 , Y 2 ) T . The data ellipse E c of size c is the set of all points y with D 2 M (y) less than or equal to c 2 : E c (y; S,y) ≡ y: (y − y) T S −1 (y − y) ≤ c 2 Here, S = n i=1 (y − y) T (y − y)/(n − 1) = Var(y) is the sample covariance matrix. Many properties of the data ellipse hold regardless of the joint distribution of the variables, but if the variables are bivariate normal, then the data ellipse represents a contour of constant density in their joint distribution. In this case, D 2 M (y) has a large-sample χ 2 distribution with 2 degrees of freedom, and so, for example, taking c 2 = χ 2 2 (0.95) = 5.99 ≈ 6 encloses approximately 95 percent of the data. Taking c 2 = χ 2 2 (0.68) = 2.28 gives a bivariate analog of the univariate ±1 standard deviation interval, enclosing approximately 68% of the data.
The generalization of the data ellipse to more than two variables is immediate: Applying Equation 5 to y = (y 1 , y 2 , y 3 ) T , for example, produces a data ellipsoid in three dimensions. For p multivariate-normal variables, selecting c 2 = χ 2 p (1−α) encloses approximately 100(1−α) percent of the data. 2

HE plots
The essential idea behind HE plots is that any multivariate hypothesis test Eqn. (2) can be represented visually by ellipses (or ellipsoids in 3D) which express the size of co-variation against a multivariate null hypothesis (H) relative to error covariation (E). The multivariate tests, based on the latent roots of HE −1 , are thus translated directly to the sizes of the H ellipses for various hypotheses, relative to the size of the E ellipse. Moreover, the shape and orientation of these ellipses show something more-the directions (linear combinations of the responses) that lead to various effect sizes and significance.
In these plots, the E matrix is first scaled to a covariance matrix (E/df e = Var(U i )). The ellipse drawn (translated to the centroid y of the variables) is thus the data ellipse of the residuals, reflecting the size and orientation of residual variation. In what follows (by default), we always show these as "standard" ellipses of 68% coverage. This scaling and translation also allows the means for levels of the factors to be displayed in the same space, facilitating interpretation.
The ellipses for H reflect the size and orientation of covariation against the null hypothesis. They always proportional to the data ellipse of the fitted effects (predicted values) for a given hypothesized term. In relation to the E ellipse, the H ellipses can be scaled to show either the effect size or strength of evidence against H 0 (significance).
For effect size scaling, each H is divided by df e to conform to E. The resulting ellipses are then exactly the data ellipses of the fitted values, and correspond visually to multivariate analogs of univariate effect size measures (e.g., (ȳ 1 −ȳ 2 )/s where s=within group standard deviation). That is, the sizes of the H ellipses relative to that of the E reflect the (squared) differences and correlation of the factor means relative to error covariation.
For significance scaling, it turns out to be most visually convenient to use Roy's largest root test as the test criterion. In this case the H ellipse is scaled to H/(λ α df e ) where λ α is the critical value of Roy's statistic. Using this gives a simple visual test of H 0 : Roy's test rejects H 0 at a given α level if and only if the corresponding α-level H ellipse extends anywhere outside the E ellipse. 3 Consequently, when the rank of H = min(p, h) ≤ 2, all significant effects can be observed directly in 2D HE plots; when rank(H) = 3, some rotation of a 3D plot will reveal each significant effect as extending somewhere outside the E ellipsoid.
In our R implementation, the basic plotting functions in the heplots package are heplot() and heplot3d() for mlm objects. These rely heavily on the Anova() and other functions from the car package (Fox 2009) for computation. For more than three response variables, all pairwise HE plots can be shown using a pairs() function for mlm objects. Alternatively, the related candisc package ) produces HE plots in canonical discriminant space. This shows a low-rank 2D (or 3D) view of the effects for a given term in the space of maximum discrimination, based on the linear combinations of responses which produce maximally significant test statistics. See Friendly (2007);Fox, Friendly, and Monette (2009) for details and examples for between-S MANOVA designs, MMREG and MANCOVA models.

Repeated measures designs
The framework for the MVLM described above pertains to the situation in which the response vectors (rows, y T i of Y n×p ) are iid and the p responses are separate, not necessarily commensurate variables observed on individual i.
In principle, the MVLM extends quite elegantly to repeated-measure (or within-subject) designs, in which the p responses per individual can represent the factorial combination of one or more factors that structure the response variables in the same way that the between-individual design structures the observations. In the multivariate approach to repeated measure data, the same model Eqn. (1) applies, but hypotheses about between-and within-individual variation are tested by an extended form of the general linear test Eqn. (2), which becomes where M is a matrix of constants whose columns specify k linear combinations or contrasts among the responses, corresponding to a particular within-individual effect. In this case, the H and E matrices for testing Eqn. (6) become and This may be easily seen to be just the ordinary MVLM applied to the transformed responses Y M which form the basis for a given within-individual effect. The idea for this approach to repeated measures through a transformation of the responses was first suggested by Hsu (1938) and is discussed further by Rencher (1995) and Timm (1980). In what follows, we refer to hypotheses pertaining to between-individual effects (specified by L) as "between-S" and hypotheses pertaining to within-individual effects (M ) as "within-S." Between-S effect tested In the general case, various L matrices provide contrasts or select the particular coefficients tested for between-S effects, while various M matrices specify linear combinations of responses for the within-S effects. This is illustrated in Table 2 for a three-way design with two between-S factors (A, B) and one within-S factor (C).
The between-S terms themselves are tested using the unit vector M = (1 p ), giving a test based on the sums over the within-S effects. This simply reflects the principle of marginality, by which effects for any term in a linear model are tested by averaging over all factors not included in that term. Tests using a matrix M of contrasts for a within-S effect provide tests of the interactions of that effect with each of the between-S terms. That is, LBM = 0 tests between-S differences among the responses transformed by M .
For more than one within-S factor, the full M matrices for various within-S terms are generated as Kronecker products of the one-way M contrasts with the unit vector (1) of appropriate size. For example, with c levels of factor C and d of factor D, Each of the within-S terms combine with any between-S terms in an obvious way to give an extended version of Table 2 with additional rows for M D and M CD .
In passing, we note that all software (SAS, SPSS, R, etc.) that handles repeated measure designs through this extension of the MLM effectively works via the general linear test Eqn.
(2), with either implicit or explicit specifications for the L and M matrices involved in testing any hypothesis for between-or within-S effects. This mathematical elegance is not without cost, however. The MLM approach does not allow for missing data (a particular problem in longitudinal designs), and the multivariate test statistics (Wilks' Λ, etc.) assume the covariance matrix of U is unstructured. Alternative analysis based on mixed (or hierarchical) models, e.g. (Pinheiro and Bates 2000;Verbeke and Molenberghs 2000) are more general in some ways, but to date visualization methods for this approach remain primitive and the mixed model analysis does not easily accommodate multivariate responses.
The remainder of the paper illustrates these MLM analyses, shows how they may be performed in R, and how HE plots can be used to provide visual displays of what is summarized in the multivariate test statistics. We freely admit that these displays are somewhat novel and take some getting used to, and so this paper takes a more tutorial tone. We exemplify these methods in the context of simple, one-sample profile analysis (Section 3), designs with multiple between-and within-S effects (Section 4), and doubly-multivariate designs (Section 5), where two or more separate responses (e.g., weight loss and self esteem) are each observed in a factorial structure over multiple within-S occasions. In Section 6 we describe a simplified interface for these plots in the development versions of the heplots and car packages. Finally (Section 7) we compare these methods with visualizations based on the mixed model.

One sample profile analysis
The simplest case of a repeated-measures design is illustrated by the data on vocabulary growth of school children from grade 8 to grade 11, in the data frame VocabGrowth, recording scores on the vocabulary section of the Cooperative Reading Test for a cohort of 64 students.
(The scores are scaled to a common, but arbitrary origin and unit of measurement, so as to be comparable over the four grades.) Since these data cover an age range in which physical growth is beginning to decelerate, it is of interest whether a similar effect occurs in the acquisition of new vocabulary. Thus, attention here is arguably directed to polynomial trends in grade: average rate of change (slope, or linear trend) and shape of trajectories (quadratic and cubic components). A boxplot of these scores ( Figure 1) gives an initial view of the data. To do this, we first reshape the data from wide to long format (i.e., each 4-variate row becomes four rows indexed by grade). We can see that vocabulary scores increase with age, but the trend of means appears non-linear. The standard univariate and multivariate tests for the differences in vocabulary with grade can be carried out as follows. First, we fit the basic MVLM with an intercept only on the right-hand side of the model, since there are no between-S effects. The intercepts estimate the means at each grade level, µ 8 , . . . , µ 11 . R> (Vocab.mod <-lm(cbind(grade8,grade9,grade10,grade11)~1, data=VocabGrowth)) Call: lm(formula = cbind(grade8, grade9, grade10, grade11)~1, data = VocabGrowth) Coefficients: grade8 grade9 grade10 grade11 (Intercept) 1.14 2.54 2.99 3.47 We could test the multivariate hypothesis that all means are simultaneously zero, µ 8 = µ 9 = µ 10 = µ 11 = 0. This point hypothesis is the simplest case of a multivariate test under Eqn.
(2), with L = I. This hypothesis tests that the vocabulary means are all at the arbitrary origin for the scale. Often this test is not of direct interest, but it serves to illustrate the H and E matrices involved in any multivariate test, their representation by HE plots, and how we can extend these plots to the repeated measures case.
The H and E matrices can be printed with summary(Vocab.aov0), or extracted from the Anova.mlm object. In this case, H is simply nyy T and E is the sum of squares and crossproducts of deviations from the column means, n i=1 (y i − y) T (y i − y). The HE plot for the Vocab.mod model shows the test for the (Intercept) term (all means = 0). To emphasize that the test is assessing the (squared) distance ofȳ from 0, in relation to the covariation of observations around the grand mean, we define a simple function to mark the point hypothesis H 0 = (0, 0). R> mark.H0 <-function(x=0, y=0, cex=2, pch=19, col="green3", lty=2, pos=2) { points(x,y, cex=cex, col=col, pch=pch) text(x,y, expression(H[0]), col=col, pos=pos) if (lty>0) abline(h=y, col=col, lty=lty) if (lty>0) abline(v=x, col=col, lty=lty) } Here we show the HE plot for the grade8 and grade9 variables in Figure 2. The E ellipse reflects the positive correlation of vocabulary scores across these two grades, but also shows that variability is greater in Grade 8 than in Grade 9. Its position relative to (0,0) indicates that both means are positive, with a larger mean at Grade 9 than Grade 8. R> heplot(Vocab.mod, terms="(Intercept)", type="III") R> mark.H0(0,0) R> title(expression(paste("Multivariate test of ", H[0], " : ", bold(mu)==0))) The H ellipse plots as a degenerate line because the H matrix has rank 1 (1 df for the MANOVA test of the Intercept). The fact that the H ellipse extends outside the E ellipse (anywhere) signals that this H 0 is clearly rejected (for some linear combination of the response variables). Moreover, the projections of the H and E ellipses on the grade8 and grade9 axes, showing H widely outside E, signals that the corresponding univariate hypotheses, µ 8 = 0 and µ 9 = 0 would also be rejected.

Testing within-S effects
For the Anova() function, the model for within-S effects-giving rise the M matrices in Eqn. (6)-is specified through the arguments idata (a data frame giving the factor(s) used in the intra-subject model) and idesign (a model formula specifying the intra-subject design). That is, if Z = [idata], the M matrices for different intra-subject terms are generated from columns of Z indexed by the terms in idesign, with factors and interactions expanded expanded to contrasts in the same way that the design matrix X is generated from the between-S design formula.
Thus, to test the within-S effect of grade, we need to construct a grade variable for the levels of grade and use this as a model formula, idesign=~grade to specify the within-S design in the call to Anova.
R> idata <-data.frame(grade=ordered(8:11)) R> (Vocab.aov <-Anova(Vocab.mod, idata=idata, idesign=~grade, type="III")) As shown in Table 2, any such within-S test is effectively carried out using a transformation Y to Y M , where the columns of M provide contrasts among the grades. For the overall test of grade, any set of 3 linearly independent contrasts will give the same test statistics, though, of course the interpretation of the parameters will differ. Specifying grade as an ordered factor (grade=ordered(8:11)) will cause Anova() to use the polynomial contrasts shown in M poly below.
Alternatively, M f irst would test the gains in vocabulary between grade 8 (baseline) and each of grades 9-11, while M last would test the difference between each of grades 8-10 from grade 11. (In R, these contrasts are constructed with M.first <-contr.sum(factor(11:8))[4:1,3:1], and M.last <-contr.sum(factor(8:11)) respectively.) In all cases, the hypothesis of no difference among the means across grade is transformed to an equivalent multivariate point hypothesis, M µ y = 0, such as we visualized in Figure 2.
Correspondingly, the HE plot for the effect of grade can be constructed as follows. For expository purposes we explicitly transform Y to Y M , where the columns of M provide contrasts among the grades reflecting linear, quadratic and cubic trends using M poly .
Using M poly , the MANOVA test for the grade effect is then testing H 0 : M µ y = 0 ↔ µ Lin = µ Quad = µ Cubic = 0. That is, the means across grades 8-11 are equal if and only if their linear, quadratic and cubic trends are simultaneously zero.
We can show this test visually as follows. Figure 3(a) shows a scatterplot of the transformed linear and quadratic trend scores, overlayed with a 68% data ellipse. Figure 3(b) is the corresponding HE plot for these two variables. Thus, we can see that the E ellipse is simply the data ellipse of the transformed vocabulary scores; its orientation indicates a slight tendency for those with greater slopes (gain in vocabulary) to have greater curvatures (leveling off earlier). Figure 3 is produced as follows: R> op <-par(mfrow=c(1,2)) R> data.ellipse(trends[,1:2], xlim=c (-4,8), ylim=c(-3,3), levels=0.68, main="(a) Data ellipse ") R> mark.H0(0,0) R> heplot(within.mod, terms="(Intercept)", col=c("red", "blue"), type="III", term.labels="Grade", , xlim=c (-4,8), ylim=c(-3,3), main="(b) HE plot for Grade effect") R> mark.H0(0,0) R> par(op) We interpret Figure 3(b) as follows, bearing in mind that we are looking at the data in the transformed space (Y M ) of the linear (slopes) and quadratic (curvatures) of the original data (Y ). The mean slope is positive while the mean quadratic trend is negative. That is, overall, vocabulary increases with Grade, but at a decreasing rate. The H ellipse plots as a degenerate line because the H matrix has rank 1 (1 df for the MANOVA test of the Intercept). Its projection outside the E ellipse shows a highly significant rejection of the hypothesis of equal means over Grade.
In such simple cases, traditional plots (boxplots, or plots of means with error bars) are easier to interpret. HE plots gain advantages in more complex designs (two or more between-or within-S factors, multiple responses), where they provide visual summaries of the information used in the multivariate hypothesis tests.

Between-and within-S effects
When there are both within-and between-S effects, the multivariate and univariate hypotheses tests can all be obtained together using Anova() with the idata and idesign specifying the within-S levels and the within-S design, as shown above. linearHypothesis() can be used to test arbitrary contrasts in the within-or between-effects.
However, to explain the visualization of these tests for within-S effects and their interactions using heplot() and related methods it is again convenient to explicitly transform Y → Y M for a given set of within-S contrasts, in the same way as done for the VocabGrowth data. See Section 6 for simplified code producing these HE plots directly, without the need for explicit transformation.
To illustrate, we use the data from O' Brien and Kaiser (1985) contained in the data frame OBrienKaiser in the car package. The data are from an imaginary study in which 16 female and male subjects, who are divided into three treatments, are measured at a pretest, posttest, and a follow-up session; during each session, they are measured at five occasions at intervals of one hour. The design, therefore, has two between-subject and two within-subject factors.
For simplicity here, we initially collapse over the five occasions, and consider just the within-S effect of session, called session in the analysis below. Note that the between-S design is unbalanced (so tests based on Type II sum of squares and crossproducts are preferred, because they conform to the principle of marginality).
R> The factors gender and treatment were specified with the following contrasts, L gender , and L treatment , shown below. The contrasts for treatment are nested (Helmert) contrasts testing a comparison of the control group with the average of treatments A and B (treatment1) and the difference between treatments A and B (treatment2).
R> contrasts(OBK$gender) We first fit the general MANOVA model for the three repeated measures in relation to the between-S factors. As before, this just establishes the model for the between-S effects. If we regarded the repeated measure effect of session as equally spaced, we could simply use polynomial contrasts to examine linear (slope) and quadratic (curvature) effects of time.
Here, it makes more sense to use profile contrasts, testing (post -pre) and (fup -post).
R> session <-ordered(c("pretest", "posttest", "followup"), levels=c("pretest", "posttest", "followup")) R> contrasts(session) <-matrix(c(-1, 1, 0, 0, -1, 1), ncol=2) R> session It is useful to point out here that the default print methods for Anova.mlm objects, as shown above, gives an optimally compact summary for all between-and within-S effects, using a given test statistic, yet all details and other test statistics are available using the summary method. 4 For example, using summary(aov.OBK) as shown below, we can display all the multivariate tests together with the H and E matrices, and/or all the univariate tests for the traditional univariate mixed model, under the assumption of sphericity and with Geiser-Greenhouse and Huhyn-Feldt corrected F tests. To conserve space in this article the results are not shown here. OK, now its time to understand the nature of these effects. Ordinarily, from a data-analytic point of view, I would show traditional plots of means and other measures (as in Figure 1) or their generalizations in effect plots (Fox 1987;Fox and Hong 2009). But I'm not going to do that here. Instead, I'd like for you to understand how HE plots provide a compact visual summary of an MLM, mirroring the tabular presentation from Anova(mod.OBK) above, but which also reveals the nature of effects. Here, you have to bear in mind that between-S effects are displayed most naturally in the space of the response variables, while within-S effects are most easily seen in the contrast space of transformed responses (Y M ).
HE plots for between-S effects can be displayed for any pair of responses with heplot(). Figure 4 shows this for pre and post. By default, H ellipses for all model terms (excluding the intercept) are displayed. Additional MLM tests can also be displayed using the hypotheses argument; here we specify the two contrasts for the treatment effect shown above as contrasts(OBK$treatment). R> # HE plots for Between-S effects R> heplot(mod.OBK, hypotheses=c("treatment1", "treatment2"), col=c("red", "black", "blue", "brown", "gray40", "gray40"), hyp.labels=c("(A,B)-Control", "A-B"), main="Between-S effects and contrasts" ) R> lines(c(3,7), c(3,7), col="green") In Figure 4, we see that the treatment effect is significant, and the large vertical extent of this H ellipse shows this is largely attributable to the differences among groups in the Post session. Moreover, the main component of the treatment effect is the contrast between the Control group and groups A & B, which outperform the Control group at Post test. The effect of gender is not significant, but the HE plot shows that that males are higher than females at both Pre and Post tests. Likewise, the treatment:gender interaction fails significance, but the orientation of the H ellipse for this effect can be interpreted as showing that the differences among the treatment groups are larger for males than for females. Finally, the line of unit slope shows that for all effects, scores are greater on post than pre.
Alternatively, all pairwise HE plots for the session means can be shown using pairs() for the mlm object mod.OBK, with the result shown in Figure 5.
R> pairs(mod.OBK, col=c("red", "black", "blue", "brown")) Here we see that (a) the treatment effect is largest in the combination of post-test and follow up; (b) this 2 df test is essentially 1-dimensional in this view, i.e., treatment means at post-test and follow up are nearly perfectly correlated; (c) the superior performance of males relative to females, while not significant, holds up over all three sessions.
As before, for expository purposes, HE plots for within-S effects are constructed by transforming Y → Y M , here using the (profile) contrasts for session. R> # Transform to profile contrasts for within-S effects R> OBK$session.1 <-OBK$post -OBK$pre R> OBK$session.2 <-OBK$fup -OBK$post R> mod1.OBK <-lm(cbind(session.1, session.2)~treatment*gender, data=OBK) R> Anova(mod1.OBK, test="Roy", type="III") From the schematic summary in Table 2, with these (or any other) contrasts as M session , the tests of the effects contained in treatment*gender in mod1.OBK are identical to the interactions of these terms with session, as shown above for the full model in aov.OBK. The (Intercept) term in this model represents the session effect.
The HE plot for within-S effects ( Figure 6) is constructed from the mod1.OBK object as shown below. The main manipulation done here is to re-label the terms plotted to show each of them as involving session, as just described.

Higher-order designs
The scheme described above and the obvious generalization of Table 2 easily accommodates designs with two or more within-S factors. Any number of between-S factors are handled automatically, by the model formula for between-S effects specified in the lm() call, e.g., treatment * gender. For example, for the O'Brien-Kaiser data with session and hour as two within-S factors, first create a data frame, within specifying the values of these factors for the 3 × 5 combinations.

R>
Tests involving the interaction of session:hour use the Kronecker product, M session ⊗M hour .
R> pairs(mod.OBK2.hour, type="III", remove.intercept=FALSE, term.labels="hour", col=colors) In the designs discussed above the same measure is observed on all occasions. Sometimes, there are two or more different measures, Y 1 , Y 2 , . . . , observed at each occasion, for example response speed and accuracy. In these cases, researchers often carry out separate repeated measures analyses for each measure. However the tests of between-S effects and each within-S effect can also be carried out as multivariate tests of Y 1 , Y 2 , . . . simultaneously, and these tests are often more powerful, particularly when the effects for the different measures are weak, but correlated.

Doubly-multivariate designs
In the present context, such doubly-multivariate designs can be easily handled in principle by treating the multiple measures as an additional within-S factor, but using an identity matrix as the M matrix in forming the linear hypotheses to be tested via Eqn. (6). For example, with two measures, Y 1 , Y 2 observed on three repeated sessions, the full M matrix for the design is generated as in Eqn. (9) as In R, we can express this as follows, using M.measure to represent Y 1 , Y 2 .
R> M.measure <-diag(2) R> rownames(M.measure)<-c("Y1", "Y2") R> colnames(M.measure)<-c("Y1", "Y2") R> kronecker (  In the result, the first two columns correspond to the within-S Intercept term, and are used to test all between-S terms for Y 1 , Y 2 simultaneously. The remaining columns correspond to the session effect for both variables and all interactions with session. In practice, this analysis must be performed in stages because Anova() does not (yet) 5 allow such a doubly-multivariate design to be specified directly.

Example: Weight loss and self esteem
To illustrate, we consider the data frame WeightLoss originally from Andrew Ainsworth (http://www.csun.edu/~ata20315/psy524/main.htm), giving (contrived) data on weight loss and measures of self esteem after each of three months for 34 individuals, who were observed in one of three groups: Control, Diet group, Diet + Exercise group. The within-S factors are thus measure (wl, se) and month (1:3).
R> Because this design is complex, and to facilitate interpretation of the effects we will see in HE plots, it is helpful to view traditional plots of means with standard errors for both variables. These plots, shown in Figure 9, 6 show that, for all three groups, the amount of weight lost each month declines, but only the Diet + Exercise maintains substantial weight loss through month 2. For self esteem, all three groups have a U-shaped pattern over months, and by month 3, the groups are ordered Control < Diet < Diet + Exercise. Research interest in the differences among groups would likely be focused on the questions: (a) Do the two diet groups differ from the control group? (b) Is there an additional effect of exercise, given diet? These questions may be tested with the (Helmert) contrasts used below for group, which are labeled group1 and group1 respectively. R> contrasts(WeightLoss$group) <-matrix(c (-2,1,1, 0, -1, 1),ncol=2) R> (wl.mod<-lm(cbind(wl1,wl2,wl3,se1,se2,se3)~group, data=WeightLoss)) Call: lm(formula = cbind(wl1, wl2, wl3, se1, se2, se3)~group, data = WeightLoss) As before, it is often useful to examine HE plots for pairs of variables in this analysis before proceeding to the within-S analysis. For example, Figure 10 shows the test of group and the two contrasts for weight loss and for self esteem at months 1 and 2.
R> op <-par(mfrow=c(1,2)) R> heplot(wl.mod, hypotheses=c("group1", "group2"), xlab="Weight Loss, month 1", ylab="Weight Loss, month 2") R> heplot(wl.mod, hypotheses=c("group1", "group2"), variables=4:5, xlab="Self Esteem, month 1", ylab="Self Esteem, month 2") R> par(op) This is helpful, but doesn't illuminate the overall group effect for weight loss and self esteem for all three months, and, of course cannot shed light on any interactions of group with measure or month. In the following discussion, we will assume that the researcher is particularly interested in understanding the relation between weight loss and self esteem as it is expressed in changes over time and differences among groups.
To carry out the doubly-multivariate analysis, we proceed as follows. First, we define the M matrix for the measures, used in the between-S analysis. We use M = I 2 ⊗ 1/3 so that the resulting scores are the means (not sums) for weight loss and self esteem. R> between <-as.matrix(WeightLoss[,-1]) %*% measure R> between.mod <-lm(between~group, data=WeightLoss) R> Anova(between.mod, test="Roy", type="III") Type III MANOVA Tests: Roy test statistic Df test stat approx F num Df den Df Pr(>F) (Intercept) 1 85.62 1284.3 2 30 < 2e-16 *** group 2 0.36 5.5 2 31 0.00891 ** ---Signif. codes: 0 ✬***✬ 0.001 ✬**✬ 0.01 ✬*✬ 0.05 ✬.✬ 0.1 ✬ ✬ 1 The HE plot for this component of the analysis (Figure 11) shows a striking effect: Averaging over all three months, the means for the Control, Diet and DietEx group on both weight loss and self esteem are highly correlated and in the expected direction. This is something that is not at all obvious in Figure 9.
R> heplot(between.mod, hypotheses=c("group1", "group2"), xlab="Weight Loss", ylab="Self Esteem", col=c("red", "blue", "brown"), main="Weight Loss & Self Esteem: Group Effect") Next, for the within-S analysis, we define the M matrix for months, using orthogonal polynomials representing linear and quadratic trends. As before, the test of the (Intercept) term in these trend scores corresponds to the month effect in the doubly-multivariate model, and the group effect tests the group × month interaction.
The interested reader might wish to compare the standard univariate plots of means in Figure 9 with the HE plots in Figure 11 and Figure 12. The univariate plots have the advantage of showing the data directly, but cannot show the sources of significant effects in multivariate repeated measures models. HE plots have the advantage that they show directly what is expressed in the multivariate tests for relevant hypotheses.
6. Simplified interface: heplots 0.9 and car 2.0 It sometimes happens that the act of describing and illustrating software spurs development to make both simpler, and such is the case here. At the beginning, the stable version of car on CRAN provided the computation for multivariate linear hypotheses including repeated measures designs, but could not handle doubly-multivariate designs directly; the CRAN version of heplots could only repeated measures by explicitly transforming Y → Y M and re-fitting submodels in terms of the transformed responses.
The new versions of these packages on CRAN (http://cran.us.r-project.org/) now handle these cases directly from the basic mlm object. heplot() now provides the arguments idata, idesign, icontrasts, or, for the doubly-multivariate case, imatrix, which are passed to Anova() to calculate the appropriate H and E matrices.
Omitting these arguments in the call to heplot() gives an HE plot for all between-S effects (or the subset specified by the terms argument), just as before. For the within-S effects, E matrices differ for for different within-S terms, so it is necessary to specify the intra-subject term (iterm, corresponding to M ) for which HE plots are desired. Several examples are given below.

Comparison with other approaches
The principal goals of this paper have been (a) to describe the extension of the classical MVLM to repeated measures designs; (b) to explain how HE plots provide compact and understandable visual summaries of the effects shown in typical numerical tables of MANOVA tests; and (c) illustrate these in a variety of contexts ranging from single-sample designs to complex doubly-multivariate designs.
In the context of repeated measures designs, I mentioned earlier that mixed models for longitudinal data provide an attractive alternative to the MVLM (because the former easily accommodate missing or unbalanced data over intra-subject measurements, time-varying covariates, and often allows the residual covariation to be modeled with fewer parameters).
Here we consider a classic data set (Potthoff and Roy 1964) used in the first application of the MVLM to growth-curve analysis. These data are often used as illustrations of longitudinal models, e.g., Verbeke and Molenberghs (2000, §17.4).
Investigators at the University of North Carolina Dental School followed the growth of 27 children (16 males, 11 females) from age 8 until age 14 in a study designed to establish typical patterns of jaw size useful for orthodontic practice. Every two years they measured the distance between the pituitary and the pterygomaxillary fissure, two points that are easily identified on x-ray exposures of the side of the head. The questions of interest include (a) Over this age range, can growth be adequately represented as linear in time, or is some more complex function necessary? (b) Are separate growth curves needed for boys and girls, or can both be described by the same growth curve?

Longitudinal, mixed model approach
The mixed model for longitudinal data is very general and flexible for the reasons noted above, but it is inappropriate here to relate any more than the barest of details necessary for this example. We begin with simple plots of the data: A profile plot grouped by Sex (Figure 13 (left)), R> data("Orthodont", package="nlme") R> library("lattice") R> xyplot(distance~age|Sex, data=Orthodont, type=✬b✬, groups=Subject, pch=15:25, col=palette(), cex=1.3, main="Orthodont data") and also a summary plot showing fitted lines for each individual, together with the pooled ordinary least squares regression of distance on age (Figure 13 (right)).
With the longitudinal mixed model, contemplate fitting two models describing an individual's pattern of growth: a model fitting only linear growth and a model fitting each person's trajectory exactly by including quadratic and cubic trends in time. For the sake of interpretation of coefficients in these models, it is common to recenter the time variable so that time=0 corresponds to initial status. Using Year = (age-8), we have: m1 : m3 : where the vector of residuals for subject i is e i• ∼ N (0, R i ). (For this example, we take R i to be unstructured, even though other specifications require fewer parameters.) For the linear model (m1), we entertain the possibility that the person-level intercepts (β 0i ) and slopes (β 1i ) depend on Sex, and so specify them as random coefficients, In these equations the γs are the fixed effects, while the u (along with the errors e it ) are random effects. Note that Sex is coded 0=Male, 1=Female, so γ 00 and γ 10 are the intercept and slope for Males; γ 01 pertains to the difference in intercepts for Females relative to Males, while γ 11 is the difference in slopes.
The linear growth model can be fit using lme as follows: To aid interpretation, we can plot the estimated fixed effects from the linear model (m1) as follows, using the predict method for lme objects to calculate the the fitted values for boys and girls over the range of years (0 to 6) corresponding to ages 8 to 14, as in Figure 14(left). Similar code produces a plot of the cubic model Figure 14(right).

MVLM approach
For the multivariate approach, the Orthodont data must first be reshaped to wide format with the distance values as separate columns.
R> Ortho.mod <-lm(cbind(distance.8, distance.10, distance.12, distance.14)~Sex, data=Orthowide) R> idata <-data.frame(age=ordered(seq(8,14,2))) R> Ortho.aov <-Anova(Ortho.mod, idata=idata, idesign=~age) R> Ortho.aov We see that both Sex and age are highly significant and their interaction is nearly significant. Figure 15 shows HE plots for the between-and within-S effects, produced as shown below. The left panel plots the effect of Sex for ages 8 and 14, with a green line of unit slope. Males clearly show greater growth by age 14, and the difference between males and females is greater at at 14 than at age 8. The right panel shows the linear and quadratic trends with age, reflecting the overall age main effect and the Sex:age interaction. Recalling that the contributions of each displayed variable to each effect in an HE plot can be seen by their horizontal and vertical shadows relative to the E ellipse, we see that the main effect of age is essentially linear, and the overall Sex:age effect is nearly significant due to a difference in slopes, but not curvature.
R> heplot(Ortho.mod, idata=idata, idesign=~age, iterm="age", col=c("red", "blue", "brown"), variables=c(2,3), main="Orthodont data: Nonlinear Within-S effects") We can confirm the impression that no nonlinear effects are important by testing linear hypotheses. To explain this, we first show the details of the test of the overall Sex:age effect, as tested with linearHypothesis(). The "response transformation matrix" shown below is equivalent to M poly described earlier (Section 3.1) for a 4-level factor with linear, quadratic and cubic trend components. The univariate tests for individual contrasts in age are then based on the diagonal elements of the H and E matrices.
R> linearHypothesis(Ortho.mod, hypothesis="SexFemale", idata=idata, idesign=~age, iterms="age", tit Response transformation matrix: age.L age.Q age.C distance.8 -0.670820 0.5 -0.223607 distance.10 -0.223607 -0.5 0.670820 distance.12 0.223607 -0.5 -0.670820 distance.14 0.670820 0.5 0.223607 Sum of squares and products for the hypothesis: age.L age.Q age.C age.L 12.11415 3.81202 -2.86766 age.Q 3.81202 1.19955 -0.90238 age.C -2.86766 -0.90238 0.67883 Sum of squares and products for error: age.L age.Q age.C age.L 59.16733 -11.22417 4.52784 This paper has shown how these methods can be extended to repeated measures designs, by displaying effects in the transformed space of contrasts or linear combinations for within-S effects. As we hope to have shown, these plots can provide insights into the relations among effects and interpretations of those that are significant (or not) which go beyond what is available in traditional, univariate displays.