SAS/IML Macros for a Multivariate Analysis of Variance Based on Spatial Signs

Recently, new nonparametric multivariate extensions of the univariate sign methods have been proposed. Randles (2000) introduced an affine invariant multivariate sign test for the multivariate location problem. Later on, Hettmansperger and Randles (2002) considered an affine equivariant multivariate median corresponding to this test. The new methods have promising efficiency and robustness properties. In this paper, we review these developments and compare them with the classical multivariate analysis of variance model. A new SAS/IML tool for performing a spatial sign based multivariate analysis of variance is introduced.


Introduction
Classical statistical techniques for multivariate location problems, such as Hotelling's T² tests, multivariate analysis of variance (MANOVA) and multivariate multiple regression analysis, rely on the assumption that the data arise from a multivariate normal distribution. The inference is then based on the sample mean vector and the sample covariance matrix. However, these methods are extremely sensitive to outlying observations, and they are inefficient for heavy-tailed noise distributions. Möttönen and Oja (1995) reviewed multivariate sign and rank tests and the corresponding estimates based on the L₁-type objective function. The tests and estimates were rotation invariant and equivariant, but not affine invariant/equivariant. Recently, new nonparametric multivariate extensions of the univariate sign methods have been proposed. Randles (2000) developed an affine invariant one-sample multivariate sign test. Hettmansperger and Randles (2002) considered an affine equivariant multivariate median corresponding to this test. Their approach combines the simultaneous use of the spatial median (Brown 1983), Tyler's M-estimate of scatter (Tyler 1987) and the transformation-retransformation technique (Chakraborty, Chaudhuri, and Oja 1998). Chakraborty et al. (1998) used an idea similar to that of Hettmansperger and Randles (2002), but without Tyler's scatter matrix. Like Randles' test, the Hettmansperger and Randles (2002) estimate is fairly easy to compute.
In this paper, we first recall the classical MANOVA model. In Section 3 we review the multivariate spatial sign tests and estimators analogous to their classical alternatives. In addition, we outline some ideas on how to approximate the precision of the estimates. Section 4 introduces new SAS macros written in Interactive Matrix Language (IML) for performing the analysis. As far as the authors are aware, these procedures are not currently available in standard software packages. Finally, the use of the SAS/IML tools is illustrated by an example. The complete SAS/IML code is available at http://www.jstatsoft.org/v16/i05/.
In the following sections we assume that there are c independent random samples of p-dimensional observations. Let

\[ X = (x_{11} \cdots x_{1n_1}\;\; x_{21} \cdots x_{2n_2}\;\; \cdots\;\; x_{c1} \cdots x_{cn_c}) \]

denote the p × N data matrix, where x_ij = (x_ij1 x_ij2 · · · x_ijp)' represents the jth observation of the ith sample. Further, write N = n_1 + · · · + n_c for the total number of observations. In practice, the data matrix is often given as the transpose of X. We are interested in drawing conclusions on the parameter set (μ_1, μ_2, . . . , μ_c, Σ), where μ_i denotes the center of symmetry of the ith sample and Σ the covariance (or scatter) matrix, assumed to be common for all the samples. Alternatively, one may wish to parametrize the model by (μ_1, ∆_12, . . . , ∆_1c, Σ), where ∆_1i = μ_i − μ_1 represents the difference between sample i and the first sample, used as a reference (e.g. placebo). In general, we wish to estimate both sets of parameters and construct the associated location tests.
Let B denote a nonsingular p × p matrix and b a p × 1 vector. A location estimate μ̂_i(X) and a scatter matrix estimate Σ̂(X) are affine equivariant if

\[ \hat\mu_i(BX + b\,1_N') = B\,\hat\mu_i(X) + b \quad\text{and}\quad \hat\Sigma(BX + b\,1_N') = B\,\hat\Sigma(X)\,B'. \]

These definitions simply mean that a rescaling, a rotation or a shift of the data should result in the corresponding transformation of the estimates, whereas the value of an affine invariant test statistic should remain unchanged.

Classical MANOVA
When more than one attribute is measured per observational unit and the observational units arise from independent populations, the design is typically analyzed by multivariate analysis of variance techniques. The classical MANOVA assumption is that the outcome vectors x_ij (p × 1) are generated from the model

\[ x_{ij} = \mu_i + \varepsilon_{ij}, \]

where μ_i = (μ_i1 μ_i2 · · · μ_ip)' is the location center for the ith sample (population), and ε_ij = (ε_ij1 ε_ij2 · · · ε_ijp)' are independent and identically distributed random errors from a multivariate normal distribution N_p(0, Σ). In the one-sample case, the classical test for the location problem is Hotelling's T² test.
Lemma 1. Hotelling's T² statistic for testing H_0 : μ = 0 is

\[ T^2 = N\,\bar{x}'\,S^{-1}\,\bar{x}, \]

where x̄ is the sample mean vector and S is the sample covariance matrix. Furthermore, under multivariate normality, [(N − p)/((N − 1)p)] T² has the F distribution with p and N − p degrees of freedom.

In a multisample case, the interest is in testing the null hypothesis of no difference in location between the samples, H_0 : μ_1 = · · · = μ_c, assuming a common covariance matrix Σ. Under the null hypothesis, the maximum likelihood estimator of the joint μ is the sample mean vector over the combined sample, and the maximum likelihood estimator of Σ is the pooled sample covariance matrix. For hypothesis testing, we may use the two-sample Hotelling's T² statistic or, in the more general c-sample case, Hotelling's trace statistic.

Lemma 2. Hotelling's trace statistic for testing H_0 : μ_1 = · · · = μ_c is

\[ T^2 = (N - c)\,\mathrm{Tr}(W^{-1}B), \tag{1} \]

where B is the between-samples sums of squares matrix and W the within-samples sums of squares matrix. Under the null hypothesis, the test statistic is asymptotically χ²_{p(c−1)} distributed.

Write

\[ z_{ij} = \left[(N-c)^{-1}W\right]^{-1/2}(x_{ij} - \bar{x}) \]

for standardized observations with sample mean vector zero and sample covariance matrix I_p, and z̄_i for their sample means within the ith sample. We can then write

\[ T^2 = \sum_{i=1}^{c} n_i\,\bar{z}_i'\,\bar{z}_i. \]

Note that the limiting distribution is still χ²_{p(c−1)} if the covariance matrix estimate (N − c)^{-1}W is replaced by the regular pooled sample covariance matrix S. We will observe similarities between the trace statistic and a multivariate spatial sign test statistic later on.
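As a concrete illustration of the trace statistic, the following Python sketch (not part of the SAS/IML package; bivariate case only, with hypothetical toy data) computes T² = (N − c) Tr(W⁻¹B) from the between- and within-samples sums of squares matrices:

```python
def hotelling_trace_2d(samples):
    """Hotelling's trace statistic T^2 = (N - c) * Tr(W^{-1} B) for c samples
    of 2-D observations; asymptotically chi-square with p(c-1) df under H0."""
    c = len(samples)
    n_tot = sum(len(s) for s in samples)
    mean_all = [sum(x[k] for s in samples for x in s) / n_tot for k in range(2)]
    b = [[0.0, 0.0], [0.0, 0.0]]   # between-samples sums of squares matrix
    w = [[0.0, 0.0], [0.0, 0.0]]   # within-samples sums of squares matrix
    for s in samples:
        mean_i = [sum(x[k] for x in s) / len(s) for k in range(2)]
        d = [mean_i[0] - mean_all[0], mean_i[1] - mean_all[1]]
        for a in range(2):
            for bb in range(2):
                b[a][bb] += len(s) * d[a] * d[bb]
        for x in s:
            e = [x[0] - mean_i[0], x[1] - mean_i[1]]
            for a in range(2):
                for bb in range(2):
                    w[a][bb] += e[a] * e[bb]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    winv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    trace = sum(winv[a][k] * b[k][a] for a in range(2) for k in range(2))
    return (n_tot - c) * trace

# Two identical samples give B = 0, hence T^2 = 0; a shifted second sample
# produces a clearly positive statistic.
s1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, -1.0)]
s2 = [(6.0, 0.0), (7.0, 1.0), (8.0, 0.0), (7.0, -1.0)]
print(hotelling_trace_2d([s1, s1]))  # -> 0.0
print(hotelling_trace_2d([s1, s2]))  # large value
```

The statistic would normally be compared with the χ²_{p(c−1)} quantiles; the sketch returns only the statistic itself.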

Spatial sign MANOVA
In this section we present sign based competitors to the Hotelling's T 2 statistic and to the sample mean vector. Estimation and test constructions are based on the spatial signs of suitably standardized outcome vectors.
The multivariate extension of the sign concept, the spatial sign vector, is defined as

\[ S(x) = \begin{cases} \|x\|^{-1}\,x, & x \neq 0, \\ 0, & x = 0, \end{cases} \]

where ‖x‖ = (x'x)^{1/2} is the Euclidean length of the vector x. Spatial signs are clearly rotation equivariant but not affine equivariant.
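The definition is easy to compute directly. A minimal Python sketch (illustrative only, not part of the SAS/IML package):

```python
import math

def spatial_sign(x):
    """Spatial sign S(x) = x / ||x||, with S(0) defined as the zero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]

# The sign keeps only the direction: both vectors point the same way.
print(spatial_sign([3.0, 4.0]))    # -> [0.6, 0.8]
print(spatial_sign([30.0, 40.0]))  # -> [0.6, 0.8]
```

Since only the direction survives, any method built on spatial signs is automatically insensitive to the distances of observations from the center.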
Let V denote the scatter matrix defined by Tyler (1987), which is the solution to

\[ \operatorname{ave}_{ij}\{\,u_{ij}\,u_{ij}'\,\} = p^{-1} I_p, \qquad u_{ij} = S\!\left(V^{-1/2}(x_{ij} - \mu_i)\right). \]

Tyler's scatter matrix is affine equivariant, but unique only up to multiplication by a constant; we choose the symmetric version with Tr(V) = p. For a sign based analysis, it suffices to standardize by z_ij = V̂^{-1/2}(x_ij − μ̂_i). A location estimate is needed as well, and its selection will be discussed in the subsequent sections. The standardization is an analogue of the Mahalanobis transformation in classical multivariate analysis, but instead of standardizing the sample variance-covariance matrix of the original data, it standardizes the variance-covariance matrix of the spatial sign vectors. For standardized data, the sign vectors then tend to lie uniformly on the unit sphere (see Figures 1, 2 and 3). Denote the direction vectors by u_ij = S(z_ij) and the radii by r_ij = ‖z_ij‖.
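Tyler's scatter matrix has no closed form, but the defining equation suggests a simple fixed-point algorithm. The sketch below (pure Python, bivariate case, center assumed known at the origin; an illustration rather than the package's algorithm) iterates the classical form V ← p · ave{x x′/(x′V⁻¹x)}, with the trace normalized to p at each step:

```python
def tyler_scatter_2d(data, tol=1e-10, max_iter=500):
    """Tyler's M-estimate of scatter for 2-D data with known center 0.
    Fixed point of V = p * ave{ x x' / (x' V^{-1} x) }, normalized to Tr(V) = p."""
    p = 2
    v = [[1.0, 0.0], [0.0, 1.0]]          # start from the identity matrix
    for _ in range(max_iter):
        det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
        inv = [[v[1][1] / det, -v[0][1] / det],
               [-v[1][0] / det, v[0][0] / det]]
        m = [[0.0, 0.0], [0.0, 0.0]]
        for x in data:
            q = (x[0] * (inv[0][0] * x[0] + inv[0][1] * x[1])
                 + x[1] * (inv[1][0] * x[0] + inv[1][1] * x[1]))  # x' V^-1 x
            for a in range(2):
                for b in range(2):
                    m[a][b] += x[a] * x[b] / q
        trace = m[0][0] + m[1][1]
        new_v = [[p * m[a][b] / trace for b in range(2)] for a in range(2)]
        if max(abs(new_v[a][b] - v[a][b]) for a in range(2) for b in range(2)) < tol:
            return new_v
        v = new_v
    return v

# Observations along the coordinate axes give the identity, regardless of
# their lengths: Tyler's estimate uses the directions only.
data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 5.0), (0.0, -5.0)]
print(tyler_scatter_2d(data))  # close to [[1, 0], [0, 1]]
```

Note how the radii cancel out of the weights x x′/(x′V⁻¹x): rescaling a single observation does not change the estimate, which is exactly the distribution-free property exploited below.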
Again assume that the outcome vectors are generated from

\[ x_{ij} = \mu_i + \varepsilon_{ij}, \]

where the residuals can be decomposed as ε_ij = Σ^{1/2} r_ij u_ij. Moving roughly from strong to minimal conditions, different model assumptions on the underlying distribution, in terms of the direction vector U_ij and the radius R_ij ≥ 0, can be listed as follows (Randles 2000):

Multivariate normality
• U_ij is uniformly distributed on a p-dimensional unit sphere,
• R²_ij ∼ χ²_p, and
• U_ij and R_ij are independent.

Elliptical symmetry
• U_ij is uniformly distributed on a p-dimensional unit sphere and
• U_ij and R_ij are independent.

Elliptical directions
• U_ij is uniformly distributed on a p-dimensional unit sphere.

Symmetry
• R_ij U_ij has the same distribution as −R_ij U_ij.

Directional symmetry
• U_ij has the same distribution as −U_ij.

The families are not subsets of each other; for a hierarchy between these symmetry assumptions see Randles (2000). Multivariate spatial sign methods are typically distribution-free in the family of elliptical directions. If the underlying distribution is skewed, the location parameter μ is the population median vector rather than the mean vector (the symmetry center in models 1, 2 and 4). Similarly, Σ is the covariance matrix in the multivariate normal model, and proportional to the covariance matrix (if it exists) in the elliptical symmetry model.

One-sample case
Consider testing the null hypothesis H_0 : μ = 0 against the alternative H_1 : μ ≠ 0 (without loss of generality). For standardized signs u_j = S(V̂^{-1/2} x_j), seek an estimate of Tyler's scatter matrix as the solution to the implicit equation

\[ \operatorname{ave}_j\{\,u_j\,u_j'\,\} = p^{-1} I_p. \]

Obviously, the estimate is not influenced by the radii r_j. Hence, a distribution-free test in the family of elliptical directions is given by the following.

Lemma 3. Under the null hypothesis H_0 : μ = 0, the multivariate spatial sign test statistic

\[ Q^2 = N\,p\;\bar{u}'\,\bar{u}, \qquad \bar{u} = \operatorname{ave}_j\{u_j\}, \tag{2} \]

has a limiting χ²_p distribution.

The development was given by Randles (2000). The test statistic Q² is affine invariant.
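The statistic itself is a one-liner once the signs are available. For brevity, the Python sketch below uses raw spatial signs (identity scatter in place of Tyler's estimate, so it is the unstandardized and not affine invariant version; toy data are ours):

```python
import math

def sign_test_stat(data):
    """Q^2 = N * p * ||ave of spatial signs||^2; identity scatter assumed
    for brevity (the affine invariant version would use Tyler standardization)."""
    n, p = len(data), len(data[0])
    sbar = [0.0] * p
    for x in data:
        r = math.sqrt(sum(v * v for v in x))
        for k in range(p):
            sbar[k] += x[k] / (r * n)
    return n * p * sum(v * v for v in sbar)

# For a sample exactly symmetric about the origin the signs cancel pairwise,
# so Q^2 = 0; a sample concentrated in one quadrant gives a positive value.
sym = [(1.0, 2.0), (-1.0, -2.0), (3.0, -1.0), (-3.0, 1.0)]
print(sign_test_stat(sym))  # -> 0.0
print(sign_test_stat([(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]))  # positive
```

Under H_0 the value would be compared with χ²_p quantiles, as stated in Lemma 3.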
For small samples, Randles (2000) proposes the use of a sign change test. For the family of directionally symmetric distributions, it leads to a conditionally distribution-free test. Let U denote a p × N matrix with u_j as the jth column. Furthermore, let S_1, . . . , S_M denote independent random N × N diagonal sign change matrices with 2^N equiprobable values of diag(±1, . . . , ±1). Since V̂ is sign change invariant, the p-value can be estimated by

\[ \hat{p} = \operatorname{ave}_m\{\, I\left(\,Q^2(U S_m) \ge Q^2(U)\,\right)\,\}, \]

that is, by the proportion of cases where Q²(US_m) ≥ Q²(U), m = 1, . . . , M.

Möttönen, Oja, and Tienari (1997) studied the limiting efficiency of multivariate sign tests for multivariate t-distributions. They show that the efficiency relative to Hotelling's test is 0.785 even in the bivariate normal case (∞ degrees of freedom). In four dimensions, they obtained relative efficiencies of 0.884, 1.051 and 2.250 for ∞, 10 and 4 degrees of freedom, respectively. In dimension 10, the same figures were 0.951, 1.131 and 2.422. See also Randles (1989) for the family of elliptically symmetric power family distributions.

Hettmansperger and Randles (2002) introduced the simultaneous estimation of location and scatter in the one-sample case. They computed a multivariate location estimate μ̂ and a scatter matrix estimate V̂ to satisfy

\[ \operatorname{ave}_j\{u_j\} = 0 \quad\text{and}\quad \operatorname{ave}_j\{\,u_j\,u_j'\,\} = p^{-1} I_p \]

for standardized signs u_j = S(V̂^{-1/2}(x_j − μ̂)). The solutions to the equations are the transformation-retransformation spatial median and Tyler's scatter matrix, respectively. Standardization by the resulting location and scatter estimates distributes the direction vectors uniformly on a unit sphere centered at 0 (Figure 3). Another important property of the estimates is that they are affine equivariant. This property is attained through the above use of the transformation-retransformation procedure (Chakraborty et al. 1998).
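The sign change procedure can be sketched in a few lines of Python (illustrative only: raw spatial signs replace the Tyler standardization, the 2^N sign change matrices are sampled by Monte Carlo, and the toy data are ours; since V̂ is sign change invariant, the standardized signs could be reused in exactly the same way):

```python
import math
import random

def sign_stat(signs):
    """Q^2 = N * p * ||ave of sign vectors||^2."""
    n, p = len(signs), len(signs[0])
    sbar = [sum(u[k] for u in signs) / n for k in range(p)]
    return n * p * sum(v * v for v in sbar)

def sign_change_pvalue(data, m=500, seed=1):
    """Monte Carlo sign change p-value: each sign vector u_j is flipped to
    -u_j with probability 1/2, mimicking the diagonal matrices S_m."""
    signs = []
    for x in data:
        r = math.sqrt(sum(v * v for v in x))
        signs.append([v / r for v in x])
    observed = sign_stat(signs)
    rng = random.Random(seed)
    count = 0
    for _ in range(m):
        flipped = [u if rng.random() < 0.5 else [-v for v in u] for u in signs]
        if sign_stat(flipped) >= observed:
            count += 1
    return count / m

# A sample exactly symmetric about the origin has Q^2 = 0, so every sign
# change configuration is at least as extreme: the estimated p-value is 1.
sym = [(1.0, 2.0), (-1.0, -2.0), (3.0, -1.0), (-3.0, 1.0)]
print(sign_change_pvalue(sym))  # -> 1.0
```

In practice M is chosen as a compromise between the Monte Carlo error of p̂ and computation time, which is also how the nperm argument of the macros is used.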

Several samples case
Next consider c independent random samples with cumulative distribution functions F(x − μ_1), . . . , F(x − μ_c); it is assumed that the underlying distributions have a joint scatter matrix and differ only in location. Our interest is to test the null hypothesis of no treatment difference H_0 : μ_1 = · · · = μ_c or, equivalently, H_0 : ∆_12 = · · · = ∆_1c = 0.
Furthermore, we wish to estimate the centers of symmetry µ 1 , . . . , µ c for each sample, and the treatment differences ∆ 12 , . . . , ∆ 1c with respect to a reference location µ 1 .
Start by constructing the standardized sign vectors u_ij = S(V̂^{-1/2}(x_ij − μ̂)), where μ̂ and V̂ are the null case estimates (obtained as in the one-sample estimation case). Then, if V̂ is a √N-consistent estimate and μ̂ the corresponding transformation-retransformation spatial median, we have for elliptical F the following.

Lemma 4. Under the null hypothesis H_0 : μ_1 = · · · = μ_c, the multisample multivariate spatial sign test statistic

\[ U^2 = p \sum_{i=1}^{c} n_i\,\bar{u}_i'\,\bar{u}_i, \qquad \bar{u}_i = \operatorname{ave}_j\{u_{ij}\}, \tag{3} \]

has a limiting χ² distribution with p(c − 1) degrees of freedom.
("ave_j" means the average taken over j.) A conditionally distribution-free test can be obtained by permuting (Oja and Randles 2004): Let P_1, . . . , P_M denote random N × N permutation matrices with N! equiprobable values, obtained by permuting the rows of an N × N identity matrix. As μ̂ and V̂ are permutation invariant, the p-value can be estimated as

\[ \hat{p} = \operatorname{ave}_m\{\, I\left(\,U^2(U P_m) \ge U^2(U)\,\right)\,\}, \]

where U is the data set consisting of the standardized signs.
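A minimal two-sample Python sketch of this permutation test (again a simplification: signs are taken about the origin with identity scatter rather than the pooled null estimates, group labels are shuffled by Monte Carlo, and the toy data are ours):

```python
import math
import random

def u2_stat(signs, n1):
    """U^2 = p * sum_i n_i ||ave_j u_ij||^2 for two groups of pooled signs."""
    p = len(signs[0])
    total = 0.0
    for g in (signs[:n1], signs[n1:]):
        m = [sum(u[k] for u in g) / len(g) for k in range(p)]
        total += len(g) * sum(v * v for v in m)
    return p * total

def permutation_pvalue(data1, data2, m=1000, seed=2):
    """Monte Carlo permutation p-value for H0: no location difference."""
    pooled = []
    for x in data1 + data2:
        r = math.sqrt(sum(v * v for v in x))
        pooled.append([v / r for v in x])
    observed = u2_stat(pooled, len(data1))
    rng = random.Random(seed)
    count = 0
    for _ in range(m):
        perm = pooled[:]
        rng.shuffle(perm)          # reassign group labels at random
        if u2_stat(perm, len(data1)) >= observed:
            count += 1
    return count / m

# Two well-separated groups: nearly all sign vectors of group 1 point one
# way and those of group 2 the opposite way, so the p-value is very small.
g1 = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.2, 5.0), (5.0, 5.2), (4.8, 4.8)]
g2 = [(-x, -y) for (x, y) in g1]
print(permutation_pvalue(g1, g2))
```

The permutation invariance of μ̂ and V̂ is what allows the standardized signs to be computed once and merely relabeled, exactly as in the sketch.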
The test statistic resembles Hotelling's trace test statistic (1) in the classical MANOVA setting, but the sign test statistic (3) is based on the directions only. For limiting efficiencies, see Randles (1989) and Möttönen et al. (1997).

Estimation is extended to the c-sample case as follows. Choose μ̂_1, . . . , μ̂_c and V̂ so that they satisfy

\[ \operatorname{ave}_j\{u_{1j}\} = \cdots = \operatorname{ave}_j\{u_{cj}\} = 0 \quad\text{and}\quad \operatorname{ave}_{ij}\{\,u_{ij}\,u_{ij}'\,\} = p^{-1} I_p \]

for standardized signs u_ij = S(V̂^{-1/2}(x_ij − μ̂_i)). The resulting estimates are the samplewise transformation-retransformation spatial medians utilizing a joint Tyler's scatter matrix. Due to the affine equivariance property, the difference estimates ∆̂_12, . . . , ∆̂_1c can be constructed as differences of the transformation-retransformation spatial medians.

Estimation of accuracy
The following asymptotic result gives a way to approximate the precision of the estimates.
Lemma 5. In the elliptically symmetric case,

\[ \sqrt{N}\,(\hat\mu - \mu) \xrightarrow{d} N_p\!\left(0,\; \frac{p}{(p-1)^2\,\left[E(r^{-1})\right]^2}\, V\right). \]

Therefore, an estimate of the covariance matrix is obtained by

\[ \widehat{\mathrm{COV}}(\hat\mu) = \frac{p}{N\,(p-1)^2\,\left[\operatorname{ave}_{ij}\{r_{ij}^{-1}\}\right]^2}\, \hat{V}. \]

See for example Ollila, Oja, and Croux (2003b), Ollila, Hettmansperger, and Oja (2003a), and Hettmansperger and Randles (2002). In the case of several samples, an estimate of COV(μ̂_i) is obtained by replacing N by n_i in Lemma 5. The covariance matrix estimate of ∆̂_1j is easily obtained, as μ̂_1, . . . , μ̂_c are asymptotically independent.
Another possibility for estimating precision is to use distribution-free methods such as bootstrapping and the delete-1 jackknife. These methods are quite attractive since they require neither assumptions about the underlying distribution nor, provided some basic prerequisites are fulfilled, a large sample size.
To get a bootstrap covariance matrix estimate, generate bootstrap samples X*_1, . . . , X*_B by sampling (with replacement) from the observed sample X, keeping the sample size fixed. Then compute the desired estimate from each bootstrap sample.
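A Python sketch of this recipe (illustrative only: the location estimate is the plain spatial median computed by Weiszfeld-type iteration, without the transformation-retransformation step, and the toy data are ours):

```python
import math
import random

def spatial_median(data, iters=100):
    """Plain spatial median via Weiszfeld iteration (identity scatter)."""
    p = len(data[0])
    mu = [sum(x[k] for x in data) / len(data) for k in range(p)]
    for _ in range(iters):
        wsum, acc = 0.0, [0.0] * p
        for x in data:
            r = math.sqrt(sum((x[k] - mu[k]) ** 2 for k in range(p)))
            if r < 1e-12:
                continue                   # skip a data point the iterate sits on
            wsum += 1.0 / r
            for k in range(p):
                acc[k] += x[k] / r
        if wsum == 0.0:
            break                          # degenerate sample: all mass at mu
        mu = [a / wsum for a in acc]
    return mu

def bootstrap_cov(data, n_boot=200, seed=3):
    """Bootstrap covariance matrix estimate of the location estimate."""
    rng = random.Random(seed)
    p = len(data[0])
    ests = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(len(data))] for _ in range(len(data))]
        ests.append(spatial_median(sample))
    mean = [sum(e[k] for e in ests) / n_boot for k in range(p)]
    return [[sum((e[a] - mean[a]) * (e[b] - mean[b]) for e in ests) / n_boot
             for b in range(p)] for a in range(p)]
```

The resulting matrix is symmetric with nonnegative diagonal by construction; its quality of course depends on the number of bootstrap samples B.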

Lemma 6. A bootstrap estimator of the covariance matrix of μ̂ is

\[ \widehat{\mathrm{COV}}(\hat\mu) = \frac{1}{B} \sum_{b=1}^{B} \left(\hat\mu^*_b - \bar{\hat\mu}^*\right)\left(\hat\mu^*_b - \bar{\hat\mu}^*\right)', \qquad \bar{\hat\mu}^* = \frac{1}{B}\sum_{b=1}^{B} \hat\mu^*_b, \]

where μ̂*_b is the location estimate from the bth bootstrap sample.
In the case of more than one sample, we wish to make use of the model assumption of a common scatter matrix V. After standardization by the estimates μ̂_i and V̂, the z_ij vectors are approximately "independent and identically distributed". The idea is to sample (with replacement) from the data set (as if it were one sample) Z = {z_11, . . . , z_1n_1, . . . , z_c1, . . . , z_cnc}.
Then transform each z*_ij back to obtain x*_ij = V̂^{1/2} z*_ij + μ̂_i. These vectors constitute the bootstrap sample X* = (x*_ij), and we can proceed as usual. Note that some healthy caution is needed with bootstrapping. As pointed out by Stromberg (1997), there is in fact a "high" probability of generating a single bootstrap sample with an unusually large proportion of outlying observations. This proportion might even exceed the breakdown point of the estimator. Thus, even for robust methods, bootstrap estimation may sometimes fail in the presence of outliers. Yet another problem could arise when the sample size is small compared to the dimension of the data.
To overcome possible limitations of bootstrapping, an alternative approach is the delete-1 jackknife estimator.
Lemma 7. The delete-1 jackknife estimator of the covariance matrix of μ̂ in the one-sample case is

\[ \widehat{\mathrm{COV}}(\hat\mu) = \frac{N-1}{N} \sum_{i=1}^{N} \left(\hat\mu_{(i)} - \bar{\hat\mu}\right)\left(\hat\mu_{(i)} - \bar{\hat\mu}\right)', \qquad \bar{\hat\mu} = \frac{1}{N}\sum_{i=1}^{N} \hat\mu_{(i)}, \]

where μ̂_(i) is the location estimate from the sample without the ith observation.
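The jackknife formula translates directly into code. A Python sketch (again using the plain Weiszfeld spatial median, repeated here so the snippet is self-contained; toy data are ours):

```python
import math

def spatial_median(data, iters=100):
    """Plain spatial median via Weiszfeld iteration (identity scatter)."""
    p = len(data[0])
    mu = [sum(x[k] for x in data) / len(data) for k in range(p)]
    for _ in range(iters):
        wsum, acc = 0.0, [0.0] * p
        for x in data:
            r = math.sqrt(sum((x[k] - mu[k]) ** 2 for k in range(p)))
            if r < 1e-12:
                continue                   # skip a data point the iterate sits on
            wsum += 1.0 / r
            for k in range(p):
                acc[k] += x[k] / r
        if wsum == 0.0:
            break
        mu = [a / wsum for a in acc]
    return mu

def jackknife_cov(data, estimator):
    """Delete-1 jackknife: (N-1)/N * sum_i (est_(i) - mean)(est_(i) - mean)'."""
    n, p = len(data), len(data[0])
    ests = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean = [sum(e[k] for e in ests) / n for k in range(p)]
    return [[(n - 1) / n * sum((e[a] - mean[a]) * (e[b] - mean[b]) for e in ests)
             for b in range(p)] for a in range(p)]
```

Unlike the bootstrap, the jackknife is fully deterministic: it needs exactly N re-estimations, one per deleted observation.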
We have not used jackknife methods in the case of several samples.
Delete-1 jackknife does not always work well, for example, in conjunction with a nonsmooth estimator such as the vector of marginal medians (Shao and Wu 1989). However, delete-1 jackknife appears to perform nicely with the transformation-retransformation spatial median.

SAS/IML modules
The programs are organized as macros, which consist of frequently used modules and the master code. This section outlines the functionality of the modules, so that an advanced user can modify and make further use of them. The SAS/IML programs (sgnmanova_1.sas and sgnmanova_c.sas) are available at http://www.jstatsoft.org/v16/i05/.

Modules for estimation of location and scatter
Modules estimate_1 (one-sample case) and estimate_c (c-sample case) perform the estimation procedure. The estimation algorithm uses the following steps:
1. Compute the direction vectors u_ij using the current estimate values.
2. Update the location estimates μ̂_i.
3. Update the scatter matrix estimate V̂.
4. Return to 1 and continue until convergence.
The vector of componentwise medians and the p × p identity matrix are used as starting values. The iteration steps are

\[ \hat{V} \leftarrow p\, \hat{V}^{1/2}\, \operatorname{ave}_{ij}\{\,u_{ij}\,u_{ij}'\,\}\, \hat{V}^{1/2} \quad\text{and}\quad \hat\mu_i \leftarrow \hat\mu_i + \left[\operatorname{ave}_j\{r_{ij}^{-1}\}\right]^{-1} \hat{V}^{1/2}\, \operatorname{ave}_j\{u_{ij}\} \]

(Hettmansperger and Randles 2002; Vardi and Zhang 2001; Oja and Randles 2004). The symmetric transformation matrix V̂^{-1/2} is found via the spectral decomposition of V̂.
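The steps above can be sketched as follows in Python (a bivariate one-sample illustration, not the package code; toy data and helper names are ours, and a closed-form 2×2 matrix square root replaces the spectral decomposition):

```python
import math

def sqrt_2x2(a):
    """Principal square root of a symmetric positive definite 2x2 matrix:
    A^{1/2} = (A + sqrt(det A) I) / sqrt(Tr A + 2 sqrt(det A))."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t], [a[1][0] / t, (a[1][1] + s) / t]]

def inv_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def matvec(a, x):
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

def hr_estimate(data, n_iter=100):
    """Hettmansperger-Randles-type simultaneous location/scatter iteration."""
    p = 2
    xs = sorted(x[0] for x in data)
    ys = sorted(x[1] for x in data)
    mu = [xs[len(xs) // 2], ys[len(ys) // 2]]   # componentwise median start
    v = [[1.0, 0.0], [0.0, 1.0]]                # identity start for scatter
    for _ in range(n_iter):
        vh = sqrt_2x2(v)
        vih = inv_2x2(vh)
        us, rs = [], []
        for x in data:
            z = matvec(vih, [x[0] - mu[0], x[1] - mu[1]])
            r = math.sqrt(z[0] ** 2 + z[1] ** 2)
            if r < 1e-12:
                continue        # Vardi-Zhang-style skip of a coincident point
            us.append([z[0] / r, z[1] / r])
            rs.append(r)
        if not us:
            break
        ubar = [sum(u[k] for u in us) / len(us) for k in range(p)]
        inv_mean = sum(1.0 / r for r in rs) / len(rs)
        step = matvec(vh, ubar)
        mu = [mu[0] + step[0] / inv_mean, mu[1] + step[1] / inv_mean]
        uu = [[sum(u[a] * u[b] for u in us) / len(us) for b in range(p)]
              for a in range(p)]
        # V <- p * V^{1/2} ave(u u') V^{1/2}, then normalize to Tr(V) = p
        t1 = [[sum(vh[a][k] * uu[k][b] for k in range(p)) for b in range(p)]
              for a in range(p)]
        m = [[sum(t1[a][k] * vh[k][b] for k in range(p)) for b in range(p)]
             for a in range(p)]
        tr = m[0][0] + m[1][1]
        v = [[p * m[a][b] / tr for b in range(p)] for a in range(p)]
    return mu, v
```

On a sample symmetric about (1, 2) the iteration settles at that center with a scatter estimate close to the identity, as the equivariance theory predicts.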
We have also implemented a protection against the iteration landing on a data point (Vardi and Zhang 2001). Their modification ensures that the iteration moves on even in that case. The need for such protection is rare, but it has practical value in bootstrapping, since, for some bootstrap samples, the iteration might encounter a large mass of data on a single point.
estimate_1 Input for the module are the data matrix and the desired level of precision. The module returns a (p + 1) × p matrix where the first row is the location estimate and the remaining rows consist of the scatter matrix estimate.
estimate_c Input for the module are the data matrix, the desired level of precision and the number of samples. The module returns a (p + c) × p matrix where the first c rows are the location estimates and the remaining rows consist of the scatter matrix estimate.

Modules for hypothesis testing
Modules for testing the null hypothesis are named test_1 (one-sample case) and test_c (multisample case).
test_1 Input for the module are the data matrix, the desired level of precision and the number of sign change permutations. The module estimates the scatter matrix under H_0 (fixed location), and returns a 1 × 4 vector with the value of the test statistic, a p-value based on the limiting distribution, a p-value based on a sign change permutation distribution and its standard error (from a binomial distribution) as elements.
test_c Input for the module are the data matrix, the desired level of precision and the number of permutations. Calls the estimate_1 module. The module returns a 1 × 4 vector with the value of the test statistic, a p-value based on the limiting distribution, a p-value based on a permutation distribution and its standard error (from a binomial distribution) as elements.
A small number of permutations keeps the computation time reasonable.

Modules for estimation of accuracy
Module asymptotic estimates the covariance matrix ofμ based on the limiting distribution.
Similarly, module bootstrap estimates the covariance matrix by bootstrapping, and module jackknife estimates it by the delete-1 jackknife. Note that the jackknife module is available only for the one-sample case.
asymptotic Input for the module consists of the data matrix, the estimated parameter values and the desired level of precision. In a multisample case the number of samples has to be given as well. The module returns a p × p covariance matrix estimate in a one-sample case, and a cp × (p + 1) matrix in a multisample case, where the first column identifies the rows which contain the covariance matrix estimate of μ̂_i.
bootstrap Input for the module consists of the data matrix, the desired level of precision and the number of bootstrap samples. In a multisample case the number of samples has to be given as well. Calls the respective estimate modules. The module returns a p × p covariance matrix estimate in a one-sample case, and a cp × (p + 1) matrix in a multisample case, where the first column identifies the rows which contain the covariance matrix estimate of μ̂_i.
jackknife Input for the module consists of the data matrix, the location estimate and the desired level of precision. Calls the estimate_1 module. The module returns a p × p covariance matrix estimate.
It is a good idea to start with a small number of bootstrap samples.

Multivariate normal distribution
We simulated a two-sample case (n_1 = n_2 = 50) with x_1j ∼ N_3(μ_1, Σ) and x_2j ∼ N_3(μ_2, Σ).

One-sample analysis. We start by analysing the first sample as a one-sample problem. The null hypothesis of interest is H_0 : μ_1 = 0. The SAS statements

%INCLUDE '<full path>\sgnmanova_1.sas';
%sgnmanova_1(y3onesam, eps=1E-9, nperm=1000, nboot=500);

produce the test output. The null hypothesis is thus not rejected; the p-values based on the limiting χ²_3 distribution and on the permutation distribution were 0.889 and 0.898, respectively. The estimate of μ_1 and the covariance matrix estimates of μ̂_1 are obtained from the output as well. The mean is slightly more accurate in the normal case. But if just one observation of the data set is contaminated (by adding, say, 10 to all its components), the covariance matrix estimate of μ̂_1 is almost unaffected, whereas the covariance matrix of the sample mean nearly doubles in size. This reflects the robustness of the spatial median against outliers.

Despite a single outlier, bootstrapping worked well. We will return to the robustness studies in the two-sample case. Naturally, the same phenomenon is reflected in the corresponding estimates. For a contamination factor of 100, μ̂_1 = (−0.02, 0.07, 0.17)' and x̄_1 = (1.98, 2.09, 2.06)'.
The Hettmansperger-Randles estimate is still close to the true value, but the sample mean vector is totally destroyed.

Multivariate Cauchy distribution
In this section we study the behavior of the estimates for a heavy-tailed error distribution. We simulated a data set (N = 50) from a multivariate Cauchy distribution using the model

\[ x_j = z_j^{-1} y_j, \]

where y_j ∼ N_3(0, I_3), z²_j ∼ χ²_1, and y_j and z_j are independent. Then x_j has a spherical multivariate Cauchy distribution. The distribution does not possess finite moments, and it has very heavy tails. The asymptotic, bootstrap and jackknife precision estimates gave results mainly in the same direction. Due to the extreme values generated by the underlying Cauchy distribution, bootstrapping appears to slightly overestimate the elements of the variance-covariance matrix.
Due to the lack of finite moments of the noise distribution, a classical analysis is not helpful at all.

Concluding remarks

Hettmansperger and Randles (2002) recognized that the conditions for the existence and uniqueness of simultaneous solutions to the estimating equations have not been established. In the authors' experience, however, the algorithm appears always to converge. Lopuhaä and Rousseeuw (1991) showed that the spatial median has a 50% breakdown point. The breakdown point of Tyler's scatter matrix is positive, and generally within the interval [1/(p + 1), 1/p]. Both the location estimator and the scatter estimator have bounded influence functions. Given these robustness qualities, the minimal model assumptions and the good efficiency properties, multivariate spatial sign methods are attractive alternatives to the classical procedures, particularly for skewed or heavy-tailed distributions, or in the presence of outliers.