robustlmm : An R Package for Robust Estimation of Linear Mixed-Eﬀects Models

As any real-life data, data modeled by linear mixed-eﬀects models often contain out-liers or other contamination. Even little contamination can drive the classic estimates far away from what they would be without the contamination. At the same time, datasets that require mixed-eﬀects modeling are often complex and large. This makes it diﬃcult to spot contamination. Robust estimation methods aim to solve both problems: to provide estimates where contamination has only little inﬂuence and to detect and ﬂag contamination. We introduce an R package, robustlmm , to robustly ﬁt linear mixed-eﬀects models using the Robust Scoring Equations estimator. The package’s functions and methods are designed to closely equal those oﬀered by lme4 , the R package that implements classic linear mixed-eﬀects model estimation in R . The robust estimation method in robustlmm is based on the random eﬀects contamination model and the central contamination model. Contamination can be detected at all levels of the data. The estimation method does not make any assumption on the data’s grouping structure except that the model parameters are estimable. robustlmm supports hierarchical and non-hierarchical (e.g., crossed) grouping structures. The robustness of the estimates and their asymptotic eﬃciency is fully controlled through the function interface. Individual parts (e.g., ﬁxed eﬀects and variance components) can be tuned independently. In this tutorial, we show how to ﬁt robust linear mixed-eﬀects models using robustlmm , how to assess the model ﬁt, how to detect outliers, and how to compare diﬀerent ﬁts. If you use the software, please cite this article as published in the Journal of Statistic Software (Koller 2016).


Introduction
Linear mixed-effects models are powerful tools to model data with multiple levels of random variation, sometimes called variance components.Data with multiple levels of random variation may have contamination or outliers on any of these levels.To detect and deal with contamination, we developed a method that fits linear mixed-effects models robustly, using the Robust Scoring Equations estimator (Koller and Stahel 2015;Koller 2013).We have implemented the methods in the R-package robustlmm (Koller 2015) that we introduce here.
The variability introduced at the random effects level generally affects multiple observations simultaneously.In a one-way anova dataset, for example, a group level random effect influences the observed value of all the observations that belong to the corresponding group.If this group level random effect were an outlier with respect to the other group levels, this would lead to a whole group of outliers on the level of observations (see, e.g., plate g in Figure 1).When using classic estimation methods, even one such outlier might inflate the between-group variability estimate and distort the results (see example discussed in Section 4).In such a case it would be natural to assume that the group's random effect (or mean) is an outlier rather than all observations are outliers in the same direction.This concept of allowing potential contamination on different sources of variability leads to the "random effects contamination model".With this model, we make the assumption of long-tailed or "gross error" distributions for the random effects as well and not just for the random errors.The effect of the contamination is then propagated via the design matrices to the actual observations.Levels of random variability can be hierarchical or crossed, or both, depending on the grouping structure in the data.This implies that the effect of a single outlier on the random effects level is not always as straight forward as in the above mentioned one-way anova example.The effect may be different for each observation as the result of an outlier for a single observation is combined with all the other random effects that affect this observation.This complex relationship between the source of contamination and what is effectively realized in the data can make it very hard or even impossible to spot contamination.This is where robust methods step in and help clear the picture.
Basing the robust estimator on the "random effects contamination model" allows not only multiple sources of contamination, it also avoids unnecessary assumptions about the data's grouping structure.The only assumption on the grouping structure, that is also required for classic estimation, is that the model parameters are estimable.Other contamination models usually assume that contamination is introduced and dealt with at the lowest level only -the level of the observations.In mixed-effects models, observations generally correlate with one another, and robust methods must respect these correlations.These dependencies between observations require other contamination models to make strict assumptions about the grouping structure.The random effects contamination model assumes that contamination occurs directly at the source of random variability, before the grouping structure is introduced, thus circumventing the complexity introduced by the data structure and avoids unnecessary assumptions.
Classic estimation of linear mixed-effects models is mainly provided by two functions in R (Table 1).The function lme in the R package nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2016) supports a variety of random effects and error level covariance structures.It is designed for hierarchical data structures, so incorporating crossed random effects is not straightforward.The function lmer from the lme4 package (Bates, Mächler, Bolker, and Walker 2015) is not limited in that respect: it supports arbitrary grouping structures and efficiently deals with large data by making heavy use of memory-saving sparse representations of matrices.Special random effects and error level covariance structures like, e.g., compound symmetry or AR(1) correlation models, are, however, not yet supported.Linear quantile mixed effects estimation is implemented in the lqmm function from the lqmm package (Geraci 2014).This is not a robust method per se, but allows for median-based estimation.The function supports only one grouping level but allows the correlation structure of the random effects to be specified.
For robust estimation of linear mixed-effects models, there exists a variety of specialized implementations in R, all using different approaches to the robustness problem.Most of them are available on the Comprehensive R Archive Network (CRAN) as R packages.Except the method presented in this paper, all other methods are applicable only for certain grouping structures, see Table 1 for an overview.The function heavyLme in the heavy package (Osorio 2016) implements mixed-effects models using t distributions.However, it allows for a single grouping factor only, which limits the method to two-level data.As both, the residual errors and the random effects are modeled with a t distribution, the method can capture outliers on both the subject and the observational level.The degrees of freedom for the two t distributions are fixed to be the same.Hence, it is not possible to have a differing treatment of outliers on the two levels.Multiple random effects are fitted with a correlation parameter, uncorrelated random effects are not supported.The function lmeRob implements the method by Copt and Victoria-Feser (2006).It is not available on CRAN but from the authors upon request.They reformulate the mixed-effects problem as multivariate problem and apply multivariate MMestimation.This approach requires the grouping structure to be nested and the data to be balanced.Observations are down-weighted at the highest group level, so the high breakdown point of 50% applies to the number of groups that can be contaminated, not to the number of observations.The implementation only supports uncorrelated random effects within levels.
The function rlme in the rlme R package implements nested hierarchical mixed-effects models using a rank-based approach (Bilgic, Susmann, and McKean 2014).The function supports only simple random intercepts, and solutions might not be unique.
This article is a tutorial for robustlmm, an implementation of the Robust Scoring Equations estimator to fit mixed-effects models for the statistical computing environment R (R Core Team 2016).The R package robustlmm is available on CRAN at http://cran.r-project.org/package=robustlmm under the GPL-v2 license.
In the next section we provide background on robustlmm's underlying estimating equations and algorithms.In Section 3, we describe how robustlmm is implemented.In Section 4, we work an example and demonstrate how to do a full statistical analysis.Pointers to further information are given in Section 5. Details, tables of tuning parameters and formulas are contained in the Appendix.

Model equations and assumptions
We work with the general linear mixed-effects model in matrix form and, following Bates (2010), with spherical random effects.The spherical random effects are obtained from the regular random effects by a transformation such that they have a covariance matrix that equals a scaled identity matrix.This transformation enables variance components to be estimated as exactly zero.The model equations are: where y is the response vector of length n, β is the fixed effects vector of length p with design matrix X, and b * is the spherical random effects vector of length q with design matrix Z.The relation between the regular and the spherical random effects is b = U b (θ)b * .The lower triangular matrix U b (θ) is parameterized by the vector θ.The covariance matrix of the random effects is The matrix U e is assumed to be a diagonal matrix of known weights.
As mentioned in the introduction, we do not assume anything about the structure of the data (i.e., the design matrices X and Z), though we do make the usual assumption which the model parameters are estimable.We do assume the covariance matrix of the random effects V b (θ) to be block-diagonal.This assumption excludes problems that cannot be written in block-diagonal form, like geostatistical problems with spatial dependence encoded in V b (θ) (see the georob package (Papritz 2016) for robust methods to deal with this special case).
To reduce the complexity of the algorithms, our implementation makes additional assumptions about the covariance matrices of the random effects and the residual errors that are not required by the theory per se.Blocks of V b (θ) of size 2 × 2 and larger are assumed to be unstructured, i.e., unconstrained covariance matrices (other structures such as compound symmetry are not supported).In the remainder of the text, we will call blocks of size 1 × 1 diagonal and larger blocks unstructured.Finally, the residual error covariance matrix is assumed to be a diagonal matrix with only one unknown scaling parameter.

Robustness approach
Robustness is achieved by robustification of the scoring equations.The scoring equations are the derivatives of the log-likelihood.To fit the model (1), either the log-likelihood can be maximized, or the roots of the scoring equations can be found.Robust estimating equations are derived from the scoring equations by replacing the residuals and predicted spherical random effects with bounded functions.These bounded functions ensure that a single term (error or random effect) only has bounded influence on the estimating equations.To get robust and efficient estimating equations of σ and θ, we apply the Design Adaptive Scale approach by Koller and Stahel (2011).The robust estimating equations are provided in Appendix B. A detailed derivation and evaluation of the robust method is given in Koller (2013) and Koller and Stahel (2015).
The robustified estimating equations no longer correspond to any likelihood or pseudo-likelihood.Thus, information criteria like AIC and tests based on the log-likelihood statistic are unavailable for the robust method we present here.

Weighting functions, robustness weights and tuning
Tuning (adjusting robustness properties of the resulting estimates) is done by adjusting parameters that control the form of the bounded functions in the robust estimating equations.
In M-estimation terminology, these bounded functions are called ψ-functions.They are the derivatives of a ρ-function (see Maronna, Martin, and Yohai (2006) for exact definitions).The Huber function, a function that is quadratic around zero and linear for values outside ±k, is a ρ-function (the corresponding ψ-function is shown in Figure 3).The parameter k is called the tuning parameter.Larger values yield more efficient, but less robust estimates (for k = ∞ one recovers the REML-estimates), whereas smaller values yield more robust but less efficient estimates.A popular choice is to fix the asymptotic efficiency at 95% of the classic estimates (k = 1.345 for the Huber function).
Replacing terms by bounded functions thereof down-weights terms with a large absolute value.In the robustness literature, these weights are called robustness weights.Observations or random effects with low robustness weights are classified as outliers by the robust method.
For a given ψ-function, the robustness weights are defined as where we replace the .in w .and ψ .by e or b to specify the terms to which the functions are applied (e for errors/residuals; b for random effects).To gain robustness for all estimates, estimating equations for covariance parameters have to be treated differently from fixed and random effects, although the weighting functions for similar terms are related.We therefore distinguish the weighting functions used for estimating σ and θ with a superscript (σ) in equations.(In robustlmm, the functions are objects of class psi_func.The arguments are called rho.e, rho.b, rho.sigma.e and rho.sigma.b.) The robustness weights defined in (2) yield robust estimates of the fixed effects and predicted values for the random effects for all ρ-functions with a bounded derivative, and also for convex ρ-functions like the Huber function.For estimates of scaling factors (σ and θ for diagonal-only blocks of V b (θ)), the requirements to get robust estimates are more strict.These are not robust when convex ρ-functions are used.To get robust estimates for scaling factors, we need to use ρ-functions so that w (σ)  .(v)v 2 is bounded for v → ±∞.When convex ρ-functions are used to estimate the fixed and random effects, a natural choice for a ρ-function to estimate the scaling factors is the one that corresponds to the squared robustness weights, i.e., w (σ)  .
Note the similarity to Huber's Proposal 2. (The function psi2propII can be used to transform a ρ-function to the corresponding ρ-function that yields squared robustness weights.)Squared robustness weights are not required for block-diagonal parts of V b (θ).Instead of M-scale type estimating equations, the unstructured blocks require methods similar to multivariate M-estimators for estimating covariance matrices.Multivariate M-estimators, as introduced by Stahel (1987), use a derived set of ψ-functions that also yield bounded influence estimates for convex ρ-functions.(This derivation is handled internally in robustlmm.) The use of different ρ-functions in the estimating equations for σ and θ ensures the resulting estimates to be robust, but lowers the efficiency of the estimates σ and θ.This might be acceptable for problems in which the scale parameter σ is considered a nuisance parameter, but in mixed-effects modeling one is usually interested in estimating the variance components and does not regard them as nuisance terms.If desired, the efficiency of the estimates of σ and θ can be increased by increasing the tuning parameters of ψ

Estimation algorithms
The models are fit with a nested iterative reweighting algorithm.If there are no initial estimates, then the classic estimates are used as initial estimates.The outer loop is updating θ until it converges.For each new value of θ, we update β and b * and then σ.This algorithm converges to a local solution of the estimating equations.For convex ρ-functions and squared robustness weights, the solution can be expected to be unique aside from pathological, easily discarded solutions.A detailed description of the algorithm is given in Appendix C.

Implementation
The robustlmm package is built upon the lme4 package, more specifically the lmer function.The structure of the objects and the methods are implemented to be as similar as possible to the ones of lme4 with robustness specific extensions where needed.The object returned by rlmer is of class rlmerMod.Even though this class is close to the corresponding class lmerMod returned by lmer, rlmerMod does not extend lmerMod.This is for two reasons.First, methods for classic estimates are in general not applicable to robust estimates without changes.Second, class inheritance would require a lot of maintenance when the corresponding code in lme4 is changed.While computational methods of the lme4 package are implemented in C++, the robustlmm package is implemented in pure R.
The main function of the package is rlmer, its name hinting at the fact that it is a robust version of the lmer function.Besides additional arguments to control the robustness of the fit, the usage of rlmer is identical to lmer.Most of the functions available for objects returned by lmer are also available for objects returned by rlmer, e.g., predict or getME.The getME function is a universal accessor function for quantities derived from the fitted object (see help("getME")).The function anova requires the log-likelihood statistic and is therefore unavailable.The simulation functions simulate and bootMer have not yet been implemented.The functions to create diagnostic plots, dotplot, plot and qqmath for objects returned by ranef, as well as dotplot and plot for objects returned by coef, are available and identical to the those from lme4.In addition to the mentioned plot methods, we have added a plot method plot.rlmerMod for objects returned by rlmer and lmer.It creates a Tukey-Anscombe plot, a QQ-plot of the residuals and the random effects as well as scatterplots of the random effects.

The Penicillin data
We illustrate the use of the robustlmm package on a dataset originally published by Davies and Goldsmith (1972) and later used by Bates (2010).Davies and Goldsmith (1972) describe it as data coming from an investigation to. . . . . .assess the variability between samples of penicillin by the B. subtilis method.In this test method a bulk-inoculated nutrient agar medium is poured into a Petri dish of approximately 90 mm.diameter, known as a plate.When the medium has set, six small hollow cylinders or pots (about 4 mm. in diameter) are cemented onto the surface at equally spaced intervals.A few drops of the penicillin solutions to be compared are placed in the respective cylinders, and the whole plate is placed in an incubator for a given time.Penicillin diffuses from the pots into the agar, and this produces a clear circular zone of inhibition of growth of the organisms, which can be readily measured.The diameter of the zone is related in a known way to the concentration of penicillin in the solution.
The description implies that it is a balanced two-way anova dataset with two types of random effects: sample with six levels and plate with 24 levels.The random effects are completely crossed.The dataset is distributed as the Penicillin dataset in lme4.
To emphasize the effect of the robust method, we modified the dataset slightly (as we did in Koller and Stahel (2015)).We scaled down the first plate's observation values, and we moved one observation in plate f down to the lowest original observation.The modified dataset is shown in Figure 1.

Plate
Diameter growth inhibition zone (mm)

Fitting the model and assessing the model fit
We start by loading the R package and the modified Penicillin dataset.

R> str(PenicillinC)
data.frame : 144 obs. of 4 variables: $ diameter : num 27 23 26 23 ... $ plate : Factor w/ 24 levels "g","s","x","u",..: 18 18 18 18 ... $ sample : Factor w/ 6 levels "A","B","C","D",..: 1 2 3 4 ... $ contaminated: Factor w/ 2 levels "changed","original": 2 2 2 2 ... We fit the model using the function rlmer.The name of the function suggests that it is a robust variant of a popular function to fit classic mixed-effects models in R: the function lmer of the R package lme4.The specification of models is the same for both functions.In a single formula, we specify the fixed and random terms of the model.Fixed terms are added in the usual R formula notation, and random terms are specified in parentheses.Random effects are defined in conjunction with a grouping factor.The grouping factor is separated from the random effect by a pipe symbol "|".We now fit the classic and robust models:

R> summary(rfm)
Robust linear mixed model fit by DAStau Formula: diameter ~( 1  The output is close to the output from lmer.The first part of the summary provides information about the model fit; the estimates of V b (θ) and σ; followed by the estimated fixed effects and derived statistics and (if applicable) the correlation of the fixed effects.The second part gives information about the robustness weights and a list of the ρ-functions employed.
As we can read off the summary, the default ρ-function is the smoothed Huber ρ-function, which is a smoothed variant of the regular Huber function (the exact definition is given in the Appendix).The functions for estimating σ and θ are suffixed by "Proposal 2", indicating that squared robustness weights, with a high robustness but a low efficiency, are used by default (diagonal blocks only, for unstructured blocks the regular ρ-function is used by default).
The summary also tells us that the lowest robustness weights for both the observations and the groups are about 0.2.We can get a full named list of the robustness weights by using the calls getME(rfm, "w_e") and getME(rfm, "w_b") for the observations and the groups, respectively.
With the call plot(rfm), we can create simple residual analysis plots including normal QQplots of the predicted random effects.The resulting plots are shown in Figure 2. The darker color indicates observations with a low robustness weight w . .From the plot as well as from the summary shown above, we can see that the observations that were changed have been detected by the method, i.e., they have received a low robustness weight.There are also other observations that were assigned a rather low robustness weight and we would do good to investigate them.

Tuning the fit
As already mentioned in Section 2.3, the estimates of σ and θ have a low efficiency if the same tuning parameters are used as for estimating the fixed and random effects.To get a higher efficiency, we have to increase the tuning parameter of ρ b .Tables of tuning parameters are provided in the Appendix.We can change the default tuning parameter of objects of class psi_func with the function chgDefaults.The smoothed Huber function is a convex ρ-function.Hence, we need to use squared robustness weights to get robust estimates of the residual error scale and the variance components (see Section 2.3).We can convert any ρ-function to the corresponding one with squared robustness weights using the function psi2propII.The latter function also allows the tuning parameters to be changed simultaneously, so that a second call to chgDefaults can be saved.We use the update function to fit a model with a higher efficiency for the estimates of σ and θ: R> rfm2 <-update (rfm,rho.sigma.e = psi2propII(smoothPsi,k = 2.28), + rho.sigma.b= psi2propII(smoothPsi, k = 2.28)) Note that the update function both uses the call information from the given object rfm, and, if applicable, also uses estimates from the object as initial values for the fitting procedure.
To specify different ρ-functions for different random effects, the arguments rho.b and rho.sigma.baccept as input a list of ρ-functions.The order in the list of the random effects must be the same as the order listed in the output of summary.To fit only the element of θ that corresponds to the "sample" random effect with higher efficiency, we use Note the missing second argument k when generating the first element of the list rsb.In that case, the default tuning parameter k = 1.345 is used.
To compare the estimates of the various fits we did so far, we can use the function compare.We set the argument show.rho.functions to FALSE to avoid a lengthy display of the ρ-functions here.To enhance the comparison, we also fit the model to the original, uncontaminated data with the classic, non-robust, method.
R> fmUncontam <-update(fm, data = Penicillin) R> compare (fmUncontam,fm,rfm,rfm2,rfm3,show.rhoThe resulting table gives the estimates and standard errors in parentheses.The REML row gives the restricted maximum likelihood statistic.As mentioned earlier, this statistic is unavailable for robust fits.The output of the compare function can also be passed to xtable from the xtable package (Dahl 2016) to create L A T E X or HTML-tables.

How to choose tuning parameters
When choosing tuning parameters for rlmer, one has to balance robustness and efficiency.In the examples discussed above, this means that by setting the tuning parameters too high, the estimates might break down and the resulting estimates are misleading.On the other hand, it is not good practice to set the tuning parameters very low as this will produce inefficient, i.e., imprecise, estimates.An approach as outlined above, fitting the model with low as well as with high efficiency, is better.In general, parameters that are not considered nuisance parameters should be estimated with high efficiency if possible.
When comparing fits with low to fits with high efficiency and robust to classic fits, one should first compare the parameter estimates (keeping in mind their precision or confidence intervals).If there are any relevant differences, then a study of the robustness weights should give insight as to which observations cause the difference.It is important not to just remove and, thus, ignore outliers.Whenever possible, the reason why an outlier is far from the bulk of observations should be determined.Outliers that are not merely due to recording errors usually carry information that can help to improve the model.
Applying this method to the example shown above, we can see that the one contaminated plate clearly inflates the classic, non-robust estimates for the standard deviation of the "plate" random effect.The robust method can detect this contamination and reduce its effect, leading to an estimate that is only slightly inflated.The comparison of the robust fits with lower and higher efficiency shows that the contamination is not strong enough to cause a breakdown of the fit with higher efficiency but lower robustness.Also, the higher efficiency for the estimate of the standard deviation of the "sample" random effect leads to a better estimate that is closer to the classic fits.Finally, the robust estimates of the standard deviation of the random errors are closer to the original classic fit of the uncontaminated data.

Controlling the fitting procedure
To diagnose problems with the fitting procedure, use the argument verbose.The argument takes values from 1 to 5, and gives more verbose output for higher values.If the method is not converging, increasing the maximum number of allowed iterations (argument max.it) or the tolerance (rel.tol)below which convergence is declared can help to achieve convergence.
To specify starting values for the fitting procedure, use the init argument.The init argument expects a list with four items: fixef, the fixed effects, u, the spherical random effects, sigma, the error scale σ, and theta, the vector θ parameterizing the random effects covariance matrix V b (θ).
To shorten fitting times at the beginning of an analysis, use the argument method = "DASvar".This method is faster as it uses simple direct approximations instead of numerical integrals to compute the scale and covariance parameters using the Design Adaptive Scale approach (see Koller and Stahel (2011) and Appendix B).The DASvar method yields approximate results only.For covariance matrices of the random effects V b (θ) with unstructured blocks of size three and larger, the method DASvar is the only method currently available.

Further information
Detailed information on the properties of the robust method and validation using simulation studies can be found in compact form in Koller and Stahel (2015) and more detailed in Koller (2013).Bates ( 2010) is a general introduction to mixed modeling using the R package lme4 (Bates et al. 2015).Because lme4 and robustlmm are similar, this is also a good starting point for using robustlmm.
We avoided the topic of robust testing for linear mixed-effects models in this tutorial.The usual caveats of testing in mixed models apply for the methods presented here.The wiki at http://glmm.wikidot.com is a good resource.Bootstrap presents itself as a simple (but data structure dependent) way to get p values.One has to be careful, though, since the ordinary bootstrap quantiles are not robust (Salibián-Barrera, Van Aelst, and Willems 2008;Singh 1998).
Development of robustlmm is hosted on GitHub at http://www.github.com/kollerma/robustlmm.Any issues with the package can also be reported there.

A. The smoothed Huber function and tables of tuning constants
The smoothed Huber ψ-function is defined as where . We recommend using a value of s = 10.The asymptotic properties of the regular Huber function and the smoothed Huber function are almost identical when this value is used.We can therefore safely use the same tuning parameter k for both ψ-functions.The two ψ-functions are compared in Figure 3. Tuning constants for this and the lqq ψ-function (Koller and Stahel 2011) are shown in Tables 2 to 5.  Table 2: Tuning parameters k for scale estimates such that they reach the same asymptotic efficiency as the location estimate.For the Huber ψ-function.

B. Robust estimating equations
This Appendix summarises the more extensive derivation of the robust estimating equations found in Koller (2013).

B.1. Fixed and random effects
Let k(j) be a function that maps random effect j to the corresponding block k, then the squared Mahalanobis distances of the estimated random effects are We may then define the robustness weight for the jth random effect as w b (d j ).We use standard (location and linear regression) robustness weights: It is convenient to represent the robustness weights as (diagonal) weighting matrix, The robust estimating equations are then where Λ b = Diag(λ e /λ b,j ) j=1,...,q is a diagonal matrix with elements depending on the block size

B.2. Scale
We apply the Design Adaptive Scale approach following Koller and Stahel (2011).We get where the superscript • (σ) is used to distinguish the weighting functions used for the scale and covariance parameters from the ones used for the fixed effects.Just as in the linear regression case, we define τ e,i as the value that zeroes the expectation of the i-th summand in ( 6).The expectation is where the distribution of the residuals is approximated using a linear expansion of β and b * around their true values (Koller 2013, Appendix C), and κ The weighting functions used for the scale estimates are the squared robustness weights used to estimate the fixed and random effects, w 0), for convex ρ-functions.For redescending ρ-functions, it is unnecessary to use the squared robustness weights.Using the same weights as for the fixed and random effects still gives robust estimates (assuming ψ(x)x is bounded).When the squared weights are used, a different set of tuning parameters must be used to estimate the scale and covariance parameters.Tables of tuning parameters can be found in Appendix A.

B.3. Covariance parameters
For the covariance parameters, we have to treat the diagonal and the block-diagonal case of U b separately.

Diagonal case
In the case of diagonal U b (θ), estimating θ is essentially a scale estimation problem on b * .It can be robustified just like the estimating equation for σ, see Equation 6.For a model with only one random effects term, the robust estimating equations are with τ b,j such that and normalizing constant Generalization to multiple random effects terms is straightforward.We get one such equation for each of the random effects terms.

Block-diagonal case
For block-diagonal U b (θ) we have to take care of the block structure.The normalizing constant τ 2 b,j must be replaced by a matrix T b,k defined for each block k.Analogous to the estimator for the covariance matrix and location problem, we must use two different weight functions: one for the size of the matrix w b .For details, we refer to Stahel (1987) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986, Chapter 5).As in the cited references, we introduce a third weight function w b is defined such that Remark.The optimal B-robust estimator derived in Stahel (1987) is given by w b .In higher dimensions, the efficiency loss for the estimated size is negligible.Hence, a smaller tuning parameter may be chosen for w b may be used to get approximately the same efficiency for θ as for σ.Tables of tuning parameters for higher dimensions for the Huber and the lqq ψ-functions can be found in Appendix A.
Before we can state the robust estimating equation for the block-diagonal case, we need one more definition.Let The robust estimating equation in the block-diagonal case can then be defined as follows.For l = 1, . . ., r, where is the inverse of any square root of the s × s matrix T b,k .As in the diagonal case, we define the matrix T b,k such that each summand has expectation zero.For l = 1, . . ., r, Remarks.The symmetric matrix T b,k is fully defined for unstructured covariance matrices only, where r = s(s + 1)/2.For other covariance matrix structures we can replace T b,k by the variance of the linear approximation of b * .
Since in the classic case, the linear approximations for b * and ε * are exact, the estimating equation ( 9) reduces to the REML estimating equations.A similar argument is valid for the estimating equation ( 6) for σ.

C. Estimation algorithm
The algorithm for finding the simultaneous roots of the estimating equations ( 5), ( 6) and ( 8) (and/or (9)) can be split into four general steps.They are: 1. Compute initial estimates.
3. Keeping the intermediate solutions β and b * fixed, find σ such that (6) is fulfilled.
4. Check the estimating equations for θ, (8) and/or (9), for convergence.If they are not fulfilled, update θ in some way and go back to step 2.
The algorithms for the four steps can be chosen independently from each other.We discuss the four steps below.
When this algorithm stops, it has found a simultaneous solution of all the estimating equations, except in the block-diagonal case where some, but not all, components corresponding to a block might lie on the border of the parameter space.For those parameters, the estimating equations won't be satisfied.To avoid incorrect solutions, it is crucial that the estimates for β and b * are updated for each new candidate of θ and that the initial estimate for θ is large enough.Otherwise, the algorithm might wrongly set all components of θ corresponding to one block to zero or close to zero.This is illustrated for a simple one-way anova in Figure 4.The expected sum of squares vanishes for θ = 0 in the classic case.In the robust case, the expectation does not vanish, but there is a solution close to zero.This is an artifact of the linear approximation used to compute the expectation.As long as convex ρ-functions are used, the classic estimates are generally a good choice of initial estimates.Zero components of the initial θ should be set to one at the start of the algorithm.

C.1.
Step 1: Initial estimates  8) are solved at the points where the two curves cross.The solutions are highlighted by black dashed lines, θ is the correct solution, θ † the wrong one.
The methods described here work for convex ρ-functions as well as for redescender ρ-functions.
If redescender ρ-functions are used, the algorithm as defined here converges to a local solution.
It is up to the initial estimator to provide starting values that ensure the algorithm converges to the right local solution, whatever the right solution is.In case of MM-estimates for the fixed effects model, the initial S-estimate makes sure that the final estimate has the desired high breakdown point.The same would certainly also be desirable in case of mixed-effects models.However, to the best of our knowledge, there exists currently no such estimator.The S-estimators by Copt and Victoria-Feser (2006) and Chervoneva and Vishnyakov (2011) do not seem suitable, since they are based on a different contamination model and are not as general as the method proposed here.
If convex ρ-functions are used, this difficulty does not exist.Apart from the artificial solution θ close to zero (see Figure 4) which is easily distinguishable from the true solution, we conjecture that the solutions are unique as they are for the Proposal 2 case in the locationscale problem (Koller 2013).We therefore consider it safe to use the classic solutions as initial estimates when convex ρ-functions are used.
Redescender ρ-functions have the advantage that they can assign a weight of zero to some observations or random effects levels.This makes it possible that such observations have no influence on the estimates.When convex ρ-functions are used, an observation practically always has an influence on the estimates, since a weight of zero is only reached in the limit, when the residual or the random effect level approaches plus or minus infinity.If one is interested in eliminating the influence of observations, then one might consider the following.First, compute the fit using a convex ρ-function.Then use the results as starting value for fitting using a redescender ρ-function in a second step.In the absence of good initial estimators for redescender ρ-functions, this approach might be used to get at least some of the desirable properties of redescender ρ-functions.

C.2. Step 2: Fixed and random effects
For given θ and σ, the estimation of the fixed and random effects can be done using iteratively reweighted least squares.
Let W e be defined analogously to W b , i.e., W e = Diag(w e (ε * i /σ)) i=1,...,n , where Then insert these weights into (5) and expand ε * to get the following linear system of equations, By alternating between computing β and b * for a given set of weights and updating the weights for a given set of estimates, we get a simple and efficient algorithm for computing the fixed and random effects.
We start the algorithm with either a predefined set of weights or set all the weights to one.When the relative change of the estimates is small enough, the algorithm can stop.This suggests a simple two-step algorithm, namely alternating between computing σ using the above formula and updating the weights given σ.This algorithm is quick and reliable, especially if the overall algorithm has almost converged and σ only changes little between iterations of θ.

C.3. Step 3: Variance parameter
A similar procedure can also be derived for the computation of τ e,i .Solving (7) for τ e,i yields

C.4. Step 4: Covariance parameters
In the following, we will assume that we have only one block type.The algorithms mentioned below can be easily generalized to multiple block types.One iteration then consists of computing the updates for every block type separately before applying all of them together.
In case of diagonal U b (θ), θ may be computed using the analogue of the algorithm for Step 3.This has proven to be much more efficient and robust compared to using a generic root solving procedure.
The same is true in the non-diagonal case.Nevertheless, if we assume a special covariance structure, the only options are generic root solving procedures such as Newton-Raphson.The Newton-Raphson algorithm, however, can be quite unstable and often does not converge for problems with many parameters.
In the case of unstructured covariance matrices, there exists a better algorithm of EM-type.
Let the function L(A) return the lower triangular Cholesky factor of A, and L −1 return the inverse of the factor.Then, for unstructured covariance matrices and in terms of the first block U b,1 of U b , the update is where the superscript in square brackets denotes the iteration.The right-hand side is computed using θ [it−1] , the value from the last iteration, and w (.) b,k is the corresponding k-th robustness weight.
Remark.To see that (10) is indeed a sensible update, we have to first rewrite the r scalar valued estimating equations into one matrix valued estimating equation.We may write (8) as When assuming an unstructured covariance matrix for the random effects, Q ℓ,k has only one non-zero value and does not depend on k. (For other block types, Q ℓ,k vanishes, thus decoupling the problem of tuning parameters for popular ψ-functions are provided in the Appendix.

Figure 1 :
Figure 1: Diameters of growth inhibition zones of 6 samples applied to each of 24 agar plates to assess penicillin concentration.The lines join the observations of the same sample.The plates have been reordered by their diameter values.The observations marked by × have been modified to introduce some contamination.

Figure 2 :
Figure 2: Residual analysis plots for robust fit rfm.The coloring of the points gives information about the corresponding robustness weights.

Figure 3 :
Figure 3: Comparison of the Huber and the smoothed Huber ψ-function for k = 1.345 and s = 10.
) = min(1/b η , 1/d).Other weight functions may be chosen, as long as ψ(d) = dw(d) is a ψ-function.For w this would be the Huber ψ-function.For low dimensions s, one may choose w s = 2 and Huber or smoothed Huber ψ-functions (see Appendix A), the squared tuning parameter of ρ

Figure 4 :
Figure 4: Sum of Squares of the spherical random effects for a balanced one-way anova.The smoothed Huber function was used for both ρ e and ρ b .The estimating equations (8) are solved at the points where the two curves cross.The solutions are highlighted by black dashed lines, θ is the correct solution, θ † the wrong one.
e W e U −1 e ZU b U ⊤ b Z ⊤ U −⊤ e W e U −1 e X U ⊤ b Z ⊤ U −⊤ e W e U −1 e ZU b + Λ b W b β b * = X ⊤ U −⊤ e W e y U ⊤ b Z ⊤ U −⊤ e W e y.
to use the same two-step procedure as lined out above.The values τ e,i have to be recomputed for every new value of θ, preferably using the values of the last candidate θ as starting values.

Table 1 :
(Mächler 2016)assic and robust estimation methods available in R. See also the CRAN Task View on robust statistical methods(Mächler 2016).