Iterative bias reduction multivariate smoothing in R: The ibr package

In multivariate nonparametric analysis, sparseness of the covariates also called curse of dimensionality, forces one to use large smoothing parameters. This leads to a biased smoother. Instead of focusing on optimally selecting the smoothing parameter, we fix it to some reasonably large value to ensure an over-smoothing of the data. The resulting base smoother has a small variance but a substantial bias. In this paper, we propose an R package named ibr to iteratively correct the initial bias of the (base) estimator by an estimate of the bias obtained by smoothing the residuals. After a brief description of Iterated Bias Reduction smoothers, we examine the base smoothers implemented in the packages: Nadaraya-Watson kernel smoothers and thin plate splines smoothers. Then, we explain the stopping rules available in the package and their implementation. Finally we illustrate the package on two examples: a toy example in RxR and the original Los Angeles ozone dataset.


Introduction
Regression is a fundamental data analysis tool for uncovering functional relationships for the conditional expectation from pairs of observations (X i , Y i ), i = 1, . . . , n. Classical linear regression is the simplest example of this. More generally, we can let the data help determine the general form of the relationship by using one of the numerous non-parametric regression estimators, such as wavelets based smoothers, kernel smoothers, and splines smoothers [Buja et al., 1989, Cleveland and Devlin, 1988, Eubank, 1988, Fan and Gijbels, 1996, Antoniadis and Oppenheim, 1995, Simonoff, 1996. The latter flexible smoothing methods are implemented as R functions found in numerous contributed packages. For instance the package wavethresh [Nason, 2010] implements a wavelet based smoother, the package lokern [for R and enhanced by Martin Maechler, 2010] provides a kernel smoothers and the function smooth.spline calculates a cubic spline smoother. When the number of dependent variables d is greater than 3 or 4, fully nonparametric regression suffers from the curse of dimensionality, even for moderate sample sizes (say n being equal to a few hundred). As a result, application of fully non-parametric methods are discouraged in dimensions four and higher. Instead, the statistical literature encourages using constrained non-parametric regression models (additive models [Hastie and Tibshirani, 1990], single and multiple index models and projection pursuit models) to estimate useful approximations of the conditional expectation. The latter methods are provided to the R community in the contributed package mgcv [Wood, 2011] for additive modelling, function ppr for projection pursuit and package mda [Hastie et al., 2011] for MARS.
Originating from the machine learning community, the boosting algorithm is also another tool for non-parametric regression [see Friedman, 2001, and references therein]. This fairly recent and very popular method has numerous variations, such as adaboost (the original method), logitboost for classification, and the L 2 boosting for regression. The interesting feature is that it provides a framework for combining various weak learners (non-parametric smoothers) into a smoother that is better than any single smoother that it is composed off. Packages for L 2 boosting are already available in R: for instance the package mboost [Hothorn et al., 2010] allows for L 2 boosting for regression problem as well as logistic boosting for classification. For multivariate regression, the L 2 boosting algorithm has been applied to component-wise additive modelling with classical smoother such as smoothing splines [see Bühlmann and Hothorn, 2007].
Linking the L 2 boosting algorithm to an iterative bias correction scheme (see for example Bühlmann and Hothorn [2007] for boosting of smoothing splines and Cornillon et al. [2011] for discussions on L 2 boosting of more general smoothers) provides a statistical interpretation of the L 2 boosting algorithm. This interpretation was alluded to in Ridgeway [2000]'s discussion of Friedman et al. [2000] paper on the statistical interpretation of boosting. The basic idea of estimating (and correcting for) the bias of a pilot smoother goes back to the concept of twicing introduced by Tukey [1977]. The idea of iterating the bias correction was central to adaptive bagging algorithm of Breiman [1999]. More details about statistical properties in univariate or multivariate smoother can be found in Bühlmann and Yu [2003] or Cornillon et al. [2011].
This paper focuses on the computational implementation in R [R Development Core Team, 2009] of the iterated bias correction procedure for fully multivariate regression smoothers. We start in Section two by briefly presenting the concept of iterative bias reduction and recalling its connection to L 2 boosting. The details of our numerical implementation and a review of available options in our R package ibr are given in Section three. The last section is devoted to examples.

Method
Suppose that the pairs (X i , Y i ) ∈ R d ×R are related through the non-parametric regression model where m(·) is an unknown smooth function, and the disturbances ε i are independent mean zero and variance σ 2 random variables that are independent of all the covariates. It is helpful to rewrite Equation (1) in vector form by setting Y = (Y 1 , . . . , Y n ) t , m = (m(X 1 ), . . . , m(X n )) t and ε = (ε 1 , . . . , ε n ) t , to get Linear smoothers estimate the regression function m evaluated at the covariates by linear combinations of the responses that can be compactly written as where S λ is an n × n smoothing matrix with smoothing parameter λ. By slight abuse of language, we will sometimes refer to the vector of fitted values m = Y = ( Y 1 , . . . , Y n ) t as the smooth of Y . Typical smoothers [see for instance Hastie et al., 2001] include bin smoothers, spline based smoothers (regression splines, smoothing splines, and thin-plate splines), kernel based smoothers (Nadaraya-Watson kernels and local polynomials smoothers), and series based smoothers (Fourier smoothers and wavelet smoothers). In this paper, we focus only on two common types of smoothers: Nadaraya-Watson kernels (where λ is the bandwidth) and thin-plate splines (where λ is the penalty parameter). Extensions to other smoothers can easily be achieved by suitably modifying our theoretical results and software. The linear smoother (3) has bias To estimate the bias, observe that the residuals . This suggests estimating the bias by smoothing the negative residuals Recall that, in multivariate non-parametric analysis, sparseness of the covariates also called curse of dimensionality, forces one to use large smoothing parameters λ. This leads to very biased base smoother S λ . Thus the bias correction in multivariate non-parametric analysis arises as a natural tool to correct classical smoother S λ . If λ is large, not all the bias is usually removed after a the first correction. To remove the remaining bias, iterations of the bias reduction step have to be performed. For instance k − 1 bias reduction step produces the linear smoother at iteration k: When d = 1, the sequence of iterated bias corrected smoothers agrees with the L 2 -boosted smoothers without shrinkage. For d > 1, the boosting algorithm is applied component-wise to additive regression models [see Bühlmann and Yu, 2003]. This results in a sequence of constrained (additive) approximation of the fully non-parametric regression function m.
For thin-plate splines and kernels smoothers (with suitable kernels, such as as a Gaussian density function), each iteration of the bias correction produces a noisier but less biased smoother. In the limit, the sequence of iterative bias corrected smoothers reproduces the raw data [Cornillon et al., 2011]. Thus there is a need for good stopping rules for the iterative bias correction algorithm.
We note that, provided λ is large enough, its exact value is not crucial as the choice of k will adapt to λ: if two base smoothers are chosen, one with λ 1 and another with λ 2 > λ 1 , the chosen iteration k 2 will be greater than k 1 as it takes more iterations with a very smooth base smoother to remove the bias.
To make prediction at arbitrary locations x ∈ R d of the covariates, we extend linear smoothers to functions of the form  where S(x) is a vector of size n whose entries are the weights for predicting m(x). The vector S(x) reduces to the j th row of the smoothing matrix when x = X j , and is readily computed for many of the smoothers used in practice.
For extended base smoothers of the form (6), we propose to extend the associated iterative bias corrected smoother m k by observing that This implies that m k (X j ) = S λ (X j ) tβ k . Hence we propose to extend the iterative bias corrected smoother to R d via the function

Stopping Rules
Selecting a suitable number of bias correction iterations k is crucial. There exists an unknown number k of bias corrections iterations of thin-plate spline base smoothers that produces estimators for the regression function that achieve the optimal rate of convergence for mean square error (see [Bühlmann and Yu, 2003] for the univariate case and [Cornillon et al., 2011] for the multivariate counter-part). This optimal unknown number of iterations can be estimated consistently from data using GCV [Cornillon et al., 2011]. But one can select the number of iterations from data using classical model selection methodologies such as: GCV [Craven and Wahba, 1979], AIC [Akaike, 1973], BIC [Schwarz, 1978], AICc [Hurvich et al., 1998] or gMDL [Hansen and Yu, 2001]. In particular, use of AICc is advocated in Bühlmann and Hothorn [2007] and empirical evidence for GCV can be found in Cornillon et al. [2008]. Other methods such as cross-validation (leave-one-out or K-fold) or the use of training set and test set are also reasonable procedure to estimate k [Bühlmann and Hothorn, 2007]. Both empirical [Cornillon et al., 2011] and theoretical [Cornillon et al., 2011] considerations support using GCV in practice.

Choice of kernel for kernel smoothers
The behavior of the sequence of iterative bias corrected kernel smoothers depend critically on the properties of the smoother kernel. Specifically, the smoothing kernel needs to be positive definite [see Di Marzio andTaylor, 2008, Cornillon et al., 2011]. Examples of positive definite kernels include the Gaussian and the triangle densities, and examples of kernel that are not definite positive include the uniform and the Epanechnikov kernels.

Implementation in R
Our implementation of the iterative bias corrected procedure in R follows the established S3 methods [see writing R extensions R Development Core Team, 2009]. The main function (called ibr) produces an object of class ibr. Applying generic functions, such as summary, predict, plot or residuals, to an ibr class object produces the expected standard summary statistics, prediction for new data (or fitted values), plot of the object and residuals.

Base smoother
Two types of base smoother are implemented in the function ibr: thin-plate and kernel smoother. This choice is driven by the smoother argument (character): tps or k. For kernel smoother, some classical choice are available using the kernel argument (character): Gaussian kernel (g, the default), triangle density (t), and the quartic (q) density. The computations have been optimized for the Gaussian kernel. Argument kernel enables the use of Epanechnikov kernel (e) or uniform (u) kernel but only for pedagogical purposes.

Computations
To predict new data, we compute recursivelyβ k using equations (8) and (9). Computation of the fitted values using Equation (4) can be computed using a similar recursive update formula: starting withb 0 = (I − S λ )Y : Computations of eitherb k orβ k require O(kn 2 ) operations and is implemented numerically by using the corresponding level 2 Blas function [Golub and Van Loan, 1996]. In practice, we often found that the number of iterations k that are required to be evaluated in order to select an good data-driven choicek is commensurate with the sample size n. Thus the algorithm that produces the final smoother is typically of order O(n 3 ).
Numerical experiments have shown that an alternative algorithm, based on an eigenvalue decomposition of the smoothing matrix S λ (also an order O(n 3 ) algorithm), is faster when combined with GCV for selecting the number of iterations. We have implemented the latter algorithm in the ibr package. This approach is easily understood and implemented for thin plate spline smoothers, whose smoothing matrix S λ is symmetric. For kernel smoothers, the smoothing matrix is not symmetric and further discussion is needed.
While the kernel base smoother S λ is not symmetric, we can rewrite equation (4) using an eigen decomposition of a symmetric matrix. Specifically, write and D a diagonal matrix with entries D ii = 1/ n j=1 K ij . With this notation we write the smoothing matrix ofm k in Equation (4) as where A = D 1/2 KD 1/2 . The latter is symmetric, and so can diagonalized A = U ΛU ′ , with U the orthogonal matrix of eigenvectors and Λ the diagonal matrix of eigenvalues. Equation (4) becomeŝ The coefficientβ k in (8) becomeŝ Recognizing the sum inside the bracket as the k − 1 first term of geometrical series, we rewriteβ The core of computation becomes the eigen decomposition which is done in a very efficient way by the function eigen for moderate n (n < 1000 for instance). For additional efficiencies, the computations of A and D 1/2 are done in C for the default Gaussian kernel.

Stopping rules
The ibr package implements several classical criteria to empirically select an optimal number k of bias correction iterations. They include Generalized Cross Validation (GCV), the Akaike Information Criteria (AIC), the Bayesian Information Criteria (BIC), a corrected Akaike Information Criteria (AICc) and generalized Minimum Description Length (gMDL). The choice for which method is to be used is controlled by the argument criterion of the ibr function, with the default method being GCV. Cross-validation are also available, but our discussion of that method is postponed to Section 3.5.
The evaluation of an optimal number k of iterations using any one of these classical criteria is not a trivial task. The package ibr implements both a computationally burdensome exhaustive search method and a computationally efficient but approximate method. The latter is the default method. The user can request ibr to perform an exhaustive search by setting the argument exhaustive=TRUE in the list control.par.

Exhaustive search method
The exhaustive search method evaluates, for each k in an interval [K min ; K max ], the criterion to identify its global minimizer. The default values for the range are K min = 1 and K max = 10 5 .

Numerical optimization method
The default method relies on the fact that the criteria is easily calculated for arbitrary k ∈ R + . This enables us to use standard optimization routine to minimize the criterion. While this approach is conceptually simple, there are two pitfalls: First, most criterion break down for very large k for which the smoother essentially interpolates the data, i.e.,m k ≈ Y . Second, some criterion exhibit multiple local minima (see figure 3b).
All model selection criteria trade-off goodness of fit, as measured by log( Y − m k 2 ) with a measure of the complexity of the smoother. Numerical difficulties arise when Y ≈m k , which occurs when the number k of iterations is close to the sample size n. To overcome this problem, we bound from above the maximum allowable number of iterations by setting the variable dfmaxi in the list control.par. By default, its value is 2n/3. Hard-coded error handling prevents evaluation of the criteria when either k > n(1 − 10 −10 ) or Y −m k 2 ≤ 10 −10 . These exceptions also apply to the exhaustive search algorithms.
Classical model selection criteria have been developed in the context where the effective number of estimated parameters is significantly smaller than the number of observations. Investigation of the criteria, as a function of effective degrees of freedom over a broader range of values reveals the presence of multiple local minima. While this does not impact the performance of the exhaustive search, the presence of local minima is potentially problematic for standard minimization algorithms. Our solution is to divide the interval [K min ; K max ] into smaller subintervals and apply on each subinterval a numerical optimization using the function optimize, and the minimizer of these minimizations is returned. The splitting is controlled by the argument fraction in the list control.par, with default value of c (100, 200, 500, 1000, 5000, 1e04, 5e04, 1e05, 5e05, 1e06).   While the strategy of optimizing the criteria in subintervals is more expensive than optimizing over the original interval, it remains significantly faster than performing an exhaustive search.

Scales of variables
The function ibr is designed to be used with two types of linear smoothers: thin-plate splines and kernel smoothers. Thin-plate splines are governed by a single parameter λ that weights the contribution of the roughness penalty. As a result, it is desirable to scale all the variables to have equal variance to ensure that the roughness penalty is applied equally to each variable. This is achieved by pre-processing the data with the scale function before applying smoothing the data with ibr.
Our implementation of the kernel smoother enables the use of a vector of different bandwidths, one for each of the regression variables. While the discussion on scaling applies when a common bandwidth is used for all the variables, we found in our numerical experiments that we get better results when we use the original variable but select a suitable bandwidth for each variable. The objective is not to select an optimal bandwidth, but rather control the amount of smoothing we do at each iteration. To this end, we propose to select the bandwidths such that one-dimensional smoothing matrix for each variable has the same effective degree of freedom. Typical values for the effective degree of freedom values are 1.05, 1.1, 1.2, 1.5 or 2, which the user sets with the df argument. Given an desired effective degree of freedom, package automatically determines the bandwidth using an adaptation of the uniroot algorithm programed in C.
Relating the effective degree of freedom of each of the univariate components to an effective degree of freedom for the multivariate smoother is non-trivial. As a result, some users may prefer to control the overall smoothing instead of the marginal smoothing of each of component. To enable (or disable) the control of the overall smoothing, the flag dftotal in list control.par have to be set to TRUE (the default value of that flag is dftotal=FALSE). With this option, our package takes the value of the argument df to calculate the individual bandwidths of each component using a C routine.

Stopping rules: K-fold cross-validation and Data splitting
Simple cross-validation, K-fold cross-validation, and more generally data splitting, are well established techniques for model selection that we use to determine the optimal number of iteration k for our iterative bias correction scheme. For these methods, the data are separated into two sets, a training set to estimate the regression function and a testing set to evaluate the out of sample prediction error, using either the root mean square error criterion="rmse" or the mean absolute error criterion="map" loss functions. We numerically minimize that prediction error, either using an optimization routine, the default method, or by exhaustive search (set exhaustive=TRUE in the list control.par).
Since simple leave-one out cross-validation usually leads to estimator that under-smooths (in our case, the selected number of iterations k is larger than the optimal one), we prefer to use either data splitting or K-fold cross-validation. The main difference between these two procedures is that usually data splitting is conducted once (except if the user asks for more using argument npermut) whereas for K-fold cross-validation, the original sample is randomly partitioned into K subsamples with each of the K subsets used as the test set and the remainders K − 1 subsets are combined to form the training set. The prediction error is then computed by averaging the errors across the K trials. In summary, we split the original data into two samples : a training one on which we evaluate the estimator and a testing one on which we predict the new observations as shown in figure 4.
The list cv.options in ibr controls the various options for cross-validation, including the size of the training set, the number of repetition of the procedure, the loss function and the type of splitting.

Selecting the number of iterations k with Data splitting
To have ibr perform a data splitting cross-validation, we set the following options in cv.options: 1. Input either ntest or ntrain, the size of the testing set n v or the size of the training test (n − n v ), respectively. The default value set ntest to ⌊n/10⌋. 2. Set the number of times the dataset is split in npermut. For classical data splitting, npermut have to be set equal to one (the default value is 20).

Fig 4: Training set and validation set
3. Set the type equal to random to enable random data splitting. This stage can be omitted as this is the default value. The argument seed can be used to control the seed of the random generator.
Data splitting (with test set of size ⌊n/10⌋) with root mean square error loss is achieved by the code > ibr(X,Y,criterion="rmse",cv.options=list(npermut=1)) A more complex example of data splitting that uses 100 samples of 3 observations to evaluate the prediction error using the mean absolute deviation loss is achieved with the code > ibr(X,Y,criterion="map",cv.options=list(ntest=3,npermut=100))

Selecting the number of iterations k with K-fold cross-validation
To perform a K-cross-validation with ibr, we set the following options in cv.options: 1. Set Kfold=TRUE (default is FALSE) or set Kfold equal to the number of folds 2. Set the number of folds K. One can either specify the size of the testing set n v in ntest or the size of the training set (n − n v ) in ntrain, in which case the fold is computed to be K = ⌊n/n v ⌋. One can set the number of folds K using by setting the argument Kfold equal to K. This implies that the size of the testing set is ⌊n/K⌋. 3. Specify the type of data-split. By default, the data are split randomly (type="random"). Alternatively, we divide the data using consecutive stretch of data (type="consecutive") or interleaved split (type="interleaved"). A forth option, type="timeseries" divides the data chronologically and uses the last ⌊n/K⌋ for the testing set. The splitting implied by consecutive is shown in figure 5a while the splitting using interleaved is shown in figure 5b. The obvious case of random draw is not shown. Finally, the optional argument seed can be used to control the seed for random number generator. It is given as seed argument of the set.seed function.  The first two lines of code give rise to the examples summarized in 5a and 5b, while the third line corresponds to a random K-fold cross-validation: > ibr(X,Y,criterion="rmse",cv.options=list(Kfold=6,type="consecutive")) > ibr (X,Y,criterion="rmse",cv.options=list(Kfold=6,type="interleaved")) > ibr (X,Y,criterion="rmse",cv.options=list(Kfold=6,type="random")) Finally, if the user wants to perform an exhaustive search for the number of iterations (from 1 to 1000 iterations) using the leave-one out cross-validation, she runs > ibr (X,Y,criterion="rmse",Kmax=1000,control.par=list(exhaustive=TRUE), + cv.options=list(Kfold=TRUE,ntest=1,type="consecutive"))

Variables selection
We can apply the standard strategy of balancing prediction errors and model complexity to select predictors. The main issue with variable selection with ibr is computational, as we wish to compare models using an optimal number of bias reduction iterations. To limit fitting models with many parameters, we only consider forward variable selection (see algorithm 1).
In analogy to selecting the number of iterations, controlled by entries in the list criterion, we control the variable selection procedure with the list varcrit. The latter has the same default values as the former.

Algorithm 1 Forward function
Require: criterion (GCV, AIC, AICc, BIC, gMDL, MAP or RMSE) Require: varcrit (GCV, AIC, AICc, BIC, gMDL) s ← 1 # current stage R matrix of infinity with d columns # Matrix of results S ← ∅ # variable(s) selected at current stage s min ← ∞ # Current minimum of criterion for s = 1 to d do for j = 1 to d such that j ∈ S do Sc ← S ∪ {j} # Adding one variable to the set of variables already selected res <-ibr(X Sc ,Y,criterion) # X Sc is the dataset with explanatory variables in Sc evaluation of criterion varcrit for res: R sj end for if all {R sj } j > s min then Return matrix R from row 1 to s − 1 else # Updating S ← S ∪ {arg min j R sj } s min ← min j R sj # Current minimum of criterion varcrit s ← s + 1 end if end for The forward function returns an object of class forwardibr. A plot method is provided for this class of object.
> xc <-rep(xr, sqrt(N)) ; yc <-rep(yr, rep(sqrt(N),sqrt(N))) > X <-cbind(xc, yc) ; Zc <-as.vector(Z) In this example, we will use thin-plate splines of order ν 0 . Since the procedure is adaptive, the default value is the smallest possible smoothness, which is 2 in our case. The effective degree of freedom of the thin plate smoother needs to be slightly larger than M 0 = ν0+d−1 ν0−1 . In our example, M = 3 and we chose λ such that the effective degree of freedom was 1.1 × M = 3.3. Figure 2 (a) graphs the base smoother at iteration zero.

Real example: Los Angeles Ozone Data
We consider the classical Los Angeles basin ozone concentration data set used by numerous authors (see for example Breiman [1996], Yu [2003, 2006]) to demonstrate the performance of various high dimensional smoothing techniques. The data consists of n = 330 observed ozone concentration related to d = 8 explanatory variables.
The order ν 0 of thin plate splines needs to be greater than d/2, that is ν 0 = 5. This implies that the minimal effective degree of freedom of the thin plate spline smoother S λ is M 0 = 495, which is greater than the sample size n. Even for larger sample sizes, say n = 500, the thin plate splines will be unsatisfactory base smoother (recall that in the preceding section, for d = 2 we started at 3.3 df with 100 observations).
For this reason, let us consider the (default) Gaussian kernel smoother. As we discussed in Section 3.4, we do not scale the eight explanatory variables but instead select the bandwidth of each univariate smoother to achieve a smoothing matrix that has an effective degree of freedom of 1.1. This ensures that at face value, each of the eight covariates has the same influence. The number of possible bias correction iterations k considered by the model selection procedure for selecting the optimal number of iterations lies between one and 10, 000 (default values for Kmin and Kmax). The R code for fitting this data is > data(ozone) > res.ibr <-ibr(ozone [,-1] From the summary, we see that the optimal number of iterations isk GCV = 64, which can be thought as quite low (recall that in the previous example the number of iterations ranged between 200 and 400). In this example, an exhaustive search method for determining the optimal number of iterations > ibr(ozone [,-1],ozone [,1],df=1.1,control.par=list(exhaustive=TRUE)) gives the same result. Because we only need a relatively small number of bias correction steps, we can select a smaller initial effective degree of freedom, say 1.05, while maintaining the computational complexity at a manageable level. Indeed, decreasing the effective degree of freedom of the pilot smoother increases the total number of bias reduction steps while typically providing some performance gains as measured by out of sample prediction errors.
A plot method is also available for the ibr object to display the residuals as a function of the index. To emulate [see Bühlmann and Yu, 2003] and draw 50 random splits of the data into a set of 297 training data and a set of 33 testing data, we issue the following commands We get an error of 14.98, which compare favorably with GAM (mgcv: 17.44), MARS (mda: 17.49), projection pursuit (ppr: 17.79 for nterms=2) or boosting (package mboost: 17.23). Note that since no default is available for the nterms argument of the function ppr, we follow the examples provided in the ppr documentation and have set nterms equal to 2. To summarize, the ibr smoother enjoys a 15% reduction in the out of sample prediction mean squared error over other state-of-the-art multivariate smoothing methods.
We note that the above comparison favors the L 2 boosting and MARS algorithms that take advantage of built-in variable selection procedures. To compare to these methods, we apply the forward variable selection using the random splitting method to ibr for this data set issuing the following commands.

YA <-Y[-ind] > XXT <-XX[ind,] > YT <-Y[ind]
We select variables using the commands > forward.ibr <-forward(XXA,YA) > varnumber <-apply(forward.ibr,1,which.min) > varnumber [1] 4 3 7 6 5 That is, the order of the variables to be included into the model is 4, 3, 7, 6 and 5. Variable selection leads to improved predictions. To quantify, on the testing set, this improvement, we compare the the prediction MSE of the selected five variable model with the prediction MSE of model that uses all the eight variables.

Conclusion
The ibr package provides additional features which are not offered by other packages on CRAN. These features are a complete implementation, using R language, of iterative biased reduction procedure which implement and generalize the twicing idea of Tukey [1977].
This method of smoothing for multivariate dataset seems to be promising especially on real dataset. But one limitation of this smoothing method is the use of matrix n × n, where n is the number of observations. Moreover, at the present time, the computational bottleneck is the eigen decomposition of an n × n matrix, which limits the size of the dataset to which this procedure can be applied too.