TY - JOUR
AU - Grimonprez, Quentin
AU - Blanck, Samuel
AU - Celisse, Alain
AU - Marot, Guillemette
PY - 2023/03/23
Y2 - 2023/06/07
TI - MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso
JF - Journal of Statistical Software
JA - J. Stat. Soft.
VL - 106
IS - 3
SE - Articles
DO - 10.18637/jss.v106.i03
UR - https://www.jstatsoft.org/index.php/jss/article/view/v106i03
SP - 1
EP - 33
AB - The R package MLGL, standing for multi-layer group-Lasso, implements a new variable selection procedure for settings where the explanatory variables are redundant, as is the case with high-dimensional data. A sparsity assumption postulates that only a few variables are relevant for predicting the response variable. In this context, the performance of classical Lasso-based approaches strongly deteriorates as the redundancy increases. The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides at each level a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter. The versatility offered by package MLGL to choose groups at different levels of the hierarchy a priori induces a high computational complexity. MLGL, however, exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.
ER -