Published by the Foundation for Open Access Statistics
Editors-in-chief: Bettina Grün, Torsten Hothorn, Rebecca Killick, Edzer Pebesma, Achim Zeileis
ISSN 1548-7660; CODEN JSSOBK
Authors: Yumi Kondo, Matias Salibian-Barrera, Ruben Zamar
Title: RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm
Abstract: Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables without useful information to separate the clusters). SK-means works very well on clean and complete data but cannot handle outliers or missing data. To remedy these problems we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means regarding the selection of important variables and the identification of clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data.

Page views: 4118. Submitted: 2013-04-04. Published: 2016-08-28.
Paper: RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm (PDF)
RSKC_2.4.2.tar.gz: R source package (373KB)
Replication materials (10MB)
v72i05.R: R replication code (26KB)

DOI: 10.18637/jss.v072.i05

This work is licensed under the following licenses:
Paper: Creative Commons Attribution 3.0 Unported License
Code: GNU General Public License (at least one of version 2 or version 3) or a GPL-compatible license.