Published by the Foundation for Open Access Statistics
Editors-in-chief: Bettina Grün, Torsten Hothorn, Edzer Pebesma, Achim Zeileis
ISSN 1548-7660; CODEN JSSOBK
Authors: Fang Chang, Weiliang Qiu, Ruben H. Zamar, Ross Lazarus, Xiaogang Wang
Title: clues: An R Package for Nonparametric Clustering Based on Local Shrinking
Abstract: Determining the optimal number of clusters is a persistent and controversial issue in cluster analysis. Most existing R packages for clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. To address this problem, we develop the R package clues, which automates and evaluates the selection of an optimal number of clusters and is widely applicable in cluster analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on a pre-specified guess. Five agreement indices (Rand index, Hubert and Arabie's adjusted Rand index, Morey and Agresti's adjusted Rand index, Fowlkes and Mallows index, and Jaccard index), which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues offers deeper insight into the partitioning process through trajectory plots.
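
To make the notion of agreement between two partitions concrete, the following is a minimal Python sketch of four of the indices named in the abstract, computed from pair counts using their standard textbook definitions. It is not the clues package's R interface: the function names `pair_counts` and `agreement_indices` are hypothetical, and Morey and Agresti's adjusted Rand index is omitted.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Classify all point pairs by whether each partition groups them together.

    Returns (a, b, c, d): together in both, together in A only,
    together in B only, apart in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same points")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two points to form a pair")
    # Pairs co-clustered in both partitions come from the contingency table.
    same_both = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    same_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    a = same_both
    b = same_a - same_both
    c = same_b - same_both
    d = comb(n, 2) - a - b - c
    return a, b, c, d


def agreement_indices(labels_a, labels_b):
    """Rand, Hubert-Arabie adjusted Rand, Fowlkes-Mallows and Jaccard indices."""
    a, b, c, d = pair_counts(labels_a, labels_b)
    total = a + b + c + d
    rand = (a + d) / total
    jaccard = a / (a + b + c) if (a + b + c) else 1.0
    fm = a / sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    # Hubert-Arabie correction: subtract the agreement expected by chance.
    expected = (a + b) * (a + c) / total
    max_index = ((a + b) + (a + c)) / 2
    ari = (a - expected) / (max_index - expected) if max_index != expected else 1.0
    return {"rand": rand, "adjusted_rand": ari, "fowlkes_mallows": fm, "jaccard": jaccard}
```

Cluster labels are treated as arbitrary names, so two partitions that group the points identically under different labels score 1 on every index. Unlike the other three, the adjusted Rand index can be negative when agreement is worse than expected by chance.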

Page views: 6086. Submitted: 2009-03-26. Published: 2010-02-03.
Paper: clues: An R Package for Nonparametric Clustering Based on Local Shrinking (PDF; downloads: 6774)
clues_0.5-0.tar.gz: R source package (downloads: 864; 566KB)
v33i04.R: R example code from the paper (downloads: 859; 2KB)
WDBC.csv: Example data set in CSV format (downloads: 872; 48KB)

DOI: 10.18637/jss.v033.i04

This work is licensed under the following licenses:
Paper: Creative Commons Attribution 3.0 Unported License
Code: GNU General Public License (at least one of version 2 or version 3) or a GPL-compatible license.