Main Article Content
Determining the optimal number of clusters appears to be a persistent and controversial issue in cluster analysis. Most existing R packages targeting clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. In order to address this vexing problem, we develop the R package clues to automate and evaluate the selection of an optimal number of clusters, which is widely applicable in the field of clustering analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on guessing a pre-specified number. Five agreement indices (Rand index, Hubert and Arabie's adjusted Rand index, Morey and Agresti's adjusted Rand index, Fowlkes and Mallows index and Jaccard index), which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues also supplies a deeper insight into the partitioning process with trajectory plots.