Story Details

  • Stop using the elbow criterion for k-means

    Posted: 2025-03-23 02:51:38

    The paper "Stop using the elbow criterion for k-means" argues against the common practice of choosing the number of clusters (k) in k-means with the elbow method, that is, plotting the within-cluster sum of squares (WCSS) against k and picking the k where the curve appears to bend. The authors demonstrate that the method is unreliable: it often identifies spurious elbows or misses genuine ones. They support this with theoretical analysis and empirical examples across various datasets and distance metrics, showing that the WCSS curve can behave unexpectedly. The paper advocates abandoning the elbow method entirely in favor of more robust, theoretically grounded alternatives such as the gap statistic, silhouette analysis, or information criteria, which offer statistically sound approaches to selecting k.
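
    To make the concern about the WCSS curve concrete, here is a minimal sketch (not taken from the paper) that runs a plain Lloyd's-algorithm k-means on uniformly random 2-D points, which have no cluster structure at all. The helper names (sq_dist, mean, kmeans_wcss), the sample size, and the restart count are illustrative assumptions. WCSS still drops steeply for small k and then flattens, so a plot of it can suggest a bend that does not correspond to any real clusters.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def kmeans_wcss(points, k, rng, n_init=5, n_iter=50):
    """Lowest WCSS found by Lloyd's algorithm over n_init random restarts."""
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)  # initialize from k distinct data points
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[j].append(p)
            # Update step: move each center to its cluster mean
            # (an empty cluster keeps its previous center).
            new_centers = [mean(c) if c else centers[i]
                           for i, c in enumerate(clusters)]
            if new_centers == centers:
                break
            centers = new_centers
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best

rng = random.Random(0)
# Structureless data: 200 points drawn uniformly from the unit square.
points = [(rng.random(), rng.random()) for _ in range(200)]

curve = {k: kmeans_wcss(points, k, rng) for k in range(1, 9)}
for k, wcss in curve.items():
    print(f"k={k}  WCSS={wcss:.2f}")
```

    Because adding centers can only help the objective, the curve falls regardless of whether clusters exist, which is why reading a "best k" off its shape is fragile.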

    Summary of Comments (13)
    https://news.ycombinator.com/item?id=43450550

    HN users discuss the problems with the elbow method for choosing the number of clusters in k-means, agreeing that it is often unreliable and subjective. Several commenters suggest alternatives such as the silhouette coefficient, the gap statistic, and information criteria like AIC/BIC. Some stress that the practical context and the "business need" should inform the number of clusters, rather than statistical methods alone. Others point out that k-means itself may not be the best algorithm for every dataset, recommending DBSCAN and hierarchical clustering as potentially better suited to some data, particularly non-spherical clusters. A few users mention how hard it is to visualize high-dimensional data and to interpret these metrics, emphasizing that cluster analysis is iterative.
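
    As an illustration of one alternative the commenters mention, the following is a minimal sketch of the mean silhouette coefficient, written from its standard definition rather than from any code in the thread. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster; the score is (b - a) / max(a, b), with singleton clusters scored 0. The toy points and the two candidate labelings are made up for the example, and the function assumes at least two clusters and no exact duplicate points.

```python
from math import dist  # Euclidean distance, Python 3.8+

def silhouette_score(points, labels):
    """Mean silhouette coefficient for a labeling with at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (dist(p, p) is 0, so dividing by len(own) - 1 excludes p itself).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: two tight, well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

labels_k2 = [0, 0, 0, 0, 1, 1, 1, 1]  # one label per group
labels_k3 = [0, 0, 0, 0, 1, 1, 2, 2]  # second group split in half

s2 = silhouette_score(points, labels_k2)
s3 = silhouette_score(points, labels_k3)
print(f"k=2 silhouette: {s2:.3f}")
print(f"k=3 silhouette: {s3:.3f}")
```

    Unlike WCSS, this score does not improve automatically as k grows: splitting a genuine group lowers it, so candidate values of k can be compared by their scores instead of by eyeballing a bend.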