Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

Nielsen, Frank; Nock, Richard

Computer Science > Machine Learning

arXiv:1406.6314 (cs)

[Submitted on 23 Jun 2014]

Abstract: Finding the optimal $k$-means clustering is NP-hard in general, and many heuristics have been designed for monotonically minimizing the $k$-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. Those events tend to occur more often as the number of clusters $k$ or the dimension $d$ increases, or when performing several restarts. First, we show that those special events are a blessing because they make it possible to partially re-seed some cluster centers while further minimizing the $k$-means objective function. Second, we describe a novel heuristic, merge-and-split $k$-means, that consists in merging two clusters and splitting the merged cluster again with two new centers, provided this improves the $k$-means objective. This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum. We show empirically that merge-and-split $k$-means improves over Hartigan's heuristic, which is the de facto method of choice. Finally, we propose the $(k,l)$-means objective, which generalizes the $k$-means objective by associating the data points to their $l$ closest cluster centers, and show how to either directly convert or iteratively relax the $(k,l)$-means into a $k$-means in order to reach better local minima.
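The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the $(k,l)$-means cost is assumed to be the sum, over all points, of the squared distances to the $l$ closest centers (so that $l=1$ gives the usual $k$-means cost), and the split step is assumed to be a few randomly seeded runs of Lloyd's iterations with two centers.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def cluster_cost(points):
    """Sum of squared distances of the points to their own centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def kmeans_cost(clusters):
    """k-means objective of a partition given as a list of point lists."""
    return sum(cluster_cost(c) for c in clusters)

def kl_means_cost(points, centers, l):
    """Assumed (k,l)-means cost: each point pays for its l closest centers."""
    total = 0.0
    for p in points:
        dists = sorted(sq_dist(p, c) for c in centers)
        total += sum(dists[:l])
    return total

def two_means(points, rng, iters=20):
    """Lloyd's iterations with two seeds sampled from the points."""
    c1, c2 = rng.sample(points, 2)
    a, b = [], []
    for _ in range(iters):
        a = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        b = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        if not a or not b:
            break  # degenerate split, the caller discards it
        c1, c2 = centroid(a), centroid(b)
    return a, b

def merge_and_split(clusters, i, j, rng, trials=10):
    """Merge clusters i and j, re-split them, and keep the new split
    only if it lowers the k-means cost (so the cost never increases)."""
    merged = clusters[i] + clusters[j]
    best_cost = cluster_cost(clusters[i]) + cluster_cost(clusters[j])
    best_a, best_b = clusters[i], clusters[j]
    for _ in range(trials):
        a, b = two_means(merged, rng)
        if a and b:
            cost = cluster_cost(a) + cluster_cost(b)
            if cost < best_cost:
                best_cost, best_a, best_b = cost, a, b
    result = list(clusters)
    result[i], result[j] = best_a, best_b
    return result
```

As a usage example, calling `merge_and_split(clusters, 0, 1, random.Random(0))` on a partition returns a partition of the same points whose `kmeans_cost` is no larger than before, since the original pair of clusters is kept whenever no trial split improves on it.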

Comments:	14 pages
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Cite as:	arXiv:1406.6314 [cs.LG]
	(or arXiv:1406.6314v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1406.6314

Submission history

From: Frank Nielsen
[v1] Mon, 23 Jun 2014 02:34:34 UTC (52 KB)
