A critical note on the evaluation of clustering algorithms

T Zhang, L Zhong, B Yuan - arXiv preprint arXiv:1908.03782, 2019 - arxiv.org
Experimental evaluation is a major research methodology for investigating clustering algorithms and many other machine learning algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature, and their quality plays a key role in the value of the research work. However, in most of the existing studies, little attention has been paid to the properties of the datasets, and they are often regarded as black-box problems. For example, it is common to use datasets intended for classification in clustering research and to assume class labels as the ground truth for judging the quality of clustering. In our work, with the help of advanced visualization and dimension reduction techniques, we show that this practice may seriously compromise the research quality and produce misleading results. We suggest that the applicability of existing benchmark datasets should be carefully revisited, and that significant effort needs to be devoted to improving the current practice of experimental evaluation of clustering algorithms to ensure an essential match between algorithms and problems.
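To make the practice under discussion concrete, below is a minimal illustrative sketch (not taken from the paper) of how a classification dataset is commonly reused for clustering evaluation: k-means is run on a standard classification benchmark, its output is scored against the class labels with the adjusted Rand index, and a dimension-reduction view is used to inspect whether those labels actually correspond to separable cluster structure. The specific choices of the wine dataset, scikit-learn, and t-SNE are assumptions made here for illustration only.

```python
# Illustrative sketch (assumptions: scikit-learn, the wine dataset, t-SNE):
# the common practice of clustering a classification dataset and judging the
# result against its class labels, followed by a 2-D projection to check
# whether those labels match the cluster structure actually present.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# A classification dataset reused as a clustering benchmark.
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Cluster, then score against class labels as if they were ground truth.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("ARI vs. class labels:", adjusted_rand_score(y, labels))

# Dimension reduction to visually compare class labels with the
# clustering result in the same 2-D embedding.
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(emb[:, 0], emb[:, 1], c=y, s=10)
axes[0].set_title("t-SNE colored by class labels")
axes[1].scatter(emb[:, 0], emb[:, 1], c=labels, s=10)
axes[1].set_title("t-SNE colored by k-means clusters")
plt.show()
```

A visible mismatch between the two panels, or a low ARI, would illustrate the paper's point that class labels of a classification dataset do not necessarily reflect its intrinsic cluster structure.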