Article
DOI: 10.5555/1761171.1761196

Selecting diversifying heuristics for cluster ensembles

Published: 23 May 2007

Abstract

Cluster ensembles are deemed to be better than single clustering algorithms for discovering complex or noisy structures in data. Various heuristics for constructing such ensembles have been examined in the literature, e.g., random feature selection, weak clusterers, random projections, etc. Typically, one heuristic is picked at a time to construct the ensemble. To increase the diversity of the ensemble, several heuristics may be applied together. However, not every combination is beneficial. Here we apply a standard genetic algorithm (GA) to select from 7 standard heuristics for k-means cluster ensembles. The ensemble size is also encoded in the chromosome. In this way the data is forced to guide the selection of heuristics as well as the ensemble size. Eighteen moderate-size datasets were used: 4 artificial and 14 real. The results resonate with our previous findings in that high diversity is not necessarily a prerequisite for high accuracy of the ensemble. No particular combination of heuristics appeared to be consistently chosen across all datasets, which justifies the existing variety of cluster ensembles. Among the most often selected heuristics were random feature extraction, random feature selection and a random number of clusters assigned to each ensemble member. Based on the experiments, we recommend that the current practice of using one or two heuristics for building k-means cluster ensembles should be revised in favour of using 3-5 heuristics.
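The chromosome described above, with one selection bit per heuristic plus the ensemble size encoded alongside them, can be sketched as follows. This is only an illustrative sketch: the heuristic names, the candidate ensemble sizes, and the random encoding are assumptions, not the paper's exact GA setup.

```python
import random

# Hypothetical labels for the 7 diversifying heuristics (assumed names,
# loosely based on those the abstract mentions).
HEURISTICS = [
    "random_feature_selection",
    "random_feature_extraction",
    "random_projections",
    "weak_clusterers",
    "random_k_per_member",
    "data_resampling",
    "random_initialisation",
]
ENSEMBLE_SIZES = [10, 25, 50]  # assumed candidate ensemble sizes

def random_chromosome(rng):
    """7 heuristic-selection bits plus an index encoding the ensemble size."""
    return [rng.randint(0, 1) for _ in range(7)] + [rng.randrange(len(ENSEMBLE_SIZES))]

def decode(chrom):
    """Translate a chromosome into (active heuristics, ensemble size)."""
    active = [h for h, bit in zip(HEURISTICS, chrom[:7]) if bit]
    size = ENSEMBLE_SIZES[chrom[7]]
    return active, size

rng = random.Random(0)
c = random_chromosome(rng)
active, size = decode(c)
```

In a full GA the fitness of each chromosome would be the accuracy of the k-means ensemble built with the decoded heuristics and size, so the data itself steers which heuristics survive selection.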



Published In

MCS'07: Proceedings of the 7th international conference on Multiple classifier systems
May 2007
524 pages
ISBN:9783540724810
Editors: Michal Haindl, Josef Kittler, Fabio Roli

Sponsors

  • EU IST FP6 BioSecure Network of Excellence
  • EU IST FP6 MUSCLE Network of Excellence
  • IAPR: International Association for Pattern Recognition
  • University of Surrey
  • University of Cagliari

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. cluster ensembles
  2. diversifying heuristics
  3. genetic algorithms
  4. multiple classifier systems
  5. pattern recognition

