Article
DOI: 10.5555/1761171.1761196

Selecting diversifying heuristics for cluster ensembles

Published: 23 May 2007

Abstract

Cluster ensembles are deemed to be better than single clustering algorithms for discovering complex or noisy structures in data. Various heuristics for constructing such ensembles have been examined in the literature, e.g., random feature selection, weak clusterers, random projections, etc. Typically, one heuristic is picked at a time to construct the ensemble. To increase the diversity of the ensemble, several heuristics may be applied together. However, not every combination is beneficial. Here we apply a standard genetic algorithm (GA) to select from 7 standard heuristics for k-means cluster ensembles. The ensemble size is also encoded in the chromosome. In this way the data is forced to guide the selection of heuristics as well as the ensemble size. Eighteen moderate-size datasets were used: 4 artificial and 14 real. The results resonate with our previous findings in that high diversity is not necessarily a prerequisite for high accuracy of the ensemble. No particular combination of heuristics appeared to be consistently chosen across all datasets, which justifies the existing variety of cluster ensembles. Among the most often selected heuristics were random feature extraction, random feature selection and a random number of clusters assigned to each ensemble member. Based on the experiments, we recommend that the current practice of using one or two heuristics for building k-means cluster ensembles should be revised in favour of using 3-5 heuristics.
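The chromosome described above, with one selection bit per heuristic plus the ensemble size encoded alongside them, can be sketched as follows. This is only an illustrative sketch: the heuristic names, the candidate ensemble sizes, and the random encoding are assumptions, not the paper's exact GA setup.

```python
import random

# Hypothetical labels for the 7 diversifying heuristics (assumed names,
# loosely based on those the abstract mentions).
HEURISTICS = [
    "random_feature_selection",
    "random_feature_extraction",
    "random_projections",
    "weak_clusterers",
    "random_k_per_member",
    "data_resampling",
    "random_initialisation",
]
ENSEMBLE_SIZES = [10, 25, 50]  # assumed candidate ensemble sizes

def random_chromosome(rng):
    """7 heuristic-selection bits plus an index encoding the ensemble size."""
    return [rng.randint(0, 1) for _ in range(7)] + [rng.randrange(len(ENSEMBLE_SIZES))]

def decode(chrom):
    """Translate a chromosome into (active heuristics, ensemble size)."""
    active = [h for h, bit in zip(HEURISTICS, chrom[:7]) if bit]
    size = ENSEMBLE_SIZES[chrom[7]]
    return active, size

rng = random.Random(0)
c = random_chromosome(rng)
active, size = decode(c)
```

In a full GA the fitness of each chromosome would be the accuracy of the k-means ensemble built with the decoded heuristics and size, so the data itself steers which heuristics survive selection.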



Published In

MCS'07: Proceedings of the 7th international conference on Multiple classifier systems
May 2007
524 pages
ISBN:9783540724810
Editors: Michal Haindl, Josef Kittler, Fabio Roli

Sponsors

  • EU IST FP6 BioSecure Network of Excellence
  • EU IST FP6 MUSCLE Network of Excellence
  • IAPR: International Association for Pattern Recognition
  • University of Surrey
  • University of Cagliari

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. cluster ensembles
  2. diversifying heuristics
  3. genetic algorithms
  4. multiple classifier systems
  5. pattern recognition

