DOI: 10.1145/3208788.3208789
research-article

A user-satisfaction-based clustering method

Published: 20 April 2018

Abstract

Clustering is a common method for data analysis, and a good clustering helps users better understand their data. Clustering quality is mostly assessed with objective measures, although some researchers have also considered users' goals and proposed methods that involve users in the clustering process. A good clustering, however, must ultimately satisfy its users. Beyond objective measures and users' goals, how easy a clustering is to understand also matters for quality assessment. This is especially true for high-dimensional data: if the data points in the final clusters are described by many dimensions, users will struggle to interpret the results. Taking these concerns into account, we propose an index of users' satisfaction with high-dimensional data clustering. Based on this index, we further put forward a user-satisfaction-based clustering method. We first develop an optimization model of users' satisfaction and then solve it with a genetic algorithm to obtain a set of high-quality clusterings. These clusterings are then themselves clustered (reclustered), and a few representative high-quality clusterings are offered to users to choose from. The experimental results suggest that our method effectively provides representative clusterings that take clustering quality, users' goals, and the interpretability of the clustering results into account.
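The pipeline outlined in the abstract (score candidate clusterings, search with a genetic algorithm, then keep a few representatives) can be illustrated with a small sketch. The abstract does not give the satisfaction index, so everything concrete here is an assumption made for illustration: the `satisfaction` score (cluster compactness minus a penalty on the fraction of features used), plain k-means as the base clusterer, binary feature masks as the genome, and a greedy Rand-index filter standing in for the reclustering step. All function names and parameters are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                      for a, b in zip(p, centers[c]))) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def sum_sq(points):
    """Sum of squared distances from the points to their mean."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, mean)) for p in points)

def satisfaction(data, mask, k, seed=0, penalty=0.3):
    """Hypothetical stand-in score: compactness minus a cost per feature kept."""
    if not any(mask):
        return -1.0, None  # an empty feature subset cannot be clustered
    pts = [tuple(v for v, m in zip(row, mask) if m) for row in data]
    labels = kmeans(pts, k, random.Random(seed))
    total = sum_sq(pts)
    within = sum(sum_sq([p for p, l in zip(pts, labels) if l == c])
                 for c in set(labels))
    quality = 1.0 - within / total if total > 0 else 0.0
    return quality - penalty * sum(mask) / len(mask), labels

def evolve(data, k, pop_size=12, generations=15, seed=0):
    """Toy genetic algorithm over feature masks; returns masks, best first."""
    rng = random.Random(seed)
    n = len(data[0])  # assumes at least two features
    score = lambda m: satisfaction(data, m, k, seed)[0]
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # single bit-flip mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    pop.sort(key=score, reverse=True)
    return pop

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def representatives(data, k, masks, seed=0, threshold=0.9, limit=3):
    """Greedily keep high-scoring clusterings that differ from those kept so far."""
    chosen = []
    for mask in masks:
        score, labels = satisfaction(data, mask, k, seed)
        if labels is None:
            continue
        if all(rand_index(labels, other) < threshold for _, _, other in chosen):
            chosen.append((mask, score, labels))
        if len(chosen) == limit:
            break
    return chosen

def toy_data(seed=0, n=30):
    """Two informative features (two blobs) plus two uniform noise features."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        centre = 0.0 if i < n // 2 else 5.0
        rows.append((rng.gauss(centre, 0.5), rng.gauss(centre, 0.5),
                     rng.random(), rng.random()))
    return rows

data = toy_data()
for mask, score, _ in representatives(data, 2, evolve(data, k=2)):
    print(mask, round(score, 3))
```

The feature penalty is what ties the score to interpretability in this sketch: a clustering described by fewer dimensions scores higher at equal compactness. The greedy filter is only a simple substitute for grouping similar clusterings and showing one per group.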

Published In

ICMAI '18: Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence
April 2018
95 pages
ISBN:9781450364201
DOI:10.1145/3208788

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. clustering measures
  3. feature selection
  4. genetic algorithm

Qualifiers

  • Research-article

Conference

ICMAI '18
