DOI: 10.1145/956750.956764

Information-theoretic co-clustering

Published: 24 August 2003

Abstract

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory---the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
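The alternating procedure described above can be illustrated with a small sketch. The code below is a simplified reading of information-theoretic co-clustering, not the paper's exact algorithm: it treats a nonnegative matrix as an empirical joint distribution and alternately reassigns each row (then each column) to the cluster whose compressed conditional distribution is closest in KL divergence. All names (`itcc`, `row_init`, `col_init`) are illustrative, and the smoothing constant is an implementation convenience.

```python
import numpy as np

def itcc(P, k, l, n_iters=20, row_init=None, col_init=None, seed=0):
    """Simplified information-theoretic co-clustering sketch.

    P        : (m, n) nonnegative matrix, treated as an empirical joint
               distribution after normalization.
    k, l     : number of row and column clusters.
    Returns  : (row_labels, col_labels).
    """
    rng = np.random.default_rng(seed)
    P = P / P.sum()
    m, n = P.shape
    rows = rng.integers(0, k, size=m) if row_init is None else np.array(row_init)
    cols = rng.integers(0, l, size=n) if col_init is None else np.array(col_init)
    eps = 1e-12  # smoothing to keep the KL terms finite

    for _ in range(n_iters):
        # Joint distribution compressed onto (row cluster, column cluster).
        Q = np.zeros((k, l))
        for a in range(k):
            for b in range(l):
                Q[a, b] = P[rows == a][:, cols == b].sum()

        # Row step: move each row x to the cluster a minimizing
        # KL( p(col-cluster | x) || q(col-cluster | a) ).
        Qr = Q / (Q.sum(axis=1, keepdims=True) + eps)        # (k, l)
        Pr = np.stack([P[:, cols == b].sum(axis=1) for b in range(l)], axis=1)
        Pr = Pr / (Pr.sum(axis=1, keepdims=True) + eps)      # (m, l)
        kl = (Pr[:, None, :] * (np.log(Pr[:, None, :] + eps)
                                - np.log(Qr[None, :, :] + eps))).sum(-1)
        rows = kl.argmin(axis=1)

        # Column step: the symmetric update over row clusters.
        Qc = (Q / (Q.sum(axis=0, keepdims=True) + eps)).T    # (l, k)
        Pc = np.stack([P[rows == a].sum(axis=0) for a in range(k)], axis=1)
        Pc = Pc / (Pc.sum(axis=1, keepdims=True) + eps)      # (n, k)
        kl = (Pc[:, None, :] * (np.log(Pc[:, None, :] + eps)
                                - np.log(Qc[None, :, :] + eps))).sum(-1)
        cols = kl.argmin(axis=1)

    return rows, cols
```

On a toy block-diagonal co-occurrence matrix (two row blocks, two column blocks), starting from a non-degenerate labeling, the alternating updates settle on the planted co-clusters within a few iterations; unlike the paper's algorithm, this sketch makes no monotonicity guarantee under arbitrary initialization.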




Published In

KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. co-clustering
  2. information theory
  3. mutual information

Qualifiers

  • Article

Conference

KDD '03

Acceptance Rates

KDD '03 paper acceptance rate: 46 of 298 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

