A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

Authors:

Dharmendra S. Modha

The Journal of Machine Learning Research, Volume 8

Pages 1919 - 1986

Published: 01 December 2007

Abstract

Co-clustering, or simultaneous clustering of rows and columns of a two-dimensional data matrix, is rapidly becoming a powerful data analysis technique. Co-clustering has enjoyed wide success in varied application domains such as text clustering, gene-microarray analysis, natural language processing, and image, speech and video analysis. In this paper, we introduce a partitional co-clustering formulation that is driven by the search for a good matrix approximation: every co-clustering is associated with an approximation of the original data matrix, and the quality of the co-clustering is determined by the approximation error. We allow the approximation error to be measured using a large class of loss functions called Bregman divergences, which include squared Euclidean distance and KL-divergence as special cases. In addition, we permit multiple structurally different co-clustering schemes that preserve various linear statistics of the original data matrix. To accomplish the above tasks, we introduce a new minimum Bregman information (MBI) principle that simultaneously generalizes the maximum entropy and standard least squares principles, and leads to a matrix approximation that is optimal among all generalized additive models in a certain natural parameter space. Analysis based on this principle yields an elegant meta algorithm, special cases of which include most previously known alternate minimization based clustering algorithms such as k-means, and co-clustering algorithms such as information-theoretic co-clustering (Dhillon et al., 2003b) and minimum sum-squared residue co-clustering (Cho et al., 2004). To demonstrate the generality and flexibility of our co-clustering framework, we provide examples and empirical evidence on a variety of problem domains and also describe novel co-clustering applications such as missing value prediction and compression of categorical data matrices.
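The abstract's claim that squared Euclidean distance and KL-divergence are both special cases of a Bregman divergence can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it uses the standard definition d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, and the function names (`bregman`, `sq_norm`, `neg_entropy`) and the example vectors `p` and `q` are assumptions made for this example.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Choice 1: phi(z) = ||z||^2, which yields squared Euclidean distance.
def sq_norm(z):
    return sum(v * v for v in z)

def sq_norm_grad(z):
    return [2.0 * v for v in z]

# Choice 2: phi(z) = sum_i z_i log z_i (negative entropy), which yields
# KL-divergence when x and y are strictly positive and both sum to 1.
def neg_entropy(z):
    return sum(v * math.log(v) for v in z)

def neg_entropy_grad(z):
    return [math.log(v) + 1.0 for v in z]

# Hypothetical probability vectors, chosen only for illustration.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

d_euclid = bregman(sq_norm, sq_norm_grad, p, q)
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)

# Direct formulas for comparison.
direct_euclid = sum((a - b) ** 2 for a, b in zip(p, q))
direct_kl = sum(a * math.log(a / b) for a, b in zip(p, q))

print(abs(d_euclid - direct_euclid) < 1e-12)
print(abs(d_kl - direct_kl) < 1e-12)
```

Only the convex function `phi` changes between the two cases, which is the sense in which a single formulation can cover both losses. For vectors that do not sum to 1, the negative-entropy choice gives the generalized KL-divergence (I-divergence) rather than plain KL.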

Published In

The Journal of Machine Learning Research, Volume 8 (01 December 2007), 2736 pages

ISSN: 1532-4435

EISSN: 1533-7928

Publisher

JMLR.org
