skip to main content
10.1145/2487575.2487582acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Flexible and robust co-regularized multi-domain graph clustering

Published: 11 August 2013 Publication History

Abstract

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been demonstrated an effective way to achieve better clustering results. Despite the previous success, existing multi-view graph clustering methods usually assume that different views are available for the same set of instances. Thus instances in different domains can be treated as having strict one-to-one relationship. In many real-life applications, however, data instances in one domain may correspond to multiple instances in another domain. Moreover, relationships between instances in different domains may be associated with weights based on prior (partial) knowledge. In this paper, we propose a flexible and robust framework, CGC (Co-regularized Graph Clustering), based on non-negative matrix factorization (NMF), to tackle these challenges. CGC has several advantages over the existing methods. First, it supports many-to-many cross-domain instance relationship. Second, it incorporates weight on cross-domain relationship. Third, it allows partial cross-domain mapping so that graphs in different domains may have different sizes. Finally, it provides users with the extent to which the cross-domain instance relationship violates the in-domain clustering structure, and thus enables users to re-evaluate the consistency of the relationship. Extensive experimental results on UCI benchmark data sets, newsgroup data sets and biological interaction networks demonstrate the effectiveness of our approach.

References

[1]
A. Asuncion and D. Newman. Uci machine learning repository. 2007.
[2]
S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein-protein interaction networks. In Bioinformatics, pages 29--40, 2007.
[3]
S. Bickel and T. Scheffer. Multi-view clustering. In ICDM, pages 19--26, 2004.
[4]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5]
K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In ICML, pages 129--136, 2009.
[6]
W. Cheng, X. Zhang, Y. Wu, X. Yin, J. Li, D. Heckerman, and W. Wang. Inferring novel associations between snp sets and gene sets in eqtl study using sparse graphical model. In BCB'12, pages 466--473, 2012.
[7]
H. J. Cordell. Detecting gene-gene interactions that underlie human diseases. Nat. Rev. Genet., 10:392--404, 2009.
[8]
W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Self-taught clustering. In ICML, pages 200--207, 2008.
[9]
C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. In KDD, pages 126--135, 2006.
[10]
T. Feng and X. Zhu. Genome-wide searching of rare genetic variants in WTCCC data. Hum. Genet., 128:269--280, 2010.
[11]
D. Fenyo, editor. Computational Biology. Methods in Molecular Biology. 2010.
[12]
X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In ICML, pages 36--45, 2004.
[13]
J. Gao, F. Liang, W. Fan, Y. Sun, and J. Han. Graph-based consensus maximization among multiple supervised and unsupervised models. In NIPS, pages 585--593, 2009.
[14]
Q. Gu, Z. Li, and J. Han. Learning a kernel for multi-task clustering. In AAAI, 2011.
[15]
S. Horvath and J. Dong. Geometric interpretation of gene coexpression network analysis. PLoS Computational Biology, 4, 2008.
[16]
J. S. Hub and B. L. de Groot. Detection of functional modes in protein dynamics. PLoS Computational Biology, 2009.
[17]
D. Kuang, H. Park, and C. H. Q. Ding. Symmetric nonnegative matrix factorization for graph clustering. In SDM, pages 106--117, 2012.
[18]
A. Kumar and H. D. III. A co-training approach for multi-view spectral clustering. In ICML, pages 393--400, 2011.
[19]
A. Kumar, P. Rai, and H. D. III. Co-regularized multi-view spectral clustering. In NIPS, pages 1413--1421, 2011.
[20]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556--562, 2000.
[21]
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. M. VanBriesen, and N. S. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007.
[22]
B. Long, P. S. Yu, and Z. M. Zhang. A general model for multiple view unsupervised learning. In SDM, pages 822--833, 2008.
[23]
V. K. Mootha, C. M. Lindgren, K. F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, N. Houstis, M. J. Daly, N. Patterson, J. P. Mesirov, T. R. Golub, P. Tamayo, B. Spiegelman, E. S. Lander, J. N. Hirschhorn, D. Altshuler, and L. C. Groop. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet., 34(3):267--273, 2003.
[24]
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849--856, 2001.
[25]
J. Shi and J. Malik. Normalized cuts and image segmentation. CVPR, pages 888--905, 1997.
[26]
H. Späth. Cluster Dissection and Analysis. Theory, FORTRAN programs, examples. Ellis Horwood, 1985.
[27]
A. Strehl, J. Ghosh, and C. Cardie. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583--617, 2002.
[28]
Y. Sun and J. Han. Mining Heterogeneous Information Networks: Principles and Methodologies. 2012.
[29]
L. Tang, X. Wang, and H. Liu. Community detection via heterogeneous interaction analysis. Data Min. Knowl. Discov, 25:1--33, 2012.
[30]
W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In ICDM, pages 1016--1021, 2009.
[31]
The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics, 25(1):25--29, 2000.
[32]
S. van Dongen. A cluster algorithm for graphs. In Centrum voor Wiskunde en Informatica (CWI), page 40. 2000.
[33]
X. Wang and I. Davidson. Flexible constrained spectral clustering. In KDD, pages 563--572, 2010.
[34]
P. H. Westfall and S. S. Young. Resampling-based Multiple Testing. Wiley, New York, 1993.
[35]
W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In SIGIR, Clustering, pages 267--273, 2003.
[36]
X. Zhang, S. Huang, F. Zou, and W. Wang. TEAM: Efficient two-locus epistasis tests in human genome-wide association study. Bioinformatics, 26(12):i217--227, 2010.
[37]
D. Zhou and C. J. C. Burges. Spectral clustering and transductive learning with multiple views. In ICML, pages 1159--1166, 2007.

Cited By

View all

Index Terms

  1. Flexible and robust co-regularized multi-domain graph clustering

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 August 2013

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. co-regularization
    2. graph clustering
    3. nonnegative matrix factorization

    Qualifiers

    • Research-article

    Conference

    KDD' 13
    Sponsor:

    Acceptance Rates

    KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 06 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media