skip to main content
10.1145/1516360.1516426acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free access

RankClus: integrating clustering with ranking for heterogeneous information network analysis

Published: 24 March 2009 Publication History

Abstract

As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well.
In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.

References

[1]
J. Bilmes. A gentle tutorial on the em algorithm and its application to parameter estimation for gaussian mixture and hidden markov models, 1997.
[2]
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1--7):107--117, 1998.
[3]
D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey. Scatter/gather: a cluster-based approach to browsing large document collections. pages 318--329, 1992.
[4]
DBLP. The dblp computer science bibliography. http://www.informatik.uni-trier.de/ley/db/.
[5]
J. E. Gentle and W. HSrdle. Handbook of Computational Statistics: Concepts and Methods, chapter 7 Evaluation of Eigenvalues, pages 245--247. Springer, 1 edition, 2004.
[6]
C. L. Giles. The future of citeseer. In 10th European Conference on PKDD (PKDD'06), page 2, 2006.
[7]
Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In Proceedings of the Thirtieth international conference on Very large data bases (VLDB'04), pages 576--587. VLDB Endowment, 2004.
[8]
J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102:16569, 2005.
[9]
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD conference (KDD'02), pages 538--543. ACM, 2002.
[10]
W. Jiang, J. Vaidya, Z. Balaporia, C. Clifton, and B. Banich. Knowledge discovery from transportation network data. In Proceedings of the 21st ICDE Conference (ICDE'05), pages 1061--1072, 2005.
[11]
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999.
[12]
U. Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395--416, 2007.
[13]
Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to web objects. In Proceedings of the fourteenth International World Wide Web Conference (WWW'05), pages 567--574. ACM, May 2005.
[14]
S. Roy, T. Lane, and M. Werner-Washburne. Integrative construction and analysis of condition-specific biological networks. In Proceedings of AAAI'07, pages 1898--1899, 2007.
[15]
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888--905, 2000.
[16]
A. Sidiropoulos, D. Katsaros, and Y. Manolopoulos. Generalized h-index for disclosing latent facts in citation networks. CoRR, abs/cs/0607066, 2006.
[17]
X. Yin, J. Han, and P. S. Yu. Linkclus: Efficient clustering via heterogeneous semantic links. In Proceedings of the 32nd VLDB conference (VLDB'06), pages 427--438, 2006.
[18]
O. Zamir and O. Etzioni. Grouper: A dynamic clustering interface to web search results. pages 1361--1374, 1999.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN:9781605584225
DOI:10.1145/1516360
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 March 2009

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

EDBT/ICDT '09
EDBT/ICDT '09: EDBT/ICDT '09 joint conference
March 24 - 26, 2009
Saint Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)376
  • Downloads (Last 6 weeks)45
Reflects downloads up to 22 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media