DOI: 10.1145/1076034.1076087
Article

PageRank without hyperlinks: structural re-ranking using links induced by language models

Published: 15 August 2005

Abstract

Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
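The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Dirichlet-smoothed unigram models, the per-token (length-normalized) log-likelihood used as the guard against length bias, the top-k edge selection, the damping factor, and the product used to combine centrality with query likelihood are all assumptions made for illustration, and every function name here is hypothetical. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def build_lms(docs, mu=10.0):
    """One Dirichlet-smoothed unigram model per document (mu is an assumed setting)."""
    coll = Counter(w for d in docs for w in d)
    coll_len, vocab = sum(coll.values()), len(coll)
    def make(tokens):
        counts, n = Counter(tokens), len(tokens)
        def prob(w):
            # Add-one floor on the collection model keeps unseen words nonzero.
            p_coll = (coll.get(w, 0) + 1) / (coll_len + vocab + 1)
            return (counts.get(w, 0) + mu * p_coll) / (n + mu)
        return prob
    return [make(d) for d in docs]

def avg_log_prob(lm, tokens):
    """Per-token log-likelihood; dividing by length is the assumed length-bias guard."""
    return sum(math.log(lm(w)) for w in tokens) / len(tokens)

def generation_graph(docs, lms, k=2):
    """Edge o -> g, weighted by how well g's model generates o, for o's k best generators."""
    graph = {}
    for o, text in enumerate(docs):
        scored = [(math.exp(avg_log_prob(lms[g], text)), g)
                  for g in range(len(docs)) if g != o]
        graph[o] = {g: w for w, g in sorted(scored, reverse=True)[:k]}
    return graph

def pagerank(graph, damping=0.85, iters=50):
    """Weighted PageRank by power iteration; centrality flows along generation links."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in graph}
        for o, edges in graph.items():
            total = sum(edges.values())
            for g, w in edges.items():
                nxt[g] += damping * rank[o] * w / total
        rank = nxt
    return rank

def rerank(query, docs, k=2):
    """Order document indices by centrality times query likelihood (assumed combination)."""
    lms = build_lms(docs)
    centrality = pagerank(generation_graph(docs, lms, k))
    score = {i: centrality[i] * math.exp(sum(math.log(lms[i](w)) for w in query))
             for i in range(len(docs))}
    return sorted(score, key=score.get, reverse=True)
```

Calling `rerank(query_tokens, initially_retrieved_docs)` returns the indices of the initially retrieved documents in their new order. Documents that many others "point to" through generation links gain centrality, which then modulates the usual language-model score.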

    Published In

    SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
    August 2005
    708 pages
    ISBN: 1595930345
    DOI: 10.1145/1076034

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. HITS
    2. PageRank
    3. authorities
    4. graph-based retrieval
    5. high-accuracy retrieval
    6. hubs
    7. language modeling
    8. social networks
    9. structural re-ranking

    Qualifiers

    • Article

    Conference

    SIGIR05
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
