DOI: 10.1145/2983323.2983844
research-article

Pseudo-Relevance Feedback Based on Matrix Factorization

Published: 24 October 2016

Abstract

In information retrieval, pseudo-relevance feedback (PRF) is a strategy for updating the query model using the top retrieved documents, and it has proven highly effective in improving retrieval performance. In this paper, we treat PRF as a recommendation problem: the goal is to recommend a number of terms for a given query, along with weights, such that the final term weights in the updated query model better reflect each term's contribution to the query. To do so, we propose RFMF, a PRF framework based on matrix factorization, a state-of-the-art technique in collaborative recommender systems. Matrix factorization is used to predict the weights of terms that do not appear in the query. In RFMF, we first create a matrix whose elements are computed with a weight function that measures how strongly a term discriminates the query or the top retrieved documents from the collection. Then, we re-estimate this matrix using a matrix factorization technique. Finally, we update the query model using the re-estimated matrix. RFMF is a general framework that can be employed with any retrieval model. In this paper, we implement it for two widely used document retrieval frameworks: language modeling and the vector space model. Extensive experiments over several TREC collections demonstrate that RFMF significantly outperforms competitive baselines. These results indicate the potential of applying other recommendation techniques to this task.
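The three steps named in the abstract (build a weight matrix, re-estimate it by factorization, read the updated query model off the result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the weight function or the factorization algorithm, so the sketch assumes a hypothetical weight p(t|row) * log(1 + p(t|row) / p(t|C)) and non-negative matrix factorization with multiplicative updates. The names `weight_row` and `rfmf_sketch`, the rank, the iteration count, and the toy data are all illustrative.

```python
import math
import random
from collections import Counter

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: find non-negative W, H with V ~ W H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def weight_row(tokens, vocab, coll_prob):
    # Hypothetical discrimination weight (an assumption, not from the paper):
    # p(t|row) * log(1 + p(t|row) / p(t|C)); zero for terms absent from the row.
    tf, n = Counter(tokens), len(tokens)
    return [(tf[t] / n) * math.log(1 + (tf[t] / n) / coll_prob[t]) for t in vocab]

def rfmf_sketch(query, feedback_docs, collection, rank=2):
    vocab = sorted(set(query).union(*feedback_docs))
    coll_tf = Counter(t for doc in collection for t in doc)
    total = sum(coll_tf.values())
    # Add-one smoothing keeps every collection probability nonzero.
    coll_prob = {t: (coll_tf[t] + 1) / (total + len(vocab)) for t in vocab}
    # Step 1: row 0 is the query, the remaining rows are the feedback documents.
    v = [weight_row(row, vocab, coll_prob) for row in [query] + feedback_docs]
    # Step 2: re-estimate the matrix as W H.
    w, h = nmf(v, rank)
    # Step 3: the re-estimated query row, normalized, is the updated query model.
    query_row = matmul([w[0]], h)[0]
    z = sum(query_row)
    return {t: x / z for t, x in zip(vocab, query_row)}

query = ["matrix", "factorization"]
docs = [
    "matrix factorization predicts missing ratings in recommender systems".split(),
    "nonnegative matrix factorization learns latent factors".split(),
    "query expansion adds feedback terms to the query".split(),
]
collection = docs + [
    "retrieval systems rank documents for a query".split(),
    "the weather is mild and the sky is clear".split(),
]
model = rfmf_sketch(query, docs, collection)
for term, weight in sorted(model.items(), key=lambda kv: -kv[1])[:5]:
    print(term, round(weight, 3))
```

The query row starts with nonzero weights only for the two query terms; after factorization the reconstructed row assigns weights to every vocabulary term, which is the "recommendation" of expansion terms. A practical system would presumably also interpolate this model with the original query and keep only the top-weighted terms, but those choices are outside what the abstract states.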


Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. language model
  2. matrix factorization
  3. pseudo-relevance feedback
  4. query expansion
  5. term recommendation

Qualifiers

  • Research-article

Conference

CIKM'16
CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
