Pseudo-Relevance Feedback Based on Matrix Factorization

Authors:

Javid Dadashkarimi,

Azadeh Shakery,

W. Bruce Croft

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Pages 1483 - 1492

https://doi.org/10.1145/2983323.2983844

Published: 24 October 2016

Abstract

In information retrieval, pseudo-relevance feedback (PRF) refers to a strategy for updating the query model using the top retrieved documents. PRF has proven highly effective in improving retrieval performance. In this paper, we treat the PRF task as a recommendation problem: the goal is to recommend a number of terms for a given query along with weights, such that the final weights of terms in the updated query model better reflect the terms' contributions to the query. To do so, we propose RFMF, a PRF framework based on matrix factorization, which is a state-of-the-art technique in collaborative recommender systems. Our purpose is to predict the weights of terms that have not appeared in the query, and we use matrix factorization techniques to predict them. In RFMF, we first create a matrix whose elements are computed using a weight function that shows how much a term discriminates the query or the top retrieved documents from the collection. Then, we re-estimate the created matrix using a matrix factorization technique. Finally, the query model is updated using the re-estimated matrix. RFMF is a general framework that can be employed with any retrieval model. In this paper, we implement this framework for two widely used document retrieval frameworks: language modeling and the vector space model. Extensive experiments over several TREC collections demonstrate that the RFMF framework significantly outperforms competitive baselines. These results indicate the potential of using other recommendation techniques in this task.
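
The three steps described in the abstract (build a weight matrix, re-estimate it by factorization, read off the updated query model) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the weight function, so the matrix entries below are made-up placeholder weights, and the solver (Lee and Seung's multiplicative updates for non-negative matrix factorization), the rank, the function names `nmf` and `reestimate_query`, and the final normalization are all assumptions.

```python
import numpy as np


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (m x n) as W @ H.

    Uses multiplicative updates for squared error. The choice of solver
    is an assumption; the abstract only says "matrix factorization".
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reestimate_query(V, rank=1):
    """Return a normalized query model read from the re-estimated matrix.

    Row 0 of V is the query; the remaining rows are feedback documents.
    """
    W, H = nmf(V, rank)
    q_hat = (W @ H)[0]          # re-estimated query row
    return q_hat / q_hat.sum()  # normalize to a distribution over terms


# Hypothetical vocabulary and placeholder weights (not from the paper).
terms = ["matrix", "factorization", "recommender", "latent", "football"]
V = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0],  # query: only two terms observed
    [0.8, 0.9, 0.7, 0.5, 0.0],  # top retrieved document 1
    [0.9, 0.7, 0.6, 0.6, 0.0],  # top retrieved document 2
    [0.7, 0.8, 0.5, 0.4, 0.1],  # top retrieved document 3
])

q_model = reestimate_query(V, rank=1)
ranked_terms = [terms[i] for i in np.argsort(-q_model)]
```

In this toy setup the query row has zero weight for "recommender" and "latent", yet the low-rank reconstruction assigns them non-zero weight because they co-occur with the query terms in the feedback documents. That is the sense in which unseen term weights are "recommended"; a real system would replace the placeholder entries with the discriminative weight function the abstract mentions.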

Index Terms

Pseudo-Relevance Feedback Based on Matrix Factorization
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation
      2. Query representation
    2. Retrieval models and ranking

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

October 2016

2566 pages

ISBN:9781450340731

DOI:10.1145/2983323

General Chairs:
Snehasis Mukhopadhyay (Indiana University Purdue University Indianapolis, USA)
ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)

Program Chairs:
Elisa Bertino (Purdue University)
Fabio Crestani (University of Lugano)
Javed Mostafa (University of North Carolina)
Jie Tang (Tsinghua University)
Luo Si (Alibaba Group Inc & Purdue University)
Xiaofang Zhou (University of Queensland)
Yi Chang (Yahoo Research)
Yunyao Li (IBM Research - Almaden)
Parikshit Sondhi (WalmartLabs)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM'16

CIKM'16: ACM Conference on Information and Knowledge Management

October 24 - 28, 2016

Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 42
Total Downloads: 434
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Feb 2025

Cited By

Khan T, Rashid U, Khan A (2024). End-to-end pseudo relevance feedback based vertical web search queries recommendation. Multimedia Tools and Applications. Online publication date: 21-Feb-2024.
https://doi.org/10.1007/s11042-024-18559-4
Liu W, Zhou Y, Zhu Y, Dou Z (2024). How to personalize and whether to personalize? Candidate documents decide. Knowledge and Information Systems, 66:9 (5581-5604). Online publication date: 27-May-2024.
https://doi.org/10.1007/s10115-024-02138-y
Datta S, Ganguly D, MacAvaney S, Greene D (2024). A Deep Learning Approach for Selective Relevance Feedback. Advances in Information Retrieval (189-204). Online publication date: 16-Mar-2024.
https://doi.org/10.1007/978-3-031-56060-6_13
Li H, Mourad A, Zhuang S, Koopman B, Zuccon G (2023). Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Transactions on Information Systems, 41:3 (1-40). Online publication date: 10-Apr-2023.
https://dl.acm.org/doi/10.1145/3570724
Wang C, Wang P, Qin T, Wang C, Kumar S, Guan X, Liu J, Chang K (2023). SocialSift: Target Query Discovery on Online Social Media With Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems, 34:9 (5654-5668). Online publication date: Sep-2023.
https://doi.org/10.1109/TNNLS.2021.3130587
Hambarde K, Proença H (2023). Information Retrieval: Recent Advances and Beyond. IEEE Access, 11 (76581-76604). Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3295776
Datta S, Ganguly D, Mitra M, Greene D (2022). A Relative Information Gain-based Query Performance Prediction Framework with Generated Query Variants. ACM Transactions on Information Systems, 41:2 (1-31). Online publication date: 21-Dec-2022.
https://dl.acm.org/doi/10.1145/3545112
Li X, Mao J, Ma W, Wu Z, Liu Y, Zhang M, Ma S, Wang Z, He X, Selcuk Candan K, Liu H, Akoglu L, Luna Dong X, Tang J (2022). A Cooperative Neural Information Retrieval Pipeline with Knowledge Enhanced Automatic Query Reformulation. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (553-561). Online publication date: 11-Feb-2022.
https://dl.acm.org/doi/10.1145/3488560.3498516
Feng J, Zhao R, Jiang J (2022). A Large Scale Document-Term Matching Method Based on Information Retrieval. 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (323-330). Online publication date: Dec-2022.
https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00048
Li H, Zhuang S, Mourad A, Ma X, Lin J, Zuccon G (2022). Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study. Advances in Information Retrieval (599-612). Online publication date: 5-Apr-2022.
https://doi.org/10.1007/978-3-030-99736-6_40