short-paper

Expansion via Prediction of Importance with Contextualization

Authors:

Sean MacAvaney,

Franco Maria Nardini,

Raffaele Perego,

Nicola Tonellotto,

Nazli Goharian,

Ophir Frieder

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1573 - 1576

https://doi.org/10.1145/3397271.3401262

Published: 25 July 2020

Abstract

The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
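As a rough illustration of the ideas in the abstract, the sketch below treats query and passage representations as sparse weight vectors over lexicon terms, scores a pair with a dot product, prunes a passage vector to its highest-weighted terms, and computes MRR@10. The abstract does not give the scoring function or the pruning rule, so the dot product, the top-k pruning, and all names here (`score`, `prune`, `mrr_at_10`, `SparseVec`) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set

# Assumption: a representation maps lexicon terms to importance weights.
SparseVec = Dict[str, float]


def score(query: SparseVec, passage: SparseVec) -> float:
    """Dot product over the lexicon terms shared by both representations."""
    return sum(w * passage[t] for t, w in query.items() if t in passage)


def prune(passage: SparseVec, keep: int) -> SparseVec:
    """Keep only the `keep` highest-weighted terms (hypothetical pruning rule)."""
    top = sorted(passage.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    return dict(top)


def mrr_at_10(rankings: List[List[str]], relevant: List[Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant passage within the top 10."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, pid in enumerate(ranked[:10], start=1):
            if pid in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data: "airfare" stands in for an expansion term absent from the passage text.
query = {"cheap": 0.5, "flights": 2.0}
passage = {"flights": 1.5, "airfare": 1.0, "cheap": 0.5, "the": 0.25}

full_score = score(query, passage)              # uses all four passage terms
pruned_score = score(query, prune(passage, 2))  # uses the two heaviest terms
```

Because the passage vector depends only on the passage, it can be built once at index time; only the query vector and the dot product are computed per query. Pruning shortens the stored vectors at the cost of dropping low-weight terms.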

Cited By

Yates A, Lassance C, MacAvaney S, Nguyen T, Lei Y, Sakai T, Ishita E, Ohshima H, Hasibi F, Mao J, Jose J (2024) Neural Lexical Search with Learned Sparse Retrieval. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (303-306). DOI: 10.1145/3673791.3698441. Online publication date: 8-Dec-2024
https://dl.acm.org/doi/10.1145/3673791.3698441
Wang J, Huang J, Tu X, Wang J, Huang A, Laskar M, Bhuiyan A (2024) Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges. ACM Computing Surveys 56:7 (1-33). DOI: 10.1145/3648471. Online publication date: 14-Feb-2024
https://dl.acm.org/doi/10.1145/3648471
Rau D, Dehghani M, Kamps J (2024) Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers. ACM Transactions on Information Systems 42:5 (1-27). DOI: 10.1145/3640460. Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3640460

Index Terms

Expansion via Prediction of Importance with Contextualization
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2020

2548 pages

ISBN:9781450380164

DOI:10.1145/3397271

General Chairs: Jimmy Huang (York University, Canada), Yi Chang (Jilin University, China), Xueqi Cheng (Chinese Academy of Sciences, China)

Program Chairs: Jaap Kamps (University of Amsterdam, Netherlands), Vanessa Murdock (Amazon, U.S.A.), Ji-Rong Wen (Renmin University of China, China), Yiqun Liu (Tsinghua University, China)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Italian Ministry of Education and Research (MIUR)
EU Horizon 2020 research and innovation programme

Conference

SIGIR '20

Sponsor: SIGIR

SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval

July 25 - 30, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

Total Citations: 55
Total Downloads: 361
Downloads (Last 12 months): 48
Downloads (Last 6 weeks): 5

Reflects downloads up to 20 Jan 2025
