
Effective and practical neural ranking

Published: 16 July 2021

Abstract

Supervised machine learning methods that use neural networks ("deep learning") have yielded substantial improvements to a multitude of Natural Language Processing (NLP) tasks in the past decade. Improvements to Information Retrieval (IR) tasks, such as ad-hoc search, lagged behind those in similar NLP tasks, despite considerable community efforts. Although there are several contributing factors, I argue in this dissertation that early attempts were not more successful because they did not properly consider the unique characteristics of IR tasks when designing and training ranking models. I first demonstrate this by showing how large-scale datasets containing weak relevance labels can successfully replace training on in-domain collections. This technique improves the variety of queries encountered when training and helps mitigate concerns of over-fitting particular test collections. I then show that dataset statistics available in specific IR tasks can be easily incorporated into neural ranking models alongside the textual features, resulting in more effective ranking models. I also demonstrate that contextualized representations, particularly those from transformer-based language models, considerably improve neural ad-hoc ranking performance. I find that this approach is neither limited to the task of ad-hoc ranking (as demonstrated by ranking clinical reports) nor English content (as shown by training effective cross-lingual neural rankers). These efforts demonstrate that neural approaches can be effective for ranking tasks. However, I observe that these techniques are impractical due to their high query-time computational costs. To overcome this, I study approaches for offloading computational cost to index-time, substantially reducing query-time latency. These techniques make neural methods practical for ranking tasks. Finally, I take a deep dive into better understanding the linguistic biases of the methods I propose compared to contemporary and traditional approaches. The findings from this analysis highlight potential pitfalls of recent methods and provide a way to measure progress in this area going forward.
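To make the transformer-based re-ranking idea described above concrete, the sketch below scores each candidate document jointly with the query using a cross-encoder and sorts the candidates by score. It is a minimal illustration under assumed tooling (the HuggingFace transformers library and the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint), not the specific models developed in the dissertation.

    # Illustrative re-ranking sketch; the checkpoint name is an assumed public
    # cross-encoder, not one of the dissertation's models.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()

    def rerank(query, documents):
        """Jointly encode each (query, document) pair and sort documents by relevance score."""
        inputs = tokenizer([query] * len(documents), documents,
                           padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair
        return sorted(zip(documents, scores.tolist()), key=lambda pair: pair[1], reverse=True)

    # Example usage: re-rank candidates returned by a first-stage (e.g., BM25) retriever.
    candidates = [
        "Neural ranking models estimate query-document relevance with deep networks.",
        "The museum opens at nine and closes at five on weekdays.",
    ]
    for doc, score in rerank("what is neural ranking", candidates):
        print(f"{score:.3f}  {doc}")

Because every query-document pair requires a full transformer forward pass at query time, this setup is effective but costly; that latency is exactly what the index-time precomputation work described in the abstract aims to reduce.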

    Information & Contributors

    Information

    Published In

    ACM SIGIR Forum, Volume 55, Issue 1
    June 2021
    157 pages
    ISSN: 0163-5840
    DOI: 10.1145/3476415
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 July 2021
    Published in SIGIR Volume 55, Issue 1

    Qualifiers

    • Research-article

    Contributors

    Sean MacAvaney

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025

    Citations

    Cited By

    • (2024) Centralized and Labeled Academic Journal Library Using Machine Learning & Deep Learning Approaches. 2024 International Conference on Image Processing and Robotics (ICIPRoB), pages 1-6. DOI: 10.1109/ICIPRoB62548.2024.10544300. Online publication date: 9-Mar-2024.
