
Effective and practical neural ranking

Published: 16 July 2021

Abstract

Supervised machine learning methods that use neural networks ("deep learning") have yielded substantial improvements to a multitude of Natural Language Processing (NLP) tasks in the past decade. Improvements to Information Retrieval (IR) tasks, such as ad-hoc search, lagged behind those in similar NLP tasks, despite considerable community efforts. Although there are several contributing factors, I argue in this dissertation that early attempts were not more successful because they did not properly consider the unique characteristics of IR tasks when designing and training ranking models. I first demonstrate this by showing how large-scale datasets containing weak relevance labels can successfully replace training on in-domain collections. This technique improves the variety of queries encountered when training and helps mitigate concerns of over-fitting particular test collections. I then show that dataset statistics available in specific IR tasks can be easily incorporated into neural ranking models alongside the textual features, resulting in more effective ranking models. I also demonstrate that contextualized representations, particularly those from transformer-based language models, considerably improve neural ad-hoc ranking performance. I find that this approach is neither limited to the task of ad-hoc ranking (as demonstrated by ranking clinical reports) nor English content (as shown by training effective cross-lingual neural rankers). These efforts demonstrate that neural approaches can be effective for ranking tasks. However, I observe that these techniques are impractical due to their high query-time computational costs. To overcome this, I study approaches for offloading computational cost to index-time, substantially reducing query-time latency. These techniques make neural methods practical for ranking tasks. Finally, I take a deep dive into better understanding the linguistic biases of the methods I propose compared to contemporary and traditional approaches. The findings from this analysis highlight potential pitfalls of recent methods and provide a way to measure progress in this area going forward.
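To make the transformer-based re-ranking idea described above concrete, the sketch below scores each candidate document jointly with the query using a cross-encoder and sorts the candidates by score. It is a minimal illustration under assumed tooling (the HuggingFace transformers library and the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint), not the specific models developed in the dissertation.

    # Illustrative re-ranking sketch; the checkpoint name is an assumed public
    # cross-encoder, not one of the dissertation's models.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()

    def rerank(query, documents):
        """Jointly encode each (query, document) pair and sort documents by relevance score."""
        inputs = tokenizer([query] * len(documents), documents,
                           padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair
        return sorted(zip(documents, scores.tolist()), key=lambda pair: pair[1], reverse=True)

    # Example usage: re-rank candidates returned by a first-stage (e.g., BM25) retriever.
    candidates = [
        "Neural ranking models estimate query-document relevance with deep networks.",
        "The museum opens at nine and closes at five on weekdays.",
    ]
    for doc, score in rerank("what is neural ranking", candidates):
        print(f"{score:.3f}  {doc}")

Because every query-document pair requires a full transformer forward pass at query time, this setup is effective but costly; that latency is exactly what the index-time precomputation work described in the abstract aims to reduce.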

    Information & Contributors

    Information

    Published In

    ACM SIGIR Forum, Volume 55, Issue 1
    June 2021
    157 pages
    ISSN: 0163-5840
    DOI: 10.1145/3476415
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 July 2021
    Published in SIGIR Volume 55, Issue 1

    Qualifiers

    • Research-article

    Contributors

    Sean MacAvaney

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025

    Citations

    Cited By

    • (2024) Centralized and Labeled Academic Journal Library Using Machine Learning & Deep Learning Approaches. 2024 International Conference on Image Processing and Robotics (ICIPRoB), pages 1-6. DOI: 10.1109/ICIPRoB62548.2024.10544300. Online publication date: 9-Mar-2024.
