DOI: 10.1145/3404835.3463254 · ACM Conference Proceedings · Short paper

Simplified Data Wrangling with ir_datasets

Published: 11 July 2021 Publication History

Abstract

Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet, and once one obtains a copy of the data, there are numerous data formats to work with. Even basic formats can have subtle dataset-specific nuances that must be considered for proper use. To help mitigate these challenges, we introduce a robust and lightweight tool, ir_datasets, for acquiring, managing, and performing typical operations over datasets used in IR. We primarily focus on textual datasets used for ad-hoc search. The tool provides both a Python and a command line interface to numerous IR datasets and benchmarks; to our knowledge, it is the most extensive tool of its kind. Integrations with popular IR indexing and experimentation toolkits demonstrate the tool's utility. We also provide documentation of these datasets through the ir_datasets catalog: https://ir-datasets.com/. The catalog acts as a hub for information on datasets used in IR, providing core information about what data each benchmark provides as well as links to more detailed information. We welcome community contributions and intend to continue to maintain and grow this tool.


    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN:9781450380379
    DOI:10.1145/3404835
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. benchmarks
    2. datasets
    3. information retrieval


    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

