DOI: 10.1145/3404835.3463244

On the Quality of the TREC-COVID IR Test Collections

Published: 11 July 2021

Abstract

Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was created to build such a collection using the TREC framework. This paper examines the quality of the resulting TREC-COVID test collections and, in doing so, offers a critique of the state of the art in building reusable IR test collections. The largest of the collections, called 'TREC-COVID Complete', is found to be on par with previous TREC ad hoc collections, with existing quality tests uncovering no apparent problems. Yet the lack of any way to definitively demonstrate the collection's quality, and its violation of previously used quality heuristics, suggest much work remains to be done to understand the factors affecting collection quality.
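
The abstract does not say which quality tests were applied, but a standard reusability check in the TREC literature is the leave-one-out experiment: every system is re-scored against qrels from which one group's uniquely contributed relevant documents have been removed, and the resulting system ranking is compared to the original ranking with Kendall's tau. The sketch below illustrates only the ranking-comparison step; the system names and MAP scores are invented for illustration, and in practice the scores would come from evaluating runs (e.g., with trec_eval) against the full and reduced qrels.

    # Leave-one-out reusability check: compare the system ranking under the
    # full qrels with the ranking under qrels that omit one group's uniquely
    # found relevant documents. All scores below are hypothetical.
    from scipy.stats import kendalltau

    full_qrels_map = {"sysA": 0.41, "sysB": 0.38, "sysC": 0.35, "sysD": 0.30, "sysE": 0.22}
    leave_out_map  = {"sysA": 0.40, "sysB": 0.39, "sysC": 0.34, "sysD": 0.29, "sysE": 0.21}

    systems = sorted(full_qrels_map)  # fix one common ordering of systems
    tau, p = kendalltau([full_qrels_map[s] for s in systems],
                        [leave_out_map[s] for s in systems])
    print(f"Kendall's tau between the two system rankings: {tau:.3f}")

Tau values of roughly 0.9 or above are conventionally read as effectively equivalent rankings, i.e., as evidence that the collection does not unduly penalize systems whose results never contributed to the judgment pool.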

Supplementary Material

MP4 File (SIGIR21-fp1400.mp4)
Presentation for the paper "On the Quality of the TREC-COVID IR Test Collections".

    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN:9781450380379
    DOI:10.1145/3404835
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. TREC
    2. datasets
    3. test collections

    Qualifiers

• Short paper

    Conference

    SIGIR '21

    Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
