Abstract
Search corpora are growing ever larger: over the last ten years, the IR research community has moved from the several hundred thousand documents on the TREC disks, to the tens of millions of U.S. government web pages of GOV2, to the one billion general-interest web pages in the new ClueWeb09 collection. But traditional means of acquiring relevance judgments and evaluating systems (e.g., pooling documents and computing average precision) do not seem to scale well to these new, larger collections. They require substantially more human assessment effort to achieve the same reliability of evaluation; if that additional cost exceeds the assessment budget, errors in evaluation are inevitable.
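To make the cost argument concrete, the sketch below is a minimal illustration (not material from the tutorial itself; the function names `build_pool` and `average_precision` and the data layout are assumptions of this example) of depth-k pooling over several ranked runs and of the average precision computation that relies on the resulting judgments. The size of the pool, and hence the judging budget, grows with the pool depth and with the number of sufficiently different participating systems.

```python
# Illustrative sketch of depth-k pooling and average precision (AP).
# Names (build_pool, average_precision, runs) are made up for this example.

def build_pool(runs, k=100):
    """Union of the top-k documents from each system's ranked list for one query."""
    pool = set()
    for ranking in runs:          # each ranking is a list of doc ids, best first
        pool.update(ranking[:k])  # every pooled document must be judged by a human
    return pool

def average_precision(ranking, relevant):
    """AP of one ranked list, given the set of documents judged relevant."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# The judging budget is roughly k times the number of distinct runs:
# len(build_pool(runs, k)) documents must be assessed per query.
```

Any document outside the pool is treated as nonrelevant when computing AP, an assumption that becomes increasingly unsafe as the collection grows far beyond what the pool can cover.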
Some alternatives to pooling that support low-cost and reliable evaluation have recently been proposed. A number of them have already been used in TREC and other evaluation forums (the TREC Million Query, Legal, Chemical, Web, and Relevance Feedback Tracks, CLEF Patent IR, and INEX). Evaluation via implicit user feedback (e.g., clicks) and via crowdsourcing has also recently gained attention in the community. It is therefore important that these methodologies, the analyses they support, and their strengths and weaknesses be well understood by the IR community. Furthermore, these approaches can help small research groups begin investigating new tasks on new corpora at relatively low cost. Even groups that do not participate in TREC, CLEF, or other evaluation conferences can benefit from understanding how these methods work, how to use them, and how to interpret their results as they build test collections for the tasks they are interested in.
The goal of this tutorial is to provide attendees with a comprehensive overview of techniques for performing low-cost (in terms of judgment effort) evaluation. A number of topics will be covered, including alternatives to pooling, evaluation measures robust to incomplete judgments, evaluating with no relevance judgments, statistical inference of evaluation metrics, inference of relevance judgments, query selection, and techniques to test the reliability of the evaluation and the reusability of the constructed collections.
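As one hedged illustration of the "measures robust to incomplete judgments" topic, the sketch below computes bpref, a measure that scores a ranked list using only explicitly judged documents and simply skips unjudged ones. The function name and input format are hypothetical, chosen for this example rather than taken from the tutorial materials.

```python
# Illustrative sketch of bpref, which ignores unjudged documents.
# Function name and arguments are hypothetical.

def bpref(ranking, relevant, nonrelevant):
    """bpref = (1/R) * sum over retrieved judged-relevant docs r of
    1 - (number of judged-nonrelevant docs ranked above r, capped) / min(R, N),
    where R and N count the judged relevant and judged nonrelevant documents."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    bound = min(R, N)
    nonrel_above = 0   # judged-nonrelevant documents seen so far in the ranking
    score = 0.0
    for doc in ranking:
        if doc in relevant:
            penalty = (min(nonrel_above, bound) / bound) if bound > 0 else 0.0
            score += 1.0 - penalty
        elif doc in nonrelevant:
            nonrel_above += 1
        # documents without a judgment contribute nothing either way
    return score / R
```

Unlike AP, which silently counts every unjudged document as nonrelevant, a measure of this kind compares judged relevant documents only against judged nonrelevant ones, so it degrades more gracefully when the judgment set is incomplete.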
The tutorial should be of interest to a wide range of attendees. Those new to the field will come away with a solid understanding of how low-cost evaluation methods can be applied to construct inexpensive test collections and evaluate new IR technology, while those with intermediate knowledge will gain deeper insight into the risks and benefits of low-cost evaluation. Attendees should have a basic knowledge of the traditional (Cranfield) evaluation framework and of metrics such as average precision and nDCG, along with some basic knowledge of probability theory and statistics. More advanced concepts will be explained during the tutorial.