DOI: 10.1145/1099554.1099723

Incremental test collections

Published: 31 October 2005

Abstract

Corpora and topics are readily available for information retrieval research. Relevance judgments, which are necessary for system evaluation, are expensive; the cost of obtaining them prohibits in-house evaluation of retrieval systems on new corpora or new topics. We present an algorithm for cheaply constructing sets of relevance judgments. Our method intelligently selects documents to be judged and decides when to stop in such a way that with very little work there can be a high degree of confidence in the result of the evaluation. We demonstrate the algorithm's effectiveness by showing that it produces small sets of relevance judgments that reliably discriminate between two systems. The algorithm can be used to incrementally design retrieval systems by simultaneously comparing sets of systems. The number of additional judgments needed after each incremental design change decreases at a rate reciprocal to the number of systems being compared. To demonstrate the effectiveness of our method, we evaluate TREC ad hoc submissions, showing that with 95% fewer relevance judgments we can reach a Kendall's tau rank correlation of at least 0.9.
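
The abstract names two concrete technical pieces: an incremental judging loop that stops once the comparison between systems is decided, and Kendall's tau rank correlation for comparing system rankings. The Python sketch below illustrates both under stated assumptions. It is not the authors' algorithm (the abstract does not specify it): the comparison measure (precision at k), the pool construction, the in-order document selection, and all function names are illustrative assumptions.

```python
# Illustrative sketch only: compare two retrieval runs by precision@k
# (an assumed stand-in measure), judging documents one at a time and
# stopping as soon as the unjudged documents can no longer change the
# sign of the difference between the two systems.
from itertools import combinations


def diff_bounds(run_a, run_b, judgments, k=10):
    """Bounds on P@k(A) - P@k(B) over all assignments to unjudged docs."""
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    base = (sum(judgments.get(d, False) for d in top_a)
            - sum(judgments.get(d, False) for d in top_b)) / k
    # An unjudged doc in both top-k lists shifts both scores equally and
    # cancels; a doc unique to one list can move the difference by 1/k.
    only_a = sum(1 for d in top_a - top_b if d not in judgments)
    only_b = sum(1 for d in top_b - top_a if d not in judgments)
    return base - only_b / k, base + only_a / k


def compare_incrementally(run_a, run_b, judge, k=10):
    """Judge documents until the sign of the P@k difference is decided.

    `judge` is the assessor oracle: doc id -> True/False relevance.
    Returns the winner ("A", "B", or "tie") and the judgments made.
    """
    judgments = {}
    pool = list(dict.fromkeys(run_a[:k] + run_b[:k]))  # dedup, keep order
    for doc in pool:
        lo, hi = diff_bounds(run_a, run_b, judgments, k)
        if lo > 0 or hi < 0 or lo == hi:
            break  # the outcome can no longer change: stop judging
        judgments[doc] = judge(doc)
    lo, hi = diff_bounds(run_a, run_b, judgments, k)
    return ("A" if lo > 0 else "B" if hi < 0 else "tie"), judgments


def kendalls_tau(rank_x, rank_y):
    """Kendall's tau between two rankings of the same items (best first)."""
    pos_x = {s: i for i, s in enumerate(rank_x)}
    pos_y = {s: i for i, s in enumerate(rank_y)}
    n = len(pos_x)
    # Each pair of items is concordant (+1) or discordant (-1);
    # tau = (concordant - discordant) / (n choose 2).
    c_minus_d = sum(
        1 if (pos_x[a] - pos_x[b]) * (pos_y[a] - pos_y[b]) > 0 else -1
        for a, b in combinations(pos_x, 2))
    return 2 * c_minus_d / (n * (n - 1))
```

In this sketch the loop judges only documents that can still affect the outcome, which already needs far fewer judgments than exhaustively judging both top-k lists; per the abstract, the paper's method additionally selects documents intelligently rather than in list order, which is what drives savings like the reported 95% reduction at a Kendall's tau of at least 0.9.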




Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN: 1595931406
DOI: 10.1145/1099554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. algorithms
  2. evaluation
  3. information retrieval
  4. test collections

Qualifiers

  • Article

Conference

CIKM '05: Conference on Information and Knowledge Management
October 31 - November 5, 2005
Bremen, Germany

Acceptance Rates

CIKM '05 paper acceptance rate: 77 of 425 submissions (18%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

