DOI: 10.1145/1099554.1099723

Incremental test collections

Published: 31 October 2005

Abstract

Corpora and topics are readily available for information retrieval research. Relevance judgments, which are necessary for system evaluation, are expensive; the cost of obtaining them prohibits in-house evaluation of retrieval systems on new corpora or new topics. We present an algorithm for cheaply constructing sets of relevance judgments. Our method intelligently selects documents to be judged and decides when to stop in such a way that with very little work there can be a high degree of confidence in the result of the evaluation. We demonstrate the algorithm's effectiveness by showing that it produces small sets of relevance judgments that reliably discriminate between two systems. The algorithm can be used to incrementally design retrieval systems by simultaneously comparing sets of systems. The number of additional judgments needed after each incremental design change decreases at a rate reciprocal to the number of systems being compared. To demonstrate the effectiveness of our method, we evaluate TREC ad hoc submissions, showing that with 95% fewer relevance judgments we can reach a Kendall's tau rank correlation of at least 0.9.
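
The abstract names two concrete technical pieces: an incremental judging loop that stops once the comparison between systems is decided, and Kendall's tau rank correlation for comparing system rankings. The Python sketch below illustrates both under stated assumptions. It is not the authors' algorithm (the abstract does not specify it): the comparison measure (precision at k), the pool construction, the in-order document selection, and all function names are illustrative assumptions.

```python
# Illustrative sketch only: compare two retrieval runs by precision@k
# (an assumed stand-in measure), judging documents one at a time and
# stopping as soon as the unjudged documents can no longer change the
# sign of the difference between the two systems.
from itertools import combinations


def diff_bounds(run_a, run_b, judgments, k=10):
    """Bounds on P@k(A) - P@k(B) over all assignments to unjudged docs."""
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    base = (sum(judgments.get(d, False) for d in top_a)
            - sum(judgments.get(d, False) for d in top_b)) / k
    # An unjudged doc in both top-k lists shifts both scores equally and
    # cancels; a doc unique to one list can move the difference by 1/k.
    only_a = sum(1 for d in top_a - top_b if d not in judgments)
    only_b = sum(1 for d in top_b - top_a if d not in judgments)
    return base - only_b / k, base + only_a / k


def compare_incrementally(run_a, run_b, judge, k=10):
    """Judge documents until the sign of the P@k difference is decided.

    `judge` is the assessor oracle: doc id -> True/False relevance.
    Returns the winner ("A", "B", or "tie") and the judgments made.
    """
    judgments = {}
    pool = list(dict.fromkeys(run_a[:k] + run_b[:k]))  # dedup, keep order
    for doc in pool:
        lo, hi = diff_bounds(run_a, run_b, judgments, k)
        if lo > 0 or hi < 0 or lo == hi:
            break  # the outcome can no longer change: stop judging
        judgments[doc] = judge(doc)
    lo, hi = diff_bounds(run_a, run_b, judgments, k)
    return ("A" if lo > 0 else "B" if hi < 0 else "tie"), judgments


def kendalls_tau(rank_x, rank_y):
    """Kendall's tau between two rankings of the same items (best first)."""
    pos_x = {s: i for i, s in enumerate(rank_x)}
    pos_y = {s: i for i, s in enumerate(rank_y)}
    n = len(pos_x)
    # Each pair of items is concordant (+1) or discordant (-1);
    # tau = (concordant - discordant) / (n choose 2).
    c_minus_d = sum(
        1 if (pos_x[a] - pos_x[b]) * (pos_y[a] - pos_y[b]) > 0 else -1
        for a, b in combinations(pos_x, 2))
    return 2 * c_minus_d / (n * (n - 1))
```

In this sketch the loop judges only documents that can still affect the outcome, which already needs far fewer judgments than exhaustively judging both top-k lists; per the abstract, the paper's method additionally selects documents intelligently rather than in list order, which is what drives savings like the reported 95% reduction at a Kendall's tau of at least 0.9.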




Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN: 1595931406
DOI: 10.1145/1099554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. algorithms
  2. evaluation
  3. information retrieval
  4. test collections

Qualifiers

  • Article

Conference

CIKM '05: Conference on Information and Knowledge Management
October 31 - November 5, 2005
Bremen, Germany

Acceptance Rates

CIKM '05 paper acceptance rate: 77 of 425 submissions (18%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

