DOI: 10.1145/1458082.1458092
Research article

How does clickthrough data reflect retrieval quality?

Published: 26 October 2008

Abstract

Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user-centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.
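The paired comparison tests described in the abstract analyze clicks on an interleaved presentation of two rankings. As a rough illustration of the general idea, and not necessarily either of the two specific tests studied in the paper, the sketch below implements team-draft interleaving, one standard scheme from this line of work: the two rankings alternately contribute their next unseen result, each displayed result records which ranking ("team") supplied it, and clicks are credited to that team to decide a per-impression preference. All identifiers here (team_draft_interleave, infer_preference, the example result IDs) are illustrative assumptions, not code from the paper.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10, rng=random):
    """Team-draft interleaving of two rankings.

    Returns the combined result list and a parallel list of team labels
    ('A' or 'B') recording which input ranking contributed each result.
    """
    a, b = list(ranking_a), list(ranking_b)   # work on copies
    combined, teams, seen = [], [], set()
    while len(combined) < length and (a or b):
        # Each round both teams pick once; a coin flip decides who goes first.
        for team in (['A', 'B'] if rng.random() < 0.5 else ['B', 'A']):
            source = a if team == 'A' else b
            # Skip documents already placed by the other team.
            while source and source[0] in seen:
                source.pop(0)
            if source and len(combined) < length:
                doc = source.pop(0)
                seen.add(doc)
                combined.append(doc)
                teams.append(team)
    return combined, teams

def infer_preference(teams, clicked_positions):
    """Credit each click to the team whose ranking supplied that result."""
    credit = {'A': 0, 'B': 0}
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    if credit['A'] > credit['B']:
        return 'A'
    if credit['B'] > credit['A']:
        return 'B'
    return 'tie'

if __name__ == '__main__':
    ranking_a = ['a1', 'shared', 'a2', 'a3']   # hypothetical result IDs
    ranking_b = ['b1', 'shared', 'b2', 'b3']
    combined, teams = team_draft_interleave(ranking_a, ranking_b,
                                            length=6, rng=random.Random(0))
    print(list(zip(combined, teams)))
    # Suppose the user clicked the results at positions 0 and 2.
    print(infer_preference(teams, clicked_positions=[0, 2]))
```

Aggregating the per-impression preferences over many queries, and testing whether one ranking wins significantly more often than the other, is what makes this a paired comparison design rather than an absolute usage metric.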


    Published In

CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clickthrough data
    2. expert judgments
    3. implicit feedback
    4. retrieval evaluation

    Conference

CIKM '08: Conference on Information and Knowledge Management
October 26-30, 2008
Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
