DOI: 10.1145/2505515.2505698
research-article
Open access

Evaluating aggregated search using interleaving

Published: 27 October 2013

Abstract

A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a system is called an aggregated or federated search system.
Because search engines evolve over time, their results need to be constantly evaluated. However, one of the most efficient and widely used evaluation methods, interleaving, cannot be directly applied to aggregated search systems, as it ignores the need to group results originating from the same source (vertical results).
We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents. We compare our algorithm to existing interleaving algorithms and other evaluation methods (such as A/B-testing), both on real-life click log data and in simulation experiments. We find that our algorithm allows us to perform unbiased and accurate interleaved comparisons that are comparable to conventional evaluation techniques. We also show that our interleaving algorithm produces a ranking that does not substantially alter the user experience, while being sensitive to changes in both the vertical result block and the non-vertical document rankings. All this makes our proposed interleaving algorithm an essential tool for comparing IR systems with complex aggregated pages.
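For readers unfamiliar with interleaving, the following is a minimal sketch of conventional team-draft interleaving, a standard method from the literature and not the vertical-aware algorithm this paper proposes. The function names (`team_draft_interleave`, `credit_clicks`, `is_contiguous`) and the document identifiers are illustrative assumptions. The sketch shows why the conventional method is a poor fit for aggregated pages: nothing in it keeps documents from the same vertical adjacent, so a grouped block can end up split.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Conventional team-draft interleaving (illustrative sketch).

    Returns the interleaved page and a dict mapping each shown
    document to the team ("A" or "B") that contributed it.
    """
    rng = rng or random.Random()
    page, teams = [], {}
    count_a = count_b = 0
    while len(page) < length:
        # Documents each ranker could still contribute.
        rest_a = [d for d in ranking_a if d not in teams]
        rest_b = [d for d in ranking_b if d not in teams]
        if not rest_a and not rest_b:
            break
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        # If the chosen team has nothing left, the other team picks instead.
        if a_turn and not rest_a:
            a_turn = False
        elif not a_turn and not rest_b:
            a_turn = True
        if a_turn:
            doc = rest_a[0]
            teams[doc] = "A"
            count_a += 1
        else:
            doc = rest_b[0]
            teams[doc] = "B"
            count_b += 1
        page.append(doc)
    return page, teams


def credit_clicks(teams, clicked):
    """Infer a preference: the team whose documents got more clicks wins."""
    a = sum(1 for d in clicked if teams.get(d) == "A")
    b = sum(1 for d in clicked if teams.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"


def is_contiguous(page, block):
    """True if all documents of `block` on the page sit next to each other."""
    positions = [i for i, d in enumerate(page) if d in block]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


if __name__ == "__main__":
    # Hypothetical rankings: "n*" are News vertical results, "w*" are Web results.
    a = ["w1", "n1", "n2", "w2"]
    b = ["n1", "w3", "w4", "n2"]
    page, teams = team_draft_interleave(a, b, 4, random.Random(7))
    print(page, teams)
    print("news block kept together:", is_contiguous(page, {"n1", "n2"}))
```

Running the demo with different seeds shows that the News documents are sometimes separated by Web results, which is the grouping problem the abstract refers to.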

    Published In

    CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
    October 2013
    2612 pages
    ISBN:9781450322638
    DOI:10.1145/2505515
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. a/b-testing
    2. evaluation
    3. implicit feedback
    4. vertical search

    Qualifiers

    • Research-article

    Conference

    CIKM'13
    CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
    October 27 - November 1, 2013
    San Francisco, California, USA

    Acceptance Rates

    CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 112
    • Downloads (last 6 weeks): 12
    Reflects downloads up to 01 Jan 2025

    Citations

    Cited By

    • (2024) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality 16:1 (1-33). DOI: 10.1145/3623640. Online publication date: 6-Mar-2024
    • (2022) Ranking Task in RAS: A Comparative Study of Learning to Rank Algorithms and Interleaving Methods. Digital Technologies and Applications (158-168). DOI: 10.1007/978-3-031-01942-5_16. Online publication date: 8-May-2022
    • (2022) Click Models for Web Search. Online publication date: 10-Mar-2022
    • (2021) De-Biased Modeling of Search Click Behavior with Reinforcement Learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (1637-1641). DOI: 10.1145/3404835.3463228. Online publication date: 11-Jul-2021
    • (2020) Studies on Search: Designing Meaningful IIR Studies on Commercial Search Engines. Datenbank-Spektrum 20:1 (5-15). DOI: 10.1007/s13222-020-00331-1. Online publication date: 24-Jan-2020
    • (2019) Aggregating E-commerce Search Results from Heterogeneous Sources via Hierarchical Reinforcement Learning. The World Wide Web Conference (1771-1781). DOI: 10.1145/3308558.3313455. Online publication date: 13-May-2019
    • (2017) Aggregated Search. Foundations and Trends in Information Retrieval 10:5 (365-502). DOI: 10.1561/1500000052. Online publication date: 6-Mar-2017
    • (2017) Adaptive Persistence for Search Effectiveness Measures. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (747-756). DOI: 10.1145/3132847.3133033. Online publication date: 6-Nov-2017
    • (2017) Evaluating and Analyzing Click Simulation in Web Search. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (281-284). DOI: 10.1145/3121050.3121096. Online publication date: 1-Oct-2017
    • (2017) Evaluation of Contextualization and Diversification Approaches in Aggregated Search. 2017 28th International Workshop on Database and Expert Systems Applications (DEXA) (103-107). DOI: 10.1109/DEXA.2017.37. Online publication date: Aug-2017
