DOI: 10.1145/2911451.2911494

Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams

Published: 07 July 2016

Abstract

We propose and validate a novel interleaved evaluation methodology for two complementary information seeking tasks on document streams: retrospective summarization and prospective notification. In the first, the user desires relevant and non-redundant documents that capture important aspects of an information need. In the second, the user wishes to receive timely, relevant, and non-redundant update notifications for a standing information need. Despite superficial similarities, interleaved evaluation methods for web ranking cannot be directly applied to these tasks; for example, existing techniques do not account for temporality or redundancy. Our proposed evaluation methodology consists of two components: a temporal interleaving strategy and a heuristic for credit assignment to handle redundancy. By simulating user interactions with interleaved results on submitted runs to the TREC 2014 tweet timeline generation (TTG) task and the TREC 2015 real-time filtering task, we demonstrate that our methodology yields system comparisons that accurately match the result of batch evaluations. Analysis further reveals weaknesses in current batch evaluation methodologies to suggest future directions for research.
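
The abstract names two components: a temporal interleaving strategy and a heuristic for redundancy-aware credit assignment. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It merges two chronologically ordered runs into a single stream tagged by contributing system, then credits a simulated click to that system only when the clicked document covers a semantic cluster the user has not already seen. The cluster map, the tie-breaking rule for equal timestamps, and the click simulation are all illustrative assumptions.

from collections import namedtuple

Item = namedtuple("Item", ["timestamp", "doc_id", "system"])


def temporal_interleave(run_a, run_b):
    """Merge two time-ordered runs [(timestamp, doc_id), ...] into one stream.

    Each item is tagged with the system ("A" or "B") that contributed it, and
    the merged stream preserves timestamp order; how ties between systems are
    broken is a simplifying assumption here (stable sort, system A first).
    """
    tagged = [Item(t, d, "A") for t, d in run_a] + \
             [Item(t, d, "B") for t, d in run_b]
    return sorted(tagged, key=lambda item: item.timestamp)


def assign_credit(interleaved, clicks, doc_to_cluster):
    """Credit each simulated click to its contributing system unless redundant.

    clicks: set of doc_ids the simulated user clicked.
    doc_to_cluster: maps doc_ids to semantic-cluster ids; a click on a cluster
    the user has already consumed earns no credit (a stand-in for the paper's
    redundancy heuristic, not its exact rule).
    """
    credit = {"A": 0, "B": 0}
    seen_clusters = set()
    for item in interleaved:
        if item.doc_id not in clicks:
            continue
        cluster = doc_to_cluster.get(item.doc_id)
        if cluster in seen_clusters:
            continue  # redundant: this information was already delivered
        seen_clusters.add(cluster)
        credit[item.system] += 1
    return credit  # the system with more credited clicks is preferred


if __name__ == "__main__":
    run_a = [(1, "d1"), (5, "d3"), (9, "d4")]    # system A: (time, doc)
    run_b = [(2, "d2"), (5, "d3"), (10, "d5")]   # system B: (time, doc)
    clusters = {"d1": "c1", "d2": "c1", "d3": "c2", "d4": "c3", "d5": "c3"}
    stream = temporal_interleave(run_a, run_b)
    print(assign_credit(stream, {"d1", "d2", "d3", "d5"}, clusters))
    # -> {'A': 2, 'B': 1}: d2 and the duplicate d3 earn no credit (redundant)

Repeating such a simulation over many topics and simulated users, and checking whether the preferred system matches the batch-evaluation winner, is the kind of validation the paper reports against submitted runs from the TREC 2014 TTG task and the TREC 2015 real-time filtering task.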




      Published In

      SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
      July 2016
      1296 pages
      ISBN: 9781450340694
      DOI: 10.1145/2911451


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. TREC
      2. microblogs
      3. push notifications
      4. summarization
      5. tweets

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation
      • Natural Sciences and Engineering Research Council of Canada

      Conference

      SIGIR '16

      Acceptance Rates

      SIGIR '16: 62 of 341 submissions accepted, 18%
      Overall: 792 of 3,983 submissions accepted, 20%


      Cited By

      • Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality, 16(1):1-33, 2024. DOI: 10.1145/3623640
      • An Effective Hybrid Learning Model for Real-Time Event Summarization. IEEE Transactions on Neural Networks and Learning Systems, 32(10):4419-4431, 2021. DOI: 10.1109/TNNLS.2020.3017747
      • Update Delivery Mechanisms for Prospective Information Needs. SIGIR 2018, 785-794. DOI: 10.1145/3209978.3210018
      • A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries. CIKM 2017, 67-76. DOI: 10.1145/3132847.3133000
      • Event Detection on Curated Tweet Streams. SIGIR 2017, 1325-1328. DOI: 10.1145/3077136.3084141
      • Online In-Situ Interleaved Evaluation of Real-Time Push Notification Systems. SIGIR 2017, 415-424. DOI: 10.1145/3077136.3080808
      • On the Reusability of "Living Labs" Test Collections. SIGIR 2017, 793-796. DOI: 10.1145/3077136.3080644
      • Word Similarity Based Model for Tweet Stream Prospective Notification. Advances in Information Retrieval, 655-661, 2017. DOI: 10.1007/978-3-319-56608-5_62
      • A Platform for Streaming Push Notifications to Mobile Assessors. SIGIR 2016, 1077-1080. DOI: 10.1145/2911451.2911463
