DOI: 10.1145/3331184.3331266
Research article · Open access

Revisiting Online Personal Search Metrics with the User in Mind

Published: 18 July 2019

Abstract

Traditional online quality metrics are based on search and browsing signals, such as the position and time of a click. Such metrics typically model all users' behavior in exactly the same manner. Modeling individuals' behavior in Web search can be challenging because a user's historical behavior may not always be available (e.g., if the user is not signed into a given service). In personal search, however, individual users issue queries over their personal corpus (e.g., emails and files) while they are logged into the service. This offers an opportunity to calibrate online quality metrics with respect to an individual's search habits. With this goal in mind, this paper develops a user-centric evaluation framework for personal search that takes into account the variability of search and browsing behavior across individuals. The main idea is to calibrate each interaction of a user with respect to that user's historical behavior and search habits. To formalize this, online metrics are characterized according to the relevance signal of interest and how that signal contributes to the computation of a metric's gain. The proposed framework introduces a variant of online metrics, called pMetrics (short for personalized metrics), that is based on the average search habits of users for the relevance signal of interest. Through extensive online experiments on a large population of Gmail search users, we show that pMetrics are effective in terms of sensitivity, robustness, and stability compared to both their standard variants and baselines with different normalization factors.
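To make the calibration idea concrete, below is a minimal sketch in Python of how one online signal (time to click) could be normalized against a user's own history rather than a global threshold. The class name, the fallback gain of 1.0, and the ratio-of-averages form are illustrative assumptions for exposition only; the paper defines its pMetric gain functions formally, not through this code.

    from collections import defaultdict
    from statistics import mean

    class PersonalizedClickTimeMetric:
        """Scores each click relative to the user's own historical
        time-to-click, rather than against a single global threshold
        shared by all users (a pMetrics-style calibration sketch)."""

        def __init__(self):
            # user_id -> list of past time-to-click observations (seconds)
            self.history = defaultdict(list)

        def observe(self, user_id, time_to_click):
            """Record an interaction so future gains reflect updated habits."""
            self.history[user_id].append(time_to_click)

        def gain(self, user_id, time_to_click):
            """Gain for one interaction, calibrated by this user's habits."""
            past = self.history[user_id]
            if not past:
                return 1.0  # no history yet: fall back to an uncalibrated gain
            # A click faster than the user's own average earns a gain above 1;
            # a slower-than-usual click earns a gain below 1.
            return mean(past) / time_to_click

For example, if a user's past clicks took 12, 8, and 10 seconds (average 10 s), a 5-second click yields a gain of 2.0 and a 20-second click a gain of 0.5, so the same absolute click time is scored differently for fast and slow searchers.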



Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. information retrieval evaluation
  2. online experiments
  3. online quality metrics
  4. personal search


Conference

SIGIR '19

Acceptance Rates

SIGIR '19 paper acceptance rate: 84 of 426 submissions (20%)
Overall acceptance rate: 792 of 3,983 submissions (20%)
