DOI: 10.1145/2484028.2484041

User model-based metrics for offline query suggestion evaluation

Published: 28 July 2013

Abstract

Query suggestion and auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of a commonly accepted evaluation methodology and metrics means that results and approaches from the literature cannot be meaningfully compared. Moreover, the metrics used to evaluate query suggestions are often adapted from other domains without proper justification. Hence, it is not clear whether the improvements reported in the literature would translate into an actual improvement in the users' experience. Inspired by cascade user models and state-of-the-art evaluation metrics in the web search domain, we address query suggestion evaluation by first studying user behaviour in a search engine's query log and deriving from it a new family of user models describing how users interact with a query suggestion mechanism. Next, assuming a query log-based evaluation approach, we propose two new metrics for evaluating query suggestions, pSaved and eSaved, both parameterised by a user model. pSaved is the probability that the user employs a query suggestion while submitting a query. eSaved is the expected relative amount of effort (keypresses) a user avoids thanks to the deployed query suggestion mechanism. Finally, we experiment with both metrics under four user model instantiations, alongside metrics previously used in the literature, on a dataset of 6.1M sessions. Our results demonstrate that pSaved and eSaved align best with user satisfaction amongst the considered metrics.
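The abstract stops short of giving formulas, so the following Python sketch is only a plausible reading of the two metrics, not the paper's exact formulation. It replays a logged query one keypress at a time: whenever the eventually submitted query appears in the suggestion list, a user model supplies the probability that the user notices and accepts it at that rank. pSaved accumulates the probability that a suggestion is used at all; eSaved additionally weights each acceptance by the fraction of keypresses it saves. The names suggest and accept_prob are illustrative assumptions, as is the rank-discounted toy model in the demo.

    # Sketch only: replay a logged query through a suggestion mechanism
    # and score it under a user model. `suggest` and `accept_prob` are
    # hypothetical stand-ins, not the paper's API.
    def psaved_esaved(query, suggest, accept_prob):
        n = len(query)
        p_not_yet = 1.0  # probability no suggestion has been accepted so far
        p_saved = 0.0    # probability a suggestion is used at some point
        e_saved = 0.0    # expected fraction of keypresses saved
        for i in range(1, n + 1):                  # after the i-th keypress
            suggestions = suggest(query[:i])
            if query in suggestions:
                rank = suggestions.index(query)
                p_accept = accept_prob(rank)       # user model: P(accept | rank)
                p_here = p_not_yet * p_accept      # first accepted exactly here
                p_saved += p_here
                e_saved += p_here * (1.0 - i / n)  # remaining keypresses saved
                p_not_yet *= 1.0 - p_accept
        return p_saved, e_saved

    # Toy demo with a static candidate pool and a rank-discounted model.
    def toy_suggest(prefix):
        pool = ["user model", "user models", "user study"]
        return [q for q in pool if q.startswith(prefix)][:3]

    p, e = psaved_esaved("user model", toy_suggest,
                         accept_prob=lambda rank: 0.5 / (rank + 1))
    print(f"pSaved={p:.3f} eSaved={e:.3f}")

Under this reading, a mechanism that always ranks the target first after a single keypress and is always accepted would push eSaved toward 1 - 1/|q|, while a mechanism that never surfaces the target scores zero on both metrics.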





    Published In

    SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
    July 2013
    1188 pages
    ISBN:9781450320344
    DOI:10.1145/2484028


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 July 2013


    Author Tags

    1. evaluation measures
    2. query suggestions
    3. user models

    Qualifiers

    • Research-article

    Conference

    SIGIR '13

    Acceptance Rates

SIGIR '13 paper acceptance rate: 73 of 366 submissions, 20%
Overall acceptance rate: 792 of 3,983 submissions, 20%


