
Predicting Query Performance by Query-Drift Estimation

Published: 01 May 2012

Abstract

Predicting query performance, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. We present a novel approach to this task that is based on measuring the standard deviation of retrieval scores in the result list of the most highly ranked documents. We argue that for retrieval methods based on document-query surface-level similarities, this standard deviation can serve as a surrogate for estimating the presumed amount of query drift in the result list, that is, the presence (and dominance) in the list's documents of aspects or topics not related to the query. Empirical evaluation demonstrates the prediction effectiveness of our approach for several retrieval models. Specifically, the prediction quality often surpasses that of current state-of-the-art prediction methods.
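The core signal the abstract describes, the standard deviation of retrieval scores among the most highly ranked documents, can be sketched as follows. This is a minimal illustration only: the function name and the cutoff `k` are assumptions for the example, and the article's actual predictor involves further details (such as score normalization) not reproduced here.

```python
import statistics

def score_spread(scores, k=100):
    """Standard deviation of the retrieval scores of the top-k results.

    `scores` holds a ranked result list's retrieval scores, highest first.
    A larger spread among the top scores is taken as evidence of less
    query drift in the result list, and hence of better effectiveness.
    """
    top = scores[:k]
    if len(top) < 2:
        return 0.0  # spread is undefined for fewer than two scores
    return statistics.pstdev(top)  # population standard deviation
```

For instance, a run whose top scores are tightly clustered (suggesting the query does not discriminate well among documents) yields a lower value than a run with a sharp score drop-off, so the value can be used to rank queries by predicted effectiveness.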



    Published In

    ACM Transactions on Information Systems, Volume 30, Issue 2
    May 2012
    245 pages
    ISSN: 1046-8188
    EISSN: 1558-2868
    DOI: 10.1145/2180868

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 May 2012
    Accepted: 01 February 2012
    Revised: 01 January 2012
    Received: 01 March 2011
    Published in TOIS Volume 30, Issue 2

    Author Tags

    1. Query-performance prediction
    2. query drift
    3. score distribution

    Qualifiers

    • Research-article
    • Research
    • Refereed

