research-article

Mining term association patterns from search logs for effective query reformulation

Authors:

ChengXiang Zhai

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Pages 479 - 488

https://doi.org/10.1145/1458082.1458147

Published: 26 October 2008

Abstract

Search engine logs are an emerging type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowledge at the level of queries (e.g., query clusters). In this paper, we propose to mine search engine logs for patterns at the level of terms by analyzing the relations between terms inside a query. We define two novel term association patterns (context-sensitive term substitutions and term additions) and propose new methods for mining such patterns from search engine logs. These two patterns can be used to address the mis-specification and under-specification problems of ineffective queries, respectively. Experimental results on real search engine logs show that the mined context-sensitive term substitutions can be used to effectively reword queries and improve their accuracy, while the mined context-sensitive term addition patterns can be used to support query refinement more effectively.
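
To make the two pattern types concrete, the following is a minimal illustrative sketch and not the method proposed in the paper. It assumes a hypothetical log already grouped into sessions (lists of query strings) and simply counts one-term substitutions and one-term additions between consecutive queries, keeping the surrounding terms as context. The function name `mine_patterns`, the `_` slot marker, and the sample queries are all assumptions made for illustration.

```python
from collections import Counter


def mine_patterns(sessions):
    """Count one-term substitutions and additions between consecutive queries.

    sessions: list of sessions, each a list of query strings (hypothetical format).
    Returns (substitutions, additions) as Counters keyed by context-aware patterns.
    """
    substitutions = Counter()  # key: (context with "_" slot, old term, new term)
    additions = Counter()      # key: (context with "_" slot, added term)
    for session in sessions:
        for prev, curr in zip(session, session[1:]):
            a, b = prev.split(), curr.split()
            if len(a) == len(b):
                # Substitution: same length, exactly one position differs.
                diff = [i for i in range(len(a)) if a[i] != b[i]]
                if len(diff) == 1:
                    i = diff[0]
                    context = tuple(a[:i]) + ("_",) + tuple(a[i + 1:])
                    substitutions[(context, a[i], b[i])] += 1
            elif len(b) == len(a) + 1:
                # Addition: removing one term from the new query gives the old one.
                for i in range(len(b)):
                    if b[:i] + b[i + 1:] == a:
                        context = tuple(b[:i]) + ("_",) + tuple(b[i + 1:])
                        additions[(context, b[i])] += 1
                        break
    return substitutions, additions


if __name__ == "__main__":
    # Made-up sessions, used only to exercise the function.
    demo = [
        ["auto wash", "car wash"],
        ["auto wash", "car wash", "car wash coupons"],
        ["auto trader", "auto trader used"],
    ]
    subs, adds = mine_patterns(demo)
    for pattern, count in subs.most_common():
        print("substitution", pattern, count)
    for pattern, count in adds.most_common():
        print("addition", pattern, count)
```

Keeping the context in the key is what makes a pattern context-sensitive in this sketch: replacing "auto" with "car" is counted before "wash" but is not suggested for "auto trader" unless the log supports it there as well.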

Index Terms

Mining term association patterns from search logs for effective query reformulation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Information retrieval query processing

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN:9781595939913

DOI:10.1145/1458082

General Chair:
James G. Shanahan (Church and Duncan Group Inc, USA)

Program Chairs:
Sihem Amer-Yahia (Yahoo! Research, USA)
Ioana Manolescu (INRIA, France)
Yi Zhang (University of California, Santa Cruz, USA)
David A. Evans (JustSystems Evans Research, USA)
Alek Kolcz (Microsoft Live Labs, USA)
Key-Sun Choi (KAIST, Korea)
Abdur Chowdury (Twitter, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM08: Conference on Information and Knowledge Management

October 26 - 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 86
Total Downloads: 941
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Jan 2025
