DOI: 10.1145/1458082.1458147
research-article

Mining term association patterns from search logs for effective query reformulation

Published: 26 October 2008

Abstract

Search engine logs are an emerging type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowledge at the level of queries (e.g., query clusters). In this paper, we propose to mine search engine logs for patterns at the level of terms by analyzing the relations among terms inside a query. We define two novel term association patterns (i.e., context-sensitive term substitutions and term additions) and propose new methods for mining such patterns from search engine logs. These two patterns can be used to address the mis-specification and under-specification problems of ineffective queries, respectively. Experimental results on real search engine logs show that the mined context-sensitive term substitutions can be used to effectively reword queries and improve their accuracy, while the mined context-sensitive term addition patterns can be used to support query refinement in a more effective way.



Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. query reformulation
  2. search log mining
  3. term association patterns

Qualifiers

  • Research-article

Conference

CIKM08: Conference on Information and Knowledge Management
October 26 - 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
