research-article

Discovering missing click-through query language information for web search

Authors:

James Allan

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

Pages 153 - 162

https://doi.org/10.1145/2063576.2063604

Published: 24 October 2011

Abstract

Click-through information in web query logs has been widely used for web search tasks. However, it usually suffers from data sparseness, known as the missing or incomplete click problem, where a large volume of pages receive few or no clicks. In this paper, we adapt two language modeling based approaches to address this issue in the context of using web query logs for web search. The first approach discovers missing click-through query language features for web pages with few or no clicks from the click-associated queries of similar pages in the query logs. We further propose combining this content based approach with the random walk approach on the click graph to reduce click-through sparseness further. The second approach follows the query expansion method: it uses the queries and their clicked web pages in the query logs to reconstruct a structured variant of the relevance based language models for each user-input query. We design experiments with a publicly available query log excerpt and two TREC web search tasks on the GOV2 and ClueWeb09 corpora to evaluate the search performance of the different approaches. Our results show that using discovered semantic click-through query language features yields statistically significant improvements in search performance over baselines that do not use the discovered information. The combination approach, which uses discovered click-through features from both the random walk and the content based approach, can further improve search performance.
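To make the first approach in the abstract more concrete, the following is a minimal Python sketch of the general idea: a page with no clicks borrows a query language model from the click-associated queries of its most content-similar pages. It is not the authors' implementation. The cosine similarity over unigram models, the cutoff `k`, the toy pages and queries, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram model of a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values()))
    norm *= math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def discover_click_model(target_text, pages, clicked_queries, k=2):
    """Estimate a query language model for a page that has no clicks.

    pages: page_id -> page content (hypothetical toy corpus)
    clicked_queries: page_id -> queries whose clicks landed on that page
    """
    target = unigram_model(target_text)
    # Rank the pages that do have clicks by content similarity to the target.
    scored = sorted(
        ((cosine(target, unigram_model(pages[p])), p) for p in clicked_queries),
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in scored)
    if total_sim == 0:
        return {}  # no similar clicked page, so nothing can be borrowed
    discovered = Counter()
    for sim, p in scored:
        query_model = unigram_model(" ".join(clicked_queries[p]))
        # Mix the neighbors' query models, weighted by normalized similarity.
        for w, prob in query_model.items():
            discovered[w] += (sim / total_sim) * prob
    return dict(discovered)


# Toy data, invented for this sketch.
pages = {
    "a": "tax forms download irs federal tax",
    "b": "national park hiking trails map",
}
clicked_queries = {"a": ["irs tax forms", "federal tax"], "b": ["park trails"]}

model = discover_click_model(
    "federal tax forms and instructions", pages, clicked_queries, k=1
)
print(sorted(model.items(), key=lambda kv: -kv[1]))
```

In a retrieval setting, a model like this would typically be interpolated with the page's own content model as an extra field; that step, and the random walk combination the abstract mentions, are left out of the sketch.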

Index Terms

Discovering missing click-through query language information for web search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking
  2. World Wide Web
    1. Web applications
    2. Web services

Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

October 2011

2712 pages

ISBN:9781450307178

DOI:10.1145/2063576

Editors: Bettina Berendt, Arjen de Vries, Wenfei Fan, Craig Macdonald (University of Glasgow, UK), Iadh Ounis (University of Glasgow, UK), Ian Ruthven (University of Strathclyde, UK)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '11

CIKM '11: International Conference on Information and Knowledge Management

October 24 - 28, 2011

Glasgow, Scotland, UK

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

Wang Y, Liu J, Chen J, Huang Y (2018). Finding similar queries based on query representation analysis. World Wide Web 17:5 (1161-1188). DOI: 10.1007/s11280-013-0233-5. Online publication date: 25-Dec-2018.
https://dl.acm.org/doi/10.1007/s11280-013-0233-5

Zhukovskiy M, Khatkevich T, Gusev G, Serdyukov P, Bailey J, Moffat A, Aggarwal C, de Rijke M, Kumar R, Murdock V, Sellis T, Yu J (2015). An Optimization Framework for Propagation of Query-Document Features by Query Similarity Functions. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (981-990). DOI: 10.1145/2806416.2806487. Online publication date: 17-Oct-2015.
https://dl.acm.org/doi/10.1145/2806416.2806487

Li H, Xu J (2014). Semantic Matching in Search. Foundations and Trends in Information Retrieval 7:5 (343-469). DOI: 10.1561/1500000035. Online publication date: 12-Jun-2014.
https://dl.acm.org/doi/10.1561/1500000035

Chen J, Wang Y, Liu J, Huang Y (2013). Modeling semantic and behavioral relations for query suggestion. Proceedings of the 14th international conference on Web-Age Information Management (666-678). DOI: 10.1007/978-3-642-38562-9_68. Online publication date: 14-Jun-2013.
https://dl.acm.org/doi/10.1007/978-3-642-38562-9_68
