DOI: 10.1145/2063576.2063604
research-article

Discovering missing click-through query language information for web search

Published: 24 October 2011

Abstract

Click-through information in web query logs has been widely used for web search tasks. However, it usually suffers from data sparseness, known as the missing or incomplete click problem, in which a large volume of pages receive few or no clicks. In this paper, we adapt two language modeling based approaches to address this issue in the context of using web query logs for web search. The first approach discovers missing click-through query language features for web pages with few or no clicks from the click-associated queries of similar pages in the query logs. We further propose combining this content based approach with the random walk approach on the click graph to reduce click-through sparseness further. The second approach follows the query expansion method: it uses the queries and their clicked web pages in the query logs to reconstruct a structured variant of the relevance based language models for each user-input query. We design experiments with a publicly available query log excerpt and two TREC web search tasks on the GOV2 and ClueWeb09 corpora to evaluate the search performance of the different approaches. Our results show that using the discovered semantic click-through query language features yields statistically significant improvements in search performance over baselines that do not use the discovered information. The combination approach, which uses click-through features discovered by both the random walk and the content based approach, improves search performance further.
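
As a rough illustration of the first approach, the sketch below builds a query language model for a page with no clicks from the click-associated queries of its most similar pages. It is a minimal sketch under stated assumptions, not the estimation used in the paper: the cosine similarity over raw term counts, the similarity-weighted mixture, the cutoff k, the function names, and the toy pages and queries are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_language_model(queries):
    """Maximum-likelihood unigram model over the terms of a list of queries."""
    counts = Counter(t for q in queries for t in q.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def discover_click_model(target, page_text, clicked_queries, k=2):
    """Estimate a query language model for `target` (assumed to have no clicks)
    as a similarity-weighted mixture of the query models of its k most
    similar clicked pages. The weighting scheme is an illustrative assumption."""
    target_vec = Counter(page_text[target].lower().split())
    scored = []
    for page, queries in clicked_queries.items():
        if page == target or not queries:
            continue
        sim = cosine(target_vec, Counter(page_text[page].lower().split()))
        if sim > 0:
            scored.append((sim, page))
    top = sorted(scored, reverse=True)[:k]
    norm = sum(sim for sim, _ in top)
    model = {}
    for sim, page in top:
        for term, p in query_language_model(clicked_queries[page]).items():
            model[term] = model.get(term, 0.0) + (sim / norm) * p
    return model


# Hypothetical toy data: page "d" has content but no click-associated queries.
page_text = {
    "a": "tax forms filing deadline",
    "b": "tax forms download",
    "c": "national park camping permits",
    "d": "income tax forms filing help",
}
clicked_queries = {
    "a": ["irs tax forms", "tax deadline"],
    "b": ["download tax forms"],
    "c": ["camping permit"],
}

model = discover_click_model("d", page_text, clicked_queries)
for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{prob:.3f}")
```

In this toy setting, page "d" borrows query terms only from the two tax-related pages, because the camping page shares no content terms with it. The resulting distribution could then be treated as an extra document field at retrieval time, though how the paper combines such features with other evidence is not stated in the abstract.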

Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
October 2011
2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clickthrough sparseness
  2. content similarity
  3. language models
  4. query log
  5. relevance models
  6. web search

Conference

CIKM '11
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
