That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search

Authors:

Koby Crammer

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

Pages 225 - 234

https://doi.org/10.1145/2911451.2911496

Published: 07 July 2016

Abstract

A fundamental task in Information Retrieval (IR) is term weighting. Early IR theory considered both the presence and the absence of every term in the lexicon for ranking, and therefore needed to weight them all. Yet, as lexicons grew and models became too complex, common weighting models came to aggregate only the weights of the query terms that are matched in candidate documents. Thus, the contribution of unmatched terms in these models is considered only indirectly, for example in probability smoothing with the corpus distribution, or in weight normalization by document length. In this work we propose a novel term weighting model that directly assesses the weights of unmatched terms, and show its benefits. Specifically, we propose a Learning to Rank framework in which features corresponding to matched terms are also "mirrored" in similar features that account only for unmatched terms. The relative importance of each feature is learned from a clickthrough query log. As a test case, we consider vertical search in Community-based Question Answering (CQA) sites from Web queries. Queries that result in viewing CQA content often contain fine-grained information needs and benefit more from unmatched term weighting. We assess our model both via manual evaluation and via automatic evaluation over a clickthrough log. Our results show consistent improvement in retrieval when unmatched information is taken into account. This holds both when only identical terms are considered matched, and when related terms are matched via distributional similarity.
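The idea of "mirrored" features can be illustrated with a small sketch. It is not the paper's feature set, which the abstract does not specify. It assumes, purely for illustration, that each feature is an IDF mass, that matching means exact term identity, and that the weights are set by hand rather than learned from a clickthrough log. The names `idf`, `mirrored_features`, `score`, and the toy document frequencies are all hypothetical.

```python
import math

def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency (always positive)."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0

def mirrored_features(query_terms, cand_terms, doc_freq, num_docs):
    """Return [matched mass, unmatched query mass, unmatched candidate mass].

    The first feature covers terms shared by query and candidate; the other
    two "mirror" it over terms that appear on one side only.
    """
    query, cand = set(query_terms), set(cand_terms)

    def mass(terms):
        return sum((idf(t, doc_freq, num_docs) for t in sorted(terms)), 0.0)

    return [mass(query & cand), mass(query - cand), mass(cand - query)]

def score(features, weights):
    """Linear ranking score; in a real system the weights would be learned."""
    return sum(f * w for f, w in zip(features, weights))

# Toy corpus statistics (hypothetical values).
doc_freq = {"cat": 50, "food": 40, "allergy": 5, "best": 60, "symptoms": 8}
num_docs = 100

query = ["cat", "food", "allergy"]
cand_a = ["best", "cat", "food"]                  # misses "allergy"
cand_b = ["cat", "food", "allergy", "symptoms"]   # covers the whole query

features_a = mirrored_features(query, cand_a, doc_freq, num_docs)
features_b = mirrored_features(query, cand_b, doc_freq, num_docs)

# Hand-set weights (an assumption): reward matches, penalize unmatched terms.
weights = [1.0, -1.0, -0.5]
score_a = score(features_a, weights)
score_b = score(features_b, weights)
```

With these illustrative weights, the candidate that leaves the rare query term "allergy" unmatched is penalized directly, rather than only missing out on a reward for matching it.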

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
      2. Novelty in information retrieval
    2. Retrieval tasks and goals
      1. Question answering

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

July 2016

1296 pages

ISBN: 9781450340694

DOI: 10.1145/2911451

General Chairs:
Raffaele Perego (ISTI-CNR, Italy),
Fabrizio Sebastiani (Qatar Computing Research Institute, HBKU, Qatar)

Program Chairs:
Javed Aslam (Northeastern University, US),
Ian Ruthven (University of Strathclyde, UK),
Justin Zobel (University of Melbourne, Australia)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '16

Sponsor:

SIGIR

SIGIR '16: The 39th International ACM SIGIR conference on research and development in Information Retrieval

July 17 - 21, 2016

Pisa, Italy

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%
