research-article
DOI: 10.1145/2566486.2567999

Joint question clustering and relevance prediction for open domain non-factoid question answering

Published: 07 April 2014

Abstract

Web searches are increasingly formulated as natural language questions rather than keyword queries. Retrieving answers to such questions requires some understanding of what the user expects. An important step in this direction is to automatically infer the type of answer the question implies, e.g., factoids, statements on a topic, instructions, or reviews. Answer Type taxonomies exist for factoid-style questions, but not for open-domain questions. Building taxonomies for non-factoid questions is harder because these questions can come from a very broad semantic space. The few attempts to develop taxonomies for non-factoid questions tend to be too narrow or domain-specific. In this paper, we address this problem by modeling the Answer Type as a latent variable that is learned in a data-driven fashion, which lets the model adapt more readily to new domains and data sets. We propose approaches that detect the relevance of candidate answers to a user question by jointly 'clustering' questions according to this hidden variable and modeling relevance conditioned on it.
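For illustration, one way to write such a joint model is as a mixture of logistic regressions. The notation here is ours and not necessarily the paper's. Each question q has a feature vector u_q. Its candidate answers have feature vectors x_qj and binary relevance labels y_qj. A latent cluster z_q in {1, ..., K} stands in for the Answer Type. The question-dependent softmax prior pi_k(u_q) is also an assumption made for this sketch. The paper's models may parameterize the prior and the cluster-specific relevance models differently.

```latex
\begin{align*}
p_{qjk} &= \sigma(\mathbf{w}_k^\top \mathbf{x}_{qj})^{y_{qj}}
           \bigl(1 - \sigma(\mathbf{w}_k^\top \mathbf{x}_{qj})\bigr)^{1 - y_{qj}}, \\
\pi_k(\mathbf{u}_q) &= \frac{\exp(\mathbf{v}_k^\top \mathbf{u}_q)}
                           {\sum_{k'=1}^{K} \exp(\mathbf{v}_{k'}^\top \mathbf{u}_q)}, \\
P(\mathbf{y}_q \mid \mathbf{u}_q, \mathbf{X}_q) &= \sum_{k=1}^{K} \pi_k(\mathbf{u}_q) \prod_{j=1}^{n_q} p_{qjk}, \\
r_{qk} = P(z_q = k \mid \mathbf{u}_q, \mathbf{X}_q, \mathbf{y}_q)
  &= \frac{\pi_k(\mathbf{u}_q) \prod_{j=1}^{n_q} p_{qjk}}
          {\sum_{k'=1}^{K} \pi_{k'}(\mathbf{u}_q) \prod_{j=1}^{n_q} p_{qjk'}}, \\
P(y = 1 \mid \mathbf{u}, \mathbf{x}) &= \sum_{k=1}^{K} \pi_k(\mathbf{u}) \, \sigma(\mathbf{w}_k^\top \mathbf{x}).
\end{align*}
```

The product over j couples clustering and relevance: every candidate answer of a question is scored by the same cluster's weights w_k. The posterior r_qk is the soft question-to-cluster assignment that an EM-style E-step would compute. The last line gives the relevance of a new answer, with the latent cluster marginalized out.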
We propose three new models that automatically learn question clusters and cluster-specific relevance models: (a) Logistic Regression Mixture (LRM), (b) Glocal Logistic Regression Mixture (G-LRM), and (c) Mixture Glocal Logistic Regression Mixture (MG-LRM). On a newsgroups dataset, all three models outperform a baseline relevance model that uses explicit Answer Type categories predicted by a supervised Answer Type classifier. On a blogs dataset, our models also outperform a baseline relevance model that uses no answer-type information.
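The idea of learning question clusters together with cluster-specific relevance models can be sketched in pure Python. This is a generic mixture of logistic regressions with a softmax gate over question features, fitted with EM. It is an assumption made for illustration and not the authors' LRM, G-LRM, or MG-LRM code. The names `fit_lr_mixture` and `predict`, the feature layout, and the hyperparameters (`k`, `n_iter`, `inner_steps`, `lr`) are hypothetical. The M-step takes a few gradient-ascent steps instead of solving each weighted logistic regression exactly.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def softmax(scores):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_lr_mixture(questions, k, n_iter=50, inner_steps=20, lr=1.0, seed=0):
    """EM for a mixture of logistic regressions with a softmax gate (hypothetical sketch).

    questions: list of (u, answers); u is a question feature vector and
    answers is a list of (x, y) with x an answer feature vector, y in {0, 1}.
    Returns (gate, experts): per-cluster weights over u and over x.
    """
    rng = random.Random(seed)
    du, dx = len(questions[0][0]), len(questions[0][1][0][0])
    n_q = len(questions)
    n_a = sum(len(answers) for _, answers in questions)
    gate = [[0.0] * du for _ in range(k)]
    experts = [[0.0] * dx for _ in range(k)]
    # Random soft assignments break the symmetry between clusters.
    resp = [softmax([rng.random() for _ in range(k)]) for _ in questions]
    for _ in range(n_iter):
        # M-step (approximate): gradient ascent on the expected
        # complete-data log-likelihood, averaged for a stable step size.
        for _ in range(inner_steps):
            for c in range(k):
                g_gate = [0.0] * du
                g_exp = [0.0] * dx
                for (u, answers), r in zip(questions, resp):
                    prior = softmax([dot(v, u) for v in gate])
                    for i in range(du):
                        g_gate[i] += (r[c] - prior[c]) * u[i] / n_q
                    for x, y in answers:
                        err = y - sigmoid(dot(experts[c], x))
                        for i in range(dx):
                            g_exp[i] += r[c] * err * x[i] / n_a
                gate[c] = [w + lr * g for w, g in zip(gate[c], g_gate)]
                experts[c] = [w + lr * g for w, g in zip(experts[c], g_exp)]
        # E-step: posterior over each question's cluster, combining the gate
        # prior with the likelihood of all of that question's relevance labels.
        resp = []
        for u, answers in questions:
            prior = softmax([dot(v, u) for v in gate])
            scores = []
            for c in range(k):
                s = math.log(prior[c] + 1e-12)
                for x, y in answers:
                    p = sigmoid(dot(experts[c], x))
                    s += math.log((p if y else 1.0 - p) + 1e-12)
                scores.append(s)
            resp.append(softmax(scores))
    return gate, experts


def predict(gate, experts, u, x):
    """P(relevant | u, x), marginalizing over the latent cluster."""
    prior = softmax([dot(v, u) for v in gate])
    return sum(p * sigmoid(dot(w, x)) for p, w in zip(prior, experts))


# Hypothetical toy data: two question types with opposite relevance rules,
# which a single logistic regression over answer features alone cannot fit.
data = []
for s in (1.0, -1.0):
    for _ in range(10):
        answers = [([1.0, 1.0], int(s > 0)), ([1.0, -1.0], int(s < 0))]
        data.append(([1.0, s], answers))
gate, experts = fit_lr_mixture(data, k=2)
print(predict(gate, experts, [1.0, 1.0], [1.0, 1.0]))
print(predict(gate, experts, [1.0, -1.0], [1.0, 1.0]))
```

In the toy data, the same answer features are relevant for one question type and irrelevant for the other. Only a model that first infers the question's cluster can score both correctly. Starting from random soft assignments, rather than all-zero weights, lets the two clusters drift apart in opposite directions.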

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. latent variable models
  2. question answering
  3. question clustering
  4. relevance prediction

Conference

WWW '14

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
