DOI: 10.1145/1995966.1995999
research-article

A community question-answering refinement system

Published: 06 June 2011

Abstract

Community Question Answering (CQA) websites archive millions of questions and answers created by their users, providing a rich resource of information that is missing from web search engines and QA websites, and they have become increasingly popular. Web users who search for answers at CQA websites, however, often must either (i) wait for days until other CQA users post answers, which might even be incorrect, offensive, or spam, or (ii) settle for restricted answer sets caused by the exact-match constraint that CQA websites impose between archived questions and user-formulated questions. To automate and enhance the process of locating high-quality answers to a user's question Q at a CQA website, we introduce a CQA refinement system called QAR. Given Q, QAR first retrieves a set of CQA questions QS that are the same as, or similar to, Q in terms of its specified information need. QAR then selects as answers to Q the top-ranked answers (among those to the questions in QS) based on various similarity scores and the length of the answers. Empirical studies, conducted using questions provided by the Text Retrieval Conference (TREC) and the Text Analysis Conference (TAC), together with more than four million questions (and their corresponding answers) extracted from Yahoo! Answers, show that QAR is effective in locating archived answers, if they exist, that satisfy the information need specified in Q. We further assessed the performance of QAR by comparing its question-matching and answer-ranking strategies with their Yahoo! Answers counterparts and verified that QAR outperforms Yahoo! Answers in (i) locating the set of questions QS that have the highest degrees of similarity with Q and (ii) ranking archived answers to QS as answers to Q.
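
The two-stage process described in the abstract (retrieve the similar questions QS, then rank their answers by similarity and length) can be illustrated with a minimal Python sketch. It is not QAR's actual method: Jaccard token overlap stands in for the paper's similarity measures, and the names `Archived`, `retrieve_similar`, and `rank_answers`, the 0.3 threshold, and the length weighting are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Archived:
    """One archived CQA question with its posted answers (hypothetical type)."""
    question: str
    answers: list[str]


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real text normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; an assumed placeholder measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_similar(q: str, archive: list[Archived], threshold: float = 0.3):
    """Stage 1: build QS, the archived questions similar enough to Q."""
    scored = [(jaccard(q, item.question), item) for item in archive]
    return [(sim, item) for sim, item in scored if sim >= threshold]


def rank_answers(q: str, archive: list[Archived], threshold: float = 0.3,
                 length_weight: float = 0.1, top_k: int = 3) -> list[str]:
    """Stage 2: rank answers to the questions in QS by similarity and length."""
    ranked = []
    for q_sim, item in retrieve_similar(q, archive, threshold):
        for answer in item.answers:
            # Cap the length bonus at 20 tokens so long answers cannot dominate.
            length_bonus = min(len(tokens(answer)), 20) / 20
            score = q_sim + jaccard(q, answer) + length_weight * length_bonus
            ranked.append((score, answer))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in ranked[:top_k]]


archive = [
    Archived("How do I boil an egg?",
             ["Place the egg in boiling water for about ten minutes.",
              "Google it."]),
    Archived("What is the capital of France?", ["Paris."]),
]
best = rank_answers("How long do I boil an egg?", archive, top_k=1)
```

Because the user's question need not match an archived question exactly, the threshold in `retrieve_similar` relaxes the exact-match constraint the abstract criticizes; answers to unrelated questions (here, the one about France) never enter the ranking.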

Published In

HT '11: Proceedings of the 22nd ACM conference on Hypertext and hypermedia
June 2011
348 pages
ISBN:9781450302562
DOI:10.1145/1995966
General Chair: Paul De Bra
Program Chair: Kaj Grønbæk
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. answer ranking
  2. community question answering
  3. question matching
  4. word similarity measure

Conference

HT '11: 22nd ACM Conference on Hypertext and Hypermedia
June 6 - 9, 2011
Eindhoven, The Netherlands

Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%
