DOI: 10.1145/3041823.3041837
research-article

Detecting Factual and Non-Factual Content in News Articles

Published: 09 March 2017

Abstract

News articles are a major source of facts about the current state of the world and the events that shape it. However, not all news articles are equally rich in facts. In this paper, we consider the problem of detecting the factual and non-factual parts of news articles. We present a comprehensive survey of the existing literature on fact classification in news articles, as well as on the related and more widely studied problem of classifying statements as subjective or objective. Combining these techniques with some new features, we design a framework for classifying facts and non-facts in news articles. We present extensive experiments on this task using several features, and combinations of them, on two datasets, one of which was used for subjectivity classification in previous work. We show that standard dataset-dependent textual features such as n-grams produce good results on both datasets, whereas more general features such as part-of-speech tags and entity types produce inconsistent results. We analyze the results in light of the nature of the datasets to offer insights into the usefulness of the features and their applicability to this classification task.
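
To make the notion of n-gram features concrete, the following is a minimal sketch of sentence-level classification with word unigram and bigram counts. It is not the framework described in the paper: the perceptron learner, the function names (`ngram_features`, `train_perceptron`, `predict`) and the toy sentences are all assumptions introduced here purely for illustration.

```python
from collections import Counter

def ngram_features(sentence, max_n=2):
    """Count word n-grams (n = 1..max_n) in a lowercased, whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def score(weights, feats):
    """Dot product between the weight vector and a sparse feature vector."""
    return sum(weights[f] * v for f, v in feats.items())

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron; labels are +1 (factual) and -1 (non-factual)."""
    weights = Counter()
    for _ in range(epochs):
        for sentence, label in examples:
            feats = ngram_features(sentence)
            # Update only when the example is misclassified or sits on the boundary.
            if label * score(weights, feats) <= 0:
                for f, v in feats.items():
                    weights[f] += label * v
    return weights

def predict(weights, sentence):
    """Return +1 (factual) or -1 (non-factual) for a sentence."""
    return 1 if score(weights, ngram_features(sentence)) > 0 else -1

# Hypothetical toy sentences, invented for illustration only (not from either dataset).
TOY_DATA = [
    ("the council approved the budget on monday", 1),
    ("the company reported revenue of two million", 1),
    ("i think the budget is a terrible idea", -1),
    ("this is probably the worst decision ever", -1),
]

weights = train_perceptron(TOY_DATA)
```

Because the learned weights are tied to the vocabulary of the training sentences, features of this kind are dataset dependent in the sense the abstract describes; part-of-speech or entity-type features would replace the word strings with more general categories.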

Published In

CODS '17: Proceedings of the 4th ACM IKDD Conferences on Data Sciences
March 2017
136 pages
ISBN:9781450348461
DOI:10.1145/3041823

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CODS '17

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%
