DOI: 10.1145/3394231.3397902
research-article
Open access

Measuring and Characterizing Hate Speech on News Websites

Published: 06 July 2020

Abstract

The Web has become the main source for news acquisition. At the same time, news discussion has become more social: users can post comments on news articles or discuss news articles on other platforms like Reddit. These features empower and enable discussions among users; however, they also act as a medium for the dissemination of toxic discourse and hate speech. The research community lacks a general understanding of what type of content attracts hateful discourse and of the possible effects of social networks on commenting activity on news articles.
In this work, we perform a large-scale quantitative analysis of 125M comments posted on 412K news articles over the course of 19 months. We analyze the content of the collected articles and their comments using temporal analysis, user-based analysis, and linguistic analysis, to shed light on what elements attract hateful comments on news articles. We also investigate commenting activity when an article is posted on either 4chan’s Politically Incorrect board (/pol/) or six selected subreddits. We find statistically significant increases in hateful commenting activity around real-world divisive events like the “Unite the Right” rally in Charlottesville and political events like the second and third 2016 US presidential debates. We also find that articles that attract a substantial number of hateful comments have different linguistic characteristics from articles that do not attract hateful comments. Furthermore, we observe that the posting of a news article on either /pol/ or the six subreddits is correlated with an increase in (hateful) commenting activity on that article.

Supplementary Material

MP4 File (3394231.3397902.mp4)
Presentation Video

Published In

WebSci '20: Proceedings of the 12th ACM Conference on Web Science
July 2020
361 pages
ISBN: 9781450379892
DOI: 10.1145/3394231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WebSci '20: 12th ACM Conference on Web Science
July 6 - 10, 2020
Southampton, United Kingdom

Acceptance Rates

Overall acceptance rate: 245 of 933 submissions (26%)

Article Metrics

  • Downloads (last 12 months): 451
  • Downloads (last 6 weeks): 37

Reflects downloads up to 24 Jan 2025