Measuring and Characterizing Hate Speech on News Websites

Authors:

Savvas Zannettou,

Elizabeth Belding,

Shirin Nilizadeh,

Gianluca Stringhini

WebSci '20: Proceedings of the 12th ACM Conference on Web Science

Pages 125 - 134

https://doi.org/10.1145/3394231.3397902

Published: 06 July 2020

Abstract

The Web has become the main source for news acquisition. At the same time, news discussion has become more social: users can post comments on news articles or discuss news articles on other platforms like Reddit. These features empower and enable discussions among users; however, they also act as a medium for the dissemination of toxic discourse and hate speech. The research community lacks a general understanding of what type of content attracts hateful discourse and of the possible effects of social networks on commenting activity on news articles.

In this work, we perform a large-scale quantitative analysis of 125M comments posted on 412K news articles over the course of 19 months. We analyze the content of the collected articles and their comments using temporal analysis, user-based analysis, and linguistic analysis, to shed light on what elements attract hateful comments on news articles. We also investigate commenting activity when an article is posted on either 4chan’s Politically Incorrect board (/pol/) or six selected subreddits. We find statistically significant increases in hateful commenting activity around real-world divisive events like the “Unite the Right” rally in Charlottesville and political events like the second and third 2016 US presidential debates. We also find that articles that attract a substantial number of hateful comments have different linguistic characteristics from articles that do not attract hateful comments. Furthermore, we observe that the posting of a news article on either /pol/ or the six subreddits is correlated with an increase in (hateful) commenting activity on that article.

Information

Published In

WebSci '20: Proceedings of the 12th ACM Conference on Web Science

July 2020

361 pages

ISBN: 9781450379892

DOI: 10.1145/3394231

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

WebSci '20

Sponsor:

SIGWEB

WebSci '20: 12th ACM Conference on Web Science

July 6 - 10, 2020

Southampton, United Kingdom

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%
