Detecting hate speech on the world wide web

Published: 07 June 2012

Abstract

We present an approach to detecting hate speech in online text, where hate speech is defined as abusive speech targeting specific group characteristics, such as ethnic origin, religion, gender, or sexual orientation. While hate speech against any group may exhibit some common characteristics, we have observed that hatred against each different group is typically characterized by the use of a small set of high-frequency stereotypical words; however, such words may be used in either a positive or a negative sense, making our task similar to that of word sense disambiguation. In this paper we describe our definition of hate speech, the collection and annotation of our hate speech corpus, and a mechanism for detecting some commonly used methods of evading common "dirty word" filters. We describe pilot classification experiments in which we classify anti-semitic speech, reaching an accuracy of 94%, a precision of 68%, and a recall of 60%, for an F1 measure of .6375.
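The reported F1 measure follows from the stated precision and recall, since F1 is their harmonic mean. The short sketch below only checks that arithmetic; the function name `f1_score` is an illustrative helper, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.68, 0.60)
print(round(f1, 4))  # 2 * 0.68 * 0.60 / (0.68 + 0.60)
```

The result agrees with the .6375 given in the abstract.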

Published In

LSM '12: Proceedings of the Second Workshop on Language in Social Media
June 2012
88 pages

Publisher

Association for Computational Linguistics

United States
