
Hate Speech Detection with Comment Embeddings

Authors: Vladan Radosavljevic, Narayan Bhamidipati

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web

Pages 29-30

https://doi.org/10.1145/2740908.2742760

Published: 18 May 2015

Abstract

We address the problem of hate speech detection in online user comments. Hate speech, defined as "abusive speech targeting specific group characteristics, such as ethnicity, religion, or gender", is an important problem plaguing websites that allow users to leave feedback, with a negative impact on their online business and overall user experience. We propose to learn distributed low-dimensional representations of comments using recently proposed neural language models; these representations can then be fed as inputs to a classification algorithm. Our approach addresses the high dimensionality and sparsity issues that affect the current state of the art, resulting in highly efficient and effective hate speech detectors.
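The two-stage idea in the abstract (embed each comment, then classify the embedding) can be illustrated with a small, self-contained sketch. It assumes a PV-DM-style paragraph vector model in the spirit of Le and Mikolov, trained with a plain full softmax on a hypothetical toy corpus, followed by logistic regression as the downstream classifier. The function names (`train_pv_dm`, `train_logistic`, `predict`), the hyperparameters, and the placeholder comments are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def train_pv_dm(docs, dim=8, window=2, epochs=100, lr=0.1, seed=0):
    """Toy PV-DM (an assumption, not the paper's code): predict each word from
    the mean of its comment vector and nearby word vectors via a full softmax."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}

    def init():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    doc_vecs = [init() for _ in docs]        # one learned vector per comment
    word_vecs = [init() for _ in vocab]      # input (context) word vectors
    out_vecs = [[0.0] * dim for _ in vocab]  # output (softmax) word vectors
    losses = []                              # mean cross-entropy per epoch
    for _ in range(epochs):
        total, count = 0.0, 0
        for d, doc in enumerate(docs):
            for pos, target in enumerate(doc):
                lo, hi = max(0, pos - window), min(len(doc), pos + window + 1)
                members = [doc_vecs[d]] + [word_vecs[idx[doc[j]]] for j in range(lo, hi) if j != pos]
                h = [sum(v[k] for v in members) / len(members) for k in range(dim)]
                scores = [sum(o[k] * h[k] for k in range(dim)) for o in out_vecs]
                m = max(scores)
                exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
                z = sum(exps)
                t = idx[target]
                total -= math.log(exps[t] / z)
                count += 1
                grad_h = [0.0] * dim
                for w, e in enumerate(exps):
                    g = e / z - (1.0 if w == t else 0.0)  # dLoss/dscore_w
                    for k in range(dim):
                        grad_h[k] += g * out_vecs[w][k]   # read before updating
                        out_vecs[w][k] -= lr * g * h[k]
                for v in members:  # h is a mean, so each member gets a 1/len share
                    for k in range(dim):
                        v[k] -= lr * grad_h[k] / len(members)
        losses.append(total / count)
    return doc_vecs, losses


def predict(w, b, x):
    """Logistic (sigmoid) score; the logit is clamped to avoid overflow."""
    z = max(-30.0, min(30.0, sum(wi * xi for wi, xi in zip(w, x)) + b))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Batch-gradient logistic regression on fixed comment vectors."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = predict(w, b, x) - label
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


# Hypothetical toy comments; "[slur]" and "[group]" are placeholder tokens.
docs = [s.split() for s in [
    "those [group] people are [slur]",
    "get out you [slur] [group]",
    "[group] are all [slur]",
    "great article thanks for sharing",
    "thanks this was a great read",
    "interesting article on the market",
]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = hate speech, 0 = clean (toy labels)

doc_vecs, losses = train_pv_dm(docs)
w, b = train_logistic(doc_vecs, labels)
scores = [predict(w, b, v) for v in doc_vecs]
```

Each comment ends up as a dense `dim`-dimensional vector regardless of vocabulary size, which is the property the abstract contrasts with sparse, vocabulary-sized bag-of-words features. The full softmax used here scores every vocabulary word for every prediction, which is only affordable on a toy vocabulary.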

References

[1] P. Burnap and M. Williams. Hate speech, machine classification and statistical modelling of information flows on Twitter: Interpretation and communication for policy decision making. In IPP, 2014.

[2] I. Kwok and Y. Wang. Locate the hate: Detecting tweets against blacks. In AAAI, 2013.

[3] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. arXiv:1405.4053, 2014.

[4] T. M. Massaro. Equality and freedom of expression: The hate speech dilemma. Wm. & Mary L. Rev., 32:211, 1990.

[5] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.

[6] W. Warner and J. Hirschberg. Detecting hate speech on the World Wide Web. In Workshop on Language in Social Media at ACL, pages 19-26, 2012.

[7] Z. Xu and S. Zhu. Filtering offensive language in online communities using grammatical relations. In Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, 2010.

Cited By

Ojha N, Vaish A (2024). Hate Speech in Indian Cyber Space at the Intersection of Law and Technology. Intersections Between Rights and Technology, pp. 156-191. DOI: 10.4018/979-8-3693-1127-1.ch009. Online publication date: 10-Jul-2024.
https://doi.org/10.4018/979-8-3693-1127-1.ch009

Grotti L (2024). Italian Linguistic Features for Toxic Language Detection in Social Media. Italian Journal of Computational Linguistics 10:1. DOI: 10.4000/125no. Online publication date: 2024.
https://doi.org/10.4000/125no

Bonechi S (2024). Development of an Automated Moderator for Deliberative Events. Electronics 13:3 (544). DOI: 10.3390/electronics13030544. Online publication date: 29-Jan-2024.
https://doi.org/10.3390/electronics13030544

Index Terms

Hate Speech Detection with Comment Embeddings
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Recommendations

DeepHate: Hate Speech Detection via Multi-Faceted Text Representations
WebSci '20: Proceedings of the 12th ACM Conference on Web Science

Online hate speech is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have developed many traditional machine learning and ...

Hate Speech Detection Using Static BERT Embeddings
Big Data Analytics

With increasing popularity of social media platforms hate speech is emerging as a major concern, where it expresses abusive speech that targets specific group characteristics, such as gender, religion or ethnicity to spread violence. Earlier ...

Hate Speech Detection in Roman Urdu
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers

Hate speech is a specific type of controversial content that is widely legislated as a crime that must be identified and blocked. However, due to the sheer volume and velocity of the Twitter data stream, hate speech detection cannot be performed ...

Reviews

Reviewer: Lalit P Saxena

Hate speech comments in online forums are a form of offensive language targeted at specific groups with the aim of dishonoring them. Hate speech is also considered a synonym for misinformation, smears, and social pollution. The unmonitored activities of online social communities and unrestricted access to the Internet are fueling the proliferation of hate speech in online comments. The authors propose a two-step method for detecting hate speech in online comments. In the first step, comment embeddings are learned with paragraph2vec, which builds on a continuous bag-of-words (CBOW) neural language model; the method uses hierarchical softmax to reduce time complexity, which enables efficient training. In the second step, the learned vector representations are fed to a linear regression classifier that distinguishes hate speech from clean comments. The authors collected 56,280 hate speech comments and 895,456 clean comments from 209,776 anonymous users of the Yahoo Finance website over six months, with a vocabulary of 304,427 words, and they claim this is the largest dataset of hate speech comments available in the literature. The neural language model learns 200-dimensional embeddings, using a context window of length 5 and 5 training iterations. The authors compared the proposed method with BOW (term frequency) and BOW (term frequency-inverse document frequency) baselines, using the area under the curve (AUC) to evaluate the results. They also report that the proposed method reduces training time and memory usage compared to the other methods. They further argue that their method addresses the hate speech detection problem while mitigating the high dimensionality and sparsity of online comment representations.

Online Computing Reviews Service
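The review's point that hierarchical softmax reduces training time can be made concrete with a short sketch. It assumes the word2vec-style construction in which each vocabulary word is a leaf of a Huffman tree built from word counts (the review does not describe the tree, so this is an assumption); scoring a word then costs one binary decision per inner node on its path, instead of one score per vocabulary word. The function names `huffman_code_lengths` and `expected_decisions` and the toy counts are hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Depth of each word's leaf in a Huffman tree built from word counts.

    In hierarchical softmax, predicting a word costs one binary decision per
    inner node on the path to its leaf, i.e. its code length.
    """
    if len(freqs) == 1:
        return {w: 1 for w in freqs}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {w: 0}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {w: depth + 1 for w, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def expected_decisions(freqs):
    """Frequency-weighted mean code length: binary decisions per predicted token."""
    lengths = huffman_code_lengths(freqs)
    total = sum(freqs.values())
    return sum(f * lengths[w] for w, f in freqs.items()) / total


# Hypothetical toy counts: frequent words get shorter paths.
toy = {"the": 4, "stock": 2, "market": 1, "comment": 1}
print(huffman_code_lengths(toy))  # {'the': 1, 'stock': 2, 'market': 3, 'comment': 3}
print(expected_decisions(toy))    # 1.75 (a full softmax would score all 4 words)
```

For the vocabulary size reported in the review (304,427 words), a balanced binary tree needs about log2(304,427) ≈ 18.2 decisions per prediction, versus 304,427 output scores for a full softmax. Because a Huffman code is an optimal prefix code, its frequency-weighted average path length is at most 19 here, and lower when word frequencies are skewed.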

Information

Published In

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web

May 2015

1602 pages

ISBN:9781450334730

DOI:10.1145/2740908

General Chairs:
Aldo Gangemi, National Research Council, Italy & Paris 13 University-CNRS, France
Stefano Leonardi, Sapienza University of Rome, Italy
Alessandro Panconesi, Sapienza University of Rome, Italy

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Other

Conference

WWW '15

Sponsor:

IW3C2

WWW '15: 24th International World Wide Web Conference

May 18 - 22, 2015

Florence, Italy

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 363
Total Downloads: 4,019
Downloads (Last 12 months): 147
Downloads (Last 6 weeks): 23

Reflects downloads up to 23 Jan 2025

Citations

Moreno-Sandoval L, Pomares-Quimbaya A, Barbosa-Sierra S, Pantoja-Rojas L (2024). Detection of Hate Speech, Racism and Misogyny in Digital Social Networks: Colombian Case Study. Big Data and Cognitive Computing 8:9 (113). DOI: 10.3390/bdcc8090113. Online publication date: 6-Sep-2024.
https://doi.org/10.3390/bdcc8090113

Yadav A, Khan F, Singh V (2024). A Multi-Architecture Approach for Offensive Language Identification Combining Classical Natural Language Processing and BERT-Variant Models. Applied Sciences 14:23 (11206). DOI: 10.3390/app142311206. Online publication date: 1-Dec-2024.
https://doi.org/10.3390/app142311206

Alhazmi A, Mahmud R, Idris N, Mohamed Abo M, Eke C (2024). Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models. PLOS ONE 19:7 (e0305657). DOI: 10.1371/journal.pone.0305657. Online publication date: 17-Jul-2024.
https://doi.org/10.1371/journal.pone.0305657

Badri N, Kboubi F, Habacha Chaibi A (2024). Abusive and Hate speech Classification in Arabic Text Using Pre-trained Language Models and Data Augmentation. ACM Transactions on Asian and Low-Resource Language Information Processing 23:11 (1-28). DOI: 10.1145/3679049. Online publication date: 21-Nov-2024.
https://dl.acm.org/doi/10.1145/3679049

Baruah A, Wahlang L, Jyrwa F, Shadap F, Barbhuiya F, Dey K (2024). Abusive Language Detection in Khasi Social Media Comments. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3664285. Online publication date: 14-May-2024.
https://dl.acm.org/doi/10.1145/3664285

Li L, Fan L, Atreja S, Hemphill L (2024). “HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media. ACM Transactions on the Web 18:2 (1-36). DOI: 10.1145/3643829. Online publication date: 2-Feb-2024.
https://dl.acm.org/doi/10.1145/3643829

Chanda S, Dhaka A, Pal S (2024). Towards Safer Online Spaces: Deep Learning for Hate Speech Detection in Code-Mixed Social Media Conversations. Companion Publication of the 16th ACM Web Science Conference, pp. 103-109. DOI: 10.1145/3630744.3663610. Online publication date: 21-May-2024.
https://dl.acm.org/doi/10.1145/3630744.3663610
