DOI: 10.1145/3126858.3131576
short-paper

Detecting Hate, Offensive, and Regular Speech in Short Comments

Published: 17 October 2017

Abstract

The freedom of expression provided by the Internet also favors malicious groups that spread hateful content, recruit new members, and threaten users. In this context, we propose a new approach to hate speech identification that represents documents with Information Theory quantifiers (entropy and divergence). Unlike frequency-based representations, our approach captures the weighted information of words rather than just their frequency in documents. The results show that our approach outperforms techniques that combine data representations such as TF-IDF and unigrams with text classifiers, achieving F1-scores of 86%, 84%, and 96% for the hate, offensive, and regular speech classes, respectively. Compared to the baselines, our proposal is a win-win solution that improves both efficacy (F1-score) and efficiency (by reducing the dimension of the feature vector). The proposed solution is up to 2.27 times faster than the baseline.
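To make the idea of an entropy-and-divergence document representation concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which divergence, smoothing, or weighting scheme is used, so the Jensen-Shannon divergence, the add-one smoothing, the function names, and the toy tokens are all assumptions made for illustration. Each document is reduced to one entropy value plus one divergence value per class, which shows how such a representation can be far smaller than a vocabulary-sized TF-IDF or unigram vector.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Add-one smoothed relative frequency of each vocabulary word (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (an assumed choice of divergence)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def features(doc_tokens, class_tokens, vocab):
    """Return [entropy of the document, divergence to each class in sorted label order]."""
    p = distribution(doc_tokens, vocab)
    feats = [entropy(p)]
    for label in sorted(class_tokens):
        q = distribution(class_tokens[label], vocab)
        feats.append(js_divergence(p, q))
    return feats


# Hypothetical toy corpus: placeholder tokens stand in for real comments.
class_tokens = {
    "hate": ["a", "a", "b", "c"],
    "offensive": ["b", "b", "d", "d"],
    "regular": ["e", "f", "e", "g"],
}
vocab = sorted({w for toks in class_tokens.values() for w in toks})
vector = features(["e", "f", "g", "e"], class_tokens, vocab)
print(len(vocab), len(vector))  # vocabulary size versus feature-vector size
```

In this sketch the resulting vector would be passed to any standard classifier; the point is only that its length depends on the number of classes, not on the vocabulary size.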


Published In

WebMedia '17: Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web
October 2017
522 pages
ISBN:9781450350969
DOI:10.1145/3126858
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • SBC: Brazilian Computer Society
  • CNPq: Conselho Nacional de Desenvolvimento Cientifico e Tecn
  • CGIBR: Comite Gestor da Internet no Brazil
  • CAPES: Brazilian Higher Education Funding Council

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hate speech
  2. social networks
  3. supervised learning

Qualifiers

  • Short-paper

Conference

Webmedia '17
Webmedia '17: Brazilian Symposium on Multimedia and the Web
October 17 - 20, 2017
Gramado, RS, Brazil

Acceptance Rates

WebMedia '17 Paper Acceptance Rate 38 of 138 submissions, 28%;
Overall Acceptance Rate 270 of 873 submissions, 31%
