DOI:10.1145/3323503.3360619
short-paper

Hate speech detection using Brazilian imageboards

Published: 29 October 2019

Abstract

As internet communication platforms have changed how people interact, hate speech and offensive language have emerged as a contemporary problem. Social networks allow users with different opinions and backgrounds to interact without face-to-face contact. This distance gives users a sense of safety when promoting hate speech, an effect that is even stronger in anonymous environments. Imageboards are sites composed of boards, each aggregating discussion of a different topic. On some boards, anonymous users widely promote hate speech. However, only a few works in the literature have focused on hate speech in imageboard content. This work aims to classify Brazilian Portuguese texts to detect hate speech, using data from the Brazilian 55chan imageboard to build a dataset with hate speech content. Three classifiers were trained for binary hate speech classification. The Linear Support Vector Classifier achieved the best result, with an F1-score of 0.955.
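For readers unfamiliar with the metric reported above, the F1-score of a binary classifier is the harmonic mean of precision and recall on the positive class (here, hate speech). The following is a minimal sketch of that computation in plain Python. The function name `f1_score_binary` and the toy labels are illustrative assumptions; they are not the paper's code or data.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1-score for the `positive` class.

    y_true and y_pred are equal-length sequences of labels.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid division by zero.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only: 1 = hate speech, 0 = not hate speech.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because F1 ignores true negatives, it is less inflated than plain accuracy when one class dominates the dataset, which is a common situation in hate speech corpora.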



Published In

WebMedia '19: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web
October 2019
537 pages
ISBN:9781450367639
DOI:10.1145/3323503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hate speech detection
  2. imageboards
  3. text mining

Qualifiers

  • Short-paper

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
  • Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro
  • Conselho Nacional de Desenvolvimento Científico e Tecnológico

Conference

WebMedia '19
WebMedia '19: Brazilian Symposium on Multimedia and the Web
October 29 - November 1, 2019
Rio de Janeiro, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%
