Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
JA García-Díaz, M Cánovas-García… - Future Generation …, 2021 - Elsevier
Future Generation Computer Systems, 2021, Elsevier
Abstract
Online social networks allow powerless people to gain enormous amounts of control over particular people’s lives and to profit from the anonymity or social distance that the Internet provides in order to harass other people. Women are one of the most frequently targeted groups, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to detect because it can sometimes be very subtle and deep, which means that statistical approaches alone are not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies to detect misogynous messages on Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and divided it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines average word embeddings with linguistic features for classification, in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving a best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection, such as AMI and HatEval, obtaining good results.
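The combination of average word embeddings and linguistic features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy embedding table, the three hand-crafted features, and the names `EMBEDDINGS`, `tokenize`, `average_embedding`, `linguistic_features` and `featurize` are all assumptions made for illustration. A real system would presumably load pretrained Spanish word vectors and use a much richer feature set.

```python
import numpy as np

# Hypothetical toy embedding table (assumption); real vectors would be
# pretrained on Spanish text and have far more dimensions.
EMBEDDINGS = {
    "hola": np.array([0.1, 0.0, 0.2]),
    "mundo": np.array([0.3, 0.4, 0.0]),
}
DIM = 3
PUNCT = ".,;:!?¡¿\"'"
SECOND_PERSON = {"tú", "tu", "te", "ti"}  # illustrative pronoun list


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def average_embedding(tokens):
    """Mean of the vectors of known tokens; zeros if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


def linguistic_features(text, tokens):
    """Three illustrative surface features (assumed, not from the paper)."""
    n_tokens = max(len(tokens), 1)
    letters = [c for c in text if c.isalpha()]
    return np.array([
        text.count("!") / n_tokens,                            # exclamation rate
        sum(c.isupper() for c in letters) / max(len(letters), 1),  # uppercase ratio
        sum(t in SECOND_PERSON for t in tokens) / n_tokens,    # 2nd-person rate
    ])


def featurize(text):
    """Concatenate the averaged embedding with the linguistic features."""
    tokens = tokenize(text)
    return np.concatenate(
        [average_embedding(tokens), linguistic_features(text, tokens)]
    )


x = featurize("¡Hola mundo!")
print(x.shape)
```

The resulting vector could then be passed to any standard classifier. Keeping the linguistic features as separate, named columns is what makes it possible to inspect which of them contribute most to a prediction.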