GERMS-AT: A Sexism/Misogyny Dataset of Forum Comments from an Austrian Online Newspaper

Brigitte Krenn; Johann Petrak; Marina Kubina; Christian Burger

Abstract

This paper presents a sexism/misogyny dataset extracted from the comments of a large online forum of an Austrian newspaper. The comments are in Austrian German, in some cases interspersed with dialectal or English elements. We describe the data collection, the annotation guidelines, and the annotation process. The result is a corpus of approximately 8 000 comments annotated with five levels of sexism/misogyny, ranging from 0 (not sexist/misogynist) to 4 (highly sexist/misogynist). The online newspaper's professional forum moderators (self-identified females and males) were involved as experts in creating the annotation guidelines and in annotating the user comments. In addition, we describe first results of training transformer-based classification models on the corpus, using both binarized and original labels.
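The abstract mentions two label views: the original 0–4 severity scale and a binarized version. The sketch below illustrates how such a mapping and the resulting class counts could be computed. It is a hypothetical illustration, not the authors' code. In particular, the cut point (labels of 1 or higher count as sexist) is an assumption, because the abstract does not state where the binary split is made. The helper names `binarize` and `label_distribution` are likewise illustrative.

```python
from collections import Counter

# Label scale from the abstract: 0 = not sexist/misogynist ... 4 = highly sexist/misogynist.
LEVELS = range(5)


def binarize(label: int, threshold: int = 1) -> int:
    """Map a 0-4 severity label to 0/1.

    ASSUMPTION: labels >= threshold count as sexist. The abstract does not say
    where the binary cut is made; threshold=1 ("any sexism") is a hypothetical choice.
    """
    if label not in LEVELS:
        raise ValueError(f"label must be in 0..4, got {label}")
    return int(label >= threshold)


def label_distribution(labels, binary=False):
    """Count comments per class under either the original or the binarized scheme."""
    mapped = [binarize(lab) if binary else lab for lab in labels]
    return dict(sorted(Counter(mapped).items()))


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from GERMS-AT.
    toy = [0, 0, 1, 3, 4, 0, 2]
    print(label_distribution(toy))               # {0: 3, 1: 1, 2: 1, 3: 1, 4: 1}
    print(label_distribution(toy, binary=True))  # {0: 3, 1: 4}
```

Changing `threshold` (for example to 2) shows how sensitive the binary class balance is to where the cut is placed on the severity scale.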

Anthology ID:: 2024.lrec-main.683
Volume:: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:: LREC | COLING
Publisher:: ELRA and ICCL
Pages:: 7728–7739
URL:: https://aclanthology.org/2024.lrec-main.683
Cite (ACL):: Brigitte Krenn, Johann Petrak, Marina Kubina, and Christian Burger. 2024. GERMS-AT: A Sexism/Misogyny Dataset of Forum Comments from an Austrian Online Newspaper. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7728–7739, Torino, Italia. ELRA and ICCL.
Cite (Informal):: GERMS-AT: A Sexism/Misogyny Dataset of Forum Comments from an Austrian Online Newspaper (Krenn et al., LREC-COLING 2024)
PDF:: https://aclanthology.org/2024.lrec-main.683.pdf
