DOI: 10.1145/3599696.3612900

Sexism in Focus: An Annotated Dataset of YouTube Comments for Gender Bias Research

Published: 04 September 2023

Abstract

This paper presents a novel dataset of 200k YouTube comments from 468 videos across 109 channels in four content categories: Entertainment, Gaming, People & Blogs, and Science & Technology. We apply state-of-the-art NLP methods to augment the dataset with sexism-related features such as sentiment, toxicity, offensiveness, and hate speech. These features can assist manual content analyses and enable automated analysis of sexism on online platforms. Furthermore, we develop an annotation framework inspired by Ambivalent Sexism Theory to promote a nuanced understanding of how comments relate to the gender of content creators. We release a small sample of comments annotated using this framework. Our dataset analysis confirms that female content creators receive more sexist and hateful comments than their male counterparts, underscoring the need for further research and intervention in addressing online sexism.


    Published In

    OASIS '23: Proceedings of the 3rd International Workshop on Open Challenges in Online Social Networks
    September 2023
    53 pages
    ISBN:9798400702259
    DOI:10.1145/3599696
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HT '23

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media