Showing 1–12 of 12 results for author: Stańczak, K

  1. arXiv:2407.10920  [pdf, other]

    cs.CV cs.AI cs.CL

    Benchmarking Vision Language Models for Cultural Understanding

    Authors: Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd van Steenkiste, Lisa Anne Hendricks, Karolina Stańczak, Aishwarya Agrawal

    Abstract: Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering…

    Submitted 18 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2403.10699  [pdf, other]

    cs.CL

    A Multilingual Perspective on Probing Gender Bias

    Authors: Karolina Stańczak

    Abstract: Gender bias represents a form of systematic negative treatment that targets individuals based on their gender. This discrimination can range from subtle sexist remarks and gendered stereotypes to outright hate speech. Prior research has revealed that ignoring online abuse not only affects the individuals targeted but also has broader societal implications. These consequences extend to the discoura…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Ph.D. Thesis

  3. arXiv:2311.18567  [pdf, other]

    cs.CL

    Grammatical Gender's Influence on Distributional Semantics: A Causal Perspective

    Authors: Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell

    Abstract: How much meaning influences gender assignment across languages is an active area of research in modern linguistics and cognitive science. We can view current approaches as aiming to determine where gender assignment falls on a spectrum, from being fully arbitrarily determined to being largely semantically determined. For the latter case, there is a formulation of the neo-Whorfian hypothesis, which…

    Submitted 30 November, 2023; originally announced November 2023.

  4. arXiv:2311.17627  [pdf, other]

    cs.SI cs.CY

    Invisible Women in Digital Diplomacy: A Multidimensional Framework for Online Gender Bias Against Women Ambassadors Worldwide

    Authors: Yevgeniy Golovchenko, Karolina Stańczak, Rebecca Adler-Nissen, Patrice Wangen, Isabelle Augenstein

    Abstract: Despite mounting evidence that women in foreign policy often bear the brunt of online hostility, the extent of online gender bias against diplomats remains unexplored. This paper offers the first global analysis of the treatment of women diplomats on social media. Introducing a multidimensional and multilingual methodology for studying online gender bias, it focuses on three critical elements: gen…

    Submitted 29 November, 2023; originally announced November 2023.

  5. arXiv:2311.09090  [pdf, other]

    cs.CL

    Social Bias Probing: Fairness Benchmarking for Language Models

    Authors: Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein

    Abstract: While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to…

    Submitted 22 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  6. arXiv:2305.12376  [pdf, other]

    cs.CL cs.CY cs.LG

    Measuring Intersectional Biases in Historical Documents

    Authors: Nadav Borenstein, Karolina Stańczak, Thea Rolskov, Natália da Silva Perez, Natacha Klein Käfer, Isabelle Augenstein

    Abstract: Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and tran…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL2023

  7. arXiv:2304.05783  [pdf, other]

    cs.CL

    Measuring Gender Bias in West Slavic Language Models

    Authors: Sandra Martinková, Karolina Stańczak, Isabelle Augenstein

    Abstract: Pre-trained language models have been known to perpetuate biases from the underlying datasets to downstream tasks. However, these findings are predominantly based on monolingual language models for English, whereas there are few investigative studies of biases encoded in language models for languages beyond English. In this paper, we fill this gap by analysing gender bias in West Slavic language m…

    Submitted 25 May, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  8. arXiv:2205.02023  [pdf, other]

    cs.CL

    Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

    Authors: Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan Cotterell, Isabelle Augenstein

    Abstract: The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages even in absence of any explicit supervision. However, it remains unclear how these models learn to generalise across languages. In this work, we conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar. In particular, w…

    Submitted 8 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted at NAACL 2022 (Main Conference)

  9. arXiv:2201.08214  [pdf, other]

    cs.CL

    A Latent-Variable Model for Intrinsic Probing

    Authors: Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein

    Abstract: The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic information. Indeed, it is natural to assume that these pre-trained representations do encode some level of linguistic knowledge as they have brought about large empirical improvements on a wide variety of NLP tasks, which suggests they are learning true linguistic gene…

    Submitted 11 July, 2024; v1 submitted 20 January, 2022; originally announced January 2022.

  10. arXiv:2112.14168  [pdf, other]

    cs.CL cs.CY

    A Survey on Gender Bias in Natural Language Processing

    Authors: Karolina Stanczak, Isabelle Augenstein

    Abstract: Language can be used as a means of reproducing and enforcing harmful stereotypes and biases and has been analysed as such in numerous research. In this paper, we present a survey of 304 papers on gender bias in natural language processing. We analyse definitions of gender and its categories within social sciences and connect them to formal definitions of gender bias in NLP research. We survey lexi…

    Submitted 28 December, 2021; originally announced December 2021.

  11. Quantifying Gender Biases Towards Politicians on Reddit

    Authors: Sara Marjanovic, Karolina Stańczak, Isabelle Augenstein

    Abstract: Despite attempts to increase gender parity in politics, global efforts have struggled to ensure equal female representation. This is likely tied to implicit gender biases against women in authority. In this work, we present a comprehensive study of gender biases that appear in online political discussion. To this end, we collect 10 million comments on Reddit in conversations about male and female…

    Submitted 7 September, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  12. arXiv:2104.07505  [pdf, other]

    cs.CL stat.ML

    Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

    Authors: Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, Ryan Cotterell, Isabelle Augenstein

    Abstract: Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of the…

    Submitted 9 November, 2023; v1 submitted 15 April, 2021; originally announced April 2021.