Esra Dönmez
2024
Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification
Esra Dönmez | Thang Vu | Agnieszka Falenska
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Offensive speech is highly prevalent on online platforms. Being trained on online data, Large Language Models (LLMs) display undesirable behaviors, such as generating harmful text or failing to recognize it. Despite these shortcomings, the models are becoming part of our everyday lives as tools for information search, content creation, writing assistance, and more. Furthermore, research explores using LLMs in applications with immense social risk, such as late-life companions and online content moderators. Despite the potential harms from LLMs in such applications, whether they can reliably identify offensive speech and how they behave when they fail remain open questions. This work addresses these questions by probing sixteen widely used LLMs and showing that most fail to identify (non-)offensive online language. Our experiments reveal undesirable behavior patterns in the context of offensive speech detection, such as erroneous response generation, over-reliance on profanity, and failure to recognize stereotypes. Our work highlights the need for extensive documentation of model reliability, particularly with respect to the ability to detect offensive language.
2023
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
Esra Dönmez | Pascal Tilli | Hsiu-Yu Yang | Ngoc Thang Vu | Carina Silberer
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
Image-Text Matching (ITM) is one of the de facto methods for learning generalized representations from large corpora in Vision and Language (VL). However, due to the weak association between web-collected image–text pairs, models fail to show fine-grained understanding of the combined semantics of the two modalities. To this end, we propose Hard Negative Captions (HNC): an automatically created dataset containing foiled hard negative captions for ITM training, aimed at achieving fine-grained cross-modal comprehension in VL. Additionally, we provide a challenging manually created test set for benchmarking models on fine-grained cross-modal mismatches with varying levels of compositional complexity. Our results show the effectiveness of training on HNC: it improves the models’ zero-shot capabilities in detecting mismatches on diagnostic tasks and helps them perform robustly under noisy visual input. We also demonstrate that HNC models yield comparable or better initializations for fine-tuning. Our code and data are publicly available.