ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection

Herrlein, Janek; Hung, Chia-Chien; Glavaš, Goran


arXiv:2407.13702 (cs)

[Submitted on 18 Jul 2024]



Abstract: Research on token-level reference-free hallucination detection has predominantly focused on English, primarily due to the scarcity of robust datasets in other languages. This has hindered systematic investigations into the effectiveness of cross-lingual transfer for this important NLP application. To address this gap, we introduce ANHALTEN, a new evaluation dataset that extends the English hallucination detection dataset to German. To the best of our knowledge, this is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection. ANHALTEN contains gold annotations in German that are parallel (i.e., directly comparable to the original English instances). We benchmark several prominent cross-lingual transfer approaches, demonstrating that larger context length leads to better hallucination detection in German, even without succeeding context. Importantly, we show that the sample-efficient few-shot transfer is the most effective approach in most setups. This highlights the practical benefits of minimal annotation effort in the target language for reference-free hallucination detection. Aiming to catalyze future research on cross-lingual token-level reference-free hallucination detection, we make ANHALTEN publicly available: this https URL

Comments:	ACL 2024 Student Research Workshop
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.13702 [cs.CL]
	(or arXiv:2407.13702v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.13702

Submission history

From: Chia-Chien Hung
[v1] Thu, 18 Jul 2024 17:01:38 UTC (7,724 KB)
