Open Conversational LLMs do not know most Spanish words

Conde, Javier; González, Miguel; Melero, Nina; Ferrando, Raquel; Martínez, Gonzalo; Merino-Gómez, Elena; Hernández, José Alberto; Reviriego, Pedro

doi:10.26342/2024-73-7


arXiv:2403.15491 (cs)

[Submitted on 21 Mar 2024 (v1), last revised 24 Sep 2024 (this version, v2)]


Authors: Javier Conde, Miguel González, Nina Melero, Raquel Ferrando, Gonzalo Martínez, Elena Merino-Gómez, José Alberto Hernández, Pedro Reviriego


Abstract: The growing interest in Large Language Models (LLMs), and in particular in conversational models with which users can interact, has led to the development of a large number of open-source chat LLMs. These models are evaluated on a wide range of benchmarks that assess their ability to answer questions or solve problems on almost any topic, or to reason about and interpret texts. In contrast, the evaluation of the models' knowledge of the languages themselves, for example which words they can recognize and use in each language, has received much less attention. In this paper, we evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words from a reference dictionary. The results show that open-source chat LLMs produce incorrect meanings for a significant fraction of the words and are unable to use most of the words correctly when writing sentences with context. These results show how Spanish is being left behind in the open-source LLM race and highlight the need to push for linguistic fairness in conversational LLMs, ensuring that they provide similar performance across languages.
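The sampling-and-scoring idea summarized in the abstract (draw words from a reference dictionary, then measure what fraction of a model's answers are wrong) can be illustrated with a minimal Python sketch. This is not the authors' code: uniform sampling without replacement, the function names `sample_words` and `grade_fractions`, and the grade labels `"correct"`/`"incorrect"` are hypothetical. How each model answer is graded is not modeled; the grades are assumed to be given.

```python
import random
from collections import Counter


def sample_words(dictionary_entries, k, seed=0):
    """Draw k distinct headwords uniformly at random (without replacement).

    Hypothetical helper: the entries are sorted first so that the same seed
    yields the same sample whether a list or a set is passed in.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(dictionary_entries), k)


def grade_fractions(grades):
    """Map each grade label to the fraction of sampled words that received it.

    `grades` is a hypothetical dict {word: label}; the labels themselves are
    assumptions, not the categories used in the paper.
    """
    total = len(grades)
    if total == 0:
        return {}
    counts = Counter(grades.values())
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical grades for four sampled words (illustrative only).
    grades = {
        "abarcar": "correct",
        "cachivache": "incorrect",
        "desasosiego": "incorrect",
        "escabullir": "incorrect",
    }
    print(grade_fractions(grades))  # → {'correct': 0.25, 'incorrect': 0.75}
```

Sorting before sampling is a design choice for reproducibility: `random.sample` no longer accepts sets in recent Python versions, and a fixed order makes a seeded sample repeatable across runs.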

Comments:	Procesamiento del Lenguaje Natural, 73, 95-108
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.15491 [cs.CL]
	(or arXiv:2403.15491v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.15491
Journal reference:	Procesamiento del Lenguaje Natural, n. 73, 2024. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6603
Related DOI:	https://doi.org/10.26342/2024-73-7

Submission history

From: Javier Conde
[v1] Thu, 21 Mar 2024 15:41:02 UTC (1,192 KB)
[v2] Tue, 24 Sep 2024 13:25:01 UTC (572 KB)
