Iakes Goenaga


2022

pdf bib
Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet | Iakes Goenaga | Jordi Armengol-Estapé | Olatz Perez-de-Viñaspre | Carla Parra Escartín | Marina Sanchez | Mārcis Pinnis | Gorka Labaka | Maite Melero
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this work, we present the work that has been carried on in the MT4All CEF project and the resources that it has generated by leveraging recent research carried out in the field of unsupervised learning. In the course of the project 18 monolingual corpora for specific domains and languages have been collected, and 12 bilingual dictionaries and translation models have been generated. As part of the research, the unsupervised MT methodology based only on monolingual corpora (Artetxe et al., 2017) has been tested on a variety of languages and domains. Results show that in specialised domains, when there is enough monolingual in-domain data, unsupervised results are comparable to those of general domain supervised translation, and that, at any rate, unsupervised techniques can be used to boost results whenever very little data is available.

2019

pdf bib
IxaMed at PharmacoNER Challenge 2019
Xabier Lahuerta | Iakes Goenaga | Koldo Gojenola | Aitziber Atutxa Salazar | Maite Oronoz
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

The aim of this paper is to present our approach (IxaMed) in the PharmacoNER 2019 task. The task consists of identifying chemical, drug, and gene/protein mentions from clinical case studies written in Spanish. The evaluation of the task is divided in two scenarios: one corresponding to the detection of named entities and one corresponding to the indexation of named entities that have been previously identified. In order to identify named entities we have made use of a Bi-LSTM with a CRF on top in combination with different types of word embeddings. We have achieved our best result (86.81 F-Score) combining pretrained word embeddings of Wikipedia and Electronic Health Records (50M words) with contextual string embeddings of Wikipedia and Electronic Health Records. On the other hand, for the indexation of the named entities we have used the Levenshtein distance obtaining a 85.34 F-Score as our best result.

2013

pdf bib
Exploiting the Contribution of Morphological Information to Parsing: the BASQUE TEAM system in the SPRML‘2013 Shared Task
Iakes Goenaga | Koldo Gojenola | Nerea Ezeiza
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf bib
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

pdf bib
Combining Rule-Based and Statistical Syntactic Analyzers
Iakes Goenaga | Koldobika Gojenola | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Kepa Bengoetxea
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages