Domain Adaptation of Neural Machine Translation by Lexicon Induction

Hu, Junjie; Xia, Mengzhou; Neubig, Graham; Carbonell, Jaime

arXiv:1906.00376 (cs)

[Submitted on 2 Jun 2019]

Abstract: It has been previously noted that neural machine translation (NMT) is very sensitive to domain shift. In this paper, we argue that this is a dual effect of the highly lexicalized nature of NMT, resulting in failure for sentences with large numbers of unknown words, and lack of supervision for domain-specific words. To remedy this problem, we propose an unsupervised adaptation method which fine-tunes a pre-trained out-of-domain NMT model using a pseudo-in-domain corpus. Specifically, we perform lexicon induction to extract an in-domain lexicon, and construct a pseudo-parallel in-domain corpus by performing word-for-word back-translation of monolingual in-domain target sentences. In five domains over twenty pairwise adaptation settings and two model architectures, our method achieves consistent improvements without using any in-domain parallel sentences, improving up to 14 BLEU over unadapted models, and up to 2 BLEU over strong back-translation baselines.
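The corpus-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lowercasing, the choice to copy unknown tokens through unchanged, and the toy English-to-German lexicon are all assumptions made for illustration. The sketch assumes the induced lexicon is already available as a mapping from target-language words to source-language words.

```python
from typing import Dict, List, Tuple


def word_for_word_back_translate(
    target_sentence: str,
    lexicon: Dict[str, str],
    copy_unknown: bool = True,
) -> str:
    """Map each target-language token to a source-language token.

    `lexicon` maps target words to source words (hypothetical format).
    Tokens missing from the lexicon are copied unchanged when
    `copy_unknown` is True, and dropped otherwise (an assumption).
    """
    source_tokens: List[str] = []
    for token in target_sentence.split():
        translation = lexicon.get(token.lower())
        if translation is not None:
            source_tokens.append(translation)
        elif copy_unknown:
            source_tokens.append(token)
    return " ".join(source_tokens)


def build_pseudo_parallel_corpus(
    target_sentences: List[str],
    lexicon: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair each pseudo-source sentence with its real target sentence."""
    return [
        (word_for_word_back_translate(sentence, lexicon), sentence)
        for sentence in target_sentences
    ]


# Toy target-to-source lexicon, invented for this example only.
toy_lexicon = {
    "the": "die",
    "dose": "dosis",
    "was": "wurde",
    "reduced": "reduziert",
}

monolingual_target = ["the dose was reduced", "the dose of aspirin"]
corpus = build_pseudo_parallel_corpus(monolingual_target, toy_lexicon)
for pseudo_source, target in corpus:
    print(pseudo_source, "|||", target)
```

Each resulting pair couples a synthetic source sentence with a genuine in-domain target sentence, which is the kind of pseudo-parallel data the abstract says is used to fine-tune the out-of-domain model. Word order follows the target sentence, since no reordering is performed in this sketch.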

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1906.00376 [cs.CL]
	(or arXiv:1906.00376v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.00376
Journal reference:	Published at the 57th Annual Meeting of the Association for Computational Linguistics (ACL), July 2019

Submission history

From: Junjie Hu
[v1] Sun, 2 Jun 2019 09:50:12 UTC (112 KB)
