Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Weninger, Felix; Andrés-Ferrer, Jesús; Li, Xinwei; Zhan, Puming

arXiv:1907.04916 (eess)

[Submitted on 8 Jul 2019]

Abstract: Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performance while having clear advantages in terms of simplicity. However, comparisons are mostly done on speaker-independent (SI) ASR systems, even though speaker-adapted conventional systems are commonly used in practice to improve robustness to speaker and environment variations. In this paper, we apply speaker adaptation to seq2seq models with the goal of matching the performance of conventional ASR adaptation. Specifically, we investigate Kullback-Leibler divergence (KLD) as well as Linear Hidden Network (LHN) based adaptation for seq2seq ASR, using different amounts (up to 20 hours) of adaptation data per speaker. Our SI models are trained on large amounts of dictation data and achieve state-of-the-art results. We obtain a 25% relative word error rate (WER) improvement with KLD adaptation of the seq2seq model, compared with an 18.7% gain from acoustic model adaptation in the conventional system. We also show that the WER of the seq2seq model decreases log-linearly with the amount of adaptation data. Finally, we analyze adaptation based on the minimum WER criterion and adaptation of the language model (LM) for score fusion with the speaker-adapted seq2seq model, both of which result in further improvements of the seq2seq system performance.

Comments:	To appear in INTERSPEECH 2019
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:1907.04916 [eess.AS]
	(or arXiv:1907.04916v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1907.04916

Submission history

From: Felix Weninger
[v1] Mon, 8 Jul 2019 15:09:40 UTC (209 KB)
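
The abstract names KLD-based adaptation but does not spell out the objective. As a rough illustration only, the sketch below shows the commonly used KLD-regularized formulation, in which the adapted model is trained with cross-entropy against a target that interpolates the reference label with the SI model's output distribution. Everything here is an assumption for illustration and not the authors' implementation: the function names, the per-token application over the output vocabulary, and the interpolation weight `rho` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kld_adaptation_loss(adapted_logits, si_logits, target_index, rho):
    """Per-token KLD-regularized adaptation loss (illustrative sketch).

    Cross-entropy between the adapted model's distribution and the
    interpolated target (1 - rho) * one_hot(target) + rho * p_SI.
    rho = 0 gives plain cross-entropy fine-tuning; rho = 1 keeps the
    adapted model as close as possible to the SI model.
    """
    p_adapted = softmax(adapted_logits)
    p_si = softmax(si_logits)
    loss = 0.0
    for k, (p_a, p_s) in enumerate(zip(p_adapted, p_si)):
        one_hot = 1.0 if k == target_index else 0.0
        target = (1.0 - rho) * one_hot + rho * p_s
        loss -= target * math.log(p_a)
    return loss


if __name__ == "__main__":
    # Hypothetical logits over a 3-symbol output vocabulary.
    adapted = [2.0, 0.5, -1.0]
    speaker_independent = [1.0, 1.0, 0.0]
    for rho in (0.0, 0.5, 1.0):
        print(rho, kld_adaptation_loss(adapted, speaker_independent, 0, rho))
```

Minimizing this interpolated cross-entropy differs from minimizing the label cross-entropy plus a weighted KL divergence from the SI distribution only by a term that does not depend on the adapted model, which is why the regularizer can be implemented as a simple change of training targets.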