Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

Huang, W. Ronny; Zhang, Hao; Kumar, Shankar; Chang, Shuo-yiin; Sainath, Tara N.

arXiv:2305.18419 (cs)

[Submitted on 28 May 2023]

Abstract: We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text; but unfortunately, spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.
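
As a rough illustration of the idea that punctuation in written text marks sentence boundaries, the sketch below turns a punctuated string into unpunctuated words plus per-word end-of-sentence labels, then cuts the word stream at those labels. It is not the authors' implementation: the function names `boundary_labels` and `segment`, the set of sentence-final marks, and the whitespace tokenization are all assumptions made for this example, and the paper's teacher LM and distilled student segmenter are not modeled here.

```python
# Hypothetical sketch: derive sentence-boundary labels from punctuated text.
# Assumption: ".", "?" and "!" are the marks that end a semantic segment.
SENTENCE_END = {".", "?", "!"}
EDGE_PUNCT = ".,?!;:\"'()"


def boundary_labels(punctuated_text):
    """Return (words, labels); labels[i] is 1 if words[i] ends a sentence."""
    words, labels = [], []
    for token in punctuated_text.split():
        # Strip edge punctuation to mimic an unpunctuated speech transcript.
        core = token.strip(EDGE_PUNCT)
        if not core:
            continue
        # Whatever punctuation trailed the word decides the label.
        trailing = token[len(token.rstrip(EDGE_PUNCT)):]
        words.append(core.lower())
        labels.append(1 if any(ch in SENTENCE_END for ch in trailing) else 0)
    return words, labels


def segment(words, labels):
    """Split the word stream after every word labeled as a boundary."""
    segments, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label == 1:
            segments.append(current)
            current = []
    if current:
        # Keep any trailing words that never reached a boundary.
        segments.append(current)
    return segments


if __name__ == "__main__":
    w, y = boundary_labels("Hello there. How are you? Fine, thanks")
    print(segment(w, y))
```

In a setup like the one the abstract describes, labels of this kind would come from a teacher model's predictions on unpunctuated transcripts rather than from punctuation already present in the text; the sketch only shows the shape of the word and label sequences.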

Comments:	Interspeech 2023. First 3 authors contributed equally
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.18419 [cs.CL]
	(or arXiv:2305.18419v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.18419

Submission history

From: Wenqian Ronny Huang
[v1] Sun, 28 May 2023 19:31:45 UTC (1,043 KB)
