Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.

[2210.17027] Joint Pre-Training with Speech and Bilingual Text for Direct ...

Oct 31, 2022 · We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation ...

Joint Pre-Training with Speech and Bilingual Text for ... - IEEE Xplore

ieeexplore.ieee.org › iel7

This paper proposes a novel pre-training method with unlabeled speech and paired text data for direct speech to speech translation. The core of the proposed ...

[PDF] Joint Pre-Training with Speech and Bilingual Text for Direct Speech ...

www.semanticscholar.org › paper › Joint...

A Speech2S model is proposed, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks, ...

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to ...

www.researchgate.net › publication › 36...

Sep 14, 2024 · To address this issue, we propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for ...

inspect - arxiv-sanity

www.arxiv-sanity-lite.com › inspect

To address this issue, we propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to ...

Speech-to-Speech Translation | Papers With Code

paperswithcode.com › task › speech-to-s...

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language.

[PDF] Direct Speech-to-Speech Translation With Discrete Units

www.semanticscholar.org › paper › Dire...

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation · Computer Science. ICASSP 2023 - 2023 IEEE International Conference…

mSLAM: Massively multilingual joint pre-training for speech and text - arXiv

arxiv.org › cs

Feb 3, 2022 · We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly.

[PDF] Unified Speech-Text Pre-training for Speech Translation and Recognition

aclanthology.org › 2022.acl-long.1...

May 22, 2022 · Abstract. We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and.

Toward Joint Language Modeling for Speech Units and Text - AI at Meta

ai.meta.com › research › publications › t...

Aug 1, 2024 · In this study, we compare the training dynamics of a system using a pretrained encoder, the conventional approach, and one trained from scratch.