Feb 22, 2023 · This paper investigates this correlation and proposes a cross-modal speech co-learning paradigm.
Experimental results on the test scenarios demonstrate that our proposed method achieves around 60% and 20% average relative performance improvement over ...
This paper proposes a cross-modal speech co-learning paradigm based on an audio-visual pseudo-siamese structure to learn the modality-transformed ...
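The snippet above mentions an audio-visual pseudo-siamese structure, but its exact layers are not given here. The following is only a rough PyTorch sketch, under assumed layer sizes and module names, of a generic two-branch pseudo-siamese embedding model (same branch structure, separate weights per modality, shared embedding space); it is not the paper's architecture.

```python
# Hypothetical sketch only: a generic two-branch "pseudo-siamese" embedding
# model for audio and visual (lip) inputs. All layer sizes and names are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn


class BranchEncoder(nn.Module):
    """Frame-level encoder followed by temporal average pooling."""
    def __init__(self, in_dim: int, emb_dim: int = 256):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.proj = nn.Linear(512, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim)
        h = self.frontend(x)   # (batch, time, 512)
        h = h.mean(dim=1)      # temporal pooling -> (batch, 512)
        return self.proj(h)    # (batch, emb_dim)


class PseudoSiameseAV(nn.Module):
    """Two branches with identical structure but separate weights,
    producing embeddings in a shared space."""
    def __init__(self, audio_dim: int = 80, visual_dim: int = 512, emb_dim: int = 256):
        super().__init__()
        self.audio_branch = BranchEncoder(audio_dim, emb_dim)
        self.visual_branch = BranchEncoder(visual_dim, emb_dim)

    def forward(self, audio_feats: torch.Tensor, visual_feats: torch.Tensor):
        return self.audio_branch(audio_feats), self.visual_branch(visual_feats)


if __name__ == "__main__":
    model = PseudoSiameseAV()
    audio = torch.randn(4, 200, 80)   # e.g. 200 frames of 80-dim filterbanks
    visual = torch.randn(4, 50, 512)  # e.g. 50 frames of lip-region features
    a_emb, v_emb = model(audio, visual)
    print(a_emb.shape, v_emb.shape)   # torch.Size([4, 256]) for each branch
```

With both branches mapping into the same embedding dimension, the modality-transformed representations can be compared or supervised jointly, which is the general idea behind letting one modality borrow knowledge from the other.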
Feb 22, 2023 · The primary motivation of our cross-modal co-learning method is modeling one modality aided by exploiting knowledge from another modality.
Jul 17, 2023 · Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.
Simulated reverberation [3], additive noise, and SpecAugment [4] are effective methods for augmenting data in speaker verification. These techniques can expand ...
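As a minimal sketch of the augmentations mentioned above, the snippet below mixes in additive noise at a target SNR and applies SpecAugment-style frequency and time masking with torchaudio. The SNR value, mask widths, and feature settings are illustrative choices, not settings from the paper.

```python
# Illustrative augmentation sketch: additive noise at a chosen SNR plus
# SpecAugment-style masking. Parameter values are assumptions.
import torch
import torchaudio


def add_noise(speech: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Mix noise into speech at a target signal-to-noise ratio (in dB)."""
    noise = noise[..., : speech.shape[-1]]
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-10)
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


# SpecAugment-style frequency and time masking applied to a spectrogram.
spec_augment = torch.nn.Sequential(
    torchaudio.transforms.FrequencyMasking(freq_mask_param=8),
    torchaudio.transforms.TimeMasking(time_mask_param=20),
)

if __name__ == "__main__":
    speech = torch.randn(1, 16000)   # 1 s of placeholder 16 kHz audio
    noise = torch.randn(1, 16000)
    noisy = add_noise(speech, noise, snr_db=10.0)

    fbank = torchaudio.transforms.MelSpectrogram(n_mels=80)(noisy)
    augmented = spec_augment(fbank)
    print(noisy.shape, augmented.shape)
```

Simulated reverberation would typically be added by convolving the waveform with a recorded room impulse response before feature extraction; it is omitted here to keep the sketch short.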
Introduction. This is the official implementation of the ICASSP 2023 paper "Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification" ...
Jun 21, 2024 · When individual modalities are corrupted, even state-of-the-art face and speaker verification models fail to retain robust performance.
Dec 13, 2023 · ABSTRACT. In this paper, we introduce a large-scale and high-quality audio-visual speaker verification dataset, named VoxBlink.