Speaker diarization using deep recurrent convolutional neural networks for speaker embeddings

P Cyrta, T Trzciński, W Stokowiec - International Conference on …, 2017 - Springer

International Conference on Information Systems Architecture and Technology, 2017 • Springer

Abstract

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to traditional approaches, which build their speaker embeddings from hand-crafted spectral features, we train a recurrent convolutional neural network applied directly to magnitude spectrograms. To compare our approach with the state of the art, we collect and publicly release an additional dataset of over 6 h of fully annotated broadcast material. Our evaluation on the new dataset and three other benchmark datasets shows that the proposed method significantly outperforms the competitors and reduces the diarization error rate by over 30% with respect to the baseline.
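The abstract reports its result as a reduction in diarization error rate (DER) with respect to a baseline. As a rough illustration of how such a figure is usually read, the sketch below computes DER in its common form (missed speech plus false alarm plus speaker confusion, divided by total reference speech time) and then a relative reduction between two systems. The function names and all durations are hypothetical and invented for illustration; the abstract does not state the paper's scoring setup or whether the 30% figure is relative, so this is an assumption, not the authors' evaluation code.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER: the share of reference speech time that is scored as an error.

    All arguments are durations in seconds. This is the commonly used
    definition, assumed here; the abstract does not spell it out.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Return the relative DER reduction of a new system over a baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations (seconds), made up for this example only.
baseline = diarization_error_rate(30.0, 20.0, 50.0, 500.0)
proposed = diarization_error_rate(20.0, 15.0, 30.0, 500.0)
reduction = relative_reduction(baseline, proposed)

print(f"baseline DER: {baseline:.3f}")
print(f"proposed DER: {proposed:.3f}")
print(f"relative reduction: {reduction:.1%}")
```

With these invented numbers the proposed system's DER is lower than the baseline's by more than 30% in relative terms, which is the kind of comparison the abstract appears to describe.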


