research-article

On the vulnerability of speaker verification to realistic voice spoofing

Authors:

Serife Kucur Ergunay,

Alexandros Lazaridis,

Sebastien Marcel

2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS)

Pages 1 - 6

https://doi.org/10.1109/BTAS.2015.7358783

Published: 08 September 2015

Abstract

Automatic speaker verification (ASV) systems are subject to various kinds of malicious attacks. Replay, voice conversion and speech synthesis attacks drastically degrade the performance of a standard ASV system by increasing its false acceptance rates. This issue has raised a high level of interest in the speech research community, where possible voice spoofing attacks and their countermeasures have been investigated. However, much less effort has been devoted to creating realistic and diverse spoofing attack databases that allow researchers to evaluate their countermeasures correctly. Existing studies are incomplete in terms of the types of attacks they cover, and they are often difficult to reproduce because public databases are unavailable. In this paper, we introduce the voice spoofing dataset of AVspoof, a public audio-visual spoofing database. AVspoof includes ten realistic spoofing threats generated using replay, speech synthesis and voice conversion. In addition, we provide a set of experimental results that show the effect of such attacks on current state-of-the-art ASV systems.
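To make "increasing its false acceptance rates" concrete, here is a minimal sketch of one common way such vulnerability is measured. It assumes that higher scores mean "more likely the claimed speaker," that the decision threshold is fixed at the equal error rate (EER) point on licit trials (genuine vs. zero-effort impostors), and that spoofed trials are then scored against that same threshold. The function names and the toy scores are hypothetical and illustrative only; they are not the paper's protocol, code, or results.

```python
from typing import List, Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at `threshold`; a trial is accepted when score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def eer_threshold(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER threshold: the observed score where |FAR - FRR| is smallest."""
    candidates: List[float] = sorted(set(genuine) | set(impostor))

    def gap(t: float) -> float:
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    # min() keeps the first (lowest) candidate on ties, so the result is deterministic.
    return min(candidates, key=gap)


def spoofing_impact(genuine: Sequence[float], zero_effort: Sequence[float],
                    spoofs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fix the threshold on licit trials, then report FAR for zero-effort and spoofed impostors."""
    t = eer_threshold(genuine, zero_effort)
    far_zero, frr = far_frr(genuine, zero_effort, t)
    far_spoof, _ = far_frr(genuine, spoofs, t)
    return t, far_zero, frr, far_spoof


if __name__ == "__main__":
    # Hypothetical toy scores for illustration only (not data from the paper).
    genuine = [2.1, 2.5, 1.8, 3.0, 2.7, 1.2]
    zero_effort = [-1.0, -0.5, 0.3, 1.5, -2.0, 0.0]
    spoofs = [1.9, 2.4, 1.4, 2.8, 0.9, 2.2]
    t, far_zero, frr, far_spoof = spoofing_impact(genuine, zero_effort, spoofs)
    print(f"threshold={t} FAR(zero-effort)={far_zero:.3f} "
          f"FRR={frr:.3f} FAR(spoof)={far_spoof:.3f}")
```

With these toy scores the threshold lands at 1.5, where FAR and FRR on licit trials are both 1/6, while 4 of the 6 spoofed trials are accepted. Keeping the threshold fixed on licit data is the design choice that isolates the effect of the attack: the system is tuned as it would be in deployment, and only the impostor population changes.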

Cited By

Walker P, Zhang T, Shi C, Saxena N, Chen Y, Boureanu I, Schneider S, Reaves B, Tippenhauer N (2023). BarrierBypass: Out-of-Sight Clean Voice Command Injection Attacks through Physical Barriers. Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, pp. 203-214. DOI: 10.1145/3558482.3581772. Online publication date: 29-May-2023.
https://dl.acm.org/doi/10.1145/3558482.3581772

Kwak I, Choi S, Yang J, Lee Y, Han S, Oh S, Tao J, Li H, Meng H, Yu D, Akagi M, Yi J, Fan C, Fu R, Lian S, Zhang P (2022). Low-quality Fake Audio Detection through Frequency Feature Masking. Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, pp. 9-17. DOI: 10.1145/3552466.3556533. Online publication date: 14-Oct-2022.
https://dl.acm.org/doi/10.1145/3552466.3556533

Zhao C, Li Z, Ding H, Xi W, Wang G, Zhao J (2021). Anti-Spoofing Voice Commands. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(3), pp. 1-22. DOI: 10.1145/3478116. Online publication date: 14-Sep-2021.
https://dl.acm.org/doi/10.1145/3478116

Information

Published In

2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS)

Sep 2015

421 pages

Copyright © 2015.

Publisher

IEEE Press
