research-article

Spoofing and countermeasures for speaker verification

Published: 01 February 2015

Abstract

While biometric authentication has advanced significantly in recent years, evidence shows the technology can be susceptible to malicious spoofing attacks. The research community has responded with dedicated countermeasures which aim to detect and deflect such attacks. Even if the literature shows that they can be effective, the problem is far from being solved; biometric systems remain vulnerable to spoofing. Despite a growing momentum to develop spoofing countermeasures for automatic speaker verification, now that the technology has matured sufficiently to support mass deployment in an array of diverse applications, greater effort will be needed in the future to ensure adequate protection against spoofing. This article provides a survey of past work and identifies priority research directions for the future. We summarise previous studies involving impersonation, replay, speech synthesis and voice conversion spoofing attacks and more recent efforts to develop dedicated countermeasures. The survey shows that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.


Published In

Speech Communication  Volume 66, Issue C
February 2015
243 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Anti-Spoofing
  2. Automatic speaker verification
  3. Countermeasure
  4. Security
  5. Spoofing attack
