research-article

An adaptive transmission line cochlear model based front-end for replay attack detection

Published: 01 September 2021

Highlights

We propose an adaptive transmission line cochlear model for use in speech front-ends.
The proposed adaptive elements of the cochlear model lead to improved frequency selectivity and dynamic range compression.
The model helps capture low-amplitude channel characteristics that aid in replay detection.

Abstract

The cochlea is a remarkable spectrum analyser with desirable properties, including sharp frequency tuning and level-dependent compression. This paper investigates the potential advantages of incorporating these characteristics in a speech processing front-end and develops a framework for an active transmission line cochlear model employing adaptive notch and resonant filters. The proposed model reproduces the observed asymmetric auditory filter shape, with a sharp high-frequency roll-off, as well as level-dependent nonlinear dynamic range compression. Experimental analysis demonstrates that the sharp frequency tuning and dynamic range compression of the proposed model lead to an enhanced spectral representation compared with other spectral analysis methods. The proposed model was employed in the front-end of replay spoofing attack detection systems, and experiments on the ASVspoof 2017 version 2.0 and ASVspoof 2019 databases demonstrate that it outperforms linear and nonlinear level-dependent parallel filter bank auditory models and classical spectro-temporal front-ends. On the evaluation datasets, the proposed model yields relative improvements of 45.6%, 51.9% and 60.8% over the baseline CQCC features of ASVspoof 2017 version 2.0 and the baseline CQCC and LFCC features of ASVspoof 2019, respectively.
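
The relative improvements quoted in the abstract are percentage reductions with respect to each baseline. The following is a minimal sketch of that arithmetic, assuming a lower-is-better error metric such as equal error rate (the abstract does not name the metric). The function name `relative_improvement` and the numbers used are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical values for illustration only; not taken from the paper.
example = relative_improvement(baseline=20.0, proposed=10.0)
print(f"{example:.1f}% relative improvement")
```

Under this definition, halving the baseline error corresponds to a 50% relative improvement, whatever the absolute error level.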

Published In

Speech Communication, Volume 132, Issue C
Sep 2021
146 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Active cochlea model
2. Sharp frequency tuning
3. Dynamic range compression
4. Level dependent nonlinearity
5. Spoofing attack
6. Speaker verification
