Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model

Published: 01 March 2008

Abstract

In this paper, we describe a statistical approach to both an articulatory-to-acoustic mapping and an acoustic-to-articulatory inversion mapping that does not use phonetic information. The joint probability density of an articulatory parameter and an acoustic parameter is modeled with a Gaussian mixture model (GMM) trained on a parallel acoustic-articulatory speech database. We apply to both mappings the GMM-based mapping under the minimum mean-square error (MMSE) criterion, which was originally proposed for voice conversion. Moreover, to improve the mapping performance, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method. A target parameter trajectory with appropriate static and dynamic properties is determined by imposing an explicit relationship between static and dynamic features in the MLE-based mapping. Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping, in both the articulatory-to-acoustic mapping and the inversion mapping.
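
The MMSE-based mapping mentioned in the abstract can be illustrated with a minimal sketch. It assumes scalar source and target features and a joint GMM whose parameters are already known; the function names, dictionary keys, and toy numbers below are hypothetical and are not taken from the paper, which works with feature vectors and trained models. The estimate is the posterior-weighted sum of each component's conditional mean of the target given the source.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_map(x, components):
    """Return E[y | x] under a joint GMM over scalar (x, y).

    Each component is a dict with hypothetical keys:
      w     mixture weight
      mu_x  mean of the source feature
      mu_y  mean of the target feature
      s_xx  variance of the source feature
      s_yx  covariance between target and source
    """
    # Posterior probability of each component given the source value x.
    likes = [c["w"] * gaussian_pdf(x, c["mu_x"], c["s_xx"]) for c in components]
    total = sum(likes)
    posteriors = [like / total for like in likes]

    # Conditional mean of y given x within each component (linear in x).
    cond_means = [
        c["mu_y"] + c["s_yx"] / c["s_xx"] * (x - c["mu_x"]) for c in components
    ]

    # MMSE estimate: posterior-weighted sum of the conditional means.
    return sum(p * m for p, m in zip(posteriors, cond_means))


# Toy two-component model with made-up parameters (illustration only).
toy_gmm = [
    {"w": 0.5, "mu_x": -1.0, "mu_y": 0.0, "s_xx": 1.0, "s_yx": 0.5},
    {"w": 0.5, "mu_x": 1.0, "mu_y": 2.0, "s_xx": 1.0, "s_yx": -0.5},
]

estimate = mmse_map(0.0, toy_gmm)
```

Because this estimate is computed frame by frame, it does not by itself constrain how the output evolves over time; the abstract's MLE-based mapping with dynamic features addresses that, and it is not covered by this sketch.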

Published In

Speech Communication, Volume 50, Issue 3
March 2008
101 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. 43.70.+i
  2. Acoustic-to-articulatory inversion mapping
  3. Articulatory-to-acoustic mapping
  4. Dynamic features
  5. GMM
  6. MMSE

Qualifiers

  • Article
