
Real-time 3D talking head from a synthetic viseme dataset

Published: 14 December 2009

Abstract

In this paper, we describe a simple and fast way to build a 3D talking head that can be used in any application requiring an audiovisual speech animation system. The talking head is constructed from a synthetic 3D viseme dataset, which is sufficiently realistic and can be generated with standard 3D modeling software. To build the talking head, the viseme dataset is first analyzed statistically to obtain the optimal linear parameters for controlling the movements of the lips and jaw of the 3D head model. These parameters correspond to some of the low-level MPEG-4 FAPs, so our method can also be used to extract the speech-relevant MPEG-4 FAPs from a dataset of phonemes/visemes. The parameterized head model is then combined with a Text-to-Speech (TTS) system to synthesize audiovisual speech from a given text. To make the talking head look more realistic, eye blinks and eye movements are also animated during speech. We implemented this work in an interactive text-to-audio-visual speech system.
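The statistical analysis of the viseme dataset can be sketched as a principal component analysis (one of the paper's author tags) over flattened lip/jaw vertex coordinates; a few principal modes then serve as linear control parameters. This is a minimal illustrative sketch, not the authors' implementation: the viseme matrix here is random stand-in data, and all names (`visemes`, `components`, the 95% variance threshold) are hypothetical.

```python
import numpy as np

# Hypothetical viseme dataset: each row is one viseme, i.e. the
# flattened 3D vertex coordinates of the lip/jaw region of a head model.
rng = np.random.default_rng(0)
visemes = rng.normal(size=(14, 30))  # 14 visemes, 10 vertices x 3 coords

# Center the data on the mean (neutral) shape.
mean_shape = visemes.mean(axis=0)
centered = visemes - mean_shape

# PCA via SVD: rows of Vt are the principal deformation modes.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Keep the few components explaining most of the variance; these act
# as linear control parameters for lip and jaw movement.
explained = (S ** 2) / (S ** 2).sum()
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
components = Vt[:k]

# Any viseme is then approximated from just k control parameters.
params = centered @ components.T                  # shape (14, k)
reconstructed = mean_shape + params @ components  # shape (14, 30)
```

In this framing, driving the talking head amounts to interpolating the low-dimensional `params` vectors over time as the TTS system emits phonemes, rather than moving every vertex independently.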



Published In

VRCAI '09: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry
December 2009
374 pages
ISBN:9781605589121
DOI:10.1145/1670252


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MPEG-4 FAPs
  2. face animation
  3. principal component analysis
  4. visual speech synthesis

Qualifiers

  • Research-article

Conference

VRCAI '09

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%
