
Real-time 3D talking head from a synthetic viseme dataset

Published: 14 December 2009

Abstract

In this paper, we describe a simple and fast way to build a 3D talking head that can be used in any application requiring an audiovisual speech animation system. The talking head is constructed from a synthetic 3D viseme dataset, which is sufficiently realistic and can be generated with standard 3D modeling software. To build the talking head, the viseme dataset is first analyzed statistically to obtain the optimal linear parameters for controlling the movements of the lips and jaw of the 3D head model. These parameters correspond to some of the low-level MPEG-4 FAPs, so our method can also be used to extract the speech-relevant MPEG-4 FAPs from a dataset of phonemes/visemes. The parameterized head model is then combined with a Text-to-Speech (TTS) system to synthesize audiovisual speech from a given text. To make the talking head look more realistic, eye blinks and eye movements are also animated during speech. We implemented this work in an interactive text-to-audio-visual speech system.
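The statistical analysis of the viseme dataset can be sketched as a principal component analysis (one of the paper's author tags) over flattened lip/jaw vertex coordinates; a few principal modes then serve as linear control parameters. This is a minimal illustrative sketch, not the authors' implementation: the viseme matrix here is random stand-in data, and all names (`visemes`, `components`, the 95% variance threshold) are hypothetical.

```python
import numpy as np

# Hypothetical viseme dataset: each row is one viseme, i.e. the
# flattened 3D vertex coordinates of the lip/jaw region of a head model.
rng = np.random.default_rng(0)
visemes = rng.normal(size=(14, 30))  # 14 visemes, 10 vertices x 3 coords

# Center the data on the mean (neutral) shape.
mean_shape = visemes.mean(axis=0)
centered = visemes - mean_shape

# PCA via SVD: rows of Vt are the principal deformation modes.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Keep the few components explaining most of the variance; these act
# as linear control parameters for lip and jaw movement.
explained = (S ** 2) / (S ** 2).sum()
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
components = Vt[:k]

# Any viseme is then approximated from just k control parameters.
params = centered @ components.T                  # shape (14, k)
reconstructed = mean_shape + params @ components  # shape (14, 30)
```

In this framing, driving the talking head amounts to interpolating the low-dimensional `params` vectors over time as the TTS system emits phonemes, rather than moving every vertex independently.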



Published In

VRCAI '09: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry
December 2009
374 pages
ISBN:9781605589121
DOI:10.1145/1670252


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MPEG-4 FAPs
  2. face animation
  3. principal component analysis
  4. visual speech synthesis

Qualifiers

  • Research-article

Conference

VRCAI '09

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%
