DOI: 10.1145/2683483.2683530
research-article

3D Visual Speech Animation from Image Sequences

Published: 14 December 2014

Abstract

In this paper, we describe an early version of our system, which synthesizes 3D visual speech, including the tongue and teeth, from frontal facial image sequences. The system performs 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. The prime motivation is to obtain a 3D VSA system from less training data than a conventional corpus-based 3D VSA system requires. The system consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. This brings the added advantages of 3D visual speech: the face can be rendered in different poses and under different illumination conditions, and the tongue and teeth provide enhanced visual information.

The first module, for 3D shape estimation, is based on detecting facial landmarks in images and uses a prior 3D Morphable Model (3D-MM) trained on 3D facial data. Currently, it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained on data from a single person and tested on data from the same person. The estimated 3D shape sequences are passed to the second module together with the phonetic segmentation. For each 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and by animating a rigid 3D tongue through keyframe interpolation.
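The two modules can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only shows one plausible form of each step under stated assumptions. The function names are hypothetical. The 3D-MM fit uses an affine (weak-perspective) camera, which is an assumption because the abstract does not state the camera model. The jaw is assumed to rotate about a hinge parallel to the x-axis through a known pivot. Tongue keyframes are placed at phone-segment midpoints using a hypothetical per-phone pose table.

```python
import numpy as np


# --- Module 1 (sketch): fit a linear 3D-MM to 2D facial landmarks ---
def fit_3dmm_to_landmarks(x2d, mean, basis, n_iters=20, reg=1e-3):
    """Alternate between an affine camera and 3D-MM shape coefficients.

    x2d:   (L, 2) detected 2D landmark positions
    mean:  (L, 3) mean 3D positions of the corresponding model vertices
    basis: (K, L, 3) shape basis, e.g. PCA components (assumed given)
    Returns coefficients (K,), affine camera P (2, 4), fitted 3D shape (L, 3).
    n_iters must be at least 1.
    """
    K, L, _ = basis.shape
    coeffs = np.zeros(K)
    for _ in range(n_iters):
        shape = mean + np.tensordot(coeffs, basis, axes=1)
        # Camera step: least-squares affine P with x2d ~= [shape, 1] @ P.T.
        Xh = np.hstack([shape, np.ones((L, 1))])
        P = np.linalg.lstsq(Xh, x2d, rcond=None)[0].T
        A, t = P[:, :3], P[:, 3]
        # Shape step: ridge regression on the projected basis, camera fixed.
        B = np.stack([(basis[k] @ A.T).ravel() for k in range(K)], axis=1)
        r = (x2d - (mean @ A.T + t)).ravel()
        coeffs = np.linalg.solve(B.T @ B + reg * np.eye(K), B.T @ r)
    shape = mean + np.tensordot(coeffs, basis, axes=1)
    return coeffs, P, shape


# --- Module 2 (sketch): jaw rotation from skin points on the jaw ---
def estimate_jaw_angle(neutral_pts, current_pts, pivot):
    """Mean rotation (radians) of jaw skin points about a hinge parallel to
    the x-axis through `pivot` (the hinge direction is an assumption)."""
    a = neutral_pts - pivot
    b = current_pts - pivot
    d = np.arctan2(b[:, 2], b[:, 1]) - np.arctan2(a[:, 2], a[:, 1])
    d = np.angle(np.exp(1j * d))  # wrap each difference to (-pi, pi]
    return float(np.mean(d))


def rotate_about_x(vertices, pivot, angle):
    """Rotate vertices (e.g. lower teeth) by `angle` about the x-axis hinge."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return (vertices - pivot) @ R.T + pivot


# --- Module 2 (sketch): rigid tongue poses by keyframe interpolation ---
def keyframes_from_segments(segments, pose_table):
    """segments: list of (start, end, phone); pose_table: phone -> pose vector.
    One keyframe per segment, placed at its midpoint (an assumption)."""
    times = np.array([(s + e) / 2.0 for s, e, _ in segments])
    poses = np.array([pose_table[p] for _, _, p in segments], dtype=float)
    return times, poses


def interpolate_keyframes(key_times, key_poses, t):
    """Linearly interpolate each pose parameter at frame times t
    (values outside the keyframe range are clamped by np.interp)."""
    key_poses = np.asarray(key_poses, dtype=float)
    return np.stack(
        [np.interp(t, key_times, key_poses[:, j]) for j in range(key_poses.shape[1])],
        axis=1,
    )
```

The ridge term `reg` keeps the shape coefficients small when landmarks are few or noisy. Linear interpolation of rotation parameters is only reasonable for small tongue rotations; spline or quaternion interpolation would be a drop-in alternative for the keyframe step.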

Published In

ACM Other conferences
ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing
December 2014
692 pages
ISBN: 9781450330619
DOI: 10.1145/2683483

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. 3D facial shape estimation from images
2. 3D morphable models
3. 3D visual speech
4. active appearance models
5. facial landmark detection
6. tongue animation
7. visual speech animation

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ICVGIP '14

Acceptance Rates

Overall Acceptance Rate: 95 of 286 submissions, 33%

Article Metrics

• Total Citations: 0
• Total Downloads: 138
• Downloads (last 12 months): 1
• Downloads (last 6 weeks): 0
Reflects downloads up to 15 Sep 2024.
