Reconstructing detailed dynamic face geometry from monocular video

Published: 01 November 2013

Abstract

Detailed facial performance geometry can be reconstructed using dense camera and light setups in controlled studios. However, a wide range of important applications cannot employ these approaches, including all movie productions shot from a single principal camera. For post-production, these require dynamic monocular face capture for appearance modification. We present a new method for capturing face geometry from monocular video. Our approach captures detailed, dynamic, spatio-temporally coherent 3D face geometry without the need for markers. It works under uncontrolled lighting, and it successfully reconstructs expressive motion including high-frequency face detail such as folds and laugh lines. After simple manual initialization, the capturing process is fully automatic, which makes it versatile, lightweight and easy-to-deploy. Our approach tracks accurate sparse 2D features between automatically selected key frames to animate a parametric blend shape model, which is further refined in pose, expression and shape by temporally coherent optical flow and photometric stereo. We demonstrate performance capture results for long and complex face sequences captured indoors and outdoors, and we exemplify the relevance of our approach as an enabling technology for model-based face editing in movies and video, such as adding new facial textures, as well as a step towards enabling everyone to do facial performance capture with a single affordable camera.
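
The pipeline summarized above (sparse 2D features driving a parametric blend shape model, later refined by temporally coherent optical flow and photometric refinement) hinges on repeatedly fitting blend shape weights to tracked 2D landmarks. The following is a minimal, illustrative Python sketch of such a linear fit under a scaled-orthographic camera model. The function name, matrix shapes, and the ridge regularizer are assumptions made for illustration only; they do not reproduce the paper's actual energy, which additionally couples optical flow and shading-based refinement.

import numpy as np

def fit_blendshape_weights(landmarks_2d, neutral_3d, blendshapes_3d, R, s, t, lam=1e-3):
    """Illustrative linear blend shape fit to sparse 2D landmarks (hypothetical helper).

    landmarks_2d   : (L, 2) tracked 2D feature positions in one frame
    neutral_3d     : (L, 3) corresponding vertices of the neutral face mesh
    blendshapes_3d : (K, L, 3) per-blendshape vertex offsets at those landmarks
    R, s, t        : current head rotation (3x3), scale, and 2D translation
    lam            : Tikhonov regularizer keeping the weights small
    Returns the (K,) blend shape weights minimizing the 2D reprojection error.
    """
    P = s * R[:2, :]                                   # scaled-orthographic projection (2x3)
    K = blendshapes_3d.shape[0]
    # Residual the weights must explain: observed 2D points minus the projected neutral face.
    r = (landmarks_2d - (neutral_3d @ P.T + t)).reshape(-1)                            # (2L,)
    # Each blendshape's projected landmark offsets form one column of the linear system.
    A = np.stack([(blendshapes_3d[k] @ P.T).reshape(-1) for k in range(K)], axis=1)    # (2L, K)
    # Ridge-regularized least squares: (A^T A + lam I) w = A^T r
    return np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ r)

# Toy usage on noise-free synthetic data, just to show the call shapes.
rng = np.random.default_rng(0)
num_lmk, num_shapes = 66, 20
neutral = rng.normal(size=(num_lmk, 3))
shapes = rng.normal(scale=0.1, size=(num_shapes, num_lmk, 3))
w_true = rng.uniform(0.0, 1.0, size=num_shapes)
R, s, t = np.eye(3), 1.0, np.zeros(2)
observed = (neutral + np.tensordot(w_true, shapes, axes=1)) @ (s * R[:2, :]).T + t
w_est = fit_blendshape_weights(observed, neutral, shapes, R, s, t, lam=0.0)
print(np.allclose(w_est, w_true))  # True: the synthetic landmarks are noise-free

In the actual system this kind of per-frame fit would only provide a coarse expression estimate; pose, expression, and fine-scale shape are then refined with temporally coherent optical flow and shading cues, as described in the abstract.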

Supplementary Material

ZIP File (a158-garrido.zip)
Supplemental material.

Published In

ACM Transactions on Graphics, Volume 32, Issue 6
November 2013
671 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2508363

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. facial performance capture
  2. monocular tracking
  3. shading-based refinement
  4. temporally coherent optical flow

Qualifiers

  • Research-article
