
Reconstructing detailed dynamic face geometry from monocular video

Published: 01 November 2013


Detailed facial performance geometry can be reconstructed using dense camera and light setups in controlled studios. However, a wide range of important applications cannot employ these approaches, including all movie productions shot from a single principal camera. For post-production, these require dynamic monocular face capture for appearance modification. We present a new method for capturing face geometry from monocular video. Our approach captures detailed, dynamic, spatio-temporally coherent 3D face geometry without the need for markers. It works under uncontrolled lighting, and it successfully reconstructs expressive motion including high-frequency face detail such as folds and laugh lines. After simple manual initialization, the capturing process is fully automatic, which makes it versatile, lightweight and easy-to-deploy. Our approach tracks accurate sparse 2D features between automatically selected key frames to animate a parametric blend shape model, which is further refined in pose, expression and shape by temporally coherent optical flow and photometric stereo. We demonstrate performance capture results for long and complex face sequences captured indoors and outdoors, and we exemplify the relevance of our approach as an enabling technology for model-based face editing in movies and video, such as adding new facial textures, as well as a step towards enabling everyone to do facial performance capture with a single affordable camera.
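The blend shape fitting mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear (delta) blend shape model and ordinary least squares; it ignores head pose, camera projection, and the optical-flow and shading refinement, and it is not the authors' implementation. The toy vertices, the two delta shapes, and the function names `blend`, `solve`, and `fit_weights` are hypothetical.

```python
# Toy linear blend shape model: shape = neutral + sum_k w_k * delta_k.
# All data and names here are hypothetical and for illustration only.

def blend(neutral, deltas, weights):
    """Evaluate the blend shape model for the given expression weights."""
    out = list(neutral)
    for w, delta in zip(weights, deltas):
        for i, d in enumerate(delta):
            out[i] += w * d
    return out

def solve(A, b):
    """Solve a small square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_weights(neutral, deltas, observed):
    """Least-squares weights minimising ||neutral + D w - observed||^2."""
    residual = [o - n for o, n in zip(observed, neutral)]
    k = len(deltas)
    # Normal equations (D^T D) w = D^T residual.
    AtA = [[sum(a * b for a, b in zip(deltas[i], deltas[j])) for j in range(k)]
           for i in range(k)]
    Atb = [sum(a * r for a, r in zip(deltas[i], residual)) for i in range(k)]
    return solve(AtA, Atb)

# Three 2D feature points flattened as [x0, y0, x1, y1, x2, y2].
neutral = [0.0, 0.0, 1.0, 0.0, 0.5, 1.0]
deltas = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.5],   # hypothetical "brow raise" offset
    [-0.2, 0.0, 0.2, 0.0, 0.0, 0.0],  # hypothetical "smile" offset
]

# Synthesise "tracked" features from known weights, then recover the weights.
observed = blend(neutral, deltas, [0.3, 0.8])
weights = fit_weights(neutral, deltas, observed)
```

In this toy setting the recovered weights match the ones used to synthesise the features. A real monocular tracker would also have to estimate rigid pose and projection, and would typically constrain the weights to a valid range.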






Published In

ACM Transactions on Graphics, Volume 32, Issue 6
November 2013
671 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States


Author Tags

  1. facial performance capture
  2. monocular tracking
  3. shading-based refinement
  4. temporally coherent optical flow


  • Research-article

