
Lightweight binocular facial performance capture under uncontrolled lighting

Published: 01 November 2012

Abstract

Recent progress in passive facial performance capture has shown impressively detailed results on highly articulated motion. However, most methods rely on complex multi-camera set-ups, controlled lighting or fiducial markers. This prevents them from being used in general environments, outdoor scenes, during live action on a film set, or by freelance animators and everyday users who want to capture their digital selves. In this paper, we therefore propose a lightweight passive facial performance capture approach that is able to reconstruct high-quality dynamic facial geometry from only a single pair of stereo cameras. Our method succeeds under uncontrolled and time-varying lighting, and also in outdoor scenes. Our approach builds upon and extends recent image-based scene flow computation, lighting estimation and shading-based refinement algorithms. It integrates them into a pipeline that is specifically tailored towards facial performance reconstruction from challenging binocular footage under uncontrolled lighting. In an experimental evaluation, the strong capabilities of our method become explicit: We achieve detailed and spatio-temporally coherent results for expressive facial motion in both indoor and outdoor scenes -- even from low quality input images recorded with a hand-held consumer stereo camera. We believe that our approach is the first to capture facial performances of such high quality from a single stereo rig and we demonstrate that it brings facial performance capture out of the studio, into the wild, and within the reach of everybody.
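The lighting-estimation stage mentioned in the abstract can be illustrated with a minimal sketch. Assuming (as is common in shading-based refinement work) that the face is approximately Lambertian and the incident lighting is represented by low-order spherical harmonics, the lighting coefficients can be recovered by least squares from surface normals and observed intensities. The function names below are illustrative, not the authors' API, and a first-order basis is used purely for brevity.

```python
import numpy as np

def sh_basis(normals):
    """First-order spherical-harmonics basis (constant + linear) per unit normal."""
    x, y, z = normals.T
    return np.stack([np.ones_like(x), x, y, z], axis=1)

def estimate_lighting(normals, intensities):
    """Least-squares fit of SH lighting coefficients to observed shading."""
    coeffs, *_ = np.linalg.lstsq(sh_basis(normals), intensities, rcond=None)
    return coeffs

def shade(normals, coeffs):
    """Predicted Lambertian shading under the given SH lighting."""
    return sh_basis(normals) @ coeffs

# Synthetic sanity check: recover known lighting from noiseless observations.
rng = np.random.default_rng(0)
n = rng.normal(size=(500, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
true_coeffs = np.array([0.8, 0.1, -0.2, 0.5])
est_coeffs = estimate_lighting(n, shade(n, true_coeffs))
```

In the full pipeline, an estimate like this would drive the subsequent shading-based geometry refinement: the residual between predicted and observed shading constrains per-vertex normal (and hence depth) updates.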






Published In

ACM Transactions on Graphics  Volume 31, Issue 6
November 2012
794 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2366145

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. facial performance capture
  2. scene flow
  3. shading-based refinement
  4. uncontrolled lighting

Qualifiers

  • Research-article
