
Lightweight binocular facial performance capture under uncontrolled lighting

Published: 01 November 2012

Abstract

Recent progress in passive facial performance capture has shown impressively detailed results on highly articulated motion. However, most methods rely on complex multi-camera set-ups, controlled lighting or fiducial markers. This prevents them from being used in general environments, outdoor scenes, during live action on a film set, or by freelance animators and everyday users who want to capture their digital selves. In this paper, we therefore propose a lightweight passive facial performance capture approach that is able to reconstruct high-quality dynamic facial geometry from only a single pair of stereo cameras. Our method succeeds under uncontrolled and time-varying lighting, and also in outdoor scenes. Our approach builds upon and extends recent image-based scene flow computation, lighting estimation and shading-based refinement algorithms. It integrates them into a pipeline that is specifically tailored towards facial performance reconstruction from challenging binocular footage under uncontrolled lighting. In an experimental evaluation, the strong capabilities of our method become explicit: We achieve detailed and spatio-temporally coherent results for expressive facial motion in both indoor and outdoor scenes -- even from low quality input images recorded with a hand-held consumer stereo camera. We believe that our approach is the first to capture facial performances of such high quality from a single stereo rig and we demonstrate that it brings facial performance capture out of the studio, into the wild, and within the reach of everybody.
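The lighting-estimation stage mentioned in the abstract can be illustrated with a minimal sketch. Assuming (as is common in shading-based refinement work) that the face is approximately Lambertian and the incident lighting is represented by low-order spherical harmonics, the lighting coefficients can be recovered by least squares from surface normals and observed intensities. The function names below are illustrative, not the authors' API, and a first-order basis is used purely for brevity.

```python
import numpy as np

def sh_basis(normals):
    """First-order spherical-harmonics basis (constant + linear) per unit normal."""
    x, y, z = normals.T
    return np.stack([np.ones_like(x), x, y, z], axis=1)

def estimate_lighting(normals, intensities):
    """Least-squares fit of SH lighting coefficients to observed shading."""
    coeffs, *_ = np.linalg.lstsq(sh_basis(normals), intensities, rcond=None)
    return coeffs

def shade(normals, coeffs):
    """Predicted Lambertian shading under the given SH lighting."""
    return sh_basis(normals) @ coeffs

# Synthetic sanity check: recover known lighting from noiseless observations.
rng = np.random.default_rng(0)
n = rng.normal(size=(500, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
true_coeffs = np.array([0.8, 0.1, -0.2, 0.5])
est_coeffs = estimate_lighting(n, shade(n, true_coeffs))
```

In the full pipeline, an estimate like this would drive the subsequent shading-based geometry refinement: the residual between predicted and observed shading constrains per-vertex normal (and hence depth) updates.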






Published In

ACM Transactions on Graphics  Volume 31, Issue 6
November 2012
794 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2366145

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. facial performance capture
  2. scene flow
  3. shading-based refinement
  4. uncontrolled lighting

Qualifiers

  • Research-article
