skip to main content
10.1007/978-3-030-01234-2_2guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

BodyNet: Volumetric Inference of 3D Human Body Shapes

Published: 08 September 2018 Publication History

Abstract

Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.

References

[1]
Newell A, Yang K, and Deng J Leibe B, Matas J, Sebe N, and Welling M Stacked hourglass networks for human pose estimation Computer Vision – ECCV 2016 2016 Cham Springer 483-499
[2]
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR (2016)
[3]
Pishchulin, L., et al.: DeepCut: joint subset partition and labeling for multi person pose estimation. In: CVPR (2016)
[4]
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR (2017)
[5]
Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3D human pose estimation. In: ICCV (2017)
[6]
Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-fine volumetric prediction for single-image 3D human pose. In: CVPR (2017)
[7]
Rogez, G., Weinzaepfel, P., Schmid, C.: LCR-Net: localization-classification-regression for human pose. In: CVPR (2017)
[8]
Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y.: Towards 3D human pose estimation in the wild: a weakly-supervised approach. In: ICCV (2017)
[9]
Leroy, V., Franco, J.S., Boyer, E.: Multi-view dynamic shape refinement using local temporal integration. In: ICCV (2017)
[10]
Loper, M.M., Mahmood, N., Black, M.J.: MoSh: motion and shape capture from sparse markers. In: SIGGRAPH (2014)
[11]
von Marcard, T., Rosenhahn, B., Black, M., Pons-Moll, G.: Sparse inertial poser: automatic 3D human pose estimation from sparse IMUs. In: Eurographics (2017)
[12]
Yang J, Franco J-S, Hétroy-Wheeler F, and Wuhrer S Leibe B, Matas J, Sebe N, and Welling M Estimation of human body shape in motion with wide clothing Computer Vision – ECCV 2016 2016 Cham Springer 439-454
[13]
Bogo F, Kanazawa A, Lassner C, Gehler P, Romero J, and Black MJ Leibe B, Matas J, Sebe N, and Welling M Keep It SMPL: automatic estimation of 3d human pose and shape from a single image Computer Vision – ECCV 2016 2016 Cham Springer 561-578
[14]
Tan, V., Budvytis, I., Cipolla, R.: Indirect deep structured learning for 3D human body shape and pose prediction. In: BMVC (2017)
[15]
Tung, H., Tung, H., Yumer, E., Fragkiadaki, K.: Self-supervised learning of motion capture. In: NIPS (2017)
[16]
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR (2018)
[17]
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.: SMPL: a skinned multi-person linear model. In: SIGGRAPH (2015)
[18]
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
[19]
LeCun Y., Boser B., Denker J. S., Henderson D., Howard R. E., Hubbard W., and Jackel L. D. Backpropagation Applied to Handwritten Zip Code Recognition Neural Computation 1989 1 4 541-551
[20]
Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: IROS (2015)
[21]
Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: NIPS (2016)
[22]
Yumer ME and Mitra NJ Leibe B, Matas J, Sebe N, and Welling M Learning semantic deformation flows with 3D convolutional networks Computer Vision – ECCV 2016 2016 Cham Springer 294-311
[23]
Yumer ME and Mitra NJ Leibe B, Matas J, Sebe N, and Welling M Learning semantic deformation flows with 3D convolutional networks Computer Vision – ECCV 2016 2016 Cham Springer 294-311
[24]
Tatarchenko, M., Dosovitskiy, A., Brox, T.: Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In: ICCV (2017)
[25]
Riegler, G., Ulusoy, A.O., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: CVPR (2017)
[26]
Wang, P.S., Liu, Y., Guo, Y.X., Sun, C.Y., Tong, X.: O-CNN: Octree-based convolutional neural networks for 3D shape analysis. In: SIGGRAPH (2017)
[27]
Riegler, G., Ulusoy, A.O., Bischof, H., Geiger, A.: OctNetFusion: learning depth fusion from data. In: 3DV (2017)
[28]
Su, H., Fan, H., Guibas, L.: A point set generation network for 3D object reconstruction from a single image. In: CVPR (2017)
[29]
Su, H., Qi, C., Mo, K., Guibas, L.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)
[30]
Deng, H., Birdal, T., Ilic, S.: PPFNet: global context aware local features for robust 3D point matching. In: CVPR (2018)
[31]
Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: AtlasNet: a Papier-Mâché approach to learning 3D surface generation. In: CVPR (2018)
[32]
Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: CVPR (2014)
[33]
Varol, G., et al.: Learning from synthetic humans. In: CVPR (2017)
[34]
Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., Gehler, P.V.: Unite the people: closing the loop between 3D and 2D human representations. In: CVPR (2017)
[35]
Ionescu Catalin, Papava Dragos, Olaru Vlad, and Sminchisescu Cristian Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments IEEE Transactions on Pattern Analysis and Machine Intelligence 2014 36 7 1325-1339
[36]
Kostrikov, I., Gall, J.: Depth sweep regression forests for estimating 3D human pose from images. In: BMVC (2014)
[37]
Yasin, H., Iqbal, U., Kruger, B., Weber, A., Gall, J.: A dual-source approach for 3D pose estimation from a single image. In: CVPR (2016)
[38]
Rogez, G., Schmid, C.: MoCap-guided data augmentation for 3D pose estimation in the wild. In: NIPS (2016)
[39]
Balan, A., Sigal, L., Black, M.J., Davis, J., Haussecker, H.: Detailed human shape and pose from images. In: CVPR (2007)
[40]
Guan, P., Weiss, A., O. Balan, A., Black, M.: Estimating human shape and pose from a single image. In: ICCV (2009)
[41]
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. In: SIGGRAPH (2005)
[42]
Huang, Y., et al.: Towards accurate marker-less human shape and pose estimation over time. In: 3DV (2017)
[43]
Alldieck, T., Kassubeck, M., Wandt, B., Rosenhahn, B., Magnor, M.: Optical flow-based 3D human motion estimation from monocular video. In: GCPR (2017)
[44]
Rhodin H, Robertini N, Casas D, Richardt C, Seidel H-P, and Theobalt C Leibe B, Matas J, Sebe N, and Welling M General automatic human shape and motion capture using volumetric contour cues Computer Vision – ECCV 2016 2016 Cham Springer 509-526
[45]
Dibra, E., Jain, H., Öztireli, C., Ziegler, R., Gross, M.: HS-Nets: estimating human body shape from silhouettes with convolutional neural networks. In: 3DV (2016)
[46]
Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G.: Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: ICCV (2017)
[47]
Güler, R.A., George, T., Antonakos, E., Snape, P., Zafeiriou, S., Kokkinos, I.: DenseReg: fully convolutional dense shape regression in-the-wild. In: CVPR (2017)
[48]
Güler, R.A., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: CVPR (2018)
[49]
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014)
[50]
Luvizon, D.C., Picard, D., Tabia, H.: 2D/3D pose estimation and action recognition using multitask deep learning. In: CVPR (2018)
[51]
Popa, A., Zanfir, M., Sminchisescu, C.: Deep multitask architecture for integrated 2D and 3D human sensing. In: CVPR (2017)
[52]
Nooruddin FS and Turk G Simplification and repair of polygonal models using volumetric techniques IEEE Trans. Vis. Comput. Graph. 2003 9 2 191-205
[54]
Zhu, R., Kiani, H., Wang, C., Lucey, S.: Rethinking reprojection: closing the loop for pose-aware shape reconstruction from a single image. In: ICCV (2017)
[55]
Tulsiani, S., Zhou, T., Efros, A.A., Malik, J.: Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: CVPR (2017)
[56]
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR (2014)
[57]
Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
[59]
Lewiner T, Lopes H, Vieira AW, and Tavares G Efficient implementation of marching cubes cases with topological guarantees J. Graph. Tools 2003 8 2 1-15
[60]
Nocedal J and Wright SJ Numerical Optimization 2006 New York Springer
[62]
Barbosa IB, Cristani M, Caputo B, Rognhaugen A, and Theoharis T Looking beyond appearances: synthetic training data for deep CNNs in re-identification CVIU 2018 167 50-62
[63]
Ghezelghieh, M.F., Kasturi, R., Sarkar, S.: Learning camera viewpoint using CNN to improve 3D body pose estimation. In: 3DV (2016)
[64]
Chen, W., et al.: Synthesizing training images for boosting human 3D pose estimation. In: 3DV (2016)
[65]
Butler DJ, Wulff J, Stanley GB, and Black MJ Fitzgibbon A, Lazebnik S, Perona P, Sato Y, and Schmid C A naturalistic open source movie for optical flow evaluation Computer Vision – ECCV 2012 2012 Heidelberg Springer 611-625
[66]
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC (2010)

Cited By

View all

Index Terms

  1. BodyNet: Volumetric Inference of 3D Human Body Shapes
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Information & Contributors

          Information

          Published In

          cover image Guide Proceedings
          Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII
          Sep 2018
          849 pages
          ISBN:978-3-030-01233-5
          DOI:10.1007/978-3-030-01234-2

          Publisher

          Springer-Verlag

          Berlin, Heidelberg

          Publication History

          Published: 08 September 2018

          Author Tags

          1. Net Body
          2. Body Part Segmentation
          3. Intermediate Supervision
          4. Body Shape Estimation
          5. People Dataset

          Qualifiers

          • Article

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months)0
          • Downloads (Last 6 weeks)0
          Reflects downloads up to 15 Sep 2024

          Other Metrics

          Citations

          Cited By

          View all

          View Options

          View options

          Get Access

          Login options

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media