Article

BodyNet: Volumetric Inference of 3D Human Body Shapes

Authors:

Cordelia Schmid

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII

Pages 20 - 38

https://doi.org/10.1007/978-3-030-01234-2_2

Published: 08 September 2018

Abstract

Human shape estimation is an important task for video editing, animation, and the fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing, and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work, we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of these components improves performance, as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
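
The abstract names a volumetric 3D loss and a multi-view re-projection loss but does not give their formulas on this page. As a rough illustration only, the sketch below assumes that both terms are binary cross-entropies: one computed per voxel, and one computed on silhouettes obtained by taking the maximum occupancy along a grid axis (an orthographic projection). The function names, the choice of projection axes, and the weights `w_vox` and `w_proj` are hypothetical and are not taken from the paper.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted occupancies in [0, 1]
    and binary targets of the same shape."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def project(volume, axis):
    """Orthographic silhouette of a voxel grid (assumed projection model):
    a pixel is as occupied as the most occupied voxel along its ray."""
    return volume.max(axis=axis)


def combined_loss(pred_vox, gt_vox, views=(0, 2), w_vox=1.0, w_proj=0.1):
    """Weighted sum of a per-voxel loss and silhouette losses for each view.

    `views` lists the grid axes to project along (e.g. a front and a side
    view); the axes and the weights are illustrative assumptions.
    """
    loss_vox = binary_cross_entropy(pred_vox, gt_vox)
    loss_proj = sum(
        binary_cross_entropy(project(pred_vox, a), project(gt_vox, a))
        for a in views
    )
    return w_vox * loss_vox + w_proj * loss_proj
```

With a ground-truth grid containing a solid cube, a prediction close to the ground truth yields a lower combined loss than its complement, because both the voxel term and the silhouette terms penalise occupancy placed outside the true shape.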

Published In

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII

Sep 2018

849 pages

ISBN: 978-3-030-01233-5

DOI: 10.1007/978-3-030-01234-2

Editors:
Vittorio Ferrari (Google Research, Zurich, Switzerland),
Martial Hebert (Carnegie Mellon University, Pittsburgh, PA, USA),
Cristian Sminchisescu (Google Research, Zurich, Switzerland),
Yair Weiss (Hebrew University of Jerusalem, Jerusalem, Israel)

© Springer Nature Switzerland AG 2018.

Publisher

Springer-Verlag

Berlin, Heidelberg
