research-article
Open access

XNect: real-time multi-person 3D motion capture with a single RGB camera

Published: 12 August 2020

Abstract

We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in three successive stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and to enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which does not produce joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes.
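To put the stated throughput in concrete terms, the following minimal sketch converts the two figures given in the abstract (more than 30 fps, 512x320 input) into a per-frame time budget and a pixel rate. The helper name `frame_budget_ms` is hypothetical, and the budget assumes frames are processed one after another without overlapping stages across frames; it is plain arithmetic on the abstract's numbers, not a measurement from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Upper bound on total per-frame processing time (ms) needed to sustain `fps`.

    Assumes frames are handled strictly one after another (no pipelining),
    which is an illustrative simplification.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures taken from the abstract.
FPS = 30.0
WIDTH, HEIGHT = 512, 320

budget = frame_budget_ms(FPS)            # time available for all stages combined
pixels_per_frame = WIDTH * HEIGHT        # input size seen by the first stage
pixels_per_second = pixels_per_frame * FPS

print(f"per-frame budget: {budget:.1f} ms")
print(f"pixels per frame: {pixels_per_frame}")
print(f"pixels per second: {pixels_per_second:.0f}")
```

Under these assumptions, all stages together have roughly 33 ms per frame, and any frame rate above 30 fps shrinks that budget further.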

Published In

ACM Transactions on Graphics, Volume 39, Issue 4
August 2020
1732 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3386569
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RGB
  2. human body pose
  3. monocular
  4. motion capture
  5. real-time

Qualifiers

  • Research-article
