Article

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

Authors: George Papandreou, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Pages 282 - 299

https://doi.org/10.1007/978-3-030-01264-9_17

Published: 08 September 2018

Abstract

We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling. Our model employs a convolutional network which learns to detect individual keypoints and predict their relative displacements, allowing us to group keypoints into person pose instances. Further, we propose a part-induced geometric embedding descriptor which allows us to associate semantic person pixels with their corresponding person instance, delivering instance-level person segmentations. Our system is based on a fully-convolutional architecture and allows for efficient inference, with runtime essentially independent of the number of people present in the scene. Trained on COCO data alone, our system achieves COCO test-dev keypoint average precision of 0.665 using single-scale inference and 0.687 using multi-scale inference, significantly outperforming all previous bottom-up pose estimation systems. We are also the first bottom-up method to report competitive results for the person class in the COCO instance segmentation task, achieving a person category average precision of 0.417.
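The two grouping ideas in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the greedy nearest-candidate matching, the `radius` threshold, the single-keypoint pixel assignment, and all names (`group_poses`, `assign_pixel`, `detections`, `displacement`) are assumptions made for illustration. In a real system the displacements would be read from dense network outputs; here they are hard-coded.

```python
from math import hypot

def group_poses(detections, displacement, edges, root, radius=5.0):
    """Greedily group detected keypoints into poses (toy sketch).

    detections:   {keypoint_type: [(x, y), ...]} detected keypoint positions.
    displacement: {(src_type, dst_type): {(x, y): (dx, dy)}} hypothetical
                  lookup standing in for a predicted displacement field.
    edges:        [(src_type, dst_type), ...] walked in order from the root.
    """
    poses = []
    for seed in detections[root]:
        pose = {root: seed}
        for src, dst in edges:
            if src not in pose:
                continue  # parent keypoint missing, cannot follow this edge
            sx, sy = pose[src]
            dx, dy = displacement[(src, dst)][(sx, sy)]
            tx, ty = sx + dx, sy + dy  # where the linked keypoint is expected
            best = min(detections[dst],
                       key=lambda p: hypot(p[0] - tx, p[1] - ty),
                       default=None)
            if best is not None and hypot(best[0] - tx, best[1] - ty) <= radius:
                pose[dst] = best
        poses.append(pose)
    return poses

def assign_pixel(pixel, offset, poses, keypoint):
    """Assign a person pixel to the pose whose keypoint lies closest to the
    pixel's embedded position (pixel + predicted offset). Toy simplification
    that uses a single keypoint type."""
    ex, ey = pixel[0] + offset[0], pixel[1] + offset[1]
    dists = [hypot(ex - p[keypoint][0], ey - p[keypoint][1]) for p in poses]
    return dists.index(min(dists))

# Toy scene with two people; all numbers are made up.
detections = {
    "nose": [(10, 10), (50, 10)],
    "shoulder": [(50, 22), (10, 20)],
}
displacement = {
    ("nose", "shoulder"): {(10, 10): (0, 9), (50, 10): (0, 13)},
}
poses = group_poses(detections, displacement, [("nose", "shoulder")], "nose")

# A pixel near the second person's torso whose offset points at that nose.
owner = assign_pixel((48, 30), (2, -19), poses, "nose")
```

In this toy scene each nose is linked to the shoulder nearest its predicted location, and the sample pixel is assigned to the second pose because its embedded position falls next to that pose's nose. The cost of both steps depends on the number of detected keypoints rather than on any per-person box processing, which is the property the abstract emphasizes.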



Published In

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Sep 2018, 844 pages

ISBN: 978-3-030-01263-2

DOI: 10.1007/978-3-030-01264-9

Editors: Vittorio Ferrari (Google Research, Zurich, Switzerland), Martial Hebert (Carnegie Mellon University, Pittsburgh, PA, USA), Cristian Sminchisescu (Google Research, Zurich, Switzerland), Yair Weiss (Hebrew University of Jerusalem, Jerusalem, Israel)

© Springer Nature Switzerland AG 2018.

Publisher: Springer-Verlag, Berlin, Heidelberg
