Article

Progressive Target Refinement by Self-distillation for Human Pose Estimation

Authors:

Shangfei Wang

Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII

Pages 91 - 103

https://doi.org/10.1007/978-981-96-0966-6_6

Published: 08 December 2024

Abstract

Handcrafted heatmap targets can be improved, and one way to do so is knowledge distillation, which uses the heatmaps predicted by another model as auxiliary supervision. However, previous pose distillation methods are inefficient to train, requiring either an extra training stage or complex modifications to the network architecture. In this paper, we propose a novel Self-Distillation for Human Pose Estimation (SDP) method that improves distillation efficiency. Specifically, a student pose estimator distills soft targets from itself using backup information from a previous batch, so that the targets are progressively refined as the model is updated. The main advantage of our method is that it achieves efficient training and simple implementation simultaneously, and existing pose estimation networks can benefit from it with little effort. We further propose a stepping strategy that widens the distillation distance as the learning rate decays, which preserves the difference between teacher and student when the learning rate is low. Experimental results on two widely used benchmark datasets, MPII and COCO, demonstrate the effectiveness of the proposed approach.
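The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the SDP method itself. It assumes that the "backup information" is a snapshot of the model weights taken before an earlier update, that the soft target is blended with the handcrafted target using a fixed weight `alpha`, and that the stepping strategy can be expressed as a per-step `(learning_rate, distance)` schedule. The toy linear `predict` function, the `train` helper, and all numeric values are hypothetical stand-ins for a real pose network and its training loop.

```python
from collections import deque


def predict(weights, x):
    """Toy 'pose estimator': a per-pixel linear map from input to heatmap value."""
    return [w * xi for w, xi in zip(weights, x)]


def train(batches, schedule, alpha=0.5):
    """Self-distillation sketch on a toy model.

    batches:  list of (input, handcrafted_target) pairs, one per step.
    schedule: list of (learning_rate, distance) pairs, one per step, where
              distance is how many updates the teacher lags behind the student
              (assumed here to grow when the learning rate decays).
    alpha:    assumed blending weight given to the soft target.
    """
    weights = [0.0] * len(batches[0][0])
    max_distance = max(distance for _, distance in schedule)
    snapshots = deque(maxlen=max_distance)  # backups of earlier weights

    for (x, target), (lr, distance) in zip(batches, schedule):
        pred = predict(weights, x)
        if len(snapshots) >= distance:
            # Teacher = the student's own weights from `distance` updates ago.
            teacher_weights = snapshots[-distance]
            soft = predict(teacher_weights, x)
            mixed = [(1 - alpha) * t + alpha * s for t, s in zip(target, soft)]
        else:
            # Not enough history yet: fall back to the handcrafted target.
            mixed = target

        snapshots.append(list(weights))  # back up weights before updating
        # Gradient step on 0.5 * (pred - mixed)^2 with respect to the weights.
        weights = [
            w - lr * (p - m) * xi
            for w, p, m, xi in zip(weights, pred, mixed, x)
        ]
    return weights


# Hypothetical schedule: the teacher lag doubles when the learning rate drops.
schedule = [(0.1, 1)] * 200 + [(0.01, 2)] * 200
batches = [([1.0, 1.0], [1.0, 0.0])] * 400
final_weights = train(batches, schedule)
```

In this sketch no second network or extra training stage is needed; the only overhead is storing a few weight snapshots. Widening `distance` when the learning rate is small keeps the teacher's predictions from becoming nearly identical to the student's, which is the motivation the abstract gives for the stepping strategy.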


Index Terms

Progressive Target Refinement by Self-distillation for Human Pose Estimation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
      2. Computer vision tasks
        Activity recognition and understanding
        Vision for robotics
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        Transfer learning

Index terms have been assigned to the content through auto-classification.


Published In

Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII

Dec 2024

506 pages

ISBN: 978-981-96-0965-9

DOI: 10.1007/978-981-96-0966-6

Editors:
Minsu Cho, Pohang University of Science and Technology (POSTECH), Pohang, Korea (Republic of)
Ivan Laptev, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Du Tran, Google, Mountain View, CA, USA
Angela Yao, National University of Singapore, Singapore, Singapore
Hongbin Zha, Peking University, Beijing, China

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Publisher

Springer-Verlag, Berlin, Heidelberg
