Article
DOI: 10.1007/978-981-96-0966-6_6

Progressive Target Refinement by Self-distillation for Human Pose Estimation

Published: 08 December 2024

Abstract

The handcrafted heatmap target can be improved, and one way to do so is knowledge distillation, which uses the heatmaps predicted by another model as auxiliary supervision. However, previous pose distillation methods are training-inefficient: they require either an extra training stage or a complex modification of the network architecture. In this paper, we propose a novel Self-Distillation for Human Pose Estimation (SDP) method that improves distillation efficiency. Specifically, a student pose estimator distills soft targets from itself, using information backed up from a previous batch, so that the targets are progressively refined as the model is updated. The main advantage of our method is that it achieves efficient training and a simple implementation at the same time, so existing pose estimation networks can benefit from it with minimal effort. We further propose a stepping strategy that widens the distillation distance as the learning rate decays, which maintains a difference between teacher and student when the learning rate is low. Experimental results on two widely used benchmark datasets, MPII and COCO, illustrate the effectiveness of the proposed approach.
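The abstract describes the mechanism only at a high level, so the following is a minimal, framework-free Python sketch of heatmap self-distillation under stated assumptions, not the authors' SDP implementation. It shows the handcrafted target as the usual 2-D Gaussian, a buffer of the model's own past predictions that serve as soft targets, a loss that adds an auxiliary term towards those soft targets, and a stepped distillation distance that grows as the learning rate decays. Assumptions: both loss terms are mean squared error, the auxiliary weight is 0.5, a per-sample history buffer stands in for the paper's "backup information of a previous batch", and the distance doubles for every 10x learning-rate decay. The names `gaussian_heatmap`, `heatmap_mse`, `stepped_distance`, `SelfDistillationTargets`, and `sdp_loss` are hypothetical.

```python
import math
from collections import deque


def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    # Handcrafted target: an unnormalised 2-D Gaussian peaking at the keypoint
    # (cx, cy). sigma=2.0 is an assumed value.
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def heatmap_mse(a, b):
    # Mean squared error between two heatmaps of the same size.
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def stepped_distance(lr, base_lr, base_distance=1):
    # Hypothetical stepping rule: double the teacher/student gap (measured in
    # training iterations) for every 10x decay of the learning rate.
    num_decays = max(0, round(math.log10(base_lr / lr)))
    return base_distance * 2 ** num_decays


class SelfDistillationTargets:
    # Hypothetical buffer holding the model's own past predictions for one
    # sample, one entry per iteration; an older entry acts as the "teacher".
    def __init__(self, max_distance):
        self.history = deque(maxlen=max_distance)

    def push(self, heatmap):
        self.history.append(heatmap)

    def soft_target(self, distance):
        # Prediction stored `distance` pushes ago, or None if not available.
        if distance < 1 or distance > len(self.history):
            return None
        return self.history[-distance]


def sdp_loss(pred, gt, soft, weight=0.5):
    # Ground-truth term plus an assumed weighted auxiliary term that pulls the
    # current prediction towards the self-produced soft target.
    loss = heatmap_mse(pred, gt)
    if soft is not None:
        loss += weight * heatmap_mse(pred, soft)
    return loss


if __name__ == "__main__":
    # Toy loop: the stand-in "predictions" drift towards the ground truth;
    # a real network would produce them instead.
    base_lr = 1e-3
    gt = gaussian_heatmap(8, 8, cx=3, cy=4)
    targets = SelfDistillationTargets(max_distance=8)
    for step, lr in enumerate([1e-3, 1e-3, 1e-4, 1e-4, 1e-4]):
        pred = gaussian_heatmap(8, 8, cx=3 + 1.0 / (step + 1), cy=4)
        soft = targets.soft_target(stepped_distance(lr, base_lr))
        print(step, round(sdp_loss(pred, gt, soft), 4))
        targets.push(pred)  # store after use, so distance 1 = previous step
```

Because the soft target is retrieved before the current prediction is pushed, a distance of 1 means the prediction from the previous iteration. In a real training loop the heatmaps would be framework tensors, and the soft target would typically be detached so that gradients flow only through the student's current prediction.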

Published In

Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII
Dec 2024
506 pages
ISBN: 978-981-96-0965-9
DOI: 10.1007/978-981-96-0966-6
Editors:
• Minsu Cho
• Ivan Laptev
• Du Tran
• Angela Yao
• Hongbin Zha

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. heatmap
2. pose estimation
3. knowledge distillation
