DOI: 10.1145/3480651.3480668

Research article

Simultaneous temporal and spatial deep attention for imaged skeleton-based action recognition

Published: 12 October 2021

Abstract

Skeletons have gained interest as a modality for representing and recognizing human actions thanks to their compactness, their reliable representation of motion, and their strong robustness. Deep learning approaches built on this modality often improve the recognition pipeline by integrating attention into the model, allowing it to focus on the information relevant to the action rather than modeling the whole sequence indiscriminately. In this article, we propose an action recognition approach that integrates spatial and temporal attention simultaneously. We first transform the input sequence into a color matrix, called an imaged skeleton, that encodes both Cartesian and rotational information. This representation is then fed to an architecture composed of a main trunk, which performs feature extraction and classification, and several attention branches. Experimental evaluations on two popular benchmark databases, UT-Kinect [1] and SBU Kinect Interaction [2], confirm the interest of the proposed approach, which achieves improved performance.
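As an illustrative sketch of the imaged-skeleton idea, joint coordinates can be mapped to pixel values so that a skeleton sequence becomes a color matrix a CNN can consume. The per-channel min-max normalization and the joints-by-frames layout below are assumptions for illustration, not the authors' exact encoding (which additionally includes rotational information):

```python
import numpy as np

def skeleton_to_image(seq):
    """Map a skeleton sequence of shape (T frames, J joints, 3 coords)
    to an RGB-like matrix: x/y/z become the three color channels,
    rows index joints and columns index time."""
    seq = np.asarray(seq, dtype=np.float64)        # (T, J, 3)
    lo = seq.min(axis=(0, 1), keepdims=True)       # per-channel minimum
    hi = seq.max(axis=(0, 1), keepdims=True)       # per-channel maximum
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)  # scale each channel to [0, 1]
    img = np.round(255 * norm).astype(np.uint8)    # quantize to [0, 255]
    return img.transpose(1, 0, 2)                  # (J, T, 3): joints x frames x RGB

# toy sequence: 4 frames, 5 joints, 3D coordinates
rng = np.random.default_rng(0)
image = skeleton_to_image(rng.normal(size=(4, 5, 3)))
print(image.shape)  # (5, 4, 3)
```

Once encoded this way, standard 2D convolutional trunks (e.g. VGG-style networks [7]) can be applied directly to the skeleton data.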
Index Terms: convolutional neural network, spatio-temporal, skeleton-based action recognition, deep attention.
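The trunk-plus-attention-branches design can be illustrated with a minimal NumPy sketch. How the spatial (per-joint) and temporal (per-frame) weights are computed and combined below is a simplifying assumption for illustration, not the paper's actual branch architecture:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(features):
    """features: (J, T, C) feature map produced by the trunk.
    A spatial branch scores joints, a temporal branch scores frames,
    and the trunk features are re-weighted by both attention maps."""
    energy = features.sum(axis=-1)                   # (J, T) per-cell energy
    spatial = softmax(energy.mean(axis=1), axis=0)   # (J,) joint weights, sum to 1
    temporal = softmax(energy.mean(axis=0), axis=0)  # (T,) frame weights, sum to 1
    weights = np.outer(spatial, temporal)            # (J, T) combined attention map
    return features * weights[..., None]             # re-weighted feature map

# toy feature map: 2 joints, 4 frames, 3 channels
feat = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
out = attend(feat)
print(out.shape)  # (2, 4, 3)
```

In a trained network, the attention weights would be produced by learned branch layers rather than raw feature energies; the point here is only that spatial and temporal weightings can be applied simultaneously to the same trunk output.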

References

[1] Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5), 1-12, 2019.
[2] Yun, K., Honorio, J., & Berg, T. L. Two-person interaction detection using body-pose features and multiple instance learning. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 28-35), 2012.
[3] Pham, H. H., Khoudour, L., Crouzil, A., Zegers, P., & Velastin, S. A. Learning and recognizing human action from skeleton movement with deep residual neural networks, 2017.
[4] Tang, Y., Tian, Y., Lu, J., Li, P., & Zhou, J. Deep progressive reinforcement learning for skeleton-based action recognition. In Proceedings of Computer Vision and Pattern Recognition, 2018.
[5] Boulahia, S. Y., Anquetil, E., Multon, F., & Kulpa, R. CuDi3D: Curvilinear displacement based approach for online 3D action detection. Computer Vision and Image Understanding, 174, 57-69, 2018.
[6] Li, B., Dai, Y., Cheng, X., Chen, H., Lin, Y., & He, M. Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN. In ICMEW International Conference, 2017.
[7] Simonyan, K., & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[8] Kevin M. Attention on pretrained-VGG16 for bone age. https://www.kaggle.com/kmader/attention-on-pretrained-vgg16-for-bone-age, 2018.
[9] Kishore, P. V. V., Kumar, D. A., Sastry, A. C. S., & Kumar, E. K. Motionlets matching with adaptive kernels for 3-D Indian sign language recognition. IEEE Sensors Journal, 18(8), 3327-3337, 2018.
[10] Vijaya Prasad, K., Kishore, P. V. V., & Srinivasa Rao, O. Skeleton based view invariant human action recognition using convolutional neural networks. International Journal of Recent Technology and Engineering, 8(2), 4860-4867, 2019.
[11] Liu, M., Chen, C., & Liu, H. Learning informative pairwise joints with energy-based temporal pyramid for 3D action recognition. In IEEE International Conference on Multimedia and Expo, 2017.
[12] Liu, J., Shahroudy, A., Xu, D., Kot, A. C., & Wang, G. Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3007-3021, 2017.
[13] Zhu, W., Lan, C., Xing, J., et al. Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In AAAI Conference on Artificial Intelligence, 2016.
[14] Song, S., Lan, C., Xing, J., Zeng, W., & Liu, J. Spatio-temporal attention-based LSTM networks for 3D action recognition and detection. IEEE Transactions on Image Processing, 27(7), 3459-3471, 2018.
[15] Wang, X., & Deng, H. A multi-feature representation of skeleton sequences for human interaction recognition. Electronics, 2020.

Published In

PRIS '21: Proceedings of the 2021 International Conference on Pattern Recognition and Intelligent Systems
July 2021
91 pages
ISBN:9781450390392
DOI:10.1145/3480651

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited
