research-article

Enhanced spatio-temporal 3D CNN for facial expression classification in videos

Authors:

Deepanshu Khanna,

Prashant Singh Rana,

Harpreet Singh

Multimedia Tools and Applications, Volume 83, Issue 4

Pages 9911 - 9928

https://doi.org/10.1007/s11042-023-16066-6

Published: 28 June 2023

Abstract

This article proposes a hybrid network model for video-based human facial expression recognition (FER) built from end-to-end 3D deep convolutional neural networks. The proposed network combines two commonly used 3-dimensional Convolutional Neural Network (3D CNN) models, ResNet-50 and DenseNet-121, in an end-to-end manner with slight modifications. Existing FER methodologies include 2-dimensional Convolutional Neural Networks (2D CNN), 2D CNNs combined with Recurrent Neural Networks, 3D CNNs, and feature extraction algorithms such as PCA and Histogram of Oriented Gradients (HOG) combined with machine learning classifiers. For the proposed model, we choose 3D CNNs over the other methods because, unlike 2D CNNs, they preserve the temporal information of the videos. Moreover, they are not labor-intensive, unlike the various handcrafted feature extraction methods. The proposed system relies on the temporal averaging of information from the frame sequences of the video. The databases are pre-processed to remove unwanted backgrounds so that the 3D deep CNN can be trained from scratch. First, feature vectors are extracted from the video frame sequences using the 3D ResNet model. These feature vectors are then fed to the blocks of the 3D DenseNet model, which classify the emotion. The model is evaluated on three benchmark databases, RAVDESS, CK+, and BAUM-1s, on which it achieves 91.69%, 98.61%, and 73.73% accuracy, respectively, outperforming various existing methods. We show that the proposed architecture works well even for classes with little training data, where many existing 3D CNN networks fail.
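Two ideas in the abstract can be illustrated with a minimal pure-Python sketch: a 3D convolution keeps a time axis in its output, and per-frame information can be averaged over that axis before classification. This is not the authors' implementation. The function names, the 16 x 112 x 112 clip size, the kernel settings, and the two emotion labels are all assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard convolution output length along a single axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_output_shape(
    clip: Tuple[int, int, int],
    kernel: Tuple[int, int, int],
    stride: Tuple[int, int, int] = (1, 1, 1),
    padding: Tuple[int, int, int] = (0, 0, 0),
) -> Tuple[int, ...]:
    """Output (frames, height, width) of a 3D convolution over a video clip.

    The first axis is time, so the output still carries a temporal
    dimension that later layers can use.
    """
    return tuple(
        conv_output_size(s, k, st, p)
        for s, k, st, p in zip(clip, kernel, stride, padding)
    )


def temporal_average(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature (or score) vectors over the time axis."""
    if not frame_features:
        raise ValueError("at least one frame is required")
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / n_frames for i in range(dim)]


def predict(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest averaged score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


# Hypothetical clip: 16 frames of 112 x 112 pixels (assumed, not from the paper).
shape_same = conv3d_output_shape((16, 112, 112), (3, 3, 3), padding=(1, 1, 1))
shape_down = conv3d_output_shape((16, 112, 112), (3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))

# Hypothetical per-frame scores for two illustrative classes.
frame_scores = [[0.25, 0.75], [0.5, 0.5], [0.0, 1.0]]
averaged = temporal_average(frame_scores)
label = predict(averaged, ["happy", "sad"])
```

With these assumed settings, the stride-1 convolution keeps all 16 frames and the stride-2 convolution halves the clip to 8 frames, so in both cases the temporal axis survives into the next layer.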

Index Terms

Enhanced spatio-temporal 3D CNN for facial expression classification in videos
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

Multimedia Tools and Applications Volume 83, Issue 4

Jan 2024

2884 pages

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 28 June 2023

Accepted: 18 June 2023

Revision received: 30 May 2023

Received: 13 September 2021
