research-article

Enhanced spatio-temporal 3D CNN for facial expression classification in videos

Published: 28 June 2023

Abstract

This article proposes a hybrid network model for video-based human facial expression recognition (FER) built from end-to-end 3D deep convolutional neural networks. The proposed network combines two commonly used deep 3-dimensional convolutional neural network (3D CNN) models, ResNet-50 and DenseNet-121, in an end-to-end manner with slight modifications. Existing FER methodologies include 2-dimensional convolutional neural networks (2D CNN), 2D CNNs combined with recurrent neural networks, 3D CNNs, and feature extraction algorithms such as PCA and histogram of oriented gradients (HOG) combined with machine learning classifiers. For the proposed model, we choose 3D CNNs over the other methods because, unlike 2D CNNs, they preserve the temporal information of the videos. Moreover, they are not labor-intensive, unlike many handcrafted feature extraction methods. The proposed system relies on temporal averaging of information from the frame sequences of the video. The databases are pre-processed to remove unwanted backgrounds so that the 3D deep CNN can be trained from scratch. First, feature vectors are extracted from the video frame sequences using the 3D ResNet model. These feature vectors are then fed to the blocks of the 3D DenseNet model, whose output is used to classify the emotion. The model is evaluated on three benchmark databases, RAVDESS, CK+, and BAUM-1s, achieving 91.69%, 98.61%, and 73.73% accuracy, respectively, and outperforming various existing methods. We show that the proposed architecture works well even for classes with little training data, where many existing 3D CNN networks fail.
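
The two ideas the abstract leans on, a 3D kernel that spans neighbouring frames and temporal averaging of the resulting features, can be illustrated with a toy example. The sketch below is not the authors' network: it is a single-channel, pure-Python illustration, and the names `conv3d_valid` and `temporal_average` are hypothetical helpers introduced here. It applies a small kernel that spans two frames, so the output depends on how pixels change over time, which a per-frame 2D convolution cannot capture.

```python
from typing import List

# A clip is indexed as clip[frame][row][col]; one channel only (toy assumption).
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Single-channel 3D cross-correlation with no padding ('valid')."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(clip) - kt + 1
    out_h = len(clip[0]) - kh + 1
    out_w = len(clip[0][0]) - kw + 1
    out: Clip = []
    for t in range(out_t):
        plane = []
        for y in range(out_h):
            row = []
            for x in range(out_w):
                acc = 0.0
                # The kernel covers kt consecutive frames, not just one.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def temporal_average(features: Clip) -> List[List[float]]:
    """Average the feature maps over the time axis: one map per clip."""
    n = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [[sum(f[y][x] for f in features) / n for x in range(w)]
            for y in range(h)]


# Toy clip: 4 frames of 3x3 pixels whose brightness rises by 1 per frame.
moving = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
# Static clip: the same 4 frames, all identical.
static = [[[5.0] * 3 for _ in range(3)] for _ in range(4)]
# Temporal-difference kernel spanning 2 frames and 1x1 pixels.
kernel = [[[-1.0]], [[1.0]]]

moving_summary = temporal_average(conv3d_valid(moving, kernel))
static_summary = temporal_average(conv3d_valid(static, kernel))
```

Here `moving_summary` is 1.0 at every position and `static_summary` is 0.0 at every position: the kernel responds only to change between frames, and the averaging step collapses the time axis into one fixed-size summary per clip. A real 3D CNN learns many such kernels over multiple channels and stacks them in layers.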

Published In

Multimedia Tools and Applications, Volume 83, Issue 4
Jan 2024
2884 pages

Publisher

Kluwer Academic Publishers
United States

Publication History

Published: 28 June 2023
Accepted: 18 June 2023
Revision received: 30 May 2023
Received: 13 September 2021

Author Tags

1. Face expression recognition
2. Video pre-processing
3. Hybrid deep network
