
Multiple Models Fusion for Emotion Recognition in the Wild

Published: 09 November 2015

Abstract

Emotion recognition in the wild is a challenging task. In this paper, we propose a multiple-model fusion method to automatically recognize the expression in a video clip, as part of the third Emotion Recognition in the Wild Challenge (EmotiW 2015). In our method, we first extract dense SIFT, LBP-TOP, and audio features from each video clip. For the dense SIFT features, we further represent them with the bag-of-features (BoF) model using two different encoding methods (locality-constrained linear coding and group saliency based coding). During classification, we use partial least squares regression to calculate a regression value for each model. We then fuse the models by learning the optimal weight of each one based on its regression value. We conduct experiments on the given validation and test datasets and achieve superior performance. The best recognition accuracy of our fusion method is 52.50% on the test dataset, 13.17 percentage points higher than the challenge baseline accuracy of 39.33%.
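The fusion strategy described above can be sketched as weighted late fusion of per-model score matrices, with the weights chosen on a validation set. This is a minimal illustrative sketch, not the authors' exact procedure: the toy data, the grid-search weight learning, and all function names here are assumptions; in the paper the per-class scores come from partial least squares regression over dense SIFT, LBP-TOP, and audio models.

```python
# Hedged sketch of multiple-model late fusion: each model yields a
# (n_clips x n_classes) score matrix; fusion weights are grid-searched
# on validation labels. Illustrative only -- not the authors' exact method.
import itertools
import numpy as np

def fuse(score_list, weights):
    """Weighted sum of per-model score matrices (n_clips x n_classes)."""
    fused = np.zeros_like(score_list[0], dtype=float)
    for scores, w in zip(score_list, weights):
        fused += w * scores
    return fused

def search_weights(score_list, val_labels, step=0.1):
    """Grid-search fusion weights on the simplex that maximize
    validation accuracy (a simple stand-in for weight learning)."""
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=len(score_list)):
        if not np.isclose(sum(w), 1.0):
            continue  # keep weights summing to 1
        preds = fuse(score_list, w).argmax(axis=1)
        acc = float((preds == val_labels).mean())
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

# Toy example: two hypothetical models, three clips, two emotion classes.
model_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
model_b = np.array([[0.4, 0.6], [0.1, 0.9], [0.7, 0.3]])
labels = np.array([0, 1, 0])
w, acc = search_weights([model_a, model_b], labels)
print(w, acc)
```

The key design point is that fusion happens at the score level rather than the feature level, so models over heterogeneous features (visual and audio) can be combined without a joint representation.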



Published In

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
November 2015
678 pages
ISBN:9781450339124
DOI:10.1145/2818346

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bag of features
  2. emotion recognition
  3. emotiw 2015 challenge
  4. multiple models fusion

Qualifiers

  • Research-article

Funding Sources

  • National Basic Research Program of China (973 Program)
  • National Natural Science Foundation of China (NSFC)
  • Microsoft Research Asia Collaborative Research Program

Conference

ICMI '15
Sponsor:
ICMI '15: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
November 9 - 13, 2015
Seattle, Washington, USA

Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)

