
Multiple Models Fusion for Emotion Recognition in the Wild

Published: 09 November 2015

Abstract

Emotion recognition in the wild is a challenging task. In this paper, we propose a multiple-model fusion method to automatically recognize the expression in a video clip, as part of the third Emotion Recognition in the Wild Challenge (EmotiW 2015). In our method, we first extract dense SIFT, LBP-TOP, and audio features from each video clip. For the dense SIFT features, we further represent them with the bag-of-features (BoF) model using two different encoding methods (locality-constrained linear coding and group saliency based coding). During classification, we use partial least squares regression to calculate a regression value for each model. We then fuse the models by learning the optimal weight of each one based on its regression value. We conduct experiments on the given validation and test datasets and achieve superior performance. The best recognition accuracy of our fusion method is 52.50% on the test dataset, 13.17 percentage points higher than the challenge baseline accuracy of 39.33%.
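The fusion strategy described above can be sketched as weighted late fusion of per-model score matrices, with the weights chosen on a validation set. This is a minimal illustrative sketch, not the authors' exact procedure: the toy data, the grid-search weight learning, and all function names here are assumptions; in the paper the per-class scores come from partial least squares regression over dense SIFT, LBP-TOP, and audio models.

```python
# Hedged sketch of multiple-model late fusion: each model yields a
# (n_clips x n_classes) score matrix; fusion weights are grid-searched
# on validation labels. Illustrative only -- not the authors' exact method.
import itertools
import numpy as np

def fuse(score_list, weights):
    """Weighted sum of per-model score matrices (n_clips x n_classes)."""
    fused = np.zeros_like(score_list[0], dtype=float)
    for scores, w in zip(score_list, weights):
        fused += w * scores
    return fused

def search_weights(score_list, val_labels, step=0.1):
    """Grid-search fusion weights on the simplex that maximize
    validation accuracy (a simple stand-in for weight learning)."""
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=len(score_list)):
        if not np.isclose(sum(w), 1.0):
            continue  # keep weights summing to 1
        preds = fuse(score_list, w).argmax(axis=1)
        acc = float((preds == val_labels).mean())
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

# Toy example: two hypothetical models, three clips, two emotion classes.
model_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
model_b = np.array([[0.4, 0.6], [0.1, 0.9], [0.7, 0.3]])
labels = np.array([0, 1, 0])
w, acc = search_weights([model_a, model_b], labels)
print(w, acc)
```

The key design point is that fusion happens at the score level rather than the feature level, so models over heterogeneous features (visual and audio) can be combined without a joint representation.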



Published In

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
November 2015
678 pages
ISBN:9781450339124
DOI:10.1145/2818346

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bag of features
  2. emotion recognition
  3. emotiw 2015 challenge
  4. multiple models fusion

Qualifiers

  • Research-article

Funding Sources

  • National Basic Research Program of China (973 Program)
  • National Natural Science Foundation of China (NSFC)
  • Microsoft Research Asia Collaborative Research Program

Conference

ICMI '15
Sponsor:
ICMI '15: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
November 9 - 13, 2015
Seattle, Washington, USA

Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)

