Contextual modulation of affect: Comparing humans and deep neural networks

Published: 07 November 2022

Abstract

When inferring emotions, humans rely on a number of cues, including not only facial expressions and body posture but also expressor-external, contextual information. The goal of the present study was to compare the impact of such contextual information on emotion processing in humans and in two deep neural network (DNN) models. We used results from a human experiment in which two types of pictures were rated for valence and arousal: the first type depicted people expressing an emotion in a social context that included other people; the second was a context-reduced version in which all information except the target expressor was blurred out. Human ratings of valence and arousal were systematically lower in the context-reduced version, highlighting the importance of context. We then compared the human ratings with those of two DNN models (one trained on face images only, the other trained also on contextual information). Analyses of both the categorical and the valence/arousal ratings showed that, despite some superficial similarities, both models failed to capture human rating patterns in both the context-rich and the context-reduced conditions. Our study emphasizes the importance of a more holistic, multimodal training regime with richer human data for building better emotion-understanding systems in affective computing.
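
To make the comparison concrete, below is a minimal illustrative sketch in Python of how a context effect on ratings and human-model agreement could be quantified. It is not the authors' analysis code; the rating values, scales, and variable names are hypothetical placeholders.

```python
# Illustrative sketch only (not the authors' analysis code): quantify a
# context effect on valence ratings and summarize human-model agreement.
# All numbers, scales, and variable names here are hypothetical placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_images = 40  # hypothetical number of stimulus images

# Hypothetical human valence ratings (e.g., a 1-9 scale) for the same images
# shown with full context vs. with everything except the expressor blurred.
valence_context = rng.uniform(2.0, 8.0, n_images)
valence_blurred = valence_context - rng.uniform(0.2, 1.0, n_images)  # assumed drop

# Context effect: paired comparison across the two presentation conditions.
t, p = stats.ttest_rel(valence_context, valence_blurred)
effect = np.mean(valence_context - valence_blurred)
print(f"Mean context effect on valence: {effect:.2f} (t = {t:.2f}, p = {p:.3g})")

# Hypothetical model predictions for the context-rich images; agreement with
# the human ratings is summarized as a Pearson correlation.
model_valence = np.clip(valence_context + rng.normal(0.0, 1.5, n_images), 1.0, 9.0)
r, p_r = stats.pearsonr(valence_context, model_valence)
print(f"Human-model correlation (context-rich): r = {r:.2f}, p = {p_r:.3g}")
```

The same paired comparison and correlation would be computed analogously for arousal and for the context-reduced condition.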

Published In

ICMI '22 Companion: Companion Publication of the 2022 International Conference on Multimodal Interaction
November 2022
225 pages
ISBN:9781450393898
DOI:10.1145/3536220

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DNNs
  2. affective computing
  3. contextual information
  4. emotion recognition
  5. user study

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

ICMI '22

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
