DOI: 10.1145/3474085.3475295

CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition

Published: 17 October 2021

Abstract

Facial action unit (AU) recognition has attracted increasing attention due to its indispensable role in affective computing, especially in affective human-computer interaction. Because AUs are subtle and transient, it is challenging to capture the delicate and ambiguous motions in local facial regions across consecutive frames. Since context is essential for resolving ambiguity in the human visual system, modeling context within and among facial images emerges as a promising approach to the AU recognition task. To this end, we propose CaFGraph, a novel context-aware facial multi-graph that models both morphology- and muscle-based region-level local context and region-level temporal context. CaFGraph is the first work to construct a universal facial multi-graph structure that is independent of both task settings and dataset statistics, and that applies to almost all fine-grained facial behavior analysis tasks, including but not limited to AU recognition. To make full use of this context, we then present CaFNet, which learns context-aware facial graph representations via CaFGraph from facial images for multi-label AU recognition. Experiments on two widely used benchmark datasets, BP4D and DISFA, demonstrate the superiority of CaFNet over state-of-the-art methods.
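To make the abstract's construction concrete, the sketch below shows one plausible reading of the idea: graph nodes are local facial regions, intra-frame edges encode morphology/muscle-based local context, inter-frame edges encode temporal context, and a small graph convolutional network (in the style of Kipf & Welling) produces multi-label AU probabilities. Every name here (GCNLayer, ToyCaFNet), every edge list, and every dimension is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a facial multi-graph over
# T frames with R regions per frame, plus a small GCN for multi-label
# AU prediction. Edge lists and dimensions are assumptions.
import torch
import torch.nn as nn

def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in Kipf & Welling."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim) region features; message passing, then ReLU
        return torch.relu(self.linear(adj_norm @ x))

class ToyCaFNet(nn.Module):
    """Stand-in for CaFNet: two GCN layers over the fixed facial multi-graph,
    then a pooled multi-label head (one sigmoid per AU)."""
    def __init__(self, feat_dim: int, hidden_dim: int, num_aus: int):
        super().__init__()
        self.gcn1 = GCNLayer(feat_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, num_aus)

    def forward(self, x, adj_norm):
        h = self.gcn2(self.gcn1(x, adj_norm), adj_norm)
        return torch.sigmoid(self.head(h.mean(dim=0)))  # AUs co-occur: multi-label

# Build the toy multi-graph: T frames, R regions per frame.
T, R, F = 2, 5, 16
N = T * R
adj = torch.zeros(N, N)
local_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # assumed morphology/muscle links
for t in range(T):                               # local context within each frame
    for i, j in local_edges:
        adj[t * R + i, t * R + j] = adj[t * R + j, t * R + i] = 1.0
for r in range(R):                               # temporal context across frames
    for t in range(T - 1):
        adj[t * R + r, (t + 1) * R + r] = adj[(t + 1) * R + r, t * R + r] = 1.0

model = ToyCaFNet(feat_dim=F, hidden_dim=32, num_aus=12)  # e.g. 12 AUs on BP4D
probs = model(torch.randn(N, F), normalize_adj(adj))
print(probs.shape)  # torch.Size([12]): one probability per AU
```

The design point this mirrors from the abstract is that a single fixed, task-independent graph structure carries both kinds of context; only the node features and GCN weights are learned.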

Supplementary Material

MP4 File (MM21-fp709.mp4): presentation video





    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085

    Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. context-aware
    2. facial action unit
    3. multi-graph representation

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20-24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

