DOI: 10.1145/3474085.3475295

CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition

Published: 17 October 2021

Abstract

Facial action unit (AU) recognition has attracted increasing attention due to its indispensable role in affective computing, especially in affective human-computer interaction. Because AUs are subtle and transient, it is challenging to capture the delicate and ambiguous motions in local facial regions across consecutive frames. Since context is essential for resolving ambiguity in the human visual system, modeling context within and among facial images emerges as a promising approach to the AU recognition task. To this end, we propose CaFGraph, a novel context-aware facial multi-graph that models both morphology- and muscle-based region-level local context and region-level temporal context. CaFGraph is the first work to construct a universal facial multi-graph structure that is independent of both task settings and dataset statistics, and that applies to almost all fine-grained facial behavior analysis tasks, including but not limited to AU recognition. To make full use of this context, we then present CaFNet, which learns context-aware facial graph representations via CaFGraph from facial images for multi-label AU recognition. Experiments on two widely used benchmark datasets, BP4D and DISFA, demonstrate the superiority of CaFNet over state-of-the-art methods.
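To make the abstract's construction concrete, the sketch below shows one plausible reading of the idea: graph nodes are local facial regions, intra-frame edges encode morphology/muscle-based local context, inter-frame edges encode temporal context, and a small graph convolutional network (in the style of Kipf & Welling) produces multi-label AU probabilities. Every name here (GCNLayer, ToyCaFNet), every edge list, and every dimension is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a facial multi-graph over
# T frames with R regions per frame, plus a small GCN for multi-label
# AU prediction. Edge lists and dimensions are assumptions.
import torch
import torch.nn as nn

def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in Kipf & Welling."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim) region features; message passing, then ReLU
        return torch.relu(self.linear(adj_norm @ x))

class ToyCaFNet(nn.Module):
    """Stand-in for CaFNet: two GCN layers over the fixed facial multi-graph,
    then a pooled multi-label head (one sigmoid per AU)."""
    def __init__(self, feat_dim: int, hidden_dim: int, num_aus: int):
        super().__init__()
        self.gcn1 = GCNLayer(feat_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, num_aus)

    def forward(self, x, adj_norm):
        h = self.gcn2(self.gcn1(x, adj_norm), adj_norm)
        return torch.sigmoid(self.head(h.mean(dim=0)))  # AUs co-occur: multi-label

# Build the toy multi-graph: T frames, R regions per frame.
T, R, F = 2, 5, 16
N = T * R
adj = torch.zeros(N, N)
local_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # assumed morphology/muscle links
for t in range(T):                               # local context within each frame
    for i, j in local_edges:
        adj[t * R + i, t * R + j] = adj[t * R + j, t * R + i] = 1.0
for r in range(R):                               # temporal context across frames
    for t in range(T - 1):
        adj[t * R + r, (t + 1) * R + r] = adj[(t + 1) * R + r, t * R + r] = 1.0

model = ToyCaFNet(feat_dim=F, hidden_dim=32, num_aus=12)  # e.g. 12 AUs on BP4D
probs = model(torch.randn(N, F), normalize_adj(adj))
print(probs.shape)  # torch.Size([12]): one probability per AU
```

The design point this mirrors from the abstract is that a single fixed, task-independent graph structure carries both kinds of context; only the node features and GCN weights are learned.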

Supplementary Material

MP4 File (MM21-fp709.mp4): presentation video





    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085

    Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. context-aware
    2. facial action unit
    3. multi-graph representation

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20-24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

