DOI: 10.1145/3503161.3548116

Pursuing Knowledge Consistency: Supervised Hierarchical Contrastive Learning for Facial Action Unit Recognition

Published: 10 October 2022

Abstract

With the growing need for emotion analysis, facial action unit (AU) recognition has attracted increasing attention as a fundamental task in affective computing. Although deep learning has boosted AU recognition performance to a new level in recent years, it remains challenging to extract subject-consistent representations, because the appearance changes caused by AUs are subtle and ambiguous across subjects. We observe that there are three kinds of inherent relations among AUs, which can be treated as strong prior knowledge, and that pursuing the consistency of such knowledge is the key to learning subject-consistent representations. To this end, we propose a supervised hierarchical contrastive learning method (SupHCL) for AU recognition that pursues knowledge consistency across different facial images and different AUs, and which is orthogonal to methods focused on network architecture design. Specifically, SupHCL contains three relation consistency modules, i.e., unary, binary, and multivariate, which take the corresponding kind of inherent relations as extra supervision to encourage knowledge-consistent distributions of both AU-level and image-level representations. Experiments on two commonly used AU benchmark datasets, BP4D and DISFA, demonstrate the effectiveness of each relation consistency module and the superiority of SupHCL.
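The paper's method details are not reproduced on this page, but since SupHCL builds on supervised contrastive learning (one of the author tags below), a minimal PyTorch sketch of a SupCon-style loss may help orient readers. The per-AU framing, the function name, and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull same-label embeddings together, push others apart.

    features: (N, D) embeddings, e.g., one AU head's outputs for a batch.
    labels:   (N,) integer labels, e.g., binary AU occurrence for that AU.
    """
    n = features.size(0)
    features = F.normalize(features, dim=1)  # work in cosine-similarity space

    # Pairwise similarities scaled by temperature, stabilized per row.
    logits = features @ features.T / temperature
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()

    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    # Positives: other samples in the batch with the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all non-self candidates for each anchor.
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - exp_logits.sum(dim=1, keepdim=True).log()

    # Average log-probability of positives per anchor; skip anchors
    # that have no positive in the batch.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

In a hierarchical setup such as the one the abstract describes, one loss of this form could act at the AU level (unary relations) while additional terms supervise pairwise and group-level label patterns; the exact formulation of the binary and multivariate consistency terms is given in the paper itself.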

Supplementary Material

MP4 File (MM22-fp1550.mp4)
Presentation video




    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. facial action unit
    2. supervised contrastive learning

    Qualifiers

    • Research-article

    Funding Sources

    • the PKU-NTU Joint Research Institute (JRI) sponsored by a donation from the Ng Teng Fong Charitable Foundation
    • the Fundamental Research Funds for the Central Universities
    • the National Key R&D Program of China

    Conference

MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

