DOI: 10.1145/3503161.3548116

Pursuing Knowledge Consistency: Supervised Hierarchical Contrastive Learning for Facial Action Unit Recognition

Published: 10 October 2022

Abstract

With the growing need for emotion analysis, facial action unit (AU) recognition has attracted increasing attention as a fundamental task in affective computing. Although deep learning has boosted AU recognition performance to a new level in recent years, it remains challenging to extract subject-consistent representations, because the appearance changes caused by AUs are subtle and ambiguous across subjects. We observe that there are three kinds of inherent relations among AUs, which can be treated as strong prior knowledge, and that pursuing the consistency of such knowledge is the key to learning subject-consistent representations. To this end, we propose a supervised hierarchical contrastive learning method (SupHCL) for AU recognition that pursues knowledge consistency across different facial images and different AUs, and which is orthogonal to methods focused on network architecture design. Specifically, SupHCL contains three relation consistency modules, i.e., unary, binary, and multivariate, which take the corresponding kind of inherent relations as extra supervision to encourage knowledge-consistent distributions of both AU-level and image-level representations. Experiments on two commonly used AU benchmark datasets, BP4D and DISFA, demonstrate the effectiveness of each relation consistency module and the superiority of SupHCL.
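The paper's method details are not reproduced on this page, but since SupHCL builds on supervised contrastive learning (one of the author tags below), a minimal PyTorch sketch of a SupCon-style loss may help orient readers. The per-AU framing, the function name, and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull same-label embeddings together, push others apart.

    features: (N, D) embeddings, e.g., one AU head's outputs for a batch.
    labels:   (N,) integer labels, e.g., binary AU occurrence for that AU.
    """
    n = features.size(0)
    features = F.normalize(features, dim=1)  # work in cosine-similarity space

    # Pairwise similarities scaled by temperature, stabilized per row.
    logits = features @ features.T / temperature
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()

    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    # Positives: other samples in the batch with the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all non-self candidates for each anchor.
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - exp_logits.sum(dim=1, keepdim=True).log()

    # Average log-probability of positives per anchor; skip anchors
    # that have no positive in the batch.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

In a hierarchical setup such as the one the abstract describes, one loss of this form could act at the AU level (unary relations) while additional terms supervise pairwise and group-level label patterns; the exact formulation of the binary and multivariate consistency terms is given in the paper itself.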

Supplementary Material

MP4 File (MM22-fp1550.mp4)
Presentation video




    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. facial action unit
    2. supervised contrastive learning

    Qualifiers

    • Research-article

    Funding Sources

    • the PKU-NTU Joint Research Institute (JRI) sponsored by a donation from the Ng Teng Fong Charitable Foundation
    • the Fundamental Research Funds for the Central Universities
    • the National Key R&D Program of China

    Conference

MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

