DOI: 10.1109/FG57933.2023.10042513
research-article

Multi-Zone Transformer Based on Self-Distillation for Facial Attribute Recognition

Published: 05 January 2023

Abstract

Recently, transformers have shown promising performance in various computer vision tasks. However, current transformer-based methods ignore the information exchange between transformer blocks, and they have not yet been applied to facial attribute recognition (FAR). In this paper, we propose a multi-zone transformer based on self-distillation, termed MZTS, to predict facial attributes. We first present a multi-zone transformer encoder that enables interaction among the different transformer encoder blocks, so that effective information is not forgotten between transformer encoder block groups during the iteration process. Furthermore, we introduce a new self-distillation mechanism based on class tokens, which distills the class tokens obtained from the last transformer encoder block group into the shallower groups by exchanging the significant information between the different transformer blocks through attention. Extensive experiments on the challenging CelebA and LFWA datasets demonstrate the excellent performance of the proposed method for FAR.
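The class-token self-distillation idea can be illustrated with a minimal sketch. The abstract does not specify the loss or how attention enters the distillation, so everything below is an assumption made for illustration only: each encoder block group is assumed to yield per-attribute logits from its class token, the last group acts as the teacher, and each shallower group is pulled toward it with a per-attribute binary KL divergence. The names `binary_kl`, `self_distillation_loss`, and `temperature` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_kl(student_logits, teacher_logits, temperature=1.0):
    """Mean per-attribute KL(teacher || student) over binary attributes.

    Assumption: every facial attribute is an independent binary label,
    so each logit defines a Bernoulli distribution.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p = sigmoid(s / temperature)  # student probability
        q = sigmoid(t / temperature)  # teacher probability (soft target)
        total += q * math.log((q + eps) / (p + eps))
        total += (1.0 - q) * math.log((1.0 - q + eps) / (1.0 - p + eps))
    return total / len(student_logits)


def self_distillation_loss(group_logits, temperature=1.0):
    """Average distillation loss from the last group to every shallower group.

    group_logits: one list of attribute logits per encoder block group,
    ordered shallow to deep. The last entry is treated as the teacher.
    """
    *shallow, teacher = group_logits
    if not shallow:
        return 0.0  # a single group has nothing to distill into
    losses = [binary_kl(s, teacher, temperature) for s in shallow]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Three hypothetical groups, two attributes each.
    groups = [[0.5, -0.2], [1.5, -0.8], [2.0, -1.0]]
    print(self_distillation_loss(groups))
```

In a real training loop the teacher logits would normally be detached from the gradient graph and this term would be added to the supervised attribute loss; whether MZTS does either is not stated in the abstract.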

Published In

2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)
Jan 2023
540 pages

Publisher

IEEE Press
