Blind face restoration: Benchmark datasets and a baseline model

Published: 17 April 2024

Abstract

Blind Face Restoration (BFR) aims to generate high-quality face images from low-quality inputs. However, existing BFR methods often train and evaluate on private datasets, making fair comparison among approaches difficult. To address this issue, we introduce two benchmark datasets, BFRBD128 and BFRBD512, for evaluating state-of-the-art methods under five degradation scenarios: blur, noise, low resolution, JPEG compression artifacts, and full degradation. We evaluate with seven standard quantitative metrics and two task-specific metrics, Average Face Landmark Distance (AFLD) and Average Face ID Cosine Similarity (AFICS). Additionally, we propose an efficient baseline model, the Swin Transformer U-Net (STUNet), which outperforms state-of-the-art methods on a range of BFR tasks. The code, datasets, and trained models are publicly available at https://github.com/bitzpy/Blind-Face-Restoration-Benchmark-Datasets-and-a-Baseline-Model.
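For context, the sketch below shows how a "full degradation" low-quality input of the kind described above is commonly synthesized in BFR work: Gaussian blur, bicubic downsampling, additive noise, and JPEG compression applied in sequence. This is an illustrative pipeline under assumed settings, not the paper's exact degradation model; the function degrade_face and all parameter defaults are hypothetical placeholders.

```python
# Minimal sketch of a common BFR "full degradation" synthesis pipeline.
# NOTE: function name and all parameter values are illustrative
# assumptions, not the settings used to build BFRBD128/BFRBD512.
import cv2
import numpy as np

def degrade_face(hq_img: np.ndarray,
                 blur_sigma: float = 3.0,
                 scale: int = 4,
                 noise_sigma: float = 10.0,
                 jpeg_quality: int = 30) -> np.ndarray:
    """Turn a high-quality 8-bit BGR face image into a low-quality input."""
    h, w = hq_img.shape[:2]

    # 1. Blur: Gaussian kernel, size derived automatically from sigma.
    img = cv2.GaussianBlur(hq_img, (0, 0), blur_sigma)

    # 2. Low resolution: bicubic downsampling by the given scale factor.
    img = cv2.resize(img, (w // scale, h // scale),
                     interpolation=cv2.INTER_CUBIC)

    # 3. Noise: additive Gaussian noise, clipped to the valid 8-bit range.
    noise = np.random.normal(0.0, noise_sigma, img.shape)
    img = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

    # 4. JPEG compression artifacts via an encode/decode round trip.
    _, buf = cv2.imencode(".jpg", img,
                          [int(cv2.IMWRITE_JPEG_QUALITY), jpeg_quality])
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)

    # Resize back so the restoration model sees a fixed input resolution.
    return cv2.resize(img, (w, h), interpolation=cv2.INTER_CUBIC)
```

Under this reading, each of the four single-degradation scenarios (blur, noise, low resolution, JPEG) would correspond to applying only the matching step in isolation, while the full-degradation scenario chains all four.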



Published In

Neurocomputing, Volume 574, Issue C
Mar 2024
367 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


Author Tags

  1. Blind face restoration
  2. Benchmark datasets
  3. Comprehensive evaluation
  4. Transformer network

Qualifiers

  • Research-article
