DOI: 10.1145/3581783.3611731
research-article
Open access

DiffBFR: Bootstrapping Diffusion Model for Blind Face Restoration

Published: 27 October 2023

Abstract

Blind face restoration (BFR) is important yet challenging. Prior works favor GAN-based frameworks for this task because they balance quality and efficiency. However, these methods suffer from poor stability and poor adaptability to long-tail distributions, and they fail to retain source identity and restore detail at the same time. In this paper, we introduce the Diffusion Probabilistic Model (DPM) to BFR to tackle these problems, given its advantages over GANs in avoiding training collapse and in generating long-tail distributions. We name the proposed framework DiffBFR. In particular, DiffBFR uses a two-step design that first restores identity information from low-quality (LQ) images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components: 1) An Identity Restoration Module (IRM) for preserving the face details in results. Instead of denoising from a pure Gaussian random distribution with LQ images as the condition during the reverse process, we propose a novel truncated sampling method that starts from LQ images with partial noise added. We theoretically prove that this change shrinks the evidence lower bound of the DPM and thereby restores more original details. Building on this proof, we introduce two cascaded conditional DPMs with different input sizes to strengthen this sampling effect and to reduce the training difficulty of generating high-resolution images directly. 2) A Texture Enhancement Module (TEM) for polishing the texture of the image. Here an unconditional DPM, an LQ-free model, is introduced to further force the restorations to appear realistic. We theoretically prove that this unconditional DPM, trained on pure high-quality (HQ) images, helps align the inference images output by the IRM with the correct distribution in pixel-level space. Concretely, truncated sampling with a fractional time step is used to polish pixel-level textures while preserving identity information. Our experiments demonstrate that the proposed DiffBFR achieves significantly superior results to state-of-the-art methods both quantitatively and qualitatively.
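
The truncated sampling idea in the abstract can be illustrated with a toy sketch. The abstract gives no implementation details, so everything here is an assumption for illustration: the linear noise schedule and the ancestral update follow the standard DDPM formulation, the image is a flat list of floats, and `eps_model` is a hypothetical stand-in for a trained, LQ-conditioned noise predictor. It is not the authors' code.

```python
import math
import random

def linear_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Return (betas, alpha_bars) for a linear beta schedule (assumed, DDPM-style)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta          # cumulative product of (1 - beta_t)
        alpha_bars.append(running)
    return betas, alpha_bars

def noise_to_step(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

def truncated_sample(lq, t_start, betas, alpha_bars, eps_model, rng):
    """Run the reverse process from a partially noised LQ image, not pure noise."""
    x = noise_to_step(lq, t_start, alpha_bars, rng)
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x, t, lq)  # hypothetical LQ-conditioned noise predictor
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [inv * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps_hat)]
    return x

if __name__ == "__main__":
    betas, alpha_bars = linear_schedule()
    rng = random.Random(0)
    lq_pixels = [0.5] * 16                            # toy "LQ image"
    zero_model = lambda x, t, cond: [0.0] * len(x)    # placeholder, not a trained model
    restored = truncated_sample(lq_pixels, 200, betas, alpha_bars, zero_model, rng)
    print(len(restored))
```

In this sketch, `t_start` controls how much noise is added before denoising begins. A smaller value leaves more of the LQ input intact, while a larger value gives the model more freedom to change it. The value 200 is arbitrary and is not taken from the paper.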

Supplemental Material

MP4 File
Presentation video about DiffBFR, a model for blind face restoration. DiffBFR uses a two-step design that first restores identity information from low-quality images and then enhances texture details according to the distribution of real faces.

      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN: 9798400701085
      DOI: 10.1145/3581783
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. blind face restoration
      2. diffusion probabilistic models

      Conference

      MM '23
      MM '23: The 31st ACM International Conference on Multimedia
      October 29 - November 3, 2023
      Ottawa ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Article Metrics

      • Downloads (last 12 months): 427
      • Downloads (last 6 weeks): 43
      Reflects downloads up to 04 Jan 2025
