
FISTNet: FusIon of STyle-path generative Networks for facial style transfer

Published: 01 December 2024

Abstract

With the surge in emerging technologies such as the Metaverse, spatial computing, and generative AI, facial style transfer has attracted considerable interest from researchers and startup enthusiasts alike. StyleGAN methods have paved the way for transfer-learning strategies that reduce the dependency on large volumes of training data. However, StyleGAN methods tend to be unbalanced, which introduces artifacts into the generated facial images. Studies such as DualStyleGAN proposed multipath networks, but these must be trained for a specific style rather than generating a fusion of facial styles simultaneously. In this paper, we propose a Fusion of STyles (FIST) network for facial images that leverages pretrained multipath style transfer networks to reduce the dependence on large training datasets and to fuse multiple styles at the output. We employ pretrained StyleGAN networks with an external style pass that uses a residual modulation block instead of a transform coding block. The method also preserves facial structure, identity, and details via the gated mapping unit introduced in this study. Together, these components allow the network to be trained with minimal data while generating high-quality stylized images, opening up new possibilities for facial style transfer in emerging technologies. Our training process adopts a curriculum learning strategy to perform efficient and flexible style and model fusion in the generative space. Extensive experiments demonstrate the superiority of the proposed FISTNet over existing state-of-the-art methods.
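The abstract describes the architecture only at a high level. As a minimal illustrative sketch of the fusion idea, and not the authors' released implementation, the code below shows how a hypothetical residual modulation block and gated mapping unit could blend features from two pretrained style paths; all module names, dimensions, and parameters are assumptions introduced here for illustration.

```python
# Illustrative sketch only: hypothetical residual modulation block and
# gated mapping unit for fusing two style paths, loosely following the
# abstract's description. Not the paper's actual implementation.
import torch
import torch.nn as nn


class ResidualModulationBlock(nn.Module):
    """Modulates content features with a style code and adds a residual skip."""

    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(style_dim, channels)
        self.to_shift = nn.Linear(style_dim, channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        scale = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(style).unsqueeze(-1).unsqueeze(-1)
        modulated = self.conv(content) * (1 + scale) + shift
        # Residual connection keeps the original content features,
        # mirroring the goal of preserving facial structure.
        return content + modulated


class GatedMappingUnit(nn.Module):
    """Learns a per-channel gate that blends two stylized feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))
        # Convex combination of the two style paths, gated per channel.
        return g * feat_a + (1 - g) * feat_b


if __name__ == "__main__":
    content = torch.randn(1, 64, 32, 32)   # intermediate generator features (assumed shape)
    style_a = torch.randn(1, 512)           # e.g. a cartoon style code
    style_b = torch.randn(1, 512)           # e.g. a caricature style code
    rmb = ResidualModulationBlock(64, 512)
    fuse = GatedMappingUnit(64)
    fused = fuse(rmb(content, style_a), rmb(content, style_b))
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
```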

Highlights

A novel facial style transfer method for preserving facial structure is proposed.
The extrinsic and intrinsic style transformers are trained in a hierarchical fashion.
Pre-trained style transfer networks are fused to generate diversified facial styles.
We report state-of-the-art results for facial style transfer on public datasets.

References

[1]
Zawish M., Dharejo F.A., Khowaja S.A., Raza S., Davy S., Dev K., Bellavista P., AI and 6G into the metaverse: Fundamentals, challenges and future research trends, IEEE Open J. Commun. Soc. 5 (2024) 730–778.
[2]
Chen J., Liu G., Chen X., AnimeGAN: A novel lightweight GAN for photo animation, in: Artificial Intelligence Algorithms and Applications, Springer, Singapore, 2020, pp. 242–256.
[3]
Shi Y., Deb D., Jain A.K., Warpgan: Automatic caricature generation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2019, pp. 10762–10771.
[4]
Su H., Niu J., Liu X., Li Q., Cui J., Wan J., MangaGAN: Unpaired photo-to-manga translation based on the methodology of manga drawing, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35 (3), 2021, pp. 2611–2619.
[5]
Yi R., Liu Y.J., Lai Y.K., Rosin P.L., Unpaired portrait drawing generation via asymmetric cycle mapping, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2020, pp. 8217–8225.
[6]
Chen Y., Lai Y.K., Liu Y.J., Cartoongan: Generative adversarial networks for photo cartoonization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2018, pp. 9465–9474.
[7]
Wang X., Yu J., Learning to cartoonize using white-box cartoon representations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2020, pp. 8090–8099.
[8]
Jang W., Ju G., Jung Y., Yang J., Tong X., Lee S., StyleCariGAN: Caricature generation via StyleGAN feature map modulation, ACM Trans. Graph. 40 (4) (2021) 1–16.
[9]
Karras T., Laine S., Aila T., A style-based generator architecture for generative adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2019, pp. 4401–4410.
[10]
Khowaja S.A., Almakdi S., Memon M.A., Khuwaja P., Sulaiman A., Alqahtani A., Shaikh A., Alghamdi A., Extending user control for image stylization using hierarchical style transfer networks, Heliyon 10 (5) (2024).
[11]
J. Kim, M. Kim, H. Kang, K.H. Lee, U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation, in: International Conference on Learning Representations, 2020, pp. 1–19.
[12]
Shu Y., Yi R., Xia M., Ye Z., Zhao W., Chen Y., Lai Y.K., Liu Y.J., GAN-based multi-style photo cartoonization, IEEE Trans. Vis. Comput. Graphics 28 (10) (2022).
[13]
Ruiz N., Li Y., Jampani V., Pritch Y., Rubinstein M., Aberman K., DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2023, pp. 22500–22510.
[14]
R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A.H. Bermano, G. Chechik, D. Cohen-Or, An Image is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion, in: International Conference on Learning Representations, ICLR, 2023, pp. 1–14.
[15]
Yang S., Jiang L., Liu Z., Loy C.C., VToonify: Controllable high-resolution portrait video style transfer, ACM Trans. Graph. 41 (6) (2022) 1–15.
[16]
S. Yang, L. Jiang, Z. Liu, C.C. Loy, Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 7693–7702.
[17]
D. Liu, M. Fisher, A. Hertzmann, E. Kalogerakis, Neural Strokes: Stylized Line Drawing of 3D Shapes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 14204–14213.
[18]
Wang Z., Qiu S., Feng N., Rushmeier H., McMillan L., Dorsey J., Tracing versus freehand for evaluating computer-generated drawings, ACM Trans. Graph. 40 (4) (2021) 1–12.
[19]
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations, 2015, pp. 1–14.
[20]
Gatys L.A., Ecker A.S., Bethge M., Image style transfer using convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, pp. 2414–2423.
[21]
Li C., Wand M., Combining Markov random fields and convolutional neural networks for image synthesis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, pp. 2479–2486.
[22]
Liao J., Yao Y., Yuan L., Hua G., Kang S.B., Visual attribute transfer through deep image analogy, ACM Trans. Graph. 36 (4) (2017) 1–15.
[23]
Chen M., Dai H., Wei S., Hu Z., Linear-ResNet GAN-based anime style transfer of face images, Signal Image Video Process. 17 (2023) 3237–3245.
[24]
Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., Generative adversarial networks, Commun. ACM 63 (11) (2020) 139–144.
[25]
Isola P., Zhu J.Y., Zhou T., Efros A.A., Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 1125–1134.
[26]
Zhu J.Y., Park T., Isola P., Efros A.A., Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision, IEEE, 2017, pp. 2223–2232.
[27]
W. Cho, S. Choi, D.K. Park, I. Shin, J. Choo, Image-to-Image Translation via group-wise deep whitening-and-coloring transformations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10639–10647.
[28]
Liu M.Y., Breuel T., Kautz J., Unsupervised image-to-image translation networks, in: Advances in Neural Information Processing Systems, 2017, pp. 1–9.
[29]
Shao X., Zhang W., SPatchGAN: A statistical feature based discriminator for unsupervised image-to-image translation, in: Proceedings of the IEEE Conference on Computer Vision, IEEE, 2021, pp. 6546–6555.
[30]
Zhao Y., Wu R., Dong H., Unpaired image-to-image translation using adversarial consistency loss, in: Proceedings of the European Conference on Computer Vision, Springer, 2020, pp. 800–815.
[31]
Li B., Zhu Y., Wang Y., Lin C.W., Ghanem B., Shen L., AniGAN: Style-guided generative adversarial networks for unsupervised anime face generation, IEEE Trans. Multimed. 24 (2021) 4077–4091.
[32]
Chong M.J., Forsyth D., GANs N’Roses: Stable, controllable, diverse image to image translation, 2021, pp. 1–9. arXiv preprint arXiv:2106.06561.
[33]
Olivier N., Baert K., Danieau F., Multon F., Avril Q., FaceTuneGAN: Face autoencoder for convolutional expression transfer using generative adversarial networks, Comput. Graph. 110 (2023) 69–85.
[34]
Liu Y., Li Q., Deng Q., Sun Z., Yang M.H., GAN-based facial attribute manipulation, IEEE Trans. Pattern Anal. Mach. Intell. 45 (12) (2023) 14590–14610.
[35]
Melnik A., Miasayedzenkau M., Makaravets D., Pirshtuk D., Akbulut E., Holzmann D., Renusch T., Reichert G., Ritter H., Face generation and editing with StyleGAN: A survey, IEEE Trans. Pattern Anal. Mach. Intell. (2024) 1–21. Early Access.
[36]
Pinkney J.N.M., Adler D., Resolution dependent GAN interpolation for controllable image synthesis between domains, 2020, pp. 1–7. arXiv preprint arXiv:2010.05334.
[37]
Ojha U., Li Y., Lu J., Efros A.A., Few-shot image generation via cross-domain correspondence, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2021, pp. 10743–10752.
[38]
Song G., Luo L., Liu J., Ma W.C., Lai C., Zheng C., Cham T.J., Agilegan: stylizing portraits by inversion-consistent transfer learning, ACM Trans. Graph. 40 (4) (2021) 1–13.
[39]
Richardson E., Alaluf Y., Patashnik O., Nitzan Y., Azar Y., Shapiro S., Cohen-Or D., Encoding in style: A StyleGAN encoder for image-to-image translation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2021, pp. 2287–2296.
[40]
Kwong S., Huang J., Liao J., Unsupervised image-to-image translation via pre-trained StyleGAN2 network, IEEE Trans. Multimed. 24 (2021).
[41]
T. Karras, M. Aittala, S. Laine, E. Härkönen, J. Hellsten, J. Lehtinen, T. Aila, Alias-Free Generative Adversarial Networks, in: Proceedings of the Advances in Neural Information Processing Systems, 2021, pp. 852–863.
[42]
Z. Liu, M. Li, Y. Zhang, C. Wang, Q. Zhang, J. Wang, Y. Nie, Fine-Grained Face Swapping via Regional GAN Inversion, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2023, pp. 8578–8587.
[43]
Y. Lan, X. Meng, S. Yang, C.C. Loy, B. Dai, Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2023, pp. 20940–20949.
[44]
Zheng X., Yang X., Zhao Q., Zhang H., He X., Zhang J., Zhang X., CFA-GAN: Cross fusion attention and frequency loss for image style transfer, Displays 81 (2024).
[45]
Ren Z., Li J., Wu L., Xue X., Li X., Yang F., Jiao Z., Gao X., Brain-driven facial image reconstruction via StyleGAN inversion with improved identity consistency, Pattern Recognit. 150 (2024).
[46]
Peng T., Li M., Chen F., Xu Y., Xie Y., Sun Y., Zhang D., ISFB-GAN: Interpretable semantic face beautification with generative adversarial networks, Expert Syst. Appl. 236 (2024).
[47]
Patashnik O., Wu Z., Shechtman E., Cohen-Or D., Lischinski D., StyleCLIP: Text-driven manipulation of StyleGAN imagery, in: 2021 IEEE/CVF International Conference on Computer Vision, ICCV, 2021, pp. 2065–2074.
[48]
Hang T., Yang H., Liu B., Fu J., Geng X., Guo B., Language-guided face animation by recurrent StyleGAN-based generator, IEEE Trans. Multimed. 25 (2023) 9216–9227.
[49]
Xu C., Xu Y., Zhang H., Xu X., He S., DreamAnime: Learning style-identity textual disentanglement for anime and beyond, IEEE Trans. Vis. Comput. Graphics (2024) 1–12.
[50]
Y. Shen, B. Zhou, Closed-form factorization of latent semantics in gans, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 1532–1540.
[51]
Y. Shi, D. Agarwal, A.K. Jain, Lifting 2D StyleGAN for 3D Aware Face Generation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 6258–6266.
[52]
C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, in: Proceedings of the European Conference on Computer Vision, 2018, pp. 325–341.
[53]
Y. Men, Y. Yao, M. Cui, Z. Lian, X. Xie, X.S. Hua, Unpaired Cartoon Image Synthesis via Gated Cycle Mapping, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 3501–3510.
[54]
K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[55]
X. Huang, S. Belongie, Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization, in: Proceedings of the IEEE Conference on Computer Vision, 2017, pp. 1501–1510.
[56]
Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum Learning, in: International Conference on Machine Learning, 2009, pp. 41–48.
[57]
R. Mechrez, I. Talmi, L. Zelnik-Manor, The Contextual Loss for Image Transformation with Non-Aligned Data, in: Proceedings of the European Conference on Computer Vision, 2018, pp. 768–783.
[58]
Z. Liu, P. Luo, X. Wang, X. Tang, Deep Learning Face Attributes in the Wild, in: Proceedings of the International Conference on Computer Vision, 2015, pp. 3730–3738.
[59]
M.J. Chong, D. Forsyth, JojoGAN: One Shot Face Stylization, in: Proceedings of the European Conference on Computer Vision, 2022, pp. 128–152.
[60]
Y. Choi, Y. Uh, J. Yoo, J.W. Ha, Stargan v2: Diverse image synthesis for multiple domains, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 8188–8197.
[61]
Heusel M., Ramsauer H., Unterthiner T., Nessler B., Hochreiter S., GANs trained by a two time-scale update rule converge to a local nash equilibrium, Advances in Neural Information Processing Systems, 2017, pp. 1–12.
[62]
U. Ojha, Y. Li, J. Lu, A.A. Efros, Y.J. Lee, E. Shechtman, R. Zhang, Few-shot Image Generation via Cross Domain Correspondence, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 10743–10752.
[63]
Liu M., Li Q., Qin Z., Zhang G., Wan P., Zheng W., BlendGAN: Implicitly GAN blending for arbitrary stylized face generation, Advances in Neural Information Processing Systems, 2021, pp. 29710–29722.

Information

Published In

Information Fusion  Volume 112, Issue C
Dec 2024
818 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 December 2024

Author Tags

  1. StyleGAN
  2. Style of fusions
  3. GANs
  4. Face style transfer
  5. Style transfer networks

Qualifiers

  • Research-article
