research-article

Attention-driven Text-guided Image Manipulation

Authors:

Zhiqiang Zhang,

Wenxin Yu

IAIT '23: Proceedings of the 13th International Conference on Advances in Information Technology

Article No.: 33, Pages 1 - 5

https://doi.org/10.1145/3628454.3631665

Published: 06 December 2023

Abstract

Text-guided Image Manipulation (TGIM) uses textual information to modify the corresponding content of an input image. Methods based on generative adversarial networks (GANs) have achieved impressive manipulation performance, but the quality of the manipulated images still leaves room for improvement. This paper proposes an attention-driven TGIM method to improve that quality. Specifically, the proposed method uses an attention mechanism to fine-tune the whole manipulation process at the word level, so that image quality improves progressively through attentional fine-tuning. The method is validated experimentally on the public Caltech-UCSD Birds-200-2011 (CUB) dataset, and qualitative and quantitative comparisons demonstrate its superior performance on TGIM. Compared to existing TGIM methods, the proposed method improves the Inception Score (IS) by 22.6% and reduces the Fréchet Inception Distance (FID) by 13.4%.
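
To make the idea of word-level attention concrete, the following is a minimal NumPy sketch of generic dot-product attention between image regions and word embeddings. It is not the authors' architecture, which the abstract does not specify; the function names, feature shapes, and random inputs are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def word_attention(region_feats, word_feats):
    """Generic word-level attention (illustrative, not the paper's model).

    region_feats: (N, D) array, one feature vector per image region.
    word_feats:   (T, D) array, one embedding per word in the text.
    Returns a (N, D) word-context vector per region and the (N, T) weights.
    """
    scores = region_feats @ word_feats.T   # (N, T) region-word similarity
    weights = softmax(scores, axis=1)      # each region attends over all words
    context = weights @ word_feats         # (N, D) weighted sum of word embeddings
    return context, weights


# Hypothetical sizes: 16 image regions, 5 words, 8-dimensional features.
rng = np.random.default_rng(0)
regions = rng.standard_normal((16, 8))
words = rng.standard_normal((5, 8))
context, weights = word_attention(regions, words)
print(context.shape, weights.shape)
```

In a sketch like this, each region's context vector could be combined with the region's own features before the next refinement stage, so that regions most similar to a given word are the ones most affected by it.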

Index Terms

Attention-driven Text-guided Image Manipulation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics

Published In

IAIT '23: Proceedings of the 13th International Conference on Advances in Information Technology

December 2023

303 pages

ISBN:9798400708497

DOI:10.1145/3628454

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IAIT 2023

IAIT 2023: 13th International Conference on Advances in Information Technology

December 6 - 9, 2023

Bangkok, Thailand

Acceptance Rates

Overall Acceptance Rate 20 of 47 submissions, 43%
