Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics

Hu, Jinghao; Zhang, Yuhe; Geng, GuoHua; Yang, Liuyuxin; Yan, JiaRui; Cheng, Jingtao; Zhang, YaDong; Li, Kang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.18537 (cs)

[Submitted on 24 Oct 2024]


Abstract: Traditionally, style has been primarily considered in terms of artistic elements such as colors, brushstrokes, and lighting. However, identical semantic subjects, like people, boats, and houses, can vary significantly across different artistic traditions, indicating that style also encompasses the underlying semantics. Therefore, in this study, we propose a zero-shot scheme for image variation with coordinated semantics. Specifically, our scheme transforms the image-to-image problem into an image-to-text-to-image problem. The image-to-text operation employs vision-language models (e.g., BLIP) to generate text describing the content of the input image, including the objects and their positions. Subsequently, the input style keyword is elaborated into a detailed description of this style and then merged with the content text using the reasoning capabilities of ChatGPT. Finally, the text-to-image operation utilizes a Diffusion model to generate images based on the text prompt. To enable the Diffusion model to accommodate more styles, we propose a fine-tuning strategy that injects text and style constraints into cross-attention. This ensures that the output image exhibits similar semantics in the desired style. To validate the performance of the proposed scheme, we constructed a benchmark comprising images of various styles and scenes and introduced two novel metrics. Despite its simplicity, our scheme yields highly plausible results in a zero-shot manner, particularly for generating stylized images with high-fidelity semantics.
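The image-to-text-to-image flow described in the abstract can be pictured as four chained stages. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the class `StyleVariationPipeline`, its stage signatures, and the stand-in stages in `demo_pipeline` are all hypothetical. In the paper these roles are filled by a vision-language model such as BLIP, ChatGPT, and a fine-tuned diffusion model, none of which is called here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage interfaces (assumptions, not taken from the paper).
Captioner = Callable[[bytes], str]        # image -> content text (objects, positions)
StyleElaborator = Callable[[str], str]    # style keyword -> detailed style description
PromptMerger = Callable[[str, str], str]  # (content text, style text) -> final prompt
Generator = Callable[[str], bytes]        # prompt -> generated image


@dataclass
class StyleVariationPipeline:
    """Chains the four stages: caption, elaborate, merge, generate."""
    caption: Captioner
    elaborate: StyleElaborator
    merge: PromptMerger
    generate: Generator

    def build_prompt(self, image: bytes, style_keyword: str) -> str:
        # Image-to-text: describe the content of the input image.
        content_text = self.caption(image)
        # Expand the short style keyword into a fuller description.
        style_text = self.elaborate(style_keyword)
        # Combine content and style into a single text prompt.
        return self.merge(content_text, style_text)

    def run(self, image: bytes, style_keyword: str) -> bytes:
        # Text-to-image: hand the merged prompt to the generator.
        return self.generate(self.build_prompt(image, style_keyword))


def demo_pipeline() -> StyleVariationPipeline:
    """Stand-in stages so the sketch runs without any model (illustrative only)."""
    return StyleVariationPipeline(
        caption=lambda image: "a boat on the left, a house on the right",
        elaborate=lambda keyword: f"{keyword} style",
        merge=lambda content, style: f"{content}, in {style}",
        generate=lambda prompt: prompt.encode("utf-8"),
    )
```

Keeping each stage as an injected callable means a real captioner, language model, or image generator could be swapped in without touching the control flow. The fine-tuning strategy that injects text and style constraints into cross-attention is not represented in this sketch.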

Comments:	13 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
MSC classes:	68T07
Cite as:	arXiv:2410.18537 [cs.CV]
	(or arXiv:2410.18537v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.18537

Submission history

From: Jinghao Hu
[v1] Thu, 24 Oct 2024 08:34:57 UTC (19,963 KB)
