CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Zheng, Wendi; Teng, Jiayan; Yang, Zhuoyi; Wang, Weihan; Chen, Jidong; Gu, Xiaotao; Dong, Yuxiao; Ding, Ming; Tang, Jie

arXiv:2403.05121 (cs)

[Submitted on 8 Mar 2024]

Abstract: Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges in computational efficiency and the refinement of image details. To tackle these issues, we propose CogView3, a cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model to implement relay diffusion in text-to-image generation: it first creates low-resolution images and then applies relay-based super-resolution. This approach not only produces competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while using only 1/10 of the inference time of SDXL.
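
The two-stage flow described in the abstract (a low-resolution base stage followed by relay-based super-resolution) can be illustrated with a toy sketch. This is not the CogView3 implementation. The function names, parameters, and the placeholder "denoiser" below are all hypothetical, and the idea that the second stage starts from a noised copy of the upsampled image rather than from pure noise is an assumption about how a relay stage might work that the abstract itself does not spell out.

```python
import numpy as np


def base_stage(prompt, size=64, seed=0):
    """Hypothetical stand-in for a low-resolution text-to-image model.

    The prompt is ignored here; a real model would condition on it.
    """
    rng = np.random.default_rng(seed)
    return rng.random((size, size, 3))


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def relay_super_resolution(low_res, factor=2, start_noise=0.5, steps=4, seed=0):
    """Toy second stage (assumption): begin from a noised upsampled image.

    Starting from a partially noised image instead of pure noise means
    the second stage only needs to run part of a denoising trajectory.
    """
    rng = np.random.default_rng(seed)
    target = upsample_nearest(low_res, factor)
    x = target + start_noise * rng.standard_normal(target.shape)
    for i in range(steps):
        # Placeholder "denoiser": move x toward the upsampled image.
        # A trained model would instead predict new high-frequency detail.
        x = x + (target - x) / (steps - i)
    return x


low = base_stage("a photo of a cat", size=64)
high = relay_super_resolution(low, factor=2)
print(low.shape, high.shape)
```

Because the placeholder denoiser only removes the injected noise, the output here equals the upsampled input. The sketch shows the data flow between the stages, not any gain in image quality.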

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.05121 [cs.CV]
	(or arXiv:2403.05121v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.05121

Submission history

From: Wendi Zheng
[v1] Fri, 8 Mar 2024 07:32:50 UTC (20,617 KB)
