Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Croitoru, Florinel-Alin; Hondru, Vlad; Ionescu, Radu Tudor; Sebe, Nicu; Shah, Mubarak

arXiv:2405.13637 (cs)

[Submitted on 22 May 2024 (v1), last revised 24 May 2024 (this version, v2)]

Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a reward model is used to rank the examples generated for each prompt. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are grouped into batches by difficulty level, and these batches are used progressively, from easy to hard, to train the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference. Our code is available at this https URL.
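The rank-difference difficulty measure described above can be illustrated with a minimal Python sketch. It assumes a single prompt whose generated samples already carry scalar reward-model scores. The helper names (`rank_samples`, `curriculum_pairs`), the `num_levels` parameter, and the equal-width binning of rank gaps into levels are hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import combinations
from typing import List, Tuple

# (index of preferred sample, index of rejected sample)
Pair = Tuple[int, int]


def rank_samples(rewards: List[float]) -> List[int]:
    """Hypothetical helper: rank of each sample, 0 = highest reward."""
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    ranks = [0] * len(rewards)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def curriculum_pairs(rewards: List[float], num_levels: int) -> List[List[Pair]]:
    """Group all preference pairs for one prompt into difficulty levels, easy first.

    A large rank gap means an easy pair; a small gap means a hard pair.
    Assumption: levels are equal-width bins over the possible gaps 1..n-1.
    """
    n = len(rewards)
    if n < 2 or num_levels < 1:
        raise ValueError("need at least two samples and one level")
    ranks = rank_samples(rewards)
    max_gap = n - 1
    levels: List[List[Pair]] = [[] for _ in range(num_levels)]
    for a, b in combinations(range(n), 2):
        # The better-ranked sample of the pair is the preferred one.
        win, lose = (a, b) if ranks[a] < ranks[b] else (b, a)
        gap = ranks[lose] - ranks[win]  # always in 1..max_gap
        # The largest gap maps to level 0 (easiest); gap 1 maps to the last level (hardest).
        level = (max_gap - gap) * num_levels // max_gap
        levels[level].append((win, lose))
    return levels


if __name__ == "__main__":
    # Four samples for one prompt with made-up reward scores.
    levels = curriculum_pairs([0.9, 0.1, 0.5, 0.7], num_levels=3)
    for stage, pairs in enumerate(levels):
        print(stage, pairs)
    # 0 [(0, 1)]
    # 1 [(0, 2), (3, 1)]
    # 2 [(0, 3), (2, 1), (3, 2)]
```

In this sketch each level would be handed to the DPO fine-tuning loop in order, easiest first; whether earlier levels are revisited in later stages is not specified in the abstract and is left open here.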

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2405.13637 [cs.CV]
	(or arXiv:2405.13637v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.13637

Submission history

From: Radu Tudor Ionescu
[v1] Wed, 22 May 2024 13:36:48 UTC (26,011 KB)
[v2] Fri, 24 May 2024 13:14:40 UTC (26,011 KB)
