VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

He, Xuan; Jiang, Dongfu; Zhang, Ge; Ku, Max; Soni, Achint; Siu, Sherman; Chen, Haonan; Chandra, Abhranil; Jiang, Ziyan; Arulraj, Aaran; Wang, Kai; Do, Quy Duc; Ni, Yuansheng; Lyu, Bohan; Narsupalli, Yaswanth; Fan, Rongqi; Lyu, Zhiheng; Lin, Yuchen; Chen, Wenhu

Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.15252 (cs)

[Submitted on 21 Jun 2024 (v1), last revised 14 Oct 2024 (this version, v3)]


Abstract: Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores over 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that VideoScore has consistently much higher correlation with human judges than other metrics. Given these results, we believe VideoScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.
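The headline number above is a Spearman rank correlation between metric scores and human scores. As a rough illustration of what that number measures, here is a minimal, dependency-free sketch of Spearman's rho (the Pearson correlation of tie-averaged ranks). It assumes the reported 77.1 is rho scaled by 100, and it does not reproduce the paper's evaluation protocol (for example, how scores are grouped per aspect). The helper names and the five-video data are hypothetical; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical example: one quality aspect scored on five videos (made-up numbers).
human = [1, 2, 2, 3, 4]          # human ratings (note the tie)
metric = [1.3, 2.1, 1.9, 3.2, 3.8]  # automatic metric outputs
rho = spearman(metric, human)
print(f"Spearman x 100 = {100 * rho:.1f}")
```

Because only ranks matter, a metric can agree perfectly with humans under this statistic even if its raw scores are on a different scale, which is why rank correlation is a natural choice for comparing automatic metrics against human raters.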

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.15252 [cs.CV]
	(or arXiv:2406.15252v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.15252

Submission history

From: Xuan He
[v1] Fri, 21 Jun 2024 15:43:46 UTC (6,179 KB)
[v2] Mon, 24 Jun 2024 16:22:55 UTC (6,179 KB)
[v3] Mon, 14 Oct 2024 04:08:53 UTC (6,183 KB)