HunyuanVideo: A Systematic Framework For Large Video Generative Models

Kong, Weijie; Tian, Qi; Zhang, Zijian; Min, Rox; Dai, Zuozhuo; Zhou, Jin; Xiong, Jiangfeng; Li, Xin; Wu, Bo; Zhang, Jianwei; Wu, Kathrina; Lin, Qin; Yuan, Junkun; Long, Yanxin; Wang, Aladdin; Wang, Andong; Li, Changlin; Huang, Duojun; Yang, Fang; Tan, Hao; Wang, Hongmei; Song, Jacob; Bai, Jiawang; Wu, Jianbing; Xue, Jinbao; Wang, Joey; Wang, Kai; Liu, Mengyang; Li, Pengyu; Li, Shuai; Wang, Weiyan; Yu, Wenqing; Deng, Xinchi; Li, Yang; Chen, Yi; Cui, Yutao; Peng, Yuanbo; Yu, Zhentao; He, Zhiyu; Xu, Zhiyong; Zhou, Zixiang; Xu, Zunnan; Tao, Yangyu; Lu, Qinglin; Liu, Songtao; Zhou, Dax; Wang, Hongfa; Yang, Yong; Wang, Di; Liu, Yuhong; Jiang, Jie; Zhong, Caesar

Computer Science > Computer Vision and Pattern Recognition

arXiv:2412.03603 (cs)

[Submitted on 3 Dec 2024 (v1), last revised 17 Jan 2025 (this version, v4)]


Abstract: Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.03603 [cs.CV]
	(or arXiv:2412.03603v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.03603

Submission history

From: Zijian Zhang
[v1] Tue, 3 Dec 2024 23:52:37 UTC (44,386 KB)
[v2] Fri, 6 Dec 2024 17:02:10 UTC (44,386 KB)
[v3] Thu, 2 Jan 2025 09:13:42 UTC (48,072 KB)
[v4] Fri, 17 Jan 2025 10:16:18 UTC (48,072 KB)
