V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

You, Junwei; Shi, Haotian; Jiang, Zhuoyu; Huang, Zilin; Gan, Rui; Wu, Keshu; Cheng, Xi; Li, Xiaopeng; Ran, Bin

arXiv:2408.09251 (cs)

[Submitted on 17 Aug 2024 (v1), last revised 16 Sep 2024 (this version, v2)]


Abstract: Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework that combines Vehicle-to-Everything (V2X) systems with large vision-language models (VLMs). V2X-VLM is designed to enhance situational awareness, decision-making, and ultimately trajectory planning by integrating multimodal data from vehicle-mounted cameras, infrastructure sensors, and textual information. Contrastive learning is further employed to complement the VLM by refining feature discrimination, helping the model learn robust representations of the driving environment. Evaluations on the DAIR-V2X dataset show that V2X-VLM outperforms state-of-the-art cooperative autonomous driving methods, while additional tests on corner cases validate its robustness in real-world driving conditions.
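
The abstract does not say which contrastive objective is used. As an illustration only, the sketch below assumes an InfoNCE-style loss over paired image and text embeddings, a common choice for vision-language alignment. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(image_feats, text_feats, temperature=0.07):
    """Image-to-text InfoNCE loss (assumed formulation, not the paper's).

    image_feats[i] and text_feats[i] are treated as a positive pair;
    every other text in the batch acts as a negative for image i.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    total = 0.0
    for i, img in enumerate(imgs):
        # Temperature-scaled cosine similarity to every text in the batch.
        logits = [sum(a * b for a, b in zip(img, t)) / temperature for t in txts]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching text as the target class.
        total += log_sum_exp - logits[i]
    return total / len(imgs)


# Toy embeddings: two scenes with clearly separated features.
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce_loss(aligned_images, aligned_texts)
loss_shuffled = info_nce_loss(aligned_images, shuffled_texts)
```

Correctly paired embeddings yield a much lower loss than mismatched ones. Minimizing such a loss pushes matching pairs together and non-matching pairs apart, which is one way to read "refining feature discrimination".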

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2408.09251 [cs.RO]
	(or arXiv:2408.09251v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2408.09251

Submission history

From: Junwei You
[v1] Sat, 17 Aug 2024 16:42:13 UTC (7,584 KB)
[v2] Mon, 16 Sep 2024 05:23:07 UTC (8,981 KB)
