Precise Drive with VLM: First Prize Solution for PRCV 2024 Drive LM challenge

Huang, Bin; Wang, Siyu; Chen, Yuanpeng; Wu, Yidan; Song, Hui; Ding, Zifan; Leng, Jing; Liang, Chengpeng; Xue, Peng; Zhang, Junliang; Zhao, Tiankun

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.02999 (cs)

[Submitted on 5 Nov 2024]

Abstract: This technical report outlines the methods we applied in the PRCV Challenge, which focuses on cognition and decision-making in driving scenarios. We built on InternVL-2.0, a pioneering open-source multi-modal model, and improved both the model input and the training procedure. For the input data, we concatenated and formatted the multi-view images; notably, we used the coordinates of the original images without any transformation. For model training, we first pre-trained the model on publicly available autonomous driving datasets to strengthen its alignment with the challenge tasks, and then fine-tuned it on the DriveLM-nuScenes dataset. During fine-tuning, we modified the loss function to improve the model's precision when predicting coordinate values. Together, these changes give our model advanced cognitive and decision-making capabilities in driving scenarios. Our model achieved a score of 0.6064, securing first prize in the competition's final results.
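
The abstract does not say how the loss function was modified. As a minimal sketch, assuming a standard token-level language-modeling loss, one generic way to emphasize coordinate precision is to upweight the cross-entropy on the tokens that encode coordinate values. This is not the authors' method; the function name `weighted_token_ce`, the mask `is_coord_token`, and the parameter `coord_weight` are hypothetical names chosen for illustration.

```python
import math
from typing import Sequence


def weighted_token_ce(logits: Sequence[Sequence[float]], targets: Sequence[int],
                      is_coord_token: Sequence[bool], coord_weight: float = 2.0) -> float:
    """Token-level cross-entropy that upweights coordinate tokens.

    Hypothetical illustration only: the report does not describe its loss.
    logits: one row of unnormalized vocabulary scores per target token.
    targets: the ground-truth token id for each position.
    is_coord_token: True where the target token encodes a coordinate value.
    """
    total, norm = 0.0, 0.0
    for row, target, is_coord in zip(logits, targets, is_coord_token):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        # Negative log-likelihood of the ground-truth token.
        nll = log_z - row[target]
        # Coordinate tokens get coord_weight; all other tokens get weight 1.
        w = coord_weight if is_coord else 1.0
        total += w * nll
        norm += w
    # Weighted mean keeps the loss on the same scale as plain cross-entropy.
    return total / norm
```

With `coord_weight=1.0` this reduces to the ordinary mean cross-entropy; larger values make errors on coordinate tokens cost more than errors on ordinary text tokens, pushing the model to spend more capacity on exact numeric outputs.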

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.02999 [cs.CV]
	(or arXiv:2411.02999v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.02999

Submission history

From: Bin Huang
[v1] Tue, 5 Nov 2024 11:00:55 UTC (1,705 KB)
