Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

Li, Yuxin; Li, Yiheng; Yang, Xulei; Yu, Mengying; Huang, Zihang; Wu, Xiaojun; Yeo, Chai Kiat

arXiv:2410.07268 (cs)

[Submitted on 9 Oct 2024]

Abstract: In the landscape of autonomous driving, Bird's-Eye-View (BEV) representation has recently garnered substantial academic attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. This BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructures, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their introduction into the perception model's backbone. We validate the efficacy of our approach through extensive experiments on the NuScenes dataset, demonstrating substantial computational efficiency without sacrificing perception accuracy. To the best of our knowledge, this work represents the first attempt to alleviate the computational burden from the input pruning perspective.
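The idea of using a BEV grid as a shared anchor for input pruning can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_points_by_bev_mask`, the boolean `keep_mask`, and the grid extents are all hypothetical, and the mask is assumed to be given rather than learned. The sketch covers only a point-cloud input; the abstract describes pruning jointly across modalities.

```python
from math import floor

def prune_points_by_bev_mask(points, keep_mask, x_range, y_range):
    """Drop points whose BEV cell is marked as non-essential.

    points:    list of (x, y, z) tuples in metres (hypothetical input format).
    keep_mask: 2D list of booleans indexed [row][col]; True means "keep".
               Assumed to come from some importance predictor.
    x_range, y_range: (min, max) extent of the BEV grid in metres.
    """
    rows, cols = len(keep_mask), len(keep_mask[0])
    cell_w = (x_range[1] - x_range[0]) / cols
    cell_h = (y_range[1] - y_range[0]) / rows
    kept = []
    for p in points:
        # Map metric coordinates to integer grid indices.
        col = floor((p[0] - x_range[0]) / cell_w)
        row = floor((p[1] - y_range[0]) / cell_h)
        # Points outside the grid are discarded as well.
        if 0 <= row < rows and 0 <= col < cols and keep_mask[row][col]:
            kept.append(p)
    return kept

# Example: a 2x2 grid over [0, 2) x [0, 2) where only one cell is kept.
mask = [[True, False],
        [False, False]]
pts = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.0), (0.5, 1.5, 0.0), (5.0, 5.0, 0.0)]
pruned = prune_points_by_bev_mask(pts, mask, (0.0, 2.0), (0.0, 2.0))
```

Only the pruned points would then be passed to the perception backbone, which is where the compute saving would come from in a scheme of this kind.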

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.07268 [cs.CV]
	(or arXiv:2410.07268v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.07268

Submission history

From: Yuxin Li
[v1] Wed, 9 Oct 2024 03:30:00 UTC (8,042 KB)
