DOI: 10.1145/3664647.3681326
research-article
Open access

When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models

Published: 28 October 2024

Abstract

As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, this study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions: what, when, and where accidents might occur. We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes. This mechanism is complemented by a three-stage model that processes outputs from smaller models into detailed multimodal inputs for LLMs, thus enabling a more nuanced understanding of traffic dynamics. Empirical validation on the DAD, CCD, and A3D datasets demonstrates superior performance in Average Precision (AP) and Mean Time-To-Accident (mTTA), establishing new benchmarks for accident prediction technology. Our approach not only advances the technological framework for autonomous driving safety but also enhances human-AI interaction, making predictive insights generated by autonomous systems more intuitive and actionable.
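
The abstract reports Mean Time-To-Accident (mTTA) without defining it. As a rough illustration only, the sketch below follows one definition commonly used in the accident-anticipation literature: for a video that contains an accident, Time-To-Accident is the gap between the first frame whose predicted accident score reaches a threshold and the accident frame, and mTTA averages that gap over several thresholds. The function names, the threshold values, and the toy scores are assumptions made for this example; the paper's exact evaluation protocol is not given on this page.

```python
from typing import List, Sequence, Tuple


def time_to_accident(scores: Sequence[float], accident_frame: int,
                     threshold: float, fps: float) -> float:
    """Seconds between the first alarm and the accident frame.

    Returns 0.0 when no score reaches the threshold before the accident.
    """
    for t, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - t) / fps
    return 0.0


def mean_tta(videos: List[Tuple[Sequence[float], int]],
             thresholds: Sequence[float], fps: float) -> float:
    """Average TTA over every (threshold, accident video) pair."""
    ttas = [time_to_accident(scores, accident_frame, th, fps)
            for th in thresholds
            for scores, accident_frame in videos]
    return sum(ttas) / len(ttas)


if __name__ == "__main__":
    # Toy per-frame accident scores (hypothetical); the accident starts at frame 4.
    toy_scores = [0.1, 0.2, 0.6, 0.9]
    print(mean_tta([(toy_scores, 4)], thresholds=[0.5, 0.8], fps=2.0))  # 0.75
```

In the toy run, the 0.5 threshold fires at frame 2 (1.0 s before the accident) and the 0.8 threshold fires at frame 3 (0.5 s before), so the mean is 0.75 s. A larger mTTA means earlier warnings, which is why it is usually reported alongside AP rather than alone.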

Supplemental Material

MP4 File - Presentation Video of "When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models"
A short video presentation introducing the paper "When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models".

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States
Author Tags

1. autonomous driving
2. dynamic object attention
3. human-AI interaction
4. large language models
5. traffic accident anticipation

Qualifiers

• Research-article

Funding Sources

• University of Macau
• Science and Technology Development Fund of Macau SAR
• Shenzhen-Hong Kong-Macau Science and Technology Program Category C

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
