DOI: 10.1145/3664647.3680578
research-article

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Published: 28 October 2024

Abstract

LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose MambaMOS, a novel LiDAR-based 3D moving object segmentation method built on a Motion-aware State Space Model. First, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to strengthen the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Second, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We use an improved state space model to represent these motion differences and thereby model the motion states more effectively. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code is publicly available at https://github.com/Terminal-K/MambaMOS.
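
For readers unfamiliar with state space models, the following minimal sketch shows the generic discrete linear recurrence that this family of models builds on: a hidden state is updated from each input in sequence order and then read out. It is not the MSSM or the authors' implementation; the function name `ssm_scan` and the parameters `a`, `b`, and `c` are illustrative assumptions, and the diagonal, time-invariant form is chosen only for brevity.

```python
from typing import List, Sequence


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal, time-invariant discrete state space model over a sequence.

    Per state channel i (hypothetical parameter names):
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")

    h = [0.0] * len(a)  # hidden state starts at zero
    ys: List[float] = []
    for x in xs:
        # State update: decay the previous state and inject the current input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Readout: project the hidden state to a scalar output.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # A single impulse decays geometrically through the hidden state.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))
```

Because the state at each step depends on all earlier inputs, the output for a point depends on where it sits in the sequence, which is why the ordering of temporal and spatial information matters for models of this kind.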

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. moving object segmentation
    2. spatio-temporal fusion
    3. state space model

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
