research-article

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

Authors:

Kailun Yang

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 1505 - 1513

https://doi.org/10.1145/3664647.3680578

Published: 28 October 2024

Abstract

LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Although previous MOS methods have achieved promising results, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose MambaMOS, a novel LiDAR-based 3D moving object segmentation method built on a Motion-aware State Space Model. First, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to strengthen the coupling of temporal and spatial information in point clouds and to alleviate the issue of overlooked temporal clues. Second, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We use an improved state space model to represent these motion differences and thereby model the motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code is publicly available at https://github.com/Terminal-K/MambaMOS
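For readers unfamiliar with state space models, the sketch below shows the generic discrete linear recurrence that such models build on: a hidden state is updated from the previous state and the current input, and an output is read from it. This is only a textbook illustration under simplifying assumptions (scalar state, fixed coefficients); it is not the MSSM described above, whose details the abstract does not give. The function name `ssm_scan` and the parameters `a`, `b`, `c`, and `h0` are hypothetical.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: float, b: float, c: float,
             h0: float = 0.0) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    Illustrative only: real models use vector states and learned,
    often input-dependent, parameters.
    """
    h = h0
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x   # carry information forward from earlier steps
        ys.append(c * h)    # emit an output for the current step
    return ys


# A single impulse at the first step decays geometrically in later outputs.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]
```

The decaying response to the impulse is what lets a recurrence of this kind carry information from earlier time steps into the current one, which is the general property a temporal model for multi-scan point clouds relies on.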

Index Terms

1. Computing methodologies
  1. Artificial intelligence

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs: Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs: Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
