DOI: 10.1145/3489849.3489882
research-article

Fusing Semantic Segmentation and Object Detection for Visual SLAM in Dynamic Scenes

Published: 08 December 2021

Abstract

The assumption of static scenes limits the performance of traditional visual SLAM. Many existing solutions handle dynamic scenes with deep learning methods or geometric constraints, but these schemes are either inefficient or, to a certain extent, lacking in robustness. In this paper, we propose a solution that combines object detection and semantic segmentation to obtain the prior contours of potentially dynamic objects. Using this prior information, geometric-constraint techniques then help remove dynamic feature points. Finally, evaluation on public datasets demonstrates that the proposed method improves the accuracy of pose estimation and the robustness of visual SLAM in highly dynamic scenarios without loss of efficiency.
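The abstract does not spell out the fusion rule or the exact geometric test, so the following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. It assumes that the prior contour of a potentially dynamic object is the set of segmentation pixels of that class inside a detection box of the same class, falling back to the whole box when segmentation misses the object. It also assumes that the geometric constraint is the epipolar constraint, with a fundamental matrix F that is already known (in practice it might be estimated with RANSAC from matches outside the prior region). The function names, the class id `PERSON = 15`, and the 1-pixel threshold are hypothetical.

```python
import numpy as np


def fuse_prior_mask(seg_labels, boxes, dynamic_ids):
    """Fuse a segmentation map and detection boxes into one boolean mask of
    potentially dynamic pixels (hypothetical fusion rule).

    seg_labels:  (H, W) int array of per-pixel class ids
    boxes:       list of (class_id, x0, y0, x1, y1) detections in pixels
    dynamic_ids: set of class ids treated as potentially dynamic
    """
    h, w = seg_labels.shape
    prior = np.zeros((h, w), dtype=bool)
    for cls, x0, y0, x1, y1 in boxes:
        if cls not in dynamic_ids:
            continue
        # Clip the box to the image bounds.
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(w, int(x1)), min(h, int(y1))
        # The box says where the object is; the mask gives its contour.
        region = seg_labels[y0:y1, x0:x1] == cls
        if region.any():
            prior[y0:y1, x0:x1] |= region
        else:
            # Segmentation missed the object: fall back to the whole box.
            prior[y0:y1, x0:x1] = True
    return prior


def epipolar_distances(F, pts1, pts2):
    """Pixel distance of each pts2[i] from the epipolar line F @ pts1[i]."""
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])
    x2 = np.hstack([pts2, ones])
    lines = x1 @ F.T  # row i holds (a, b, c) = F @ x1[i]
    residual = np.abs(np.sum(lines * x2, axis=1))
    return residual / np.linalg.norm(lines[:, :2], axis=1)


def filter_dynamic_points(F, pts1, pts2, prior_mask, thresh=1.0):
    """Return a keep-mask over matches: a match is dropped only if it lies
    inside the prior region in the current frame AND is farther than
    `thresh` pixels from its epipolar line."""
    d = epipolar_distances(F, pts1, pts2)
    h, w = prior_mask.shape
    cols = np.clip(pts2[:, 0].astype(int), 0, w - 1)
    rows = np.clip(pts2[:, 1].astype(int), 0, h - 1)
    in_prior = prior_mask[rows, cols]
    return ~(in_prior & (d > thresh))


def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


# Toy example: the camera translates along x with no rotation, so static
# points keep their image row and epipolar lines are horizontal.
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew([1.0, 0.0, 0.0]) @ np.eye(3) @ K_inv  # K^-T [t]_x R K^-1

PERSON = 15  # hypothetical class id for "person"
seg = np.zeros((480, 640), dtype=int)
seg[100:300, 200:300] = PERSON
boxes = [(PERSON, 190, 90, 310, 310),   # detected and segmented
         (PERSON, 400, 50, 450, 100)]   # detected but not segmented
prior = fuse_prior_mask(seg, boxes, {PERSON})

pts1 = np.array([[50, 50], [250, 200], [260, 150], [420, 70], [500, 400]], float)
pts2 = np.array([[40, 50], [240, 200], [265, 170], [425, 80], [500, 420]], float)
keep = filter_dynamic_points(F, pts1, pts2, prior)
print(keep.tolist())  # [True, True, False, False, True]
```

In this sketch only matches inside the prior contours are tested, so a person who stays still keeps their features (match 1), while an epipolar outlier in the background (match 4) is not removed by this step. Whether the paper gates the geometric test on the prior region in exactly this way is an assumption.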

Information

Published In

VRST '21: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology
December 2021
563 pages
ISBN:9781450390927
DOI:10.1145/3489849
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. object detection
  2. pose estimation
  3. semantic segmentation
  4. visual simultaneous localization and mapping

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

VRST '21

Acceptance Rates

Overall Acceptance Rate 66 of 254 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 3
Reflects downloads up to 07 Nov 2024