VINS-MKF: A Tightly-Coupled Multi-Keyframe Visual-Inertial Odometry for Accurate and Robust State Estimation
Abstract
1. Introduction
- For more accurate and robust VO state estimation, a nonlinear optimization based on a hyper-graph structure was formulated, characterized by multi-keyframe, tightly-coupled visual-inertial fusion.
- To estimate the state of the mobile robot efficiently, a novel VO state estimation framework was proposed, in which a GPU-based feature extraction thread runs in parallel with the tracking and mapping threads.
- To further ensure the precision of state estimation, a novel MultiCol-IMU camera model and an accurate initialization method, with a hardware synchronization mechanism and a self-calibration method, were presented.
- The performance of VINS-MKF was validated by extensive experiments on home-made datasets, and its improved accuracy and robustness were demonstrated by comparison against the state-of-the-art VINS-Mono algorithm.
- To the best of our knowledge, the proposed VINS-MKF is the first tightly-coupled multi-keyframe visual-inertial odometry built on monocular ORB-SLAM and extended with multiple fisheye cameras and an inertial measurement unit (IMU).
2. Related Work
3. Multi-Keyframe Tightly-Coupled Visual-Inertial Nonlinear Optimization
3.1. MultiCol-IMU Camera Model and Structure
3.1.1. Camera Model for Single Camera
3.1.2. MultiCol-IMU Camera Model
3.2. IMU Pre-Integration
3.3. Derivation of the Proposed Nonlinear Optimization
3.3.1. The States
3.3.2. The Measurements
3.3.3. Derivation of the Nonlinear Optimization
3.4. The Solution to the Nonlinear Optimization
3.4.1. Multi-Keyframe Reprojection Error Term
3.4.2. IMU Error Term
3.4.3. The Solution to the Nonlinear Optimization
4. Multi-Keyframe Tightly-Coupled Visual-Inertial Odometry
4.1. Visual Inertial Initialization
4.1.1. Initialization Pre-Requirements
Algorithm 1. Adaptive selection of the main camera.
th1 ← 0, th2 ← 0
for c < numcams do
  if inliers[c] > th1 && translational[c] > th2 then
    mainCam ← c
    th1 ← inliers[c]
    th2 ← translational[c]
  end if
end for
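To make Algorithm 1 concrete, here is a minimal Python sketch of the selection rule, assuming that `inliers[c]` and `translational[c]` are the per-camera inlier count and translational magnitude supplied by the tracking thread; the function name and interface are illustrative, not the actual implementation.

```python
def select_main_camera(inliers, translational):
    """Pick the camera with the most inliers and the largest translational
    component, as in Algorithm 1 (sketch; names are illustrative)."""
    main_cam = 0
    th1, th2 = 0, 0.0          # best inlier count / translational value so far
    for c in range(len(inliers)):
        # A camera becomes the main camera only if it beats the current best
        # in BOTH criteria, mirroring the two thresholds th1 and th2 above.
        if inliers[c] > th1 and translational[c] > th2:
            main_cam = c
            th1 = inliers[c]
            th2 = translational[c]
    return main_cam

# Example: camera 2 has the most inliers and the largest translation.
print(select_main_camera([120, 95, 180], [0.12, 0.30, 0.41]))  # -> 2
```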
Algorithm 2. The external parameter calibration termination criterion.
Input: the local period t, time increment ∆t, convergence time T, the standard deviations σ_yaw, σ_pitch, σ_roll, σ_x, σ_y, σ_z of the six axes (yaw, pitch, roll, x, y, z), and the threshold value ∆.
Output: the external parameters R, P
Process:
T ← 0
while T < 60 do
  update (R, P) with the most recent convergence value
  if the local period t has elapsed then
    if σ_yaw < ∆, σ_pitch < ∆, σ_roll < ∆, σ_x < ∆, σ_y < ∆, σ_z < ∆ then
      break
    end if
  end if
  T ← T + ∆t
end while
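A minimal Python sketch of this termination criterion follows, assuming the extrinsic estimate can be polled as a 6-vector (yaw, pitch, roll, x, y, z); the callable `get_latest_estimate`, the default parameter values, and the sliding-window bookkeeping are assumptions made only for illustration.

```python
import numpy as np

def calibrate_extrinsics(get_latest_estimate, t=5.0, dt=0.05, delta=1e-3, timeout=60.0):
    """Online extrinsic calibration loop with the termination criterion of
    Algorithm 2 (sketch). get_latest_estimate() is assumed to return the
    current estimate as [yaw, pitch, roll, x, y, z]; R and P follow from it."""
    T, history, estimate = 0.0, [], None
    while T < timeout:                              # hard limit of 60 s
        estimate = get_latest_estimate()            # update (R, P) with the latest value
        history.append(estimate)
        history = history[-max(1, int(t / dt)):]    # keep only the local period t
        if T >= t:
            stds = np.std(np.asarray(history), axis=0)
            if np.all(stds < delta):                # all six axes have settled below ∆
                break
        T += dt
    return estimate
```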
4.1.2. Multi-Keyframe SFM
4.1.3. IMU Inertial Initialization
4.2. GPU Based Feature Extraction
4.3. Tracking
4.3.1. The Introduction of the Hyper Graph
4.3.2. Different Initial MF Pose Prediction Method
4.3.3. The Different Co-Visibility Graph and the Motion-Only BA Optimization Method
- Assuming no map update, we performed the nonlinear optimization as in Equation (11).
- Assuming the map was updated, the nonlinear optimization in Equation (11) changed to a modified cost, as sketched schematically below.
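Equation (11) is not reproduced in this outline; the sketch below only illustrates the branching described above, and both solver callables (`solve_eq11`, `solve_modified_cost`) are hypothetical placeholders for the actual cost functions.

```python
def motion_only_optimization(frame, map_updated, solve_eq11, solve_modified_cost):
    """Schematic dispatch for the motion-only BA described above (sketch only).

    solve_eq11          -- solves the nonlinear optimization of Equation (11)
    solve_modified_cost -- solves the modified cost used once the map was updated
    """
    if not map_updated:
        return solve_eq11(frame)          # no map update: Equation (11) unchanged
    return solve_modified_cost(frame)     # map updated: optimize the modified cost
```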
4.3.4. Additional Criterion for Spawning a New MKF
4.4. Local Mapping
4.4.1. Improved Double Window Structure
4.4.2. Different MKF Deletion Criteria
- Two consecutive MKFs in the local temporal window must not differ by more than 0.5 s; the reason is that the longer the temporal interval between consecutive MKFs, the weaker the information the IMU provides.
- Any two consecutive MKFs must differ by less than 3 s; the reason is that we needed to be able to perform full BA and refine the map at any time. If full BA with IMU constraints were switched off, we would only need to restrict the temporal offset between keyframes in the local spatial window. These two thresholds are illustrated by the sketch below.
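As a concrete illustration of the two temporal thresholds, the following Python sketch tests whether removing a candidate MKF would still respect them; the function, its arguments, and the surrounding redundancy test (not shown) are illustrative assumptions rather than the actual implementation.

```python
def respects_temporal_limits(timestamps, idx, in_temporal_window,
                             temporal_gap=0.5, global_gap=3.0):
    """Check the temporal deletion criteria for the MKF at index idx (sketch).

    timestamps         -- sorted MKF timestamps in seconds
    in_temporal_window -- True if the candidate lies in the local temporal window
    Deleting an MKF merges its two neighbouring intervals; the merged interval
    must stay below 0.5 s inside the local temporal window (so the IMU term
    stays informative) and below 3 s elsewhere (so full BA with IMU constraints
    remains feasible at any time).
    """
    if idx <= 0 or idx >= len(timestamps) - 1:
        return False                        # keep the first and last MKF
    merged_gap = timestamps[idx + 1] - timestamps[idx - 1]
    limit = temporal_gap if in_temporal_window else global_gap
    return merged_gap <= limit
```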
5. Experiments
5.1. Experiment 1: Impact of Multiple Cameras
5.1.1. Experiment 1 on Home-Made Datasets
5.1.2. Experiment 1 on Corridor Environment
5.2. Experiment 2: Impact of Tightly-Coupled IMU
5.3. Efficiency Evaluation
5.4. Comparison between the Proposed VINS-MKF and VINS-Mono
6. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
References
Datasets | Shape | Texture-Less | Over Exposure | Pedestrians | Aggressive Motion
---|---|---|---|---|---
static | squ 1 | √ | √ | |
static | arb 1 | √ | √ | |
dyn_1 | squ | √ | √ | √ |
dyn_1 | arb | √ | √ | √ |
dyn_2 | squ | √ | √ | √ | √
dyn_2 | arb | √ | √ | √ | √
Datasets | Algorithms | RMSE/GT Scale | ATE Mean | ATE Median | ATE Std
---|---|---|---|---|---
static_squ | VO-Mono | 0.084 | 0.068 | 0.055 | 0.050
static_squ | VO-Stereo | 0.082 | 0.0736 | 0.072 | 0.038
static_squ | VO-MKF | 0.072 | 0.066 | 0.065 | 0.029
static_arb | VO-Mono | 0.079 | 0.067 | 0.065 | 0.043
static_arb | VO-Stereo | 0.085 | 0.081 | 0.077 | 0.026
static_arb | VO-MKF | 0.043 | 0.040 | 0.038 | 0.014
dyn_1_squ | VO-Mono | × | × | × | ×
dyn_1_squ | VO-Stereo | × | × | × | ×
dyn_1_squ | VO-MKF | 0.088 | 0.081 | 0.072 | 0.035
dyn_1_arb | VO-Mono | × | × | × | ×
dyn_1_arb | VO-Stereo | × | × | × | ×
dyn_1_arb | VO-MKF | 0.068 | 0.061 | 0.054 | 0.029
Datasets | Algorithms | RMSE/GT Scale | ATE Mean | ATE Median | ATE Std
---|---|---|---|---|---
dyn_1_squ | VO-MKF | 0.088 | 0.081 | 0.072 | 0.035
dyn_1_squ | VINS-MKF | 0.048 | 0.041 | 0.035 | 0.026
dyn_1_arb | VO-MKF | 0.068 | 0.062 | 0.057 | 0.029
dyn_1_arb | VINS-MKF | 0.041 | 0.034 | 0.032 | 0.022
dyn_2_squ | VO-MKF | 0.138 | 0.107 | 0.072 | 0.087
dyn_2_squ | VINS-MKF | 0.0395 | 0.035 | 0.032 | 0.018
dyn_2_arb | VO-MKF | 0.490 | 0.433 | 0.460 | 0.228
dyn_2_arb | VINS-MKF | 0.057 | 0.052 | 0.049 | 0.024
Conditions | Feature Extraction | Tracking | Mapping | Total Time
---|---|---|---|---
Origin_CPU | 18.2 | 17.3 | 31.3 | 37.1
Parallel_CPU | 18.6 | 19.7 | 31 | 26.3
Origin_GPU | 11 | 17.7 | 30.3 | 30.4
Parallel_GPU | 11.1 | 17.6 | 30.7 | 23.7
Conditions | Feature Extraction | Tracking | Mapping | Total Time |
---|---|---|---|---|
Origin_CPU | 18.6 | 15.4 | 39.4 | 35.7 |
Parallel_CPU | 18.9 | 20 | 38.2 | 26.8 |
Origin_GPU | 10.9 | 15.9 | 38.9 | 28.4 |
Parallel_GPU | 11.1 | 18.2 | 37.6 | 24.8 |
Datasets | Algorithms | RMSE/GT Scale | ATE Mean | ATE Median | ATE Std
---|---|---|---|---|---
dyn_1_squ | VINS-Mono | 0.122 | 0.100 | 0.077 | 0.071
dyn_1_squ | VINS-MKF | 0.048 | 0.041 | 0.035 | 0.026
dyn_1_arb | VINS-Mono | 0.116 | 0.094 | 0.065 | 0.070
dyn_1_arb | VINS-MKF | 0.041 | 0.034 | 0.032 | 0.022
dyn_2_squ | VINS-Mono | 0.149 | 0.115 | 0.106 | 0.080
dyn_2_squ | VINS-MKF | 0.039 | 0.035 | 0.032 | 0.018
dyn_2_arb | VINS-Mono | 0.232 | 0.210 | 0.222 | 0.106
dyn_2_arb | VINS-MKF | 0.057 | 0.052 | 0.049 | 0.024