Driven by Vision: Learning Navigation by Visual Localization and Trajectory Prediction
Abstract
1. Introduction
1.1. Differences between Visual-Based and GPS-Based Localization and Navigation
- Visual-based navigation systems, using deep convolutional nets, could respond faster than GPS-based navigation systems, since deep nets offer a local, immediate and direct mapping between the visual input and the desired localization and navigation output, without relying on an external satellite signal or on various, often expensive, search and other algorithmic procedures. It is well known that current GPS-based commercial navigation applications can have a significant delay (often up to a second), which can make driving difficult and even unsafe. It is not unusual for a regular driver to miss, for example, an intersection because of such a delay.
- Visual-based systems could also be more accurate than GPS-based systems, since vision, being based on local sensing information, can offer a vehicle pose (position and orientation) that is better aligned with the real scene. It is well known that at rest or at very low speeds GPS cannot offer such accurate orientation.
- The GPS signal is not always available and can be erroneous, especially in cluttered urban or natural areas with many tall buildings, trees and other large structures, or under bridges, tunnels and other structures that occlude the sky. GPS can in fact fail for many other reasons, in which case a visual-based navigation system could take its place.
- In general, we can think of GPS-based and vision-based navigation as having complementary properties, such that in practice they could work together and benefit from each other’s advantages. While GPS is robust to weather conditions and traffic and does not need prior learning, the visual system could be faster, sometimes even more accurate (especially in areas cluttered by tall structures), and it can also estimate orientation correctly even at zero or very low speed. Together, GPS-based and visual-based localization and navigation could form a more robust and more accurate system. This hybrid combination of GPS and vision is a promising subject for future research. In this paper we focus on vision only, in order to better understand its capabilities and limitations.
1.2. Visual-Based Localization
1.3. Visual Navigation with Location Information and Trajectory Prediction
1.4. Robust Visual Solutions in Changing Conditions
2. Motivation and System Overview
Main Contributions
- We introduce, to the best of our knowledge, the first deep-learning-based system that simultaneously learns to self-locate and to navigate towards a planned destination from visual information only. This ability is important and complementary to the case when GPS information is available: since GPS signal loss and inaccuracies are often met in practice, the capacity to localize accurately from visual information can significantly improve the performance and robustness of current GPS-based methods.
- The system is highly scalable at minimal cost and can easily be deployed to learn over an entire city or other geographical region by having many drivers use it simultaneously. We also introduce the Urban European Driving Dataset (UED), which we make available along with our code and a user-friendly application that we developed for both data collection and real-time usage.
- We present competitive numerical results, improving over strong baselines and state-of-the-art methods.
- Other contributions: (1) we extend and improve a previous localization-by-image-segmentation model and adapt it to learn to localize accurately in challenging traffic conditions; (2) we output trajectories, as functions of space versus time, which comprise steering and speed information for up to seven seconds into the future; (3) the map is created analytically and automatically from the collected GPS data, with no human intervention; (4) we make the localization and navigation components robust to problem-specific noise.
3. Creating the Urban European Driving (UED) Dataset and the Map
3.1. Exploring the Urban European Dataset (UED)
3.2. Polynomial-Based Analytical Techniques for Modeling Paths, Trajectories and Creating the Map
3.3. Analytical Map Representation Using Polynomial Path Fitting
- Place each geo-coordinate sample into its segment bucket together with its distance to the start of the segment. The distance to the segment start is 0 when entering the segment. Then, for each following sample point, its associated distance is the previous point’s distance plus the Euclidean distance between the two samples.
- Fit polynomial functions of distance for each segment, given the points and the corresponding distances in its bucket, using the method presented in Section 3.2, but with the known distance d to the start of the segment as the fitting variable instead of time. The degree of the polynomials is directly proportional to the length of the modeled segment but significantly smaller than the number of points.
- Using the analytical model, we sample points at 1 m distance intervals along each segment, from the segment’s start (d = 0) to the segment’s end. After this step, some pairs of segments (consisting of the sampled points) will not connect smoothly, leaving small gaps in between. To tackle this, for each segment we refit its polynomial function while also considering the end points of its neighboring map segments, and then sample from slightly before the segment’s start to slightly beyond its end (using a small distance buffer at both ends) to obtain a final, smooth map representation (a code sketch of this procedure is given after this list).
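The sketch below illustrates, under stated assumptions, the per-segment fitting and 1 m resampling described in the list above. It assumes the GPS samples have already been projected into a local metric frame and grouped per segment; the function names, the fixed polynomial degree and the NumPy-based fitting are illustrative choices, not the authors’ exact implementation.

```python
# Minimal sketch of analytical map fitting: one polynomial of distance per
# coordinate and per segment, then uniform 1 m resampling along the segment.
import numpy as np

def fit_segment(points, degree=5):
    """points: (N, 2) array of (x, y) samples in meters, in driving order.
    The paper scales the degree with segment length; 5 is an arbitrary default."""
    # Cumulative distance of each sample from the segment start (0 at entry).
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    d = np.concatenate([[0.0], np.cumsum(steps)])
    # Fit the two coordinates as polynomial functions of distance: x(d), y(d).
    x_poly = np.polyfit(d, points[:, 0], degree)
    y_poly = np.polyfit(d, points[:, 1], degree)
    return x_poly, y_poly, d[-1]  # polynomials and total segment length

def sample_segment(x_poly, y_poly, length, start=0.0, step=1.0):
    """Sample the analytical segment every `step` meters (1 m in the paper)."""
    d = np.arange(start, length + step, step)
    return np.stack([np.polyval(x_poly, d), np.polyval(y_poly, d)], axis=1)
```

The gap-closing step would then refit each segment with the end points of its neighbors included and call sample_segment with a start slightly below zero and a length slightly beyond the segment’s end.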
4. LOVis: Learning to Localize from Vision
4.1. Localization by Regression
4.2. Localization by an Image Segmentation Approach
4.3. A Deeper Look into the Localization by Segmentation Model
5. NAVis: Learning to Navigate from Vision
6. Experimental Analysis
6.1. Methodology of Experimental Analysis
6.2. Experimental Results on Localization
6.3. Experimental Results on Navigation
6.4. Experimental Results on Changing Environmental Conditions
7. Open Research Questions and Future Work
- Probably the first research question that should be addressed next is how to best combine the visual approach to localization and navigation with the more traditional GPS-based one. While GPS-based navigation is more robust and works in almost any driving condition, it also has limitations and can fail or introduce delays, especially in scenes cluttered with tall buildings, trees and other large structures. On the other hand, the visual system can fail when the scenes are not distinctive enough and look very similar to scenes from other places, when the driver’s view is obstructed by traffic, or when a completely new scene is visited (by the system) for the first time. In such cases, the GPS could still work fine. At the same time, when it works well, the visual navigation system is expected to run in real time and respond faster than the GPS-based one. The two approaches, visual and GPS-based, are clearly complementary and can help each other towards a more robust and accurate prediction. Thus, their combination offers an excellent open research question and a natural next topic of work.
- The second open research question is related to the scalability of the learning-based visual navigation system: what is the best way to collect training data from many users and integrate it into a single coherent database, in order to efficiently learn a single unified system? The scalability problem is interesting for many reasons, including aspects of speed, computation and memory requirements, but also the reuse of resources and data. We expect that many users will often pass through the same common streets and scenes, while other, more secluded scenes will be covered by fewer drivers. In such cases, an intelligent and efficient balancing mechanism is needed in order to give different priorities to data covering new territory versus data covering well-known, already mapped areas. Such an intelligent system for handling the scalability of learning is needed in order to integrate all these cases into robust and well-balanced training.
- The third important research question we propose is how to best provide visual navigation information to drivers, since, in the visual approach to navigation, vision and navigation are tightly connected. What is the best way to offer visual assistance to a driver for optimal results, and how can this aspect be evaluated and improved? While the topic falls in the realm of human-computer interaction (HCI), it is an important one for future research, in order to get the most out of visual navigation.
- Another important research question comes from the ability of the system to adapt to structural and weather changes. We have shown in our experiments that this adaptation can work well. However, we have tested it only over a period of 14 months, not longer. Over much longer periods, the structural and environmental changes at a given location are expected to be much more significant. It would be interesting to study how the system could gradually forget old road structures and scenes and adapt to the new ones as they change over time. One particularly interesting aspect here is that some streets and scenes change more over time than others.
- Other interesting open research questions, relevant for visual-based navigation, include the following: which regions, object categories or features in the scene are best for localization? Which features, objects and classes in the scene are best for navigation? Is there a real advantage in using other high-level auxiliary tasks (e.g., semantic segmentation of the scene, prediction of depth and 3D structure, detection of moving objects and other object categories in the scene) in order to improve localization and navigation? Also, is there a better way of using spatio-temporal processing? So far we have used only a relatively small temporal window of frames around the current time. It is not yet clear how much temporal processing is required, what the optimal period of past time to consider is, and which deep network models (e.g., recurrent in space and time) are most appropriate for visual navigation. In our experiments, the localization task seems to work fine without significant temporal processing. At the same time, we expect the higher-level problem of navigation to be better suited for more sophisticated temporal processing and for combination with higher-level auxiliary tasks.
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Teichmann, M.; Weber, M.; Zoellner, M.; Cipolla, R.; Urtasun, R. MultiNet: Real-time joint semantic reasoning for autonomous driving. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1013–1020.
- Kuga, R.; Kanezaki, A.; Samejima, M.; Sugano, Y.; Matsushita, Y. Multi-task Learning using Multi-modal Encoder-Decoder Networks with Shared Skip Connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 403–411.
- Muller, U.; Ben, J.; Cosatto, E.; Flepp, B.; Cun, Y.L. Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2006; pp. 739–746.
- Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316.
- Xu, H.; Gao, Y.; Yu, F.; Darrell, T. End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2174–2182.
- Pomerleau, D.A. Alvinn: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: San Francisco, CA, USA, 1989; pp. 305–313.
- Hecker, S.; Dai, D.; Van Gool, L. End-to-end learning of driving models with surround-view cameras and route planners. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 435–453.
- Nedevschi, S.; Popescu, V.; Danescu, R.; Marita, T.; Oniga, F. Accurate ego-vehicle global localization at intersections through alignment of visual data with digital map. IEEE Trans. Intell. Transp. Syst. 2012, 14, 673–687.
- Mattern, N.; Wanielik, G. Camera-based vehicle localization at intersections using detailed digital maps. In Proceedings of the IEEE/ION Position, Location and Navigation Symposium, Indian Wells, CA, USA, 4–6 May 2010; pp. 1100–1107.
- Sünderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the performance of convnet features for place recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 4297–4304.
- Piasco, N.; Sidibé, D.; Demonceaux, C.; Gouet-Brunet, V. A survey on visual-based localization: On the benefit of heterogeneous data. Pattern Recognit. 2018, 74, 90–109.
- Marcu, A.; Costea, D.; Slusanschi, E.; Leordeanu, M. A multi-stage multi-task neural network for aerial scene interpretation and geolocalization. arXiv 2018, arXiv:1804.01322.
- Sattler, T.; Leibe, B.; Kobbelt, L. Efficient & effective prioritized matching for large-scale image-based localization. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1744–1756.
- Kendall, A.; Grimes, M.; Cipolla, R. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2938–2946.
- Kendall, A.; Cipolla, R. Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5974–5983.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin, Germany, 2015; pp. 234–241.
- Chen, B.; Neubert, B.; Ofek, E.; Deussen, O.; Cohen, M.F. Integrated videos and maps for driving directions. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, Victoria, BC, Canada, 4–7 October 2009; pp. 223–232.
- Peng, C.; Chen, B.Y.; Tsai, C.H. Integrated google maps and smooth street view videos for route planning. In Proceedings of the 2010 International Computer Symposium (ICS2010), Tainan, Taiwan, 16–18 December 2010; pp. 319–324.
- Amini, A.; Rosman, G.; Karaman, S.; Rus, D. Variational end-to-end navigation and localization. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8958–8964.
- Glassner, Y.; Gispan, L.; Ayash, A.; Shohet, T.F. Closing the gap towards end-to-end autonomous vehicle system. arXiv 2019, arXiv:1901.00114.
- Hecker, S.; Dai, D.; Van Gool, L. Learning accurate, comfortable and human-like driving. arXiv 2019, arXiv:1903.10995.
- Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixão, T.M.; Mutz, F.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2020, 165, 113816.
- Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386.
- Hassan, I.; Elharti, A.I. A Literature Review of Steering Angle Prediction Algorithms for Self-driving Cars. In Advanced Intelligent Systems for Sustainable Development (AI2SD2019): Volume 4—Advanced Intelligent Systems for Applied Computing Sciences; Springer: Berlin, Germany, 2020; Volume 4, p. 30.
- Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends Comput. Graph. Vis. 2020, 12, 1–308.
- Mozaffari, S.; Al-Jarrah, O.Y.; Dianati, M.; Jennings, P.; Mouzakitis, A. Deep learning-based vehicle behavior prediction for autonomous driving applications: A review. IEEE Trans. Intell. Transp. Syst. 2020.
- Shreyas, V.; Bharadwaj, S.N.; Srinidhi, S.; Ankith, K.; Rajendra, A. Self-driving Cars: An Overview of Various Autonomous Driving Systems. In Advances in Data and Information Sciences; Springer: Berlin, Germany, 2020; pp. 361–371.
- Mirowski, P.; Grimes, M.; Malinowski, M.; Hermann, K.M.; Anderson, K.; Teplyashin, D.; Simonyan, K.; Zisserman, A.; Hadsell, R. Learning to navigate in cities without a map. Adv. Neural Inf. Process. Syst. 2018, 31, 2419–2430.
- Sattler, T.; Maddern, W.; Toft, C.; Torii, A.; Hammarstrand, L.; Stenborg, E.; Kahl, F. Benchmarking 6dof outdoor visual localization in changing conditions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8601–8610.
- Toft, C.; Stenborg, E.; Hammarstrand, L.; Brynte, L.; Pollefeys, M.; Sattler, T.; Kahl, F. Semantic match consistency for long-term visual localization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 383–399.
- Volk, G.; Müller, S.; von Bernuth, A.; Hospach, D.; Bringmann, O. Towards Robust CNN-based Object Detection through Augmentation with Synthetic Rain Variations. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 285–292.
- von Bernuth, A.; Volk, G.; Bringmann, O. Simulating Photo-realistic Snow and Fog on Existing Images for Enhanced CNN Training and Evaluation. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 41–46.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Santana, E.; Hotz, G. Learning a driving simulator. arXiv 2016, arXiv:1608.01230.
- Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017, 36, 3–15.
- Udacity. 2016. Available online: https://github.com/udacity/self-driving-car (accessed on 27 January 2021).
- Corke, P. Robotics, Vision and Control: Fundamental Algorithms in MATLAB, 2nd completely revised ed.; Springer: Berlin, Germany, 2017; Volume 118.
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491.
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv 2014, arXiv:1412.6806.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
- Bishop, C.M. Training with noise is equivalent to Tikhonov regularization. Neural Comput. 1995, 7, 108–116.
Dataset | Driving Time (h) | GPS/IMU | Route Planner | Autonomous Route Planner | CAN Bus Reader | Lidar | Accessible Setup |
---|---|---|---|---|---|---|---|
BDDV [5] | 10k | Yes | No | No | No | No | Yes |
Cityscapes [34] | <100 | Yes | No | No | No | No | No |
Comma.ai [35] | 7.3 | Yes | No | No | Yes | No | No |
Drive360 [7] | 60 | Yes | Yes | No | Yes | No | No |
KITTI [33] | <1 | Yes | No | No | No | Yes | No |
Oxford [36] | 214 | Yes | No | No | No | Yes | No |
Udacity [37] | 10 | Yes | No | No | Yes | No | No |
UED (ours) | 21 | Yes | Yes | Yes | No | No | Yes |
Method | Position Type | Position Response | Position Mean | Position Median | Orientation Type | Orientation Response | Orientation Mean | Orientation Median |
---|---|---|---|---|---|---|---|---|
[14] | Reg | 100% | 58.09 m | 17.75 m | Reg | 100% | 7.81° | 2.41° |
[15] | Reg | 100% | 50.84 m | 15.39 m | Reg | 100% | 7.33° | 1.88° |
[12] | Seg | 91.00% | 27.36 m | 11.44 m | - | - | - | - |
LOVis-2DOF | Seg | 94.72% | 17.31 m | 11.55 m | - | - | - | - |
LOVis-reg | Seg | 92.62% | 26.95 m | 13.95 m | Reg | 100% | 9.92° | 4.41° |
LOVis | Seg | 96.35% | 16.89 m | 11.18 m | Seg | 96.08% | 3.65° | 1.43° |
LOVis-F | Seg | 100% | 16.05 m | 10.90 m | Seg | 100% | 3.73° | 0.67° |
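As a reading aid for the localization table above, the sketch below shows one plausible way to compute the reported metrics when a model is allowed to abstain on some frames: response is the fraction of frames with a prediction, while the mean and median position errors are taken over the answered frames only. The function name, the input format and the abstention convention (None for no answer) are assumptions for illustration, not the paper’s code.

```python
# Hypothetical evaluation sketch: response rate plus mean/median position error
# over the frames for which the localization model produces an answer.
import numpy as np

def localization_metrics(pred_xy, gt_xy):
    """pred_xy: list of (x, y) predictions in meters, or None when the model
    abstains; gt_xy: matching ground-truth positions."""
    answered = [(p, g) for p, g in zip(pred_xy, gt_xy) if p is not None]
    errors = np.array([np.linalg.norm(np.subtract(p, g)) for p, g in answered])
    return {
        "response_pct": 100.0 * len(answered) / len(gt_xy),
        "mean_m": float(errors.mean()),
        "median_m": float(np.median(errors)),
    }
```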
MAE for predicted speed (m/s) and steering angle (°) at prediction horizons of 1–7 s:

Method | Speed 1 s | Speed 2 s | Speed 3 s | Speed 4 s | Speed 5 s | Speed 6 s | Speed 7 s | Steering 1 s | Steering 2 s | Steering 3 s | Steering 4 s | Steering 5 s | Steering 6 s | Steering 7 s |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[4] | 1.9 | 1.91 | 1.99 | 1.94 | 1.96 | 1.95 | 2.33 | 1.01 | 1.61 | 2.09 | 2.65 | 3.14 | 3.9 | 5.48 |
[7] | 1.76 | 1.7 | 1.68 | 1.68 | 1.69 | 1.72 | 1.76 | 0.91 | 1.32 | 1.74 | 2.05 | 2.39 | 2.95 | 4.3 |
NAVis | 0.94 | 0.92 | 0.91 | 0.92 | 0.94 | 0.98 | 1.03 | 0.84 | 1.26 | 1.68 | 2 | 2.36 | 2.91 | 4.24 |
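The navigation errors in the table above are mean absolute errors of the predicted speed and steering angle, evaluated separately at each future horizon from 1 s to 7 s. The short sketch below illustrates this metric under assumed array shapes and sampling rate; the function name and the 1 Hz default are illustrative, not taken from the paper.

```python
# Hypothetical sketch of per-horizon MAE for a predicted trajectory signal
# (speed in m/s or steering angle in degrees).
import numpy as np

def per_horizon_mae(pred, gt, horizons_s=range(1, 8), hz=1):
    """pred, gt: (num_frames, T) arrays with the future signal sampled at `hz` Hz."""
    return {h: float(np.abs(pred[:, h * hz - 1] - gt[:, h * hz - 1]).mean())
            for h in horizons_s}
```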
Position: response (%) / mean (m) / median (m) under each condition.

Method | None | Luminosity | Sun Flare | Fog | Rain |
---|---|---|---|---|---|
LOVis | 97.9 / 14.7 / 11.2 | 87.8 / 26.9 / 12.3 | 86.5 / 27.1 / 12.6 | 78.5 / 31 / 12.3 | 74.2 / 22.7 / 14 |
LOVis-W | 98.3 / 13.6 / 11.2 | 97.7 / 14.3 / 11.6 | 97.5 / 14.9 / 11.4 | 97.6 / 14.6 / 11.8 | 97.8 / 14.2 / 11.5 |
LOVis-WF | 100 / 13 / 11.2 | 100 / 13.2 / 11.5 | 100 / 13.2 / 11.3 | 100 / 13.4 / 11.7 | 100 / 13.3 / 11.3 |

Orientation: response (%) / mean (°) / median (°) under each condition.

Method | None | Luminosity | Sun Flare | Fog | Rain |
---|---|---|---|---|---|
LOVis | 97.3 / 2.59 / 1.15 | 87.1 / 3.81 / 1.52 | 85.5 / 4.48 / 1.47 | 77.5 / 5.08 / 1.57 | 73.3 / 3.86 / 1.66 |
LOVis-W | 98.2 / 2.23 / 1.06 | 97.5 / 2.44 / 1.18 | 97.2 / 2.58 / 1.26 | 97.5 / 2.52 / 1.21 | 97.6 / 2.41 / 1.24 |
LOVis-WF | 100 / 2.28 / 1.07 | 100 / 2.51 / 1.19 | 100 / 2.68 / 1.27 | 100 / 2.56 / 1.22 | 100 / 2.46 / 1.25 |