Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces
Abstract
1. Introduction
2. Vehicle Model and Environment-Aware Design
2.1. Vehicle Kinematic Model
2.2. Environment Perception Design
2.2.1. Self-Position and Velocity Information Perception
- Part I: ;
- Part II: ;
- Part III: No change;
- Part IV: .
2.2.2. Obstacle Information Perception
3. Parking Method Framework Design
3.1. Overall Framework Design
3.2. MPC Controller Design
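The Conclusions describe pairing a model-based predictive controller with the PPO planner. A simplified, sampling-based MPC illustrates the receding-horizon principle. It is not this study's MPC formulation. It enumerates candidate steering angles rather than solving a QP, holds each candidate constant over the horizon, and uses a kinematic bicycle model. The wheelbase (2.7 m), steering limit (35°), horizon, and weights are hypothetical values.

```python
import numpy as np

WHEELBASE = 2.7                 # m, hypothetical vehicle parameter
MAX_STEER = np.radians(35.0)    # rad, hypothetical steering limit


def bicycle_step(state, v, delta, dt=0.1, L=WHEELBASE):
    """Kinematic bicycle model; accepts one state (3,) or a batch (C, 3)."""
    x, y, yaw = state[..., 0], state[..., 1], state[..., 2]
    return np.stack([x + v * np.cos(yaw) * dt,
                     y + v * np.sin(yaw) * dt,
                     yaw + v / L * np.tan(delta) * dt], axis=-1)


def mpc_steer(state, ref_path, v=1.0, horizon=20, dt=0.1,
              n_candidates=31, w_steer=0.1):
    """Return the steering angle whose predicted rollout best tracks ref_path.

    Simplification: each candidate holds one steering angle constant over the
    horizon, and the optimizer is brute-force enumeration instead of a QP.
    """
    deltas = np.linspace(-MAX_STEER, MAX_STEER, n_candidates)
    states = np.tile(state, (n_candidates, 1))      # one rollout per candidate
    cost = w_steer * deltas ** 2                     # control-effort term
    for _ in range(horizon):
        states = bicycle_step(states, v, deltas, dt)
        diff = states[:, None, :2] - ref_path[None, :, :]   # shape (C, N, 2)
        # squared distance from each predicted position to the nearest path point
        cost += np.min(np.sum(diff ** 2, axis=2), axis=1)
    return deltas[np.argmin(cost)]


def track(state, ref_path, steps=200, v=1.0, dt=0.1):
    """Receding-horizon loop: re-plan every step, apply only the first input."""
    for _ in range(steps):
        delta = mpc_steer(state, ref_path, v=v, dt=dt)
        state = bicycle_step(state, v, delta, dt)
    return state


xs = np.linspace(-5.0, 50.0, 1101)
ref = np.column_stack([xs, np.zeros_like(xs)])   # straight reference line y = 0
final = track(np.array([0.0, 1.0, 0.0]), ref)    # start 1 m off the line
```

The squared-distance cost mirrors the quadratic tracking cost that MPC formulations commonly use. Re-planning at every step while applying only the first input is what makes the scheme "receding horizon".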
3.3. PPO-Based Parking Trajectory Planning Process
3.3.1. Reinforcement Learning Process
3.3.2. PPO Algorithm
Algorithm 1: PPO algorithm for updating the neural networks
1. Initialize the weight parameters of the policy network (actor) and the value function network (critic). Set the discount factor and the greedy factor, and initialize the vehicle state.
2. Do the following for each time step:
3. Repeat Step 2 until convergence or the maximum number of iterations is reached.
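The actor and critic updates in Algorithm 1 can be illustrated with the standard PPO quantities from Schulman et al. (2017). These are generalized advantage estimation (GAE), the clipped surrogate objective for the actor, and a mean-squared-error loss for the critic. The sketch assumes the common defaults γ = 0.99, λ = 0.95, and clip range ε = 0.2, which are not values reported in this study. It also assumes a single uninterrupted rollout, with no episode boundary inside the batch.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one uninterrupted rollout.

    rewards, values: arrays of length T (values are critic estimates V(s_t)).
    last_value: V(s_T) used to bootstrap; pass 0.0 if the episode terminated.
    Returns (advantages, returns); returns are the critic's regression targets.
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae, next_value = 0.0, last_value
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]   # one-step TD error
        gae = delta + gamma * lam * gae
        adv[t] = gae
        next_value = values[t]
    return adv, adv + values


def ppo_clip_objective(new_logp, old_logp, adv, eps=0.2):
    """Clipped surrogate objective L^CLIP, to be maximized w.r.t. the actor."""
    ratio = np.exp(new_logp - old_logp)                # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))


def critic_loss(pred_values, returns):
    """Mean-squared error between critic predictions and return targets."""
    return np.mean((pred_values - returns) ** 2)
```

A real implementation would compute these on autodiff tensors so that gradients flow into the actor and critic networks. The clipping keeps each update from moving the policy ratio far outside [1 − ε, 1 + ε].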
4. Trajectory Planning Model Design
4.1. Action Strategy Actor–Critic Function Design
4.2. Algorithmic Network Framework Design
4.3. Reward Function Design
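The Conclusions describe a reward built from four-dimensional evaluation indicators, plus a smoothing bias that steers the agent toward the target without long detours. The individual indicators are not specified here, so the four terms below are hypothetical: progress toward the slot, heading alignment, a collision/out-of-range penalty, and a constant per-step cost. All weights and tolerances are illustrative assumptions, and `Pose` and `shaped_reward` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    yaw: float  # rad


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def shaped_reward(prev: Pose, curr: Pose, goal: Pose, collided: bool,
                  out_of_range: bool, w_progress=1.0, w_heading=0.5,
                  step_cost=0.05, goal_bonus=100.0, crash_penalty=-100.0,
                  pos_tol=0.3, yaw_tol=math.radians(5.0)) -> float:
    """Hypothetical four-term parking reward (not the paper's exact design)."""
    # Term 1: safety. A collision or leaving the drivable area is penalized.
    if collided or out_of_range:
        return crash_penalty
    d_prev = math.hypot(goal.x - prev.x, goal.y - prev.y)
    d_curr = math.hypot(goal.x - curr.x, goal.y - curr.y)
    heading_err = abs(wrap_angle(curr.yaw - goal.yaw))
    # Term 2: progress toward the slot (positive when the distance shrinks).
    # Term 3: heading alignment with the slot orientation.
    # Term 4: constant per-step cost, so long detours accumulate penalty.
    r = w_progress * (d_prev - d_curr) - w_heading * heading_err - step_cost
    if d_curr < pos_tol and heading_err < yaw_tol:
        r += goal_bonus
    return r
```

Because the progress term rewards only the change in distance, circling near the slot earns nothing. The per-step cost then makes shorter approaches strictly better.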
5. Discussion
5.1. Simulation Design and Validation
Evaluation Index and Scenario Design for Algorithm Training
5.2. Analysis of Algorithm Training Results
- The number of training rounds was set to 200, and the model parameters were updated every 40 steps; whenever a round ended, the model was also updated and the environment was reinitialized to start the next round of training.
- If the vehicle collided with an obstacle, left the drivable range, or reached the destination, the episode reward was returned, the model parameters were updated, and the environment was reset to its initial state to begin the next round of training.
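The two training rules above can be sketched as a generic episode loop. Parameters are updated every 40 steps and at the end of each round, and a round ends on collision, on leaving the drivable range, or on reaching the goal. Several pieces are assumptions for illustration. `ToyParkingEnv` is a hypothetical 1-D stand-in for the parking environment. Its reward values and the `max_steps` time limit are invented. `update_fn` is a placeholder for the PPO update. The 40-step counter runs globally across rounds, which is one reading of the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float, float, bool]  # (s, a, r, s', done)


@dataclass
class ToyParkingEnv:
    """Hypothetical 1-D stand-in for the parking environment (illustration only)."""
    goal: float = 10.0
    bound: float = 15.0
    obstacle: Tuple[float, float] = (-3.0, -2.0)
    max_steps: int = 100          # extra time limit (assumption)
    pos: float = 0.0
    steps: int = 0

    def reset(self) -> float:
        self.pos, self.steps = 0.0, 0
        return self.pos

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.pos += action
        self.steps += 1
        reached = abs(self.pos - self.goal) < 0.5
        collided = self.obstacle[0] <= self.pos <= self.obstacle[1]
        out_of_range = abs(self.pos) > self.bound
        reward = 10.0 if reached else (-10.0 if (collided or out_of_range) else -0.1)
        done = reached or collided or out_of_range or self.steps >= self.max_steps
        return self.pos, reward, done


def train(env, policy: Callable[[float], float],
          update_fn: Callable[[List[Transition]], None],
          num_rounds: int = 200, update_every: int = 40) -> int:
    """Run num_rounds episodes; update every `update_every` steps and at episode end."""
    buffer: List[Transition] = []
    total_steps, n_updates = 0, 0
    for _ in range(num_rounds):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            total_steps += 1
            if total_steps % update_every == 0:       # periodic update
                update_fn(list(buffer))
                buffer.clear()
                n_updates += 1
        if buffer:  # round ended (goal, collision, or out of range): flush leftovers
            update_fn(list(buffer))
            buffer.clear()
            n_updates += 1
    return n_updates
```

For example, `train(ToyParkingEnv(), policy=lambda s: 1.0, update_fn=print)` runs 200 rounds of a policy that always drives forward.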
6. Conclusions
- Model predictive control (MPC) was combined with the PPO algorithm to make the method more adaptable to parking environments. Traditional trajectory planning algorithms generate poor-quality paths with sharp corners at node connections. To address this, this study split the parking process into two scenarios, searching for a parking space and planning the parking maneuver, and made the endpoint of trajectory tracking coincide with the starting point of parking, which effectively improved path smoothness.
- A reward function based on four-dimensional evaluation indicators was designed, and a smoothing bias strategy was added so that the agent learns to approach the target location without taking a long detour to collect the target reward. This strategy substantially accelerates training: the results confirmed that the PPO algorithm with the four-dimensional evaluation metrics converged in 2500 training cycles, 75% and 68% fewer than the DDPG and TD3 algorithms required, respectively. The PPO-based reinforcement learning method also had shorter learning times, only 30% and 37.5% of those of DDPG and TD3, respectively.
- To verify the path planning and motion control, a vehicle kinematic model based on the Ackermann steering principle was established and tested in a simulation environment. The test results demonstrated that the model could effectively avoid obstacles and reach the destination for different target positions, verifying its effectiveness and acceptable adaptability to the environment. The parking path was smooth and free of breakpoints, ensuring comfort during automatic parking.
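The Ackermann steering principle behind the kinematic model can be illustrated with its core geometry. The inner and outer front wheels are steered so that both wheel axes pass through a common turning centre on the rear-axle line, which gives cot δ_o − cot δ_i = W / L. In this sketch the "virtual" bicycle-model angle δ sits between the two wheel angles. The wheelbase and track width in the usage line are hypothetical values, not this study's vehicle parameters.

```python
import math


def ackermann_angles(delta: float, wheelbase: float, track: float):
    """Return (inner, outer) front-wheel angles for a bicycle-model angle delta.

    Assumes the turning centre lies on the rear-axle line at radius
    R = wheelbase / tan|delta| from the rear-axle midpoint, with R > track / 2.
    """
    if delta == 0.0:
        return 0.0, 0.0
    R = wheelbase / math.tan(abs(delta))
    if R <= track / 2:
        raise ValueError("steering angle too large for this track width")
    inner = math.atan(wheelbase / (R - track / 2))   # wheel closer to the centre
    outer = math.atan(wheelbase / (R + track / 2))
    sign = math.copysign(1.0, delta)                  # left (+) or right (-) turn
    return sign * inner, sign * outer


def min_turning_radius(max_delta: float, wheelbase: float) -> float:
    """Smallest rear-axle turning radius reachable at the steering limit."""
    return wheelbase / math.tan(max_delta)


inner, outer = ackermann_angles(math.radians(30.0), wheelbase=2.7, track=1.6)
```

The same R = L / tan δ relation gives the kinematic bicycle yaw rate v · tan δ / L. That yaw rate is the quantity a planner or MPC tracker propagates when predicting the vehicle's motion.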
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shi, J.; Li, K.; Piao, C.; Gao, J.; Chen, L. Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces. Sensors 2023, 23, 7124. https://doi.org/10.3390/s23167124