Stochastic Potential Game-Based Target Tracking and Encirclement Approach for Multiple Unmanned Aerial Vehicles System
Abstract
1. Introduction
- The SPG-based TTE scenario is constructed on a finite, continuous action–state domain and involves adjusting the composite tasks of an MUAVS in various environments to achieve the TTE objective. All trackers must cope with the uncertainty of the target’s motion pattern, since the target possesses equally intelligent decision-making capabilities within the scenario.
- To search for local NE strategies, a Time-Series Multi-Agent Soft Actor–Critic (TMSAC) approach is proposed. It leverages sequential observations from the agents and is particularly effective in determining optimal strategies for the MUAVS as the number of agents grows. Furthermore, novel reward functions that account for the SPG condition are designed and integrated into the TMSAC framework to enhance the performance of both the trackers and the target. The convergence of the algorithm is also discussed to justify its ability to search for the local NE.
- Considering the dynamical characteristics of the agents, a guidance loop based on the actor trained by TMSAC is combined with velocity and attitude controllers and deployed in a visualized physical simulation environment to demonstrate the effectiveness and success rate of the proposed method (a schematic of this cascade is sketched below).
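To make the cascade concrete, the following is a minimal sketch of how a trained actor's velocity command could feed velocity and attitude controllers; the `VelocityController`, `AttitudeController`, and `guidance_step` interfaces, gains, and the small-angle mapping are illustrative assumptions, not the controllers used in the paper.

```python
import numpy as np

class VelocityController:
    """Proportional velocity loop producing an acceleration command (gain is illustrative)."""
    def __init__(self, kp: float = 1.5):
        self.kp = kp

    def __call__(self, v_cmd: np.ndarray, v_meas: np.ndarray) -> np.ndarray:
        return self.kp * (v_cmd - v_meas)

class AttitudeController:
    """Maps a desired acceleration to roll/pitch/thrust under a small-angle assumption."""
    def __init__(self, mass: float = 0.25, g: float = 9.81):
        self.mass, self.g = mass, g

    def __call__(self, acc_cmd: np.ndarray):
        roll = -acc_cmd[1] / self.g      # lateral acceleration -> roll setpoint
        pitch = acc_cmd[0] / self.g      # forward acceleration -> pitch setpoint
        thrust = self.mass * (self.g + acc_cmd[2])
        return roll, pitch, thrust

def guidance_step(actor, obs_seq, v_meas, vel_ctrl, att_ctrl):
    """One guidance-loop iteration: actor -> velocity command -> attitude setpoints."""
    v_cmd = actor(obs_seq)               # the trained actor outputs a 3D velocity command
    acc_cmd = vel_ctrl(v_cmd, v_meas)
    return att_ctrl(acc_cmd)
```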
2. System Modeling
2.1. Quadrotor Dynamics
2.2. Multi-Agent Kinematics
3. Stochastic Potential Game
- $\mathcal{S}$ denotes the set of agent states, constructed from relative kinematic information and obstacle observations.
- $\boldsymbol{\pi}$ is the joint Markov strategy set, where each $\pi_i$ is a mixed strategy over the continuous action space $\mathcal{A}_i$.
- $P$ is the state transition probability function, expressing the response characteristics of the system.
- $r_i$ is the reward function of each agent $i \in \mathcal{N}$.
- $r_i^{\mathrm{self}}$ is called the self-influence reward, determined by agent $i$'s obstacle avoidance and the smoothness of its actions.
- $r_i^{\mathrm{inter}}$ represents the inter-influence reward, which encodes the tracking and encirclement state (a schematic container for this tuple is sketched after this list).
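As a purely illustrative data structure, the SPG tuple described above could be organized as follows; the class name, field names, and call signatures are assumptions made for the sketch, while the split of each agent's reward into self- and inter-influence terms mirrors the definitions in this section.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class StochasticPotentialGame:
    n_agents: int
    state_dim: int                       # relative kinematics + obstacle observation
    action_dim: int                      # continuous action space per agent
    transition: Callable[[np.ndarray, np.ndarray], np.ndarray]  # samples s' given (s, joint action)
    self_reward: Sequence[Callable]      # r_i^self: obstacle avoidance, action smoothness
    inter_reward: Sequence[Callable]     # r_i^inter: tracking and encirclement state

    def reward(self, i: int, state: np.ndarray, actions: np.ndarray) -> float:
        """Total reward of agent i as the sum of self- and inter-influence terms."""
        return self.self_reward[i](state, actions[i]) + self.inter_reward[i](state, actions)
```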
4. Time-Series Multi-Agent Soft Actor–Critic Approach
4.1. State and Action Embedding
4.1.1. State Vector
4.1.2. Action Vector
4.2. Composite Rewards Function
4.2.1. Self-Influence Rewards
4.2.2. Inter-Influence Rewards
4.3. Algorithm
4.3.1. Learning Methods
4.3.2. Actor and Critic Structures
4.3.3. Details
Algorithm 1 Time-Series Multi-Agent Soft Actor–Critic
Input: randomly initialized actor, critic, and temperature parameters for each agent
for episode = 1 to M do
  Initialize the environment and the agent state set
  Initialize the experience buffer
  Select the agent i to be trained
  for t = 1 to T do
    Sample actions from each agent's policy
    Execute the actions and obtain the updated state set from the environment
    Obtain the rewards
    Push the transition into the experience buffer
    if the learning interval is reached then
      Sample a batch of transitions from the experience buffer
      Softly update the actor, critic, and temperature parameters of agent i
    end if
  end for
end for
Output: the trained actor networks of all agents
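A compact Python rendering of this loop is sketched below; `env`, the per-agent `act`/`update` methods, the buffer capacity, and the agent-switching rule are placeholders assumed for illustration rather than the released implementation, though the step count, sequence length, learning interval, and batch size follow the training settings reported later in the paper.

```python
import random
from collections import deque

def train_tmsac(env, agents, episodes, steps=150, seq_len=10,
                learn_every=5, batch_size=1024):
    buffer = deque(maxlen=1_000_000)                  # shared experience buffer, kept across episodes (simplification)
    for episode in range(episodes):
        states = env.reset()
        history = deque([states] * seq_len, maxlen=seq_len)  # fixed-length observation sequence
        agent_id = episode % len(agents)              # alternate the agent being trained (illustrative schedule)
        for t in range(steps):
            obs_seq = list(history)
            actions = [agent.act(obs_seq) for agent in agents]
            next_states, rewards, done = env.step(actions)
            buffer.append((obs_seq, actions, rewards, next_states))
            history.append(next_states)
            if t % learn_every == 0 and len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                agents[agent_id].update(batch)        # soft update of actor/critic/temperature
            if done:
                break
    return agents
```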
4.3.4. Convergence Discussion
5. Simulation and Discussion
5.1. Algorithm Performance
5.2. Comparison with Other Methods
- Higher reward values. In all comparison cases, TMSAC obtains 5–20% more reward than the other methods under the same environment and hyper-parameter settings, for both the trackers and the target. This indicates that the strategies approximated by the TMSAC algorithm are closer to the MPE discussed in Section 3.
- Adaptivity to scenarios with more agents. For fully distributed algorithms, increasing the number of agents makes it more challenging to search for NE strategies. As shown in Figure 7a–e, the advantage of TMSAC on the tracker side becomes progressively more evident. The recurrent network structure gives each agent a degree of prediction ability to avoid potential collisions and to select effective actions for predictive tracking (a sketch of such a recurrent actor follows this list). A growing agent group also brings more kinematic information into the state vectors. The results demonstrate that as the number of trackers in the environment increases, TMSAC exhibits enhanced robustness in identifying near-Nash-Equilibrium policies.
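The recurrent structure mentioned above could be realized, for instance, as a GRU encoder followed by MLP heads parameterizing a Gaussian policy; the layer sizes, the tanh squashing choice, and the class name `TimeSeriesActor` are assumptions made for this sketch, not the exact TMSAC architecture.

```python
import torch
import torch.nn as nn

class TimeSeriesActor(nn.Module):
    """GRU over the observation sequence + MLP heads for a squashed Gaussian policy."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.mu_head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, act_dim))
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: (batch, seq_len, obs_dim); keep only the final hidden state
        _, h = self.gru(obs_seq)
        h = h.squeeze(0)
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        action = torch.tanh(dist.rsample())           # squashed continuous action
        return action, dist
```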
- Full Success Rate (FSR): This metric measures the percentage of scenarios in which all trackers in the MUAVS successfully capture their targets within a limited number of time steps.
- Half Success Rate (HSR): This metric quantifies the percentage of scenarios in which at least half of the trackers achieve their goals (a computation sketch for both rates follows these definitions).
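For concreteness, the two rates can be computed from per-episode capture records as in the sketch below; the `success_rates` helper and the bookkeeping format (one Boolean success flag per tracker per episode) are hypothetical, not part of the paper's evaluation code.

```python
from typing import List

def success_rates(episodes: List[List[bool]]) -> tuple:
    """Return (FSR, HSR) over a list of episodes; each episode is a list of
    per-tracker success flags recorded within the step limit."""
    fsr = sum(all(flags) for flags in episodes) / len(episodes)           # all trackers succeed
    hsr = sum(sum(flags) >= len(flags) / 2 for flags in episodes) / len(episodes)  # at least half succeed
    return fsr, hsr

# Example: 3 episodes with 4 trackers each
print(success_rates([[True] * 4, [True, True, False, False], [False] * 4]))
# -> (0.333..., 0.666...)
```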
5.3. Visual Simulation
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
TTE | Target Tracking and Encirclement |
MUAVS | Multiple Unmanned Aerial Vehicles System |
AC | Actor–Critic |
MARL | Multi-Agent Reinforcement Learning |
PG | Potential Game |
NE | Nash Equilibrium |
SPG | Stochastic Potential Game |
SG | Stochastic Game |
MDP | Markov Decision Process |
MPE | Markov Perfect Equilibrium |
TMSAC | Time-Series Multi-Agent Soft Actor–Critic |
MASAC | Multi-Agent Soft Actor–Critic |
MADDPG | Multi-Agent Deep Deterministic Policy Gradient |
GRU | Gated Recurrent Unit |
MLPs | Multilayer Perceptrons |
Parameter | Value |
---|---|
Plane size | 1000 m × 1000 m |
Tracker max speed | 10 m/s |
Tracker max angular speed | 15 deg/s |
Target max speed | 10 m/s |
Target max angular speed | 10 deg/s |
Number of trackers | 4 or 5 |
Number of targets | 1 |
Agent safe interval | 20 m |
Obstacle detection distance | 8 m |
Obstacle detection points | 12 |
Obstacles shape and size | random |
Parameter | Value |
---|---|
Total episodes M | |
Total steps in each episode T | 150 |
Switching period | 20 × N |
Learning interval | 5 |
Batch size | 1024 |
Target entropy | −2 |
Discount factor | 0.988 |
Buffer size | |
Sequential length | 10 |
Optimizer | Adam |
Parameter | Value |
---|---|
Gravity acceleration g | 9.81 m/s² |
UAV mass m | 0.25 kg |
UAV arm length l and b | 0.2 m |
X-axis inertias | |
Y-axis inertias | |
Z-axis inertias | |
Max propeller force | 7.23 N |