Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Haifei Zhang; Jian Xu; Jian Zhang; Quan Liu

doi:10.1155/2022/1117781

Comput Intell Neurosci. 2022 Nov 18:2022:1117781. doi: 10.1155/2022/1117781. eCollection 2022.

Authors

Haifei Zhang¹, Jian Xu², Jian Zhang³, Quan Liu³

Affiliations

¹ School of Computer and Information Engineering, Nantong Institute of Technology, Yongxing Road 211, Nantong 226002, China.
² School of Information Science and Technology, Nantong University, Seyuan Road 9, Nantong 226019, China.
³ School of Computer Science and Technology, Soochow University, Shizi Street 1, Suzhou 215006, China.

Abstract

The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used for tasks with continuous action spaces, but it is prone to falling into local optima and to large error fluctuations. To address these deficiencies, this paper proposes a dual-actor, dual-critic DDPG algorithm (DN-DDPG). First, a second critic network is added to the algorithm's original actor-critic architecture to assist training, and in each update the smaller of the two critics' Q values is taken as the estimated value of the action, which reduces the probability of falling into a local optimum. Then, a dual-actor network is introduced to alleviate the value underestimation caused by the dual-critic network: the action with the greater value among those proposed by the two actor networks is selected for the update, which stabilizes training. Finally, the improved method is validated on four continuous-action tasks provided by MuJoCo, and the results show that, compared with the classical algorithm, it reduces the range of error fluctuation and improves the cumulative return.
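
The two selection rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (select_action, td_target), the use of plain callables in place of neural networks, the scoring of each candidate action by the smaller critic estimate, the discount factor, and the omission of target networks and replay memory are all assumptions made for illustration.

```python
def pessimistic_value(state, action, critics):
    """Smaller of the critics' Q estimates for (state, action)."""
    return min(critic(state, action) for critic in critics)


def select_action(state, actors, critics):
    """Among the actors' proposals, keep the action with the greatest value.

    Assumption: each candidate is scored by the smaller critic estimate.
    """
    candidates = [actor(state) for actor in actors]
    return max(candidates, key=lambda a: pessimistic_value(state, a, critics))


def td_target(reward, next_state, done, actors, critics, gamma=0.99):
    """One-step bootstrapped target built from the two rules above."""
    next_action = select_action(next_state, actors, critics)
    next_q = pessimistic_value(next_state, next_action, critics)
    return reward + gamma * (1.0 - done) * next_q


# Hypothetical stand-ins for trained networks (scalar state and action).
actors = [lambda s: 0.5 * s, lambda s: -0.5 * s]
critics = [lambda s, a: -(a - s) ** 2, lambda s, a: -(a - s) ** 2 - 0.1]

chosen = select_action(1.0, actors, critics)
target = td_target(1.0, 1.0, 0.0, actors, critics, gamma=0.9)
```

In this toy setting the first actor's proposal is kept because its smaller critic estimate is higher than that of the second actor's proposal, and the target is bootstrapped from that same pessimistic estimate.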

MeSH terms

Algorithms*
Policy*