Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games

Mazumdar, Eric; Ratliff, Lillian J.; Jordan, Michael I.; Sastry, S. Shankar

arXiv:1907.03712 (cs)

[Submitted on 8 Jul 2019 (v1), last revised 16 Dec 2019 (this version, v2)]

Abstract: We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic games, a classic game setting which is recently emerging as a benchmark in the field of multi-agent learning. In such games the state and action spaces are continuous and global Nash equilibria can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy-gradient. We first show that these games are surprisingly not convex games. Despite this, we are still able to show that the only critical points of the gradient dynamics are global Nash equilibria. We then give sufficient conditions under which policy-gradient will avoid the Nash equilibria, and generate a large number of general-sum linear quadratic games that satisfy these conditions. In such games we empirically observe the players converging to limit cycles for which the time average does not coincide with a Nash equilibrium. The existence of such games indicates that one of the most popular approaches to solving reinforcement learning problems in the classic reinforcement learning setting has no local guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.
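
To make the term "gradient-play" concrete, here is a minimal sketch for a scalar two-player LQ game. It is an illustration under stated assumptions, not the authors' code and not one of the paper's counterexamples. The assumed dynamics are x_{t+1} = a x_t + b_1 u_1 + b_2 u_2 with linear feedback u_i = -k_i x_t, initial state x_0 = 1, and cost J_i = sum_t (q_i x_t^2 + r_i u_i^2), which has the closed form (q_i + r_i k_i^2) / (1 - c^2) when the closed-loop gain c = a - b_1 k_1 - b_2 k_2 satisfies |c| < 1. All function names and parameter values are hypothetical.

```python
import math

def closed_loop(a, b, k):
    # Closed-loop gain c = a - b_1 k_1 - b_2 k_2 (scalar state, assumed setup).
    return a - b[0] * k[0] - b[1] * k[1]

def cost(i, a, b, q, r, k):
    # Infinite-horizon cost of player i for x_0 = 1; infinite if unstable.
    c = closed_loop(a, b, k)
    if abs(c) >= 1.0:
        return math.inf
    return (q[i] + r[i] * k[i] ** 2) / (1.0 - c ** 2)

def grad(i, a, b, q, r, k):
    # Derivative of player i's cost with respect to its own gain k_i,
    # valid only while the closed loop is stable (|c| < 1).
    c = closed_loop(a, b, k)
    s = 1.0 - c ** 2
    return 2.0 * r[i] * k[i] / s - 2.0 * b[i] * c * (q[i] + r[i] * k[i] ** 2) / s ** 2

def gradient_play(a, b, q, r, k0, lr=0.01, steps=100):
    # Simultaneous gradient descent: each player steps on its own cost only.
    k = list(k0)
    for _ in range(steps):
        if abs(closed_loop(a, b, k)) >= 1.0:
            break  # left the stabilizing region; costs are no longer finite
        g = [grad(i, a, b, q, r, k) for i in (0, 1)]
        k = [k[i] - lr * g[i] for i in (0, 1)]
    return k

if __name__ == "__main__":
    # Hypothetical game parameters chosen only for illustration.
    a, b, q, r = 0.9, (0.5, 0.4), (1.0, 2.0), (1.0, 1.0)
    k = gradient_play(a, b, q, r, k0=(0.3, 0.2))
    print("gains:", k, "closed-loop gain:", closed_loop(a, b, k))
```

The key design point is that both players update at the same time using only the gradient of their own cost, which is what distinguishes gradient-play from a single-agent descent on one objective. The behaviors the abstract reports (avoidance of Nash equilibria, limit cycles) concern the games constructed in the paper; this scalar sketch makes no claim to reproduce them.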

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1907.03712 [cs.LG]
	(or arXiv:1907.03712v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1907.03712

Submission history

From: Eric Mazumdar
[v1] Mon, 8 Jul 2019 16:35:03 UTC (3,011 KB)
[v2] Mon, 16 Dec 2019 20:32:31 UTC (3,772 KB)
