HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties

Delage, Aurélien; Buffet, Olivier; Dibangoye, Jilles

arXiv:2110.14529 (cs)

[Submitted on 25 Oct 2021 (v1), last revised 15 Nov 2022 (this version, v2)]

Abstract: Dynamic programming and heuristic search are at the core of state-of-the-art solvers for sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), this requires introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but failed in the general case despite known concavity and convexity properties, which only led to heuristic algorithms with poor convergence guarantees. We overcome this issue by leveraging these properties to derive bounding approximators and efficient update and selection operators, and then derive a prototypical solver inspired by HSVI that provably converges to an $\epsilon$-optimal solution in finite time, and which we evaluate empirically. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.

Comments:	37 pages, 4 figures, 4 tables, 3 algorithms
Subjects:	Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2110.14529 [cs.GT]
	(or arXiv:2110.14529v2 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.2110.14529

Submission history

From: Olivier Buffet
[v1] Mon, 25 Oct 2021 13:38:21 UTC (516 KB)
[v2] Tue, 15 Nov 2022 14:23:47 UTC (517 KB)
