Muesli: Combining Improvements in Policy Optimization

Hessel, Matteo; Danihelka, Ivo; Viola, Fabio; Guez, Arthur; Schmitt, Simon; Sifre, Laurent; Weber, Theophane; Silver, David; van Hasselt, Hado

arXiv:2104.06159 (cs)

[Submitted on 13 Apr 2021 (v1), last revised 31 Mar 2022 (this version, v2)]

Abstract: We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2104.06159 [cs.LG]
	(or arXiv:2104.06159v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2104.06159

Submission history

From: Ivo Danihelka
[v1] Tue, 13 Apr 2021 13:04:29 UTC (812 KB)
[v2] Thu, 31 Mar 2022 09:35:40 UTC (804 KB)
