Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, Sheila McIlraith
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2107-2116, 2018.

Abstract

In this paper we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.
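
The idea summarized in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the class `RewardMachine`, the function `qrm_update`, the event labels (`"coffee"`, `"office"`), and the use of constant rewards on transitions are all assumptions made for illustration. It keeps one Q-table per reward machine state and, for every environment experience, applies an off-policy Q-learning update to each of them.

```python
from collections import defaultdict


class RewardMachine:
    """Hypothetical minimal reward machine: a finite state machine
    whose transitions emit (constant) rewards."""

    def __init__(self, initial, transitions, terminal):
        # transitions maps (rm_state, event) -> (next_rm_state, reward).
        self.initial = initial
        self.transitions = transitions
        self.terminal = set(terminal)
        self.states = (
            {initial}
            | {u for u, _ in transitions}
            | {v for v, _ in transitions.values()}
        )

    def step(self, u, event):
        # Assumption: unlisted (state, event) pairs self-loop with zero reward.
        return self.transitions.get((u, event), (u, 0.0))


def qrm_update(q, rm, s, a, s_next, event, actions, alpha=0.1, gamma=0.9):
    """Use one experience (s, a, s_next, event) to update the Q-table
    of every non-terminal reward machine state."""
    for u in rm.states:
        if u in rm.terminal:
            continue
        u_next, r = rm.step(u, event)
        if u_next in rm.terminal:
            target = r
        else:
            target = r + gamma * max(q[u_next][(s_next, b)] for b in actions)
        q[u][(s, a)] += alpha * (target - q[u][(s, a)])


# Illustrative task: observe "coffee", then "office" (reward 1 on completion).
rm = RewardMachine(
    initial="u0",
    transitions={("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("done", 1.0)},
    terminal={"done"},
)
actions = [0, 1]
q = defaultdict(lambda: defaultdict(float))  # q[rm_state][(env_state, action)]

# A single experience labelled "office" trains the u1 subpolicy
# even if the agent is currently in u0.
qrm_update(q, rm, s=0, a=0, s_next=1, event="office", actions=actions)
qrm_update(q, rm, s=2, a=1, s_next=0, event="coffee", actions=actions)
```

Because each update is off-policy, the same experience is reused for every reward machine state, which is how this sketch learns the subpolicies simultaneously. The paper remains the authoritative description of QRM.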

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-icarte18a,
  title = {Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning},
  author = {Icarte, Rodrigo Toro and Klassen, Toryn and Valenzano, Richard and McIlraith, Sheila},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {2107--2116},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf},
  url = {https://proceedings.mlr.press/v80/icarte18a.html},
  abstract = {In this paper we propose Reward Machines --- a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.}
}
Endnote
%0 Conference Paper
%T Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
%A Rodrigo Toro Icarte
%A Toryn Klassen
%A Richard Valenzano
%A Sheila McIlraith
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-icarte18a
%I PMLR
%P 2107--2116
%U https://proceedings.mlr.press/v80/icarte18a.html
%V 80
%X In this paper we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.
APA
Icarte, R.T., Klassen, T., Valenzano, R. & McIlraith, S. (2018). Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2107-2116. Available from https://proceedings.mlr.press/v80/icarte18a.html.