Constrained Upper Confidence Reinforcement Learning

Liyuan Zheng, Lillian Ratliff
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:620-629, 2020.

Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret with respect to the reward while satisfying the constraints even while learning with high probability. An illustrative example is provided.

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-zheng20a, title = {Constrained Upper Confidence Reinforcement Learning}, author = {Zheng, Liyuan and Ratliff, Lillian}, booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control}, pages = {620--629}, year = {2020}, editor = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie}, volume = {120}, series = {Proceedings of Machine Learning Research}, month = {10--11 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v120/zheng20a/zheng20a.pdf}, url = {https://proceedings.mlr.press/v120/zheng20a.html}, abstract = {Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret with respect to the reward while satisfying the constraints even while learning with high probability. An illustrative example is provided.} }
Endnote
%0 Conference Paper %T Constrained Upper Confidence Reinforcement Learning %A Liyuan Zheng %A Lillian Ratliff %B Proceedings of the 2nd Conference on Learning for Dynamics and Control %C Proceedings of Machine Learning Research %D 2020 %E Alexandre M. Bayen %E Ali Jadbabaie %E George Pappas %E Pablo A. Parrilo %E Benjamin Recht %E Claire Tomlin %E Melanie Zeilinger %F pmlr-v120-zheng20a %I PMLR %P 620--629 %U https://proceedings.mlr.press/v120/zheng20a.html %V 120 %X Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret with respect to the reward while satisfying the constraints even while learning with high probability. An illustrative example is provided.
APA
Zheng, L. & Ratliff, L.. (2020). Constrained Upper Confidence Reinforcement Learning. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:620-629 Available from https://proceedings.mlr.press/v120/zheng20a.html.

Related Material