Efficient Reinforcement Learning in Probabilistic Reward Machines

Lin, Xiaofeng; Zhang, Xuezhou

Statistics > Machine Learning

arXiv:2408.10381 (stat)

[Submitted on 19 Aug 2024]

Title:Efficient Reinforcement Learning in Probabilistic Reward Machines

Authors:Xiaofeng Lin, Xuezhou Zhang

View PDF HTML (experimental)

Abstract:In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})$, where $H$ is the time horizon, $O$ is the number of observations, $A$ is the number of actions, and $T$ is the number of time-steps. This result improves over the best-known bound, $\widetilde{O}(H\sqrt{OAT})$ of \citet{pmlr-v206-bourel23a} for MDPs with Deterministic Reward Machines (DRMs), a special case of PRMs. When $T \geq H^3O^3A^2$ and $OA \geq H$, our regret bound leads to a regret of $\widetilde{O}(\sqrt{HOAT})$, which matches the established lower bound of $\Omega(\sqrt{HOAT})$ for MDPs with DRMs up to a logarithmic factor. To the best of our knowledge, this is the first efficient algorithm for PRMs. Additionally, we present a new simulation lemma for non-Markovian rewards, which enables reward-free exploration for any non-Markovian reward given access to an approximate planner. Complementing our theoretical findings, we show through extensive experiment evaluations that our algorithm indeed outperforms prior methods in various PRM environments.

Comments:	33 pages, 4 figures
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2408.10381 [stat.ML]
	(or arXiv:2408.10381v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2408.10381

Submission history

From: Xiaofeng Lin [view email]
[v1] Mon, 19 Aug 2024 19:51:53 UTC (3,210 KB)

Statistics > Machine Learning

Title:Efficient Reinforcement Learning in Probabilistic Reward Machines

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Statistics > Machine Learning

Title:Efficient Reinforcement Learning in Probabilistic Reward Machines

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators