Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

Hu, Xiaohan; Ma, Yi; Xiao, Chenjun; Zheng, Yan; Hao, Jianye

Computer Science > Machine Learning

arXiv:2306.05726 (cs)

[Submitted on 9 Jun 2023 (v1), last revised 17 Oct 2023 (this version, v2)]

Abstract: One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to the data distribution. Whether or not the data originates from a near-optimal policy, an algorithm should be able to learn an effective control policy that aligns with the inherent distribution of the offline data. Unfortunately, behavior regularization, a simple yet effective offline RL algorithm, tends to struggle in this regard. In this paper, we propose a new algorithm, based on conservative policy iteration, that substantially enhances behavior regularization. Our key observation is that by iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement while also implicitly avoiding queries of out-of-sample actions, which prevents catastrophic learning failures. We prove that in the tabular setting this algorithm is capable of learning the optimal policy covered by the offline dataset, commonly referred to as the in-sample optimal policy. We then explore several implementation details of the algorithm when function approximation is applied. The resulting algorithm is easy to implement, requiring only a few lines of code modification to existing methods. Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks, clearly demonstrating its superiority over behavior regularization.
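The core idea in the abstract, regularizing toward the previous iterate instead of toward the fixed behavior policy, can be illustrated with a small tabular sketch. This is not the authors' implementation. It assumes a KL-regularized update of the form pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(Q^{pi_k}(s,a) / tau), exact policy evaluation with a known model, and a hypothetical two-state, three-action MDP in which action 2 never appears in the data. The names `evaluate`, `refine`, `iteratively_refined`, and `tau` are illustrative only.

```python
import math

# Hypothetical toy MDP (not from the paper): 2 states, 3 actions, deterministic
# transitions P[s][a] -> next state and rewards R[s][a]. Action 2 never appears
# in the dataset; its reward of 10.0 stands in for an over-optimistic estimate.
P = [[0, 1, 0], [0, 1, 1]]
R = [[1.0, 0.0, 10.0], [0.0, 2.0, 10.0]]
GAMMA = 0.9

# Assumed behavior policy that generated the dataset: action 2 has zero probability.
BEHAVIOR = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]


def evaluate(policy, gamma=GAMMA, iters=500):
    """Iterative policy evaluation on the known toy model; returns Q[s][a].

    Actions with zero probability contribute nothing to the state values."""
    n_s, n_a = len(R), len(R[0])
    v = [0.0] * n_s
    for _ in range(iters):
        v = [sum(policy[s][a] * (R[s][a] + gamma * v[P[s][a]]) for a in range(n_a))
             for s in range(n_s)]
    return [[R[s][a] + gamma * v[P[s][a]] for a in range(n_a)] for s in range(n_s)]


def refine(policy, q, tau):
    """KL-regularized update toward the *current* policy:
    pi_new(a|s) proportional to pi(a|s) * exp(q[s][a] / tau).
    Actions with zero probability keep zero probability."""
    new_policy = []
    for s, row in enumerate(policy):
        # Shift by the max over supported actions only, for numerical stability.
        m = max(q[s][a] for a, p in enumerate(row) if p > 0)
        w = [p * math.exp((q[s][a] - m) / tau) if p > 0 else 0.0
             for a, p in enumerate(row)]
        z = sum(w)
        new_policy.append([x / z for x in w])
    return new_policy


def iteratively_refined(behavior, tau=1.0, n_iters=200):
    """Start from the behavior policy and use each iterate as the next reference."""
    policy = behavior
    for _ in range(n_iters):
        policy = refine(policy, evaluate(policy), tau)
    return policy


final = iteratively_refined(BEHAVIOR)
for s, row in enumerate(final):
    print(s, [round(p, 3) for p in row])
```

In this toy example, action 2 starts with zero probability, so the multiplicative update keeps it at zero and its optimistic value never enters the policy, even though an argmax over the same Q table would select it. The iterates instead concentrate on the best action within the data's support in each state (action 1 here), which mirrors the intuition behind the in-sample optimality claim.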

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.05726 [cs.LG]
	(or arXiv:2306.05726v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.05726

Submission history

From: Xiaohan Hu
[v1] Fri, 9 Jun 2023 07:46:24 UTC (1,607 KB)
[v2] Tue, 17 Oct 2023 16:25:25 UTC (1,934 KB)