User-Oriented Robust Reinforcement Learning

You, Haoyi; Yu, Beichen; Jin, Haiming; Yang, Zhaoxing; Sun, Jiahui

arXiv:2202.07301 (cs)

[Submitted on 15 Feb 2022 (v1), last revised 10 Dec 2022 (this version, v4)]

Abstract: Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. In practice, however, a user of an RL policy may have different preferences over its performance across environments, and max-min robustness is often too conservative to satisfy such preferences. Therefore, in this paper, we integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two UOR-RL training algorithms, one for the scenario in which the environment distribution is known a priori and one for the scenario in which it is not. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge of the environment distribution. Furthermore, we carry out extensive experimental evaluations on 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to state-of-the-art baselines under the average and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.
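To illustrate how a preference-weighted metric can generalize max-min robustness, here is a minimal sketch. The abstract does not give the exact UOR definition, so this assumes a hypothetical rank-based form: environments are sorted from worst to best performance, and the user supplies one non-negative weight per rank. The function name `rank_weighted_performance` and its parameters are illustrative only, not the authors' implementation.

```python
from typing import Sequence


def rank_weighted_performance(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical preference-weighted robustness score (not the paper's exact UOR metric).

    returns: per-environment performance of a policy (e.g., mean episodic return).
    weights: user-preference weights assigned by rank; weights[0] applies to the
             worst-performing environment, weights[1] to the next worst, and so on.
    """
    if len(returns) != len(weights):
        raise ValueError("need exactly one weight per environment")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    # Sort ascending so the worst environment is paired with weights[0].
    ordered = sorted(returns)
    # Normalized weighted average over the rank-ordered returns.
    return sum(w * r for w, r in zip(weights, ordered)) / total


if __name__ == "__main__":
    env_returns = [10.0, 40.0, 25.0, 5.0]
    print(rank_weighted_performance(env_returns, [1, 0, 0, 0]))  # all weight on the worst case
    print(rank_weighted_performance(env_returns, [1, 1, 1, 1]))  # uniform weights
    print(rank_weighted_performance(env_returns, [4, 3, 2, 1]))  # emphasis on poorer environments
```

Under this assumed form, putting all weight on the worst rank recovers the max-min (worst-case) score (5.0 here), uniform weights recover the average return (20.0), and decreasing weights give an intermediate, less conservative score (14.0).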

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.07301 [cs.LG]
	(or arXiv:2202.07301v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.07301

Submission history

From: Haoyi You
[v1] Tue, 15 Feb 2022 10:33:55 UTC (1,795 KB)
[v2] Thu, 17 Feb 2022 12:10:24 UTC (1,776 KB)
[v3] Fri, 18 Feb 2022 01:26:27 UTC (1,776 KB)
[v4] Sat, 10 Dec 2022 20:40:47 UTC (2,455 KB)