Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only.
Off-policy evaluation (OPE) aims to estimate the performance of reinforcement learning (RL) policies using only a fixed set of offline trajectories [61], i.e., ...
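As a minimal illustration (the notation below is my own and not taken from the snippets above), OPE targets the expected return of the evaluation policy, which must be estimated from trajectories logged under a different behavior policy:

\[
V(\pi_e) \;=\; \mathbb{E}_{\tau \sim \pi_e}\Big[\sum_{t=0}^{H-1} \gamma^t r_t\Big],
\qquad \text{estimated from } \mathcal{D} = \{\tau_1, \dots, \tau_n\},\ \ \tau_i \sim \pi_b .
\]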
Feb 5, 2024 · The paper considers the challenging case where human feedback is only available at the end of an episode without any per-step immediate human reward (IHR).
Jun 14, 2024 · We formalize the problem of off-policy evaluation from logged human feedback as offline evaluation with ranked lists [13, 31, 18].
Jun 14, 2024 · This motivates us to study off-policy evaluation from logged human feedback. We formalize the problem, propose both model-based and model-free ...
Due to the mismatch between the visitation distributions of the behavior and target policies, evaluation in the off-policy setting is entirely different from ...
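A minimal sketch of the standard per-trajectory importance-sampling correction for this distribution mismatch (function and variable names here are hypothetical, and this assumes the logged action probabilities under the behavior policy are available; it is not the specific method of any paper quoted above):

import numpy as np

def per_trajectory_is_estimate(trajectories, target_policy_prob, gamma=1.0):
    """Per-trajectory importance-sampling estimate of the target policy's value.

    trajectories: list of episodes, each a list of (state, action, reward, behavior_prob)
    target_policy_prob: function (state, action) -> action probability under the target policy
    """
    returns = []
    for episode in trajectories:
        weight = 1.0   # cumulative importance ratio along the trajectory
        ret = 0.0      # discounted return of the logged episode
        for t, (state, action, reward, behavior_prob) in enumerate(episode):
            weight *= target_policy_prob(state, action) / behavior_prob
            ret += (gamma ** t) * reward
        returns.append(weight * ret)
    # The average of importance-weighted returns estimates the target policy's value,
    # provided the behavior policy covers every action the target policy can take.
    return float(np.mean(returns))

In practice, self-normalized or doubly robust variants are often preferred to control the variance of the cumulative importance ratios.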
In this paper, we study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.