Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only.
Off-policy evaluation (OPE) aims to estimate the performance of reinforcement learning (RL) policies using only a fixed set of offline trajectories [61], i.e., ...
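As a minimal illustration (the notation below is my own and not taken from the snippets above), OPE targets the expected return of the evaluation policy, which must be estimated from trajectories logged under a different behavior policy:

\[
V(\pi_e) \;=\; \mathbb{E}_{\tau \sim \pi_e}\Big[\sum_{t=0}^{H-1} \gamma^t r_t\Big],
\qquad \text{estimated from } \mathcal{D} = \{\tau_1, \dots, \tau_n\},\ \ \tau_i \sim \pi_b .
\]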
Feb 5, 2024 · The paper considers the challenging case where human feedback is only available at the end of an episode without any per-step immediate human reward (IHR).
Jun 14, 2024 · We formalize the problem of off-policy evaluation from logged human feedback as offline evaluation with ranked lists [13, 31, 18].
Jun 14, 2024 · This motivates us to study off-policy evaluation from logged human feedback. We formalize the problem, propose both model-based and model-free ...
Due to the mismatch between the visitation distributions of the behavior and target policies, evaluation in the off-policy setting is entirely different from ...
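A minimal sketch of the standard per-trajectory importance-sampling correction for this distribution mismatch (function and variable names here are hypothetical, and this assumes the logged action probabilities under the behavior policy are available; it is not the specific method of any paper quoted above):

import numpy as np

def per_trajectory_is_estimate(trajectories, target_policy_prob, gamma=1.0):
    """Per-trajectory importance-sampling estimate of the target policy's value.

    trajectories: list of episodes, each a list of (state, action, reward, behavior_prob)
    target_policy_prob: function (state, action) -> action probability under the target policy
    """
    returns = []
    for episode in trajectories:
        weight = 1.0   # cumulative importance ratio along the trajectory
        ret = 0.0      # discounted return of the logged episode
        for t, (state, action, reward, behavior_prob) in enumerate(episode):
            weight *= target_policy_prob(state, action) / behavior_prob
            ret += (gamma ** t) * reward
        returns.append(weight * ret)
    # The average of importance-weighted returns estimates the target policy's value,
    # provided the behavior policy covers every action the target policy can take.
    return float(np.mean(returns))

In practice, self-normalized or doubly robust variants are often preferred to control the variance of the cumulative importance ratios.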
In this paper, we study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.