Actor critic with differentially private critic

J Lebensold, W Hamilton, B Balle, D Precup
arXiv preprint arXiv:1910.05876, 2019
Reinforcement learning algorithms are known to be sample-inefficient, and performance on one task can often be substantially improved by leveraging information (e.g., via pre-training) from other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show that this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.