Provable Defense against Backdoor Policies in Reinforcement Learning

Bharti, Shubham Kumar; Zhang, Xuezhou; Singla, Adish; Zhu, Xiaojin


arXiv:2211.10530 (cs)

[Submitted on 18 Nov 2022]


Abstract: We propose a provable defense mechanism against backdoor policies in reinforcement learning under a subspace trigger assumption. A backdoor policy is a security threat in which an adversary publishes a seemingly well-behaved policy that in fact contains hidden triggers. During deployment, the adversary can modify observed states in a particular way to trigger unexpected actions and harm the agent. We assume the agent does not have the resources to re-train a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states onto a 'safe subspace', estimated from a small number of interactions with a clean (non-triggered) environment. Our sanitized policy achieves $\epsilon$-approximate optimality in the presence of triggers, provided the number of clean interactions is $O\left(\frac{D}{(1-\gamma)^4 \epsilon^2}\right)$, where $\gamma$ is the discount factor and $D$ is the dimension of the state space. Empirically, we show that our sanitization defense performs well on two Atari game environments.
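
Illustrative sketch (not from the paper): the abstract does not say how the safe subspace is estimated, so the Python example below assumes one plausible estimator, namely keeping the eigenvectors of the empirical second-moment matrix of clean states whose eigenvalues exceed a small tolerance. The names `estimate_safe_subspace`, `sanitize`, `backdoor_policy`, and the threshold `tol` are hypothetical and chosen only for this toy example.

```python
import numpy as np

def estimate_safe_subspace(clean_states, tol=1e-8):
    """Return an orthonormal basis (D x k) for the subspace spanned by clean states.

    Assumption: the subspace is taken to be the span of eigenvectors of the
    empirical second-moment matrix with eigenvalue above a relative tolerance.
    """
    X = np.asarray(clean_states, dtype=float)   # shape (n, D)
    second_moment = X.T @ X / X.shape[0]        # uncentered, shape (D, D)
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    keep = eigvals > tol * max(eigvals.max(), 1.0)
    return eigvecs[:, keep]

def sanitize(policy, basis):
    """Wrap a policy so each observed state is projected onto the subspace first."""
    def sanitized_policy(state):
        s = np.asarray(state, dtype=float)
        return policy(basis @ (basis.T @ s))    # orthogonal projection, then act
    return sanitized_policy

# Toy backdoor (hypothetical): clean states never use coordinate 2, and the
# policy takes the "bad" action 1 whenever that coordinate is large.
def backdoor_policy(state):
    return 1 if abs(state[2]) > 0.5 else 0

rng = np.random.default_rng(0)
clean_states = np.hstack([rng.normal(size=(200, 2)), np.zeros((200, 1))])

basis = estimate_safe_subspace(clean_states)
safe_policy = sanitize(backdoor_policy, basis)

triggered_state = np.array([0.3, -0.2, 5.0])    # adversary perturbs coordinate 2
print("backdoor action:", backdoor_policy(triggered_state))
print("sanitized action:", safe_policy(triggered_state))
```

In this toy setting the trigger lies entirely outside the span of the clean states, so the projection removes it before the policy sees the state. With noisy clean states the tolerance would need tuning, and the guarantee quoted in the abstract applies to the authors' estimator, not to this sketch.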

Comments:	Accepted at NeurIPS 2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2211.10530 [cs.LG]
	(or arXiv:2211.10530v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.10530

Submission history

From: Shubham Kumar Bharti
[v1] Fri, 18 Nov 2022 23:12:24 UTC (279 KB)
