Model Agnostic Defence against Backdoor Attacks in Machine Learning

Udeshi, Sakshi; Peng, Shanshan; Woo, Gerald; Loh, Lionell; Rawshan, Louth; Chattopadhyay, Sudipta

arXiv:1908.02203 (cs)

[Submitted on 6 Aug 2019 (v1), last revised 31 Mar 2022 (this version, v3)]

Abstract: Machine Learning (ML) has automated a multitude of our day-to-day decision-making domains, such as education, employment and driving automation. The continued success of ML largely depends on our ability to trust the models we use. Recently, a new class of attacks called backdoor attacks has been developed. These attacks undermine the user's trust in ML models. In this work, we present NEO, a model agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines whether the model is backdoored. NEO also mitigates these attacks by recovering the correct predictions for poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. To the best of our knowledge, NEO is also the first defence methodology that is completely black-box.
We have implemented NEO and evaluated it against three state-of-the-art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. Our evaluation shows that NEO detects approximately 88% of the poisoned inputs on average and processes an input image in as little as 4.4 ms. We also reconstruct the poisoned input so that users can effectively test their systems.
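The abstract states only that NEO is completely black-box, i.e. it needs nothing beyond the model's predictions. To illustrate what that constraint means in practice, the following is a minimal sketch of a generic occlusion test, assuming a `predict` function that returns a class label. It is not NEO's published algorithm: the function name `occlusion_trigger_search`, the block size and stride, the median "blocker" colour and the toy model are all hypothetical choices made for this example.

```python
import numpy as np
from typing import Callable, Optional, Tuple


def occlusion_trigger_search(
    predict: Callable[[np.ndarray], int],
    image: np.ndarray,
    block: int = 4,
    stride: int = 2,
) -> Optional[Tuple[int, int]]:
    """Hypothetical black-box check: slide a solid blocker over `image` and
    return the (row, col) of the first position where the label changes,
    or None if no occlusion changes the prediction.

    Only `predict` is called, so the model's internals are never inspected.
    """
    h, w = image.shape[:2]
    base_label = predict(image)
    # Assumed blocker colour: the per-channel median of the image.
    if image.ndim == 3:
        fill = np.median(image.reshape(-1, image.shape[2]), axis=0)
    else:
        fill = np.median(image)
    for y in range(0, h - block + 1, stride):
        for x in range(0, w - block + 1, stride):
            occluded = image.copy()
            occluded[y:y + block, x:x + block] = fill
            if predict(occluded) != base_label:
                return (y, x)
    return None


def toy_backdoored_model(img: np.ndarray) -> int:
    # Toy stand-in for a poisoned classifier: a white 3x3 patch in the
    # top-left corner forces label 7; otherwise it predicts label 0.
    return 7 if np.all(img[:3, :3] == 255) else 0


clean = np.zeros((16, 16), dtype=np.uint8)
poisoned = clean.copy()
poisoned[:3, :3] = 255  # stamp the toy trigger

print(occlusion_trigger_search(toy_backdoored_model, poisoned))  # (0, 0)
print(occlusion_trigger_search(toy_backdoored_model, clean))     # None
```

Occluding salient, non-trigger features can also change the prediction for a clean image, so a flagged region from a check like this would still need to be confirmed as an actual trigger before an input is treated as poisoned.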

Comments:	IEEE Transactions on Reliability, 2022
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Cite as:	arXiv:1908.02203 [cs.LG]
	(or arXiv:1908.02203v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1908.02203

Submission history

From: Sakshi Udeshi
[v1] Tue, 6 Aug 2019 15:11:37 UTC (2,068 KB)
[v2] Wed, 28 Aug 2019 11:24:50 UTC (1,398 KB)
[v3] Thu, 31 Mar 2022 12:22:35 UTC (2,671 KB)