DOI: 10.1609/aaai.v38i1.27780

Inspecting prediction confidence for detecting black-box backdoor attacks

Published: 20 February 2024

Abstract

Backdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored. However, as indicated by a recent black-box attack, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. To this end, we propose a new defense, DTINSPECTOR, against black-box backdoor attacks, based on a new observation about the prediction confidence of learning models: to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually cause the model to exhibit statistically higher prediction confidence on the poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTINSPECTOR then carefully examines the prediction confidences of data samples and decides whether a backdoor exists by exploiting the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
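The abstract describes the key signal only at a high level: a poisoned target class tends to show unusually high prediction confidence. The sketch below is a rough illustration of that idea under stated assumptions, not the authors' DTINSPECTOR procedure. It assumes a trained PyTorch classifier and a labeled data loader; the per-class median confidence statistic, the MAD-based modified z-score, and the 3.5 cutoff are illustrative choices rather than details taken from the paper.

```python
# Illustrative sketch only; NOT the DTINSPECTOR algorithm from the paper.
# Assumptions: `model` is a trained PyTorch classifier returning logits,
# `loader` yields (inputs, labels), and the outlier statistic and cutoff
# below are arbitrary illustrative choices.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_class_confidence(model, loader, num_classes, device="cpu"):
    """Collect softmax confidence on the ground-truth class, grouped by label."""
    model.eval().to(device)
    confs = {c: [] for c in range(num_classes)}
    for x, y in loader:
        probs = F.softmax(model(x.to(device)), dim=1).cpu()
        for p, label in zip(probs, y):
            confs[int(label)].append(p[int(label)].item())
    return {c: torch.tensor(v) for c, v in confs.items() if v}

def flag_suspect_labels(class_confs, cutoff=3.5):
    """Flag labels whose median confidence is an unusually high outlier,
    using a one-sided MAD-based modified z-score as a rough proxy for a
    potentially poisoned target class."""
    labels = list(class_confs)
    medians = torch.stack([class_confs[c].median() for c in labels])
    center = medians.median()
    mad = (medians - center).abs().median() + 1e-12  # avoid division by zero
    z = 0.6745 * (medians - center) / mad
    return [c for c, score in zip(labels, z.tolist()) if score > cutoff]
```

In practice, any flagged label would only be a candidate for closer inspection (for example, by checking whether a small trigger-like patch transfers the high confidence to other samples), since a clean but easy class can also yield high confidence.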


Information

Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
