Backdoor Two-Stream Video Models on Federated Learning

Published: 12 September 2024

Abstract

Training video models with federated learning (FL) enables continual learning for video tasks on end-user devices while protecting the privacy of end-user data. As a result, security issues in FL, such as backdoor attacks and their defenses, have become the subject of extensive research in recent years. Backdoor attacks on FL are a class of poisoning attacks in which an attacker, acting as one of the training participants, submits poisoned parameters and thereby injects a backdoor into the global model after aggregation. Existing backdoor attacks against FL-based video models poison only the RGB frames, so they can be easily mitigated by two-stream model neutralization. It is therefore a major challenge to manipulate the most advanced two-stream video models with a high success rate while poisoning only a small proportion of the training data in the FL framework. In this paper, we propose a new backdoor attack scheme that exploits the rich spatial and temporal structures of video data, injecting backdoor triggers into both the optical flow and the RGB frames through multiple rounds of model aggregation. In addition, an adversarial attack is applied to the RGB frames to further boost the robustness of the attack. Extensive experiments on real-world datasets verify that our methods outperform state-of-the-art backdoor attacks and show better stealthiness and persistence.
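To make the aggregation step mentioned in the abstract concrete, the following is a minimal sketch of why a single participant's submitted parameters end up in the global model. It assumes plain unweighted federated averaging over toy two-element parameter vectors; the function name `fed_avg` and all numbers are hypothetical, and the abstract does not state which aggregation rule, model, or trigger the paper actually uses.

```python
from typing import List


def fed_avg(client_params: List[List[float]]) -> List[float]:
    """Unweighted federated averaging: element-wise mean of client vectors.

    Assumption: every client contributes equally. Real FL systems often
    weight clients by local dataset size; that is omitted here.
    """
    if not client_params:
        raise ValueError("need at least one client update")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all client updates must have the same length")
    n = len(client_params)
    return [sum(p[i] for p in client_params) / n for i in range(dim)]


# Three participants that agree on the parameters (toy values).
honest = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
# One participant submits parameters that deviate in the first coordinate.
deviating = [5.0, 2.0]

clean_global = fed_avg(honest)                # average of honest updates only
mixed_global = fed_avg(honest + [deviating])  # average including the deviating update

print(clean_global, mixed_global)
```

In this toy setting the first coordinate of the aggregate moves from 1.0 to 2.0 once the deviating update is included, which illustrates why the server-side aggregation rule is a natural place for defenses such as the robust aggregation schemes studied in the FL security literature.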

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 11
November 2024
702 pages
EISSN: 1551-6865
DOI: 10.1145/3613730

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 September 2024
Online AM: 07 March 2024
Accepted: 23 November 2023
Revised: 01 September 2023
Received: 29 March 2023
Published in TOMM Volume 20, Issue 11

Author Tags

  1. Federated learning
  2. backdoor attack
  3. two-stream video recognition
  4. adversarial attack

Qualifiers

  • Research-article

Funding Sources

  • Joint Funds of the National Natural Science Foundation of China
  • Shenzhen Colleges and Universities Stable Support
