DOI: 10.1145/3627106.3627178

PSP-Mal: Evading Malware Detection via Prioritized Experience-based Reinforcement Learning with Shapley Prior

Published: 04 December 2023

Abstract

With the widespread application of machine learning techniques in malware detection, researchers have proposed various adversarial attack methods that generate adversarial examples (AEs) of malware, thereby evading detection. Previous studies have shown that the reinforcement learning (RL) framework can enable black-box attacks by performing a sequence of function-preserving operations, producing functional evasive malware samples. However, it is difficult to obtain useful guidance and feedback from the environment for agent training in the black-box scenario, which prevents the RL framework from learning an effective evasion policy. In this paper, we propose the Shapley prior and establish a prior-guidance-based RL framework, named PSP-Mal, to generate AEs against Portable Executable (PE) malware detectors. Our framework improves on existing methods in three aspects: 1) we explore the feature effects of the black-box model by computing Shapley values and propose the Shapley prior to represent the expected impact of each operation; 2) we establish a novel prioritized experience utilization mechanism guided by the Shapley prior within the RL framework; 3) we expand the actions into item-content pairs and use Thompson sampling to choose effective content, which helps to reduce randomness and ensure repeatability. We compare the attack performance of our framework with that of other methods, and the experimental results demonstrate that our algorithm is more effective: the evasion rates of PSP-Mal against LightGBM models trained on EMBER and SOREL-20M reach 76.88% and 72.03%, respectively.
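
The three components named in the abstract can be made concrete with a short sketch. The Python code below is a minimal illustration of one plausible reading, not the authors' implementation: it assumes a LightGBM detector explained with the shap library's TreeExplainer, and every identifier introduced here (feature_groups, PrioritizedBuffer, ContentBandit, and so on) is a hypothetical name chosen for this sketch. It shows (1) aggregating Shapley values into a per-operation prior, (2) a prioritized replay buffer whose new transitions are seeded with priorities from that prior rather than the usual max-priority rule, and (3) Beta-Bernoulli Thompson sampling over candidate contents of an item-content action.

```python
import numpy as np
import shap  # TreeExplainer supports LightGBM models


# (1) Shapley prior: expected impact of each function-preserving operation.
# `feature_groups` maps an operation name to the indices of the features it
# can perturb (a hypothetical grouping introduced for this sketch).
def shapley_prior(model, X_malware, feature_groups):
    explainer = shap.TreeExplainer(model)
    sv = explainer.shap_values(X_malware)        # (n_samples, n_features)
    if isinstance(sv, list):                     # some binary models return a list
        sv = sv[1]
    mean_impact = np.abs(sv).mean(axis=0)        # global per-feature importance
    return {op: float(mean_impact[idx].sum())
            for op, idx in feature_groups.items()}


# (2) Prioritized experience replay seeded by the Shapley prior: a new
# transition enters the buffer with a priority derived from the prior score
# of the action it used, instead of the usual max-priority rule.
class PrioritizedBuffer:
    def __init__(self, prior, alpha=0.6, eps=1e-3):
        self.prior, self.alpha, self.eps = prior, alpha, eps
        self.data, self.prio = [], []

    def add(self, transition):
        self.data.append(transition)
        self.prio.append(self.prior.get(transition["action"], 0.0) + self.eps)

    def sample(self, batch_size, rng=np.random):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx

    def update(self, idx, td_errors):
        # After learning, priorities follow the usual |TD-error| schedule.
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(e) + self.eps


# (3) Thompson sampling over the content part of an item-content action,
# e.g. which section payload or imported name to inject: one
# Beta-Bernoulli arm per candidate content.
class ContentBandit:
    def __init__(self, n_contents):
        self.a = np.ones(n_contents)   # evasions + 1
        self.b = np.ones(n_contents)   # detections + 1

    def choose(self, rng=np.random):
        return int(np.argmax(rng.beta(self.a, self.b)))

    def update(self, k, evaded):
        self.a[k] += evaded            # evaded is 1 if the AE slipped past
        self.b[k] += 1 - evaded        # the detector, else 0
```

In this reading, the Shapley prior biases which transitions the agent replays early in training, before TD errors are informative, and the bandit concentrates content selection on empirically effective payloads, which is one way the framework could "reduce randomness and ensure repeatability" as the abstract claims.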


Published In

ACSAC '23: Proceedings of the 39th Annual Computer Security Applications Conference, December 2023, 836 pages. ISBN: 9798400708862. DOI: 10.1145/3627106.

Publisher: Association for Computing Machinery, New York, NY, United States.

Author Tags

1. Shapley value
2. adversarial example
3. evasion attack
4. malware detection
5. prioritized experience replay
6. reinforcement learning

Qualifiers

• Research-article
• Refereed limited

Conference

ACSAC '23. Overall acceptance rate: 104 of 497 submissions (21%).
