DOI: 10.1145/3564625.3567985

Transformer-Based Language Models for Software Vulnerability Detection

Published: 05 December 2022

Abstract

Large transformer-based language models demonstrate excellent performance in natural language processing. Given that the knowledge these models gain in one domain transfers to related domains, and that natural languages are close to high-level programming languages such as C/C++, this work studies how to leverage (large) transformer-based language models for detecting software vulnerabilities and how well these models perform on vulnerability detection tasks. To this end, we first present a systematic, cohesive framework that covers source code translation, model preparation, and inference. We then perform an empirical analysis on software vulnerability datasets of C/C++ source code containing multiple vulnerability types related to library function calls, pointer usage, array usage, and arithmetic expressions. Our empirical results demonstrate the strong performance of the language models in vulnerability detection; moreover, these language models achieve better performance metrics, such as F1-score, than contemporary models, namely bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) networks. Experimenting with language models is challenging because of the computing resources, platforms, libraries, and dependencies they require. This paper therefore also analyses popular platforms for efficiently fine-tuning these models and presents recommendations for choosing platforms within our framework.
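To make the described framework concrete, the sketch below fine-tunes a pretrained code-aware encoder for binary vulnerability classification with the Hugging Face Transformers library, then runs inference on an unseen snippet. It is a minimal illustration under stated assumptions, not the authors' exact pipeline: the checkpoint (microsoft/codebert-base), the hyperparameters, and the two toy C snippets are placeholders standing in for the paper's preprocessed C/C++ datasets.

```python
# Minimal sketch: fine-tune a pretrained transformer to flag C/C++ snippets
# as vulnerable (1) or not vulnerable (0). Checkpoint, hyperparameters, and
# toy data are illustrative assumptions, not the paper's exact setup.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "microsoft/codebert-base"  # any code-aware encoder could be swapped in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy snippets standing in for preprocessed code slices from a real dataset.
snippets = ["strcpy(buf, user_input);",                    # unbounded copy
            "strncpy(buf, user_input, sizeof(buf) - 1);"]  # bounded copy
labels = [1, 0]

class CodeDataset(Dataset):
    """Tokenized code snippets paired with vulnerability labels."""
    def __init__(self, codes, labels):
        self.enc = tokenizer(codes, truncation=True, padding="max_length",
                             max_length=256)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

args = TrainingArguments(output_dir="vuln-clf", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)
Trainer(model=model, args=args,
        train_dataset=CodeDataset(snippets, labels)).train()

# Inference: classify an unseen snippet (1 = predicted vulnerable).
model.eval()
enc = tokenizer("memcpy(dst, src, n);", return_tensors="pt", truncation=True)
enc = {k: v.to(model.device) for k, v in enc.items()}
with torch.no_grad():
    print(model(**enc).logits.argmax(-1).item())
```

In practice, the toy snippets would be replaced by the code slices produced in the framework's source code translation step, and the same Trainer setup extends to other checkpoints (e.g., BERT or GPT-2 variants) evaluated in the paper.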



    Published In

    ACSAC '22: Proceedings of the 38th Annual Computer Security Applications Conference
    December 2022, 1021 pages
    ISBN: 9781450397599
    DOI: 10.1145/3564625

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. BERT
    2. GPT-2
    3. Software vulnerability detection
    4. transformer-based models

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • US Army International Technology Center Indo-Pacific

    Conference

    ACSAC

    Acceptance Rates

    Overall Acceptance Rate 104 of 497 submissions, 21%
