Towards a Block-Level Conformer-Based Python Vulnerability Detection
Abstract
1. Introduction
- Research Question 1: Does increasing the amount of training data improve model performance, or have the models already reached their maximum potential?
- Research Question 2: Does the design of the model significantly influence its effectiveness?
- Research Question 3: Do large language models detect vulnerabilities better than state-of-the-art models that rely on code-structure features?
2. Related Work
2.1. Conventional Methodologies
2.2. Machine-Learning-Based Approaches
2.3. Deep-Learning-Based Approaches
2.4. Large Language Model-Based Approaches
3. Background
3.1. Control and Data Flow Graphs
3.1.1. Abstract Syntax Trees
3.1.2. Control Flow Graphs
3.1.3. Data Flow Graphs
3.1.4. Code Sequence Embedding
3.2. Transformer
3.3. Conformer
3.4. Large Language Models
4. Approach
4.1. Dataset
4.1.1. Data Source
4.1.2. Labeling
4.1.3. Transformation
4.1.4. Preprocessing the Data
4.2. Structural Information
4.3. Code Sequence Embedding (CSE)
4.4. Conformer
5. Implementation
5.1. Collecting the Dataset
5.1.1. Scraping GitHub
5.1.2. Filtering the Results
5.1.3. Processing the Data
5.2. Building Graphs
5.3. Building CSE
5.4. Network Implementation
5.4.1. Multi-Head Self-Attention
5.4.2. Sinusoidal Position Encoding
5.5. LLM
6. Evaluation
6.1. Experimental Setup
6.2. Performance Metrics
- Accuracy: The fraction of all samples that are classified correctly. Mathematically: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
- F1-Score: The harmonic mean of precision and recall, which balances the two criteria and is especially helpful when the classes are imbalanced: $F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- Precision: The fraction of positive predictions that are in fact correct; it focuses on the false positive rate: $\mathrm{Precision} = \frac{TP}{TP + FP}$
- Recall: The fraction of actual positive cases that are correctly detected; it focuses on the false negative rate: $\mathrm{Recall} = \frac{TP}{TP + FN}$

Here, $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively.
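The four metrics above follow directly from the confusion-matrix counts. As a minimal sketch (the helper `confusion_metrics` and the example counts are illustrative, not taken from the paper):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Guards against division by zero when a class is never predicted or never occurs.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical example: 90 TP, 2 FP, 1 FN, 7 TN out of 100 samples.
m = confusion_metrics(tp=90, fp=2, fn=1, tn=7)
```

In practice, a library implementation such as scikit-learn's `precision_recall_fscore_support` would typically be used instead of hand-rolled formulas; the sketch only makes the definitions concrete.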
6.3. Environment Configuration
6.4. Experimental Results
6.5. Comparison on Different Datasets
6.6. Comparison on the Same Dataset
6.7. Ablation Study
6.7.1. Structural Information
6.7.2. Conformer
6.7.3. Attention-Modified Layer
6.7.4. LLM
6.8. Research Questions
7. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- NIST National Vulnerability Database. Available online: https://nvd.nist.gov/vuln/vulnerability-detail-pages (accessed on 28 July 2024).
- Ferschke, O.; Gurevych, I.; Rittberger, M. FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia. In CLEF (Online Working Notes/Labs/Workshop); AAAI: Washington, DC, USA, 2012; pp. 1–10. [Google Scholar]
- Ayewah, N.; Pugh, W.; Hovemeyer, D.; Morgenthaler, J.D.; Penix, J. Using static analysis to find bugs. IEEE Softw. 2008, 25, 22–29. [Google Scholar] [CrossRef]
- Perl, H.; Dechand, S.; Smith, M.; Arp, D.; Yamaguchi, F.; Rieck, K.; Fahl, S.; Acar, Y. Vccfinder: Finding potential vulnerabilities in open-source projects to assist code audits. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 426–437. [Google Scholar]
- Ghaffarian, S.M.; Shahriari, H.R. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Comput. Surv. (CSUR) 2017, 50, 1–36. [Google Scholar] [CrossRef]
- Wang, H.; Ye, G.; Tang, Z.; Tan, S.H.; Huang, S.; Fang, D.; Feng, Y.; Bian, L.; Wang, Z. Combining graph-based learning with automated data collection for code vulnerability detection. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1943–1958. [Google Scholar] [CrossRef]
- Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In NIPS Proceedings—Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, Canada, 8–14 December 2019; Neural Information Processing Systems (NIPS): San Diego, CA, USA, 2019; Volume 32. [Google Scholar]
- Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D.; et al. Codebert: A pre-trained model for programming and natural languages. arXiv 2020, arXiv:2002.08155. [Google Scholar]
- Wang, Y.; Wang, W.; Joty, S.; Hoi, S.C. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv 2021, arXiv:2109.00859. [Google Scholar]
- GPT-4. Available online: https://platform.openai.com/playground/chat?mode=chat&model=gpt-4o&models=gpt-4o (accessed on 28 July 2024).
- Gulati, A.; Qin, J.; Chiu, C.C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-augmented transformer for speech recognition. arXiv 2020, arXiv:2005.08100. [Google Scholar]
- Cui, S.; Zhao, G.; Gao, Y.; Tavu, T.; Huang, J. VRust: Automated vulnerability detection for solana smart contracts. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 639–652. [Google Scholar]
- Johns, M.; Pfistner, S.; SAP SE. End-to-End Taint Tracking for Detection and Mitigation of Injection Vulnerabilities in Web Applications. U.S. Patent 10,129,285, 13 November 2018. [Google Scholar]
- Wang, D.; Jiang, B.; Chan, W.K. WANA: Symbolic execution of wasm bytecode for cross-platform smart contract vulnerability detection. arXiv 2020, arXiv:2007.15510. [Google Scholar]
- Dinh, S.T.; Cho, H.; Martin, K.; Oest, A.; Zeng, K.; Kapravelos, A.; Ahn, G.J.; Bao, T.; Wang, R.; Doupé, A.; et al. Favocado: Fuzzing the Binding Code of JavaScript Engines Using Semantically Correct Test Cases. In Proceedings of the Network and Distributed System Security Symposium, Virtual, 21–25 February 2021. [Google Scholar]
- He, J.; Balunović, M.; Ambroladze, N.; Tsankov, P.; Vechev, M. Learning to fuzz from symbolic execution with application to smart contracts. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 531–548. [Google Scholar]
- Al-Yaseen, W.L.; Othman, Z.A.; Nazri, M.Z.A. Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system. Expert Syst. Appl. 2017, 67, 296–303. [Google Scholar] [CrossRef]
- Lomio, F.; Iannone, E.; De Lucia, A.; Palomba, F.; Lenarduzzi, V. Just-in-time software vulnerability detection: Are we there yet? J. Syst. Softw. 2022, 188, 111283. [Google Scholar] [CrossRef]
- Zolanvari, M.; Teixeira, M.A.; Gupta, L.; Khan, K.M.; Jain, R. Machine learning-based network vulnerability analysis of industrial Internet of Things. IEEE Internet Things J. 2019, 6, 6822–6834. [Google Scholar] [CrossRef]
- Zou, D.; Wang, S.; Xu, S.; Li, Z.; Jin, H. μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2224–2236. [Google Scholar] [CrossRef]
- Allamanis, M.; Brockschmidt, M.; Khademi, M. Learning to represent programs with graphs. arXiv 2017, arXiv:1711.00740. [Google Scholar]
- Steenhoek, B.; Rahman, M.M.; Jiles, R.; Le, W. An empirical study of deep learning models for vulnerability detection. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2237–2248. [Google Scholar]
- Hin, D.; Kan, A.; Chen, H.; Babar, M.A. Linevd: Statement-level vulnerability detection using graph neural networks. In Proceedings of the 19th International Conference on Mining Software Repositories, Pittsburgh, PA, USA, 23–24 May 2022; pp. 596–607. [Google Scholar]
- Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. Code2Vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 2019, 3, 1–29. [Google Scholar] [CrossRef]
- Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Duan, N.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv 2020, arXiv:2009.08366. [Google Scholar]
- Kanade, A.; Maniatis, P.; Balakrishnan, G.; Shi, K. Learning and Evaluating Contextual Embedding of Source Code. arXiv 2020, arXiv:2001.00059. [Google Scholar]
- Raychev, V.; Vechev, M.; Yahav, E. Probabilistic Model for Code with Decision Trees. In ACM SIGPLAN Notices; ACM: New York, NY, USA, 2016; Volume 51, pp. 731–747. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Finnie-Ansley, J.; Denny, P.; Becker, B.A.; Luxton-Reilly, A.; Prather, J. The robots are coming: Exploring the implications of openai codex on introductory programming. In Proceedings of the 24th Australasian Computing Education Conference, Melbourne, VIC, Australia, 14–18 February 2022; pp. 10–19. [Google Scholar]
- Pearce, H.; Tan, B.; Ahmad, B.; Karri, R.; Dolan-Gavitt, B. Examining zero-shot vulnerability repair with large language models. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 23–24 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2339–2356. [Google Scholar]
- Cheshkov, A.; Zadorozhny, P.; Levichev, R. Evaluation of chatgpt model for vulnerability detection. arXiv 2023, arXiv:2304.07232. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, 30, NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
- Zhou, Y.; Sharma, A. Automated identification of security issues from commit messages and bug reports. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Germany, 4–8 September 2017; pp. 914–919. [Google Scholar]
- Liu, K.; Kim, D.; Bissyandé, T.F.; Yoo, S.; Traon, Y.L. Mining fix patterns for findbugs violations. IEEE Trans. Softw. Eng. 2018, 47, 165–188. [Google Scholar] [CrossRef]
- Bagheri, A.; Hegedűs, P. A comparison of different source code representation methods for vulnerability prediction in python. In Quality of Information and Communications Technology: 14th International Conference, QUATIC 2021, Algarve, Portugal, 8–11 September 2021, Proceedings 14; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 267–281. [Google Scholar]
- Morrison, P.; Herzig, K.; Murphy, B.; Williams, L. Challenges with applying vulnerability prediction models. In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, Urbana, IL, USA, 21–22 April 2015; p. 4. [Google Scholar]
- Dam, H.K.; Tran, T.; Pham, T. A deep language model for software code. arXiv 2016, arXiv:1608.02715. [Google Scholar]
- Hovsepyan, A.; Scandariato, R.; Joosen, W.; Walden, J. Software vulnerability prediction using text analysis techniques. In Proceedings of the 4th International Workshop on Security Measurements and Metrics, Lund, Sweden, 21 September 2012; pp. 7–10. [Google Scholar]
- OWASP. Available online: https://owasp.org/www-community/attacks (accessed on 28 July 2024).
- HPC. Available online: https://docs.hpc.kifu.hu/tasks/overview.html#compute-nodes (accessed on 28 July 2024).
Paper | Graph-Based | Deep Learning | Large LMs | Manual Meth. | IoT-Specific |
---|---|---|---|---|---|
VulDeePecker | ✓ | - | - | - | - |
FUNDED | - | ✓ | - | - | - |
LineVD | - | ✓ | - | - | - |
Code2Vec | - | ✓ | - | - | - |
CodeBERT | - | ✓ | - | - | - |
GraphCodeBERT | - | ✓ | - | - | - |
CuBERT | - | ✓ | - | - | - |
Py150 | - | ✓ | - | - | - |
Llama, CodeX | - | - | ✓ | - | - |
GPT-4 | - | - | ✓ | - | - |
Cheshkov et al. | - | - | ✓ | - | - |
ChatGPT | - | - | ✓ | - | - |
Flawfinder, Vrust | - | - | - | ✓ | - |
Zolanvari et al. | - | - | - | - | ✓ |
Vulnerability | Repositories | Commits | Files | Functions | LOC |
---|---|---|---|---|---|
SQL Injection | 632 | 871 | 1225 | 9822 | 203,527 |
XSS | 122 | 159 | 157 | 1142 | 68,916 |
Command injection | 428 | 824 | 952 | 6762 | 124,032 |
XSRF | 211 | 219 | 584 | 8413 | 102,198 |
Remote code execution | 272 | 158 | 686 | 5198 | 60,591 |
Path disclosure | 574 | 413 | 732 | 8596 | 92,324 |
Vulnerability | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
SQL Injection | 99.33% | 97.82% | 99.62% | 98.73% |
XSS | 99.14% | 97.51% | 99.48% | 98.48% |
Command injection | 99.21% | 97.57% | 99.52% | 98.55% |
XSRF | 99.53% | 97.69% | 99.51% | 98.72% |
Remote code execution | 99.20% | 97.72% | 99.52% | 98.61% |
Path disclosure | 99.34% | 97.89% | 99.57% | 98.72% |
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
CNN | 58.19% | 38.2% | 38.0% | 38.12% |
CodeBERT | 90.83% | 83.91% | 83.83% | 83.87% |
SELFATT | 84.01% | 62.32% | 62.03% | 62.13% |
Devign | 82.50% | 53.53% | 53.12% | 53.33% |
VulDeePecker | 80.70% | 89.44% | 89.24% | 89.32% |
FUNDED | 88.89% | 91.04% | 90.74% | 90.87% |
DeepVulSeeker | 90.80% | 80.75% | 80.42% | 80.52% |
Vuldetective | 99.80% | 98.29% | 99.64% | 98.12% |
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
Code2Vec | 43.12% | 45.33% | 43.21% | 44.04% |
CodeBERT | 56.78% | 58.65% | 59.66% | 57.32% |
GraphCodeBERT | 51.87% | 48.56% | 50.65% | 49.54% |
CuBERT | 67.23% | 65.98% | 66.65% | 64.99% |
Vuldetective | 99.80% | 98.29% | 99.64% | 98.12% |
Variant | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
Vuldetective | 99.80% | 98.29% | 99.64% | 98.12% |
without AST | 62.32% | 70.53% | 70.02% | 70.23% |
without DFG | 62.52% | 67.31% | 67.02% | 67.13% |
without CFG | 63.33% | 65.12% | 64.81% | 64.92% |
without Conformer | 60.52% | 64.07% | 63.59% | 63.76% |
without Attention Layer | 61.12% | 67.23% | 66.81% | 66.99% |
without LLM | 52.30% | 64.75% | 63.42% | 63.81% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bagheri, A.; Hegedűs, P. Towards a Block-Level Conformer-Based Python Vulnerability Detection. Software 2024, 3, 310-327. https://doi.org/10.3390/software3030016