research-article

VulCurator: a vulnerability-fixing commit detector

Authors:

Truong Giang Nguyen,

Xuan-Bach D. Le,

David Lo

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 1726–1730

https://doi.org/10.1145/3540250.3558936

Published: 09 November 2022

Abstract

The open-source software (OSS) vulnerability management process is increasingly important, as the number of discovered OSS vulnerabilities grows over time. Monitoring vulnerability-fixing commits is part of the standard process for preventing vulnerability exploitation. However, detecting vulnerability-fixing commits manually is time-consuming because of the potentially large number of commits to review. Many techniques have recently been proposed to detect vulnerability-fixing commits automatically using machine learning. These solutions either (1) do not use deep learning, or (2) apply deep learning to only limited sources of information. This paper proposes VulCurator, a tool that applies deep learning to richer sources of information, including commit messages, code changes, and issue reports, to classify vulnerability-fixing commits. Our experimental results show that VulCurator outperforms state-of-the-art baselines by up to 16.1% in terms of F1-score.

VulCurator is publicly available at https://github.com/ntgiang71096/VFDetector and https://zenodo.org/record/7034132#.Yw3MN-xBzDI, with a demo video at https://youtu.be/uMlFmWSJYOE.
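The abstract says VulCurator classifies commits using several sources of information (commit message, code change, issue report) and reports its gains in F1-score. As an illustration only, the Python sketch below shows a generic late-fusion step (a weighted average of per-source probabilities) followed by the standard F1 computation. It is not VulCurator's implementation: `Commit`, `keyword_scorer`, `fuse`, the keyword lists, the equal weights, and the 0.5 threshold are all hypothetical, and the keyword scorers merely stand in for the deep-learning models the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A scorer maps one text source to an estimated probability that the commit
# fixes a vulnerability. VulCurator uses deep-learning models for this; the
# keyword scorers below are hypothetical stand-ins so the sketch runs offline.
Scorer = Callable[[str], float]


@dataclass
class Commit:
    message: str
    diff: str
    issue_report: str = ""  # empty when no linked issue report is available


def keyword_scorer(keywords: List[str]) -> Scorer:
    """Hypothetical stand-in model: 0.5 per matched keyword, capped at 1.0."""
    def score(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for k in keywords if k in lowered)
        return min(1.0, hits / 2)
    return score


def fuse(commit: Commit, scorers: Dict[str, Scorer], weights: Dict[str, float]) -> float:
    """Late fusion: weighted mean of per-source probabilities, skipping empty sources."""
    sources = {"message": commit.message, "diff": commit.diff, "issue": commit.issue_report}
    total, norm = 0.0, 0.0
    for name, text in sources.items():
        if not text or name not in scorers:
            continue
        weight = weights.get(name, 1.0)
        total += weight * scorers[name](text)
        norm += weight
    return total / norm if norm > 0 else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: three hand-labelled commits (1 = vulnerability-fixing).
scorers = {
    "message": keyword_scorer(["cve", "vulnerab", "overflow", "xss"]),
    "diff": keyword_scorer(["sanitize", "bounds", "escape"]),
    "issue": keyword_scorer(["security", "exploit"]),
}
weights = {"message": 1.0, "diff": 1.0, "issue": 1.0}
commits = [
    Commit("Fix XSS in template rendering", "+ html = escape(html)", "Security: stored XSS exploit"),
    Commit("Update README", "+ docs", ""),
    Commit("Refactor buffer handling", "+ check bounds before copy", ""),
]
y_true = [1, 0, 1]
THRESHOLD = 0.5  # assumed decision threshold
y_pred = [int(fuse(c, scorers, weights) >= THRESHOLD) for c in commits]
print(y_pred, round(f1_score(y_true, y_pred), 3))  # prints [1, 0, 0] 0.667
```

With this toy data the third commit (fused score 0.25, no linked issue report) is a false negative, so precision is 1.0, recall is 0.5, and F1 is 2/3. A plain weighted average is only one possible fusion choice; a learned combiner trained on the per-source scores is a common alternative.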

Information

Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2022

1822 pages

ISBN: 9781450394130

DOI: 10.1145/3540250

General Chair:
Abhik Roychoudhury, National University of Singapore, Singapore

Program Chairs:
Cristian Cadar, Imperial College London, UK
Miryung Kim, University of California at Los Angeles, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESEC/FSE '22

ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 14–18, 2022

Singapore, Singapore

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)

Bibliometrics

Total Citations: 12
Total Downloads: 369
Downloads (last 12 months): 132
Downloads (last 6 weeks): 19

Reflects downloads up to 31 Jan 2025.

Cited By

Farhi N, Koenigstein N, Shavitt Y (2025). PatchView: Multi-modality detection of security patches. Computers & Security, 104356. Online publication date: Jan-2025. https://doi.org/10.1016/j.cose.2025.104356

Heričko T, Šumak B, Karakatič S (2024). Commit-Level Software Change Intent Classification Using a Pre-Trained Transformer-Based Code Model. Mathematics, 12:7 (1012). Online publication date: 28-Mar-2024. https://doi.org/10.3390/math12071012

Akhoundali J, Nouri S, Rietveld K, Gadyatskaya O, Shang W, Lamothe M, Wan Z (2024). MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery. Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering, 42–51. Online publication date: 10-Jul-2024. https://dl.acm.org/doi/10.1145/3663533.3664036

Li K, Zhang J, Chen S, Liu H, Liu Y, Chen Y, Christakis M, Pradel M (2024). PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 590–602. Online publication date: 11-Sep-2024. https://dl.acm.org/doi/10.1145/3650212.3680305

Sun J, Chen J, Xing Z, Lu Q, Xu X, Zhu L, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Where is it? Tracing the Vulnerability-relevant Files from Vulnerability Reports. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13. Online publication date: 20-May-2024. https://dl.acm.org/doi/10.1145/3597503.3639202

Liu D, Feng Y (2024). Mining Fine-Grained Code Change Patterns Using Multiple Feature Analysis. International Journal of Software Engineering and Knowledge Engineering, 35:01 (111–138). Online publication date: 27-Nov-2024. https://doi.org/10.1142/S0218194024500505

Zhang J, Hu X, Bao L, Xia X, Li S (2024). Dual Prompt-Based Few-Shot Learning for Automated Vulnerability Patch Localization. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 940–951. Online publication date: 12-Mar-2024. https://doi.org/10.1109/SANER60148.2024.00102

Mukhtar A, Jannach D, Wotawa F (2024). Investigating Reproducibility in Deep Learning-Based Software Fault Prediction. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), 306–317. Online publication date: 1-Jul-2024. https://doi.org/10.1109/QRS62785.2024.00038

Liu C, Chen X, Li X, Xue Y (2024). Making vulnerability prediction more practical: Prediction, categorization, and localization. Information and Software Technology, 171 (107458). Online publication date: Jul-2024. https://doi.org/10.1016/j.infsof.2024.107458

Nguyen T, Le-Cong T, Kang H, Widyasari R, Yang C, Zhao Z, Xu B, Zhou J, Xia X, Hassan A, Le X, Lo D (2023). Multi-Granularity Detector for Vulnerability Fixes. IEEE Transactions on Software Engineering, 49:8 (4035–4057). Online publication date: 1-Aug-2023. https://dl.acm.org/doi/10.1109/TSE.2023.3281275
