DOI: 10.1145/3540250.3558936
research-article

VulCurator: a vulnerability-fixing commit detector

Published: 09 November 2022

Abstract

The open-source software (OSS) vulnerability management process is increasingly important, as the number of discovered OSS vulnerabilities keeps growing. Monitoring vulnerability-fixing commits is part of the standard process for preventing vulnerability exploitation. Manually detecting vulnerability-fixing commits is, however, time-consuming because of the potentially large number of commits to review. Recently, many techniques have been proposed to detect vulnerability-fixing commits automatically using machine learning. These solutions either (1) do not use deep learning, or (2) use deep learning on only limited sources of information. This paper proposes VulCurator, a tool that applies deep learning to richer sources of information, including commit messages, code changes, and issue reports, for vulnerability-fixing commit classification. Our experimental results show that VulCurator outperforms the state-of-the-art baselines by up to 16.1% in terms of F1-score.
The VulCurator tool is publicly available at https://github.com/ntgiang71096/VFDetector and https://zenodo.org/record/7034132#.Yw3MN-xBzDI, with a demo video at https://youtu.be/uMlFmWSJYOE
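
As a rough illustration of classifying a commit from several information sources and scoring the result with F1, the following Python sketch averages per-source probabilities and then evaluates the decisions. It is not VulCurator's implementation: the `SourceScores` type, the averaging rule in `fuse`, and the 0.5 threshold are assumptions made for this example, and the abstract does not say how the tool combines its inputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceScores:
    """Hypothetical per-source probabilities that a commit fixes a vulnerability."""
    message: float          # from a commit-message classifier (assumed)
    code_change: float      # from a code-change classifier (assumed)
    issue: Optional[float]  # from an issue-report classifier; None if no issue is linked


def fuse(scores: SourceScores) -> float:
    """Average the available per-source probabilities (a simple late-fusion rule)."""
    available = [scores.message, scores.code_change]
    if scores.issue is not None:
        available.append(scores.issue)
    return sum(available) / len(available)


def is_vulnerability_fix(scores: SourceScores, threshold: float = 0.5) -> bool:
    """Flag the commit when the fused probability reaches the (assumed) threshold."""
    return fuse(scores) >= threshold


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """F1 is the harmonic mean of precision and recall on the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Averaging is only one possible fusion rule; a learned combiner over the per-source outputs would fit the same interface.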


Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. BERT
  2. Deep Learning
  3. Vulnerability-Fixing Commits

Qualifiers

  • Research-article

Conference

ESEC/FSE '22

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

