DOI: 10.1145/3643991.3644928
Research Article

An Empirical Study on Just-in-time Conformal Defect Prediction

Published: 02 July 2024

Abstract

Code changes can introduce defects that affect software quality and reliability. Just-in-time (JIT) defect prediction techniques provide feedback at check-in time on whether a code change is likely to contain defects. This immediate feedback allows practitioners to make timely decisions regarding potential defects. However, a prediction model may deliver false predictions, which can negatively affect practitioners' decisions. False positive predictions lead to unnecessarily spending resources on investigating clean code changes, while false negative predictions may result in overlooking defective changes. Knowing how uncertain a defect prediction is would help practitioners avoid wrong decisions. Previous research in defect prediction has explored different approaches to quantifying prediction uncertainty in support of decision-making activities. However, these approaches offer only a heuristic quantification of uncertainty and provide no guarantees.
In this study, we use conformal prediction (CP) as a rigorous uncertainty quantification approach on top of JIT defect predictors. We assess how often CP can provide guarantees for JIT defect predictions. We also assess how many false JIT defect predictions CP can filter out. We experiment with two state-of-the-art JIT defect prediction techniques (DeepJIT and CC2Vec) and two widely used datasets (Qt and OpenStack).
Our experiments show that CP can ensure correctness with 95% probability for only 27% (DeepJIT) and 9% (CC2Vec) of the JIT defect predictions. They also indicate that CP can be a valuable technique for filtering out the false predictions of JIT defect predictors: CP filters out up to 100% of false negative and 90% of false positive predictions generated by CC2Vec, and up to 86% of false negative and 83% of false positive predictions generated by DeepJIT.
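The coverage guarantee the abstract refers to is the property of standard split (inductive) conformal prediction: calibrate a nonconformity threshold on held-out labeled examples, then emit for each new input the set of labels that conform to that threshold; singleton sets are the predictions that carry the guarantee. The sketch below is a minimal illustration of that recipe for a binary clean/defective classifier; the calibration probabilities and labels are invented for the example and are not taken from DeepJIT, CC2Vec, or the paper's datasets.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Split CP calibration: threshold on nonconformity scores of held-out data."""
    n = len(cal_labels)
    # Nonconformity score: 1 - model probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level; capped at 1.0 (it saturates when
    # the calibration set is small relative to alpha, as in this toy example).
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_set(probs, qhat):
    """Return every label whose nonconformity score is within the threshold."""
    return [y for y in range(len(probs)) if 1.0 - probs[y] <= qhat]

# Hypothetical calibration data: model probabilities for (clean, defective)
# on held-out changes, with true labels (0 = clean, 1 = defective).
cal_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7],
                      [0.6, 0.4], [0.2, 0.8]])
cal_labels = np.array([0, 0, 1, 0, 1])

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.05)
print(prediction_set(np.array([0.7, 0.3]), qhat))  # [0]: singleton, guaranteed
print(prediction_set(np.array([0.5, 0.5]), qhat))  # []: no label conforms
```

In this framing, a JIT prediction is "certain" when the conformal set contains exactly one label; ambiguous ({0, 1}) or empty sets mark the predictions a practitioner may want to filter out or inspect manually.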


Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. defect prediction
  2. quality assurance
  3. conformal prediction
  4. machine learning
  5. deep learning
  6. correctness guarantees
  7. uncertainty
