DOI: 10.1145/3643991.3644928
Research Article

An Empirical Study on Just-in-time Conformal Defect Prediction

Published: 02 July 2024

Abstract

Code changes can introduce defects that affect software quality and reliability. Just-in-time (JIT) defect prediction techniques provide feedback at check-in time on whether a code change is likely to contain defects. This immediate feedback allows practitioners to make timely decisions regarding potential defects. However, a prediction model may deliver false predictions, which can negatively affect practitioners' decisions. False positive predictions lead to unnecessarily spending resources on investigating clean code changes, while false negative predictions may result in overlooking defective changes. Knowing how uncertain a defect prediction is would help practitioners avoid wrong decisions. Previous research in defect prediction has explored different approaches to quantifying prediction uncertainty in support of decision-making activities. However, these approaches offer only a heuristic quantification of uncertainty and provide no guarantees.
In this study, we use conformal prediction (CP) as a rigorous uncertainty quantification approach on top of JIT defect predictors. We assess how often CP can provide guarantees for JIT defect predictions. We also assess how many false JIT defect predictions CP can filter out. We experiment with two state-of-the-art JIT defect prediction techniques (DeepJIT and CC2Vec) and two widely used datasets (Qt and OpenStack).
Our experiments show that CP can ensure correctness with 95% probability for only 27% (DeepJIT) and 9% (CC2Vec) of the JIT defect predictions. They also indicate that CP can be a valuable technique for filtering out the false predictions of JIT defect predictors: CP filters out up to 100% of false negative and 90% of false positive predictions generated by CC2Vec, and up to 86% of false negative and 83% of false positive predictions generated by DeepJIT.
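The coverage guarantee the abstract refers to is the property of standard split (inductive) conformal prediction: calibrate a nonconformity threshold on held-out labeled examples, then emit for each new input the set of labels that conform to that threshold; singleton sets are the predictions that carry the guarantee. The sketch below is a minimal illustration of that recipe for a binary clean/defective classifier; the calibration probabilities and labels are invented for the example and are not taken from DeepJIT, CC2Vec, or the paper's datasets.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Split CP calibration: threshold on nonconformity scores of held-out data."""
    n = len(cal_labels)
    # Nonconformity score: 1 - model probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level; capped at 1.0 (it saturates when
    # the calibration set is small relative to alpha, as in this toy example).
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_set(probs, qhat):
    """Return every label whose nonconformity score is within the threshold."""
    return [y for y in range(len(probs)) if 1.0 - probs[y] <= qhat]

# Hypothetical calibration data: model probabilities for (clean, defective)
# on held-out changes, with true labels (0 = clean, 1 = defective).
cal_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7],
                      [0.6, 0.4], [0.2, 0.8]])
cal_labels = np.array([0, 0, 1, 0, 1])

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.05)
print(prediction_set(np.array([0.7, 0.3]), qhat))  # [0]: singleton, guaranteed
print(prediction_set(np.array([0.5, 0.5]), qhat))  # []: no label conforms
```

In this framing, a JIT prediction is "certain" when the conformal set contains exactly one label; ambiguous ({0, 1}) or empty sets mark the predictions a practitioner may want to filter out or inspect manually.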


Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. defect prediction
  2. quality assurance
  3. conformal prediction
  4. machine learning
  5. deep learning
  6. correctness guarantees
  7. uncertainty
