DOI: 10.1145/2901739.2903496
Short paper

The relationship between commit message detail and defect proneness in Java projects on GitHub

Published: 14 May 2016

Abstract

Just-In-Time (JIT) defect prediction models aim to predict the commits that will introduce defects in the future. Traditionally, JIT defect prediction models are trained using metrics that are primarily derived from aspects of the code change itself (e.g., the size of the change, the author's prior experience). In addition to the code that is submitted during a commit, authors write commit messages, which describe the commit for archival purposes. It is our position that the level of detail in these commit messages can provide additional explanatory power to JIT defect prediction models. Hence, in this paper, we analyze the relationship between the defect proneness of commits and both commit message volume (i.e., the length of the commit message) and commit message content (approximated using spam filtering technology). Through analysis of JIT models that were trained using 342 GitHub repositories, we find that our JIT models outperform random guessing models, achieving AUC and Brier scores that range between 0.63-0.96 and 0.01-0.21, respectively. Furthermore, our metrics that are derived from commit message detail provide a statistically significant boost to the explanatory power of the JIT models in 43%-80% of the studied systems, accounting for up to 72% of the explanatory power. Future JIT studies should consider adding commit message detail metrics.
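To make the setup concrete, the sketch below illustrates how commit-message-detail metrics (message volume and a spam-filter-style content score) could sit alongside classic change metrics in a JIT model that is scored with AUC and the Brier score. It is a minimal illustration, not the authors' implementation: the feature names, the synthetic data, and the use of scikit-learn's logistic regression are assumptions made purely for demonstration.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-commit feature matrix (synthetic data for illustration):
#   column 0: lines added                 -- classic change-size metric
#   column 1: author's prior commit count -- classic experience metric
#   column 2: words in the commit message -- "message volume" metric
#   column 3: spam-filter content score   -- "message content" metric
X = rng.random((500, 4))
y = (rng.random(500) < 0.2).astype(int)  # 1 = commit later linked to a defect fix

model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]

# In-sample scores only; a real study would validate out-of-sample.
print("AUC:  ", roc_auc_score(y, probs))     # discrimination; 0.5 is random guessing
print("Brier:", brier_score_loss(y, probs))  # calibration; lower is better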

Published In

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016
544 pages
ISBN: 9781450341868
DOI: 10.1145/2901739

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '16