skip to main content
10.1145/3340482.3342742acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article

SZZ unleashed: an open implementation of the SZZ algorithm - featuring example usage in a study of just-in-time bug prediction for the Jenkins project

Published: 27 August 2019 Publication History

Abstract

Machine learning applications in software engineering often rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and conclude with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.

References

[1]
K. Berg and O. Svensson. 2018. SZZ Unleashed: Bug Prediction on the Jenkins Core Repository (Open Source Implementations of Bug Prediction Tools on Commit Level). Msc Thesis, Lund University, Sweden.
[2]
G. Canfora, L. Cerulo, and M. Di Penta. 2007. Identifying Changed Source Code Lines from Version Repositories. In Proc. of the 4th International Workshop on Mining Software Repositories.
[3]
Y. Cavalcanti, P. Silveira Neto, I. Machado, T. Vale, E. Almeida, and S. Meira. 2014. Challenges and Opportunities for Software Change Request Repositories: A Systematic Mapping Study. Journal of Software: Evolution and Process 26, 7 (2014), 620–653.
[4]
J. Correia. 2017. old-szz. https://github.com/intelligentagents/old-szz
[5]
J. Czerwonka, R. Das, N. Nagappan, A. Tarvo, and A. Teterev. 2011. CRANE: Failure Prediction, Change Analysis and Test Prioritization in Practice âĂŞ Experiences from Windows. In Proc. of the 4th Conference on Software Testing, Verification and Validation. 357–366.
[6]
M. D’Ambros, M. Lanza, and R. Robbes. 2009. On the Relationship Between Change Coupling and Software Defects. In 2009 16th Working Conference on Reverse Engineering. 135–144.
[7]
M. de Freitas Farias, R. Novais, M. Junior, L. da Silva Carvalho, M. Mendonca, and R. Spinola. 2016. A Systematic Mapping Study on Mining Software Repositories. In Proc. of the 31st Annual ACM Symposium on Applied Computing. 1472–1479.
[8]
E. Engström, P. Runeson, and M. Skoglund. 2010. A Systematic Review on Regression Test Selection Techniques. Information and Software Technology 52, 1 (2010), 14–30.
[9]
N. Fenton and M. Neil. 1999. A Critique of Software Defect Prediction Models. Transactions on Software Engineering 25, 5 (1999), 675–689. 1109/32.815326
[10]
M. Godfrey and L. Zou. 2005. Using Origin Analysis to Detect Merging and Splitting of Source Code Entities. Transactions on Software Engineering 31, 2 (2005), 166–181.
[11]
T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell. 2012. A Systematic Literature Review on Fault Prediction Performance in Software Engineering. Transactions on Software Engineering 38, 6 (2012), 1276–1304.
[12]
1109/TSE.2011.103
[13]
L. Jonsson, M. Borg, D. Broman, K. Sandahl, S. Eldh, and P. Runeson. 2016. Automated Bug Assignment: Ensemble-based Machine Learning in Large Scale Industrial Contexts. Empirical Software Engineering 21, 4 (2016), 1533–1578.
[14]
Y. Kamei, E. Shihab, B. Adams, A. Hassan, A. Mockus, A. Sinha, and N. Ubayashi. 2013. A Large-scale Empirical Study of Just-in-Time Quality Assurance. In Transactions on Software Engineering, Vol. 39. 757–773. TSE.2012.70
[15]
Sunghun Kim, Thomas Zimmermann, Kai Pan, and Whitehead James. 2006. Automatic identification of bug-introducing changes. In Proc. of the 21st International Conference on Automated Software Engineering. 81–90.
[16]
G. Lemaitre, F. Nogueira, and C. Aridas. 2017. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research 18, 17 (2017), 1–5.
[17]
R. Moser, W. Pedrycz, and G. Succi. 2008. A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction. In Proc.of the 30th International Conference on Software Engineering. 181–190. org/10.1145/1368088.1368114
[18]
N. Nagappan and T. Ball. 2005. Use of Relative Code Churn Measures to Predict System Defect Density. In Proc. of the 27th International Conference on Software Engineering. 284–292.
[19]
F. Rahman, S. Khatri, E. Barr, and P. Devanbu. 2014. Comparing Static Bug Finders and Statistical Prediction. In Proc. of the 36th International Conference on Software Engineering. 424–434.
[20]
G. Rodriguez-Perez, G. Robles, and J. Gonzalez-Barahona. 2018. Reproducibility and Credibility in Empirical Software Engineering: A Case Study based on a Systematic Literature Review of the use of the SZZ algorithm. Information and Software Technology (2018).
[21]
C. Rosen, B. Grawi, and E. Shihab. 2015. Commit Guru: Analytics and Risk Prediction of Software Commits. In Proc. of the 2015 10th Joint Meeting on Foundations of Software Engineering. 966–969.
[22]
J. Sliwerski, T. Zimmermann, and A. Zeller. 2005. When Do Changes Induce Fixes?. In Proc. of the 2005 International Workshop on Mining Software Repositories, Vol. 30. 1–5.
[23]
M. Sohn, S. Pearce, A. Loskutov, C. Aniszczyk, C. Halstrick, C. Ranger, D. Borowitz, D. Pursehouse, G. Wagenknecht, J. Nieder, K. Sawicki, M. Kinzler, R. Rosenberg, R. Stocker, S. Zivkov, S. Lay, T. Parker, and T. Wolf. 2017. Eclipse JGit. https: //github.com/eclipse/jgit
[24]
O. Svensson and K. Berg. 2018. SZZ Unleashed. https://github.com/wogscpar/ SZZUnleashed
[25]
M. Tan, L. Tan, S. Dara, and C. Mayeux. 2015. Online Defect Prediction for Imbalanced Data. In Proc. of the 37th International Conference on Software Engineering. 99–108.
[26]
A. Tornhill. 2017. code-maat. https://github.com/adamtornhill/code-maat
[27]
C. Williams and J. Spacco. 2008. SZZ Revisited: Verifying When Changes Induce Fixes. In Proc. of the 2008 Workshop on Defects in Large Software Systems. 32–36.

Cited By

View all

Index Terms

  1. SZZ unleashed: an open implementation of the SZZ algorithm - featuring example usage in a study of just-in-time bug prediction for the Jenkins project

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        MaLTeSQuE 2019: Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation
        August 2019
        42 pages
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 27 August 2019

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. SZZ
        2. defect prediction
        3. issue tracking
        4. mining software repositories

        Qualifiers

        • Research-article

        Funding Sources

        Conference

        ESEC/FSE '19
        Sponsor:

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)69
        • Downloads (Last 6 weeks)15
        Reflects downloads up to 25 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all

        View Options

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media