DOI: 10.1145/1368088.1368114

A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction

Published: 10 May 2008

Abstract

In this paper we present a comparative analysis of the predictive power of two different sets of metrics for defect prediction. We choose one set of product-related and one set of process-related software metrics and use them to classify Java files of the Eclipse project as defective or defect-free. Classification models are built using three common machine learners: logistic regression, Naïve Bayes, and decision trees. To allow different costs for prediction errors we perform cost-sensitive classification, which proves to be very successful: more than 75% of files are classified correctly, with a recall above 80% and a false positive rate below 30%. Results indicate that for the Eclipse data, process metrics are more efficient defect predictors than code metrics.
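To illustrate the kind of cost-sensitive setup the abstract describes, the sketch below trains one of the named learners (logistic regression) with unequal class weights and reports the same three measures: accuracy, recall, and false positive rate. This is a minimal sketch only, not the authors' actual pipeline; their tooling and feature set are not given on this page, and the input file "eclipse_metrics.csv" and column names used here are hypothetical placeholders.

    # Minimal sketch of cost-sensitive defect classification with scikit-learn.
    # File name and column names below are hypothetical placeholders.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import recall_score, confusion_matrix

    data = pd.read_csv("eclipse_metrics.csv")      # one row per Java file
    X = data[["revisions", "bugfixes", "loc"]]     # example change/code metrics
    y = data["defective"]                          # 1 = defective, 0 = defect-free

    # Approximate unequal misclassification costs via class weights:
    # missing a defective file is penalized more than raising a false alarm.
    clf = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 5})
    pred = cross_val_predict(clf, X, y, cv=10)     # 10-fold cross-validated predictions

    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
    print("recall:", recall_score(y, pred))
    print("false positive rate:", fp / (fp + tn))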


Published In

ICSE '08: Proceedings of the 30th international conference on Software engineering
May 2008
558 pages
ISBN: 9781605580791
DOI: 10.1145/1368088

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cost-sensitive classification
  2. defect prediction
  3. software metrics

Qualifiers

  • Research-article

Conference

ICSE '08

Acceptance Rates

ICSE '08 Paper Acceptance Rate: 56 of 370 submissions, 15%
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
