DOI: 10.1145/1368088.1368114

A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction

Published: 10 May 2008

Abstract

In this paper we present a comparative analysis of the predictive power of two different sets of metrics for defect prediction. We choose one set of product-related and one set of process-related software metrics and use them to classify Java files of the Eclipse project as defective or defect-free. Classification models are built using three common machine learners: logistic regression, Naïve Bayes, and decision trees. To allow different costs for prediction errors we perform cost-sensitive classification, which proves to be very successful: more than 75% of files are classified correctly, with a recall above 80% and a false positive rate below 30%. Results indicate that for the Eclipse data, process metrics are more efficient defect predictors than code metrics.
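To illustrate the kind of cost-sensitive setup the abstract describes, the sketch below trains one of the named learners (logistic regression) with unequal class weights and reports the same three measures: accuracy, recall, and false positive rate. This is a minimal sketch only, not the authors' actual pipeline; their tooling and feature set are not given on this page, and the input file "eclipse_metrics.csv" and column names used here are hypothetical placeholders.

    # Minimal sketch of cost-sensitive defect classification with scikit-learn.
    # File name and column names below are hypothetical placeholders.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import recall_score, confusion_matrix

    data = pd.read_csv("eclipse_metrics.csv")      # one row per Java file
    X = data[["revisions", "bugfixes", "loc"]]     # example change/code metrics
    y = data["defective"]                          # 1 = defective, 0 = defect-free

    # Approximate unequal misclassification costs via class weights:
    # missing a defective file is penalized more than raising a false alarm.
    clf = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 5})
    pred = cross_val_predict(clf, X, y, cv=10)     # 10-fold cross-validated predictions

    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
    print("recall:", recall_score(y, pred))
    print("false positive rate:", fp / (fp + tn))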


Published In

ICSE '08: Proceedings of the 30th international conference on Software engineering
May 2008
558 pages
ISBN: 9781605580791
DOI: 10.1145/1368088

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cost-sensitive classification
  2. defect prediction
  3. software metrics

Qualifiers

  • Research-article

Conference

ICSE '08

Acceptance Rates

ICSE '08 Paper Acceptance Rate: 56 of 370 submissions, 15%
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
