DOI: 10.1145/1868328.1868341

Usage of multiple prediction models based on defect categories

Published: 12 September 2010

Abstract

Background: Most defect prediction models are built for one of two purposes: 1) to separate defective from defect-free modules (binary classification), or 2) to estimate the number of defects (regression analysis). It would also be useful to provide more information on the nature of the defects so that software managers can plan their testing resources more effectively.
Aims: In this paper, we propose a defect prediction model that is based on defect categories.
Method: We mined the version history of a large-scale enterprise software product to extract churn and static code metrics, and grouped defects into three categories according to different testing phases. We built a learning-based model for each defect category and compared the performance of the proposed model with that of a general one. We applied statistical techniques to evaluate the relationship between defect categories and software metrics. We also tested our hypothesis by replicating the empirical work on Eclipse data.
Results: Our results show that building models that are sensitive to defect categories is cost-effective, in the sense that it reveals more information and increases detection rates (pd) by 10% while keeping the false alarm rate (pf) constant.
Conclusions: We conclude that slicing defect data and categorizing it for use in a defect prediction model would enable practitioners to take immediate action. Our replication on Eclipse data showed that haphazard categorization of defects is not worth the effort.
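
The abstract reports its results as a detection rate (pd) and a false alarm rate (pf) without defining them. The sketch below assumes the definitions commonly used in defect prediction work: pd = TP / (TP + FN) and pf = FP / (FP + TN), computed per module. The function name and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def pd_pf(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (pd, pf) for per-module defect labels.

    Assumed definitions (not spelled out in the abstract):
      pd = TP / (TP + FN)  -- share of defective modules that were flagged
      pf = FP / (FP + TN)  -- share of defect-free modules that were flagged
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)

    # Guard against empty classes so the function never divides by zero.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical labels for ten modules: four defective, six defect-free.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

detection_rate, false_alarm_rate = pd_pf(actual, predicted)
print(f"pd = {detection_rate:.2f}, pf = {false_alarm_rate:.2f}")
```

With these definitions, comparing a category-sensitive model against a general one "at constant pf" means comparing their pd values at operating points where the two models flag the same share of defect-free modules.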


Published In

PROMISE '10: Proceedings of the 6th International Conference on Predictive Models in Software Engineering
September 2010
195 pages
ISBN: 9781450304047
DOI: 10.1145/1868328
  • General Chair: Tim Menzies
  • Program Chair: Gunes Koru
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. defect categories
  2. defect prediction
  3. software quality

Qualifiers

  • Research-article

Conference

PROMISE '10

Acceptance Rates

PROMISE '10 Paper Acceptance Rate 19 of 53 submissions, 36%;
Overall Acceptance Rate 98 of 213 submissions, 46%
