DOI: 10.1145/3238147.3240469
research-article

node2defect: using network embedding to improve software defect prediction

Published: 03 September 2018

Abstract

Network measures have proven useful in predicting software defects. By leveraging the dependency relationships between software modules, network measures can capture various structural features of software systems. However, existing studies have relied on user-defined network measures (e.g., degree statistics or centrality metrics), which are inflexible and computationally expensive, to describe these structural features. In this paper, we propose a new method called node2defect, which uses a recently proposed network embedding technique, node2vec, to automatically learn to encode dependency network structure into a low-dimensional vector space and thereby improve software defect prediction. Specifically, we first construct a program's Class Dependency Network. Then node2vec is used to automatically learn structural features of the network. After that, we combine the learned features with traditional software engineering features for accurate defect prediction. We evaluate our method on 15 open source programs. The experimental results show that, on average, node2defect improves the state-of-the-art approach by 9.15% in terms of F-measure.
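The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows only two pieces: node2vec-style biased random walks over a toy Class Dependency Network, and the concatenation of a learned vector with traditional metrics. The class names, the `DEPENDENCIES` graph, the helper names, and the choice to treat dependencies as undirected edges are all assumptions made for illustration. The skip-gram training step that turns walks into embeddings, and the final classifier, are omitted.

```python
import random

# Hypothetical toy Class Dependency Network (names invented for illustration):
# an entry "A": ["B"] means class A depends on class B.
DEPENDENCIES = {
    "Parser": ["Lexer", "Ast"],
    "Lexer": ["Token"],
    "Ast": ["Token"],
    "Compiler": ["Parser", "Ast"],
    "Token": [],
}


def to_undirected(deps):
    """Assumption: treat each dependency as an undirected edge."""
    graph = {node: set() for node in deps}
    for src, targets in deps.items():
        for dst in targets:
            graph.setdefault(dst, set()).add(src)
            graph[src].add(dst)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def biased_walk(graph, start, length, p, q, rng):
    """One node2vec-style second-order random walk.

    Standing at the current node after arriving from `prev`, a neighbor x
    gets unnormalized weight 1/p if x == prev (return), 1 if x is also a
    neighbor of `prev`, and 1/q otherwise (move outward).
    """
    walk = [start]
    while len(walk) < length:
        nbrs = graph[walk[-1]]
        if not nbrs:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)
            elif x in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def sample_walks(graph, walks_per_node, length, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks


def combine_features(embedding, metrics):
    """Concatenate each class's learned vector with its traditional metrics."""
    return {
        cls: list(embedding[cls]) + list(metrics[cls])
        for cls in metrics
        if cls in embedding
    }


graph = to_undirected(DEPENDENCIES)
walks = sample_walks(graph, walks_per_node=2, length=5, p=1.0, q=0.5)
```

In a fuller pipeline, the sampled walks would be fed to a skip-gram model to obtain one vector per class, and the output of `combine_features` would serve as the input rows of a defect classifier.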

Published In

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
September 2018
955 pages
ISBN:9781450359375
DOI:10.1145/3238147

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Software defect
  2. defect prediction
  3. network embedding
  4. software metrics

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Conference

ASE '18

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
