
On the Value of Oversampling for Deep Learning in Software Defect Prediction

Published: 01 August 2022

Abstract

One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that this truism is false. Specifically, when we pre-process data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we can do significantly better than the prior DL state of the art on 14/20 defect data sets. Our approach yields state-of-the-art results from deep learners that also train significantly faster than prior work. These results present a cogent case for applying oversampling before deep learning on software defect prediction datasets.
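
To make the abstract's central idea concrete (rebalancing defect data before training a deep learner), the sketch below applies plain random oversampling of the minority class and then fits a small feed-forward network. It is not the paper's fuzzy sampling algorithm or GHOST pipeline; the toy data, the oversample_minority helper, and the network configuration are illustrative assumptions only.

```python
# Minimal sketch: random oversampling of the defective (minority) class
# before training a small feed-forward network. This is NOT the paper's
# fuzzy sampling or GHOST pipeline; it only illustrates the general idea
# of rebalancing defect data prior to deep learning.
import numpy as np
from sklearn.neural_network import MLPClassifier

def oversample_minority(X, y, rng=None):
    """Duplicate minority-class rows (with replacement) until classes balance."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

# Toy imbalanced data: roughly 90% non-defective, 10% defective (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 0.1).astype(int)

X_bal, y_bal = oversample_minority(X, y, rng)
model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0)
model.fit(X_bal, y_bal)
print("class balance after oversampling:", np.bincount(y_bal))
```

Running this prints the balanced class counts; in the setting described by the paper, the rebalanced data would then feed a deep learner whose hyper-parameters are tuned by the GHOST pipeline.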

Published In

IEEE Transactions on Software Engineering, Volume 48, Issue 8
Aug. 2022
511 pages

Publisher

IEEE Press

Publication History

Published: 01 August 2022

Qualifiers

  • Research-article
