4.1. Data Description
In this study, the steel plate fault dataset was used to assess the effectiveness of the proposed method; its source is given in the Data Availability Statement. Detailed information about the dataset is listed in Table 2. It is a multivariate dataset designed for developing and testing machine learning models for the automatic identification of fault patterns on steel plate surfaces, and since 2010 it has been widely used in numerous studies to evaluate and compare fault detection methods. The dataset contains 1941 records, each labeled with one of several fault types that may occur on the surface of steel plates, and includes both integer- and real-valued attributes.
The dataset comprises 27 distinct features, as detailed in Table 3, including statistical attributes such as the minimum, mean, and maximum values for each feature. Each feature provides critical information about the condition of the steel plate surface, supporting accurate fault classification. These features encompass a wide range of mechanical properties, including strength, toughness, elongation, shape, dimensional accuracy, and appearance. Specifically, strength features may include yield strength and tensile strength, describing the steel plate’s performance under stress. Toughness features reflect the material’s ability to absorb energy before fracturing. Elongation features describe the extent to which the material stretches during tensile testing, which is crucial for understanding the material’s ductility and plastic deformation behavior.
Moreover, shape and dimensional accuracy features provide important information about the geometric shape and dimensional consistency of the steel plates. These features may include the thickness, width, and length of the plates, as well as the tolerances for these dimensions during the manufacturing process. Appearance features cover various indicators of surface quality, such as surface roughness, glossiness, and the number of defects.
Statistical attributes such as the minimum, mean, and maximum values are also crucial for understanding each feature. The minimum value provides the lower bound of the feature, indicating the worst-case performance of the steel plate; the maximum value shows the best-case performance; and the mean value offers an overall performance overview.
The rich features and detailed statistical attributes of this dataset provide a solid foundation for developing high-performance fault detection models. Through in-depth analysis of these data, models can effectively capture the complex patterns of faults on steel plate surfaces, enhancing the accuracy and robustness of identification and classification.
4.2. Results of Data Imbalance Processing
The steel production dataset comprises a total of 1941 records, including 158 records of Pastry, 190 records of Z-Scratch, 391 records of K-Scratch, 72 records of Stains, 55 records of Dirtiness, 402 records of Bumps, and 637 records of Other Faults. Upon analyzing the dataset, it is evident that there is a class imbalance problem. Class imbalance means that the number of samples in some categories far exceeds that in others, which can negatively impact the model’s training effectiveness, leading to poor learning performance and reduced overall accuracy. In such cases, the model tends to predict the majority classes, neglecting the minority class samples, thereby failing to accurately identify and classify the fault patterns in these minority classes.
To address the class imbalance issue in the dataset, this study employs a combined sampling method using ENN and SMOTE to reconstruct the dataset. The processed fault types and the number of associated instances are shown in Table 4. After processing, there were 500 records of Pastry, 546 of Z-Scratch, 554 of K-Scratch, 572 of Stains, 571 of Dirtiness, 309 of Bumps, and 190 of Other Faults, yielding a more balanced distribution than the original.
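In practice, libraries such as imbalanced-learn provide combined SMOTE+ENN resamplers; the following is a minimal, self-contained NumPy sketch of the two steps on a toy two-class problem. The cluster parameters, neighbour counts `k`, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote(X_min, n_new, k=3):
    """Synthesize n_new minority samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbours."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]              # k nearest neighbours
    base = rng.integers(0, len(X_min), n_new)      # random base samples
    nb = nn[base, rng.integers(0, k, n_new)]       # one random neighbour each
    gap = rng.random((n_new, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop any sample whose k nearest
    neighbours mostly carry a different class label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    keep = np.array([(y[nn[i]] == y[i]).sum() > k // 2 for i in range(len(X))])
    return X[keep], y[keep]

# Toy imbalanced problem: 200 majority vs. 30 minority samples.
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.5, 1.0, (30, 2))])
y = np.array([0] * 200 + [1] * 30)

X_c, y_c = enn(X, y)                               # clean noisy/borderline points
X_new = smote(X_c[y_c == 1], (y_c == 0).sum() - (y_c == 1).sum())
X_bal = np.vstack([X_c, X_new])
y_bal = np.concatenate([y_c, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))                          # equal class counts
```

Running ENN before SMOTE (as here) removes borderline points so that the synthetic interpolation does not amplify label noise.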
Figure 5 illustrates the data fluctuation before and after processing, showing that the class distribution curve is much smoother after applying the ENN-SMOTE treatment than it was without it.
In Figure 5a, the ENN-SMOTE treatment results in a much smoother class distribution, with the curve indicating that the sample counts across classes have become more uniform. In Figure 5b,c, the comparison with other methods such as KNN and ADASYN further demonstrates that ENN-SMOTE offers a more stable distribution, mitigating the class imbalance while preserving the integrity of the data. This improved balance allows the model to capture and classify patterns across all fault types more effectively, leading to better generalization and more accurate fault predictions.
In terms of computational cost, it is important to note that while ENN-SMOTE offers a powerful method for balancing the dataset, it does incur additional computational overhead. Specifically, the process of identifying nearest neighbors and removing borderline instances can be computationally expensive, particularly for large datasets. This additional cost is primarily due to the K-nearest neighbor search and the need to calculate distances between samples. To quantify the computational cost, we measured the time taken for the resampling process compared to simpler methods such as Random Oversampling and Tomek Links. We found that ENN-SMOTE requires more processing time and memory, particularly for datasets with a large number of features. However, the performance improvements achieved in terms of accuracy and generalization justify the additional computational expense.
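To illustrate where this overhead comes from, the sketch below times plain random oversampling against the brute-force k-NN search that neighbour-based resamplers depend on, on a matrix of roughly the dataset's size (1941 samples, 27 features). The timings are illustrative only, not the measurements reported above.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1941, 27))                 # roughly the steel-plate dataset size

# Random oversampling is just array indexing -- essentially free.
t0 = time.perf_counter()
X_ros = X[rng.integers(0, len(X), 2 * len(X))]
t_ros = time.perf_counter() - t0

# Neighbour-based methods (ENN, SMOTE) need a k-NN search: an O(n^2 d)
# pairwise-distance computation plus an argsort per row.
t0 = time.perf_counter()
sq = (X ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
nn = np.argsort(d2, axis=1)[:, 1:6]                # 5 nearest neighbours per row
t_knn = time.perf_counter() - t0

print(f"random oversampling: {t_ros * 1e3:.2f} ms, k-NN search: {t_knn * 1e3:.2f} ms")
```

The O(n²) distance matrix is also where the memory cost comes from, which is why the overhead grows sharply with dataset size.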
4.3. Classification Prediction Results
An important input parameter for the LMT is the number of trees. To find the setting with the highest accuracy, the number of trees was varied from 10 to 100 in increments of 10. The results are presented in Table 5; these outputs were obtained by averaging the results of 10-fold cross-validation. Both 50 and 60 trees achieved the same highest accuracy of 93.14%, and beyond this peak the accuracy declined slightly. Increasing the number of trees beyond 50 therefore does not necessarily lead to higher accuracy: while a larger ensemble may improve stability in some cases, it also incurs greater computational cost. Selecting an intermediate number of trees, such as 50, thus balances performance and computational efficiency.
When the ensemble includes a small number of trees, the likelihood of misclassification increases due to the instability of individual decision trees. With a small ensemble size, each classifier has a significant impact on the final prediction. If the quality of the individual members is poor, the overall performance is consequently affected. Therefore, increasing the number of trees is necessary to mitigate the influence of low-quality members.
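The sweep-and-select procedure can be sketched as follows. Since the LMT implementation itself is not reproduced here, a bagged ensemble of decision stumps on a synthetic two-class problem stands in for it; only the sweep over 10 to 100 trees with 10-fold cross-validation mirrors the setup above.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Best one-level threshold rule: predict (x[f] > t), possibly flipped."""
    best_acc, best = -1.0, None
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, 5)):
            for flip in (False, True):
                acc = np.mean(((X[:, f] > t) ^ flip) == y)
                if acc > best_acc:
                    best_acc, best = acc, (f, t, flip)
    return best

def predict(stump, X):
    f, t, flip = stump
    return (X[:, f] > t) ^ flip

def cv_accuracy(X, y, n_trees, folds=10):
    """Mean 10-fold CV accuracy of a majority-vote bagged-stump ensemble."""
    idx = rng.permutation(len(X))
    accs = []
    for k in range(folds):
        test = idx[k::folds]
        train = np.setdiff1d(idx, test)
        votes = np.zeros(len(test))
        for _ in range(n_trees):
            boot = rng.choice(train, len(train))      # bootstrap sample
            votes += predict(fit_stump(X[boot], y[boot]), X[test])
        accs.append(np.mean((votes > n_trees / 2) == y[test]))
    return float(np.mean(accs))

# Toy two-class problem standing in for the fault data.
X = rng.normal(0, 1, (200, 3))
y = X[:, 0] + 0.5 * X[:, 1] > 0

scores = {n: cv_accuracy(X, y, n) for n in range(10, 101, 10)}
best_n = max(scores, key=scores.get)
print(best_n, round(scores[best_n], 3))
```

When two ensemble sizes tie, as 50 and 60 do in Table 5, picking the smaller one is the natural tie-break, since each extra tree adds training and inference cost.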
LSTM excels at capturing temporal dependencies in sequential data but may not be as effective in learning static features. In contrast, the LMT, as an ensemble learning method, can combine the results of multiple weak classifiers to better capture complex features and handle nonlinear relationships in the data. When the LMT is used to generate new features, these features are often more discriminative than the original ones. By combining these two approaches, we leverage the LMT’s feature extraction capability and LSTM’s ability to learn temporal dependencies, thereby enhancing the overall model performance.
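A rough sketch of this coupling: the per-class probability vectors produced by the tree ensemble serve as the LSTM's input sequence. The gate equations below are the standard LSTM formulation; the hidden size, weight initialization, and toy probability sequence are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over the standard gate equations."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
    g = np.tanh(g)                                           # candidate state
    c = f * c + i * g                                        # cell state update
    h = o * np.tanh(c)                                       # hidden state
    return h, c

n_classes, hidden = 7, 16        # 7 fault classes; hidden size is assumed
W = rng.normal(0.0, 0.1, (4 * hidden, n_classes))
U = rng.normal(0.0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

# A toy sequence of LMT per-class probability vectors (rows sum to 1).
probs = rng.random((5, n_classes))
probs /= probs.sum(axis=1, keepdims=True)

h = np.zeros(hidden)
c = np.zeros(hidden)
for x in probs:
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)   # final hidden state, to be fed to a dense softmax layer
```

In a full implementation the final hidden state would pass through a dense softmax layer to produce the seven-class prediction; here the loop only illustrates how the LMT outputs become the LSTM's inputs.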
Given that 50 trees achieved the highest accuracy, the number of trees was set to 50. Based on this, the classification results were fed into a multi-layer LSTM network, and the results obtained are presented in Table 6. The classification accuracy for each category was 93.6% or higher, with the specific values provided in the table, and the overall model accuracy reached 98.1%.
4.4. Training Process Analysis
Figure 6 depicts the training and validation loss and accuracy curves over 50 epochs, providing insights into the model’s early-stage training dynamics. In subfigure (a), the training and validation loss decrease steadily, showing proper convergence and indicating that the model effectively minimizes the error on both datasets. Subfigure (b) illustrates the training and validation accuracy, with a rapid improvement observed in the initial epochs, followed by stabilization. The alignment of the validation accuracy with the training accuracy suggests that the model achieves good generalization within 50 epochs, with minimal overfitting.
Figure 7, on the other hand, extends the training period to 100 epochs, offering a deeper view into the model’s performance over a longer duration. In subfigure (a), the training and validation loss continue to decrease and stabilize as the training progresses, with the validation loss remaining close to the training loss. This further confirms the model’s stability and its ability to avoid overfitting over an extended training period. Subfigure (b) presents the accuracy curves, which improve gradually and stabilize near the maximum accuracy. The validation accuracy closely follows the training accuracy throughout the 100 epochs, highlighting the model’s robustness and ability to generalize effectively to unseen data.
Based on the analysis of the loss and accuracy curves for both 50 and 100 epochs, it is evident that the model achieves satisfactory convergence within 50 epochs, as the loss stabilizes and the training and validation accuracy curves align closely, indicating minimal overfitting. Extending the training to 100 epochs shows only marginal improvements in the loss and accuracy, with a slight risk of overfitting as the gap between training and validation loss begins to widen slightly. Additionally, the computational cost of training for 100 epochs outweighs the negligible performance gain observed. Therefore, 50 epochs is a more efficient and optimal choice, providing a balance between computational efficiency and model performance.
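The epoch choice described above follows the usual early-stopping logic: track the validation loss and stop once it has failed to improve for several consecutive epochs. A minimal sketch follows; the patience and tolerance values, and the synthetic loss curve, are illustrative assumptions.

```python
def choose_epochs(val_loss, patience=5, min_delta=1e-4):
    """Return the epoch (1-based) with the best validation loss, stopping
    once it has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

# Synthetic curve: steady decrease for 50 epochs, then a mild overfitting drift.
curve = [1.0 / (e ** 0.5) for e in range(1, 51)] + [0.142 + 0.0005 * e for e in range(50)]
print(choose_epochs(curve))   # -> 50
```

Applied to curves shaped like those in Figures 6 and 7, this rule would likewise halt training around epoch 50 rather than continuing to 100.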
4.5. Comparative Analysis of Results
In our study, we conduct a comparative analysis of data processed with ENN-SMOTE and unprocessed data using the same models. We also explicitly compare traditional machine learning methods, namely decision tree (DT) and Random Forest (RF), with the proposed machine learning and neural network-based steel plate defect prediction model. Specifically, the original data and the ENN-SMOTE-processed data are compared across the LMT, LSTM, and LMT-LSTM models in terms of Accuracy, Precision, Recall, and F1 Score, highlighting the relative strengths and weaknesses of each approach and validating the superiority of the proposed model for steel plate defect prediction.
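All four metrics can be computed directly from a confusion matrix with rows as true classes and columns as predicted classes; a minimal sketch follows (the example matrix is illustrative, not taken from the results).

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class Precision, Recall, F1 and overall Accuracy from a
    confusion matrix (rows = true class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)          # true positives per class
    precision = tp / cm.sum(axis=0)         # TP / (TP + FP)
    recall = tp / cm.sum(axis=1)            # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Illustrative 3-class confusion matrix.
cm = np.array([[50, 2, 1],
               [3, 45, 2],
               [0, 4, 43]])
acc, p, r, f1 = metrics_from_confusion(cm)
print(round(acc, 3), np.round(f1, 3))
```

Per-class Precision, Recall, and F1, rather than Accuracy alone, are what make imbalanced-class performance visible, which is why the comparisons below report all four.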
The comparative results for the original dataset are shown in Table 7, while the results for the ENN-SMOTE-processed data are shown in Table 8. The confusion matrix comparison for the original data is illustrated in Figure 8, and the confusion matrix and bar chart comparisons for the ENN-SMOTE-processed data are shown in Figure 9.
As shown in Table 7, the traditional methods DT and RF achieved lower performance on the original dataset, with accuracy values of 73.1% and 73.8%, respectively. These results highlight their limited ability to handle the challenges of unprocessed data, such as imbalance and noise. As shown in Table 8, after ENN-SMOTE preprocessing, the performance of DT and RF improved significantly, to 86.1% and 91.8%, respectively. However, their results remain lower than those of the proposed LMT-LSTM model, which achieved an accuracy of 98.2%, demonstrating the advantages of the hybrid approach on both raw and preprocessed data.
Even without data preprocessing, the model proposed in this article delivers the best performance, with an accuracy of 81.4%; after ENN-SMOTE preprocessing, its accuracy rises further to 98.2%.
Figure 9 illustrates the comparison of confusion matrices and performance metrics (Precision, Recall, and F1 Score) across three different models, the LMT, LSTM, and the proposed LMT-LSTM, using both processed and unprocessed datasets. Panels (a), (b), and (c) correspond to the performance of the LMT and LSTM models, while panels (d), (e), and (f) show the results for the LMT-LSTM model.
The results indicate that the proposed LMT-LSTM model achieved the best overall performance, with a classification accuracy of 98.2%, which significantly outperforms the LMT model (93.1%) and the LSTM model (95.2%). This is evident from the confusion matrix in panel (e), where most of the diagonal elements demonstrate high true positive rates for all defect classes, indicating superior classification accuracy and minimal misclassification errors compared to panels (a) and (c).
Additionally, the bar charts in panels (b), (d), and (f) provide a detailed comparison of the Precision, Recall, and F1 Score for each defect category. For all defect classes, the LMT-LSTM model consistently achieved higher metric values than the LMT and LSTM models.
Moreover, the figures highlight the benefits of data preprocessing. By comparing the results for processed and unprocessed datasets, it is evident that the processed dataset contributes significantly to the performance improvement across all the models. This is reflected in the higher metric values and reduced misclassification errors when using the processed dataset.
Table 9 summarizes previous studies and methods applied to the same dataset, highlighting a variety of traditional and advanced approaches used in the field. It includes algorithms such as DT, SVM, K-nearest neighbors (KNNs), and more sophisticated techniques like the LSTM networks and PCA-based decision tree forests (PDTDFs). The accuracy achieved by these methods varies, with some incorporating optimization techniques like I-PDTDF or advanced instance selection strategies such as ENN-MQRWA. By presenting these results, the table emphasizes the state-of-the-art methods’ limitations in achieving higher accuracy. In contrast, our method achieves the highest accuracy, demonstrating its clear advantage and superior performance. This comparison underscores the effectiveness and innovation of our approach in surpassing prior methods and advancing the field.
In summary, the proposed LMT-LSTM model, combined with effective data preprocessing techniques, not only achieves the highest accuracy but also demonstrates consistent improvements in Precision, Recall, and F1 Score across all defect categories, making it a highly reliable approach for defect detection in steel plates.