1. Introduction
Aircraft engines are core components in the aerospace field, and their health condition significantly affects the stable and reliable operation of aircraft. With the continuous development of the aviation industry and the increasing demand for engine performance, the internal structure of aircraft engines has become increasingly complex. Many precision components of the engine must operate reliably during routine flights under harsh conditions such as high temperature, high pressure, high speed, and high load, which increases the frequency of component failures. In addition, because flight scenarios and durations vary, different engine components may experience different levels of performance degradation, leading to various types of faults.
To reduce unnecessary maintenance effort and the associated human, material, and financial costs, and to promote the digital development of the aviation industry, precise maintenance and condition-based maintenance have become key goals in aviation maintenance. Prognostics and Health Management (PHM) technology, which covers equipment fault prediction and health management, is considered an effective means of ensuring operational quality, improving operational efficiency, reducing resource consumption, lowering maintenance costs, and ensuring reliable equipment operation [1,2]. PHM for aircraft engines plays a crucial role in providing intelligent maintenance solutions and preventing catastrophic accidents. Fault diagnosis of aircraft engines, as an essential component of PHM, is therefore of paramount importance.
Currently, fault diagnosis methods for equipment such as aircraft engines can be broadly categorized into three types: physics model-based methods, data-driven methods, and hybrid methods that combine physics models with data-driven models. Physics-based methods use physical models and digital simulations of the equipment to establish degradation models, assess the current health status, and predict future conditions. For instance, Im et al. proposed a model-based online fault diagnosis method that estimates fault severity indices from negative sequence currents to identify inter-turn faults in induction motors [3]. However, as physical entities become more complex and diverse, universally applicable physical simulation models are difficult to obtain. These methods also often require substantial prior knowledge and expert experience, which limits the generalizability of physical simulation models.
With the rapid development of the Internet of Things (IoT) and the acceleration of industrial digital transformation, data-driven fault diagnosis methods have gained prominence. These methods leverage big data technologies, combining data mining and artificial intelligence (AI) techniques, to learn the behavioral features of equipment directly from sensor monitoring data. Wang et al. proposed a method called MTF-CNN, which automatically learns data features by combining Markov Transition Fields (MTF) and Convolutional Neural Networks (CNN) for the fault diagnosis of rolling bearings [4]. Data-driven methods do not require an in-depth understanding of the complex internal workings of the equipment. By collecting abundant degradation data from physical entities, they learn fault degradation features and classify fault modes, which has made them the current mainstream research direction. Hybrid methods, which combine physics-based and data-driven approaches, aim to address the interpretability issues often associated with purely data-driven models. Zhou et al. introduced a Data-Model Cooperative Linking framework based on end-to-end deep network sparse denoising, which uses both data-driven and model-driven elements for fault diagnosis [5]. However, hybrid methods have a higher barrier to application, often requiring deep expert knowledge of and experience with the physical entities.
With the continuous development of IoT technology, edge devices generate increasingly abundant usable data. Data-driven methods based on big data technologies offer simplicity, efficiency, and convenience, and have attracted widespread attention from academia and industry. In particular, with the support of computing power and intelligent chips, AI technologies can be deployed on edge devices, enabling the application of deep learning algorithms such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Stacked Denoising Autoencoders (SDAEs), and Deep Belief Networks (DBNs) in practical industrial production and management. Compared with traditional shallow machine learning models, deep learning algorithms can model nonlinear systems and automatically learn intrinsic degradation features from data. To some extent, deep learning models mitigate individual differences between monitoring data, reduce the impact of data noise, and address the insufficient generalization and susceptibility to local optima of traditional methods. For example, Wen et al. proposed a hybrid fault diagnosis method combining ReliefF-Principal Component Analysis (PCA) with a deep neural network (DNN), which improved the accuracy of wind turbine fault diagnosis [6]. Notably, deep learning algorithms, including 1DCNN, have demonstrated excellent performance in handling time-dependent sequential data.
The one-dimensional Convolutional Neural Network (1DCNN) has been employed in recent years to process time-dependent sequential data, showing good accuracy and efficiency. Raw data can be input directly into the model for training, which allows the temporal dependencies between data to be explored and addresses the limitations of earlier models. Wang et al. proposed a method that uses a 1DCNN to combine features from multiple sensors for predicting bearing faults [7]. Du et al. proposed a fault diagnosis method for rotating machinery using a sequence Transformer model based on SPBO-SDAE and an attention mechanism. First, the Student Psychology Based Optimization algorithm (SPBO) adaptively selects the hyperparameters of the SDAE network. The SPBO-SDAE network then extracts features from the original high-dimensional data layer by layer. This method shows significant advantages in generalization performance, fault diagnosis accuracy, and time efficiency [8]. Jiao et al. proposed a fault diagnosis method that combines a DBN with information fusion technology. First, the wavelet transform is used to denoise, decompose, and reconstruct the vibration signals of industrial robot joint bearings. Then, a normalized feature vector based on the energy entropy of the reconstructed signals is constructed and used as the DBN input. Finally, the DBN and wavelet energy entropy are combined to diagnose faults in industrial robots [9].
However, most of these studies rely on additional algorithms for feature extraction, or convert the original signal into an image before fault diagnosis, so considerable time is spent on feature extraction and signal conversion in addition to building the diagnosis model. A 1DCNN, by contrast, accepts raw data as direct input, which both mines the temporal dependencies between data and avoids these shortcomings. Wang et al. proposed a method for fusing multimodal sensor signals: features are first extracted from the original vibration and acoustic signals, and a deep neural network based on a 1DCNN then fuses them to predict bearing faults [10]. Almatheel et al. proposed a 1DCNN with two convolutional layers that, combined with dataset augmentation, operates directly on time-domain vibration signals to diagnose bearing faults [11]. Wang et al. proposed a bearing fault feature extraction and recognition method based on Maximum Correlated Kurtosis Deconvolution (MCKD) optimized by Particle Swarm Optimization, together with a 1DCNN. First, fault features are selected from the multi-channel signals of rolling bearings, and signals containing fault features are filtered using MCKD. Then, the 1DCNN identifies faults in the feature signals under different damage conditions [12]. Zhang et al. proposed a fault detection method based on zero-shot learning (ZSL). This method first extracts features from the original signal using a 1DCNN, then establishes semantic descriptions as fault attributes shared between known and unseen faults, and finally uses a bilinear compatibility function to assign the bearing fault type with the highest compatibility score [13].
During equipment fault degradation, the signals collected by sensors change over time, and a 1DCNN alone may not capture the bidirectional temporal dependencies between data. Furthermore, few studies have investigated fault diagnosis on the CMAPSS dataset. Han et al. proposed a 1DCNN-based fault mode classification model for aircraft engines under multiple operating conditions. They clustered the data into six flight conditions, divided the fault modes into two categories (high-pressure compressor (HPC) fault and HPC&Fan mixed fault), and built a 1DCNN binary classification model for fault diagnosis [14]. However, this approach did not consider the bidirectional temporal dependencies between data, and it did not include the faultless mode among the diagnosed classes. During the initial flight cycles, the engine operates in a healthy (i.e., faultless) mode. To cover all fault modes that an aircraft engine may experience throughout its life cycle, this paper proposes an aircraft engine fault diagnosis model based on a 1DCNN and a bidirectional long short-term memory network (BiLSTM) with the Convolutional Block Attention Module (CBAM), referred to as 1DCNN-BiLSTM with CBAM. This model does not require additional feature extraction algorithms or the conversion of sensor signals into time–frequency images. To train the model in a supervised manner, labels are assigned to data in the faultless mode, the HPC fault (single-fault) mode, and the HPC&Fan fault (mixed-fault) mode. In addition, because monitoring values from different components in different flight scenarios may differ greatly in magnitude, the original sensor data are standardized. Finally, the preprocessed data are input into the 1DCNN-BiLSTM with CBAM model for training to obtain the fault diagnosis model. The main contributions of this paper are summarized as follows:
(1) The proposed model can be applied directly to raw monitoring data without additional feature extraction algorithms.
(2) A channel and spatial attention mechanism (CBAM) is added after the 1DCNN layers; it assigns higher weights to features relevant to the fault categories so that the model focuses on them. A BiLSTM is added after the CBAM layer to capture nonlinear temporal feature sequences and bidirectional contextual information, improving prediction accuracy and model performance.
(3) In addition to the faulty modes of aircraft engines, the faultless mode is also diagnosed, which supports subsequent remaining useful life (RUL) prediction and spare parts management.
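The attention step in contribution (2) can be illustrated with a minimal NumPy sketch of a CBAM-style block acting on a 1D feature map of shape (time steps, channels), such as the output of the 1DCNN layers. It follows the general CBAM recipe: channel attention from average- and max-pooled descriptors passed through a shared bottleneck MLP, followed by spatial attention, which for 1D data runs along the time axis as a convolution over the channel-wise mean and max maps. The window length, channel count, reduction ratio r, kernel size, and random weights are assumptions for illustration, not the authors' configuration; in a trained model these weights would be learned jointly with the 1DCNN and BiLSTM layers.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Channel weights in (0, 1) for a feature map x of shape (T, C)."""
    def shared_mlp(v: np.ndarray) -> np.ndarray:
        # Bottleneck MLP (C -> C/r -> C) with ReLU, shared by both pooled descriptors
        return np.maximum(v @ w1, 0.0) @ w2

    avg_desc = x.mean(axis=0)  # (C,) average-pooled over time
    max_desc = x.max(axis=0)   # (C,) max-pooled over time
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))


def temporal_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Per-time-step weights in (0, 1): CBAM spatial attention along the time axis.

    kernel has shape (k, 2) with odd k and mixes the channel-wise mean and max maps.
    """
    desc = np.stack([x.mean(axis=1), x.max(axis=1)], axis=1)  # (T, 2)
    k = kernel.shape[0]
    padded = np.pad(desc, ((k // 2, k // 2), (0, 0)))          # zero "same" padding in time
    conv = np.array([np.sum(padded[t:t + k] * kernel) for t in range(x.shape[0])])
    return sigmoid(conv)


def cbam_1d(x, w1, w2, kernel):
    """Sequential CBAM: channel attention first, then temporal (spatial) attention."""
    x = x * channel_attention(x, w1, w2)[None, :]
    return x * temporal_attention(x, kernel)[:, None]


rng = np.random.default_rng(0)
T, C, r, k = 30, 64, 8, 7  # window length, channels, reduction ratio, kernel size (assumed)
x = rng.standard_normal((T, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
w2 = 0.1 * rng.standard_normal((C // r, C))
kernel = 0.1 * rng.standard_normal((k, 2))
y = cbam_1d(x, w1, w2, kernel)
print(y.shape)  # (30, 64)
```

Applying channel attention before temporal attention mirrors the sequential ordering of the original CBAM design. The reweighted map keeps its (time, channel) shape, so it can be passed straight to a recurrent layer such as a BiLSTM.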
The remainder of this paper is organized as follows:
Section 2 provides a detailed introduction to the fault diagnosis framework for aircraft engines.
Section 3 presents experimental verification results on the CMAPSS dataset.
Section 4 summarizes the paper, addressing some limitations and prospects.
3. Experimental Analysis
3.1. Dataset
This paper validates the effectiveness of the 1DCNN-BiLSTM with CBAM model on the publicly available Commercial Modular Aero-Propulsion System Simulation (CMAPSS) dataset from the National Aeronautics and Space Administration (NASA). The dataset simulates the degradation process of turbofan engines over flight cycles, from a healthy state until run to failure (RtF). It is divided into four subsets, as shown in Table 1.
The subsets cover two fault modes: the single-fault datasets FD001 and FD002 include only High-Pressure Compressor (HPC) degradation, whereas the mixed-fault datasets FD003 and FD004 include combined HPC and Fan (HPC&Fan) degradation. They also cover two types of operating conditions: FD001 and FD003 include a single operating condition, whereas FD002 and FD004 include multiple operating conditions. Each subset provides a training set, a test set, and a file of the true RUL values for the test engines. The training and test sets contain the engine ID, the operational cycle count, three operating condition parameters (flight altitude H, Mach number Ma, and throttle resolver angle TRA), and measurements from 21 sensors. The training set records each engine's degradation from a healthy state through increasingly severe states until run to failure (RtF).
The CMAPSS dataset has also been widely used to predict the RUL of turbofan engines. In most of the literature on RUL prediction [19,20,21,22], RUL degradation is modeled as piecewise linear, as illustrated in Figure 10.
In Figure 10, the RUL is held at a fixed value during the initial cycles; this fixed value is referred to as the early RUL. As the flight cycles increase and the true RUL falls below the early RUL, the RUL decreases linearly. Accordingly, during the early RUL period the engine is assumed to operate smoothly and healthily with no faults, whereas during the linear degradation phase the engine is considered faulty. On this basis, this paper selects the FD001 and FD003 datasets, which share a single operating condition but differ in fault mode. Following the literature [19,20], the early RUL value is set to 125. The fault modes are divided into three classes (faultless, HPC single fault, and HPC&Fan mixed fault) for analysis.
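The piecewise-linear target and the resulting three-class labels can be sketched as follows. This is a minimal illustration under stated assumptions: each training engine runs to failure, so the true RUL at cycle t of an engine with n recorded cycles is taken as n - t; cycles on the flat segment (true RUL of at least the early RUL of 125) are labelled faultless, and the remaining cycles take the subset's fault mode (HPC for FD001, HPC&Fan for FD003). The function names and integer class codes are hypothetical.

```python
EARLY_RUL = 125                      # early RUL value used in the text
FAULTLESS, HPC, HPC_FAN = 0, 1, 2    # hypothetical integer class codes


def piecewise_rul(n_cycles: int, early_rul: int = EARLY_RUL):
    """Piecewise-linear RUL target for cycles 1..n_cycles of a run-to-failure engine."""
    # Assumption: true RUL at cycle t is n_cycles - t (last recorded cycle has RUL 0)
    return [min(n_cycles - t, early_rul) for t in range(1, n_cycles + 1)]


def fault_labels(n_cycles: int, fault_mode: int, early_rul: int = EARLY_RUL):
    """Faultless on the flat (early RUL) segment, the subset's fault mode afterwards."""
    return [FAULTLESS if n_cycles - t >= early_rul else fault_mode
            for t in range(1, n_cycles + 1)]


# Example: a hypothetical 200-cycle engine from FD003 (HPC&Fan mixed fault)
labels = fault_labels(200, HPC_FAN)
print(labels.count(FAULTLESS), labels.count(HPC_FAN))  # 75 125
```

Engines shorter than the early RUL plus one cycle produce no faultless samples under this rule, so the class balance depends on the engine lengths in each subset.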
3.2. Data Preprocessing
(1) Fault Label Encoding
This paper divides fault modes into three labels, i.e., faultless, HPC single fault, and HPC&Fan mixed fault, for the supervised training of the proposed model. Because these categories are nominal, encoding them as consecutive integers would introduce an artificial ordering with no physical meaning; therefore, one-hot encoding is applied to the fault categories before training, as shown in Table 2. The faultless mode is encoded as [1,0,0], the HPC single fault mode as [0,1,0], and the HPC&Fan mixed fault mode as [0,0,1].
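A minimal sketch of the one-hot encoding in Table 2, assuming the hypothetical integer codes 0 (faultless), 1 (HPC single fault), and 2 (HPC&Fan mixed fault), which reproduce the vectors listed above:

```python
import numpy as np

CLASS_NAMES = ["Faultless", "HPC single fault", "HPC&Fan mixed fault"]


def one_hot(labels, num_classes: int = len(CLASS_NAMES)) -> np.ndarray:
    """Map integer class codes (0, 1, 2) to one-hot row vectors."""
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=int)[np.asarray(labels)]


print(one_hot([0, 1, 2]).tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```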
(2) Denoising
As the aircraft repeatedly flies and lands, certain sensors show constant readings throughout the engine's entire life cycle. These sensors can be considered irrelevant to the engine's aging process and should be excluded. To depict the distribution of the original sensor data, this paper presents boxplots of the three operating condition parameters and 21 sensors, as shown in Figure 11.
Figure 11 shows that the values of operational parameter 3 and sensors 1, 5, 6, 10, 16, 18, and 19 are constant, so these variables are removed. In addition, because FD001 and FD003 share the same single operating condition, the operational parameters are not considered, and only the sensor measurements are used. This paper therefore retains 14 sensors [19], namely, T24, T30, T50, P30, Nf, Nc, Ps30, Phi, NRf, NRc, BPR, htBleed, W31, and W32. The meanings of these sensor parameters are detailed in Table 3.
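The column selection can be sketched as below, assuming the raw training file has been loaded (for example with np.loadtxt) into a 26-column array in the usual CMAPSS order: unit ID, cycle, three operational settings, and sensors 1 to 21. The sketch drops all operational settings and the seven constant sensors named above by index rather than detecting them automatically; the constant and function names are hypothetical.

```python
import numpy as np

# Sensors found to be constant in Figure 11 (1-based sensor numbers, as in the text)
CONSTANT_SENSORS = {1, 5, 6, 10, 16, 18, 19}
RETAINED_NAMES = ["T24", "T30", "T50", "P30", "Nf", "Nc", "Ps30",
                  "Phi", "NRf", "NRc", "BPR", "htBleed", "W31", "W32"]

# Assumed column layout: 0 = unit ID, 1 = cycle, 2-4 = operational settings 1-3,
# 5-25 = sensors 1-21, so sensor s sits in column 4 + s
KEEP_COLUMNS = [4 + s for s in range(1, 22) if s not in CONSTANT_SENSORS]
assert len(KEEP_COLUMNS) == len(RETAINED_NAMES) == 14


def select_sensors(raw: np.ndarray):
    """Split an (N, 26) CMAPSS array into unit IDs, cycles, and the 14 retained sensors.

    All three operational settings are dropped, matching the single-condition
    treatment of FD001 and FD003.
    """
    return raw[:, 0].astype(int), raw[:, 1].astype(int), raw[:, KEEP_COLUMNS]


demo = np.arange(2 * 26, dtype=float).reshape(2, 26)  # stand-in for real data
units, cycles, sensors = select_sensors(demo)
print(sensors.shape)  # (2, 14)
```

Selecting by index keeps the sketch faithful to the boxplot inspection; an automatic zero-variance test would need a tolerance, because nearly constant channels can still show tiny fluctuations.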
(3) Standardization
As observed from Figure 7, the monitored values of the various sensors differ significantly in magnitude. Feeding these data directly into the network would let large-magnitude inputs dominate the weight updates, which can prevent the model from converging. Therefore, the monitoring data of each sensor listed in Table 3 are standardized individually by subtracting the mean and dividing by the standard deviation, as shown in Formula (14):
$$x' = \frac{x - \mu}{\sigma} \qquad (14)$$
where $x$ represents the original sensor data, $\mu$ is the mean of all samples for one sensor, $\sigma$ is the standard deviation of all samples for one sensor, and $x'$ represents the normalized data. Data standardization accelerates the convergence of gradient descent and can improve the precision and accuracy of the prediction results.
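The per-sensor standardization is a column-wise z-score. A minimal NumPy sketch, with a small eps added as an assumption to avoid division by zero:

```python
import numpy as np


def standardize(x: np.ndarray, eps: float = 1e-8):
    """Z-score each column (sensor) of x, shape (N, F): (x - mean) / std.

    Returns the standardized array plus the per-sensor mean and standard
    deviation so the same transform can be reapplied to other data.
    """
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)                       # population standard deviation (ddof=0)
    return (x - mu) / (sigma + eps), mu, sigma  # eps guards against zero variance


x = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
z, mu, sigma = standardize(x)
print(z.mean(axis=0), z.std(axis=0))  # both columns: mean close to 0, std close to 1
```

This sketch computes the statistics from whatever array it receives, matching the use of all samples described above; reusing the returned mu and sigma on held-out data is a common alternative that keeps test statistics out of training.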
(4) Time Series Transformation
Following the time window settings in the literature, the time window size for the FD001 and FD003 datasets is set to 30 [19,22], with a stride of 1. The data scale before and after the time series transformation for the FD001 and FD003 datasets is shown in Table 4.
After the time window transformation, the original two-dimensional data (samples × features), in which each row is treated independently, gain an additional time dimension (samples × time steps × features). This helps the BiLSTM capture the temporal dependencies between features, consistent with how fault degradation evolves over time.
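The window transformation can be sketched as follows, assuming rows are grouped by engine ID and ordered by cycle. With a stride of 1, an engine with n cycles yields n - 30 + 1 windows of length 30. Labelling each window by its last cycle is an assumption, as the text does not state which cycle's label a window inherits; the function name is hypothetical.

```python
import numpy as np


def sliding_windows(features, labels, units, window: int = 30, stride: int = 1):
    """Cut each engine's sequence into overlapping windows.

    features: (N, F) standardized sensor data; labels: (N,) class codes;
    units: (N,) engine IDs, with each engine's rows in cycle order.
    Returns X of shape (M, window, F) and y of shape (M,). Each window takes
    the label of its last cycle (an assumption; the text does not specify).
    """
    xs, ys = [], []
    for unit in np.unique(units):
        rows = np.flatnonzero(units == unit)
        f, lab = features[rows], labels[rows]
        # Engines shorter than the window contribute no samples
        for end in range(window, len(rows) + 1, stride):
            xs.append(f[end - window:end])
            ys.append(lab[end - 1])
    return np.stack(xs), np.asarray(ys)


# Two hypothetical engines with 40 and 35 cycles and 14 sensors each
units = np.repeat([1, 2], [40, 35])
features = np.random.default_rng(0).standard_normal((75, 14))
labels = np.zeros(75, dtype=int)
X, y = sliding_windows(features, labels, units)
print(X.shape)  # (17, 30, 14)
```

Windows never cross engine boundaries, so each sample describes a single engine's degradation history.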
After data preprocessing, the sample counts for the three fault modes are listed in Table 5.
Table 5 shows that the classes are relatively balanced: the ‘Faultless’ mode has more samples, while the counts for ‘HPC Single Fault’ and ‘HPC&Fan Mixed Fault’ are nearly equal.
3.3. Evaluation Metrics
To evaluate the effectiveness of the proposed method, four commonly used performance metrics are applied: Macro-precision, Macro-recall, accuracy, and F-Score. Macro-precision is obtained by calculating the precision of each class separately and then taking the arithmetic mean over all classes. Macro-recall is obtained in the same way from the per-class recall. Accuracy is the ratio of correctly classified samples to the total number of samples. The per-class precision and recall, as well as the accuracy, can be obtained from the confusion matrix, as shown in Formulas (15)–(17):
$$P_i = \frac{TP_i}{TP_i + FP_i} \qquad (15)$$
$$R_i = \frac{TP_i}{TP_i + FN_i} \qquad (16)$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (17)$$
Here, $TP$ represents true positives, $TN$ represents true negatives, $FP$ represents false positives, and $FN$ represents false negatives; the subscript $i$ denotes the counts for class $i$. Macro-precision and Macro-recall are then $P_{\mathrm{macro}} = \frac{1}{K}\sum_{i=1}^{K} P_i$ and $R_{\mathrm{macro}} = \frac{1}{K}\sum_{i=1}^{K} R_i$, where $K$ is the number of classes.
To jointly consider Macro-precision and Macro-recall, the F-Score is introduced. It is the weighted harmonic mean of Macro-precision and Macro-recall, as shown in Formula (18):
$$F_\beta = \frac{(1 + \beta^2)\, P_{\mathrm{macro}}\, R_{\mathrm{macro}}}{\beta^2 P_{\mathrm{macro}} + R_{\mathrm{macro}}} \qquad (18)$$
Here, β is commonly set to 1, giving the F1-Score, which weights Macro-precision and Macro-recall equally and provides a balanced evaluation metric.
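The metrics can be computed directly from predicted and true class codes, as in this pure-Python sketch. Following the definition above, the F-Score is taken as the harmonic mean of Macro-precision and Macro-recall; this generally differs from the macro-averaged F1 reported by libraries such as scikit-learn (`f1_score` with `average="macro"`), which averages per-class F1 scores. Scoring a class with no predicted or true samples as 0 is an assumed convention, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, num_classes: int, beta: float = 1.0):
    """Accuracy, Macro-precision, Macro-recall, and F-beta of the macro scores."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Assumed convention: an empty denominator scores 0 for that class
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / num_classes
    macro_r = sum(recalls) / num_classes
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    b2 = beta ** 2
    denom = b2 * macro_p + macro_r
    f_beta = (1 + b2) * macro_p * macro_r / denom if denom else 0.0
    return accuracy, macro_p, macro_r, f_beta


acc, mp, mr, f1 = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
print(round(acc, 4), round(mp, 4), round(mr, 4), round(f1, 4))  # 0.6667 0.7222 0.6667 0.6933
```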
3.4. Result Analysis
This paper validated the proposed model on the ‘train_FD001’ and ‘train_FD003’ subsets of the CMAPSS dataset. The data were randomly split into 70% for training, 20% for validation, and 10% for testing. The models were trained on a personal computer with an Intel Core i5 processor (2.5 GHz) and 16 GB of RAM.
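The 70/20/10 split can be sketched as a random permutation of sample indices. The seed and function name are assumptions, and the text does not say whether the split is applied to windowed samples or to whole engines. Because adjacent stride-1 windows from the same engine share 29 of their 30 time steps, splitting by engine ID is a common alternative when leakage between subsets is a concern.

```python
import numpy as np


def split_70_20_10(n_samples: int, seed: int = 0):
    """Randomly split sample indices into 70% train, 20% validation, 10% test."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = n_samples * 7 // 10   # integer arithmetic avoids float rounding
    n_val = n_samples * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_70_20_10(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 200 100
```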
3.4.1. Result Comparison
To verify the effectiveness of the proposed model, its results are compared with those of traditional shallow machine learning models, namely Support Vector Machine (SVM) [23,24,25,26] and Random Forest (RF) [25,27,28], and of deep learning models, namely Fully Connected Neural Network (FNN) [29], Recurrent Neural Network (RNN) [30], 1DCNN [7,10,11], Long Short-Term Memory (LSTM) [17], and BiLSTM [18]. The average accuracy, Macro-precision, Macro-recall, and F1-Score on the test set are shown in Table 6.
According to Table 6, SVM, FNN, and RNN achieve similar average accuracy, Macro-precision, Macro-recall, and F1-Score, with RNN classifying slightly better because it can extract the temporal features of faults. RF is an ensemble learning model and outperforms single models such as SVM, FNN, and RNN; however, as a traditional machine learning model, it does not match more complex deep learning models such as LSTM. LSTM and BiLSTM both improve on RNN by incorporating gate mechanisms that alleviate the vanishing gradient problem, so they perform better than RNN, and BiLSTM outperforms LSTM because it extracts bidirectional dependencies of features. RNN-based models cannot extract spatial features, whereas 1DCNN extracts spatial features and, to a certain extent, one-dimensional temporal features, which gives it better classification performance than BiLSTM. However, 1DCNN assigns equal weight to all extracted features and cannot extract the bidirectional temporal dependencies of the features. The proposed 1DCNN-BiLSTM with CBAM model assigns higher weights to fault-related features through CBAM and uses BiLSTM to extract the bidirectional temporal dependencies of the features, achieving the highest average accuracy, Macro-precision, Macro-recall, and F1-Score (shown in bold in Table 6). Therefore, the proposed model is effective for the fault mode classification of aircraft engines and shows good diagnostic performance.
3.4.2. Ablation Experiments
To further demonstrate the effectiveness of the 1DCNN-BiLSTM with CBAM model, ablation experiments were conducted to examine the roles of CBAM and BiLSTM. The average accuracy, Macro-precision, Macro-recall, and F1-Score of 1DCNN-LSTM, 1DCNN-BiLSTM, 1DCNN-LSTM with CBAM, 1DCNN with CBAM, and 1DCNN-BiLSTM with CBAM were compared on the test set, as shown in Table 7. In all experiments, the 1DCNN, LSTM, and BiLSTM components used the same network structure.
According to Table 7, on the test set, the average accuracy, Macro-precision, Macro-recall, and F1-Score of 1DCNN-BiLSTM are higher than those of 1DCNN-LSTM, and those of 1DCNN-BiLSTM with CBAM are higher than those of 1DCNN with CBAM, indicating that BiLSTM is effective at mining the bidirectional temporal dependencies of features. The average accuracy, Macro-precision, Macro-recall, and F1-Score of 1DCNN-LSTM with CBAM (bold in Table 7) are all higher than those of 1DCNN-LSTM, and those of 1DCNN-BiLSTM with CBAM are also higher than those of 1DCNN-BiLSTM. This indicates that adding CBAM directs more attention to fault-related features, improving the predictive performance of the model. These results further confirm that the proposed model outperforms the other variants.
3.4.3. Visualization
t-SNE was also used to visualize the features learned by three models: (a) with 1DCNN layers only, (b) with 1DCNN layers and CBAM, and (c) with 1DCNN-BiLSTM layers and CBAM, as shown in Figure 12.
The t-SNE visualization in Figure 12 shows that the class boundaries in (b) are clearer than those in (a), although some classes still overlap. The distances between different classes in (c) are larger than those in (b), indicating that the model with the BiLSTM layer clusters samples of the same class tightly while separating samples of different classes well. Its results are better than those of the models with only 1DCNN layers and with 1DCNN layers and CBAM, which again confirms the effectiveness of the proposed model.