4.2. Selection of Weighted Permutation Entropy Parameters
In the calculation of the weighted permutation entropy, three parameters need to be considered: the length of the time series, the embedding dimension, and the delay time. Different parameter settings have a certain impact on the calculated entropy values.
Firstly, to verify the effect of the delay time on the calculated WPE values, the WPE values of a normal bearing vibration signal of length 2048 are plotted against the embedding dimension for different delay times. Figure 6 shows the WPE and PE curves of normal bearing signals with different delay times and different embedding dimensions; the plotted values are normalized. As shown in Figure 6, the WPE value of the bearing signal is only minimally affected when the delay time is varied. Considering the computation time, a fixed delay time is therefore used for the WPE calculations in this paper. It can also be seen that the WPE curves lie below the PE curves, which indicates that the normal signal is more regular and has fewer abrupt changes.
For a d-dimensional system, the phase space is guaranteed to accommodate the features of the original state space when the embedding dimension is at least 2d + 1 [16]. If the embedding dimension is too small, the system characteristics are not well represented, whereas if it is too large, the effect of noise is amplified as the embedding dimension grows and the computation time of the algorithm is prolonged, which is not beneficial in practical applications. Therefore, the selection of the embedding dimension is particularly important. Cao and Bandt [17,18] suggested that permutation entropy can best characterize the dynamic properties of a time series when the embedding dimension is chosen from a small recommended range; the embedding dimension of the WPE should therefore be chosen from the same range.
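To make the quantity being tuned here concrete, the following minimal Python sketch computes a normalized WPE value for a single sequence. It follows the standard weighted permutation entropy definition (variance-weighted ordinal-pattern frequencies, normalized by ln(m!)); the function name, NumPy implementation, and argument layout are our own illustration and may differ from the authors' code.

```python
import numpy as np
from math import factorial

def weighted_permutation_entropy(x, m, tau):
    """Normalized weighted permutation entropy of a 1-D sequence.

    Each embedding vector is weighted by its variance, the weighted
    relative frequencies of the ordinal patterns are accumulated, and
    the resulting Shannon entropy is normalized by ln(m!) so that the
    value lies in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau                        # number of embedding vectors
    vectors = np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(n)])
    weights = np.var(vectors, axis=1)                 # variance of each embedding vector
    patterns = np.argsort(vectors, axis=1)            # ordinal pattern of each vector
    freq = {}
    for pat, w in zip(map(tuple, patterns), weights):
        freq[pat] = freq.get(pat, 0.0) + w            # weighted pattern frequencies
    p = np.array(list(freq.values()))
    total = p.sum()
    if total == 0.0:                                  # constant signal carries no information
        return 0.0
    p = p[p > 0] / total
    return float(-np.sum(p * np.log(p)) / np.log(factorial(m)))
```

With larger m, a vector admits up to m! distinct ordinal patterns, which is why the computation time grows quickly with the embedding dimension, as discussed below.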
Figure 7 shows the WPE and PE curves of normal bearing signals with different data lengths and different embedding dimensions at a fixed delay time. The data lengths are 256, 512, 1024, 2048, and 4096. As shown in Figure 7, the WPE values change only slightly with the data length for the smaller embedding dimensions, whereas at the largest embedding dimension the WPE values of the normal bearing signals of the several lengths begin to decrease significantly. Moreover, Figure 8 shows the curves of the sample standard deviations of the WPE and PE values of the different lengths at fixed embedding dimensions. Among the errors of the WPE and PE of the normal bearing signals of the several lengths, those at the embedding dimension selected here are the smallest. A vector of m elements can have at most m! permutation patterns, and the more patterns there are, the longer the calculation time. Taking into account the computational time and accuracy, the signal length is chosen to be 2048, and the corresponding embedding dimension is chosen as the best parameter for the simulations in Section 4.3.
After truncating the data of the rolling bearing vibration signals of ten states (Normal, I1, I2, I3, O1, O2, O3, B1, B2, B3), consisting of one normal bearing signal and three damage diameters for each of three fault types (inner-race fault, outer-race fault, and rolling ball fault) of the same bearing SKF 6205-2RS, detection parameters need to be obtained to construct the feature matrix. In this paper, wavelet decomposition and weighted permutation entropy are used to extract the features from the sample data.
Table 1 shows the average accuracy of the training set obtained by the ELM with different numbers of decomposition layers and different wavelet basis functions under five-fold cross-validation for the PE and WPE algorithms. In Table 1, 'Basis' is the wavelet basis function; 'PE' and 'WPE' denote the permutation entropy and weighted permutation entropy algorithms; 'N' is the number of wavelet decomposition layers; and 'Positive mean (PM)' is the average accuracy, i.e., the total number of correct predictions over all signals of the seven types divided by the total number of signals of the seven types. The average accuracy PM of the training set is obtained by five-fold cross-validation of a total of 350 groups of normal and six types of fault signals; that is, the 50 groups of each signal type are divided into five folds of 10 groups each, which are used in turn for training and validation. For the number of hidden layer nodes K, as K increases from 1 to 100, we search in each fold for the range of K giving the highest training accuracy and then take the union of these ranges over the folds, so as to obtain the range of K at the highest mean detection accuracy. It can be seen that under WPE the accuracy of the training set is 100%, whereas under PE the variation between different numbers of decomposition layers is below 1.00% except for N = 8. If the wavelet decomposition scale is too large, it increases the complexity of the computation, and if it is too small, the signal decomposition is incomplete, so N < 4 is not considered. Therefore, a four-layer wavelet decomposition (N = 4) is selected to decompose the fault signals. The accuracy of the training set under the PE algorithm with different wavelet bases was then examined with the number of decomposition layers fixed at N = 4. The variation was 0.85%, so the influence of the wavelet basis function on the accuracy is small. 'db3' is therefore chosen as the basis function of the wavelet decomposition in this paper.
The 'db3' wavelet basis function is used for a four-layer wavelet decomposition; the approximation and detail components at the different scales are then extracted, and their weighted permutation entropies are calculated to construct the feature matrix.
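As an illustration of this feature construction, the following Python sketch builds the eight-element feature vector (the WPE of CA1–CA4 and CD1–CD4) for one sample. It assumes the PyWavelets package for the decomposition and reuses the weighted_permutation_entropy sketch given earlier; the function name wpe_feature_vector and the explicit m and tau arguments are our own illustration.

```python
import numpy as np
import pywt  # PyWavelets

def wpe_feature_vector(signal, m, tau, wavelet='db3', levels=4):
    """Concatenate the WPE of the approximation (CA) and detail (CD)
    coefficients of a multi-level orthogonal wavelet decomposition."""
    ca = np.asarray(signal, dtype=float)
    approx, detail = [], []
    for _ in range(levels):
        ca, cd = pywt.dwt(ca, wavelet)   # split the current approximation into CA and CD
        approx.append(ca)
        detail.append(cd)
    coeffs = approx + detail             # [CA1, ..., CA4, CD1, ..., CD4]
    return np.array([weighted_permutation_entropy(c, m, tau) for c in coeffs])
```

Stacking these vectors for all samples of all signal types yields the feature matrix that is passed to the ELM classifier.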
There are 500 samples of length 2048 for the 10 signals (Normal, I1, I2, I3, O1, O2, O3, B1, B2, B3) corresponding to the different damage degrees of the three fault types (inner-race fault, outer-race fault, and rolling ball fault) of the same bearing SKF 6205-2RS, with 50 groups of samples for each damage level. To ensure the accuracy of diagnosis and classification, the data are divided proportionally into a training set and a test set; that is, the numbers of samples in the training set and test set for each type of fault signal are 40 and 10, respectively. In total, the training set contains 400 samples and the test set contains 100 samples across the different signals. The different damage degrees of the rolling ball fault, inner-race fault, and outer-race fault are classified and identified separately, with the normal signal serving as the control; then, the different damage degrees of all fault signals are grouped together for classification and recognition.
In this paper, we first use 350 samples of length 2048 from seven types of signals (Normal, I1, I2, O1, O2, B1, B2) with different damage degrees of the inner-race, outer-race, and rolling ball faults to determine the number of wavelet decomposition layers and the wavelet basis function; this gives 280 samples for training and 70 samples for testing.
4.3. Experimental Results and Analysis
In the process of orthogonal wavelet decomposition, the low-frequency part is decomposed into two parts at each level: an approximation coefficient vector (CA) and a detail coefficient vector (CD). The information lost between two consecutive approximation coefficient vectors is retained in the detail coefficients. The approximation coefficient vector is then further decomposed into two parts, whereas the detail coefficient vector is not decomposed further.
Figure 9 shows the approximation coefficient curves of the four-layer wavelet decomposition based on 'db3' of a normal bearing signal. Applying the WPE method with the parameters selected above, we obtain WPE (CA1, CA2, CA3, CA4) = [0.9008, 0.9583, 0.8362, 0.8384]. In Figure 9, the four panels from top to bottom correspond to wavelet decomposition layers 1–4. The amplitude of the waveform of CA1 in the first panel of Figure 9 is relatively small compared with that of CA2. Although the signal contains many fluctuations, and the embedding vectors of the phase-space reconstruction matrix show many changing trends, there are no large abrupt mutations, and the WPE value is 0.9008, which reflects that the time series carries a large amount of information. Although the waveforms of CA1 and CA2 look similar at first sight, the amplitude fluctuations become larger in the second panel (CA2) of Figure 9, so the WPE value becomes larger. The waveform of CA3 in Figure 9 shows fewer changing trends in the data, and its WPE is smaller than that of CA2. Compared with the waveform of CA3, the waveform of CA4 in Figure 9 misses many small fluctuations and its magnitude changes are not very substantial, so its WPE is similar to that of CA3. The missing information can be found in the detail coefficient curves in Figure 10. With the same parameters, we obtain WPE (CD1, CD2, CD3, CD4) = [0.7552, 0.8009, 0.8094, 0.8108]. The magnitudes of the CD1 signal in Figure 10 are very small, whereas the maximum magnitude of the CD2 signal reaches 0.1; therefore, the WPE value of the CD2 sequence is larger than that of CD1 for the normal bearing signal.
At the same time, with the same parameters we obtain PE (CA1, CA2, CA3, CA4) = [0.9467, 0.9776, 0.8899, 0.8708] and PE (CD1, CD2, CD3, CD4) = [0.7715, 0.8446, 0.8798, 0.8639]. The PE values reflect the information of the permutation types in the signals and are not affected by the magnitudes and large mutations in the signals. For simplicity, we use WPE (CA) to denote WPE (CA1, CA2, CA3, CA4), WPE (CD) to denote WPE (CD1, CD2, CD3, CD4), and WPE (CA, CD) to denote WPE (CA1, CA2, CA3, CA4, CD1, CD2, CD3, CD4); PE (CA), PE (CD), and PE (CA, CD) are defined analogously.
Figure 11 shows the curves of the mean values of the WPE and PE of all the CA and CD sequences of the four-layer wavelet decomposition of 200 groups of normal bearing signals and inner-race fault signals of different levels (I1, I2, and I3), computed with the same parameters. The data set of each type contains 50 groups of data of length 2048. The horizontal axis labeled 'Data index' denotes the sample number. There are large differences between the curves of the mean WPE (WPEmean) of the normal signal and of the inner-race fault signal I3 in Figure 11a, but it is difficult to separate the WPEmean curve of I2 from that of the normal signal. In Figure 11b, the curves of the mean PE (PEmean) of I1, I2, and the normal bearing signal can be separated from each other, but it is hard to separate the curve of I3 from that of the normal bearing signal. Therefore, the WPEmean and PEmean parameters are not used in the simulations of the classification method in this paper.
We focus on these four types of bearing signals; each data set has 50 groups of data of length 2048, and we use the first 40 groups of each type for training and the last 10 groups for the testing simulations with the ELM. The test data set of each type thus contains 10 groups of data of length 2048.
Figure 12 shows the distributions of the WPE and PE values of all the CA sequences of the four-layer wavelet decomposition of 40 groups of normal bearing signals and inner-race fault signals of different levels (I1, I2, and I3), computed with the same parameters, for the test of the fifth fold of the five-fold cross-validation. The horizontal axis labeled 'Data index' denotes the sample number. In Figure 12a, the normal bearing signals can be separated from I2 and I3, but it is hard to tell them apart from the I1 signals. In Figure 12b, the normal bearing signals can be separated from I3, but it is hard to tell them apart from the I1 and I2 signals. In Figure 12c, the normal bearing signals can be separated from I2 and I3, but it is hard to tell them apart from the I1 signals. In Figure 12d, there are only small differences among the WPE values of the normal bearing signal and the three levels of inner-race fault signals.
Figure 13 shows the distributions of the test WPE values of all the CD sequences of the four-layer wavelet decomposition of 40 groups of normal bearing signals and inner-race fault signals of different levels (I1, I2, and I3), computed with the same parameters, for the test of the fifth fold of the five-fold cross-validation. The WPE values of the detail coefficients of each layer reveal part of the characteristics of the signals in each subgraph. Therefore, we use the WPE values of CA1, CA2, CA3, CA4, CD1, CD2, CD3, and CD4 to construct a feature vector for classification by the ELM.
Figure 14 and Figure 15 show the distributions of the PE values of all the CA and CD sequences of the four-layer wavelet decomposition based on 'db3' of 40 test groups of normal bearing signals and inner-race fault signals of different levels (I1, I2, and I3), computed with the same parameters, for the test of the fifth fold of the five-fold cross-validation. The PE values of the CA or CD signals of any single layer cannot separate the normal bearing signals from the inner-race fault signals I1, I2, and I3.
Table 2 shows the training and testing accuracies of the classification methods with different detection parameters for the normal bearing signals and the inner-race fault signals I1, I2, and I3, based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM. K is the number of nodes in the hidden layer.
Figure 16 shows the corresponding histogram of the training and testing accuracies. We divide the training and test sets for each signal in a ratio of 4:1, i.e., sets 1–40 form the training set and sets 41–50 form the test set for each signal. For classification with the ELM, we first perform five-fold cross-validation to find the range of values of the number of hidden neurons K, and then take the minimum value of this range (K-min) as the number of hidden neurons; the transfer function of the ELM is a sigmoidal function. With K-min hidden neurons, the training and testing accuracies of the classification methods listed in Table 2 are recorded from one simulation run. From Table 2, we can see that the classification methods based on the feature vectors WPE (CA, CD), WPE (CD), and PE (CA, CD) have the best performance, with accuracies of 100%.
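A minimal sketch of this classification step is given below, assuming a standard single-hidden-layer ELM with random input weights, a sigmoid activation, and output weights obtained from the Moore-Penrose pseudo-inverse; the function names and the NumPy implementation are our own illustration, and the five-fold cross-validation used to select K would simply call these functions inside an outer loop over candidate values of K.

```python
import numpy as np

def train_elm(X, y, n_hidden, n_classes, seed=None):
    """Train a basic ELM classifier: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoidal hidden-layer output
    T = np.eye(n_classes)[y]                                 # one-hot class targets (y in 0..n_classes-1)
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution for output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Predict class labels for the feature matrix X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

Here X would be the matrix of WPE (or PE) feature vectors and y the integer labels of the bearing states.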
Figure 16 shows, as a histogram, the maximum and minimum training and testing accuracies of each kind of feature vector for the classification of the normal bearing signals and the inner-race fault signals I1, I2, and I3 based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM.
Figure 17 and Figure 18 show the distributions of the WPE values of all the CA and CD sequences of the four-layer wavelet decomposition based on 'db3' of 40 test groups of the normal bearing signals and the fault signals I1, O1, and B1, computed with the same parameters. The WPE values of the CA or CD signals of any single layer cannot separate the normal bearing signals from the three types of fault signals I1, O1, and B1.
Figure 19 and Figure 20 show the distributions of the PE values of all the CA and CD sequences of the four-layer wavelet decomposition based on 'db3' of 40 test groups of the normal bearing signals and the fault signals I1, O1, and B1, computed with the same parameters. The PE values of the CA or CD signals of any single layer cannot separate the normal bearing signals from the three types of fault signals I1, O1, and B1; only in Figure 20a can the PE values of the CD1 signals be separated from each other.
Table 3 shows the training and testing accuracies of the classification methods with different detection parameters for the normal bearing signals and the three types of fault signals I1, O1, and B1, based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM. K is the number of nodes in the hidden layer.
Figure 21 shows the corresponding histogram of the training and testing accuracies. We divide the training and test sets for each signal in a ratio of 4:1, i.e., sets 1–40 form the training set and sets 41–50 form the test set for each signal. For classification with the ELM, we first perform five-fold cross-validation to find the range of values of the number of hidden neurons K, and then take the minimum value of this range (K-min) as the number of hidden neurons; the transfer function of the ELM is a sigmoidal function. With K-min hidden neurons, the training and testing accuracies of the classification methods listed in Table 3 are recorded from one simulation run. From Table 3, we can see that the classification methods based on the feature vectors WPE (CA, CD), WPE (CD), PE (CA, CD), and PE (CD) have the best performance.
Figure 21 shows, as a histogram, the maximum and minimum training and testing accuracies of each kind of feature vector for the classification of the normal bearing signals and the fault signals I1, O1, and B1 based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM.
Table 4 shows the training and testing accuracies of the classification methods with different detection parameters for the normal bearing signals and the six types of fault signals I1, I2, O1, O2, B1, and B2, based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM. K is the number of nodes in the hidden layer.
Figure 22 shows the corresponding histogram of the training and testing accuracies. We divide the training and test sets for each signal in a ratio of 4:1, i.e., sets 1–40 form the training set and sets 41–50 form the test set for each signal. For classification with the ELM, we first perform five-fold cross-validation to find the range of values of the number of hidden neurons K, and then take the minimum value of this range as the number of hidden neurons; the transfer function of the ELM is a sigmoidal function. From Table 4, we can see that the classification method based on the feature vector WPE (CA, CD) has the best performance: with 37 hidden neurons, its training and testing accuracies reach 98.57% in one simulation run. For the feature vector PE (CA, CD), with 60 hidden neurons, the training accuracy reaches 98.21% and the testing accuracy is 97.14%. The other feature vectors listed in Table 4 perform worse with the ELM.
Figure 22 shows, as a histogram, the maximum and minimum training and testing accuracies of each kind of feature vector for the classification of the normal bearing signals and the fault signals I1, I2, O1, O2, B1, and B2 based on WPE and PE of the four-layer 'db3' wavelet decomposition and the ELM.
As shown in Table 5, we compare the average runtime and average accuracy of the algorithm in this paper with those of the method of Yang Y. [15]. In that paper, the signals Normal, I1, I3, O1, O3, B1, and B3 are used for simulations based on the multifractal detrended fluctuation analysis (MFDFA) and singularity power spectrum (SPS) method with the ELM, denoted MFDFA-SPS+ELM. The cross-validation used for the ELM in that paper is random validation: first, 40 of the 50 sets of each signal are randomly selected as the training set and the remaining 10 sets form the test set; second, the number of hidden nodes is set to 50; finally, the average of the accuracies obtained from five runs is taken as the average accuracy. In Table 5, we use Normal, I1, I2, O1, O2, B1, and B2 for the calculations based on PE or WPE with wavelet decomposition, classified by the ELM with the same random cross-validation as in [15]. We also perform calculations with the same data (Normal, I1, I3, O1, O3, B1, and B3) as in Yang Y.'s paper and compare the average runtimes and average accuracies of the methods in this paper. The difference between the level-1 and level-3 faults is larger than that between the level-1 and level-2 faults. Comparing in Table 5 the average accuracies obtained with the same method on the two data types, the average accuracy for Type 1 is slightly lower than that for Type 2, and the average runtimes for the two data sets are close to each other. For the Type 2 data, the average accuracy of PE (CA, CD)+ELM reaches 96.57%, whereas that of WPE (CA, CD)+ELM reaches 99.71%; the highest accuracy with the MFDFA-SPS method is 99.25%. Comparing WPE (CA, CD)+ELM and MFDFA-SPS+ELM with the same classification method and the same parameters, the difference in average accuracy is 0.46%, but WPE (CA, CD)+ELM has a faster runtime.