1. Introduction
The detection of anomalies is gaining importance in industrial scenarios, driven by the relatively cheap solutions that the “Industry 4.0” revolution has made available for data acquisition, storage, cloud computing, and efficient machine learning algorithms. Hence, the growing interest in making adequate data-driven decisions is driving the recent development of several intelligent computing methods for anomaly detection (AD) in industry [1].
Industrial facilities deteriorate over time due to usage and normal wear. Such deterioration must be identified early to prevent damage and losses and to maximize machine productivity and remaining useful life (RUL) while reducing maintenance and repair costs. Machinery is therefore usually instrumented with sensors dedicated to monitoring the performance of industrial components. The data collected by these sensors may include temperature, pressure, vibration, acoustic emission, and motor current [2]. Industrial damage detection typically deals with defects in mechanical components, such as motors and turbines, as well as in pipeline oil flow and physical structures.
Most existing machine condition monitoring techniques involve sample/signal analysis and use an indicator to detect faults by comparing it with a predefined alarm threshold. This approach requires expertise in data analysis and knowledge of the machine and its operation. Moreover, the threshold-based technique does not ensure the detection of all possible faults, since some of them involve complex behaviors, such as unwanted oscillations or abnormally increasing/decreasing speed of the monitored signals, which remain within the thresholds. Recent progress in Artificial Intelligence (AI) technologies has brought several advantages by enabling automated fault detection procedures based on the available data.
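The limitation of fixed thresholds can be illustrated with a minimal sketch. The signal, the [40, 60] alarm band, and the window length below are hypothetical values chosen for illustration. The windowed standard deviation is just one simple statistic that reacts to an oscillation confined inside the band; it is not a method proposed in this work.

```python
import math


def threshold_alarm(signal, low, high):
    """Fixed-threshold alarm: fires only if some sample leaves [low, high]."""
    return any(x < low or x > high for x in signal)


def windowed_std(signal, window):
    """Population standard deviation of each non-overlapping window."""
    stds = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        mean = sum(chunk) / window
        stds.append(math.sqrt(sum((x - mean) ** 2 for x in chunk) / window))
    return stds


# Hypothetical monitored signal: constant at 50 for 100 samples, then an
# unwanted oscillation (amplitude 4, period 10 samples) that never leaves [40, 60].
steady = [50.0] * 100
oscillating = [50.0 + 4.0 * math.sin(2 * math.pi * n / 10) for n in range(100)]
signal = steady + oscillating

alarm = threshold_alarm(signal, low=40.0, high=60.0)
stds = windowed_std(signal, window=20)
print("threshold alarm fired:", alarm)
print("window std:", [round(s, 2) for s in stds])
```

The threshold alarm stays silent for the whole record. The per-window spread, however, jumps from zero to about 2.83 (that is, 4/√2) once the oscillation starts.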
The lack of reliable labeled data is a typical issue in many industrial environments, where a labeled set of anomalous data instances covering all possible types of anomalous behavior is usually more challenging to obtain than a labeled set of normal data. In addition, anomalous behavior is often dynamic (new types of anomalies might arise) and can sometimes result in a catastrophic event (see, e.g., [3]). For these reasons, semi-supervised and unsupervised learning approaches are the most widely applicable in the industrial domain.
Hence, in this work, an unsupervised approach for AD is investigated and applied to a typical fault detection problem in the industrial field, i.e., bearing diagnosis. As some of the most relevant rotating machinery components, rolling element bearings play an essential role in the proper functioning of various industrial facilities. Bearings are the best location for measuring machinery vibration, since they are where the machine’s primary dynamic loads and forces are applied. Therefore, the condition monitoring and fault diagnosis of this critical component can be representative of the machine’s overall condition.
The methodology is applied to the Case Western Reserve University (CWRU) benchmark dataset, which is widely used for developing bearing fault detection approaches. The dataset provides vibration data collected from normal and faulty bearings. Most approaches applied to the CWRU bearing dataset are based on supervised learning and aim to classify different bearing states. This work instead outlines a methodology based on unsupervised learning to monitor the bearing health state online and to detect possible anomalous behaviors, so that preventive countermeasures can be undertaken as soon as an anomaly occurs.
Two different approaches are analyzed for this purpose. The first is based on the manual extraction of typical vibration metrics, which are provided as input to a machine learning (ML) algorithm. The second is based on a deep learning (DL) method able to automatically extract useful features from raw data. The two approaches are compared by testing a representative method for each class of methods and evaluating their effectiveness, efficiency, strengths, and weaknesses in an industrial context.
The article is organized as follows. Section 2 provides a brief background analysis and a literature review of the standard anomaly detection methods applied to the industrial domain. In Section 3, the dataset and the applied methods are described in detail, whereas in Section 4 the obtained results are presented. In Section 5, conclusions are drawn from this study and further research developments are outlined.
2. Background and Related Work
As a rule, AD methodologies aim to identify data patterns that differ significantly from the expected normal behavior [4]. These nonconforming patterns are most often referred to as anomalies or outliers, sometimes interchangeably. Several key aspects need to be considered in developing an adequate AD methodology.
The nature of the input data is a primary factor for an AD technique. The input is generally a collection of data instances. Each data instance can be described using a set of attributes (also referred to as variables, characteristics, features, or dimensions) and might consist of only one attribute (univariate data) or multiple attributes (multivariate data). The attributes can be of different types, such as binary, categorical, or continuous.
Another essential aspect of AD problems is the type of anomaly. Anomalies can be classified into point, contextual, and collective anomalies [5]. An individual data instance is called a point anomaly if it can be considered anomalous with respect to the rest of the data. This is the simplest type of anomaly and the most popular in anomaly detection research. A contextual anomaly occurs if a data instance is anomalous in a specific context but not otherwise. In this case, the contextual and behavioral attributes of the data instances must be specified as part of the problem formulation. A collective anomaly is a collection of related data instances that is anomalous with respect to the entire dataset. In this latter case, the individual data instances in the collection may not be anomalies by themselves, but their joint occurrence as a collection is anomalous.
The label of a data instance determines whether that instance is classified as normal or anomalous. The availability of labeled data for training and validating the models used by AD techniques is usually a significant issue. Labeling is often done manually by a human expert, and obtaining an accurate and well-representative labeled training dataset can be prohibitively expensive. Hence, depending on the availability of labeled data instances, AD techniques operate in supervised, semi-supervised, or unsupervised mode. Supervised techniques assume the availability of a training dataset with labeled instances for both the normal and anomaly classes. A typical approach in such cases is to build a predictive model for normal vs. anomaly classes and use it to determine the class to which any unseen data instance belongs. Unsupervised techniques assume that the training data consist wholly of normal samples, i.e., samples belonging to the class chosen as normal. On the other hand, semi-supervised AD refers to settings where, in addition to training samples belonging only to the normal class, a few labeled anomalies are available and are considered by the training objective to help improve detection [6].
Another key factor for any AD technique is how anomalies are reported. Typically, the outputs may be scores or labels. Scoring methods define a so-called Anomaly Score (AS), a quantitative index assigned to each test instance according to its degree of ‘outlierness’. Thus, the output of such methods is a ranked list of anomalies. An analyst may choose to analyze the top few anomalies or use a cutoff threshold to select the most relevant ones. In industrial applications, the AS can be exploited as a health indicator for real-time system monitoring: inspections or other maintenance actions can be undertaken when the AS exceeds a predetermined threshold. Other methods directly assign a label (normal or anomalous) to each test instance.
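The following minimal sketch turns anomaly scores into labels with a cutoff threshold. It assumes that higher scores mean more anomalous behavior, as with a reconstruction loss. The threshold is set, hypothetically, at an empirical quantile of the scores obtained on normal training data. The 0.99 quantile is an illustrative choice, not a value from this work.

```python
def fit_threshold(normal_scores, quantile=0.99):
    """Cutoff = empirical quantile of anomaly scores computed on normal data."""
    ranked = sorted(normal_scores)
    index = min(int(quantile * len(ranked)), len(ranked) - 1)
    return ranked[index]


def to_labels(scores, threshold):
    """1 = anomalous (score above the threshold), 0 = normal."""
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical scores: 100 normal training windows, then 4 new test windows.
normal_train = [i / 100 for i in range(100)]
threshold = fit_threshold(normal_train)
labels = to_labels([0.5, 0.98, 1.2, 3.0], threshold)
print(threshold, labels)
```

Tying the threshold to normal data only keeps the procedure unsupervised. The quantile directly controls the expected false alarm rate on healthy machines.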
AD approaches include model-based and data-driven methods. However, the complexity of present-day industrial systems and the rising cost of model building limit the use of model-based techniques. Recently, data-driven methods have gained attention because of their ability to derive models without prior knowledge of the application domain. This trend has been further fostered by recent advances in sensor technologies and data processing and by the growth of computational power.
Several data-driven algorithms have been applied to AD tasks [7]. Unsupervised rather than supervised methods are often adopted in industrial applications because, in many realistic cases, it is impossible to collect a representative set of defective data covering the various faulty operating conditions.
Classical ML approaches have been successfully adopted for unsupervised AD. In particular, One-Class Support Vector Machines (OC-SVMs) are among the most robust unsupervised outlier detection algorithms. They have been successfully used for fault diagnosis by the ML community over the past 15 years, owing to their excellent generalization ability and high accuracy with a relatively small number of samples [8].
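For reference, the standard ν-formulation of the OC-SVM separates the training data, mapped into a feature space by φ, from the origin with maximum margin. The kernel k, the feature map φ, and the parameter ν are generic here. This sketch does not assume the specific kernel or hyperparameters used in this work.

```latex
\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\;
  \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad
  \langle\mathbf{w},\phi(\mathbf{x}_i)\rangle \ge \rho - \xi_i,\;\; \xi_i \ge 0,
  \quad i = 1,\dots,n

f(\mathbf{x}) = \operatorname{sgn}\!\Big(\sum_{i=1}^{n}\alpha_i\,k(\mathbf{x}_i,\mathbf{x}) - \rho\Big)
```

Here ν ∈ (0, 1] is an upper bound on the fraction of training samples treated as outliers and a lower bound on the fraction of support vectors. The sign f(x) = +1 marks a sample as normal and −1 as anomalous. The argument of the sign function provides a continuous score that can be ranked or thresholded.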
In [9], an automatic method for bearing fault detection and diagnosis is presented based on a one-class SVM trained using only historical data under normal conditions.
The work in [10] proposes novel data-driven architectures, trained exclusively on healthy signals and combining LSTM regressors with OC-SVM classifiers, to automatically identify any abnormal mechanical behavior captured by vibration measurements.
The scientific community has increasingly adopted unsupervised DL methods, since they can outperform conventional ML techniques. The ability of such techniques to process raw data is limited by the need for careful engineering and considerable domain expertise to transform the data into a suitable internal representation or feature vector. Conversely, DL methods are representation learning methods that allow a machine to be fed with raw data and to automatically discover the representations needed for detection or classification [11]. Several DL approaches for unsupervised AD tasks exploit the autoencoder (AE) framework. The study in [12] proposes an intelligent fault diagnosis approach that uses a deep neural network (DNN) based on a stacked denoising autoencoder applied to unlabeled data. The authors of [13] propose an unsupervised method for diagnosing faults of electric motors using a novelty detection approach based on deep AEs. In [14], a comparison of different AE architectures, including vanilla and variational AEs with different types and numbers of layers, is presented for AD in a real industrial manufacturing furnace.
In [15], an innovative DL-based model using AEs is developed that improves anomaly detection accuracy in bearing condition monitoring.
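When an AE is used for AD, a common choice of anomaly score is the reconstruction error. A minimal formulation assumes an encoder f_φ, a decoder g_θ, d-dimensional inputs, and a mean squared error loss. The exact loss used in this work is not specified here.

```latex
\hat{\mathbf{x}} = g_{\theta}\big(f_{\phi}(\mathbf{x})\big), \qquad
\mathrm{AS}(\mathbf{x}) = \frac{1}{d}\sum_{j=1}^{d}\big(x_j - \hat{x}_j\big)^{2}

(\phi^{*},\theta^{*}) = \arg\min_{\phi,\theta}\;
  \frac{1}{N}\sum_{n=1}^{N}\mathrm{AS}\big(\mathbf{x}^{(n)}\big),
  \qquad \mathbf{x}^{(n)} \in \mathcal{D}_{\text{normal}}
```

Because the network is trained only on normal data, inputs that differ from the training distribution tend to be reconstructed poorly and therefore receive a higher AS.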
AD finds extensive use in various application domains, especially industrial damage detection. Among all the publicly available datasets, the CWRU Bearing Data Center data have become a standard reference for testing new bearing diagnostic algorithms [16,17]. In most of the research papers that use the CWRU bearing dataset, the aim is to apply supervised learning approaches to classify the fault category. Moreover, a central concern in the literature is the diagnosability of the individual data records: although classic bearing fault features dominate some signals, others are unclear or display other fault symptoms. The study in [18] provides a benchmark analysis applying three diagnostic methods based on the squared envelope spectrum. The authors found that the data records ranged from easily diagnosable to undiagnosable with the applied methods, and that ball faults are among the most difficult to diagnose. The same outcome is confirmed in [19], where envelope analysis is applied to the signal filtered in the frequency band carrying the most diagnostic information.
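The squared envelope spectrum (SES) used by the methods in [18,19] can be summarized as follows. Here x(t) is the (possibly band-pass-filtered) vibration signal, 𝓗 denotes the Hilbert transform, and 𝓕 the Fourier transform. The notation is introduced here for illustration only.

```latex
x_a(t) = x(t) + j\,\mathcal{H}\{x\}(t), \qquad
\mathrm{SES}(\alpha) = \Big|\,\mathcal{F}\big\{\lvert x_a(t)\rvert^{2}\big\}(\alpha)\Big|
```

Bearing faults that produce repetitive impacts appear as peaks in the SES at the corresponding fault characteristic frequencies and their harmonics. When these peaks are absent or masked, as is often the case for ball faults, a record is hard to diagnose with this kind of technique.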
In the present work, both OC-SVM and AE-based approaches are implemented, and their performance for unsupervised fault detection on the CWRU bearing dataset is assessed and compared.
4. Results and Discussion
First, the models’ performance is assessed using some of the most common evaluation metrics, listed below.
Accuracy: the fraction of correctly predicted instances over the total number of instances.
Precision: the fraction of instances predicted as positive that are actually positive, useful when the cost of false positives is high.
Recall: the fraction of positive instances correctly predicted by the model out of all the positive instances in the data, useful when the cost of false negatives is high.
F1 Score: the harmonic mean of precision and recall, a balanced measure that accounts for both false positives and false negatives.
AUC-ROC: the “Area Under the Receiver Operating Characteristic” curve, i.e., the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance [37].
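The metrics above can be computed with a short, dependency-free sketch. It assumes that the faulty (anomalous) class is the positive class, which is consistent with the use of recall as the fault detection rate in this section. The counts, labels, and scores in the example are hypothetical.

```python
from itertools import product


def metrics_from_counts(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumed convention: the anomalous (faulty) class is the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a random
    negative, counting ties as one half (O(P*N), fine for small test sets)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


m = metrics_from_counts(tp=90, fp=10, tn=90, fn=10)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(m, auc)
```

The pairwise AUC mirrors the probabilistic definition given above. For larger test sets, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.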
Table 8, Table 9 and Table 10 show the values of the evaluation metrics for the two approaches, for each anomaly class and load condition. Test performance is computed on five different random train-test splits; therefore, metric values are reported as mean and standard deviation. The results reveal that, on average, both methods perform well. In almost all cases, OC-SVM performs better in detecting bearing inner race and outer race defects.
At the same time, the AE reconstruction method performs better in the diagnosis of ball faults. For this type of anomaly, the ML approach shows lower accuracy and recall values when the fault diameter is 0.007 inch. It is worth noting that, unlike OC-SVM, the AE-based approach achieves 100% recall (FN = 0) in all experiments, i.e., it recognizes every faulty sample.
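Reporting the mean and standard deviation over the five random splits amounts to the following sketch. The five recall values are hypothetical. The section does not state which estimator is used, so the sample (n − 1) standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Mean and sample standard deviation across repeated train-test splits."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical recall values from five random train-test splits.
recalls = [0.97, 0.99, 0.98, 1.00, 0.96]
mean, std = summarize(recalls)
print(f"recall = {mean:.3f} ± {std:.3f}")
```

With only five splits, the choice between the sample and the population estimator changes the reported spread by a factor of √(5/4), so it is worth stating explicitly.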
For a better understanding, the results of the compared methods are also analyzed in terms of AS and confusion matrices. The confusion matrices show the numbers of true positives, true negatives, false positives, and false negatives, and thus help to understand the types of errors each model makes. Specifically, the OC-SVM score samples and the AE loss are used as anomaly scores, and their values are plotted for 200 normal and 200 faulty test samples. Some of the most representative results are shown in Figure 7, Figure 8 and Figure 9. In these figures, each black dot corresponds to a normal sample, blue, green, and red dots correspond to anomalous samples, and yellow diamonds, if present, mark false positives and false negatives.
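Once a threshold is fixed, the confusion-matrix counts and the misclassified points highlighted in the score plots can be derived from the anomaly scores. The sketch below assumes label 1 for faulty samples and label 0 for normal ones. The scores and the threshold are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count TP, FP, TN, FN, treating label 1 (fault) as the positive class."""
    names = {(1, 1): "TP", (0, 1): "FP", (0, 0): "TN", (1, 0): "FN"}
    counts = Counter(names[(t, p)] for t, p in zip(y_true, y_pred))
    return {k: counts.get(k, 0) for k in ("TP", "FP", "TN", "FN")}


def misclassified(y_true, y_pred):
    """Indices of false positives and false negatives (e.g., to mark them in a plot)."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical anomaly scores for 4 normal (0) and 4 faulty (1) test samples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.9, 0.3, 1.5, 0.4, 2.0, 1.1]
threshold = 0.8  # hypothetical cutoff
y_pred = [1 if s > threshold else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
errors = misclassified(y_true, y_pred)
print(cm, errors)
```

Here sample 2 is a false positive (a normal sample scored above the cutoff) and sample 5 is a false negative.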
The confusion matrices in Figure 10, Figure 11 and Figure 12 further detail the results in Table 8, Table 9 and Table 10. As expected, the worst case is the OC-SVM method, which presents many false negatives (FN = 51) when detecting the 0.007-inch bearing ball fault. The resulting low recall means that the algorithm confuses small ball defects with the normal mode.
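As a worked example of how false negatives lower recall, the calculation below assumes that the confusion matrix covers the same P = 200 faulty test samples used in the score plots. This is an assumption; it is not stated for the confusion matrices.

```latex
\mathrm{Recall} = \frac{TP}{TP + FN} = 1 - \frac{FN}{P}
  = 1 - \frac{51}{200} = 0.745
```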
A detailed analysis of the data shows that both approaches accurately distinguish high-severity faults from normal behavior. In Figure 8, the red dot corresponding to the highest AE loss (sample #261, Fault_IR021_2) is also the one whose OC-SVM score lies farthest from those of the normal observations. The time- and frequency-domain analysis of sample #261 in Figure 13 highlights high-amplitude vibration, mainly in the 3–4 kHz range, for the 0.021-inch fault class. These amplitudes are much higher than those of the same sample number in the anomaly classes with smaller fault sizes (Fault_IR014_2 and Fault_IR007_2). This evidence is also reflected in the values of the features given as input to the ML model, shown in Figure 14.
On the other hand, the analysis highlights the ML algorithm’s difficulty in correctly ranking fault severity. Figure 15 shows differences in the 3–4 kHz amplitudes of the two compared signals that OC-SVM does not detect, since it assigns a similar AS to both samples. The AE reconstruction method, instead, is more effective in detecting even slight differences in the vibration signature. The low sensitivity of the OC-SVM method is probably due to the small set of extracted features, which can capture only part of the useful information contained in the raw signals.
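Differences in a specific band, such as the 3–4 kHz range discussed above, can be quantified with a band-limited RMS. The sketch below computes it from a naive DFT via Parseval's theorem. The function name, the 12 kHz sampling rate, and the test tone are hypothetical. A real pipeline would typically use an FFT (e.g., `numpy.fft.rfft`) instead of this O(N²) loop.

```python
import cmath
import math


def band_rms(signal, fs, f_low, f_high):
    """RMS of the signal content between f_low and f_high (Hz).

    Uses a naive DFT and Parseval's theorem; O(N^2), for short windows only.
    """
    n = len(signal)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if not (f_low <= freq <= f_high):
            continue
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                 for i, x in enumerate(signal))
        # Bins other than DC and Nyquist have a mirrored negative-frequency twin.
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        total += weight * abs(xk) ** 2
    return math.sqrt(total) / n


fs = 12_000  # hypothetical sampling rate (Hz)
tone = [math.sin(2 * math.pi * 3500 * i / fs) for i in range(120)]  # 3.5 kHz tone
print(band_rms(tone, fs, 3000, 4000))  # energy inside the 3-4 kHz band
print(band_rms(tone, fs, 0, 1000))     # essentially nothing below 1 kHz
```

A band feature of this kind would rank two signals by their 3–4 kHz energy. A small set of broadband features may not capture that distinction.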
Finally, the unsupervised recognition of ball faults demonstrates the better overall performance of the AE-based method. As reported in the literature [18,19], ball faults are among the most difficult to diagnose, as in many cases they do not exhibit the classic spectral symptoms when established bearing diagnostic techniques are used. This explains the poor performance of OC-SVM in detecting small ball defects, since it relies on traditional time-domain and frequency-domain features.
Overall, the ML algorithm underperforms compared with the AE reconstruction method. Nevertheless, the latter still needs further improvement. Varying the model architecture, in terms of layer types, and the hyperparameter settings during training could reduce the false alarm rate on normal samples to zero.
Despite its lower performance, OC-SVM has the advantage of providing additional information that gives insight into the possible root causes of the faults. As shown in Table 4, the SVM models are built on a reduced set of features, which differs for each analyzed anomaly class. This can support anomaly diagnosis by identifying the features most sensitive to a specific type of fault. Monitoring their values in real time enables both the detection and the diagnosis of the fault, providing information about the defect’s location. Evidence is given in Figure 16, where the inner race fault signature appears to be better described by the peak amplitude, whereas kurtosis and RMS levels in the mid-frequency range characterize outer race and ball fault vibration, respectively.
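The time-domain metrics named above (peak amplitude, RMS, kurtosis) can be sketched as follows. These are standard textbook definitions, and the function name is hypothetical. The exact feature definitions, windowing, and frequency bands behind the models in Table 4 are not reproduced here. Kurtosis is computed in its non-excess form, which is about 3 for Gaussian noise and grows for impulsive signals, such as those produced by localized bearing defects.

```python
import math


def time_features(window):
    """Standard time-domain vibration metrics for one (non-constant) window."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    variance = sum(c ** 2 for c in centered) / n   # population variance
    fourth = sum(c ** 4 for c in centered) / n     # fourth central moment
    rms = math.sqrt(sum(x ** 2 for x in window) / n)
    peak = max(abs(x) for x in window)             # peak amplitude
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": fourth / variance ** 2,        # non-excess kurtosis
    }


# Hypothetical windows: a smooth sine vs. a mostly quiet signal with one impact.
sine = [math.sin(2 * math.pi * i / 8) for i in range(64)]
impulsive = [0.0] * 99 + [10.0]
print(time_features(sine))
print(time_features(impulsive))
```

The sine yields a kurtosis of 1.5, whereas the single impact drives it far above 3. This contrast is why kurtosis is a common indicator of impulsive fault signatures.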
5. Conclusions
This study explores the use of two different unsupervised approaches for AD on a typical industrial benchmark dataset, i.e., the CWRU bearing fault dataset. The first approach is based on the manual extraction of typical vibration metrics, which are provided as input to an ML model based on the OC-SVM algorithm. The second, a DL approach, uses AEs to automatically learn a latent representation from raw data.
The evaluation metrics show that OC-SVM performs better than AEs in almost all cases in detecting bearing inner race and outer race defects, but it yields lower accuracy and recall for ball fault detection. Conversely, the AE-based approach achieves 100% recall in all experiments, i.e., a 100% detection rate for all anomaly types, including the hard-to-diagnose ball faults. Both approaches accurately distinguish high-severity faults from normal behavior, although the ML algorithm is less able to rank fault severity properly.
Overall, the AE reconstruction method outperforms the ML algorithm. The main drawback of the DL method is the lack of interpretability of its internal operations, since it operates as a black box. On the other hand, selecting a different feature set for each anomaly class makes it possible to identify the features most relevant to detecting a specific type of fault, improving the diagnosis results. This would make the process explainable to maintenance operators.
Future work may focus on improving the DL model to eliminate false positives and on studying methodologies to make the AE interpretable. The OC-SVM performance could also be improved by extending the number of extracted features, enabling the capture of all relevant and useful information from the raw signals. A further development should make the feature selection process independent of the test set performance.