1. Introduction
Railways are the lifeline of countries, especially developing countries like Pakistan, and serve the public’s transportation needs, as well as being the backbone for trade and supply chains. Over the years, the railway market has grown stronger, offering greater prospects for the public and the country’s economy. As reported in [
1], an increase of 1.3% to 2.4% in the annual growth of the railway industry was observed between 2016 and 2018. However, for the uninterrupted operation of trains and the safety of travelers, high-performance railway operations need to be ensured. The public, including school children, busy commuters, and tourists, travel on trains, and their safety is compromised if railway tracks are not fit for regular operations. Similarly, cargo safety and reliability are pivotal for the supply chain and require fault-free, fault-tolerant railway tracks. Since mechanical and physical wear and tear occur over time, regular inspections are required to minimize the risk of derailment.
Pakistan is a country where a large number of people travel by train with an estimated 70 million doing so from 2018 to 2019 [
2]. However, several major accidents have taken place over the past few years with high human and financial loss. Such accidents occur due to human error and railway tracks wearing out. According to the annual reports by Pakistan Railways [
3], 127 accidents in which trains derailed because of railway track faults were registered between 2013 and 2020. In 2013, a total of 22 trains, including 13 passenger trains and nine goods trains, were derailed. Another 16 passenger trains and 22 goods trains were derailed in 2014, the maximum for any state. In 2015, 37 passenger trains and 37 goods trains had accidents. According to a report on train derailments for the years 2018–2019, nine bogies were derailed near Attock on 16 September 2018 and 22 passengers were badly injured [
4]. On 27 September 2018, near Peshawar, a freight train was derailed, overturning 11 bogies. On 9 June 2019, 23 bogies of a Karachi-bound freight train were derailed near Sukkur.
Railway tracks need proper and timely maintenance; if they fail, it can have a major impact on train operations [
5]. The identification of cracks is important for running the system efficiently. To reduce these negative impacts, a low-cost automated system built on the traditional inspection cart and capable of monitoring the health of the railway track needs to be developed and its feasibility assessed, so that tracks can be diagnosed and maintained efficiently and accurately and accidents avoided. To minimize human error, automated fault detection for railway tracks is essential.
For the continuous operation of trains with a high level of safety and reliability, railway track condition monitoring, in which tracks are continuously inspected to find and repair cracks, is of significant importance. However, monitoring hundreds of thousands of miles of railway track requires a substantial amount of both money and manpower. Even so, human inspection is prone to error, and manual inspection is tedious and biased. In Pakistan, track inspection currently relies on a railway cart from which human experts manually inspect the track and identify where repairs are needed. Owing to the importance of track inspection, this study presents a smart, automated, cost-effective track condition inspection method and makes the following contributions:
An automatic railway track inspection system is presented which can discriminate between three track conditions: wheel burnt, superelevation, and normal track. The intended system triggers an alarm if it detects a defect. Capabilities such as fault localization and integration with IoT for remote fault detection can be added, enabling hot spot identification and reasoning;
A dataset is collected over a period of 1 year, using an ECM-X7BMP microphone to record the acoustic signal. The Mel-frequency cepstral coefficient (MFCC) [6] features extracted from the acoustic signals are then used with different classifiers for the automatic detection of track faults. The scope of this work is confined to detecting railway track faults using acoustic analysis;
Two well-known machine learning classifiers, logistic regression (LR) and support vector machines (SVM), are used, as well as two deep learning approaches including multilayer perceptron (MLP) and convolutional neural network (CNN) for the automatic detection of railway track faults. The performance is evaluated using accuracy, precision, recall, and F1 score.
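As a rough illustration of the MFCC feature extraction named in the contributions above, the following pure-Python sketch computes the coefficients for a single audio frame. The sampling rate, frame length, number of mel filters, and number of coefficients are illustrative assumptions and are not taken from this study; a practical pipeline would normally rely on an established audio library rather than a hand-written transform.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(i * top_mel / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_coeffs=13):
    """MFCCs of one frame; all default parameter values are assumptions."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]
```

Each recording would typically be cut into overlapping frames, with the per-frame coefficients averaged or stacked to form the feature vector passed to a classifier.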
The rest of the paper is structured as follows. The background on the nature of different cracks in railway tracks and important research on identifying such faults are provided in Section 2. The data collection procedure, apparatus used for the data collection, and proposed research methodology are presented in Section 3. Section 4 contains the results and discussions while the conclusion is given in Section 5.
2. Background and Literature Review
Because manually driven approaches are insufficient to monitor the health of tracks systematically, robustly, regularly, and uniformly due to human error, the automatic detection and monitoring of tracks’ faults/cracks is very important. However, an understanding of common problems related to railway tracks is crucial. Major railway track faults can be categorized into wheel burn, ballast issues, superelevation, and loose nuts and bolts.
Figure 1a,b shows some examples of the cracks found on railway tracks in Pakistan. Such cracks appear due to several factors such as wear and tear due to the long use of the track without maintenance, overheating, displacement of supporting basement, and train overload, and so forth.
Wheel burn on a track appears either because of a jammed wheel as shown in
Figure 2a or a locomotive jumping due to an imbalanced ballast. Similarly, an expired or weak ballast issue is shown in
Figure 2b. A track ballast forms the trackbed upon which railroad sleepers (ties) are laid, packed below and around the ties bearing the load from the railroad sleepers to facilitate not only water drainage but also to dampen down vegetation that might interfere with the track’s structure.
Superelevation arises when the outer rail of a track becomes higher than the desired elevation. The outer rail is normally set higher than the inner rail on a curved track. Most countries achieve the desired level of positive cant by raising the outside rail to a certain level, which is called superelevation.
A railway fish plate, also called a joint bar, is a metal bar that is bolted to the ends of two rails to join them together in a track; fish plates and fish bolts are the common rail connection parts. They are used to maintain the strength and stiffness of the joint for uniform elasticity. Most often, two railway fish plates are fixed on either side of the rail waist. A missing plate is usually due to missing nuts or bolts. This is also one of the main causes of train derailments. The superelevation problem is shown in Figure 3a while Figure 3b exhibits the nuts and bolts problem.
Acoustic analysis has the potential to distinguish and recognize sounds based on pitch, energy, sound entropy, and spectral analysis. The rapid growth and miniaturization of sensors and electronics have made acoustic sensing equipment ubiquitous and readily available on the market [12]. Researchers have shown its capability to classify defects in machines by their acoustic signature. Hence, this study leverages acoustic signals to detect and classify faults in Pakistan’s railway tracks.
An obvious reason to perform an inspection of railway tracks is to avoid train accidents and save human lives. For this purpose, periodic and regular inspection of railway tracks is of paramount importance. Track defects and non-compliance, if not spotted early, ultimately lead to stark consequences such as train derailments and loss of life. However, a human inspection of hundreds of thousands of miles of track is cumbersome, time-consuming, laborious, and subject to human error. Consequently, different automated approaches have been proposed to ease human effort and increase efficiency.
Track detection based on computer vision is becoming more popular among researchers. The use of drones instead of a moving wagon may provide even more cost-effective track monitoring. For example, the authors of [
13] proposed a novel approach to computing gauge measurement using drone imagery, and the health of the track was determined by applying computer vision techniques to the drone data. A Da-Jiang Innovations (DJI) Phantom 3, equipped with a 4K camera and Sony sensors, is used for data acquisition. Images are taken at latitude 29.76768000 and longitude 78.01000000 over a predefined path and are captured automatically at regular intervals. The images are converted into the hue, saturation, value (HSV) color space to minimize the illumination effects caused by different weather conditions, followed by a Gaussian smoothing filter to reduce the noise. It is observed that the railway tracks have a purple/pinkish hue, so to obtain fine object (track) detection, all hues in the range between cyan and magenta are extracted by applying multiple threshold masks. Morphological operations are used to remove all connected pixels below a threshold value and, subsequently, a Canny edge detector is applied to obtain accurate results.
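The hue-based masking step described for [13] can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a nested list of 8-bit RGB tuples, the hue limits are taken as pure cyan (180°) and pure magenta (300°), and the saturation and value floors are hypothetical additions that keep gray pixels out of the mask. The Gaussian smoothing, morphological cleanup, and Canny edge detection stages are omitted.

```python
import colorsys

CYAN_HUE = 0.5          # 180 degrees on colorsys' 0..1 hue scale
MAGENTA_HUE = 5.0 / 6.0  # 300 degrees

def track_mask(image, min_saturation=0.2, min_value=0.2):
    """Return a boolean mask marking pixels whose hue lies between cyan and magenta.

    image: list of rows, each row a list of (r, g, b) tuples with 0..255 values.
    min_saturation and min_value are assumed thresholds, not values from [13].
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            in_hue_band = CYAN_HUE <= h <= MAGENTA_HUE
            mask_row.append(in_hue_band and s >= min_saturation and v >= min_value)
        mask.append(mask_row)
    return mask
```

Working in HSV means that brightness changes mainly affect the value channel, so a fixed hue band remains usable across lighting conditions.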
Railway track monitoring data are available but not all anomalies can be analyzed using image processing, for example, vegetation overgrowth and sun kinks are rare and difficult to find. These can be generated manually using tools like paint, but it can be a very cumbersome, labor-intensive process. Hence, if synthetic data can be generated for the anomalies mentioned above, it can ease the training process and reduce the problem of over-fitting. The study [
14] performs a computer vision-oriented experiment using a camera that captures images at 30 frames per second. The camera is mounted on a locomotive to obtain a consistent, steady image for real-time railway track fault detection. An Inception V3 model trained on the ImageNet dataset is fine-tuned for binary classification. For vegetation overgrowth, the model generalizes well to actual vegetation images. A sun kink classifier can classify professionally simulated sun kink videos with a precision of 97.5%.
A visual-based track inspection system (VTIS) is attributed with a high speed, low cost, and attractive performance and is regarded as the most appealing track surface defect detection technique. Study [
15] presents a VTIS system using a multiphase deep learning-based rail surface anomaly detection and classification technique called TrackNet. The study adopts CNNs, such as ResNet and DenseNet, as the baseline techniques for performance comparison with the proposed TrackNet. Results indicate the proposed system performs better than the baseline approaches. Another vision-based approach for track inspection and fault detection is presented in [
16]. The input image is decomposed by a Gabor filter and texture features are extracted using segmentation-based fractal texture analysis (SFTA). The AdaBoost classifier is used to classify the track faults. Study [
17] worked on the detection of cracks and missing fasteners in railway lines using the structure topic model (STM). The study proposed an effective vision-based automatic rail inspection system to detect the presence or absence of sleepers or fasteners, by inspecting real-time images acquired by a digital camera installed under a diagnostic train.
The authors designed a prototype in [
18] that detects cracks using an Arduino mega powered by solar panels along with a LASER source. In addition, avalanche photodiodes (APD) and vibration sensors are used to detect cracks. A global positioning system (GPS) is also implemented to provide the exact location of the detected crack. The study [
19] proposed an automated fault detection system consisting of different sensor modules mounted on a moving robot. Faults addressed in this study are discontinuity, obstacles on the track, absence of nuts and bolts, and misalignment. The sensors layer includes an infrared (IR) sensor, a limit switch [
20] and ultrasonic sensors that are incorporated with an LPC 1768 ARM microcontroller. Upon the detection of any of the faults mentioned above, the localization along with fault type is sent to the control room by using the GSM module. Similarly, [
21] designed a robust railway crack detection scheme (RRCDS) that addresses track faults using IR sensors to detect cracks on the railway track. Existing manual systems are inefficient at monitoring near-surface cracks precisely and are inappropriate for use in tunnels. Thus, to maintain safety standards, an economical, low-power RRCDS is presented in [22]. RRCDS comprises two IR sensors; an Arduino board coupled with Bluetooth is mounted at the front end of the inspection robot to monitor the track. This system automatically detects a faulty railway track without any human intervention. It also tracks the GPS location of the track.
Early failure detection is critical for maintenance and for timely replacements that avoid accidents. The study [23] proposed a system for the early detection and diagnosis of faults in railway points using acoustic analysis. Data were collected from an NS-AM type railway point machine equipped with audio sensors. Faults such as ice obstruction, ballast obstruction, and slackened nuts were analyzed in this study. Two different experiments were carried out, one for fault detection on the whole dataset and the other for fault classification. The model evaluation shows an accuracy of 94.1%.
The authors performed an investigation regarding the detection of multiple types of fastener damage in [
24]. A fastener is a place on the track where the track is fastened with the tie. An automated vision-based railway inspection system is proposed that uses SVM, AdaBoost, and likelihood algorithms for the detection of tracks and sleepers. Along the same lines, [
25] proposed a railway track derailment inspection system for the automated visual inspection of railroad tracks, which detects faults from prerecorded videos. To detect the fault, spectral estimation and signal processing methods are utilized. The scope of the work [
25] is confined to the localization of rail defects, ballast, tie and tie plate, and the localization of spikes, tie plate holes, and anchors.
Real-time rail track detection and adaptability is paramount to prevent human, goods, and assets losses. Yongzhi Min et al. [
26] devised a real-time visual portable machine vision inspection system for track defects. It is equipped with an acoustic emission sensor and a passive infrared (PIR) sensor for the detection of cracks. An acoustic emission sensor is used to detect cracks on a track and a PIR sensor is used to check for the presence of human or animal bodies on the track. Written in LabVIEW environment, the system accesses high-quality images from a light source environment by adding the hoods and LED auxiliary light source in the image acquisition equipment in the first stage. In the second stage, the H value of the color image is used directly to extract the original image, which can shorten the time of image preprocessing steps and it is good for a target area with a small range. Based on morphological processing, the interference of a large amount of redundant information is removed and the direction chain code is used to quickly extract the defect’s shape features to carry out the defect type identification. In order to ensure the adaptability of the system in a complex environment, the issues of adaptive dual threshold selection in edge detection, combined with the histogram concavity analysis, have been solved. In the rail area rapid locating method, the track defects are detected in real-time and the system has strict requirements for the detection time. If a crack is found on a track, its latitude and longitude coordinates are sent to the nearby base station.
The authors propose a mechanism to detect cracks on railway tracks in [
27]. The study points out that ultrasonic metal detecting sensors are capable of detecting cracks with higher accuracy. Encoders and RF transmitters are used for crack detection, where a continuous flow of the current between the encoders shows that tracks are properly maintained. As long as the current remains continuous, the transmitter will broadcast RF signals. On the train’s engine, a receiver circuitry with a decoder is used. The receiver is linked to the train’s microprocessor, which regulates its operation. If a crack in the track occurs the current flow between the encoders will no longer be continuous. This prevents the transmitter from transmitting RF signals resulting in no signal being received by the locomotive’s receiver, causing the microcontroller to halt the train. Studies [
16, 21, 22, 23] performed experiments using wireless sensor networks and Bluetooth technology. Several different sensors were used to identify cracks on railway tracks; however, applying sensors and devices incurs a deployment cost, which makes such systems costly. In addition, faulty sensors require the replacement of the sensors which adds an extra cost to the system. Moreover, the maintenance of such systems requires skilled staff.
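The fail-safe logic of the RF-based scheme in [27] can be sketched as a small simulation. The function names and the representation of the track as a list of booleans are hypothetical and serve only to make the signal chain explicit: continuous current keeps the transmitter broadcasting, and the absence of a received signal makes the controller halt the train.

```python
def transmitter_broadcasts(current_continuous):
    """The transmitter sends RF signals only while current flows between the encoders."""
    return current_continuous

def train_should_halt(signal_received):
    """The on-board controller halts the train when no RF signal is received."""
    return not signal_received

def first_halt_position(track_segments):
    """Return the index of the first cracked segment, or None if the track is intact.

    track_segments: list of booleans, True meaning the segment conducts
    current (no crack). This representation is an illustrative assumption.
    """
    for position, intact in enumerate(track_segments):
        signal = transmitter_broadcasts(intact)
        if train_should_halt(signal):
            return position
    return None
```

Because the train stops on a missing signal rather than on an explicit alarm, a failed transmitter or receiver also results in a halt, which is the safe direction for this kind of fault.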
4. Results and Discussion
For the performance evaluation of the classification models, standard metrics, namely accuracy, precision, recall, and F1 score, were used. Accuracy refers to the ratio of correctly predicted instances to the total predictions. Precision indicates the exactness of the classifier and is the ratio of true positives (TP) to the sum of TP and false positives (FP). Recall, also known as sensitivity, is the ratio of TP to the sum of TP and false negatives (FN). Precision and recall alone can be misleading, so the F1 score is often used to indicate the performance of the models. The F1 score, often called the F-measure, is defined as the harmonic mean of precision and recall and takes a value between 0 and 1.
Mathematical equations for accuracy, precision, recall, and F1 scores are given here:
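In their standard form, and consistent with the definitions above, the four measures can be written as follows. TN denotes true negatives, a symbol that the text above does not define and that is introduced here as the usual counterpart of TP, FP, and FN.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```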
Experiments were performed using the selected models with MFCC features from the collected audio data with different ratios of train–test splits including 60:40, 70:30, 80:20, and 90:10 for train and test, respectively. The objective of using multiple train–test splits was to analyze the performance of the machine learning and deep learning models when the amount of training data was changed.
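The four split ratios can be illustrated with a small stratified splitting routine. This is a sketch under assumptions: the total of 720 samples is the figure reported later in this section, whereas the equal class sizes (240 each), the use of stratification, and the random seed are assumptions made only for the example.

```python
import random

def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into train and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Hypothetical label vector: 720 samples, assumed to be evenly spread over three classes.
labels = ["normal"] * 240 + ["superelevation"] * 240 + ["wheel_burnt"] * 240

split_sizes = {}
for fraction in (0.6, 0.7, 0.8, 0.9):
    train_idx, test_idx = stratified_split(labels, fraction)
    split_sizes[fraction] = (len(train_idx), len(test_idx))
```

Under these assumptions, the 90:10 split leaves only 72 test samples, so a single misclassified sample shifts the test accuracy by more than one percentage point.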
Table 3 shows the classification results obtained using a 60–40 train–test split. RF and DT achieved the highest accuracy of 0.97 each, followed by LR. The lowest accuracy was by MLP, which was 0.68, with a large difference in precision and recall.
Table 4 shows the performance evaluation metrics when 70% of the data were used for training. Results indicate that RF outperforms the other machine learning and deep learning models in terms of accuracy, precision, recall, and F1 score. The 0.96 accuracy of DT is marginally lower than that of RF, followed by LR with an accuracy of 0.94. The other metrics for RF, DT, and LR are consistent with their accuracy, which shows a good fit of these models to the training data.
Using an 80–20 train-test split, the machine learning classifiers sustain their performance and there is no improvement in the classification accuracy, as shown in
Table 5. A marginal difference in accuracy is observed for RF and DT, while the accuracy of LR improves from 0.74 to 0.76 when the training data are increased. On the other hand, SVM has a reduced accuracy of 0.77 compared to 0.79 with a 70–30 train–test split. Finally, results using a 90–10 train–test split are given in
Table 6, which indicate that LR, RF, and DT have accuracy scores of 0.97, 0.96, and 0.94, respectively, and are among the best performers. Precision, recall, and F1 scores of these classifiers are very similar to the accuracy, indicating good fits for these classifiers.
Traditionally, SVM works well with unstructured and semi-structured data and performs well with text data. Based on the geometrical properties, it shows a better performance with a small number of features for a small number of training samples. However, it is not the case here, as the data used for experiments are structured with a large number of features. Therefore, LR shows a better performance with an accuracy of 0.97 for detecting normal, superelevation, and wheel burnt railway tracks. The F1 score has a close resemblance to accuracy, which indicates that the model is a good fit.
To illustrate the good fit of the models and to show that they are not overfitted to the data, Table 7 is provided. It reports the training accuracy of all the models for each of the train–test splits used in the experiments.
Generally, deep learning architectures show a superior performance as compared to the machine learning models. Deep learning models are better at understanding the complex relationships found in the data and show good results. However, the performance of MLP and CNN is comparatively poor for the experiments conducted in this study. The primary reason is the number of samples used for training the deep learning models. MLP and CNNs show better results when trained on large datasets containing thousands of samples for each class. However, the dataset used in this study contains 720 samples in total which is not enough to get a good fit for deep learning models. Consequently, the performance of the deep learning [
55] models is poor. The architecture of the CNN is further optimized using different levels of CNN after evaluating each layer. For this purpose, CNN layers are presented in a stacking manner as shown in
Figure 11. After levels 3 and 4, there is no change in the accuracy, so the execution is stopped as further optimization is not possible. The results of the CNN at each level are provided in
Table 8.
Figure 12 presents the accuracy, precision, recall, and F1 score of all the classifiers used in the study. It indicates that CNN has large fluctuations in accuracy when the amount of training data is changed. Traditionally, too little training data leads to poor approximation and the model will underfit the small training dataset. Conversely, an under-constrained model will likely overfit the training data. For both underfit and overfit cases, the result is poor performance. Fluctuations in the performance of deep learning models are attributed to a smaller change in the training data size. For obtaining good results from deep learning models using supervised learning, further experiments are needed to estimate the amount of data required to approximate the underlying mapping function and the amount of test data needed to determine the performance. Results reported in
Table 3, Table 4, Table 5 and Table 6 show the test accuracy. For performance evaluation, k-fold cross-validation is performed as well.
Cross-validation results given in Table 9 indicate that DT and RF are the best performers for the task at hand, each with an accuracy of 0.96 and with standard deviations of 0.02 and 0.04, respectively. MLP shows the worst performance among all the classifiers used, with an accuracy of 0.66. On average, machine learning classifiers perform better than deep learning models owing to the amount of data used for the experiments.
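The k-fold procedure behind these mean and standard deviation figures can be sketched as follows. The value of k is not stated in this section, so k = 10 in the usage note is an assumption, as is the use of the sample standard deviation; in practice the sample indices would also be shuffled before the folds are cut.

```python
import statistics

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

def summarize(fold_accuracies):
    """Return the mean and sample standard deviation of per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)
```

With 720 samples and an assumed k of 10, each fold holds out 72 samples for testing, and `summarize` is applied to the ten resulting accuracies to produce the kind of mean and spread reported above.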