Prediction of PM2.5 Concentration Based on Deep Learning for High-Dimensional Time Series

Hu, Jie; Jia, Yuan; Jia, Zhen-Hong; He, Cong-Bing; Shi, Fei; Huang, Xiao-Hui

doi:10.3390/app14198745

by Jie Hu ¹,², Yuan Jia ³, Zhen-Hong Jia ¹,²,*, Cong-Bing He ¹,², Fei Shi ¹,² and Xiao-Hui Huang ¹,²

¹ School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
² Xinjiang Uygur Autonomous Region Signal Detection and Processing Key Laboratory, Xinjiang University, Urumqi 830046, China
³ School of Statistics, Renmin University of China, Beijing 100872, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(19), 8745; https://doi.org/10.3390/app14198745

Submission received: 13 August 2024 / Revised: 25 September 2024 / Accepted: 26 September 2024 / Published: 27 September 2024

(This article belongs to the Section Ecology Science and Engineering)

Abstract

PM_2.5 poses a serious threat to human life and health, so accurate prediction of PM_2.5 concentration is essential for controlling air pollution. However, previous studies lacked the generalization ability needed to predict high-dimensional PM_2.5 concentration time series. To address this, this paper proposes a new model for predicting PM_2.5 concentration. First, the activation function in the Temporal Convolutional Network (TCN) was replaced with the leaky rectified linear unit (LeakyReLU) to better capture long-range dependencies in the feature data. Next, the residual structure, dilation rate, and feature-matching convolution position of the TCN were adjusted to improve the performance of the improved TCN (LR-TCN) and reduce computation. Finally, a new prediction model (GRU-LR-TCN) was established that adaptively fuses the predictions of a Gated Recurrent Unit (GRU) and LR-TCN, weighting each by the inverse of its root mean square error (RMSE). The experimental results show that, for monitoring station #1001, LR-TCN improved the RMSE, mean absolute error (MAE), and coefficient of determination (R²) by 12.9%, 11.3%, and 3.8%, respectively, compared with the baselines. Compared with LR-TCN, GRU-LR-TCN improved the symmetric mean absolute percentage error (SMAPE) by 7.1%. In addition, on other air quality datasets, the proposed model's estimates compare favorably with those of other models on all indicators, further demonstrating that GRU-LR-TCN generalizes well across datasets and is efficient and applicable for predicting urban PM_2.5 concentration. This can contribute to enhancing air quality and safeguarding public health.

Keywords:

fine particulate matter; temporal convolutional network; PM_2.5 prediction; deep learning; high-dimensional time series

1. Introduction

As urbanization in China accelerates, human production activities have increased pollutant emissions, and haze now occurs from time to time in certain areas and cities across the country; in recent years, haze has become a significant urban and regional air pollution challenge in China. Under hazy conditions, particulate matter (PM) concentration rises sharply compared with clear weather, indicating that elevated particulate matter levels are a key factor in haze formation [1].

Particles with a diameter of 2.5 μm or smaller are referred to as PM_2.5 [2]. These minuscule particles not only pose a significant threat to human life and well-being [3,4] but also cause various other detrimental impacts [5]. The Global Air Quality Database, published by the World Health Organization in 2018, reports that indoor and outdoor air pollution together cause around 7 million deaths globally each year [6]. Moreover, air pollution costs the global economy about USD 225 billion per year [6]. Hence, accurate and reliable prediction of PM_2.5 concentration is imperative for evaluating the severity of air pollution, so that measures can be taken to lower PM_2.5 levels and the associated health risks. Traditional statistical and machine learning models such as the Autoregressive Integrated Moving Average (ARIMA), Support Vector Machine (SVM), Hidden Markov Model (HMM), and Random Forest (RF) have been used to predict PM_2.5 concentration [7,8,9,10,11]. However, because changes in PM_2.5 concentration depend nonlinearly on external factors, these models cannot predict PM_2.5 concentration accurately.

Neural network-based machine learning models effectively predict PM_2.5 concentration [12]. Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) have been used to learn complex nonlinear relationships in PM_2.5 concentration time series data to improve prediction accuracy [13,14]. However, RNNs face challenges such as the vanishing gradient and exploding gradient issues when dealing with long sequences.

To address the vanishing gradient problem and the long-term dependency challenge of RNNs, the Long Short-Term Memory network (LSTM) [15] and the Gated Recurrent Unit (GRU) [16] were proposed. Building on these, Shi et al. proposed a balanced social LSTM (BS-LSTM) for predicting PM_2.5 concentration in cities [17]. Bhimavarapu and Sreedevi introduced an enhanced loss function (ELF) to reduce error and improve the prediction of daily PM_2.5 concentration in India [18]. Zhang et al. proposed a model based on inverse convolution and LSTM that extracts spatial feature correlations from atmospheric pollutant concentration data to predict PM_2.5 concentration accurately [19]. Yang et al. built a hybrid model combining CNN, LSTM, and GRU to predict PM_2.5 concentration in Seoul, South Korea [20]. Ding and Zhu developed an LSTM model incorporating Principal Component Analysis (PCA) and an attention mechanism, which mitigated correlation effects between indicators, simplified the model, and improved prediction results in their experiments [21]. However, as time series grow longer, LSTM and GRU models may deteriorate toward random guessing.

Compared with LSTM, GRU has a simpler architecture, lower computational complexity, and faster training [22]. Unlike LSTM and GRU, the Temporal Convolutional Network (TCN) supports more extensive parallel processing, has a simpler structure and fewer parameters, and is better suited to time series modeling [23]. The TCN has been shown to outperform LSTM on time series prediction tasks [24].

Since PM_2.5 concentration is measured at specific monitoring stations, traditional methods may struggle to generalize effectively across different locations and environmental conditions. The advantage of hybrid deep learning methods lies in their ability to integrate multiple data sources and modeling techniques, capturing the complex relationships between meteorological factors, geographic variations, and pollutant dispersion patterns. This not only improves the accuracy and generalization capability of PM_2.5 predictions but also provides more reliable support for air quality management and public health protection.

Therefore, this paper proposes a combined model, GRU-LR-TCN, to predict PM_2.5 concentration more accurately under the complex, time-varying conditions of cities. An improved version of the TCN, LR-TCN, is introduced to strengthen the feature extraction ability of the original TCN, and this enhanced feature extraction is then combined with the time series prediction capability of GRU. Unlike other studies that improve accuracy by optimizing parameters or increasing model complexity, this model leverages the complementary strengths of different models to reduce complexity, shorten training time, and boost prediction accuracy.

Compared with previous work on PM_2.5 concentration prediction, the method proposed in this paper has the following contributions.

LR-TCN is built on the TCN by replacing its activation function with LeakyReLU and adjusting its residual structure, dilation rate, and feature-matching convolution position; LR-TCN is used to predict future PM_2.5 concentrations.
Based on LR-TCN, a combined prediction model is established in which the outputs of the GRU and LR-TCN prediction models are weighted by the inverse of their root mean square errors and fused to achieve short-term prediction of PM_2.5 concentration.
Comparison experiments with other models show that GRU-LR-TCN has better prediction performance and generalization ability, which can help improve air quality and protect public health.

The remainder of the paper is organized as follows: Section 2 cites some recent work using TCN-based prediction. A brief description of the TCN principle, the GRU principle, and the proposed method is given in Section 3. The dataset, hyperparameter settings, and experimental results are discussed in Section 4. Finally, the experiments and the proposed method are summarized with an outlook in Section 5.

2. Related Work

Many improvements have been proposed for TCN-based prediction models. They fall roughly into two groups: (1) TCN-based combination models and (2) TCN structural improvements. The former combines the TCN with attention mechanisms or other networks to improve prediction accuracy, while the latter modifies the network structure of the TCN itself, for example its activation function or weight initialization, to improve model performance.

2.1. Combination Model

Shi et al. extracted features with the TCN and then combined them with Bi-GRU to predict PM_2.5 concentration more accurately [25]. Attention mechanisms, time-window strategies, and autoencoders have also been designed to capture the importance of different temporal stages and feature states [26]. Along these lines, Chen proposed a new attention mechanism combined with the TCN for accurate hour-by-hour prediction of PM_2.5 concentration [27]. Liu and Deng proposed an enhanced hybrid integrated deep learning model that applies modal decomposition to PM_2.5 concentration data, predicts the components in parallel, and fuses the predictions, achieving accurate PM_2.5 prediction [28]. Yuan et al. integrated four base models (simple RNN, LSTM, GRU, and TCN) into a hybrid deep learning (HDL) model for predicting PM_2.5 concentration in Changsha City [29]. To predict PM_2.5 concentration across different areas of a city, Zhang et al. trained the TCN with correlation features between urban areas, improving the accuracy of next-hour PM_2.5 prediction [30]. Shi et al. combined the LASSO regression algorithm, an attention mechanism, and the TCN to predict indoor PM_2.5 concentrations [31].

2.2. Modified TCN

Zeng et al. proposed a two-channel TCN (DD-TCN) to improve the accuracy of regression prediction of mixed gas concentrations [32]. Li et al. adjusted the activation function and weight initialization of the TCN (GL-TCN), achieving better fitting performance on high-dimensional time series datasets [33]. Ni et al. adjusted the activation function and residual structure of the TCN (Gaussian-TCN), outperforming traditional recurrent networks in prediction accuracy [34]. Lei et al. proposed a TCN-based multi-channel asymmetric prediction model for PM_2.5 concentration in Fushun City, Liaoning Province [35].

The methods proposed in this paper differ from previous approaches in the following aspects.

The TCN is improved in more aspects: its activation function, residual structure, dilation rate, and feature-matching convolution position are all adjusted to make the LR-TCN model perform better.
The proposed LR-TCN is combined with another model to enhance generalization ability.
Adaptive fusion weights are computed from each model's validation error, so they adjust to different datasets.
Better generalization capability makes the model robust.
Model complexity is reduced, and less training time is required.

3. Materials and Methods

In this section, the fundamentals of the TCN with LeakyReLU (L-TCN) are first briefly described. Then, the improved TCN model (LR-TCN) is presented. Finally, the combined GRU-LR-TCN model is introduced. The method flow of this paper is shown in Figure 1.

3.1. TCN with LeakyReLU (L-TCN)

The leaky rectified linear unit (LeakyReLU) [36] is more effective than the rectified linear unit (ReLU) for time series prediction tasks. To better learn long-term dependencies in time series data, LeakyReLU replaces ReLU as the activation function in the TCN; the resulting structure, shown in Figure 2, is abbreviated as L-TCN in this paper. When the input is negative, the gradient of LeakyReLU is a constant λ ∈ (0, 1) instead of 0, so the neuron can still update its weights on negative inputs. For positive inputs, LeakyReLU and ReLU are identical. The operation is given by Equations (1) and (2):

\tilde{x} = W_{t} x + b, \quad \text{if } W_{t} x + b > 0

(1)

\tilde{x} = \lambda (W_{t} x + b), \quad \text{if } W_{t} x + b \leq 0

(2)

where W_{t} and b are the weights and bias of the preceding layer, and λ ∈ (0, 1) is a constant.
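The piecewise activation in Equations (1) and (2) can be illustrated with a minimal Python sketch that contrasts ReLU and LeakyReLU, and their gradients, for a scalar pre-activation z = W_t x + b. The slope value 0.01 is an assumption (it matches the default `negative_slope` of PyTorch's `nn.LeakyReLU`); the text only requires λ ∈ (0, 1).

```python
# Sketch: ReLU vs LeakyReLU on a scalar pre-activation z = W_t * x + b.
# LAMBDA = 0.01 is an assumed slope; the paper only states lambda in (0, 1).
LAMBDA = 0.01


def relu(z: float) -> float:
    return z if z > 0 else 0.0


def leaky_relu(z: float, lam: float = LAMBDA) -> float:
    # Identical to ReLU for z > 0; scales negative pre-activations by lam.
    return z if z > 0 else lam * z


def relu_grad(z: float) -> float:
    # Zero gradient for z <= 0: weights feeding this unit stop updating.
    return 1.0 if z > 0 else 0.0


def leaky_relu_grad(z: float, lam: float = LAMBDA) -> float:
    # Constant non-zero gradient lam for z <= 0, so weight updates continue.
    return 1.0 if z > 0 else lam


if __name__ == "__main__":
    for z in (-2.0, 0.5):
        print(f"z={z}: relu={relu(z)}, leaky={leaky_relu(z)}, "
              f"relu'={relu_grad(z)}, leaky'={leaky_relu_grad(z)}")
```

The only difference between the two functions is the branch for non-positive inputs, which is exactly where ReLU's gradient vanishes.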

In addition, choosing LeakyReLU over ReLU helps maintain model performance while keeping the dilation rate small. The dilation rate usually increases exponentially with the number of dilated convolution layers; a larger dilation rate enlarges the receptive field but also increases computation. Therefore, unlike the usual choice of dilation rates in the TCN, this paper avoids overly large dilation rates to reduce model computation, as shown in Figure 3. Dilated causal convolution is described by Equations (3)–(5).

Given a one-dimensional input time series X = x_{0}, x_{1}, x_{2}, \dots, x_{T} with the corresponding output sequence Y = y_{0}, y_{1}, y_{2}, \dots, y_{T}, causal convolution models the series so that each step depends only on the current and earlier inputs:

P(X) = \prod_{t = 0}^{T} P(x_{t} \mid x_{0}, x_{1}, x_{2}, \dots, x_{t - 1})

(3)

where P(X) is the predicted probability of the sequence and T is the final time step.

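The dilated convolution D on the time series is formulated as follows:

D(t) = (X *_{d} f)(t) = \sum_{i = 0}^{k - 1} f(i) \cdot x_{t - 2i}, \quad l \geq 1

(4)

D(t) = (X *_{d} f)(t) = \sum_{i = 0}^{k - 1} f(i) \cdot x_{t - i}, \quad l = 0

(5)

where l is the index of the dilated convolution layer, d denotes the dilation factor (d = 1 for the first layer and d = 2 for subsequent layers, matching the dilation rates [1,2,2,2] used in this paper, so that the general term is x_{t - d \cdot i}), k is the filter size, f(i) is the ith element of the convolution kernel, and * denotes the convolution operation.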
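Equations (4) and (5) amount to a causal convolution whose taps are spaced d steps apart. The following pure-Python sketch implements that sum for a single channel with left zero padding, builds the per-layer dilation schedule 1, 2, 2, 2 described above, and compares receptive fields. It is an illustration, not the authors' PyTorch code. The receptive-field formula assumes one dilated convolution per listed dilation rate (the original TCN uses two per residual block, exposed here as the hypothetical parameter `convs_per_level`), and `padded_positions` assumes the common PyTorch TCN pattern of padding (k − 1)·d on both sides and trimming the right end afterwards, which is one way larger dilations add computation.

```python
from typing import List, Sequence


def dilated_causal_conv(x: Sequence[float], f: Sequence[float], d: int) -> List[float]:
    """D(t) = sum_{i=0}^{k-1} f(i) * x[t - d*i], treating x[t] = 0 for t < 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(f):
            idx = t - d * i
            if idx >= 0:  # causal: only the current and past samples contribute
                acc += weight * x[idx]
        out.append(acc)
    return out


def dilation_schedule(n_layers: int) -> List[int]:
    """d = 1 for layer l = 0 (Eq. (5)) and d = 2 for layers l >= 1 (Eq. (4))."""
    return [1] + [2] * (n_layers - 1)


def receptive_field(k: int, dilations: Sequence[int], convs_per_level: int = 1) -> int:
    """Number of inputs (current one included) that the last output can see."""
    return 1 + convs_per_level * (k - 1) * sum(dilations)


def padded_positions(seq_len: int, k: int, dilations: Sequence[int],
                     convs_per_level: int = 1) -> int:
    """Output positions computed before trimming when each conv pads (k-1)*d on both sides."""
    return sum(convs_per_level * (seq_len + (k - 1) * d) for d in dilations)


if __name__ == "__main__":
    print(dilated_causal_conv([1, 2, 3, 4, 5], [1, 1], d=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
    for ds in ([1, 2, 4, 8], dilation_schedule(4)):
        print(ds, receptive_field(2, ds), padded_positions(24, 2, ds))
```

Under these assumptions, with k = 2, replacing [1, 2, 4, 8] by [1, 2, 2, 2] shrinks the receptive field from 16 to 8 inputs and reduces the positions computed over a 24-step window from 111 to 103.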

3.2. Improved TCN (LR-TCN)

Studies have concluded that residual structures with more than two layers are usually needed to maintain stability as networks become deeper and larger, and that their advantages then become more obvious [37]. Because urban PM_2.5 concentration time series are highly nonlinear and dynamic, deeper residual structures are needed to extract their complex features, so this paper does not adopt the residual structure of the original TCN architecture. Instead, the new residual structure contains two L-TCNs, as shown in Figure 4. In addition, the feature-matching convolution of the TCN residual module is moved before the first dilated convolution so that it matches the hidden features of the dilated convolution; this model is referred to as LR-TCN in this paper. Because the feature-matching convolution is applied fewer times, the training time of the model is shortened.
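To make the block layout concrete, here is a hypothetical pure-Python sketch of one LR-TCN residual block under one reading of Figure 4: a single feature-matching 1 × 1 convolution is applied once at the block input, its output feeds two L-TCN units (a dilated causal convolution followed by LeakyReLU), and the matched features are added back as the residual before a final LeakyReLU. The function names, random initialization, slope 0.01, and omission of dropout and weight normalization are all assumptions for illustration; the authors' implementation may differ.

```python
import random
from typing import Dict, List, Sequence

Seq = List[List[float]]  # x[t][c]: value of channel c at time step t


def leaky_relu(z: float, lam: float = 0.01) -> float:
    return z if z > 0 else lam * z


def pointwise_conv(x: Seq, w: List[List[float]], b: List[float]) -> Seq:
    """1x1 feature-matching convolution: C_in -> C_out channels at every time step."""
    return [[sum(w_o[c] * xt[c] for c in range(len(xt))) + b_o
             for w_o, b_o in zip(w, b)]
            for xt in x]


def causal_conv(x: Seq, w: List[List[List[float]]], b: List[float], d: int) -> Seq:
    """Dilated causal convolution; w[i][o][c] is the weight of tap i, output o, input c."""
    out = []
    for t in range(len(x)):
        row = list(b)
        for i, w_i in enumerate(w):
            idx = t - d * i
            if idx < 0:  # implicit left zero padding keeps the convolution causal
                continue
            for o, w_io in enumerate(w_i):
                row[o] += sum(w_io[c] * x[idx][c] for c in range(len(x[idx])))
        out.append(row)
    return out


def l_tcn(x: Seq, w: List[List[List[float]]], b: List[float], d: int) -> Seq:
    """One L-TCN unit in this sketch: dilated causal conv + LeakyReLU (dropout omitted)."""
    return [[leaky_relu(v) for v in row] for row in causal_conv(x, w, b, d)]


def init_params(c_in: int, c_hid: int, k: int, n_units: int, seed: int = 0) -> Dict:
    """Random weights for the feature-matching conv and each L-TCN unit (hypothetical init)."""
    rng = random.Random(seed)

    def u() -> float:
        return rng.uniform(-0.5, 0.5)

    return {
        "match_w": [[u() for _ in range(c_in)] for _ in range(c_hid)],
        "match_b": [0.0] * c_hid,
        "units": [([[[u() for _ in range(c_hid)] for _ in range(c_hid)] for _ in range(k)],
                   [0.0] * c_hid)
                  for _ in range(n_units)],
    }


def lr_tcn_block(x: Seq, params: Dict, dilations: Sequence[int]) -> Seq:
    # Feature matching is applied once, before the first dilated convolution.
    h = pointwise_conv(x, params["match_w"], params["match_b"])
    y = h
    for (w, b), d in zip(params["units"], dilations):
        y = l_tcn(y, w, b, d)
    # Residual connection from the matched features, followed by LeakyReLU.
    return [[leaky_relu(a + r) for a, r in zip(y_t, h_t)] for y_t, h_t in zip(y, h)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    x = [[data_rng.random() for _ in range(12)] for _ in range(24)]  # 24 steps x 12 variables
    params = init_params(c_in=12, c_hid=8, k=2, n_units=2)
    y = lr_tcn_block(x, params, dilations=(1, 2))
    print(len(y), len(y[0]))  # 24 8
```

Because every operation only looks backwards in time, changing the last input step leaves all earlier outputs unchanged, which is a quick way to check that such a block stays causal.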

3.3. Integrated Model (GRU-LR-TCN)

The TCN and LSTM are two different types of neural networks for processing time series data. Because each has its own characteristics and advantages, combining them through adaptive weighted fusion into an integrated prediction model can yield better prediction results [38]. Inspired by this TCN-LSTM integrated prediction model, and considering that LR-TCN captures local and global patterns in time series better than the TCN while GRU handles long-term dependencies better than LSTM, this paper proposes the GRU-LR-TCN integrated prediction model. It fuses the predictions of the GRU and LR-TCN prediction models, combining the characteristics and advantages of both to improve prediction accuracy. The GRU-LR-TCN integrated prediction model is shown in Figure 5.
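As a reminder of how the GRU branch processes a sequence, here is a minimal scalar (hidden size 1) GRU sketch in pure Python. It follows the gating formulation of Cho et al. (2014), which uses the same update convention as PyTorch's `nn.GRU` (h_t = (1 − z)·n + z·h_{t−1}), although PyTorch places its biases slightly differently. The parameter names are hypothetical, and in practice the weights would be learned and vector-valued.

```python
import math
from typing import Dict, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: float, h: float, p: Dict[str, float]) -> float:
    """One GRU update for a scalar input x and scalar hidden state h."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])          # reset gate
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])          # update gate
    n = math.tanh(p["w_n"] * x + p["u_n"] * (r * h) + p["b_n"])  # candidate state
    return (1.0 - z) * n + z * h                                 # blend old and new state


def gru_run(xs: Sequence[float], p: Dict[str, float], h0: float = 0.0) -> float:
    """Run the cell over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h


if __name__ == "__main__":
    names = ("w_r", "u_r", "b_r", "w_z", "u_z", "b_z", "w_n", "u_n", "b_n")
    params = {name: 0.1 for name in names}
    print(gru_run([0.2, 0.4, 0.6], params))
```

With only two gates and no separate cell state, the GRU needs fewer parameters per unit than LSTM, which is consistent with the lighter architecture described in Section 1.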

The GRU-LR-TCN integrated prediction model uses each model's root mean square error on the validation set to compute the weights for adaptive weighted fusion: the smaller a model's RMSE, the greater its weight in the combined prediction. Assuming that both models predict the nth time step concurrently, the weights and the fused prediction are given by Equations (6)–(8):

W_{n}^{T} = \frac{\frac{1}{S_{n}^{T}}}{\frac{1}{S_{n}^{T}} + \frac{1}{S_{n}^{G}}}

(6)

W_{n}^{G} = \frac{\frac{1}{S_{n}^{G}}}{\frac{1}{S_{n}^{T}} + \frac{1}{S_{n}^{G}}}

(7)

y_{n} = W_{n}^{T} \otimes y_{n}^{T} + W_{n}^{G} \otimes y_{n}^{G}

(8)

where S_{n}^{T} is the LR-TCN validation set RMSE, S_{n}^{G} is the GRU validation set RMSE, W_{n}^{T} is the weight of the LR-TCN prediction, W_{n}^{G} is the weight of the GRU prediction, y_{n}^{T} and y_{n}^{G} are the LR-TCN and GRU predictions, y_{n} is the fused prediction, and the operator \otimes denotes element-wise multiplication of arrays.
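Equations (6)–(8) can be checked with a short Python sketch. It assumes a single forecast step n (one-step-ahead prediction), so each model contributes one scalar validation RMSE, and the fused forecast is a pointwise weighted sum of the two prediction arrays. The function names and example values are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def inverse_rmse_weights(s_t: float, s_g: float) -> Tuple[float, float]:
    """Eqs. (6)-(7): weights proportional to 1/RMSE, normalized to sum to 1."""
    inv_t, inv_g = 1.0 / s_t, 1.0 / s_g
    total = inv_t + inv_g
    return inv_t / total, inv_g / total


def fuse(pred_t: Sequence[float], pred_g: Sequence[float],
         w_t: float, w_g: float) -> List[float]:
    """Eq. (8): element-wise weighted sum of the LR-TCN and GRU predictions."""
    return [w_t * a + w_g * b for a, b in zip(pred_t, pred_g)]


if __name__ == "__main__":
    # Hypothetical validation data, used only to derive the weights.
    y_val = [30.0, 42.0, 55.0]
    s_t = rmse(y_val, [31.0, 41.0, 56.0])  # LR-TCN validation RMSE
    s_g = rmse(y_val, [33.0, 40.0, 52.0])  # GRU validation RMSE
    w_t, w_g = inverse_rmse_weights(s_t, s_g)
    print(w_t, w_g, fuse([48.0, 50.0], [44.0, 52.0], w_t, w_g))
```

For example, validation RMSEs of 1 and 3 give weights of 0.75 and 0.25, so the model with one third of the error receives three times the weight.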

4. Experimental Results and Discussion

4.1. Data Description and Setup

The air quality dataset from the Urban Air project of the Urban Computing team at Microsoft Research is used in this paper [39,40,41]. The dataset contains city data, regional data, air monitoring station data, air quality data, meteorological data, and weather forecast data for 43 cities, including Beijing, from 1 May 2014 to 30 April 2015. All data are geographically aligned by latitude and longitude; air quality data are recorded every hour, and meteorological data every three hours.

In this paper, air quality and meteorological data from this dataset are chosen as experimental datasets and preprocessed. The specific preprocessing is as follows:

(1): Air quality data: some monitoring stations have missing air quality records for certain time periods, and at a given moment certain gas concentration values may also be missing. To prevent information leakage and minimize the impact on the experimental results, each missing value is filled with the value from the preceding time step; for periods with missing records, the data from the time step immediately before the period are used. This yields a complete air quality dataset of 8760 time steps.
(2): Meteorological data: because meteorological data are recorded every three hours and do not align with the hourly gas concentration data, each meteorological record is also applied to the following two hours, yielding 8760 time steps of meteorological data.
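Both preprocessing steps can be sketched in a few lines of Python. The sketch assumes the records have already been reindexed onto a complete hourly grid, with missing readings (including whole missing periods) marked as None, so that a single forward fill covers both cases in step (1); values missing at the very start of a series have no predecessor and are left as None. The function names are illustrative.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Step (1): replace each missing value with the most recent earlier observation."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # uses only past data, so no future information leaks in
    return filled


def expand_three_hourly(readings: List[float], n_hours: int) -> List[float]:
    """Step (2): reuse each 3-hourly reading for its own hour and the next two hours."""
    hourly: List[float] = []
    for r in readings:
        hourly.extend([r, r, r])
    return hourly[:n_hours]


if __name__ == "__main__":
    print(forward_fill([None, 81.0, None, None, 95.0]))  # [None, 81.0, 81.0, 81.0, 95.0]
    print(len(expand_three_hourly([1.0] * 2920, 8760)))  # 8760 hourly steps (365 days)
```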

In this paper, the dataset of monitoring station 1001 is used for the ablation and comparison experiments, and the generalization of the model is verified on the datasets of monitoring stations 1002, 1003, and 1023. Sensor data samples were normalized to the range [0, 1] using min–max normalization. The dataset was split into training, validation, and test sets in a ratio of 8:1:1.
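A minimal sketch of the min–max normalization and 8:1:1 split follows. Two choices here are assumptions not stated in the text: the split is chronological (no shuffling), and the minimum and maximum are taken from the training portion only, so that validation and test statistics do not leak into training; with that choice, validation or test values can fall slightly outside [0, 1]. The inverse transform maps normalized predictions back to concentration units.

```python
from typing import List, Sequence, Tuple


def split_811(series: Sequence[float]) -> Tuple[list, list, list]:
    """Chronological 8:1:1 split into training, validation, and test sets."""
    n = len(series)
    n_train, n_val = n * 8 // 10, n // 10
    return (list(series[:n_train]),
            list(series[n_train:n_train + n_val]),
            list(series[n_train + n_val:]))


def min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values to [0, 1] using the given minimum and maximum."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def inverse_min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map normalized values back to the original units."""
    return [v * (hi - lo) + lo for v in values]


if __name__ == "__main__":
    series = [float(i % 100) for i in range(8760)]
    train, val, test = split_811(series)
    lo, hi = min(train), max(train)         # statistics from the training set only
    print(len(train), len(val), len(test))  # 7008 876 876
    print(min_max(test[:3], lo, hi))
```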

4.2. Multiple Linear Regression and Collinearity Analysis

Strong linear relationships among the input variables of the given dataset could obscure the importance of individual variables and enlarge parameter estimation errors, reducing the estimation accuracy of the model. Therefore, the linear relationships among input variables need to be evaluated. In this paper, Pearson's correlation coefficient is used as the key metric for correlation analysis [42]; the Pearson correlations between PM_2.5-related features at monitoring station 1001 are shown in Figure 6. In addition, O₃ concentration, temperature, wind speed, and wind direction are correlated to some degree with urban PM_2.5 concentration [43,44]. Therefore, all 12 variables were used as input variables in this paper.
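Pearson's coefficient for each pair of input variables can be computed directly from its definition, r = cov(x, y) / (σ_x σ_y). The sketch below does this in pure Python and lists variable pairs whose absolute correlation exceeds a threshold; the threshold of 0.8 and the variable names and values in the example are assumptions for illustration, not values from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(columns: Dict[str, Sequence[float]],
                    threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return variable pairs whose |r| exceeds the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    data = {"PM2.5": [35.0, 80.0, 120.0, 60.0],
            "PM10": [50.0, 110.0, 160.0, 85.0],
            "wind_speed": [4.0, 2.0, 1.0, 3.5]}
    for a, b, r in collinear_pairs(data):
        print(f"{a} vs {b}: r = {r:.2f}")
```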

4.3. Evaluation Metrics

The output variable of the model is the future PM_2.5 concentration. Seven representative evaluation metrics were selected to assess model performance: root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), normalized absolute error (NAE), coefficient of determination (R²), and the index of agreement (IA). They are formulated in Equations (9)–(15).

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(9)

M S E = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(10)

M A E = \frac{1}{n} \sum_{i = 1}^{n} |{\hat{y}}_{i} - y_{i}|

(11)

S M A P E = \frac{100\%}{n} \sum_{i = 1}^{n} \frac{|{\hat{y}}_{i} - y_{i}|}{(|{\hat{y}}_{i}| + |y_{i}|) / 2}

(12)

N A E = \frac{1}{n} \sum_{i = 1}^{n} \frac{|{\hat{y}}_{i} - y_{i}|}{\max (|{\hat{y}}_{i}|, |y_{i}| + \varepsilon)}

(13)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(14)

I A = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(|{\hat{y}}_{i} - \bar{y}| + |y_{i} - \bar{y}|)}^{2}}

(15)

where y_{i} is the true value, {\hat{y}}_{i} is the predicted value, \bar{y} is the mean of the true values, and ε = 1 × 10⁻⁸.

Among the seven evaluation metrics, smaller values of RMSE, MSE, MAE, SMAPE, and NAE indicate better estimation performance, whereas larger values of R² and IA indicate better estimation performance.
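Equations (9)–(15) translate directly into code. The sketch below computes all seven metrics in pure Python. For R² and IA it uses the standard definitions (the total sum of squares about the mean, and Willmott's index of agreement), and NAE follows Equation (13) as written; returning 0 for SMAPE terms whose denominator is zero (both values exactly 0) is an assumption the text does not address.

```python
import math
from typing import Dict, Sequence


def evaluate(y: Sequence[float], y_hat: Sequence[float], eps: float = 1e-8) -> Dict[str, float]:
    """Compute RMSE, MSE, MAE, SMAPE, NAE, R^2 and IA for true y and predicted y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    sq_err = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    mse = sq_err / n
    mae = sum(abs(p - a) for a, p in zip(y, y_hat)) / n

    smape_terms = []
    for a, p in zip(y, y_hat):
        denom = (abs(p) + abs(a)) / 2
        smape_terms.append(abs(p - a) / denom if denom > 0 else 0.0)
    smape = 100.0 * sum(smape_terms) / n

    nae = sum(abs(p - a) / max(abs(p), abs(a) + eps) for a, p in zip(y, y_hat)) / n
    ss_tot = sum((a - y_bar) ** 2 for a in y)  # total sum of squares about the mean
    r2 = 1.0 - sq_err / ss_tot
    ia_denom = sum((abs(p - y_bar) + abs(a - y_bar)) ** 2 for a, p in zip(y, y_hat))
    ia = 1.0 - sq_err / ia_denom               # Willmott's index of agreement
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "SMAPE": smape,
            "NAE": nae, "R2": r2, "IA": ia}


if __name__ == "__main__":
    for name, value in evaluate([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]).items():
        print(f"{name}: {value:.4f}")
```

A constant prediction equal to the mean of the true values, as in the usage example, scores R² = 0 and IA = 0, which is a useful sanity check on both formulas.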

In addition, the training time that each model requires on the same data is used to assess model complexity and highlight the advantages of the proposed model.

4.4. Model Parameter Selection

Parameter settings directly affect model performance, so tuning them is essential. In this paper, the best parameter combination for each model was selected after many experiments with randomized combinations of multiple parameters; the specific settings are shown in Table 1.

Table 1. Parameter settings of each model.

Model	Learning Rate	Epochs	Optimizer	Filter Size	Dilation Factor	Levels	Dropout	Loss Function	Batch Size
LSTM [15]	0.0001	100	Adam	None	None	2	None	MSELoss	128
GRU [16]	0.0001	100	Adam	None	None	2	None	MSELoss	128
SRU [45]	0.0001	100	Adam	None	None	2	None	MSELoss	128
TCN [23]	0.0001	100	Adam	2	[1,2,4,8]	4	0.2	MSELoss	128
Gaussian-TCN [34]	0.0001	100	Adam	2	[1,2,4,8]	4	0.2	MSELoss	128
GL-TCN [33]	0.0001	100	Adam	2	[1,2,4,8]	4	0.2	MSELoss	128
DD-TCN [32]	0.0001	100	Adam	2	[1,2,4,8]	4	0.2	MSELoss	128
D-TCN [46]	0.0001	100	Adam	2	[1,2,4,8]	4	0.2	MSELoss	128
ST-TCN [44]	0.0001	100	Adam	4	[1,2,4,8]	4	0.2	MSELoss	128
DMSnet [47]	0.0001	100	Adam	4	[1,2,4,8,16]	5	0.2	MSELoss	128
LR-TCN	0.0001	100	Adam	2	[1,2,2,2]	2	0.2	MSELoss	128

In Table 1, each LR-TCN level contains two L-TCNs, so the LR-TCN model is set to two levels to keep the same number of dilated causal convolutional layers (four) as the TCN. The dilation rate usually increases exponentially with the number of dilated convolutional layers, which increases model computation; to reduce computation, the dilation rates of the four L-TCNs are [1,2,2,2] in turn. In addition, based on the characteristics of the dataset, 24 time steps are used to estimate the next time step.
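The 24-steps-in, 1-step-out setup can be built with a sliding window over the normalized hourly series. In this sketch, which is an illustration rather than the authors' data loader, each sample pairs the input variables at hours t - 24 to t - 1 with the PM_2.5 value at hour t; the function name and argument layout are assumptions.

```python
from typing import List, Sequence, Tuple


def make_windows(features: Sequence[Sequence[float]], target: Sequence[float],
                 window: int = 24) -> Tuple[List[List[Sequence[float]]], List[float]]:
    """Pair each run of `window` consecutive feature rows with the next target value."""
    X, y = [], []
    for t in range(window, len(target)):
        X.append(list(features[t - window:t]))  # hours t-window .. t-1
        y.append(target[t])                     # hour t, one step ahead
    return X, y


if __name__ == "__main__":
    hours = 30
    features = [[float(h)] * 12 for h in range(hours)]  # 12 input variables per hour
    target = [float(h) for h in range(hours)]
    X, y = make_windows(features, target)
    print(len(X), len(X[0]), y[0])  # 6 24 24.0
```

A series of N hourly steps therefore yields N - 24 samples, so a few samples are lost at the start of each split.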

All algorithms were implemented in Python 3.8 using the PyCharm 2021.2.3 (Community Edition) integrated development environment. The programs were executed on Windows 11 (x64) with a 12th Gen Intel(R) Core(TM) i7-12700H CPU (Intel, Santa Clara, CA, USA), an NVIDIA GeForce RTX 3060 GPU (NVIDIA, Santa Clara, CA, USA), and 16 GB of RAM, using the CUDA-enabled version of PyTorch 1.13.1 as the primary computational framework.

4.5. Ablation Experiment

In this paper, the activation function, residual structure, dilation rate, and feature-matching convolution position of the TCN are adjusted to improve the performance of the network. To show that each of these adjustments contributes to the model, an ablation experiment was carried out on the monitoring station 1001 dataset; the results are shown in Table 2. In Table 2, CNN-TCN denotes applying the feature-matching convolutional layer first, and [1,2,4,8] indicates the dilation rates used by the model in sequence; models without [1,2,4,8] use dilation rates of [1,2,2,2]. ‘Re’ indicates that the model uses the ReLU activation function, and ‘Lr’ indicates that it uses LeakyReLU. All eight networks other than the ‘original TCN’ use the new residual structure proposed in this paper.

As shown in Table 2, across the nine ablation experiments, the ‘original TCN’ performs worst on all metrics, and all metrics improve after the residual structure is adjusted. Adjusting the activation function and the dilation rate improves every metric, and moving the feature-matching convolution yields significant improvements in every metric. Therefore, combining all adjustment strategies gives the best improved TCN model.

4.6. LR-TCN Comparison Experiment

To thoroughly evaluate the performance of LR-TCN, eight comparison models were selected, and their estimation results were compared on seven metrics to highlight the advantages of LR-TCN. Given the characteristics of the dataset, preference was given to models designed for time series data. The models, LSTM [15], GRU [16], SRU [44], TCN [23], Gaussian-TCN [34], GL-TCN [33], DD-TCN [32], and D-TCN [45], are compared with LR-TCN in Table 3. As Table 3 shows, LR-TCN performs best on all metrics except SMAPE. Even on SMAPE, LR-TCN still outperforms some of the other models, and it requires the least training time among the improved TCN models. Compared with the TCN, LR-TCN improves RMSE by 12.9%, MAE by 11.3%, and R² by 3.8%. GRU outperforms LSTM and the TCN on all seven metrics; compared with the TCN, it reduces RMSE by 8.9%, MAE by 18.3%, and SMAPE by 11.6%. The R² of GRU is 0.970, second only to LR-TCN, indicating a high goodness of fit.
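The percentages quoted here are relative changes with respect to a baseline model. The text does not give the formula, so the sketch below assumes the usual convention: for error metrics (lower is better) the improvement is (baseline − new)/baseline × 100, and for R² (higher is better) it is (new − baseline)/baseline × 100. The example values are hypothetical and are not taken from Table 3.

```python
def relative_improvement(baseline: float, new: float, higher_is_better: bool = False) -> float:
    """Percentage improvement of `new` over `baseline` (positive means better)."""
    change = (new - baseline) if higher_is_better else (baseline - new)
    return change / abs(baseline) * 100.0


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(round(relative_improvement(20.0, 17.0), 1))                          # RMSE: 15.0
    print(round(relative_improvement(0.90, 0.936, higher_is_better=True), 1))  # R2: 4.0
```

Signing the result so that positive always means "better" lets error metrics and R² be reported on the same footing.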

4.7. Integrated Model Ablation Experiment

To better evaluate GRU-LR-TCN, the estimation results of different improved TCNs combined with different gated recurrent network models were compared on seven metrics and training time; the results are shown in Table 4. As Table 4 shows, on the monitoring station 1001 dataset, GRU-LR-TCN outperforms all other combination models on all seven metrics. In addition, compared with LR-TCN, it improves all seven metrics, with SMAPE improving by 7.1%.

4.8. Generality Experiment

To better evaluate the generality of LR-TCN and GRU-LR-TCN, experiments were conducted on the datasets of monitoring stations 1002, 1003, and 1023, comparing the estimation performance of different improved TCNs, different gated networks, and the proposed integrated model on seven metrics and training time. For a more convincing comparison, the integrated algorithms spatiotemporal causal convolutional network (ST-TCN) [46] and dual memory scale network (DMSnet) [47] were also included; the results are shown in Table 5. As Table 5 shows, on each dataset LR-TCN outperforms the other improved TCN models on most metrics, GRU outperforms the other gating-based models on most metrics, and the integrated GRU-LR-TCN further improves on both. Across datasets, considering all metrics, LR-TCN is more generally applicable than the other improved TCN models. However, its estimation performance is unsatisfactory on datasets with higher complexity and severe data loss. Addressing this requires a detailed analysis of the spatial relationships between monitoring stations and of the conditions during data collection; training on combined datasets from correlated monitoring stations may improve accuracy. Among the gating-based models, GRU generalizes best, and GRU-LR-TCN effectively compensates for the limitations of LR-TCN, combining the advantages of both models to generalize better. Comparisons with other combined models on the different datasets also show better estimation performance and generalization for the proposed model. For station 1002, GRU-LR-TCN improved RMSE by 1.3% and 4% compared with GRU and LR-TCN, respectively. For station 1003, it improved RMSE by 1% compared with GRU, and for station 1023, by 1.7% compared with LR-TCN.

4.9. Estimating Results

Figure 7 shows the PM_2.5 concentration estimates of GRU, LR-TCN, and GRU-LR-TCN on the test sets of monitoring stations 1001, 1002, 1003, and 1023. The subplots zoom in on the estimates from the 20th to the 30th hour for each station. In Figure 7, the red solid line represents the actual measurements, and the yellow, black, and blue dashed lines represent the GRU, LR-TCN, and GRU-LR-TCN estimates, respectively. Figure 7 shows that the GRU-LR-TCN estimates are closer to the actual measurements, indicating better estimation performance. However, all three models are inaccurate at local highs and lows of PM_2.5 concentration, possibly because of excessive missing values in the dataset, which suggests that the data preprocessing needs further improvement. In the future, machine learning-based missing value imputation could be considered to further improve data quality. It is also necessary to consider the spatial relationships between monitoring stations and the spatiotemporal characteristics of the dataset to improve the model's feature extraction during training and thereby the accuracy of the estimates. Overall, the results across monitoring stations indicate that GRU-LR-TCN generally outperforms GRU and LR-TCN in estimation performance and spatial generalization.

5. Conclusions

PM_2.5 concentration is an important indicator for environmental evaluation and occupies a central position in air pollutant monitoring. In this paper, the relationship between meteorological features and six concentration features, including PM_2.5, as well as the effects of the nonlinearity and dynamics of the time series on PM_2.5 concentration prediction, are fully considered, and an integrated prediction model based on GRU and LR-TCN is proposed. The air quality and meteorological data of monitoring stations 1001, 1002, 1003, and 1023 in Beijing were used to compare the estimation performance of LR-TCN with traditional models, single models, and integrated models, and to verify the estimation performance and universality of GRU-LR-TCN. LR-TCN is derived from the TCN in two steps. First, the activation function in the TCN was replaced with the leaky rectified linear unit (LeakyReLU), which helps LR-TCN learn long-term dependencies in time series data. Second, the residual structure and the position of the feature-matching convolution were adjusted and the dilation rate was optimized, which helps stabilize the LR-TCN model and reduces training time. The experimental results on the dataset of monitoring station 1001 in Beijing showed that LR-TCN effectively improves the estimation accuracy of PM_2.5 concentration while also shortening training time. Finally, the proposed GRU-LR-TCN model combines the estimates of the GRU and LR-TCN models in a weighted manner, with weights based on the inverse of each model's root mean square error, so that the model with the larger error contributes less and the error of either single model is reduced.
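The fusion step is described only verbally above. The following is a minimal sketch of inverse-RMSE weighting. It assumes that each model's RMSE is computed once on a held-out validation split and that the weights are w_i = (1/RMSE_i) / Σ_j (1/RMSE_j). The paper's adaptive scheme may instead recompute the errors over a sliding window. The function names `rmse`, `inverse_rmse_weights`, and `fuse`, as well as all toy arrays, are hypothetical and not taken from the authors' code.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error of one model's forecasts."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def inverse_rmse_weights(y_val, val_preds, eps: float = 1e-12) -> np.ndarray:
    """w_i = (1/RMSE_i) / sum_j (1/RMSE_j); eps guards against a zero RMSE."""
    inv = np.array([1.0 / (rmse(y_val, p) + eps) for p in val_preds])
    return inv / inv.sum()


def fuse(test_preds, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of the individual forecasts (stacked as one row per model)."""
    return np.tensordot(weights, np.stack(test_preds), axes=1)


# Toy validation data (hypothetical values, not from the paper)
y_val = np.array([40.0, 55.0, 80.0, 62.0])
gru_val = np.array([42.0, 53.0, 76.0, 65.0])    # hypothetical GRU forecasts
lrtcn_val = np.array([45.0, 50.0, 88.0, 60.0])  # hypothetical LR-TCN forecasts

w = inverse_rmse_weights(y_val, [gru_val, lrtcn_val])
fused = fuse([np.array([70.0, 58.0]), np.array([74.0, 61.0])], w)
print("weights (GRU, LR-TCN):", w.round(3))
print("fused forecast:", fused.round(2))
```

In this toy case the GRU forecasts have the lower validation RMSE, so GRU receives the larger weight (about 0.65). Because the weights are positive and sum to 1, every fused value lies between the two individual forecasts.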
The experimental results show that, on the dataset of monitoring station 1001, LR-TCN improved the RMSE, mean absolute error (MAE), and coefficient of determination (R²) by 12.9%, 11.3%, and 3.8%, respectively, compared with the baseline model. Compared with LR-TCN, GRU-LR-TCN improved the symmetric mean absolute percentage error (SMAPE) by 7.1%. Datasets from Beijing monitoring stations 1002, 1003, and 1023 were also used to test the generalization ability of the model. The results show that the integrated GRU-LR-TCN model generalizes better than both LR-TCN and GRU and outperforms some other integrated models. However, the estimation performance of GRU-LR-TCN is less satisfactory on datasets with high complexity and a large proportion of missing data. Improving it will require machine learning-based missing value imputation to raise data quality, together with a detailed analysis of the spatial relationships between monitoring stations and of the conditions at the time of data collection, so that spatiotemporal correlation features between stations can be extracted. The model can be extended to PM_2.5 concentration prediction in other cities, but its parameters need to be tuned to obtain optimal results, and the spatiotemporal correlation between monitoring stations in each target city should also be taken into account when adapting the model.

Author Contributions

Conceptualization, J.H. and Y.J.; methodology, J.H.; software, Z.-H.J.; validation, J.H., C.-B.H. and Y.J.; formal analysis, Z.-H.J.; investigation, F.S.; resources, Z.-H.J.; data curation, X.-H.H.; writing—original draft preparation, J.H.; writing—review and editing, Z.-H.J.; visualization, Y.J.; supervision, F.S.; project administration, X.-H.H.; funding acquisition, Z.-H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Key R&D Program Projects in Xinjiang Autonomous Region (No. 2022B01010-3) and Tianshan Talent Training Project-Xinjiang Science and Technology Innovation Team Program (No. 2023TSYCTD).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used is in the public domain. The code can be requested from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Gan, T.; Liang, W.; Yang, H.; Liao, X. The effect of economic development on haze pollution (PM_2.5) based on a spatial perspective: Urbanization as a mediating variable. J. Clean. Prod. 2020, 266, 121880.
Wang, C.; Tu, Y.; Yu, Z.; Lu, R. PM_2.5 and cardiovascular diseases in the elderly: An overview. Int. J. Environ. Res. Public Health 2015, 12, 8187–8197.
Wang, L.; Bao, S.; Liu, X.; Wang, F.; Zhang, J.; Dang, P.; Huang, W.; Li, B.; Lin, Y. Low-dose exposure to black carbon significantly increase lung injury of cadmium by promoting cellular apoptosis. Ecotoxicol. Environ. Saf. 2021, 224, 112703.
Kranc, H.; Novack, V.; Shtein, A.; Sonkin, R.; Jaffe, E.; Novack, L. Ambient air pollution and out-of-hospital cardiac arrest. Israel nation wide assessment. Atmos. Environ. 2021, 261, 118567.
Hao, Y.; Peng, H.; Temulun, T.; Liu, L.Q.; Mao, J.; Lu, Z.N.; Chen, H. How harmful is air pollution to economic development? New evidence from PM_2.5 concentrations of Chinese cities. J. Clean. Prod. 2018, 172, 743–757.
AirVisual, IQAir. 2018. Available online: https://www.airvisual.com/worldmost-polluted-cities/world-air-quality-report-2018-en.pdf (accessed on 3 April 2021).
Zhang, L.; Lin, J.; Qiu, R.; Hu, X.; Zhang, H.; Chen, Q.; Tan, H.; Lin, D.; Wang, J. Trend analysis and forecast of PM_2.5 in Fuzhou, China using the ARIMA model. Ecol. Indic. 2018, 95, 702–710.
Yan, X.; Enhua, X. ARIMA and Multiple Regression Additive Models for PM_2.5 Based on Linear Interpolation. In Proceedings of the 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Bangkok, Thailand, 30 October–1 November 2020; pp. 266–269.
Vong, C.M.; Ip, W.F.; Wong, P.K.; Yang, J.Y. Short-term prediction of air pollution in Macau using support vector machines. J. Control. Sci. Eng. 2012, 2012, 518032.
Yang, W.; Deng, M.; Xu, F.; Wang, H. Prediction of hourly PM_2.5 using a space-time support vector regression model. Atmos. Environ. 2018, 181, 12–19.
Laña, I.; Del Ser, J.; Padró, A.; Vélez, M.; Casanova-Mateo, C. The role of local urban traffic and meteorological conditions in air pollution: A data-based case study in Madrid, Spain. Atmos. Environ. 2016, 145, 424–438.
Collado, J.; Pinzon, C. Air Pollution Prediction Using Machine Learning Algorithms: A Literature Review. In Proceedings of the 2022 V Congreso Internacional en Inteligencia Ambiental, Ingeniería de Software y Salud Electrónica y Móvil (AmITIC), San Jose, Costa Rica, 14–16 September 2022; pp. 1–6.
Singh, K.P.; Gupta, S.; Kumar, A.; Shukla, S.P. Linear and nonlinear modeling approaches for urban air quality prediction. Sci. Total Environ. 2012, 426, 244–255.
Samal, K.K.R.; Babu, K.S.; Das, S.K. Multi-directional temporal convolutional artificial neural network for PM_2.5 forecasting with missing values: A deep learning approach. Urban Clim. 2021, 36, 100800.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600.
Shi, L.; Zhang, H.; Xu, X.; Han, M.; Zuo, P. A balanced social LSTM for PM_2.5 concentration prediction based on local spatiotemporal correlation. Chemosphere 2022, 291, 133124.
Bhimavarapu, U.; Sreedevi, M. An enhanced loss function in deep learning model to predict PM_2.5 in India. Intell. Decis. Technol. 2023, 17, 363–376.
Zhang, B.; Liu, Y.; Yong, R.; Zou, G.; Yang, R.; Pan, J.; Li, M. A spatial correlation prediction model of urban PM_2.5 concentration based on deconvolution and LSTM. Neurocomputing 2023, 544, 126280.
Yang, G.; Lee, H.; Lee, G. A hybrid deep learning model to forecast particulate matter concentration levels in Seoul, South Korea. Atmosphere 2020, 11, 348.
Ding, W.; Zhu, Y. Prediction of PM_2.5 concentration in NingxiaHui autonomous region based on PCA-Attention-LSTM. Atmosphere 2022, 13, 1444.
Wang, B.; Kong, W.; Zhao, P. An air quality forecasting model based on improved convnet and RNN. Soft Comput. 2022, 25, 9209–9218.
Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
Zhu, R.; Liao, W.; Wang, Y. Short-term prediction for wind power based on temporal convolutional network. Energy Rep. 2020, 6, 424–429.
Shi, T.; Li, P.; Yang, W.; Qi, A.; Qiao, J. Application of TCN-biGRU neural network in PM_2.5 concentration prediction. Environ. Sci. Pollut. Res. 2023, 30, 119506–119517.
Samal, K.K.R.; Babu, K.S.; Das, S.K. A neural network approach with iterative strategy for long-term PM_2.5 forecasting. In Proceedings of the 2021 IEEE 18th India Council International Conference (INDICON), Guwahati, India, 19–21 December 2021; pp. 1–6.
Chen, J. Short-Term Prediction of PM_2.5 Concentration based on Self-Attention Mechanism Improved Temporal Convolution Network. In Proceedings of the 2023 International Seminar on Computer Science and Engineering Technology (SCSET), New York, NY, USA, 29–30 April 2023; pp. 528–534.
Liu, H.; Deng, D.H. An enhanced hybrid ensemble deep learning approach for forecasting daily PM_2.5. J. Cent. South Univ. 2022, 29, 2074–2083.
Yuan, P.; Mei, Y.; Zhong, Y.; Xia, Y.; Fang, L. A Hybrid Deep Learning Model for Predicting PM_2.5. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 274–278.
Zhang, H.; Zhan, Y.; Li, J.; Chao, C.Y.; Liu, Q.; Wang, C.; Jia, S.; Ma, L.; Biswas, P. Using Kriging incorporated with wind direction to investigate ground-level PM_2.5 concentration. Sci. Total Environ. 2021, 751, 141813.
Shi, T.; Yang, W.; Qi, A.; Li, P.; Qiao, J. LASSO and attention-TCN: A concurrent method for indoor particulate matter prediction. Appl. Intell. 2023, 53, 20076–20090.
Zeng, L.; Xu, Y.; Ni, S.; Xu, M.; Jia, P. A mixed gas concentration regression prediction method for electronic nose based on two-channel TCN. Sens. Actuators B Chem. 2023, 382, 133528.
Li, X.; Jiang, Q.; Ni, S.; Xu, Y.; Xu, M.; Jia, P. An electronic nose for CO concentration prediction based on GL-TCN. Sens. Actuators B Chem. 2023, 387, 133821.
Ni, S.; Jia, P.; Xu, Y.; Zeng, L.; Li, X.; Xu, M. Prediction of CO concentration in different conditions based on Gaussian-TCN. Sens. Actuators B Chem. 2023, 376, 133010.
Lei, F.; Zhang, X.; Yang, Y. PM_2.5 concentration prediction based on temporal convolutional network. In Proceedings of the International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2022), Wuhan, China, 11–13 March 2022; pp. 472–479.
Xu, J.; Li, Z.; Du, B.; Zhang, M.; Liu, J. Reluplex made more practical: Leaky ReLU. In Proceedings of the 2020 IEEE Symposium on Computers and communications (ISCC), Rennes, France, 7–10 July 2020; pp. 1–7.
He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. pp. 630–645.
Zuo, K. Integrated Forecasting Models Based on LSTM and TCN for Short-Term Electricity Load Forecasting. In Proceedings of the 2023 9th International Conference on Electrical Engineering, Control and Robotics (EECR), Wuhan, China, 24–26 February 2023; pp. 207–211.
Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2014, 5, 1–55.
Zheng, Y.; Liu, F.; Hsieh, H.P. U-air: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1436–1444.
Zheng, Y.; Yi, X.; Li, M.; Li, R.; Shan, Z.; Chang, E.; Li, T. Forecasting fine-grained air quality based on big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 2267–2276.
Zheng, Q.; Tian, X.; Yu, Z.; Jin, B.; Jiang, N.; Ding, Y.; Yang, M.; Elhanashi, A.; Saponara, S.; Kpalma, K. Application of complete ensemble empirical mode decomposition based multi-stream informer (CEEMD-MsI) in PM_2.5 concentration long-term prediction. Expert Syst. Appl. 2024, 245, 123008.
Shao, M.; Xu, X.; Lu, Y.; Dai, Q. Spatio-temporally differentiated impacts of temperature inversion on surface PM_2.5 in eastern China. Sci. Total Environ. 2023, 855, 158785.
Zhang, L.; Na, J.; Zhu, J.; Shi, Z.; Zou, C.; Yang, L. Spatiotemporal causal convolutional network for forecasting hourly PM_2.5 concentrations in Beijing, China. Comput. Geosci. 2021, 155, 104869.
Lei, T.; Zhang, Y.; Wang, S.I.; Dai, H.; Artzi, Y. Simple recurrent units for highly parallelizable recurrence. arXiv 2017, arXiv:1709.02755.
Liu, C.; Zhang, L.; Yao, R.; Wu, C. Dual attention-based temporal convolutional network for fault prognosis under time-varying operating conditions. IEEE Trans. Instrum. Meas. 2021, 70, 1–10.
Guo, Y.; Zhang, S.; Yang, J.; Yu, G.; Wang, Y. Dual memory scale network for multi-step time series forecasting in thermal environment of aquaculture facility: A case study of recirculating aquaculture water temperature. Expert Syst. Appl. 2022, 208, 118218.

Figure 1. The flowchart diagram of the research design.

Figure 2. L-TCN structure.

Figure 3. Dilated causal convolution. (a) TCN dilated causal convolution structure; (b) L-TCN dilated causal convolution structure.

Figure 4. LR-TCN structure.

Figure 5. Flowchart of the GRU-LR-TCN integrated predicting model.

Figure 6. Pearson correlation between features related to PM_2.5 in monitoring station 1001.

Figure 7. Results of PM_2.5 concentration estimation for the next hour: (a) estimated results at monitoring station 1001; (b) estimated results at monitoring station 1002; (c) estimated results at monitoring station 1003; (d) estimated results at monitoring station 1023.

Table 2. Ablation experiments for regression estimation of PM_2.5 concentration in monitoring station 1001.

Model	RMSE	MSE	MAE	SMAPE	NAE	R²	IA
Original TCN[1,2,4,8,Re]	18.787	352.977	13.067	23.342	0.042	0.862	0.959
TCN[1,2,4,8,Re]	18.773	352.442	12.887	22.753	0.041	0.862	0.962
TCN[1,2,4,8,Lr]	17.686	312.809	11.401	20.356	0.036	0.878	0.966
TCN[Re]	17.929	321.462	11.917	21.541	0.038	0.874	0.966
TCN[Lr]	17.298	299.246	11.221	20.059	0.036	0.883	0.967
CNN-TCN[1,2,4,8,Re]	16.433	270.064	10.680	19.542	0.034	0.894	0.970
CNN-TCN[1,2,4,8,Lr]	16.529	273.223	10.695	19.596	0.034	0.893	0.970
CNN-TCN[Re]	16.504	272.414	10.762	20.174	0.034	0.893	0.971
CNN-TCN[Lr]	16.306	265.900	10.593	20.440	0.034	0.896	0.972
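Table 2 varies two ingredients: the dilation schedule (the bracketed [1,2,4,8]) and the activation. The activation labels are presumably Re = ReLU and Lr = LeakyReLU, the substitution that defines L-TCN. The sketch below shows these ingredients for a single channel in NumPy. It assumes a kernel size of 3, a LeakyReLU slope of 0.01, and the generic TCN residual block of Bai et al. with ReLU swapped for LeakyReLU. It is not the paper's adjusted LR-TCN residual structure or feature-matching convolution position, and all function names and random weights are hypothetical.

```python
import numpy as np


def leaky_relu(x: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """LeakyReLU: keeps positive inputs and scales non-positive ones by alpha.

    Plain ReLU maps every negative input (and its gradient) to 0.
    """
    return np.where(x > 0, x, alpha * x)


def causal_dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel causal dilated convolution.

    y[t] = sum_j w[j] * x[t - j * dilation], with x treated as 0 before t = 0,
    so y[t] never depends on future samples.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])  # left padding only
    y = np.empty(len(x))
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * dilation]  # x[t], x[t-d], ..., x[t-(k-1)d]
        y[t] = np.dot(w, taps)
    return y


def residual_block(x, w1, w2, dilation, alpha=0.01):
    """Generic TCN residual block: two causal convs plus an identity shortcut."""
    h = leaky_relu(causal_dilated_conv1d(x, w1, dilation), alpha)
    h = leaky_relu(causal_dilated_conv1d(h, w2, dilation), alpha)
    # With several channels, a 1x1 "feature-matching" conv would reshape x before the sum.
    return leaky_relu(h + x, alpha)


def receptive_field(kernel_size: int, dilations: list[int], convs_per_block: int = 2) -> int:
    """Past time steps (current one included) seen by the last block of a dilated stack."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


# Stack blocks with the [1, 2, 4, 8] schedule; weights are arbitrary placeholders
rng = np.random.default_rng(0)
series = rng.normal(size=48)
out = series
for d in (1, 2, 4, 8):
    out = residual_block(out, rng.normal(size=3), rng.normal(size=3), d)
print(out.shape, receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation at each level makes the receptive field grow with the sum of the dilations rather than with depth alone. Under these assumptions, four blocks with kernel size 3 already cover 61 hourly steps.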

Table 3. Comparison of single models for regression estimation of PM_2.5 concentration at monitoring station 1001.

Model	RMSE	MSE	MAE	SMAPE	NAE	R²	IA	Time
LSTM	18.719	350.411	11.941	26.959	0.036	0.863	0.963	21.46 s
GRU	17.090	292.075	10.678	20.628	0.034	0.886	0.970	24.10 s
SRU	17.223	296.632	10.543	20.720	0.033	0.884	0.969	23.29 s
TCN	18.787	352.977	13.067	23.342	0.042	0.862	0.959	64.66 s
Gaussian-TCN	17.592	309.499	11.427	20.874	0.036	0.879	0.969	80.38 s
GL-TCN	17.349	300.988	11.361	21.106	0.036	0.882	0.968	67.69 s
DD-TCN	17.621	310.518	11.222	19.906	0.036	0.879	0.969	108.34 s
D-TCN	17.675	312.411	11.2111	19.921	0.036	0.878	0.968	80.01 s
LR-TCN	16.306	265.900	10.593	20.440	0.034	0.896	0.972	55.62 s
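Tables 2 to 5 report RMSE, MSE, MAE, SMAPE, NAE, R², and IA; MSE is simply RMSE squared (e.g., 16.306² ≈ 265.9 for LR-TCN in Table 3). The exact definitions are given in Section 4.3, which is not part of this excerpt. The sketch below therefore assumes the common textbook forms: SMAPE in percent with a (|y| + |ŷ|)/2 denominator, R² = 1 − SS_res/SS_tot, and IA as Willmott's index of agreement. NAE is omitted because its normalization is not reproduced here, and the function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y, y_hat) -> dict[str, float]:
    """Common point-forecast metrics for observed y and predicted y_hat (1-D arrays)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    sse = np.sum(err ** 2)
    y_bar = y.mean()
    return {
        "RMSE": float(np.sqrt(sse / len(y))),
        "MSE": float(sse / len(y)),
        "MAE": float(np.mean(np.abs(err))),
        # Symmetric MAPE in percent; undefined where y and y_hat are both 0
        "SMAPE": float(100.0 * np.mean(2.0 * np.abs(err) / (np.abs(y) + np.abs(y_hat)))),
        # Coefficient of determination
        "R2": float(1.0 - sse / np.sum((y - y_bar) ** 2)),
        # Willmott's index of agreement, bounded in [0, 1]
        "IA": float(1.0 - sse / np.sum((np.abs(y_hat - y_bar) + np.abs(y - y_bar)) ** 2)),
    }


observed = np.array([35.0, 60.0, 82.0, 47.0])   # hypothetical PM2.5 readings
predicted = np.array([38.0, 55.0, 85.0, 45.0])  # hypothetical forecasts
for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```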

Table 4. Comparison of integrated models for estimation of PM_2.5 concentration at monitoring station 1001.

Model	RMSE	MSE	MAE	SMAPE	NAE	R²	IA	Time
LSTM-TCN	17.951	322.265	11.771	21.864	0.037	0.874	0.964	97.09 s
GRU-TCN	16.862	284.340	10.561	19.192	0.033	0.889	0.969	91.47 s
SRU-TCN	17.364	301.542	11.475	21.171	0.036	0.882	0.967	90.09 s
LSTM-Gaussian-TCN	18.019	324.693	11.723	21.881	0.037	0.873	0.967	116.38 s
GRU-Gaussian-TCN	16.954	287.445	10.620	20.046	0.034	0.888	0.970	117.46 s
SRU-Gaussian-TCN	17.144	293.943	10.869	20.533	0.034	0.885	0.970	104.47 s
LSTM-GL-TCN	17.855	318.835	11.477	21.693	0.036	0.875	0.966	95.38 s
GRU-GL-TCN	17.038	290.294	10.724	20.514	0.034	0.887	0.969	95.49 s
SRU-GL-TCN	16.848	283.867	10.626	19.737	0.034	0.889	0.970	92.94 s
LSTM-DD-TCN	17.351	301.066	10.921	20.597	0.035	0.882	0.969	143.65 s
GRU-DD-TCN	17.040	290.375	10.704	20.480	0.034	0.886	0.970	144.91 s
SRU-DD-TCN	17.119	293.062	10.719	19.613	0.034	0.885	0.970	133.45 s
LSTM-D-TCN	17.754	315.232	11.495	21.512	0.036	0.877	0.967	112.55 s
GRU-D-TCN	17.333	300.458	10.822	20.069	0.034	0.883	0.969	112.46 s
SRU-D-TCN	17.063	291.179	10.573	19.508	0.034	0.886	0.970	100.72 s
LSTM-LR-TCN	17.001	289.021	10.816	20.184	0.034	0.887	0.970	84.68 s
GRU-LR-TCN	16.261	264.444	10.138	18.978	0.032	0.897	0.972	78.54 s
SRU-LR-TCN	16.369	267.969	10.348	19.476	0.033	0.897	0.895	80.01 s

Table 5. Experiment on the generalization of regression estimation of PM_2.5 concentration.

Station	Network	RMSE	MSE	MAE	SMAPE	NAE	R²	IA	Time
1002	LSTM	17.100	292.438	11.141	29.090	0.049	0.874	0.966	25.19 s
1002	GRU	15.053	226.618	9.092	20.795	0.039	0.903	0.975	23.71 s
1002	SRU	15.236	232.158	9.184	21.066	0.041	0.900	0.974	24.49 s
1002	TCN	15.577	242.661	9.626	20.725	0.043	0.896	0.973	62.88 s
1002	Gaussian-TCN	15.370	236.251	9.149	20.088	0.039	0.898	0.973	90.53 s
1002	GL-TCN	16.172	261.561	9.955	20.791	0.042	0.888	0.972	68.26 s
1002	DD-TCN	15.752	248.132	9.488	20.085	0.042	0.893	0.973	119.98 s
1002	D-TCN	16.111	259.579	10.736	23.352	0.047	0.888	0.970	84.22 s
1002	ST-TCN	18.352	341.812	12.281	35.152	0.285	0.668	0.913	67.38 s
1002	DMSnet	16.622	280.291	11.371	28.682	0.210	0.824	0.957	17391 s
1002	LR-TCN	15.454	238.839	9.637	20.862	0.042	0.897	0.974	59.99 s
1002	GRU-LR-TCN	14.837	220.158	9.024	19.813	0.039	0.905	0.975	86.24 s
1003	LSTM	19.620	384.953	11.154	20.505	0.037	0.842	0.957	21.64 s
1003	GRU	18.934	358.513	10.625	17.873	0.035	0.853	0.961	22.77 s
1003	SRU	18.919	357.954	10.601	18.075	0.035	0.853	0.961	19.83 s
1003	TCN	18.822	354.279	10.820	17.985	0.036	0.855	0.959	70.24 s
1003	Gaussian-TCN	19.096	364.671	10.901	17.524	0.036	0.850	0.961	81.81 s
1003	GL-TCN	18.834	354.727	10.503	17.250	0.035	0.855	0.962	68.13 s
1003	DD-TCN	18.857	355.611	10.609	16.955	0.035	0.854	0.961	119.54 s
1003	D-TCN	19.049	362.886	11.132	17.706	0.037	0.851	0.960	77.66 s
1003	ST-TCN	19.643	390.855	18.281	20.117	0.233	0.721	0.902	65.41 s
1003	DMSnet	18.845	360.126	13.345	19.091	0.190	0.736	0.932	16974 s
1003	LR-TCN	18.695	349.530	10.556	16.162	0.035	0.857	0.962	57.34 s
1003	GRU-LR-TCN	18.746	351.422	10.530	16.417	0.035	0.856	0.962	85.42 s
1023	LSTM	18.692	349.421	11.534	34.034	0.042	0.867	0.965	22.04 s
1023	GRU	17.832	317.985	10.132	19.563	0.039	0.879	0.968	24.32 s
1023	SRU	17.830	317.910	10.531	19.585	0.041	0.879	0.968	21.09 s
1023	TCN	18.586	345.452	11.003	20.384	0.035	0.869	0.964	63.52 s
1023	Gaussian-TCN	18.607	346.228	11.003	23.537	0.037	0.868	0.966	88.23 s
1023	GL-TCN	18.317	335.532	11.343	19.693	0.043	0.872	0.967	69.91 s
1023	DD-TCN	18.067	326.416	10.741	20.532	0.041	0.876	0.968	119.86 s
1023	D-TCN	18.130	328.718	10.910	20.068	0.041	0.875	0.967	88.71 s
1023	ST-TCN	19.628	411.201	15.056	29.230	0.306	0.665	0.885	63.72 s
1023	DMSnet	18.662	358.742	12.469	26.936	0.194	0.801	0.943	16810 s
1023	LR-TCN	18.200	331.270	10.980	18.917	0.040	0.874	0.968	54.65 s
1023	GRU-LR-TCN	17.878	319.646	10.630	21.768	0.041	0.878	0.969	83.10 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, J.; Jia, Y.; Jia, Z.-H.; He, C.-B.; Shi, F.; Huang, X.-H. Prediction of PM_2.5 Concentration Based on Deep Learning for High-Dimensional Time Series. Appl. Sci. 2024, 14, 8745. https://doi.org/10.3390/app14198745
