Prediction of the Height of Water-Conducting Fissure Zone for Shallow-Buried Coal Seams Under Fully Mechanized Caving Conditions in Northern Shaanxi Province

Chen, Wei; Geng, Shujia; Chen, Xi; Li, Tao; Tsangaratos, Paraskevas; Ilia, Ioanna

doi:10.3390/w17030312


by Wei Chen ¹,*, Shujia Geng ¹, Xi Chen ², Tao Li ³, Paraskevas Tsangaratos ⁴ and Ioanna Ilia ⁴

¹ College of Geology and Environment, Xi’an University of Science and Technology, Xi’an 710054, China

² Shandong Construction Reconnoissance Group Co., Ltd., Jinan 250031, China

³ School of Mining and Civil Engineering, Liupanshui Normal University, Liupanshui 553004, China

⁴ Laboratory of Engineering Geology and Hydrogeology, Department of Geological Sciences, School of Mining and Metallurgical Engineering, National Technical University of Athens, 15780 Zografou, Greece

* Author to whom correspondence should be addressed.

Water 2025, 17(3), 312; https://doi.org/10.3390/w17030312

Submission received: 7 November 2024 / Revised: 14 January 2025 / Accepted: 20 January 2025 / Published: 23 January 2025


Abstract

Accurate prediction of the height of the water-conducting fissure zone (HWCFZ) is an important issue in coal mine water control and a prerequisite for safe coal production. Existing HWCFZ prediction models suffer from poor prediction accuracy. Based on widely collected measured HWCFZ data from coal mines in northern Shaanxi Province, China, shallow-buried coal seams are divided into two types, typical shallow-buried seams and near-shallow-buried seams, according to burial depth and base-loading ratio. Drawing on previous research, three factors, namely mining thickness, coal seam depth, and working length, were selected, and the HWCFZ data of the study area were analyzed using a multivariate nonlinear regression method. Each group of data was then randomly divided into training and validation data at a ratio of 70:30. The training data were used to build a neural network model (BP), a random forest model (RF), a hybrid of particle swarm optimization and the support vector machine model (PSO-SVR), and a hybrid of genetic algorithm optimization and the support vector machine model (GA-SVR). Finally, the test samples were used to assess model accuracy and generalization ability, and the optimal prediction models for the typical shallow-buried and near-shallow-buried areas of the Jurassic coal seams in northern Shaanxi were established. The results show that the HWCFZ of typical shallow-buried coal seams is best determined by the multivariate nonlinear regression method, with an accuracy of 0.64; the HWCFZ of near-shallow-buried coal seams is best predicted by the two-factor PSO-SVR model based on mining thickness and burial depth, with a prediction accuracy of 0.84; and machine learning methods are better suited to near-shallow-buried areas, where the data are small in scale and discrete.

Keywords: shallow-buried coal seam; height of water-conducting fissure zone; multiple regression fitting; machine learning

1. Introduction

With the increase in energy demand and mining intensity, changes in the geological conditions of the coal seam roof and the development of mining-induced fractures are the direct cause of aquifer destruction and the fundamental cause of ecological degradation in mining areas. Coal seam mining damages the overlying strata, forming a water-conducting fracture zone composed of a fracture zone and a collapse zone [1,2]. During mining, once the water-conducting fracture zone connects with an aquifer, water from the overlying aquifer or from the surface passes through the fissures, resulting in water damage accidents. Over time, soil erosion, vegetation death, ground collapse, and other forms of ecological and environmental damage also appear [3]. Accurate prediction of the HWCFZ therefore remains an important research issue. The empirical formula proposed in the “Guidelines for the Retention of Coal Pillars in Buildings, Water Bodies, Railways, and Major Mine Shafts and Pressure Coal Mining”, hereinafter referred to as the “three-under standard”, only takes into account mining thickness and the hardness of the overlying rock. It therefore considers too few influencing factors and cannot reflect the combined effects of multiple factors.

In recent years, machine learning methods such as decision trees (DTs), support vector machines (SVMs), random forest regression (RFR), artificial neural networks (ANNs), and multiple regression analysis (MNR) have gradually become mainstream approaches for predicting the development of HWCFZ, improving prediction accuracy to a certain extent. Dai et al. [4] combined multiple regression with the BP neural network model to propose a new integrated model, the multiple regression–BP neural network (MR-BPNN) model. The prediction accuracy and generalization ability of the three models were verified through recorded test samples. Comparative results indicate that, compared to existing empirical prediction methods, all three models exhibit good applicability in predicting the height of the water-conducting fractured zone in the mine. More importantly, the MR-BPNN integrated model combines the nonlinear mapping capability of neural networks with the empirical nature of multiple regression models, offering high precision, strong generalization, and practical applicability for predicting the height of water-conducting fractured zones in coal mines. Hou et al. [5] investigated the relationship between the water-conducting fractured zone and factors such as coal seam mining thickness, coal seam mining depth, hard rock proportion factor, and working length. They ultimately proposed a method for determining the development of HWCFZ using an optimized multi-nonlinear regression model based on the Entropy Weight Method (EWM-MNR). Guo et al. [6] proposed a support vector regression (SVR) based on mining depth and hard rock proportion to predict the development of HWCFZ. Gao et al. [7] utilized the Backpropagation Neural Network (BP-NN) to predict the HWCFZ. To address the issue of locally optimal results caused by the random generation of weights and thresholds in BP-NN, they employed global optimization on the coefficients, mining thickness, and working length. 
They used a Multi-Population Genetic Algorithm (MPGA) to find the optimal SVR parameters. The constructed MPGA-SVR model was compared with traditional empirical formulas and was found to have higher precision and stability. Wang and colleagues [8], in light of the traditional Backpropagation Neural Network (BPNN)’s tendency to be trapped in local optima, leading to poor prediction accuracy, proposed an improved BPNN based on Differential Evolution and Grey Wolf Optimizer (DEGWO), namely the DEGWO-BPNN model. Xu et al. [9] integrated the Extreme Gradient Boosting Machine with several commonly used intelligent algorithms, including genetic algorithms, particle swarm optimization, Jaya algorithm, and Sparrow Search Algorithm, to establish a combined optimization model for predicting the HWCFZ. Zhu and colleagues [10] employed a method combining the Improved Cuckoo Search (ICS) algorithm and Extreme Learning Machines (ELMs) to predict the development of HWCFZ. By analyzing the factors influencing the development of HWCFZ, they utilized the ICS algorithm to optimize two key parameters of the ELM model: the input weights (omega) and the biases of the hidden nodes. This led to the establishment of the ICS-ELM model for predicting the HWCFZ.

This study builds on previous research and collects a substantial amount of HWCFZ data from northern Shaanxi, China. Multiple nonlinear regression models, neural network models (BP), random forest models (RF), particle swarm optimization support vector machine models (PSO-SVR), and genetic algorithm-optimized support vector machine models (GA-SVR) are used, and the HWCFZ values obtained from the different prediction models are compared and analyzed. Constructing an optimal model for predicting the HWCFZ improves prediction accuracy and could provide a reference for mines with similar geological–environmental conditions.

2. Preparation of the Data Set

A large amount of measured data was collected from the northern Shaanxi region. The results of this study showed that the key factors influencing the development of the HWCFZ above the mined-out area include overburden structure, burial depth, working length, coal seam dip angle, mining thickness, mining method, and uniaxial compressive strength of the roof strata, among others. Based on the availability and utility of the data, three main factors were selected for the predictive model: coal seam mining thickness (X1), coal seam depth (X2), and working length (X3). Consequently, a data set of 186 comprehensive mining cases from the northern Shaanxi region was compiled, with four attributes per case: the HWCFZ, coal seam depth, mining thickness, and working length.

Coal seam mining thickness, as one of the main impact indicators, is a direct factor affecting the development of HWCFZ. It reflects the influence of the vertical height of underground excavation on the redistribution of stress in the roof rock mass after excavation, as well as deformation and the extent of fracturing. Moreover, in the empirical formula, it is the only impact parameter for predicting the height of water conduction. Practical experience has proven that with coal seam mining, the overlying key strata will gradually break. Within a certain range, the larger the mining thickness, the greater the extent of the plastic failure zone in the roof, leading to more severe damage to the overburden rock mass, and consequently, a corresponding increase in the HWCFZ. The relationship between mining thickness and the HWCFZ approximately follows a linear, fractional, or exponential growth pattern [11,12]. From Figure 1, it can be observed that the HWCFZ increases with an increase in mining thickness, but there is a significant degree of dispersion. This is manifested by a portion of the data showing a rapid increase in the HWCFZ as mining thickness increases, while another portion of the data indicates a slow increase in the HWCFZ with an increase in mining thickness.

The burial depth of the coal seam affects the in situ stress of the surrounding rock. After mining, the overlying strata move and break under mine pressure, generating a water-conducting fractured zone. According to mine pressure theory, within a certain range, as coal seam depth increases, the vertical and lateral stresses in the surrounding rock at the working face continue to increase as mining proceeds, exacerbating the damage to the overburden, and the HWCFZ increases accordingly; beyond this range, however, deep geostress closes the fractures formed by mining, and the HWCFZ decreases accordingly [7]. Additionally, as coal seam mining progresses deeper, mine pressure continuously increases. Under the interaction of stresses, fractures form and connect within the overlying strata, ultimately creating water-conducting channels. Fractures in roof rocks that are not yet fully connected will eventually link up under stress, also forming water-conducting channels. Mine pressure is a primary source of movement and stress in the overlying strata, and its magnitude is closely related to coal seam depth; therefore, mining depth affects the development of the HWCFZ. Figure 2 shows that the relationship between the HWCFZ and coal seam depth is not obvious, but it can be roughly seen that when coal seam depth is less than 200 m, the HWCFZ increases with burial depth. When coal seam depth is between 270 m and 570 m, the HWCFZ reaches its maximum and then begins to decrease, which is consistent with actual engineering experience.

The length of the working face is also one of the indicators affecting the HWCFZ, and it is also an important indicator of the degree of coal seam mining. According to previous related research, in the widely used trending longwall mining faces in China, before the coal seam is fully mined, working length has a significant impact on the development of HWCFZ. This is manifested in that as the advancing distance of working length increases, the overburden stress correspondingly increases, and with the increase in working length, the HWCFZ shows a stepwise or fractional function growth. However, when the coal seam reaches full mining and the length of working reaches the critical length under the geological mining conditions, the development of HWCFZ continues to increase, but the growth rate slows down relatively; at this point, the influence of working length on the HWCFZ becomes negligible. When the HWCFZ develops to the maximum, a typical arch shape will be formed through the correlation between the HWCFZ in coal mines and working length, as shown in Figure 3. Since the coal seam has reached full mining, there is no obvious correlation between the height and working length. Only a general trend can be seen: when the working length is small, the height is mostly very low, and as the working length increases, the height is distributed from high to low, which largely corresponds to the influence of the surrounding rock stress of the working face.

Based on the collected data of all HWCFZ and their correlation indicator distribution results, as shown in Figure 1, Figure 2 and Figure 3, there is a certain positive correlation between the development of HWCFZ and mining thickness, but the degree of dispersion is too large for direct prediction. It is necessary to partition parameters and divide the data with the same trend of HWCFZ for better prediction in the Jurassic coal seams of northern Shaanxi. From Figure 3, it is observed that the working length for most of the data is 200 m or 300 m, which prevents the use of the working length as a factor to divide different regions with varying patterns of development of HWCFZ. Therefore, it is best to categorize the typical coal characteristics of the western mining area based on burial depth. From Figure 2, it is known that when the burial depth is less than 200 m, the conduction height increases uniformly with depth. However, when the burial depth exceeds 200 m, the values of conduction height tend to be randomly distributed and show no regular pattern with depth. Thus, we use a burial depth of 200 m as a threshold to distinguish between medium–deep-buried coal seams and shallow-buried coal seams, roughly obtaining two trends in the change in conduction height.

Based on this, we only selected data with a burial depth of less than 200 m, totaling 130 groups, to conduct classified research on shallow-buried coal seams. This can further reduce the degree of data dispersion and increase the usability of the data, thereby grasping the impact of different storage conditions of shallow-buried coal seams on the development of mining-induced water-conducting fractures, in order to improve the accuracy of conduction height prediction. Firstly, as shown in Figure 4, the data with a burial depth of less than 200 m were divided into two groups of conduction height data: one with a burial depth of less than 150 m and the other with a burial depth between 150 and 200 m. These two groups of data each had certain clustering and trend characteristics. Therefore, the delineation of shallow-buried areas using 150 m burial depth as the boundary allowed for two different typical areas to have clear patterns and fluctuation differences in the development of HWCFZ. Ultimately, the shallow-buried areas in the Jurassic coal field of the western mining region were classified into typical shallow-buried areas (burial depth less than 150 m) and near-shallow-buried areas (burial depth between 150 and 200 m).
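The depth-based partition described above can be sketched in a few lines of Python. This is only an illustration: the function name and the handling of the exact boundary values (150 m assigned to the near-shallow class, 200 m and deeper excluded) are assumptions, since the text does not state which class a boundary value belongs to.

```python
def classify_burial_depth(depth_m: float) -> str:
    """Assign a case to a burial-depth class using the 150 m and 200 m limits.

    Boundary handling is an assumption: exactly 150 m is treated as
    near-shallow-buried, and 200 m or more is excluded from the shallow data.
    """
    if depth_m <= 0:
        raise ValueError("burial depth must be positive")
    if depth_m < 150:
        return "typical shallow-buried"
    if depth_m < 200:
        return "near-shallow-buried"
    return "excluded (not shallow-buried)"


def group_by_class(depths_m):
    """Group a list of burial depths by class, preserving input order."""
    groups = {}
    for depth in depths_m:
        groups.setdefault(classify_burial_depth(depth), []).append(depth)
    return groups
```

Calling `group_by_class([80, 149.9, 150, 199, 260])` would place the first two depths in the typical shallow-buried group, the next two in the near-shallow-buried group, and the last in the excluded group.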

3. Methodology

3.1. Multivariate Nonlinear Regression Model

To identify the interrelationships among these variables, this paper employs multiple regression analysis, a commonly used and effective statistical technique for analyzing the relationship between coal seam indicators and the development of the HWCFZ. It helps reveal patterns in large numbers of data points [13], and it can analyze both the effects of single factors and the interrelated effects among factors. Because the relationships between these factors are typically nonlinear, a multiple nonlinear regression model is adopted [14].

(1): Mining thickness

The factors influencing the HWCFZ are numerous. Firstly, single-factor regression analysis was conducted on the typical shallow-buried data and near-shallow-buried data [15,16]. After comparing various curve models, it was concluded that for typical shallow-buried coal seams, the relationship between the HWCFZ and mining thickness is best described by a quadratic function, which has the highest correlation coefficient value of 0.51, as detailed in Formula (1). In the case of near-shallow-buried coal seams, the relationship between the HWCFZ and mining thickness is more appropriately described by a power function, with the highest correlation coefficient value of 0.76, specifically shown in Formula (2) (Supplementary Figure S1).

H_{1} = - 24.63 + 31.08 X_{1} - 2.69 {X_{1}}^{2}

(1)

H_{2} = 40.91 {X_{1}}^{0.659}

(2)

In the formula, H₁ represents the HWCFZ in typical shallow-buried coal seams; H₂ represents the HWCFZ in near-shallow-buried coal seams; and X₁ represents the mining thickness of the coal seam.
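As a quick illustration, Formulas (1) and (2) can be evaluated directly. The sketch below uses the fitted coefficients exactly as printed; the function names are hypothetical, and the units are assumed to be metres for both mining thickness and HWCFZ.

```python
def hwcfz_typical_single(x1: float) -> float:
    """Formula (1): quadratic fit for typical shallow-buried seams."""
    return -24.63 + 31.08 * x1 - 2.69 * x1 ** 2


def hwcfz_near_single(x1: float) -> float:
    """Formula (2): power-law fit for near-shallow-buried seams."""
    if x1 <= 0:
        raise ValueError("mining thickness must be positive")
    return 40.91 * x1 ** 0.659


def quadratic_peak_thickness() -> float:
    """Mining thickness at which the quadratic of Formula (1) peaks."""
    return 31.08 / (2 * 2.69)
```

For a mining thickness of 5 m, Formula (1) gives about 63.5 m. Because the quadratic term is negative, the fitted curve of Formula (1) turns downward beyond a mining thickness of roughly 5.8 m, so values extrapolated past that point should be treated with caution.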

(2): Coal seam depth

To investigate the correlation between the HWCFZ in shallow-buried coal seams and burial depth, the two groups of reclassified HWCFZ sample data were used. The SPSS curve fitting function was applied to each of the two sets of sample data. Various curve models were derived, and the model with the strongest correlation for each set was chosen to represent the relationship between the HWCFZ and coal seam depth [17,18]. For typical shallow-buried coal seams, the HWCFZ exhibited the best correlation with burial depth when modeled by an S-shaped curve, with the formula shown as

H_{1} = \exp (4.26 - 26.71 / X_{2})

, and a correlation coefficient of 0.282 (Supplementary Figure S2a). In the case of near-shallow-buried coal seams, the HWCFZ had the strongest correlation with coal seam depth when described by a quadratic curve, with a correlation coefficient of 0.3 (Supplementary Figure S2b). For typical shallow-buried coal seams, the binary regression Formula (3) improved the predictive accuracy of the univariate regression Formula (1), with a correlation coefficient value of 0.63; whereas for near-shallow-buried coal seams, the precision of the binary regression fitting formula decreased; hence, it was unnecessary to construct it, and using a regression model established with mining thickness as the sole factor achieved accurate prediction.

H_{1} = - 46.77 + 34.62 X_{1} - 3.17 {X_{1}}^{2} + 13.08 \exp (1.08 - 79.34 / X_{2})

(3)

In the formula, H₁ represents the HWCFZ in typical shallow-buried coal seams; X₁ represents the mining thickness of the coal seam; and X₂ represents coal seam depth.

(3): Working length

Using the same data set and methodology, this study further investigated the correlation between the HWCFZ and working length. The model with the best correlation coefficient among the various curve models was selected to represent the relationship between the HWCFZ and working length [19,20]. For typical shallow-buried coal seams, the HWCFZ exhibited the best correlation with working length when it followed an S-shaped model relationship. Specifically,

H_{1} = \exp (4.29 - 69.86 / X_{3})

, and the correlation coefficient was 0.328; in the case of near-shallow-buried coal seams, the HWCFZ exhibited the best correlation with working length when it followed a cubic curve relationship, with a correlation coefficient of 0.062. This correlation analysis showed that, because there is no significant correlation between the HWCFZ and working length once the coal seam is fully mined, the data for shallow-buried coal seams were more dispersed (Supplementary Figure S3a,b). For typical shallow-buried coal seams, the multivariate nonlinear fitting Formula (4), which relates the fracture height to all three factors, again improved on the prediction accuracy of the binary regression Formula (3), with a correlation coefficient of 0.64. Therefore, in typical shallow-buried areas, where the HWCFZ is more strongly correlated with burial depth and working length, it is suitable to use more comprehensive factors to establish a multivariate nonlinear regression model, based on empirical formula corrections, to predict the height of fractures. For near-shallow-buried coal seams, by contrast, the multivariate regression fitting formula is not suitable for predicting the height of fractures and would only increase the complexity of the formula, so there is no need to construct it [21].

H_{1} = - 46.7 + 34.64 X_{1} - 3.18 {X_{1}}^{2} + 34.99 \exp (0.02 - 78.94 / X_{2}) + 26.86 \exp (- 0.49 - 678.55 / X_{3})

(4)

In the formula, H₁ represents the fracture height of typical shallow-buried coal seams; X₁ is the mining thickness of the coal seam; X₂ is the coal seam depth; and X₃ is the working length.
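The two-factor and three-factor fits for typical shallow-buried seams, Formulas (3) and (4), can be evaluated in the same way. The sketch below copies the coefficients as printed; the function names and the input checks are illustrative assumptions, and all quantities are assumed to be in metres.

```python
import math


def hwcfz_typical_two_factor(x1: float, x2: float) -> float:
    """Formula (3): mining thickness x1 and coal seam depth x2."""
    if x2 <= 0:
        raise ValueError("coal seam depth must be positive")
    return (-46.77 + 34.62 * x1 - 3.17 * x1 ** 2
            + 13.08 * math.exp(1.08 - 79.34 / x2))


def hwcfz_typical_three_factor(x1: float, x2: float, x3: float) -> float:
    """Formula (4): adds the working length x3 to the two-factor form."""
    if x2 <= 0 or x3 <= 0:
        raise ValueError("depth and working length must be positive")
    return (-46.7 + 34.64 * x1 - 3.18 * x1 ** 2
            + 34.99 * math.exp(0.02 - 78.94 / x2)
            + 26.86 * math.exp(-0.49 - 678.55 / x3))
```

Both exponential terms grow with their argument, so for a fixed mining thickness the fitted height rises with burial depth and with working length, in line with the S-shaped single-factor curves described above.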

3.2. BP Neural Network Model

Artificial neural networks are the result of studying human brain organization. They utilize numerous interconnected processing units to form a complex network, which simulates the thinking and learning abilities of the human brain. Also known as neural networks, they have proven to be a simple and effective method for solving complex problems [22]. Compared to traditional methods, the most obvious advantage of neural networks is their ability to solve problems with multiple parameters. They also have the following characteristics:

(1): High fault tolerance and robustness, maintaining these qualities even with data containing certain noise.
(2): Strong adaptive learning capabilities, allowing them to extract and display statistical patterns from a given set of sample data.
(3): Capable of parallel processing on large-scale data with fast execution speeds [23].

A neural network can generally be divided into three layers in terms of structure: the input layer, the output layer, and the hidden layer. The nodes of the input layer correspond to different predictive variables. The nodes of the output layer correspond to target variables (which can be multiple). The hidden layer lies between the input and output layers; its number of layers and the number of nodes in each layer together determine the complexity of the neural network. Different neural networks possess different logical structures, with a commonly used model being the BP (backpropagation) neural network model [24].

The BP neural network implements the mapping function from input to output and can approximate any nonlinear continuous function with arbitrary accuracy. The theory shows that increasing the number of hidden layers or the number of neurons in the hidden layer can improve the accuracy of the network. When the number of hidden layers is increased, it complicates the network and also increases the training time for the network weights. Moreover, adjusting and observing the training effect by increasing the number of neurons in the hidden layer is easier. Therefore, a BP neural network structure with one hidden layer is chosen, and to enhance the accuracy of the network, the method of changing the number of neurons in the hidden layer is adopted. The number of neurons in the hidden layer is determined by the following formula:

N = \sqrt{0.43 m n + 0.12 n^{2} + 2.54 m + 0.77 n + 0.35 + 0.51}

(5)

In the formula, N represents the number of neurons in the hidden layer; m represents the number of neurons in the input layer; and n represents the number of neurons in the output layer. The coal seam height at the borehole location m, the depth of the coal seam h, and the working length l are taken as the input parameters for the network, while the HWCFZ at the borehole location H serves as the output parameter.

According to the observed data, a neural network structure as shown in Figure 5 was established. The input layer of this network consists of 3 neurons, the hidden layer is composed of 3 neurons, and the output layer is formed by 1 neuron.
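As a rough check, Formula (5) can be evaluated for the 3-input, 1-output network described here. The sketch follows the formula exactly as printed above; rounding the result to the nearest integer is an assumption, since the text does not say how a fractional value is converted to a neuron count.

```python
import math


def hidden_neurons(m: int, n: int) -> float:
    """Formula (5) as printed: m input neurons, n output neurons."""
    return math.sqrt(0.43 * m * n + 0.12 * n ** 2
                     + 2.54 * m + 0.77 * n + 0.35 + 0.51)


def hidden_neurons_rounded(m: int, n: int) -> int:
    """Nearest-integer neuron count (rounding rule is an assumption)."""
    return round(hidden_neurons(m, n))
```

With m = 3 and n = 1 the expression evaluates to about 3.27, which rounds to the 3 hidden neurons shown in the network structure.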

Because the model is nonlinear, the initial values strongly affect whether training converges and whether it becomes trapped in a local minimum. The initial weights were therefore taken as random numbers in [0, 1], and the raw data were normalized before the network was trained. The code for normalization is as follows:

[p_train, ps_input] = mapminmax(P_train, 0, 1);
p_test = mapminmax('apply', P_test, ps_input);
[t_train, ps_output] = mapminmax(T_train, 0, 1);

Denormalization mapped the network outputs, which are on the normalized [0, 1] scale, back to actual predicted values. The code is as follows:

T_sim = mapminmax('reverse', t_sim, ps_output);
T_sim2 = mapminmax('reverse', t_sim2, ps_output);
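For readers without MATLAB, the scaling performed by the calls above can be sketched in plain Python. This is a simplified stand-in for a single data row, not the toolbox implementation: the function names are hypothetical, and rejecting a constant row is an assumption made to avoid division by zero.

```python
def fit_minmax(row, ymin=0.0, ymax=1.0):
    """Learn scaling settings from the training row only."""
    xmin, xmax = min(row), max(row)
    if xmin == xmax:
        raise ValueError("a constant row cannot be scaled")
    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax}


def apply_minmax(row, settings):
    """Scale a row with previously learned settings (the 'apply' step)."""
    s = settings
    scale = (s["ymax"] - s["ymin"]) / (s["xmax"] - s["xmin"])
    return [(x - s["xmin"]) * scale + s["ymin"] for x in row]


def reverse_minmax(row, settings):
    """Map scaled values back to original units (the 'reverse' step)."""
    s = settings
    scale = (s["xmax"] - s["xmin"]) / (s["ymax"] - s["ymin"])
    return [(y - s["ymin"]) * scale + s["xmin"] for y in row]
```

The point of keeping the settings object is that the test data are scaled with the minimum and maximum of the training data, so a test value outside the training range maps outside [0, 1].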

To overcome the shortcomings of the standard BP learning algorithm, the Levenberg–Marquardt (L-M) algorithm was used to train the BP neural network. The L-M algorithm is considered to yield the smallest training error and the highest accuracy, so it was adopted for predicting the HWCFZ with the BP neural network. The network was trained multiple times until the fitting accuracy and convergence speed were satisfactory, and the average iteration count and the mean RMSE were used as performance metrics to evaluate the model.

The L-M training algorithm [25] allows each iteration to no longer follow the singular negative gradient direction, instead permitting the error to search along a deteriorating path. It also optimizes the network weights by adaptively adjusting between the steepest gradient descent method and the Gauss–Newton method, thereby enabling the neural network to converge effectively.

The formula for adjusting the weight threshold is as follows:

\Delta w = - (J^{T} J + \mu I)^{- 1} J^{T} e

(6)

In the formula, Δw represents the adjustment to the weights and thresholds, μ is an adaptively adjusted scalar, J is the Jacobian matrix of the errors with respect to the weights, e is the error vector, and I is the identity matrix.
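The update rule of Formula (6) can be illustrated on a deliberately small problem. The sketch below fits a straight line y = a·x + b instead of a neural network, so the Jacobian rows are simply [x_i, 1]; the line model, the function names, and the rule of dividing or multiplying μ by 10 are assumptions chosen for brevity, not the configuration used in the paper.

```python
def sse(w, xs, ys):
    """Sum of squared errors for the line model y = a*x + b."""
    a, b = w
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def lm_step(w, xs, ys, mu):
    """One update  dw = -(J^T J + mu*I)^(-1) J^T e  for two weights."""
    a, b = w
    e = [a * x + b - y for x, y in zip(xs, ys)]   # error vector
    # J has rows [x_i, 1]; build J^T J + mu*I and J^T e explicitly.
    p = sum(x * x for x in xs) + mu
    q = sum(xs)
    r = len(xs) + mu
    g0 = sum(x * ei for x, ei in zip(xs, e))
    g1 = sum(e)
    det = p * r - q * q
    da = -(r * g0 - q * g1) / det                 # 2x2 inverse applied to J^T e
    db = -(-q * g0 + p * g1) / det
    return (a + da, b + db)


def fit_line_lm(xs, ys, w=(0.0, 0.0), mu=0.01, iterations=20):
    """Accept a step and shrink mu if the error drops, else enlarge mu."""
    best = sse(w, xs, ys)
    for _ in range(iterations):
        trial = lm_step(w, xs, ys, mu)
        trial_sse = sse(trial, xs, ys)
        if trial_sse < best:
            w, best, mu = trial, trial_sse, mu / 10.0
        else:
            mu *= 10.0
    return w
```

A small μ makes the step behave like Gauss–Newton, while a large μ shrinks it toward a short gradient-descent step, which is the adaptive behaviour described in the text.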

To achieve a better fitting effect, after repeated training (Figure 6, Figure 7, Figure 8 and Figure 9), the final parameters were set as follows: the learning rate was 0.001, the maximum number of iterations was 1000, the target accuracy was 0.001, the maximum number of validation failures was 10, the display interval was set to 25, and the minimum gradient for training was 1 × 10⁻⁷. The number of hidden layer nodes was determined through the trial and error method, with the optimal value being 9 during the training process. After three training sessions, the training error met the requirements (as shown in Figure 6a). From Figure 6b, it can be seen that the training result of this network had an R = 0.81499, and the angle between the solid line and the dashed line is also very small, indicating that the training result of this network was satisfactory.

3.3. Support Vector Machine Models

(1): Introduction to support vector machine model

The support vector machine (SVM) is a classification system derived from statistical learning theory. It can be used for both classification and regression purposes [26,27]. It uses a decision surface to separate classes, and this decision surface maximizes the margin between classes. This decision surface is considered to be the optimal hyperplane, and the data points closest to the hyperplane are critical factors in the training data set [28]. The advantages of support vector machines (SVMs) are primarily manifested in their ability to handle nonlinear problems effectively and significantly simplify common classification and regression tasks. Moreover, the SVM algorithm is straightforward and exhibits good stability; it is not overly sensitive to variations in the support sample, which means changes in the sample set do not drastically affect the outcomes. Additionally, the particle swarm optimization algorithm is utilized for global optimization [29].

Algorithms inspired by natural selection and genetic variation in the process of biological evolution are called genetic algorithms [30]. In SVR, three parameters are difficult to determine and are usually set by experience or trial: the penalty parameter C, the insensitive loss parameter ε, and the kernel parameter σ. The genetic algorithm has a strong global search capability, which can be used to optimize these parameters and thus solve the parameter selection problem for SVR. The model constructed by combining the two is called the genetic algorithm–support vector regression model (GA-SVR) [31].

(2): Construction of support vector model

In terms of modeling with SVM, setting the optimal parameters is crucial for the accuracy and prediction direction of subsequent models. However, a common issue currently is that the selection of the best parameters is a complex process involving repeated iteration, optimization, and replacement. PSO and GA have certain advantages in optimization. The information sharing capability of PSO and the speed of exchanging information between adjacent particles give it a certain advantage in nonlinear computations [32]. When GA is used for multidimensional spatial parameter optimization, it can avoid local optima and ensure that the algorithm always evolves toward the optimal solution [33]. Combining PSO and GA with SVR, respectively, to harness their individual strengths not only accelerates data convergence but also enhances the optimization process through continuous iteration and updating. This process is straightforward and easy to implement.

First, the LibSVM toolbox was installed on the MATLAB R2012b platform. LibSVM is written in C++, with Java sources also provided. This study used an enhanced version of the toolbox, known as the LibSVM–FarutoUltimate toolbox, in which Chinese developers added genetic algorithm (GA) and particle swarm optimization (PSO) routines on top of LibSVM to optimize the parameters of support vector machines.

The GA (genetic algorithm) was used to optimize the three parameters C, g, and p in the SVR model, with the parameter ε set to the default value of 0.1 provided in the LibSVM toolbox. The specific parameters for GA optimization were set as follows: a population size of 20, a maximum number of evolutionary generations of 200, a variation range for parameter C of [0, 100], and a variation range for parameter g of [0, 1000]. The process of parameter optimization is shown in Supplementary Figures S4–S7, where it essentially reached optimality after 45 iterations and terminated after 100 iterations. The optimal parameters optimized by GA are shown in Supplementary Figures S4–S7. These parameters were then substituted into the SVR model for training on the sample training set.

To determine the parameters of the PSO-SVR prediction model, the search range for the penalty factor C was set to (0.1, 100), and the search range for the kernel function parameter g was set to (0.01, 1000). The particle swarm size was set to 40, and the search stopped after a maximum of 100 iterations. The dimensionality of the particle vector was 6, with acceleration coefficients C1 = 2.5 and C2 = 1.5. The initial inertia weight Wstart was set to 0.9, and the final inertia weight at the maximum number of iterations, Wend, was set to 1.1. The iterative program was then run on the training set sample data to obtain the optimal penalty factor C and kernel function parameter g for the support vector machine prediction model.
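A minimal sketch of a particle swarm search with a linearly varying inertia weight may help make these settings concrete. The defaults below (swarm size 40, 100 iterations, C1 = 2.5, C2 = 1.5, inertia weight from 0.9 to 1.1, and the two search ranges) are the values quoted in the text; the velocity clamp, the two-dimensional particle, and the objective `toy_validation_error` are assumptions introduced here, and the code is not the toolbox routine used in the study.

```python
import random

# Search ranges quoted in the text: C in (0.1, 100), g in (0.01, 1000).
BOUNDS = [(0.1, 100.0), (0.01, 1000.0)]

def toy_validation_error(params):
    """Hypothetical stand-in for the SVR validation error (lower is better)."""
    c, g = params
    return (c - 20.0) ** 2 / 100.0 + (g - 3.0) ** 2 / 1000.0

def pso_minimize(objective, bounds, swarm_size=40, max_iter=100,
                 c1=2.5, c2=1.5, w_start=0.9, w_end=1.1, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g_idx = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]
    for it in range(max_iter):
        # Inertia weight moves linearly from w_start to w_end.
        w = w_start + (w_end - w_start) * it / max(1, max_iter - 1)
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                v_max = 0.2 * (hi - lo)  # assumed velocity clamp, not from the text
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:  # update the particle's personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # update the swarm's global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_params, best_val = pso_minimize(toy_validation_error, BOUNDS)
print("C = %.4f, g = %.4f, error = %.6f" % (best_params[0], best_params[1], best_val))
```

Because the global best is only replaced when a strictly better position is found, the returned error can never be worse than that of the best particle in the initial swarm.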

3.4. Random Forest Model

The random forest model is a tree-based regression model built from decision trees. Each decision tree consists of a root node, internal nodes, and leaf nodes. It can reflect the correspondence between attributes and labels (classification problems), as well as the relationship between attributes and numerical values (regression analysis) [9]. Generally speaking, training starts from the root node, passes through the internal nodes, and terminates at the leaf nodes. In this process, the original data are divided into multiple training subsets according to specific attributes, with each leaf node representing one unique training subset, so that classification or regression can be performed. A random forest averages the predictions of multiple decision trees. When an RF model is built as a multivariate nonlinear regression model, the predictor variables are numerical [34,35,36,37].

In this study, mining thickness, coal seam depth, and working face length were selected as the three influencing factors. The data set was randomly divided so that 70% served as the training set and the remaining 30% served as the prediction set used to test the regression performance of the model.
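The random 70:30 partition can be sketched in a few lines. The function name, the fixed seed, and the use of rounding to set the training size are assumptions; the article does not state how the split was implemented. With rounding, 46 and 34 samples give the group sizes reported later in the article (32/14 and 24/10).

```python
import random

def split_70_30(samples, train_fraction=0.7, seed=42):
    """Shuffle the samples and split them into training and test subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Sample indices stand in for the measured records.
train_typical, test_typical = split_70_30(range(46))
train_near, test_near = split_70_30(range(34))
print(len(train_typical), len(test_typical))  # 32 14
print(len(train_near), len(test_near))        # 24 10
```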

Before training, the parameters of the model must be configured. Two parameters significantly influence the algorithm’s efficiency and prediction accuracy: the number of variables tried at each node (mtry) and the number of trees (Ntree). Setting mtry too low can lead to overfitting, while setting it too high can gradually reduce correlation, both of which can decrease prediction precision. If Ntree is set too low, training is insufficient, and setting it too high increases the computational load of the model. In this study, the number of variables pre-selected at each tree node, mtry, was set to its default value of 1, and the number of trees in the random forest, Ntree, was set to 500. The three factors affecting the HWCFZ were used as inputs to the model, with the HWCFZ as the output. The random forest algorithm can use the out-of-bag (OOB) error to measure the impact of parameters on its performance, evaluate model accuracy, and thus determine appropriate parameters. The mean change in the OOB error in the training set is shown in Figure 10 and Figure 11. The average decrease in accuracy for the three features in the input sample set is shown in Figure 12.

Figure 10 demonstrates that as the number of trees increased, the OOB error gradually decreased and stabilized. When the number of trees reached 250, the model became relatively ideal and stabilized. Adding more trees would increase the computational load without providing additional benefits. This indicates that the default parameters selected in this study for training the random forest model were reasonable, balancing computational accuracy and efficiency.
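How an OOB error curve of this kind arises can be shown with a small bagging sketch. This is a simplified illustration under stated assumptions, not the random forest implementation used in the study: each "tree" is a one-split regression stump, the data are synthetic (three random inputs and a noisy target driven by the first input), and all function names are hypothetical. Each sample is predicted only by the trees whose bootstrap sample did not contain it, and the error is recomputed after every added tree.

```python
import random

def fit_stump(rows, targets, features):
    """Fit a one-split regression tree on the given candidate features."""
    mean_all = sum(targets) / len(targets)
    best = (float("inf"), None, 0.0, mean_all, mean_all)
    for f in features:
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            m_left, m_right = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - m_left) ** 2 for t in left)
                   + sum((t - m_right) ** 2 for t in right))
            if sse < best[0]:
                best = (sse, f, threshold, m_left, m_right)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, m_left, m_right = stump
    if f is None:  # no valid split was found; fall back to the mean
        return m_left
    return m_left if row[f] <= threshold else m_right

def oob_mse_curve(rows, targets, n_trees=100, mtry=1, seed=0):
    """Return the out-of-bag mean squared error after each added tree."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    oob_sum, oob_count, curve = [0.0] * n, [0] * n, []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        features = rng.sample(range(n_features), mtry)  # random feature subset
        stump = fit_stump([rows[i] for i in bag],
                          [targets[i] for i in bag], features)
        in_bag = set(bag)
        for i in range(n):
            if i not in in_bag:  # sample i is out-of-bag for this tree
                oob_sum[i] += predict_stump(stump, rows[i])
                oob_count[i] += 1
        errors = [(targets[i] - oob_sum[i] / oob_count[i]) ** 2
                  for i in range(n) if oob_count[i]]
        curve.append(sum(errors) / len(errors) if errors else float("nan"))
    return curve

# Synthetic data for illustration only (not the measured data of the study).
data_rng = random.Random(1)
rows = [[data_rng.uniform(2, 10), data_rng.uniform(50, 200),
         data_rng.uniform(150, 300)] for _ in range(32)]
targets = [12.0 * r[0] + data_rng.gauss(0.0, 5.0) for r in rows]
curve = oob_mse_curve(rows, targets)
print("OOB MSE after 10 trees: %.2f, after 100 trees: %.2f" % (curve[9], curve[-1]))
```

Plotting `curve` against the tree index gives the kind of error-versus-trees trace discussed above; the point at which it flattens indicates how many trees are sufficient.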

Figure 11 illustrates the average decrease in accuracy, which indicates the change in model accuracy if a particular feature attribute is removed from the model. This metric was used to compare the importance of different features. According to Figure 11, mining thickness is a significant factor influencing the HWCFZ, with coal seam depth having a slightly greater impact on the HWCFZ than working length.
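One common way to compute an average decrease in accuracy is permutation importance: the values of one feature are shuffled across samples and the resulting increase in prediction error is recorded. The sketch below assumes this permutation-based definition, which the article does not spell out. The model `toy_model`, its coefficients, and the data are hypothetical and synthetic; they are chosen only so that the three features have clearly different influence.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            permuted = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical fitted model: strong effect of feature 0, weak effect of feature 1."""
    return 12.0 * row[0] + 0.05 * row[1]

# Synthetic inputs standing in for mining thickness, seam depth, working length.
data_rng = random.Random(1)
rows = [[data_rng.uniform(2, 10), data_rng.uniform(50, 200),
         data_rng.uniform(150, 300)] for _ in range(30)]
targets = [toy_model(r) for r in rows]

importance = permutation_importance(toy_model, rows, targets)
for name, value in zip(["thickness", "depth", "length"], importance):
    print("%-10s %.3f" % (name, value))
```

A feature the model does not use (here the third one) gets an importance of exactly zero, because shuffling it cannot change any prediction.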

4. Results and Analysis

4.1. Multivariate Regression Fitting Method for Observed Results of Shallow-Buried Fracture Heights

4.1.1. Analysis of the Model Results for Typical Shallow-Buried Coal Seams

For the typical shallow-buried coal seam area, the HWCFZ was calculated with the regression equations derived from three factors (coal seam mining thickness, coal seam depth, and coal seam working length). The results were compared with those of the empirical formulas in the “Three-under standard”, and the relative errors with respect to the measured values were calculated. The results are shown in Table 1.

From Table 1, it can be observed that the relative errors in predicting the HWCFZ with the two empirical formulas range from 0.00 to 1.23 and from 0 to 1.41, with average relative errors of 0.20% and 0.19%, respectively. Fitting formula 4 yielded the smallest relative error, ranging from 0 to 0.41, with an average relative error of 0.13%. It is followed by fitting formula 3, with a relative error range of 0 to 0.42 and an average relative error of 0.13%, and then by fitting formula 1, with a relative error range of 0 to 1.03 and an average relative error of 0.18%. Overall, neither the empirical formulas nor the fitting formulas produced large errors, and both could predict the HWCFZ after excavation in typical shallow-buried coal seams relatively well. However, fitting formulas 4 and 3 clearly outperformed the other methods, with fitting formula 4 giving the results closest to the measured values, the smallest errors, and the most stable error rates. The coefficients of determination reflect the same trend: the R² values for the two empirical formulas lie between 0.01 and 0.35, while those for the fitting formulas lie between 0.51 and 0.64. Notably, fitting formula 4 has an R² value of 0.64, the highest among the methods compared. This indicates a strong nonlinear relationship between the HWCFZ in typical shallow-buried coal seams under fully mechanized mining conditions and the three factors considered, which nonlinear regression captures with greater precision and applicability than the empirical formulas.

4.1.2. Analysis of the Model Results for Near-Shallow-Buried Coal Seams

For the near-shallow-buried coal seam area, the results calculated with the regression equations derived from three factors (coal seam mining thickness, coal seam depth, and coal seam working length) were compared with those of the empirical formulas from the “Three-under standard”. The relative errors with respect to the measured values were also calculated. The results are shown in Table 2.

From Table 2, the relative errors in predicting the HWCFZ with the two empirical formulas range from 0.14 to 0.68 and from 0 to 0.64, with average relative errors of 0.54% and 0.48%, respectively. The relative error of fitting formula 2 ranges from 0 to 0.62, with an average relative error of 0.18%, so both the average error and the error range of the fitting formula are noticeably smaller than those of the two empirical formulas. A comparison of the coefficients of determination shows that the empirical formulas have negative R² values, while the fitting formula has an R² value of 0.76.

4.2. Prediction of Fissure Height by Machine Learning Models

The relationship between the HWCFZ and its influencing factors was first examined through single-factor correlation analysis, and the data were categorized and divided on this basis. Multivariate nonlinear regressions were then performed on the fissure heights under typical shallow- and near-shallow-buried conditions, and the results of the regression formulas were compared with those of the empirical formulas. However, under the same influencing factors, machine learning models yielded more reasonable predictions for the sample data. This is because the multivariate regression method uses specific formulas as its training outcome, and the limited descriptive power of fitting formulas can compromise the integrity of the training information. In contrast, machine learning regression models retain only the input and output interfaces of the trained model, which significantly alleviates the shortcomings of traditional statistical analysis methods and fundamentally streamlines the traditional process from induction to deduction [38].

4.2.1. Model Results Evaluation Methods

The models were evaluated with the coefficient of determination R² and the root mean square error (RMSE, denoted as σ_RMSE).

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (Y_{i} - \hat{Y}_{i})^{2}}{\sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2}}

(7)

\sigma_{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (Y_{i} - \hat{Y}_{i})^{2}}

(8)

where n is the number of samples, and \bar{Y}, Y_{i}, and \hat{Y}_{i} denote the mean fissure height of all samples, the actual measured value of sample i, and the model-predicted value, respectively. The R² coefficient characterizes the goodness of fit through the variation in the data; a value closer to 1 indicates a stronger ability of the model to explain the dependent variable and therefore a better fit of the model to the data. A smaller root mean square error (RMSE) indicates higher model fitting accuracy, and similarly, a smaller average relative error suggests higher model fitting precision. This study used MATLAB R2012b for the programming calculations. The calculated coefficients of determination and root mean square errors are presented in Tables 3–10.
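The two metrics, together with the average relative error, can be computed as in the following sketch. It uses the standard definitions of R² (one minus the ratio of the residual sum of squares to the total sum of squares) and RMSE; the relative error is assumed here to be |predicted − measured| / measured, which the article does not state explicitly. The sample heights are hypothetical values chosen for easy arithmetic, not measured data.

```python
import math

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)

def mean_relative_error(measured, predicted):
    """Average of |predicted - measured| / measured (assumed definition)."""
    n = len(measured)
    return sum(abs(p - y) / y for y, p in zip(measured, predicted)) / n

measured = [40.0, 50.0, 60.0]  # hypothetical heights in metres
close = [42.0, 50.0, 58.0]     # predictions near the measurements
poor = [70.0, 70.0, 70.0]      # predictions worse than the sample mean

print(r_squared(measured, close))  # approximately 0.96
print(r_squared(measured, poor))   # -6.0
print(rmse(measured, close), mean_relative_error(measured, close))
```

The second case shows why R² can be negative: when the residual sum of squares exceeds the total sum of squares, the predictions are worse than simply using the mean of the measured values.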

4.2.2. Analysis of the Model Results for Typical Shallow-Buried Coal Seams

Using 70% of the data (32 sets) from typical shallow-buried coal seams, selected by random partitioning, as the training data set, four machine learning models were trained: the random forest model, the BP neural network model, the PSO-optimized SVR model, and the GA-optimized SVR model. For the two SVR models, GA and PSO determined the optimal penalty factor C and kernel function parameter g. The BP neural network also underwent model parameter optimization to obtain the best gradient value. The random forest parameters were not optimized automatically; instead, reasonable values were set based on the mean change in the out-of-bag error, ensuring that the error met the prediction requirements. The parameters obtained after optimization were used to construct the models. The fitting effects, comparison plots of measured versus predicted values, and relative error plots are shown in Supplementary Figure S8.

The optimal parameters for the BP neural network are as follows: for mining thickness as a single factor, the gradient value is 0.00150; for the combination of mining thickness and coal seam depth, the gradient value is 0.00459; for the two factors of mining thickness and working length, the gradient value is 0.0125; and for the three-factor scenario, the optimized gradient value is 0.00358. During GA optimization, other parameters were kept at their default values. The best results show that for mining thickness as a single factor, C = 35.5894 and g = 2.5753; for the two factors of mining thickness and coal seam depth, C = 69.4271 and g = 0.1483; for the two factors of mining thickness and working length, C = 29.5641 and g = 0.12083; and for the three-factor scenario, C = 83.8262 and g = 0.17.0124. PSO results show that for mining thickness as a single factor, C = 0.23549 and g = 90.7716; for the two factors of mining thickness and coal seam depth, C = 2.7558 and g = 2.9604; for the two factors of mining thickness and working length, C = 19.5359 and g = 0.095986; and for the three-factor scenario, C = 10.4019 and g = 0.12166.

After parameter configuration, the four trained and optimized machine learning models were applied to the test set (the remaining 30% of the data, 14 groups in total) to predict the development of HWCFZ in typical shallow-buried coal seams under fully mechanized mining conditions. Finally, the prediction accuracy was calculated, and the performance of the trained models was analyzed. The results are shown in Supplementary Figure S9 and Tables S3–S6.

In terms of fitting performance, the fracture height was generally fitted best when all three factors were used and when mining thickness and coal seam depth were combined. The highest fitting precision was achieved by the RF model using mining thickness and coal seam depth, with an R² of 0.81. As can be seen from Supplementary Figure S8d, the relative error of the fitted values ranges from 0 to 0.54 with three factors and from 0 to 0.47 with mining thickness and coal seam depth. These ranges are far narrower than the range of 0 to 0.95 in the other cases, and the fitted values are consistent with the measured values. The RMSE is between 5.84 and 8.56, and the R² is between 0.59 and 0.81, indicating that the model fits the training data well, with small training errors. This also suggests that incorporating more factors gives stronger explanatory power for fracture height and higher training accuracy. Among the models, the random forest model had the highest fitting precision, best stability, and strongest learning ability, with R² between 0.60 and 0.81 (most of the other models are between 0.22 and 0.66). This is because the random forest model is suitable for non-differentiable models and situations where features are discrete and take a limited number of values, which matches the characteristics of the data in this paper. The BP neural network and PSO-SVR models had the worst fitting precision, while the GA-SVR model was more stable overall, although its fitting precision was not high.

From the perspective of verification results, the models’ predicted values deviated from the measured values, with relatively large errors, as shown in Supplementary Figure S9. This is due to the complex geological and hydrogeological conditions inherent to typical shallow-buried areas, compounded by the limited sample size. The relative error of the predicted fracture height ranges from 0.01 to 1.6, which is significantly larger than the fitting errors. Further analysis of Table 4 and Table 6 indicates that the random forest model had the highest accuracy under the single factor of mining thickness and the dual factors of mining thickness and working length, with R² reaching 0.59 and 0.60 and RMSE values of 10.15 and 9.96, respectively, so it can predict the development of HWCFZ relatively well. Next is the GA-SVR model, which, as in training, gave stable and relatively good results during prediction, with R² between 0.47 and 0.61 and RMSE between 9.88 and 11.50, indicating that the established GA-SVR model is sound. In terms of model ranking, the fitting results are consistent with the verification results: the worst performers were again the BP neural network model and the PSO-SVR model, which are not suitable for predicting the development of HWCFZ in typical shallow-buried coal seam areas. In terms of factor selection, however, the fitting results were not consistent with the verification results, and overfitting generally occurred when mining thickness and coal seam depth were used and when all three factors were used. This may be because higher-dimensional input variables carry more comprehensive training sample information and describe the development of HWCFZ more closely during training. However, excessive input data can bring about the curse of dimensionality, generating abnormal features (noise) that lead to poor generalization on new data and low model accuracy. Ultimately, it is determined that the random forest model under the dual factors of mining thickness and working face length has training assessment results that basically correspond with the model accuracy verification, and its precision is higher than that of the other methods, making it suitable for predicting the HWCFZ in typical shallow-buried coal seams.

4.2.3. Analysis of Model Results for Near-Shallow-Buried Coal Seams

For the near-shallow-buried coal seam data, 70% (24 groups) of the data were selected by random partitioning to train four different machine learning models. The training results and relative error plots are shown in Supplementary Figure S11, along with the calculated coefficients of determination and root mean square errors. After training, the optimal parameters for the BP neural network were determined as follows: a gradient value of 0.00114 for the single factor of mining thickness; a gradient value of 0.000398 for the dual factors of mining thickness and coal seam depth; a gradient value of 0.0207 for the dual factors of mining thickness and working length; and a gradient value of 0.00472 after optimization for the three factors. The best results from GA optimization show that for the single factor of mining thickness, C = 37.5541 and g = 7.721; for the dual factors of mining thickness and coal seam depth, C = 7.6878 and g = 0.58565; for the dual factors of mining thickness and working length, C = 12.5525 and g = 0.23689; and for the three factors, C = 8.2948 and g = 0.2245. PSO results show that for the single factor of mining thickness, C = 5.9248 and g = 3.3682; for the dual factors of mining thickness and coal seam depth, C = 1.6895 and g = 1.762; for the dual factors of mining thickness and working length, C = 10.2121 and g = 0.26948; and for the three factors, C = 14.2009 and g = 0.74479.

After parameter configuration, the four trained and optimized machine learning models were used to predict the development of HWCFZ for the remaining 10 groups of near-shallow-buried coal seam validation data under fully mechanized mining conditions. Finally, the prediction accuracy was calculated, and the performance of the trained models was analyzed. The results are shown in Supplementary Figure S11 and Tables S7–S10.

The scatter plots and relative error plots of fitted versus measured values in Supplementary Figure S10 show that all models fit the measured fracture heights accurately, with only a small portion of the data showing significant dispersion and deviation that make them difficult to fit. Two data points have relative errors exceeding 0.7%, while the errors for the remaining data are within 0.6%. Although these samples are outliers, their small number did not affect the overall fitting effect, and their influence could be reduced by expanding the sample database. Among the models, the BP neural network fitted worst, with a relative error range of 0%~1.2% (the maximum relative error for the other models is 0.91%) and the largest deviation of fitted values from the measured fracture heights. The RMSE and R² values confirm this, with the BP neural network having the lowest R², between 0.04 and 0.74, and the highest RMSE, between 18.25 and 34.79. As in the typical shallow-buried area, the random forest model fitted the training sample data best and matched the measured values well, indicating a stronger capacity to carry and describe the information contained in the training samples, with a larger R² of 0.89~0.99 for the training data and a smaller RMSE of 8.57~11.93. The other two models fitted more accurately than the BP neural network.

The remaining 10 groups of test sample data were fed into the 16 trained models mentioned above to evaluate their accuracy and generalization ability against the actual measurements, as shown in Table 4, Table 6, Table 8 and Table 10; the prediction results and relative error plots are shown in Supplementary Figure S11. The accuracy verification results of the models are basically consistent with the training assessment results. The relative errors of the validation data were also small, mostly below 0.5. The random forest model had the highest accuracy and strongest generalization ability, while the BP neural network had the worst. It is worth noting that although the random forest model had the highest fitting accuracy under the three factors, with an R² of 0.99, its verification accuracy was not the highest, possibly because the model training process entered a local optimum; this model should therefore not be used to predict the fracture height in the near-shallow-buried coal seam area. In addition, the fitting accuracy was very high both with mining thickness and coal seam depth as dual factors and with all three factors, but in verification the dual factors of mining thickness and coal seam depth still gave better predictions, while the three-factor predictions were generally worse than the dual-factor ones. This is partly due to the abnormal noise caused by too many sample dimensions, and partly because the development of fracture height in the near-shallow-buried area is closely related to coal seam depth. Therefore, the PSO-SVR model with mining thickness and coal seam depth was finally selected to describe the combined effect of these factors on the development of HWCFZ; its accuracy also meets engineering requirements, with an R² of 0.84 and an RMSE of 10.83.

5. Discussion

This article, through the analysis of single-factor correlation, shows that there is a nonlinear relationship between the development of HWCFZ in the typical shallow-buried area and various influencing factors. The HWCFZ has a quadratic curve relationship with mining thickness, an S-shape relationship with coal seam depth, and also an S-shape relationship with working length; in the near-shallow-buried area, there is a clear nonlinear relationship between the HWCFZ and the mining thickness, presenting a power function relationship, and basically no relationship with coal seam depth and working length. More comprehensive influencing factors should be considered during prediction.

In comparing the multivariate nonlinear regression fitting formulas with the empirical formulas proposed by predecessors, it was found that the fitting formula predicts the water conduction height of near-shallow-buried coal seams more accurately, while the empirical formulas are not suitable for predicting the water conduction height in this type [2]. The best-fitting formula for typical shallow-buried coal seams employs more independent variables than the best-fitting formula for near-shallow-buried coal seams. The difference in the form of the formulas may be related to the geological conditions of each type.

Four machine learning models were established: BP, RF, PSO-SVR, and GA-SVR. Model performance in predicting the water conduction height in typical shallow-buried coal seams was similar to that in near-shallow-buried coal seams. The random forest model generally has higher accuracy, with strong prediction capabilities under dual-factor conditions, while the BP neural network model has poor accuracy, which is due to the significant similarity of data features. However, the prediction effect for near-shallow-buried coal seams is better, and the prediction process is more stable. Therefore, machine learning methods are more suitable for use in near-shallow-buried areas to handle small-scale discrete data. The machine learning models predict each type of HWCFZ with high accuracy, with accuracies (R²) all greater than 0.6. For predicting the water conduction height in typical shallow-buried coal seams, it is suitable to use RF(X1X3); for near-shallow-buried coal seams, it is suitable to use PSO-SVR(X1X2); and for predicting the height in medium–deep-buried coal seams, it is suitable to use RF(X1). Therefore, this study concludes that using one or two factors for each group yields more reasonable prediction results. Too many input data can bring about the curse of dimensionality, which reduces prediction accuracy.

Through a comprehensive comparative study of five methods in typical areas, it was found that the water conduction height in typical shallow-buried coal seams has a greater correlation with coal seam depth and working length, and it is suitable to use multiple nonlinear regression methods for prediction; for near-shallow-buried coal seams, the random forest algorithm with mining thickness and working length achieved higher training accuracy and model accuracy, and this method should be used for prediction. It can be concluded that machine learning methods are not necessarily superior to traditional regression fitting methods. Additionally, when the dependent variable has a clear correlation with two independent variables, machine learning methods are suitable, while nonlinear regression methods are appropriate when the dependent variable has a clear correlation with one or three independent variables.

6. Conclusions

This article systematically studied the development of HWCFZ in northern Shaanxi, China. It analyzed the height data of the water-conducting fractured zones in the study area using multiple nonlinear regression methods and established four machine learning models: a neural network model (BP), a random forest model (RF), a particle swarm optimization support vector machine model (PSO-SVR), and a genetic algorithm-optimized support vector machine model (GA-SVR). The single-factor analysis, the multiple regression, and the average decrease in accuracy obtained from the random forest model showed that the HWCFZ is closely related to mining thickness, coal seam depth, and working length. In descending order of association, the factors are mining thickness, coal seam depth, and working length. Model comparison showed that the best-suited model for predicting the fissure height of typical shallow-buried areas is the multiple nonlinear regression equation; for near-shallow-buried areas, the best-suited model is the PSO-SVR model with the dual factors of mining thickness and coal seam depth. The calculation results of these two models are the most accurate, with R² values of 0.64 and 0.84, respectively. The results of the present study could provide a theoretical and scientific basis for mining areas in northern Shaanxi and for the prediction and prevention of water and sand inrush disasters.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17030312/s1.

Author Contributions

Conceptualization, W.C. and S.G.; methodology, W.C., S.G. and X.C.; software, S.G. and X.C.; validation, W.C., T.L. and P.T.; formal analysis, W.C., T.L., P.T. and I.I.; investigation, W.C., S.G., X.C. and T.L.; writing—original draft preparation, W.C., S.G., X.C., T.L., P.T. and I.I.; writing—review and editing, W.C., P.T. and I.I. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. 41807192), Natural Science Basic Research Program of Shaanxi (Program No. 2019JLM-7).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

Author Xi Chen was employed by the company Shandong Construction Reconnoissance Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Fallah-Zazuli, M.; Vafaeinejad, A.; Alesheykh, A.A.; Modiri, M.; Aghamohammadi, H. Mapping landslide susceptibility in the zagros mountains, iran: A comparative study of different data mining models. Earth Sci. Inform. 2019, 12, 615–628.
Lu, C.J.; Xu, J.P.; Li, Q.; Zhao, H.; He, Y. Research on the development law of water-conducting fracture zone in the combined mining of jurassic and carboniferous coal seams. Appl. Sci. 2022, 12, 11178.
Zhang, P.D.; Sun, C.S.; Fan, X.; Li, S.T.; Wang, L.J.; Cao, Z.Z. Temporal and spatial evolution mechanisms of the water-conducting fractured zone of overlying strata in the kongzhuang coal mine. Geofluids 2023, 2023, 3790998.
Dai, S.; Han, B.; Liu, S.L.; Li, N.B.; Geng, F.; Hou, X.Z. Neural network-based prediction methods for height of water-flowing fractured zone caused by underground coal mining. Arab. J. Geosci. 2020, 13, 495.
Feng, D.; Hou, E.K.; Xie, X.S.; Hou, P.F. Research on water-conducting fractured zone height under the condition of large mining height in Yushen mining area, China. Lithosphere 2023, 2023, 8918348.
Guo, C.F.; Yang, Z.; Li, S.; Lou, J.F. Predicting the water-conducting fracture zone (WCFZ) height using an MPGA-SVR approach. Sustainability 2020, 12, 1809.
Gao, Z.Y.; Jin, L.X.; Liu, P.T.; Wei, J.J. Height prediction of water-conducting fracture zone in jurassic coalfield of ordos basin based on improved radial movement optimization algorithm back-propagation neural network. Mathematics 2024, 12, 1602.
Wang, H.Z.; Zhu, J.Z.; Li, W.P. An improved back propagation neural network based on differential evolution and grey wolf optimizer and its application in the height prediction of water-conducting fracture zone. Appl. Sci. 2024, 14, 4509.
Xu, C.; Zhou, K.P.; Xiong, X.; Gao, F.; Zhou, J. Research on height prediction of water-conducting fracture zone in coal mining based on intelligent algorithm combined with extreme boosting machine. Expert Syst. Appl. 2024, 249, 123669.
Zhu, Z.J.; Guan, S.S. Prediction of the height of fractured water-conducting zone based on the improved cuckoo search algorithm-extreme learning machine model. Front. Earth Sci. 2022, 10, 860507.
Zheng, C.; Yu, L.; Sun, N.; Zhou, H.; He, J. Size effect of water-flowing fracture zone height based on FLAC3D. E3S Web Conf. 2020, 194, 4.
Wang, Z.; Wang, C.; Wang, Z. The hazard analysis of water inrush of mining of thick coal seam under reservoir based on entropy weight evaluation method. Geotech. Geol. Eng. 2018, 36, 3019–3028.
Yang, P.; Yang, W.F.; Nie, Y.X.; Saleem, F.H.; Lu, F.; Ma, R.K.; Li, R.P. Predicting the height of the water-conducting fractured zone based on a multiple regression model and information entropy in the northern ordos basin, China. Mine Water Environ. 2022, 41, 225–236.
Yan, Z.G.; Chang, X.T.; Wang, Y.P. The prediction of water conducted zone in coal mining by internet of things perception. Arab. J. Geosci. 2020, 13, 852.
Liu, Y.; Yuan, S.C.; Yang, B.B.; Liu, J.W.; Ye, Z.Y. Predicting the height of the water-conducting fractured zone using multiple regression analysis and GIS. Environ. Earth Sci. 2019, 78, 422.
He, X.; Zhao, Y.X.; Zhang, C.; Han, P.H. A model to estimate the height of the water-conducting fracture zone for longwall panels in western China. Mine Water Environ. 2020, 39, 823–838.
Yin, H.Y.; Dong, F.Y.; Cheng, W.J.; Zhai, P.H.; Ren, X.Y.; Liu, Z.; Zhai, Y.T.; Zhang, Y.W.; Li, X. Height prediction and 3D visualization of mining-induced water-conducting fracture zone in western ordos basin based on a multi-factor regression analysis. Energies 2022, 15, 3850.
Gao, X.C.; Liu, S.; Ma, T.F.; Zhao, C.; Zhang, X.C.; Xia, H.; Yin, J.H. A prediction method for height of water flowing fractured zone based on sparrow search algorithm-elman neural network in northwest mining area. Appl. Sci. 2023, 13, 1162.
Liyang, B.; Changlong, L.; Changxiang, W.; Meng, Z.; Fanbao, M.; Mingjin, F.; Baoliang, Z. Study on height prediction of water flowing fractured zone in deep mines based on weka platform. Sustainability 2022, 15, 737.
Liu, X.; Tan, Y.; Ning, J.; Tian, C.; Wang, J. The height of water-conducting fractured zones in longwall mining of shallow coal seams. Geotech. Geol. Eng. 2015, 33, 693–700.
Zheng, Q.S.; Wang, C.F.; Liu, W.T.; Pang, L.F. Evaluation on development height of water-conduted fractures on overburden roof based on nonlinear algorithm. Water 2022, 14, 3853.
Basheer, I.A.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43, 3–31.
Weicai, L.; Hui, H.; Shenshen, C.; Biwu, H. Neural network optimization algorithm for the prediction parameters of probability integral method. Sci. Surv. Mapp. 2019, 44, 7.
Momeni, E.; Armaghani, D.J.; Hajihassani, M.; Amin, M.F.M. Prediction of uniaxial compressive strength of rock samples using hybrid particle swarm optimization-based artificial neural networks. Measurement 2015, 60, 50–63.
Sun, J.; Mao, H.; Liu, J.; Zhang, B. The research of paddy rice moisture lossless detection based on LM BP neural network. In Proceedings of the International Conference on Computer and Computing Technologies in Agriculture, Beijing, China, 18–20 October 2008; pp. 1181–1188. [Google Scholar]
Ma, G.W.; Chen, Y.; Dong, W.; Xu, M.; Li, T.; Wang, H.D. Investigation of nuclide migration in complex fractures with filling properties. J. Clean. Prod. 2023, 403, 136781. [Google Scholar] [CrossRef]
Zhao, D.K.; Wu, Q. An approach to predict the height of fractured water-conducting zone of coal roof strata using random forest regression. Sci. Rep. 2018, 8, 10986. [Google Scholar] [CrossRef] [PubMed]
Cao, L.J.; Tay, F.H. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans. Neural Netw. 2003, 14, 1506–1518. [Google Scholar] [CrossRef] [PubMed]
Hou, Z.; Lu, W. Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites. Hydrogeol. J. 2018, 26, 923–932. [Google Scholar] [CrossRef]
Du, H.; Chen, W.H.; Zhu, Q.J.; Liu, S.L.; Zhou, J.B. Identification of weak peaks in X-ray fluorescence spectrum analysis based on the hybrid algorithm combining genetic and Levenberg Marquardt algorithm. Appl. Radiat. Isot. 2018, 141, 149–155. [Google Scholar] [CrossRef]
Hou, E.; Wen, Q.; Ye, Z.; Chen, W.; Wei, J. Height prediction of water-flowing fracture zone with a genetic-algorithm support-vector-machine method. Int. J. Coal Sci. Technol. 2020, 7, 740–751. [Google Scholar] [CrossRef]
Wang, D.S.; Tan, D.P.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408. [Google Scholar] [CrossRef]
Zhan, Z.-H.; Zhang, J.; Li, Y.; Chung, H.S.-H. Adaptive particle swarm optimization. IEEE Trans. Syst. Man Cybern. Part B 2009, 39, 1362–1381. [Google Scholar] [CrossRef] [PubMed]
Rodriguez-Galiano, V.; Mendes, M.P.; Garcia-Soldado, M.J.; Chica-Olmo, M.; Ribeiro, L. Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: A case study in an agricultural setting (Southern Spain). Sci. Total Environ. 2014, 476, 189–206. [Google Scholar] [CrossRef] [PubMed]
Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818. [Google Scholar] [CrossRef]
Smith, P.F.; Ganesh, S.; Liu, P. A comparison of random forest regression and multiple linear regression for prediction in neuroscience. J. Neurosci. Methods 2013, 220, 85–91. [Google Scholar] [CrossRef] [PubMed]
Wang, Z.L.; Lai, C.G.; Chen, X.H.; Yang, B.; Zhao, S.W.; Bai, X.Y. Flood hazard risk assessment model based on random forest. J. Hydrol. 2015, 527, 1130–1141. [Google Scholar] [CrossRef]
Lary, D.J.; Alavi, A.H.; Gandomi, A.H.; Walker, A.L. Machine learning in geosciences and remote sensing. Geosci. Front. 2016, 7, 3–10. [Google Scholar] [CrossRef]

Figure 1. Scatter plot of HWCFZ versus mining thickness.

Figure 2. Scatter plot of HWCFZ versus coal seam depth.

Figure 3. Scatter plot of HWCFZ versus working length.

Figure 4. Scatter plot of HWCFZ versus mining height after reclassification.

Figure 5. Diagram of neural network structure.

Figure 6. Network training results under single-factor mining thickness conditions.

Figure 7. Network training results under dual-factor conditions of mining thickness and coal seam depth.

Figure 8. Network training results under dual-factor conditions of mining thickness and working length.

Figure 9. Network training results under three-factor conditions of mining thickness, coal seam depth, and working length.

Figure 10. The mean error change of the out-of-bag samples for the training set.

Figure 11. The mean error change of the out-of-bag samples for the training set.

Figure 12. Average reduction in accuracy for features.

Table 1. The relative errors between the predicted values of HWCFZ in typical shallow-buried coal seams and the actual measured values.

Number	Measured HWCFZ	“Three-Under Standard” Formula (1): Predicted Value	“Three-Under Standard” Formula (1): Relative Error	“Three-Under Standard” Formula (2): Predicted Value	“Three-Under Standard” Formula (2): Relative Error	Fitting Formula (1): Predicted Value	Fitting Formula (1): Relative Error	Fitting Formula (3): Predicted Value	Fitting Formula (3): Relative Error	Fitting Formula (4): Predicted Value	Fitting Formula (4): Relative Error
1	26.40	38.49	0.46	41.62	0.58	36.25	0.37	21.87	0.17	21.82	0.17
2	36.52	43.21	0.18	46.88	0.28	49.92	0.37	39.59	0.08	40.03	0.10
3	17.00	37.86	1.23	40.98	1.41	34.46	1.03	23.62	0.39	23.33	0.37
4	49.00	46.30	0.06	50.99	0.04	58.41	0.19	50.34	0.03	51.03	0.04
5	45.00	45.60	0.01	50.00	0.11	56.61	0.26	48.62	0.08	48.22	0.07
6	54.58	44.82	0.18	48.94	0.10	54.49	0.00	47.91	0.12	47.49	0.13
7	58.00	51.05	0.12	58.99	0.02	64.91	0.12	56.64	0.02	55.99	0.03
8	45.00	48.15	0.07	53.82	0.20	62.52	0.39	57.73	0.28	57.66	0.28
9	66.00	46.63	0.29	51.47	0.22	59.23	0.10	56.28	0.15	56.13	0.15
10	45.85	43.43	0.05	47.15	0.03	50.55	0.10	48.92	0.07	48.54	0.06
11	74.00	45.60	0.38	50.00	0.32	56.61	0.23	55.97	0.24	55.68	0.25
12	42.78	45.60	0.07	50.00	0.17	56.61	0.32	56.81	0.33	56.51	0.32
13	69.17	51.05	0.26	58.99	0.15	64.91	0.06	62.73	0.09	63.18	0.09
14	75.60	51.05	0.32	58.99	0.22	64.91	0.14	62.73	0.17	63.18	0.16
15	75.30	52.39	0.30	61.77	0.18	62.73	0.17	58.91	0.22	59.29	0.21
16	75.20	51.05	0.32	58.99	0.22	64.91	0.14	63.24	0.16	63.65	0.15
17	45.78	43.21	0.06	46.88	0.02	49.92	0.09	50.85	0.11	50.48	0.10
18	40.60	45.60	0.12	50.00	0.23	56.61	0.39	57.57	0.42	57.16	0.41
19	42.00	41.31	0.02	44.64	0.06	44.38	0.06	45.27	0.08	44.42	0.06
20	64.18	43.43	0.32	47.15	0.27	50.55	0.21	52.26	0.19	51.63	0.20
21	45.72	42.30	0.07	45.78	0.00	47.26	0.03	49.13	0.07	50.56	0.11
22	45.72	41.31	0.10	44.64	0.02	44.38	0.03	46.73	0.02	48.12	0.05
23	62.78	45.42	0.28	49.75	0.21	56.13	0.11	59.14	0.06	58.64	0.07
24	49.30	52.90	0.07	62.92	0.28	60.99	0.24	58.96	0.20	57.93	0.18
25	45.70	52.90	0.16	62.92	0.38	60.99	0.33	58.96	0.29	57.93	0.27
26	52.40	52.73	0.01	62.54	0.19	61.62	0.18	59.91	0.14	60.06	0.15
27	45.00	37.58	0.16	40.71	0.10	33.67	0.25	35.97	0.20	35.53	0.21
28	53.89	45.23	0.16	49.50	0.08	55.63	0.03	59.92	0.11	60.75	0.13
29	68.09	45.23	0.34	49.50	0.27	55.63	0.18	59.92	0.12	60.75	0.11
30	56.13	45.23	0.19	49.50	0.12	55.63	0.01	59.92	0.07	60.75	0.08
31	61.13	45.60	0.25	50.00	0.18	56.61	0.07	60.87	0.00	60.93	0.00
32	62.80	45.60	0.27	50.00	0.20	56.61	0.10	61.41	0.02	60.02	0.04
33	51.00	46.30	0.09	50.99	0.00	58.41	0.15	63.54	0.25	62.60	0.23
34	62.00	49.47	0.20	56.04	0.10	64.46	0.04	68.59	0.11	67.57	0.09
35	52.60	52.39	0.00	61.77	0.17	62.73	0.19	64.09	0.22	64.08	0.22
36	52.90	53.37	0.01	64.04	0.21	58.76	0.11	58.43	0.10	58.33	0.10
37	60.70	50.63	0.17	58.17	0.04	65.05	0.07	68.79	0.13	69.50	0.14
38	78.32	52.90	0.32	62.92	0.20	60.99	0.22	62.09	0.21	61.74	0.21
39	67.71	52.39	0.23	61.77	0.09	62.73	0.07	64.91	0.04	64.84	0.04
40	35.81	36.14	0.01	39.33	0.10	29.75	0.17	35.13	0.02	35.21	0.02
41	70.00	44.06	0.37	47.95	0.32	52.37	0.25	58.94	0.16	57.47	0.18
42	68.76	51.92	0.24	60.75	0.12	63.85	0.07	67.33	0.02	67.23	0.02
43	34.21	36.14	0.06	39.33	0.15	29.75	0.13	35.64	0.04	35.67	0.04
44	35.07	36.14	0.03	39.33	0.12	29.75	0.15	35.69	0.02	35.73	0.02
45	70.50	52.03	0.26	60.99	0.13	63.62	0.10	67.02	0.05	67.56	0.04
46	75.60	51.05	0.32	58.99	0.22	64.91	0.14	69.53	0.08	69.46	0.08
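
The error columns in Table 1 are consistent with the absolute relative error, |predicted − measured| / measured, expressed as a fraction. The following is a minimal sketch of that calculation, checked against rows 1 and 3 of the table; the function name `relative_error` is an illustrative choice and not taken from the article.

```python
def relative_error(measured: float, predicted: float) -> float:
    """Absolute relative error as a fraction of the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured)


# (measured HWCFZ, "Three-Under Standard" Formula (1) prediction)
# taken from rows 1 and 3 of Table 1.
rows = [(26.40, 38.49), (17.00, 37.86)]

errors = [round(relative_error(m, p), 2) for m, p in rows]
print(errors)  # [0.46, 1.23], matching the tabulated error values
```

A value above 1, as in row 3, means the prediction misses the measured height by more than the measured height itself.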

Table 2. Relative errors between the predicted and measured values of fracture height in near-shallow-buried coal seams.

Number	Measured Fracture Height	“Three-Under Standard” Formula (1): Predicted Value	“Three-Under Standard” Formula (1): Relative Error	“Three-Under Standard” Formula (2): Predicted Value	“Three-Under Standard” Formula (2): Relative Error	Fitting Formula (2): Predicted Value	Fitting Formula (2): Relative Error
1	81.00	43.64	0.46	47.42	0.41	93.41	0.15
2	110.11	47.27	0.57	52.43	0.52	110.23	0.00
3	125.80	48.15	0.62	53.82	0.57	115.02	0.09
4	130.60	48.15	0.63	53.82	0.59	115.02	0.12
5	57.90	37.73	0.35	40.85	0.29	72.44	0.25
6	81.40	37.73	0.54	40.85	0.50	72.44	0.11
7	135.00	49.47	0.63	56.04	0.58	122.78	0.09
8	41.95	36.14	0.14	39.33	0.06	67.75	0.62
9	137.40	47.27	0.66	52.43	0.62	110.23	0.20
10	31.30	26.74	0.15	31.45	0.00	44.86	0.43
11	110.11	47.05	0.57	52.10	0.53	109.10	0.01
12	147.66	50.63	0.66	58.17	0.61	130.30	0.12
13	153.46	50.63	0.67	58.17	0.62	130.30	0.15
14	137.32	53.37	0.61	64.04	0.53	151.62	0.10
15	101.70	43.64	0.57	47.42	0.53	93.41	0.08
16	103.60	45.60	0.56	50.00	0.52	102.00	0.02
17	132.83	55.49	0.58	69.67	0.48	172.77	0.30
18	151.30	48.15	0.68	53.82	0.64	115.02	0.24
19	145.23	48.15	0.67	53.82	0.63	115.02	0.21
20	117.50	52.90	0.55	62.92	0.46	147.49	0.26
21	149.28	49.86	0.67	56.73	0.62	125.21	0.16
22	143.82	49.86	0.65	56.73	0.61	125.21	0.13
23	84.80	43.64	0.49	47.42	0.44	93.41	0.10
24	145.23	49.86	0.66	56.73	0.61	125.21	0.14
25	139.77	49.86	0.64	56.73	0.59	125.21	0.10
26	126.00	49.47	0.61	56.04	0.56	122.78	0.03
27	75.78	45.46	0.40	49.80	0.34	101.32	0.34
28	71.66	45.46	0.37	49.80	0.31	101.32	0.41
29	75.00	43.64	0.42	47.42	0.37	93.41	0.25
30	78.00	38.49	0.51	41.62	0.47	74.83	0.04
31	68.30	41.31	0.40	44.64	0.35	84.38	0.24
32	110.10	45.60	0.59	50.00	0.55	102.00	0.07
33	99.50	48.70	0.51	54.72	0.45	118.15	0.19
34	90.00	48.70	0.46	54.72	0.39	118.15	0.31

Table 3. R-square values in the training data set of typical shallow coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	0.60	0.43	0.44	0.45
Mining thickness, coal seam depth	0.81	0.65	0.71	0.63
Mining thickness, working length	0.71	0.22	0.52	0.52
Mining thickness, coal seam depth, working length	0.78	0.59	0.66	0.64

Table 4. R-square values in the validation data set of typical shallow coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	0.59	0.43	0.45	0.47
Mining thickness, coal seam depth	0.46	0.56	0.53	0.59
Mining thickness, working length	0.60	0.71	0.60	0.61
Mining thickness, coal seam depth, working length	0.51	0.50	0.49	0.52

Table 5. RMSE values in the training data set of typical shallow coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	8.45	10.12	10.06	9.93
Mining thickness, coal seam depth	5.84	7.88	7.24	8.15
Mining thickness, working length	7.16	11.83	9.33	9.31
Mining thickness, coal seam depth, working length	6.25	8.56	7.81	8.04

Table 6. RMSE values in the validation data set of typical shallow coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	10.15	11.94	11.75	11.50
Mining thickness, coal seam depth	11.61	10.46	10.87	10.19
Mining thickness, working length	9.96	8.46	10.04	9.88
Mining thickness, coal seam depth, working length	11.23	11.22	11.31	10.94

Table 7. R-square values in the training data set of near-shallow-buried coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	0.94	0.74	0.82	0.86
Mining thickness, coal seam depth	0.93	0.64	0.95	0.90
Mining thickness, working length	0.89	0.04	0.82	0.82
Mining thickness, coal seam depth, working length	0.99	0.73	0.84	0.88

Table 8. R-square values in the validation data set of near-shallow-buried coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	0.71	0.45	0.63	0.68
Mining thickness, coal seam depth	0.79	0.55	0.84	0.77
Mining thickness, working length	0.76	0.41	0.53	0.53
Mining thickness, coal seam depth, working length	0.81	0.57	0.71	0.68

Table 9. RMSE values in the training data set of near-shallow-buried coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	8.57	18.25	15.21	13.22
Mining thickness, coal seam depth	9.38	21.27	7.73	10.97
Mining thickness, working length	11.73	34.79	14.95	15.04
Mining thickness, coal seam depth, working length	11.93	18.64	14.13	12.19

Table 10. RMSE values in the validation data set of near-shallow-buried coal seams.

	Random Forest Model	BP Neural Network Model	PSO-SVR	GA-SVR
Mining thickness	14.56	20.09	16.52	15.32
Mining thickness, coal seam depth	12.32	18.26	10.83	13.06
Mining thickness, working length	13.30	20.85	18.57	18.52
Mining thickness, coal seam depth, working length	11.92	17.71	14.64	15.43
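
Tables 3–10 report R-square and RMSE for each model and factor combination. As a reading aid, the sketch below shows how these two metrics are commonly computed from measured and predicted heights. It assumes the standard definitions (R-square as the coefficient of determination, 1 − SS_res/SS_tot); the article's own evaluation formulas may differ in detail. The function names are illustrative, and the three sample pairs are rows 1–3 of Table 2 (Fitting Formula (2)), used only as example inputs and not as the article's training or validation split.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Example inputs: measured fracture height and Fitting Formula (2)
# prediction from rows 1-3 of Table 2 (illustrative subset only).
measured = [81.00, 110.11, 125.80]
predicted = [93.41, 110.23, 115.02]

print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"R-square = {r_squared(measured, predicted):.2f}")
```

Under these definitions a lower RMSE and a higher R-square both indicate a closer fit, so the two families of tables can be read as complementary views of the same predictions.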

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
