Article

CPS-RAUnet++: A Jet Axis Detection Method Based on Cross-Pseudo Supervision and Extended Unet++ Model

by Jianhong Gan 1,2,3,4, Kun Cai 1,2,3, Changyuan Fan 5,*, Xun Deng 1,2,3,*, Wendong Hu 6, Zhibin Li 1,2,3,4,7,8, Peiyang Wei 1, Tao Liao 1,2,3 and Fan Zhang 8

1 College of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
2 Key Laboratory of Meteorological Software, China Meteorological Administration, Chengdu 610225, China
3 Sichuan Key Laboratory of Software Automatic Generation and Intelligent Service, Chengdu University of Information Technology, Chengdu 610225, China
4 Dazhou Key Laboratory of Government Data Security, Sichuan University of Arts and Science, Dazhou 635000, China
5 Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
6 College of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
7 College of Electronic Engineering, Chengdu University of Information Technology, Chengdu 610225, China
8 School of Atmospheric Sciences, Chengdu University of Information Technology, Chengdu 610225, China
* Authors to whom correspondence should be addressed.
Submission received: 29 November 2024 / Revised: 10 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)

Abstract

Atmospheric jets are pivotal components of atmospheric circulation, profoundly influencing surface weather patterns and the development of extreme weather events such as storms and cold waves. Accurate detection of the jet stream axis is indispensable for enhancing weather forecasting, monitoring climate change, and mitigating disasters. However, traditional methods for delineating atmospheric jets are plagued by inefficiency, substantial errors, and pronounced subjectivity, limiting their applicability in complex atmospheric scenarios. Current research on semi-supervised methods for extracting atmospheric jets remains scarce, with most approaches dependent on traditional techniques that struggle with stability and generalization. To address these limitations, this study proposes a semi-supervised jet stream axis extraction method leveraging an enhanced U-Net++ model. The approach incorporates improved residual blocks and enhanced attention gate mechanisms, seamlessly integrating these enhanced attention gates into the dense skip connections of U-Net++. Furthermore, it optimizes the consistency learning phase within semi-supervised frameworks, effectively addressing data scarcity challenges while significantly enhancing the precision of jet stream axis detection. Experimental results reveal the following: (1) With only 30% of labeled data, the proposed method achieves a precision exceeding 80% on the test set, surpassing state-of-the-art (SOTA) baselines. Compared to fully supervised U-Net and U-Net++ methods, the precision improves by 17.02% and 9.91%. (2) With labeled data proportions of 10%, 20%, and 30%, the proposed method outperforms the MT semi-supervised method, achieving precision gains of 9.44%, 15.58%, and 19.50%, while surpassing the DCT semi-supervised method with improvements of 10.24%, 16.64%, and 14.15%, respectively. Ablation studies further validate the effectiveness of the proposed method in accurately identifying the jet stream axis. The proposed method exhibits remarkable consistency, stability, and generalization capabilities, producing jet stream axis extractions closely aligned with wind field data.

1. Introduction

The jet stream is a concentrated band of fast-moving air located in the upper levels of the troposphere, typically at pressures between 200 and 300 hPa. Wind speeds in these regions frequently exceed 30 m/s and occasionally surpass 100 m/s [1]. The formation of jet streams is closely linked to temperature gradients, geostrophic effects, and topographical influences. The two primary types of jet streams are polar and subtropical, occurring in mid-to-high latitudes and subtropical regions, respectively [2]. Jet streams are crucial in shaping atmospheric circulation and influencing weather systems. They steer low-pressure and frontal systems, which affect the distribution of precipitation and temperature, factors that are vital for agricultural productivity, water resource management, and disaster prevention. Furthermore, jet streams significantly influence the paths of monsoons and temperate cyclones, making them critical for understanding climate variability and mitigating the risks of extreme weather events within the framework of sustainable environmental management [3]. Jet streams are closely associated with the dynamics of fronts and atmospheric airflow, fundamentally shaping surface weather patterns. This connection has sparked significant research interest in jet streams in recent years [4].
Visualizing jet stream axes is critical for forecasters to better understand and predict weather patterns. On high-altitude weather maps, jet stream axes are typically represented by lines with arrows, indicating their location and flow direction. In meteorological operations, forecasters often rely on software to manually plot jet stream axes. However, this manual process is time-consuming, error-prone, and highly subjective. To address these issues, traditional approaches to jet stream axis detection have employed numerical models and mathematical algorithms, such as the subsumption algorithm, polynomial fitting, isochronous analysis, critical point detection, clustering [5], thresholding [6], ridge detection [7], Dijkstra’s shortest-path algorithm [8], and data assimilation [9]. These methods have significantly improved detection efficiency and reduced subjectivity. Despite these advancements, the inherent complexity and variability of wind fields pose substantial challenges. In regions where jet streams diverge or merge, the intricate relationships between wind direction vectors often exceed the capabilities of traditional methods. This limitation leads to instability and reduced accuracy in the results, primarily because traditional algorithms are unable to capture the dynamic and nonlinear changes in wind fields, such as the breakage and reformation of jet stream axes. As wind field structures become increasingly complex, there is a growing need for more robust and adaptive automated methods that can effectively address these challenges and enhance the accuracy of jet stream axes detection. Deep learning has garnered significant attention in jet stream axes detection due to its strong generalization ability and high accuracy. Existing methods include Physical Information Neural Networks (PINNs) [10], Statistical Information Neural Networks (SINNs), Artificial Neural Networks (ANNs) [11], ConvLSTM [12] models with multivariate features, and Recurrent Neural Networks (RNNs) [13]. Fully supervised deep learning models, trained on labeled meteorological data, can automatically identify jet stream areas. However, such approaches require a large volume of high-quality labeled data, which is often difficult to obtain. In contrast, semi-supervised learning offers an effective alternative by leveraging unlabeled data to reduce dependence on labeled samples. A core component of semi-supervised learning is the consistency learning phase [14,15], which enhances model robustness and generalization by ensuring stable predictions. This is achieved by applying slight perturbations [16] to unlabeled data and comparing the model’s outputs before and after the perturbation. Consistency constraints are then incorporated into the loss function to optimize the model’s parameters. While effective, current consistency learning methods face challenges, including low pseudo-label accuracy and difficulties in integrating pseudo-labeled data into the training process. These limitations hinder continuous model optimization and reduce performance in complex wind field scenarios.
This paper proposes the CPS-RAUnet++ semi-supervised deep learning model to overcome the limitations of traditional jet stream axis detection methods. These limitations include difficulties in handling nonlinear changes in wind fields, the heavy reliance on large amounts of labeled data in supervised learning, and challenges in generating and filtering high-quality pseudo-labels during the consistency learning phase. Inspired by semi-supervised learning, depth-separable convolution, and attention mechanisms, the model extends the U-Net++ architecture by incorporating residual units and attention mechanisms. Additionally, it integrates a cross pseudo supervision approach to optimize the quality of pseudo-labels generated during the consistency learning phase, enabling more accurate and robust jet stream axis detection.
The main contributions of this study are as follows:
  • We propose a novel strategy to transform wind field vector data into images, utilizing RGB channels to represent wind speed and the two directional components of unit wind velocity, respectively. This method enables the model to capture wind field features with greater accuracy and overcomes the limitations of traditional numerical meteorological models that rely on single-channel or simplified numerical inputs.
  • We integrated the improved attention gate into the dense skip connections of U-Net++, enabling dynamic feature selection within these connections. By emphasizing key regional features and suppressing irrelevant background noise, the attention gate enhances the model’s ability to focus on critical areas. For jet stream axis detection, characterized by multi-scale and dynamic features, this mechanism significantly improves the model’s capacity to capture fine-grained local details of the jet stream axis alongside broader wind field patterns. Furthermore, we optimized the ResNet18 backbone by incorporating DropConnect within its Basic Blocks. This addition enhances the modulation of information flow between convolutional layers, resulting in more adaptive and dynamic feature fusion, ultimately boosting the model’s overall performance and generalization capabilities.
  • We combined the cross pseudo supervision semi-supervised learning method with an extended U-Net++ model, integrating pseudo-label generation and filtering mechanisms to significantly enhance the model’s ability to learn from unlabeled data and reduce its reliance on high-quality labeled samples.
  • To correct errors in the sequence of jet stream central axis points, we developed an eight-neighbor connection algorithm. This algorithm effectively addresses issues such as axis distortion caused by scattered connections.

2. Related Work

Atmospheric jets play a critical role in atmospheric circulation, influencing surface weather patterns and climate predictions. Researchers have explored various methods for improving jet stream axis detection, including radar-based techniques. Li et al. [17] analyzed the relationship between upper-level turbulence and the East Asian jet stream using radar-derived observations. Spensberger et al. [18] and Kern et al. [19] applied statistical techniques to enhance detection using radar data. However, radar methods are limited by their inability to capture full wind field information, as they only measure radial wind speeds, and their spatial resolution is insufficient for high-resolution wind field detection. These constraints make radar unsuitable for complex wind scenarios involving jet stream bifurcations and mergers. Data assimilation and reanalysis methods integrate multi-source data to build global wind field models, as demonstrated by Zhou et al. [20] and Gan et al. [21], who applied velocity filtering and wind consistency methods to improve streamline detection. However, these methods are computationally intensive, require high-quality observational data, and lack real-time adaptability, making them less practical for dynamic scenarios. Traditional numerical weather models face challenges in accurately detecting jet stream axes, primarily due to their reliance on highly parameterized physical models that fail to capture complex wind dynamics and the inefficiencies in processing large data volumes. These limitations have motivated the exploration of machine learning and semi-supervised learning techniques to enhance detection accuracy and efficiency.
In semi-supervised learning, a pseudo-label [22] is widely used to generate labels for unlabeled data based on model predictions, but the quality of the pseudo-labels is crucial to the effectiveness of this approach. In tasks such as jet stream axis detection, where bifurcations and mergers introduce high variability, low-quality pseudo-labels can degrade model performance. Methods like contrastive learning [23] help improve feature consistency by clustering similar samples in the feature space, but they face challenges such as negative sample selection and high computational costs. Consistency regularization [24] asserts that the model’s predictions should remain stable under different data augmentations. The FixMatch model generates pseudo-labels from weakly augmented samples and reinforces them with strongly augmented data to promote consistency, thus mitigating overfitting in label-scarce scenarios. However, data augmentation itself can introduce inaccuracies that hinder classification efficiency. Recent studies suggest that fuzzy logic-based classifiers could address this issue by grouping similar images in a fuzzy sense and using fuzzy divergence to compare each image with representative feature groups [25]. This approach could enhance the robustness of classification methods by mitigating the effects of noisy or uncertain augmented data. Cross pseudo supervision (CPS) [26] is an enhanced pseudo-labeling strategy that involves multiple models working collaboratively to generate pseudo-labels. During this process, the models supervise each other and exchange information, which helps reduce the bias of individual models and improves the reliability of the labels. However, the segmentation performance of this method heavily relies on the quality of the pseudo-labels generated by the backbone models during training. Existing methods still struggle to generate high-quality pseudo-labels and fail to effectively incorporate them into the training set, limiting their performance in complex tasks. Moreover, self-training [27] methods rely on generated pseudo-labels to retrain the model, but poor-quality pseudo-labels can degrade model performance, causing a decline in accuracy over successive iterations.
Deep neural networks (DNNs) [28] have achieved remarkable success in image segmentation tasks. Through multi-level feature extraction, DNNs are able to capture complex patterns and structures in an image, enabling accurate classification of each pixel. This capability has led to a wide range of applications in areas such as medical image analysis, autonomous driving, remote sensing, and meteorology. U-Net [29] is a fully convolutional network that is particularly effective in medical image segmentation. As shown in Figure 1, its encoder-decoder architecture extracts high-level features while restoring spatial resolution through up-sampling. Despite its success, U-Net suffers from computational inefficiencies and feature redundancy due to its extensive skip connections.
Recently, several variations of the U-Net model have been introduced by researchers. For example, TransUNet [30] integrates the architectural framework of Transformer with that of U-Net for medical image segmentation. The core idea behind TransUNet is to leverage the Transformer for capturing long-range dependencies during the encoding stage, while U-Net is used to restore the spatial details of the image. One limitation of this approach is that it may struggle with the efficient extraction of fine-grained features, which is crucial for accurate semantic segmentation. In the decoding stage, UNet3+ [31] addresses the limitations of the U-Net model, particularly the issues of information loss and insufficient fusion of deep and shallow features. It integrates features across all the different resolution layers, enhancing the model’s expressive power through rich skip connections. A potential drawback of this approach is that the increased number of skip connections can lead to computational overhead and make the model more prone to overfitting. The Swin-Unet [32] model incorporates the Swin Transformer into the U-Net architecture to capture global dependencies and multi-scale features. The encoder of Swin-Unet employs the Transformer along with a sliding-window mechanism to extract features. The decoder then restores the spatial details by up-sampling. This approach effectively captures long-range dependencies and local context through the sliding-window mechanism, but it may introduce high computational costs, especially for high-resolution inputs. UNet++ [33] has been optimized to address the semantic gap of skip connections and improve segmentation accuracy. Unlike the basic skip connections used in traditional U-Net, UNet++ utilizes dense skip connections between the encoder and decoder, performing multiple feature aggregations through layer-by-layer decoders. Dense connections facilitate cross-layer information flow, enhancing feature fusion and ensuring semantic consistency across different levels. However, dense skip connections may lead to feature information redundancy and information overload. Without an effective mechanism to control which features are useful, excessive information may make it difficult for the network to focus effectively on key features. To overcome these issues, we propose an improvement by incorporating attention gates into the dense skip connections of Unet++, which enables the network to focus more on features with high semantic information while suppressing irrelevant or redundant features.

3. Methodology

3.1. Dataset and Preprocessing

The data utilized in this study were sourced from the ERA5 reanalysis provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [34]. Specifically, the U and V wind components were extracted from the ERA5 pressure-level dataset at the 500 hPa pressure level. The dataset covers the period from 2019 to 2022, with data recorded at 8:00 A.M. and 8:00 P.M. daily. The study area spans longitudes from 0° to 160° E and latitudes from 12° N to 80° N, which includes most of the Eurasian continent, the northern Indian Ocean, and the western Pacific region, extending from the equator to the Arctic. This region includes tropical, subtropical, and temperate climate zones, and part of the polar climate zones, representing diverse climatic and atmospheric dynamics. To facilitate storage and analysis, the raw data were processed using the following methodology, resulting in a final dataset of 1000 color-coded images. The labels for these images were manually annotated by meteorological experts. The dataset is split into training, validation, and test sets in an 8:1:1 ratio.
ERA5 wind field data is stored as a longitudinal wind speed component (U) and a latitudinal wind speed component (V). The wind speed magnitude is calculated using the standard formula in meteorological studies, Equation  (1), while the wind direction is derived from Equation  (2). Since the wind direction is represented as an angle, directly using it as input to the deep learning model may cause ambiguity, necessitating further processing. This processing step not only facilitates data visualization and wind speed comparisons but also lays the foundation for converting the data into effective model inputs. Following these steps, we map the wind speed magnitude and the two wind direction components into image pixels, creating a dataset with a resolution of 320 × 512.
$$ \mathrm{Speed} = \sqrt{U^{2} + V^{2}} \quad (1) $$
$$
\mathrm{Direction} =
\begin{cases}
0, & \text{if } U = 0 \text{ and } V \geq 0, \\
90, & \text{if } U > 0 \text{ and } V = 0, \\
180, & \text{if } U = 0 \text{ and } V < 0, \\
270, & \text{if } U < 0 \text{ and } V = 0, \\
\arctan\left(\frac{U}{V}\right) \times \frac{180}{\pi}, & \text{if } U > 0 \text{ and } V > 0, \\
\arctan\left(\frac{-V}{U}\right) \times \frac{180}{\pi} + 90, & \text{if } U > 0 \text{ and } V < 0, \\
\arctan\left(\frac{U}{V}\right) \times \frac{180}{\pi} + 180, & \text{if } U < 0 \text{ and } V < 0, \\
\arctan\left(\frac{-V}{U}\right) \times \frac{180}{\pi} + 270, & \text{if } U < 0 \text{ and } V > 0.
\end{cases}
\quad (2)
$$
To convert wind fields into three-channel data, we represent wind speed and the two directional components of unit wind velocity using the R, G, and B channels, as defined by Equations (3), (5), and (6). According to the Technical Specification for Meso-scale Weather Chart Analysis (Trial Version) issued by the National Meteorological Center of China, the jet stream region at the 500 hPa level is typically defined as areas with wind speeds exceeding 20 m/s. This threshold is widely accepted in meteorology as sufficient to distinguish general wind speeds from strong wind regions, such as jet streams. However, in practical applications, the boundary characteristics of jet streams may vary depending on different research objectives or meteorological conditions. To more precisely delineate jet stream boundaries and capture potential jet stream characteristics in low wind speed regions, we set 14 m/s as the boundary point in the piecewise function and use Equation  (3) to map wind speed to the R channel pixel values. For wind speeds below 14 m/s, the corresponding R channel pixel values are set to 0. For wind speeds greater than or equal to 14 m/s, we use the sigmoid-like function in Equation  (3) to map the wind speed to the R channel pixel values within the range (0, 255]. This mapping method effectively distinguishes the wind speed characteristics between low-speed and jet stream regions. Based on the definition of the wind speed threshold, we set 20 m/s as the central value of the sensitivity range for the jet stream region. This setting ensures that when the wind speed approaches 20 m/s, its mapping to the R channel pixel values exhibits significantly enhanced changes. As shown in Figure 2, the mapping function curve is steep around 20 m/s, indicating that changes in wind speed have the greatest impact on the R channel pixel values. Within this sensitive range, subtle variations in wind speed can be quickly captured, better reflecting the characteristics of weaker jet stream regions and improving the accuracy of jet stream boundary identification.
$$
R(i,j) =
\begin{cases}
0, & \mathrm{Speed}(i,j) < 14, \\
\dfrac{255}{1 + e^{-(\mathrm{Speed}(i,j) - 20)}}, & \mathrm{Speed}(i,j) \geq 14.
\end{cases}
\quad (3)
$$
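For concreteness, the following NumPy sketch illustrates how Equations (1) and (3) could be applied to gridded wind components; the function name and array-based interface are illustrative assumptions rather than the authors' released preprocessing code.

```python
import numpy as np

def speed_to_r_channel(U, V, threshold=14.0, center=20.0):
    """Map wind speed to the R channel: zero below the 14 m/s cutoff, and a
    sigmoid-like response centered at 20 m/s otherwise (Eqs. (1) and (3))."""
    speed = np.sqrt(U ** 2 + V ** 2)                  # Eq. (1): wind speed magnitude
    r = 255.0 / (1.0 + np.exp(-(speed - center)))     # steepest change near 20 m/s
    r[speed < threshold] = 0.0                        # suppress sub-threshold wind speeds
    return speed, r
```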
Wind direction, which represents orientation without magnitude, is not suitable for direct use as an input feature in deep learning models. For instance, wind directions of 0° and 360° represent the same direction, yet their numerical values differ significantly, which could lead to the model misinterpreting directional relationships if used directly. Therefore, in this study, Equation (4) maps the wind direction to a point (x,y) on the unit circle, where the angle θ corresponds to the angle between the point and the positive Y-axis. The coordinates (x,y) are in the range of [−1, 1], representing the full range of wind directions.
$$ x^{2} + y^{2} = 1 \quad (4) $$
Equations (5) and (6) map the wind direction to the G and B channels; we scale the coordinates to the range of [0, 255] to match the typical pixel values used in image data. Specifically, we first shift the coordinates from the range [−1, 1] to [0, 2] by adding 1 to each coordinate. Then, we multiply each coordinate by 127.5 to scale it into the desired range. By using the sine and cosine functions, we effectively capture the cyclic nature of wind direction and avoid the pitfalls of using raw angular values directly. This transformation ensures compatibility with image model input standards. After this process, the wind field data are transformed into a dataset with dimensions of 3C × 320H × 512W (C: channels; H: height; W: width).
$$ G(i,j) = \left( \sin\left( \frac{2 \pi \times \mathrm{Direction}(i,j)}{359} \right) + 1 \right) \times 127.5 \quad (5) $$
$$ B(i,j) = \left( \cos\left( \frac{2 \pi \times \mathrm{Direction}(i,j)}{359} \right) + 1 \right) \times 127.5 \quad (6) $$
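Similarly, a minimal sketch of the direction encoding in Equations (4)–(6), assuming the direction is given in degrees in [0, 359]; it is an illustrative reading of the formulas, not the exact implementation.

```python
import numpy as np

def direction_to_gb_channels(direction_deg):
    """Place each direction on the unit circle (Eq. (4)) and rescale the sine and
    cosine components from [-1, 1] to [0, 255] (Eqs. (5) and (6))."""
    phase = 2.0 * np.pi * direction_deg / 359.0
    g = (np.sin(phase) + 1.0) * 127.5   # G channel
    b = (np.cos(phase) + 1.0) * 127.5   # B channel
    return g, b
```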

Data Augmentation

We adopted a data augmentation method similar to the one used in cross pseudo supervision (CPS), where data augmentation plays a critical role in improving the robustness of pseudo-label generation and enhancing the consistency learning process. By following this approach, we aim to leverage the benefits of augmentation to not only improve the diversity of the training data but also facilitate the pseudo-labeling process, thereby enhancing the effectiveness of the semi-supervised learning paradigm. In this study, we use a straightforward data augmentation strategy tailored to labeled and unlabeled data, as summarized in Table 1. This table outlines the specific transformations used for weak and strong augmentations. For labeled data, we apply weak augmentations to preserve the integrity of annotated features while introducing moderate variability. For unlabeled data, we employ stronger augmentations to enhance the diversity of the training data.

3.2. Structure of the Semi-Supervised Method

This paper proposes a method for extracting jet stream axes based on depth-separable convolution, an attention mechanism, a residual unit-extended Unet++, and CPS semi-supervised learning. The method consists of two phases: a consistency learning phase and a self-training phase. The general framework is shown in Figure 3.
In the consistency learning phase, a cross pseudo supervision approach is used to train two parallel RAUnet++ models for generating pseudo-labels. The two models share the same structure but are initialized with different weights. The input to each model consists of a labeled dataset and an unlabeled dataset. Each RAUnet++ model generates two outputs: one for labeled data and the other for unlabeled data. As shown in Figure 3, the outputs for labeled data (Labeled output 1 and Labeled output 2) are compared against the true labels to compute the supervised loss, while the outputs for unlabeled data are used to compute the unsupervised loss via the cross pseudo supervision method. Specifically, during each iteration, the two models predict the same batch of unlabeled data. The outputs, Unlabeled output 1 and Unlabeled output 2, are segmentation confidence maps generated by the two networks after softmax normalization. Using the argmax operation, the category (1 for axis and 0 for non-axis) with the highest probability at each pixel is selected, producing the corresponding pseudo-labels, Pseudo1 and Pseudo2. Subsequently, Pseudo2 serves as the supervision signal for Unlabeled output 1, while Pseudo1 serves as the supervision signal for Unlabeled output 2, which corresponds to the unsupervised loss in Figure 3.
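The cross pseudo supervision step described above can be summarized in the following PyTorch-style sketch; the function signature, loss weighting, and tensor shapes are simplifying assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def cps_step(model_a, model_b, x_lab, y_lab, x_unlab, cps_weight=1.0):
    """One consistency-learning step: supervised loss on labeled data plus
    cross pseudo supervision on unlabeled data (illustrative sketch)."""
    # Supervised branch: both networks are trained against the true labels.
    sup_loss = F.cross_entropy(model_a(x_lab), y_lab) + F.cross_entropy(model_b(x_lab), y_lab)

    # Unsupervised branch: hard pseudo-labels from one network supervise the other.
    logits_a, logits_b = model_a(x_unlab), model_b(x_unlab)
    pseudo_a = torch.softmax(logits_a, dim=1).argmax(dim=1).detach()   # Pseudo1
    pseudo_b = torch.softmax(logits_b, dim=1).argmax(dim=1).detach()   # Pseudo2
    unsup_loss = F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

    return sup_loss + cps_weight * unsup_loss
```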
In the self-training phase, RAUnet++ is initially trained using the labeled data, and the pseudo-labels generated in the consistency phase are added to the dataset. Subsequently, the model continues to be trained using the updated dataset until the performance metrics are no longer significantly improved.

3.3. Loss Function

In supervised learning, two commonly used loss functions for semantic segmentation tasks are cross-entropy loss [35] and Dice loss [36]. The supervised loss is applied to labeled data, where the model takes labeled samples as input and computes the loss between the model predictions and the corresponding ground truth labels. The cross-entropy loss quantifies the discrepancy between the category probability distribution predicted by the model and the one-hot encoded distribution of the actual labels. Its formula is
$$ L_{CE} = - \sum_{i=1}^{C} y_i \log(\hat{y}_i) \quad (7) $$
where $C$ is the total number of categories, $y_i$ is the true label, and $\hat{y}_i$ is the predicted probability of belonging to category $i$. The Dice loss function is based on the Dice coefficient, which aims to maximize the similarity between the predicted regions and the ground truth. The formula for the Dice coefficient is
$$ \mathrm{Dice} = \frac{2 \sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i} \quad (8) $$
where $p_i$ is the probability predicted by the model for pixel $i$, and $g_i$ is the ground truth label for pixel $i$. The cross-entropy loss is particularly suitable for pixel-level classification tasks, while the Dice loss improves region overlap accuracy in segmentation, especially for imbalanced datasets. However, using only cross-entropy or Dice loss may not yield optimal results. Therefore, the combined loss function in Equation (9) is often employed for supervised learning, where $L_{Sup}$ represents the supervised learning loss, $L_{CE}$ corresponds to the cross-entropy loss, $L_{Dice}$ refers to the Dice loss, and $\lambda$ and $\beta$ are the weighting factors.
$$ L_{Sup} = \lambda L_{CE} + \beta L_{Dice} \quad (9) $$
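A hedged sketch of the combined supervised loss in Equation (9) for the two-class (axis / non-axis) case; the soft Dice formulation below is one common variant and may differ in detail from the authors' implementation, and the default weights follow the best combination reported later in Table 4.

```python
import torch
import torch.nn.functional as F

def combined_supervised_loss(logits, target, lam=0.5, beta=1.0, eps=1e-6):
    """L_Sup = lam * L_CE + beta * L_Dice for binary axis segmentation (Eq. (9))."""
    ce = F.cross_entropy(logits, target)                 # pixel-wise cross-entropy, Eq. (7)
    prob = torch.softmax(logits, dim=1)[:, 1]            # predicted probability of the axis class
    gt = (target == 1).float()
    dice = (2 * (prob * gt).sum() + eps) / (prob.sum() + gt.sum() + eps)   # Eq. (8)
    return lam * ce + beta * (1.0 - dice)                # Dice loss = 1 - Dice coefficient
```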
In unsupervised learning, the cross-entropy loss is applied exclusively. The two RAUnet++ models take unlabeled data as input, and the predicted pseudo-labels are used in a cross-supervision manner, where each model computes the unsupervised loss between its predictions and the pseudo-labels generated by the other model. The loss function used in the consistency learning process is given by Equation (10), where $L_{Con}$ denotes the consistency learning loss, $L_{Sup}$ denotes the supervised learning loss, and $L_{UnSup}$ denotes the unsupervised learning loss.
$$ L_{Con} = L_{Sup} + L_{UnSup} \quad (10) $$
In the self-training phase, BCEWithLogitsLoss is used, which combines a Sigmoid layer with the Binary Cross-Entropy (BCE) loss, allowing the model's raw output (logits) to be used directly without the need for an additional Sigmoid activation. Its formula is
$$ \mathrm{BCEWithLogitsLoss} = - \frac{1}{n} \sum_{i=1}^{n} \left[ y_i \cdot \log \sigma(z_i) + (1 - y_i) \cdot \log\left(1 - \sigma(z_i)\right) \right] \quad (11) $$
where $z_i$ is the predicted output (logit) of the $i$-th sample and $y_i$ is the true label of the $i$-th sample. $\sigma(\cdot)$ converts the predicted output into a probability.

3.4. Residual Block with DropConnect

Residual networks [37] were introduced to mitigate the issues of gradient vanishing and degradation that occur as the depth of the network increases. The inputs and outputs of the residual module can be described as follows:
$$ y_l = x_l + F(x_l, w_l) \quad (12) $$
$$ x_{l+1} = L(y_l) \quad (13) $$
where $x_l$ denotes the input of the residual unit in layer $l$, $x_{l+1}$ denotes the output of the residual unit in layer $l+1$, and $F(\cdot)$ denotes the residual mapping, which contains convolution operations, batch normalization, and activation. $w_l$ represents the learnable parameters of the layer, and $L(\cdot)$ is the nonlinear activation function. While skip connections mitigate the gradient vanishing problem, deep networks remain susceptible to overfitting. To address this, we introduce a DropConnect [38] module in the residual unit, a regularization method similar to Dropout [39] that reduces the risk of overfitting by randomly deactivating some of the weight connections. It is an extension of Dropout that operates at a finer-grained level. Assuming the input to a given layer is $x_l$ and the weight matrix is $w_l$, the standard formula for a fully connected layer is
$$ z_l = x_l \cdot w_l + b \quad (14) $$
In DropConnect, a mask $m_l$ (a matrix randomly composed of 0 and 1) is applied element-wise to the weight matrix $w_l$, yielding the following formula:
$$ z_l = x_l \cdot (w_l \odot m_l) + b \quad (15) $$
As illustrated in Figure 4, a DropConnect layer is incorporated into the Basic Block after the initial convolutional layer.
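As a concrete illustration, the sketch below applies DropConnect to the weights of the first convolution in a ResNet-18-style Basic Block (Equation (15)); the drop probability and the exact placement are assumptions made for this sketch, not the authors' verified configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropConnectConv2d(nn.Conv2d):
    """Convolution whose weights are randomly masked during training (DropConnect, Eq. (15))."""
    def __init__(self, *args, drop_prob=0.2, **kwargs):
        super().__init__(*args, **kwargs)
        self.drop_prob = drop_prob

    def forward(self, x):
        weight = self.weight
        if self.training and self.drop_prob > 0:
            mask = torch.bernoulli(torch.full_like(weight, 1.0 - self.drop_prob))
            weight = weight * mask / (1.0 - self.drop_prob)   # rescale to keep the expectation
        return F.conv2d(x, weight, self.bias, self.stride, self.padding, self.dilation, self.groups)

class BasicBlock(nn.Module):
    """ResNet-18-style Basic Block with DropConnect after the initial convolution."""
    def __init__(self, channels, drop_prob=0.2):
        super().__init__()
        self.conv1 = DropConnectConv2d(channels, channels, 3, padding=1, bias=False, drop_prob=drop_prob)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # residual connection, Eqs. (12) and (13)
```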

3.5. Separable Convolutional Attention Gate

We adopt the methodology proposed in Attention U-net to enhance the model’s ability to focus on jet stream regions while minimizing the influence of irrelevant areas. Unlike Attention U-net, we integrate a Separable Convolutional Attention Gate (SCAG) structure, which replaces the standard convolution in Attention Gate with a depth-separable convolution. Compared to standard convolution, this modification retains the feature extraction capability of the network, increases the receptive field, and reduces the number of parameters and computational overhead. The structure of the SCAG is illustrated in Figure 5.
The encoder features and the up-sampled signal are each convolved using depth-separable convolution, after which the resulting features are concatenated along the channel dimension. $W_1$ is the weight matrix used for a linear transformation of the concatenated features. After applying the ReLU activation function, the resulting features undergo a linear transformation using $W_2$, which is employed to compute the final weights. The Sigmoid function restricts the output to the [0, 1] interval, thereby generating the attention coefficient $\alpha$. This enables the model to dynamically assess feature importance, suppress irrelevant information, and highlight key features.
$$ \alpha = \sigma\left( W_2 \cdot \mathrm{ReLU}\left( W_1 \cdot \left[ \mathrm{Depthwise}(X_{\mathrm{upsample}}), \mathrm{Depthwise}(X_{\mathrm{encoder}}) \right] \right) \right) \quad (16) $$
Finally, the encoder features are multiplied pixel by pixel by the attention coefficient to obtain the conditioned feature output.
$$ \hat{X}_l = \alpha \cdot X_l \quad (17) $$
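A possible PyTorch realization of the SCAG described above (Equations (16) and (17)); the channel sizes and the exact composition of the depthwise-separable branches are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class SCAG(nn.Module):
    """Separable Convolutional Attention Gate: gates encoder features with an
    attention map computed from the encoder and up-sampled decoder signals."""
    def __init__(self, enc_ch, up_ch, inter_ch):
        super().__init__()
        # Depthwise-separable projections (depthwise 3x3 followed by pointwise 1x1).
        self.enc_proj = nn.Sequential(
            nn.Conv2d(enc_ch, enc_ch, 3, padding=1, groups=enc_ch),
            nn.Conv2d(enc_ch, inter_ch, 1))
        self.up_proj = nn.Sequential(
            nn.Conv2d(up_ch, up_ch, 3, padding=1, groups=up_ch),
            nn.Conv2d(up_ch, inter_ch, 1))
        self.w1 = nn.Conv2d(2 * inter_ch, inter_ch, 1)   # linear transform of the concatenated features
        self.w2 = nn.Conv2d(inter_ch, 1, 1)              # projects to a single-channel attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x_encoder, x_upsample):
        feats = torch.cat([self.up_proj(x_upsample), self.enc_proj(x_encoder)], dim=1)
        alpha = self.sigmoid(self.w2(self.relu(self.w1(feats))))   # attention coefficient, Eq. (16)
        return x_encoder * alpha                                   # gated encoder features, Eq. (17)
```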

3.6. RAUNet++

The RAUnet++ network model, based on a separable convolutional attention gate, an attention mechanism, a residual unit, and Unet++, is used for pseudo-label generation and jet stream axis extraction in the consistency learning phase of semi-supervised methods. The structure of RAUnet++ is shown in Figure 6. RAUnet++ adopts UNet++ as the base framework and integrates SCAG into the dense skip connections. SCAG is applied to the feature maps generated by each encoder block, refining these features before they are transmitted to the decoder. Specifically, SCAG uses depth-wise convolution to extract spatial features within each channel and pointwise convolution to perform a linear combination across channels, reducing computational complexity while preserving critical information. By dynamically calculating attention coefficients, SCAG suppresses redundant or irrelevant features and emphasizes salient semantic information. This improvement ensures that features transmitted over dense skip connections are context-rich, thus facilitating more effective multi-scale feature fusion in the decoder, which can effectively address the shortcomings present in Unet++ and improve segmentation accuracy.
The enhanced skip connection can be expressed as follows: $x^{i,j}$ represents the output of node $X^{i,j}$, where $i$ denotes the encoder's down-sampling layer and $j$ denotes the convolutional layer in this dense skip path.
$$
x^{i,j} =
\begin{cases}
H\left( x^{i-1,j} \right), & j = 0, \\
H\left( \left[ T\left( x^{i+1,j-1} \right), \left[ \mathrm{SCAG}\left( x^{i,k} \right) \right]_{k=0}^{j-1} \right] \right), & j > 0.
\end{cases}
\quad (18)
$$
$H(\cdot)$ denotes the convolution operation followed by the ReLU activation function, and $T(\cdot)$ denotes the up-sampling operation. $\mathrm{SCAG}(\cdot)$ refers to the attention gate based on separable convolution. The operation $[\cdot, \cdot]$ represents feature concatenation along the channel dimension. As shown in Figure 7, the green arrow represents the gating signal, the blue arrow represents upsampling, and the green triangle represents SCAG.
The combination of these two approaches offers two benefits. First, the SCAG module refines the features extracted from the encoder before they are fused at the decoder stage, enabling accurate fusion of semantic and spatial information. As the decoder up-samples layer by layer, SCAG dynamically focuses on relevant features, ensuring that details are preserved during decoding. Second, the use of depth-separable convolution reduces computational cost compared to standard convolution. Additionally, the incorporation of residual units and DropConnect mitigates overfitting during training and improves gradient flow stability.

3.7. Deep Supervision and Pruning

In the extended Unet++ model, we also introduce deep supervision [40], which improves the performance of the model by adding loss signals to the intermediate layers. In addition, we incorporated multilayer supervision, which provides direct feedback signals to different depths of the network. This is primarily employed to train the model in conjunction with the losses of multiple layers. Deep supervision is employed exclusively during self-training, with BCEWithLogitsLoss as the loss function. In this context, each intermediate output is compared to the actual label, generating the corresponding loss. The overall loss of the network is subsequently calculated by weighting and summing the losses from each layer.
$$ L_{\mathrm{total}} = \alpha_1 L_{\mathrm{out}_1} + \alpha_2 L_{\mathrm{out}_2} + \cdots + \alpha_n L_{\mathrm{out}_n} \quad (19) $$
where $\alpha_n$ is the weight of the $n$-th intermediate layer loss. For simplicity and consistency with the original UNet++ implementation, the losses from all intermediate layers are equally weighted, with $\alpha_n = \frac{1}{n}$, where $n$ is the total number of intermediate layers.
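A short sketch of the deep-supervision objective in Equation (19) with the equal weighting $\alpha_n = 1/n$ described above; the list of intermediate outputs is an assumed interface.

```python
import torch.nn as nn

def deep_supervision_loss(intermediate_logits, target):
    """Average BCEWithLogitsLoss over all intermediate decoder outputs (Eq. (19))."""
    criterion = nn.BCEWithLogitsLoss()
    n = len(intermediate_logits)
    return sum(criterion(logits, target) for logits in intermediate_logits) / n
```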
Figure 8 illustrates the different levels of model outputs resulting from the choice of different decoder layers, using RAUnet++ $L_i$ to represent the model output at the $i$-th decoder layer. For example, the maximally pruned model RAUnet++ $L_1$ indicates that the segmentation result comes from decoder $X^{0,1}$, and the unpruned RAUnet++ $L_4$ indicates that the segmentation result comes from decoder $X^{0,4}$.

3.8. Eight-Neighbor Connection Algorithm Based on Jet Stream Center Axis Points

When plotting jet streamlines by connecting the jet stream axis midpoints extracted from the skeleton, the standard bottom-up connection method leads to anomalous axes, such as the scattered point connection problem shown in the green dashed box in Figure 9. After the Cartesian coordinate transformation, this problem appears in the red dashed box. To address this issue, we propose an eight-neighbor connection algorithm based on the jet stream center axis points to ensure that the scattered points in the image are connected in an orderly manner according to their spatial order rather than randomly. The core of the algorithm is shown in Algorithm 1.
(1) First, identify a suitable starting pixel. Only one neighbor in the eight neighborhoods around this pixel point has a color value, ensuring the uniqueness and order of the connections.
(2) After finding the starting point, the algorithm selects the nearest unvisited neighboring point to the current point by traversing the eight-neighborhood of the current pixel, preferring those that have not been marked as visited.
(3) The process is repeated for each point, adding each neighboring pixel to the set of connected points.
In cases where multiple unvisited neighbors are equidistant from the current point, a tie-breaking mechanism is applied to ensure consistent and orderly connections. The mechanism prioritizes neighbors based on a fixed direction order (e.g., up, down, left, right, diagonals). If neighbors remain tied, the algorithm selects the next pixel based on the first occurrence in the traversal sequence, ensuring deterministic behavior. This method not only simplifies the process of connecting the central axis points of the jet stream but also maintains the spatial structure when converted to a standard data format, solving the problem of random connections in the image as shown in the red solid and blue dashed box in Figure 9.
Algorithm 1 Eight-Neighbor Connection Algorithm for Jet Stream Center Axis Points.
 1: Input: Image I, set of pixel points P
 2: Output: Connected point set C
 3: Find a starting point s ∈ P with a unique color value in its eight-neighbor region
 4: Initialize the visited point set C ← {s}
 5: Define the direction priority order as:
 6:   Priority ← [up, down, left, right, top-left, top-right, bottom-left, bottom-right]
 7: while there exists an unvisited neighboring point n do
 8:   Traverse the eight-neighbor region N(c) of the current point c ∈ C
 9:   Find the nearest unvisited neighbor(s) N_min ⊆ N(c) ∩ (P \ C)
10:   if |N_min| > 1 then
11:     Resolve the tie by selecting the first neighbor in Priority order from N_min
12:   end if
13:   if N_min is not empty then
14:     Select n_min ∈ N_min
15:     Add n_min to the visited set: C ← C ∪ {n_min}
16:     Update the current point: c ← n_min
17:   end if
18: end while
19: Return the connected point set C
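A minimal Python rendering of Algorithm 1, assuming the skeletonized axis points are given as (row, col) pixel coordinates; distance ties are broken with the fixed direction priority described above. It is an illustrative sketch rather than the authors' exact implementation.

```python
def connect_axis_points(points):
    """Order scattered jet stream axis pixels into a connected sequence (Algorithm 1).

    points: set of (row, col) tuples belonging to the skeletonized axis.
    Returns the points in connection order, starting from an endpoint.
    """
    # Eight-neighborhood offsets in the fixed priority order:
    # up, down, left, right, top-left, top-right, bottom-left, bottom-right.
    priority = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

    def neighbors(p):
        return [(p[0] + dr, p[1] + dc) for dr, dc in priority]

    # Starting point: a pixel with exactly one axis pixel in its eight-neighborhood.
    start = next(p for p in points if sum(n in points for n in neighbors(p)) == 1)

    ordered, visited, current = [start], {start}, start
    while True:
        candidates = [n for n in neighbors(current) if n in points and n not in visited]
        if not candidates:
            break

        def rank(n):
            # Nearest unvisited neighbor; equal distances fall back to the priority order.
            dist = (n[0] - current[0]) ** 2 + (n[1] - current[1]) ** 2
            return (dist, neighbors(current).index(n))

        best = min(candidates, key=rank)
        ordered.append(best)
        visited.add(best)
        current = best
    return ordered
```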

4. Experiments

4.1. Experimental Setup

The experimental setup includes an Intel Xeon Gold 6142 CPU, two NVIDIA RTX 4060Ti GPUs, and 64 GB of memory. The software environment consists of PyTorch 1.11.0, CUDA 11.3, and Python 3.8.18.
During the consistency learning phase, the two RAUnet++ models are initialized using Kaiming initialization and Xavier initialization, respectively, as provided by the PyTorch framework. The initial learning rate is set to 0.0001. The batch size for labeled data is set to 8. Considering that unlabeled data requires generating pseudo-labels through the model’s own predictions, and these pseudo-labels are typically of lower quality than real labels, the batch size for unlabeled data is set to 4 to reduce the negative impact of low-quality pseudo-labels. The initial consistency weight and the consistency ramp-up length are set to 0.01 and 150, respectively.
In the self-training phase, we use ReduceLROnPlateau as the learning rate scheduler, which dynamically adjusts the learning rate when the validation loss reaches a plateau. This scheduler monitors the validation loss and reduces the learning rate if no improvement is observed for a predefined number of epochs. The Adam optimizer is used with a learning rate of 0.0001 and a batch size of 8. The maximum number of training epochs is initially set to 500 to allow sufficient flexibility for convergence in early experiments. Based on extensive trials, we observed that the model consistently reaches convergence within 100 epochs, with early stopping triggered by the plateauing of validation loss. To optimize computational efficiency in subsequent experiments, the maximum number of epochs was adjusted to 150 to ensure sufficient convergence without over-training. To further prevent overfitting, early stopping is employed, terminating training if the validation loss does not improve for 10 consecutive epochs.
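The optimizer, scheduler, and early-stopping configuration described above could be wired together as in the following sketch; the model, data loaders, and validation routine are placeholders, and the scheduler's factor and patience are assumptions.

```python
import torch

def self_training_loop(model, train_loader, val_loader, evaluate, max_epochs=150, patience=10):
    """Self-training setup: Adam (lr = 1e-4), ReduceLROnPlateau on validation loss,
    and early stopping after 10 epochs without improvement (illustrative sketch)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)
    criterion = torch.nn.BCEWithLogitsLoss()

    best_val, stale = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        val_loss = evaluate(model, val_loader)   # user-supplied validation pass
        scheduler.step(val_loss)                 # reduce the learning rate on a plateau
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:                # early stopping
                break
```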

4.2. Evaluation Metrics

To assess the model’s performance, we employ commonly used evaluation metrics in image segmentation: Intersection over Union (IoU), Dice coefficient (Dice), Precision (Pre), and Detection Rate (DR). IoU measures the overlap between the predicted segmentation and the ground truth labels relative to their union. It is widely recognized as a robust metric for evaluating segmentation tasks because it directly quantifies the degree of alignment between predicted and actual regions. The Dice coefficient emphasizes the similarity between the predicted and ground truth regions by focusing on the size of their intersection. This metric is particularly useful in scenarios where the target region may be small relative to the entire image, as it ensures that minor errors in segmentation do not disproportionately affect the overall evaluation. In our study, the Dice coefficient complements IoU by providing an additional perspective on model similarity and alignment. Precision evaluates the proportion of correctly predicted positive samples out of all predicted positive samples. This metric is crucial for assessing the reliability of the model’s predictions, particularly in cases where over-segmentation (predicting areas outside the target region) can lead to misleading results. High precision indicates that the model is effective at minimizing false positives, which is essential for accurately delineating jet stream axes. DR measures the model’s ability to correctly identify positive samples within the target region. This metric is particularly important in this study, as it reflects the model’s sensitivity to detecting jet stream axes, ensuring that the predicted regions adequately capture the target features.
The IoU and Dice calculations are shown in Equations (20) and (21), respectively.
$$ IoU = \frac{TP}{TP + FP + FN} \quad (20) $$
$$ Dice = \frac{2TP}{2TP + FP + FN} \quad (21) $$
Pre and DR can be expressed as Equations (22) and (23).
$$ Pre = \frac{TP}{TP + FP} \quad (22) $$
$$ DR = \frac{TP}{TP + FN} \quad (23) $$
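For reference, the four metrics can be computed directly from binary prediction and ground-truth masks, as in the sketch below (0/1 NumPy arrays assumed).

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute IoU, Dice, Precision, and Detection Rate (Eqs. (20)-(23)) from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "IoU": tp / (tp + fp + fn),
        "Dice": 2 * tp / (2 * tp + fp + fn),
        "Pre": tp / (tp + fp),
        "DR": tp / (tp + fn),
    }
```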

4.3. Training Process

Figure 10 illustrates the Loss and IoU performance of the model on the training and validation sets during the self-training phase. The horizontal axis represents the number of training and validation iterations, while the vertical axis denotes the Loss and IoU values. As depicted in the figure, the Loss for both the training and validation sets fluctuates significantly between epochs 20 and 40, without a consistent decrease. Similarly, during the period of fluctuation, the IoU also exhibits pronounced volatility. However, the model’s performance improves steadily thereafter.
The proposed model achieves convergence for both training and validation Loss within 100 epochs, without any signs of overfitting or underfitting, indicating excellent robustness and stability.

4.4. Comparison with Other Semi-Supervisory Methods

To highlight the advantages of the proposed method, we compared it with four recent semi-supervised learning approaches. The approaches include Mean Teacher (MT) [41], introduced in 2018, which is based on consistency regularization and uses a teacher model to generate reliable pseudo-labels for the training process. The teacher model’s weights are updated using the exponential moving average (EMA) of the student model’s weights. As a foundational method in semi-supervised learning, MT provides a benchmark for evaluating the effectiveness of consistency-based learning strategies. Uncertainty-aware Mean Teacher (UAMT) [42], proposed in 2019, builds on MT by incorporating an uncertainty-aware mechanism that adjusts the weight of the consistency loss based on prediction uncertainty, thus handling unlabeled data more effectively. Its inclusion highlights our method’s ability to generate and utilize high-quality pseudo-labels, particularly in scenarios with noisy or uncertain data. Deep Co-Training (DCT) [43], introduced in 2018, leverages the co-training paradigm and utilizes the heterogeneity of multiple models to generate pseudo-labels. These models perform mutual supervision learning, where they act as teachers for each other. By incorporating multiple consistency constraints, DCT provides a relevant benchmark for methods that aim to extend consistency-based learning frameworks. Cross pseudo supervision (CPS) [44], proposed in 2021, centers around generating pseudo-labels, with two models generating pseudo-labels for each other to achieve mutual supervision. As a state-of-the-art method, CPS offers a robust benchmark for evaluating the effectiveness of our proposed enhancements, particularly since our method is based on modifications and improvements to CPS.
We categorized the labeled data into four classes based on their proportions, 5%, 10%, 20%, and 30%, and used four semi-supervised learning methods to compare their performance on datasets with varying proportions of labeled data. These proportions are selected to investigate how the model performs under different levels of labeled data availability, reflecting practical scenarios where labeled data can be scarce. By testing across these ranges, we aim to better understand the balance between labeled and unlabeled data and assess how well the methods generalize as the proportion of labeled data increases. As shown in Table 2, when the proportion of labeled data is 5%, our method performs well across all metrics, though it does not significantly surpass other semi-supervised methods, with only minor differences. As the labeled data proportion increases to 10%, our method shows significant improvement in the Dice coefficient, and it also outperforms other semi-supervised methods on other metrics, demonstrating stronger generalization capability. When the proportion of labeled data reaches 20% and 30%, our method significantly outperforms all other methods on all evaluation metrics, particularly excelling in IoU and Pre metrics, further highlighting its advantage. These results indicate that as the proportion of labeled data increases, our method is better able to leverage the labeled information, enhancing the model’s precision and robustness.
Figure 11 shows the comparison of our method with other semi-supervised methods in terms of Dice coefficients after several experiments. The upper and lower whiskers and distribution ranges of the box plots indicate that our proposed method has a more centralized distribution of results and lower volatility of results, reflecting stronger robustness. Our method not only improves the median of the evaluation metrics but also reduces the deviation of the extremes (minimum and maximum values), further demonstrating the effectiveness of the method.
Figure 12 demonstrates the improvement in evaluation metrics of our proposed method compared to the supervised baseline model under different labeled data ratios. Compared to the baseline model, the DR and Pre metrics in our proposed method show significant improvement across all labeled data ratios, with the performance improvement being more pronounced when the labeled data ratio is higher (e.g., 30%). Even when the labeled data ratio is lower (e.g., 5%), our method still outperforms the supervised baseline model, despite the overall performance degradation, highlighting its robustness and superiority.

4.5. Ablation Study

To test the robustness of the proposed method, fully supervised models such as U-Net and U-Net++ were employed for comparison, and the performance of the proposed method was analyzed based on different backbone network combinations and the inclusion or exclusion of SCAG. Table 3 presents the results of the ablation experiments, where Attention-Unet++ represents U-Net++ extended with SCAG only, RAUnet++ represents U-Net++ extended with both residual units and SCAG, and CPS-RAUnet++ denotes the proposed method. The ablation experiments were conducted on a dataset with 30% labeled data.
As shown in Table 3, when the fully supervised models U-Net and U-Net++ are trained with 30% labeled data, they perform unsatisfactorily on the test set, with Dice scores of 65.25% and 72.03%, respectively. This shows that supervised models cannot achieve satisfactory results when the labeled data are insufficient. Compared to the original U-Net++, the addition of residual units and SCAG significantly improved the model’s performance across all metrics. Furthermore, the IoU, Dice, DR, and Pre metrics are notably higher for RAUnet++ with the semi-supervised approach, demonstrating that incorporating a semi-supervised approach enhances model performance. The proposed method achieves the best results on all four metrics, further validating that combining the extended U-Net++ with a semi-supervised approach leads to superior performance. These findings confirm that each component of the proposed method plays a key role in improving segmentation accuracy and robustness, especially in the case of limited labeled data.
Figure 13 shows a comparison of the segmentation results of the different models in the ablation experiment. While all models can identify the approximate regions of the jet stream axis with some accuracy, the other comparison models suffer from missed detections and false detections, especially in the complex regions of jet stream bifurcation and merging. The red circles in the figure mark some of the missed regions, and the green circles mark some of the false detection regions; these phenomena reflect the limitations of the models in dealing with complex scenes. Specifically, the Unet model performs poorly, with significant missed detections in each of the examples, especially at locations where jet streams bifurcate and converge, which typically have higher structural complexity and tend to lead to inaccurate model predictions. The unimproved Unet++ model also has a high number of missed and false detections in areas of jet stream bifurcation or confluence, suggesting that its ability to capture localized features is still insufficient. In contrast, RAUnet++ introduces an attention mechanism and a residual structure to improve the detection of the jet stream axis. By enhancing the local feature extraction capability and improving the model’s ability to capture contextual information, RAUnet++ significantly reduces the occurrence of missed and false detections. However, despite the significant performance improvement of RAUnet++ over Unet and Unet++, missed detections still occur in some complex regions. Our proposed CPS-RAUnet++ method performs best in jet stream axis detection. The model successfully identifies all jet stream axis regions with only minor missed detections at a few jet stream mergers. This suggests that by combining attention gates, residual mechanisms, and semi-supervised methods, the model can better capture complex structures using a small amount of labeled data, resulting in significantly improved accuracy and robustness of segmentation. Overall, our method performs well at jet stream axis bifurcations and mergers and can obtain higher-quality segmentation results, further validating its potential for application in complex meteorological data analysis.
Table 4 illustrates the effect of different supervised loss function weight combinations on segmentation results during the consistency learning phase. The goal of the experiment is to identify the optimal combination of λ and β values to achieve better segmentation performance. The results indicate that when λ = 0.5 and β = 1 , the model achieves the highest values in Dice (79.19%), DR (78.67%), and Pre (80.28%), demonstrating that this weight combination delivers the best overall segmentation performance. This result suggests that appropriately increasing the weight of the Dice loss effectively optimizes the shape and boundaries of the target regions. Additionally, when λ = 1 and β = 0.5 , the model achieves near-optimal performance in Dice (79.11%) and Pre (79.19%) but exhibits slight declines in DR and IoU. This indicates that with this combination, the cross-entropy loss weight dominates, focusing more on improving classification accuracy, but it is slightly less effective in refining segmentation boundaries. In summary, the experimental results demonstrate that moderately increasing the weight of the Dice loss relative to the cross-entropy loss effectively enhances the model’s segmentation capabilities, particularly in reconstructing the shape and optimizing the boundaries of target regions.
Figure 14 shows the comparison between CPS-RAUnet++ and other methods on the evaluation metric Pre after several experiments. From the figure, it is evident that the Pre distribution of Unet is low, indicating that it performs poorly in dealing with complex features. U-Net++ shows a clear improvement over Unet, but its upper accuracy limit and stability are still limited, indicating that the underlying network’s modeling capability is limited. RAUnet++ further combines residual units and SCAG, and its Pre distribution is more concentrated and higher than that of Attention-Unet++, indicating that the residual unit helps to further optimize the feature representation and reduce the gradient vanishing problem. Our proposed method obtains the highest Pre values across several experiments with a more stable distribution, which indicates that our method is more robust in terms of accuracy and consistency.
Figure 15 illustrates the inference time, Intersection over Union (IoU), and the number of model parameters for the RAUnet++ model at different pruning levels after self-training, tested on 100 images from the test set. As demonstrated by the figure, RAUnet++ L3 achieves an average reduction of 23% in parameters and a 3.54 s decrease in inference time compared to RAUnet++ L4, with only a minor average IoU drop of 3.5%. For RAUnet++ L2, the parameters are reduced by a significant 94.6% and the inference time drops to 6.54 s, with an IoU decrease of 12.89%. These results demonstrate that model pruning impacts performance but also accelerates inference and reduces the number of parameters. Therefore, different pruning levels can be selected based on specific use cases to optimize the balance between inference time, model size, and performance.
The elliptical dashed boxes in Figure 16 highlight the significant advantages of the proposed CPS-RAUnet++ model in the critical regions. The model’s extraction of the jet stream region is highly accurate, with fewer noise points shown in the dashed box, and the direction of the jet stream axis is highly consistent with the actual labeling. Other models, such as U-Net, Unet++, Attention-Unet++, and RAUnet++, exhibit missed jet stream detections in the black elliptical dashed box. In the purple elliptical dashed box, the proposed method better captures small changes in the direction of the jet stream, particularly in regions with drastic changes in the wind field (e.g., the leading and trailing edges). Although CPS-RAUnet++ performs remarkably well in extracting the jet stream axis, there are still areas that require improvement, such as in the yellow elliptical dashed box in Figure 16.

5. Conclusions

This paper addresses the poor generalization of traditional wind field analysis methods for automatically extracting jet stream axes and their inability to handle jet stream merging and bifurcation. We extend the U-Net++ model and combine it with a semi-supervised approach based on cross pseudo supervision for jet stream axis identification. The proposed approach comprises two main steps: jet stream region segmentation using residual units, attention mechanisms, and semi-supervised learning; and jet stream axis extraction based on an eight-neighbor connection algorithm applied to jet stream center-axis points. The semi-supervised learning method effectively leverages a small amount of labeled data alongside a large amount of unlabeled data, reducing the reliance on labels and improving the model's generalization capability. Experimental results demonstrate that the proposed method achieves a recognition accuracy exceeding 80% on the test set, outperforming other semantic segmentation methods. However, the method still has limitations, particularly in regions where jet stream features are less apparent or the atmospheric wind field is more complex, where accuracy decreases and the results deviate from manually drawn axes. Future work could combine neural networks with stronger feature extraction capabilities to further improve jet stream axis identification and extraction.
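For readers unfamiliar with cross pseudo supervision, the following is a minimal PyTorch-style sketch of its core idea on unlabeled images, in which each of two independently initialized networks is supervised by the hard pseudo-label produced by its peer. This is a generic illustration of the CPS objective [26], not the authors' released code.

```python
import torch
import torch.nn.functional as F

def cps_loss(logits_a, logits_b):
    # Cross pseudo supervision: each model learns from the other's argmax pseudo-label.
    # logits_*: (N, C, H, W) raw outputs of the two independently initialized networks.
    pseudo_b = logits_b.detach().argmax(dim=1)    # pseudo-label from model B (no gradient)
    pseudo_a = logits_a.detach().argmax(dim=1)    # pseudo-label from model A (no gradient)
    loss_a = F.cross_entropy(logits_a, pseudo_b)  # model A supervised by B
    loss_b = F.cross_entropy(logits_b, pseudo_a)  # model B supervised by A
    return loss_a + loss_b
```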

Author Contributions

Conceptualization, J.G.; methodology, J.G. and K.C.; supervision, J.G.; project administration, J.G.; software, K.C., Z.L. and F.Z.; validation, K.C.; writing—original draft preparation, K.C.; writing—review and editing, C.F.; funding acquisition, C.F. and X.D.; investigation, W.H. and X.D.; resources, W.H.; data curation, Z.L.; formal analysis, P.W. and T.L.; visualization, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key R&D Program for Social Development in Yunnan Provincial (China), grant number 202203AC100006-4; the Key Projects of Open Fund (grant numbers ZSAQ202401, ZSAQ202423, ZSAQ202424); the Sichuan Science and Technology Plan: Research and Application of Key Technologies of Command and Equipment for Hail Suppression in Xinjiang (grant number 2024YFHZ0151); the Second Comprehensive Scientific Investigation of the Tibetan Plateau-Extreme Weather and Climate Events and Disaster Risk (grant number 2019QZKK0104), funded by the Ministry of Science and Technology; the National Funded Postdoctoral Research Program (grant number GZC20241900); the Natural Science Foundation Program of Xinjiang Uygur Autonomous Region (grant number 2024D01A141); the Tianchi Talents Program of Xinjiang Uygur Autonomous Region; and the Postdoctoral Fund of Xinjiang Uygur Autonomous Region.

Data Availability Statement

The data utilized in this study were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) through the Copernicus Climate Data Store. These datasets, specifically the ERA5 reanalysis on pressure levels, can be accessed at https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels (accessed on 20 January 2024). Users may need to register on the Copernicus platform and agree to the terms of use specified by ECMWF to download the data. The code for this study is available on GitHub and can be accessed at https://github.com/Spider-ck/CPS-RAUnetPlus (accessed on 1 December 2024).
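For reproducibility, the ERA5 pressure-level data can be retrieved programmatically with the official cdsapi client. The request below is only a sketch; the variables, pressure level, dates, and output file name are illustrative assumptions and should be adapted to the configuration described in the paper.

```python
# pip install cdsapi; requires a registered CDS account and an ~/.cdsapirc key.
import cdsapi

client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "variable": ["u_component_of_wind", "v_component_of_wind"],
        "pressure_level": "200",             # upper-tropospheric level (illustrative)
        "year": "2023",
        "month": "01",
        "day": ["01", "02"],
        "time": ["00:00", "12:00"],
        "format": "netcdf",
    },
    "era5_uv_200hPa.nc",                     # hypothetical output file name
)
```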

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kidston, J.; Scaife, A.A.; Hardiman, S.C.; Mitchell, D.M.; Butchart, N.; Baldwin, M.P.; Gray, L.J. Stratospheric influence on tropospheric jet streams, storm tracks, and surface weather. Nat. Geosci. 2015, 8, 433–440.
  2. Stendel, M.; Francis, J.; White, R.; Williams, P.D.; Woollings, T. The jet stream and climate change. In Climate Change; Elsevier: Amsterdam, The Netherlands, 2021; pp. 327–357.
  3. Ahmed, F.; Adnan, S.; Latif, M. Impact of jet stream and associated mechanisms on winter precipitation in Pakistan. Meteorol. Atmos. Phys. 2020, 132, 225–238.
  4. Barnes, E.A.; Screen, J.A. The impact of Arctic warming on the midlatitude jet-stream: Can it? Has it? Will it? Wiley Interdiscip. Rev. Clim. Chang. 2015, 6, 277–286.
  5. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
  6. Wang, P.; Wang, C.; Wang, D. Study on Identification of Low-Level Jet and Automatic Drawing Method of Jet Axis. Meteorology 2018, 44, 952–960.
  7. Colominas, M.A.; Meignen, S.; Pham, D.H. Fully adaptive ridge detection based on STFT phase information. IEEE Signal Process. Lett. 2020, 27, 620–624.
  8. Molnos, S.; Mamdouh, T.; Petri, S.; Nocke, T.; Weinkauf, T.; Coumou, D. A network-based detection scheme for the jet stream core. Earth Syst. Dyn. 2017, 8, 75–89.
  9. Yang, E.G.; Kim, H.M.; Kim, D.H. Development of East Asia Regional Reanalysis based on advanced hybrid gain data assimilation method and evaluation with E3DVAR, ERA-5, and ERA-Interim reanalysis. Earth Syst. Sci. Data 2022, 14, 2109–2127.
  10. Eusebi, R.; Vecchi, G.A.; Lai, C.Y.; Tong, M. Realistic Tropical Cyclone Wind and Pressure Fields Can Be Reconstructed from Sparse Data Using Deep Learning. Commun. Earth Environ. 2024, 5, 8.
  11. Ekmekci, I.; Oner, H.; Sen, Y. Prediction of circular jet streams with artificial neural networks. In Proceedings of the 2012 International Symposium on Innovations in Intelligent Systems and Applications, Trabzon, Turkey, 2–4 July 2012; pp. 1–5.
  12. Phermphoonphiphat, E.; Tomita, T.; Numao, M.; Fukui, K. A study of upper tropospheric circulations over the northern hemisphere prediction using multivariate features by ConvLSTM. In Proceedings of the 23rd Asia Pacific Symposium on Intelligent and Evolutionary Systems, Hiroshima, Japan, 18–20 November 2019; Springer: Cham, Switzerland, 2020; pp. 130–141.
  13. Hakim, G.J.; Masanam, S. Dynamical tests of a deep-learning weather prediction model. In Artificial Intelligence for the Earth Systems; American Meteorological Society: Boston, MA, USA, 2024.
  14. Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440.
  15. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 2020, 33, 596–608.
  16. Dansana, J.; Kabat, M.R.; Pattnaik, P.K. Improved 3D Rotation-based Geometric Data Perturbation Based on Medical Data Preservation in Big Data. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 5.
  17. Li, K.; Chen, X.; Wu, K.; Liu, H.; Dai, F.; Yang, T.; Yu, J.; Wang, K. Analysis of the Relationship between Upper-Level Aircraft Turbulence and the East Asian Westerly Jet Stream. Atmosphere 2024, 15, 1138.
  18. Spensberger, C.; Spengler, T.; Li, C. Upper-tropospheric jet axis detection and application to the boreal winter 2013/14. Mon. Weather Rev. 2017, 145, 2363–2374.
  19. Kern, M.; Hewson, T.; Sadlo, F.; Westermann, R.; Rautenhaus, M. Robust detection and visualization of jet-stream core lines in atmospheric flow. IEEE Trans. Vis. Comput. Graph. 2017, 24, 893–902.
  20. Zhou, Z.; Cao, L.; Liao, J.; Gu, J.; Zhang, T.; Pan, C. Overview of Hydrometeorological Information: Observation, Fusion, and Reanalysis. Meteorology 2022, 48, 272–283.
  21. Gan, J.; Qi, H.; Hu, W.; Shu, H.; Luo, F.; He, T.; Yin, Q.; Lai, R. A method for calculating jet streamlines in atmospheric wind field. J. Sichuan Univ. (Nat. Sci. Ed.) 2020, 57, 1084–1089.
  22. Lee, D.H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML; The Science and Information Organization: New York, NY, USA, 2013; Volume 3, p. 896.
  23. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning; PMLR: McKees Rocks, PA, USA, 2020; pp. 1597–1607.
  24. Miyato, T.; Maeda, S.; Koyama, M.; Ishii, S. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1979–1993.
  25. Versaci, M.; Angiulli, G.; La Foresta, F.; Laganà, F.; Palumbo, A. Intuitionistic fuzzy divergence for evaluating the mechanical stress state of steel plates subject to bi-axial loads. Integr. Comput. Aided Eng. 2024, 31, 363–379.
  26. Chen, X.; Yuan, Y.; Zeng, G.; Wang, J. Semi-supervised semantic segmentation with cross pseudo supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2613–2622.
  27. Tanha, J.; Van Someren, M.; Afsarmanesh, H. Semi-supervised self-training for decision tree classifiers. Int. J. Mach. Learn. Cybern. 2017, 8, 355–370.
  28. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Van Esesn, B.C.; Awwal, A.A.S.; Asari, V.K. The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches. arXiv 2018, arXiv:1803.01164.
  29. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
  30. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
  31. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. Unet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
  32. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In European Conference on Computer Vision; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 205–218.
  33. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018, Proceedings 4; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
  34. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  35. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in Neural Information Processing Systems; NeurIPS: Denver, CO, USA, 2018; Volume 31.
  36. Wang, L.; Wang, C.; Sun, Z.; Chen, S. An improved dice loss for pneumothorax segmentation by mining the information of negative areas. IEEE Access 2020, 8, 167939–167949.
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  38. Wan, L.; Zeiler, M.; Zhang, S.; Sun, J. Regularization of neural networks using dropconnect. In Proceedings of the International Conference on Machine Learning, PMLR, Atlanta, GA, USA, 17–19 June 2013; pp. 1058–1066.
  39. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  40. Wang, L.; Lee, C.Y.; Tu, Z.; Lazebnik, S. Training deeper convolutional networks with deep supervision. arXiv 2015, arXiv:1505.02496.
  41. Tarvainen, A.; Valpola, H. Mean Teachers Are Better Role Models: Weight-Averaged Consistency Targets Improve Semi-Supervised Deep Learning Results. In Advances in Neural Information Processing Systems; NeurIPS: Denver, CO, USA, 2017; Volume 30.
  42. Yu, L.; Wang, S.; Li, X.; Fu, C.W.; Heng, P.A. Uncertainty-Aware Self-Ensembling Model for Semi-Supervised 3D Left Atrium Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2019: 22nd International Conference, Shenzhen, China, 13–17 October 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 605–613.
  43. Qiao, S.; Shen, W.; Zhang, Z.; Wang, B.; Yuille, A. Deep Co-Training for Semi-Supervised Image Recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 135–152.
  44. Xiao, Y.; Chen, C.; Fu, X.; Wang, L.; Yu, J.; Zou, Y. A Novel Multi-Task Semi-Supervised Medical Image Segmentation Method Based on Multi-Branch Cross Pseudo Supervision. Appl. Intell. 2023, 53, 30343–30358.
Figure 1. U-Net architecture.
Figure 2. Wind speed mapping to R channel pixel values function. The deeper the red, the more sensitive the mapping of wind speed to changes in the R channel pixel values.
Figure 3. Overall architecture of the CPS-RAUnet++ semi-supervised learning model. The two RAUnet++ in the consistency learning phase have the same structure and are initialized independently. Weak augmentation is applied to labeled data and strong augmentation to unlabeled data for each model input. The red arrow in the figure represents unsupervised loss, the green dashed arrow represents supervised loss, the yellow arrow represents self-training phase loss, and the purple arrow represents the data processing process.
Figure 4. Unet++ backbone with our proposed residual block with DropConnect.
Figure 5. The proposed SCAG architecture.
Figure 6. RAUnet++ architecture.
Figure 7. Detailed analysis of layer 1 dense skip paths in RAUnet++.
Figure 8. RAUNet++ can be pruned to RAUNet++ L1, RAUNet++ L2, RAUNet++ L3, and RAUNet++ L4 if trained with deep supervision. The cyan circles represent the input or feature maps. The blue circles represent the intermediate states of the model. The gray circles represent the network layers that can be pruned.
Figure 9. Improved results of jet stream axis plotting. The green dashed box contains the unmodified visualization result, while the blue dashed box contains the improved visualization result.
Figure 10. Loss and IoU trends in the training and validation sets during the self-training phase (the left figure shows the Loss trend, while the right figure shows the IoU trend; the red line represents the training set, and the blue line represents the validation set).
Figure 11. The comparison between CPS-RAUNet++ and other methods on Dice after multiple experiments on the test dataset; all experiments were conducted using 30% labeled data.
Figure 12. Improvement of our proposed method over the supervised baseline RAUnet++ on the DR (left image) and Pre (right image) metrics at 30%, 20%, 10%, and 5% labeled data ratios.
Figure 13. Comparison of segmentation results of different models in the ablation experiment. The red circles represent missed cases, and the green circles represent false detection cases.
Figure 14. The comparison between CPS-RAUNet++ and other methods on Pre after multiple experiments on the test dataset.
Figure 15. Inference time, IoU, and parameters of CPS-RAUNet++ L1–L4 on the test dataset.
Figure 16. Visualization results of the jet axis ablation experiment segmentation generated by MICAPS 4.0 software. The black and purple dashed boxes represent missed cases, and the yellow dashed box represents false detection areas.
Table 1. Data Augmentation Strategies for Labeled and Unlabeled Data.
Type | Transformation | Labeled Data (Weak) | Unlabeled Data (Strong)
Horizontal Flip | Flips images horizontally | |
Color Jitter | Adjusts brightness, contrast, and saturation | |
Gaussian Blur | Applies a 3 × 3 kernel blur | |
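A possible realization of the weak/strong augmentation split in Table 1 with torchvision transforms is sketched below; the assignment of transformations to the weak and strong pipelines and the parameter values are our assumptions, not the paper's exact settings.

```python
from torchvision import transforms

# Weak augmentation (labeled data): horizontal flip only (assumed).
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Strong augmentation (unlabeled data): flip + color jitter + 3x3 Gaussian blur (assumed).
strong_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.GaussianBlur(kernel_size=3),
    transforms.ToTensor(),
])
```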
Table 2. Comparison of Metrics between Our Proposed Method and Other Semi-Supervised Methods under Different Ratios of Labeled Data.
Ratio of Labeled Data | Method | Dice (%) | DR (%) | IoU (%) | Pre (%)
5% | MT | 65.31 | 58.23 | 48.55 | 54.86
5% | DCT | 61.44 | 60.37 | 48.13 | 52.32
5% | UAMT | 66.92 | 57.38 | 53.19 | 62.14
5% | CPS | 64.87 | 59.45 | 50.94 | 58.26
5% | CPS-RAUnet++ | 65.31 | 67.15 | 57.43 | 63.59
10% | MT | 64.59 | 64.04 | 50.11 | 57.51
10% | DCT | 64.77 | 61.78 | 51.12 | 56.71
10% | UAMT | 68.91 | 62.78 | 54.69 | 62.12
10% | CPS | 68.35 | 61.65 | 53.84 | 61.38
10% | CPS-RAUnet++ | 72.86 | 69.86 | 59.78 | 66.95
20% | MT | 67.91 | 64.86 | 52.35 | 58.93
20% | DCT | 65.49 | 64.12 | 51.88 | 57.87
20% | UAMT | 70.56 | 63.52 | 56.23 | 63.16
20% | CPS | 70.83 | 67.08 | 55.61 | 63.01
20% | CPS-RAUnet++ | 76.83 | 71.78 | 60.17 | 74.51
30% | MT | 67.93 | 66.60 | 53.91 | 60.78
30% | DCT | 70.59 | 64.67 | 55.37 | 66.13
30% | UAMT | 71.64 | 64.21 | 56.85 | 65.81
30% | CPS | 74.03 | 68.75 | 59.45 | 66.89
30% | CPS-RAUnet++ | 79.19 | 78.67 | 69.01 | 80.28
Table 3. Results of the ablation study for different combinations.
Method | Dice (%) | Dice Deviation with Ours (%) | DR (%) | DR Deviation with Ours (%) | IoU (%) | IoU Deviation with Ours (%) | Pre (%) | Pre Deviation with Ours (%)
U-Net | 65.25 | 13.94 | 64.11 | 14.56 | 53.81 | 15.20 | 63.26 | 17.02
Unet++ | 72.03 | 7.16 | 70.59 | 8.08 | 60.30 | 8.71 | 70.37 | 9.91
Attention-Unet++ | 71.68 | 7.51 | 69.35 | 9.32 | 62.11 | 6.90 | 73.34 | 6.94
RAUnet++ | 74.23 | 4.96 | 74.11 | 4.56 | 64.98 | 4.03 | 75.51 | 4.77
CPS-RAUnet++ | 79.19 | / | 78.67 | / | 69.01 | / | 80.28 | /
"/" represents no improvement.
Table 4. Effect of different weight combinations of supervised loss functions on segmentation performance during the consistency learning phase.
Loss | Dice (%) | DR (%) | IoU (%) | Pre (%)
0.5 × L_CE + 0.5 × L_Dice | 77.06 | 77.54 | 69.54 | 78.26
0.5 × L_CE + L_Dice | 79.19 | 78.67 | 69.01 | 80.28
L_CE + 0.5 × L_Dice | 79.11 | 75.25 | 68.22 | 79.19
1.5 × L_CE + 0.5 × L_Dice | 78.53 | 76.22 | 67.89 | 79.22
0.5 × L_CE + 1.5 × L_Dice | 78.55 | 77.17 | 67.85 | 78.45
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
