Abstract
In response to the problems of inaccurate feature alignment, loss of source domain information, imbalanced sample distribution, and biased class decision boundaries in traditional unsupervised domain adaptation methods, this paper proposes CDE-Net, an unsupervised domain adaptation method built on a class-decision-boundary-guided dynamically expandable network. Specifically, our method dynamically expands an autoencoder-based network structure, which preserves source domain feature information while gradually adapting to the target domain data distribution and learning useful features from the target domain. Meanwhile, by minimizing a clustering loss and a conditional entropy loss, CDE-Net explores the intrinsic structure of the data and pushes class decision boundaries away from dense data regions. We validate our method experimentally on three medical imaging tasks (chest X-ray, intracranial hemorrhage, and mammography) and achieve an average AUC improvement of 25.8% or more over non-transfer methods. In addition, we compare our method with previous unsupervised domain adaptation methods, and the experimental results show that it achieves better classification accuracy and generalization performance.
1 Introduction
In recent years, deep learning has made tremendous progress in fields such as image recognition, speech recognition, and natural language processing. Supervised learning is the most popular approach for solving various target tasks, but it requires not only a large amount of labeled data but also the assumption that the training and test sets are independently and identically distributed [1]. In practical applications, the diversity of data sources and the difficulty of obtaining annotations mean that the generalization ability of models still needs further improvement [2]. Unsupervised domain adaptation is a technique that uses unlabeled data to improve model generalization and adaptability [3, 4]; its advantages are especially prominent in practical scenarios where sample distributions differ substantially and data labeling is costly.
In research on unsupervised domain adaptation for medical images, several relatively mature methods and models have emerged, based for example on batch normalization, image registration, and adversarial learning. Nibali et al. [5] studied the classification of malignant pulmonary nodules with deep residual networks. They extracted rich hierarchical features from chest images through batch normalization and ResNet residual learning, using weights pre-trained on a large-scale image classification dataset as the starting point for learning the target task. However, medical imaging data usually have a highly complex structure and great diversity, covering different organs, lesion types, and scanning equipment. Pre-trained weights that rely solely on large image classification datasets may not suit a specific malignancy classification task and require further fine-tuning to bridge the knowledge gap caused by domain shift. Adversarial learning is more widely used in unsupervised domain adaptation of medical images. Zhang et al. [6] proposed a single-stage framework for automatically learning unlabeled chest X-ray parsing from labeled CT scans: the model is first trained on chest X-rays, and a task-driven generative adversarial network architecture is then introduced to parse unseen real chest X-rays simultaneously. Yang et al. [7] proposed an unsupervised domain adaptation method with an adversarial learning framework, utilizing lung texture patterns learned from high-resolution whole-lung scans to mark emphysema areas on cardiac CT scans. The method trains a convolutional neural network to classify lung textures on synthetic images on one hand, and to distinguish real images from synthetic images through adversarial learning on the other; the image features derived from adversarial training preserve the labeling accuracy of the synthetic scans. Mahapatra et al. [8] developed a deep-learning-based method for cross-modal image registration, using convolutional autoencoders to learn latent feature representations of images and then using generators to synthesize registered images, achieving cross-domain knowledge transfer. Loey et al. [9] proposed a generative adversarial network with deep transfer learning for COVID-19 detection in chest X-rays. They collected images of possible COVID-19 cases and used a GAN to generate additional images, helping to detect the virus with the highest possible accuracy from the available chest X-rays. Mahapatra et al. [10] proposed an end-to-end Graph Convolutional Adversarial Network (GCAN) that learns cross-domain invariant semantic and structural features to achieve better performance under distribution shift. The method uses switched autoencoders for feature disentanglement to obtain textural and structural features, which are used for graph construction and for defining the generator loss. Tang et al. [11] proposed TUNA-Net, a task-driven, discriminatively trained, cycle-consistent unsupervised generative adversarial network. It preserves low-level details, high-level semantic information, and mid-level feature representations during image-to-image translation, which benefits the target disease identification task. TUNA-Net can transform labeled adult chest X-rays in the source domain so that they look as if they were drawn from the unlabeled pediatric chest X-rays of the target domain, while preserving disease semantics. However, a major drawback of adversarial methods is that the training process is unstable.
The game between the generator and the discriminator can destabilize training, leading to problems such as mode oscillation and vanishing gradients. Moreover, medical images may suffer from quality issues such as noise, artifacts, and blur. Once the generator falls into mode collapse and can only produce a limited number of patterns without covering the whole data distribution, it cannot generate diverse, high-quality samples, which in turn distorts the evaluation of model performance. Wu et al. [12] proposed an architecture with a module shared across all tasks and a separate output module for each task, improving multi-task training and domain adaptation by aligning the embedding layers of multiple tasks. However, medical image processing tasks are highly diverse, different tasks may have different representation needs and objectives, and a shared module may not fully satisfy every task. When tasks are dissimilar or conflict strongly, training one task can interfere with the performance of the others.
The above methods use unlabeled data to improve the generalization ability and adaptability of models across different fields and scenarios, addressing the problems caused by diverse data sources and the difficulty of obtaining labels. They nevertheless face challenges and limitations, such as the inapplicability of models pre-trained on natural images, the uncertain quality of generated images, and the instability of adversarial learning. Research on unsupervised domain adaptation for medical imaging therefore retains important theoretical and practical value.
This paper proposes a novel unsupervised domain adaptation method based on class decision boundaries and a dynamically expandable network to further enhance the model's generalization ability and adaptability. Experimental validation is conducted on three medical imaging tasks, namely chest X-ray, intracranial hemorrhage, and mammography, to demonstrate the effectiveness and applicability of the proposed method. Our main contributions can be summarized as follows. (1) Taking into account the diversity of medical imaging data sources, we propose a new unsupervised domain adaptation method, CDE-Net, which retains and exploits source domain feature information by dynamically expanding an autoencoder-based network structure while gradually adapting to the data distribution of the target domain, overcoming the inapplicability of natural-image pre-trained models for processing medical images. (2) By minimizing a clustering loss and a conditional entropy loss, we explore the intrinsic structure of the data and push class decision boundaries away from densely populated regions, avoiding the unstable training process of adversarial learning and further enhancing the model's generalization ability and adaptability. (3) Our research provides a new perspective in the field of unsupervised learning and is of theoretical and practical significance for advancing and applying unsupervised domain adaptation techniques in medical image analysis. The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 elaborates on the details of the proposed CDE-Net method. Section 4 presents the experimental results and analysis. Section 5 discusses limitations and future directions, and Sect. 6 concludes the paper.
2 Related work
Unsupervised domain adaptation is an emerging machine learning technique that aims to address the issue of poor performance when a model trained in one domain is tested in another domain. Over the past few years, this field has gained significant interest from researchers and has witnessed the development of various novel techniques and methods.
Statistical divergence alignment methods focus on learning domain-invariant feature representations by selecting appropriate divergence metrics to minimize domain discrepancies in the latent feature space. Commonly used metrics include Maximum Mean Discrepancy (MMD) [13], Contrastive Domain Discrepancy (CDD) [14], Wasserstein distance [15], among others. Kang et al. [14] proposed CDD, which incorporates class labels into MMD by estimating the labels of the target domain through alternating clustering, thus achieving alignment of class-conditional distributions. Ge et al. [16] used the Wasserstein distance as a measure of distribution divergence and incorporated risk-aware inter-class correlations into the training framework by configuring the distance matrix of the Discrete Optimal Transport (DOT) training framework. This method can better capture the similarity between the source and target domains, but adjusting the size of the distance matrix and the network structure requires some experience and expertise, and the model performance may be limited when the dataset is small.
With the advancement of Generative Adversarial Networks (GANs), adversarial training has been widely employed to achieve domain-invariant feature extraction. Naik et al. [17] introduced domain invariance through an adversarial domain adaptation framework and constructed representations for triggering entity recognition. Du et al. [18] proposed Dual Adversarial Domain Adaptation (DADA), which includes two joint discriminators that enable all classes from the source and target domains to confront each other and backpropagate into the feature extractor. The two GANs run in parallel, sharing weights at the initial layers of the generator and the last layer of the discriminator, capturing high-level features from the discriminator and high-level semantics from the generator, which helps the GAN understand the joint distribution of the domain. Rangwani et al. [19] introduced Smooth Domain Adversarial Training, arguing that losses for specific tasks should be minimized smoothly to better adapt to the target domain. However, adversarial training has its limitations. On one hand, the competition between the generator and discriminator can lead to training instability. On the other hand, the domain discriminator in adversarial learning may learn some domain-irrelevant features, rendering the adversarial learning ineffective.
Batch normalization layers enable faster training, smoother optimization, and more stable convergence thanks to their insensitivity to initialization. Early research suggested that low-order statistics such as the mean and variance contain domain-specific information. For example, Li et al. [20] modulated BN statistics from the source domain to the target domain to achieve unsupervised domain adaptation (UDA): after training, all learned parameters and weights except those of the BN layers are frozen, and the BN layers are recalibrated on the target domain to perform the specified task. However, low-order batch statistics are specific to particular domains, and simply forcing the means and variances to match between the source and target domains may cause the network to lose expressive power, since the feature representations of the two domains differ [21]. Liu et al. [22, 23] developed a novel batch normalization statistics adaptation framework for UDA segmentation. They gradually adapt domain-specific low-order BN statistics with an exponential moving average strategy while explicitly enhancing the consistency of domain-shared high-order BN statistics through the optimization objective. Although batch normalization methods can accelerate the convergence of neural networks, BN statistics are computed from the training data distribution; when the source and target distributions differ substantially, the effectiveness of adaptation may diminish. Lv et al. [24] proposed a transferable semantic visual relationship method for transductive zero-shot learning, recasting image recognition as predicting similarity/dissimilarity labels for semantic visual fusions consisting of class attributes and visual features. Under this transformation the source and target domains share the same label space, so domain differences can be quantified. The method adjusts the distribution of semantic visual fusions by merging two batch normalization units at each layer to achieve feature alignment. However, because each mini-batch contains few similar semantic visual fusions, information may be lost overall, and the model may not fully exploit the specific information of each domain, hurting performance. Moreover, methods that rely on domain-specific batch normalization are limited by domain differences and cannot fully adapt to feature distribution changes in other domains. Another zero-shot-learning-based domain adaptation study is the three-way semantically consistent embedding proposed by Zhang et al. [25], which learns domain-independent classification prototypes from the semantic embeddings of class labels. Source domain features are tied to the prototypes through their supervision information; to maintain consistency, a mutual information maximization mechanism pushes target domain features and prototypes closer to each other. This method requires a semantic embedding bridge between seen and unseen classes, that is, a semantic embedding corresponding to each known class label. This may limit its applicability in real-world settings, as obtaining accurate and complete semantic embeddings can be difficult.
Unsupervised domain adaptation based on self-training is a round-based alternating training approach, typically involving two steps: first, pseudo-labels are created in the target domain; second, the network is retrained with these pseudo-labels and the target domain data. Mei et al. [26] proposed a self-training framework for semantic segmentation tasks, employing pseudo-label generation strategies and region-guided regularization to smooth pseudo-labeled regions and sharpen non-pseudo-labeled regions. The effectiveness of self-training relies on the quality and distribution of the unlabeled data; if the selected unlabeled data are not diverse enough or contain certain biases, the performance of self-training may suffer. You et al. [27] presented a domain adaptation framework that includes positive learning and negative learning. In positive learning, a balanced set of pseudo-labeled pixels is selected using an intra-class threshold, while negative learning uses heuristically complementary labels to determine which class a pixel does not belong to. However, a significant issue with pseudo-labels is their high noise, so two important directions for improving pseudo-label quality are pseudo-label filtering and learning with noisy labels. Chu et al. [28] proposed a novel De-noised Maximum Classifier Discrepancy (D-MCD) method, which minimizes the distribution mismatch between selected pseudo-labeled samples and the remaining target domain samples to mitigate sample selection bias. Nevertheless, during self-training the model may make erroneous predictions that are then propagated as labels into the next round of training, further degrading performance.
Traditional convolutional neural network models tend to capture domain-specific local information, such as background details, which may cause the model to overlook the truly useful target information. Xu et al. [29] found that the cross-attention mechanism in Transformers is robust to noisy inputs and can achieve better feature alignment. Therefore, they designed a bidirectional center-aware marker algorithm and proposed a weight-sharing three-branch Transformer framework, applying self-attention and cross-attention for source/target domain feature learning and source-target domain alignment, respectively. Bohdal et al. [30] partitioned target domain images into support images and query images under the unsupervised setting and improved the original patch-to-patch operation to image-to-image to capture holistic representations and reduce computational burden. Kothandaraman et al. [31] further processed features from both spatial and channel perspectives, conducted feature distillation from pre-trained networks to target networks, and supplemented target samples mined based on transferability and uncertainty criteria to enrich contextual semantics. However, for some complex domains and tasks, self-attention mechanisms might not capture all critical features as they primarily focus on local relationships within input sequences while neglecting global relationships.
Unsupervised domain adaptation based on self-supervised learning relies solely on unlabeled data to set learning tasks, such as context prediction or image rotation, to compute target representations without supervision [32]. Sun et al. [33] used three self-supervised tasks, rotation, flipping, and quadrant prediction, to align learned representations between the source and target domains. However, learning high-quality feature representations remains a challenge, as it requires comprehensive consideration of factors such as feature abstractness, generalizability, and interpretability. Kim et al. [34] proposed a novel cross-domain self-supervised learning method to capture intra-domain visual similarity in a domain-adaptive manner. The learned features are not only domain invariant but also class discriminative. Additionally, research has indicated that target domain samples from the same category may be tightly clustered. Kumar et al. [35] proposed a co-regularized domain alignment method, constructing multiple different feature spaces and aligning the distributions of the source and target domains within each feature space while encouraging these alignments to be consistent in predicting class labels on unlabeled target samples. However, the choice of target boundaries may differ in different experiments, and it may even be challenging to find suitable boundaries. This can lead to fluctuating model performance across different experiments, making it difficult to ensure stability and reproducibility.
In comparison to the aforementioned studies, our proposed CDE-Net preserves and utilizes the source domain feature information through a dynamically expandable network based on autoencoders. This approach avoids the model forgetting important features of the source domain during the transfer learning process. Additionally, CDE-Net learns the intrinsic feature representation of the data by minimizing the clustering loss and conditional entropy loss, pushing class decision boundaries away from dense data regions and further enhancing the model's generalization capability.
3 Method
In this paper, we propose a class decision boundary-based dynamically expandable network for unsupervised domain adaptation in medical image anomaly recognition, called CDE-Net, to overcome the limitations of previous methods. As shown in Fig. 1, traditional unsupervised domain adaptation methods often overlook the issue of the target domain's decision boundary. When the decision boundary spans high-density data regions, it can lead to overfitting of the source domain data, resulting in poor performance on the target domain. Moreover, some data points may be misclassified, which could be outliers, noise, or genuine samples. Misclassifying these data points can adversely affect the accuracy of the model. To address these issues, we consider the clustering assumption that data samples within the same cluster share the same class label and are densely distributed in a cluster. Therefore, to determine the class decision boundary in the target domain, CDE-Net first performs clustering on the source domain data to obtain cluster centers and class labels. The clustering results are then applied to the target domain. Next, by learning the low-density regions, CDE-Net determines the class decision boundary in the target domain, finding the optimal decision boundary away from dense data regions. Our approach effectively addresses the limitations of traditional unsupervised domain adaptation methods and improves the model's performance on the target domain.
Figure 2 illustrates the overall framework of CDE-Net. Firstly, medical images are obtained for the experiment, and data preprocessing is performed to enhance data quality and facilitate subsequent steps. Multi-layer convolution is then employed to learn and extract discriminative features from the preprocessed images, with each layer's convolutional kernels capturing features of different scales and semantics. Next, the convolutional neural network is trained on the source domain data using the cross-entropy loss function, which measures the difference between predicted class probabilities and true class labels, encouraging the network to learn a set of common features. To further improve CDE-Net's performance on the target domain, two additional loss functions are introduced: clustering loss and conditional entropy loss. The clustering loss encourages the network to learn features that cluster together in the target domain. Through clustering analysis, the target domain data are divided into different categories, enhancing the network's discriminative ability for different classes in the target domain. On the other hand, the conditional entropy loss aims to produce highly confident and informative outputs, pushing the decision boundary away from data-dense regions. This prevents overfitting to the source domain data and improves the model's generalization to unseen samples in the target domain. Finally, the network undergoes iterative optimization and parameter updates using the constructed total objective loss function, and it performs the task of classifying medical images in the prediction phase.
3.1 Dynamic expansion network based on autoencoder
In unsupervised domain adaptation, there typically exists a distributional difference between the source and target domains. This means that when we introduce a new target domain to train the model, the model tends to forget the features learned from the source domain and may even introduce noise, thus affecting its generalization ability. To address this issue, we design a dynamic expansion network to enhance the model's representational capacity while preserving the source domain information and adapting to the new target domain. Specifically, we incorporate an autoencoder module into the model, which compresses the input data into a latent representation and attempts to reconstruct the input data from the latent representation. By dynamically learning new features and integrating them with the existing features, we update the model by fusing the feature representations of the source and target domains, thereby enhancing the model's representational capacity. Through this dynamic expansion network, the model can effectively adapt to the new target domain without compromising the source domain feature information, reducing the risk of introducing noise and improving the model's performance and generalization ability.
In the dynamic expansion network, we utilize an autoencoder to learn the hidden representation of data, capturing abstract features. Specifically, as illustrated in Fig. 3, we begin by applying standardized preprocessing to the input data, enhancing the robustness of the hidden representations to variations in input. Standardization transforms the data's mean to 0 and variance to 1, enabling the model to handle different input data effectively. Next, we apply the sigmoid function to map the standardized input data to the range [0, 1], ensuring that the network's output results are within a practical range. The core part of the dynamic expansion network involves introducing an encoder and a decoder to implement the autoencoder functionality. The encoder compresses the input data into a low-dimensional representation through hidden layers. This compression effectively reduces the input data's dimension, thereby decreasing the number of model parameters and computational complexity. In the encoder, we utilize a multi-layer convolutional neural network to learn the spatial structure and hierarchical features of the input data while retaining crucial features from the source domain data. The output of the encoder is then passed through a ReLU activation function, which provides non-linear transformation capabilities, making it highly efficient for computation and optimization. Additionally, ReLU introduces sparsity in the hidden units, contributing to better generalization ability. This design further enhances the network model's representational capacity and performance. Next comes the decoding process, achieved using fully connected layers to restore the low-dimensional representation to its original data format. This process can be seen as the reverse operation of the autoencoder, transforming the low-dimensional representation back to the dimensions of the original data. To maintain data within a reasonable range, we apply another Sigmoid function after the decoder. Apart from the autoencoder module, we also utilize multiple layers of convolution and pooling operations to extract high-dimensional features from the images. Convolution operations capture local relationships within the input data, while pooling operations downsample the data, reducing the number of parameters and computational complexity. The fusion of features extracted by the multiple convolution layers and the low-dimensional representation learned by the autoencoder enhances the model's performance on the target domain. Finally, we perform joint training on the features extracted by the multiple convolution layers and the softmax classifier. This process maps the model-learned features onto probability distributions for different categories, enabling classification and recognition of the input data. Consequently, the model optimizes both feature representation and classification performance simultaneously.
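To make this pipeline concrete, the following is a minimal PyTorch sketch of the autoencoder branch described above: standardization followed by a sigmoid squashing of the input, a convolutional encoder with ReLU activations producing a low-dimensional code, and a fully connected decoder followed by another sigmoid. The layer sizes, input resolution, and module names here are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal sketch of the autoencoder branch (sizes are illustrative)."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 100):
        super().__init__()
        # Encoder: convolutions compress the preprocessed input
        # into a low-dimensional latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, latent_dim),
            nn.ReLU(inplace=True),  # sparse, non-linear latent code
        )
        # Decoder: fully connected layers restore the original dimensionality.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, in_channels * 224 * 224),
            nn.Sigmoid(),  # keep reconstructions in [0, 1]
        )
        self.in_channels = in_channels

    def forward(self, x: torch.Tensor):
        # Standardize to zero mean / unit variance, then map to [0, 1].
        x = torch.sigmoid((x - x.mean()) / (x.std() + 1e-8))
        z = self.encoder(x)
        recon = self.decoder(z).view(-1, self.in_channels, 224, 224)
        return z, recon
```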
Utilizing the hidden layers of the encoder to store the feature representations learned from the source domain can effectively prevent the forgetting of source domain features and provide valuable knowledge for adapting to the new target domain. Specifically, let the source domain feature learning task be denoted \({Q}_{S}\). When the model encounters a new task \({Q}_{T}\), namely feature learning in the target domain, we train an autoencoder for this task and compute the correlation between \({Q}_{T}\) and \({Q}_{S}\) as given in Eq. (1):
where \({R}_{S}\) and \({R}_{T}\) represent the average reconstruction errors of the autoencoder on the source domain data and target domain data, respectively (i.e., the mean squared error between the model's output and the original input), defined as shown in Eq. (2):

$$R = \frac{1}{n}\sum_{i=1}^{n}\left({Y}_{i} - {\hat{Y}}_{i}\right)^{2} \quad (2)$$

where \(n\) represents the number of input data points, \({Y}_{i}\) represents the ith feature value of the input data, and \({\hat{Y}}_{i}\) is the corresponding reconstruction of \({Y}_{i}\).
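A sketch of how the average reconstruction error of Eq. (2) might be computed over a data loader is given below, reusing the `ConvAutoencoder` sketch from above. Since Eq. (1) is not reproduced in the text, the relatedness formula in the closing comment is only one plausible choice and should be treated as an assumption.

```python
import torch

def avg_reconstruction_error(model, loader, device="cpu") -> float:
    """Average per-sample MSE between the preprocessed input and its
    reconstruction, following Eq. (2)."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, *_ in loader:
            x = x.to(device)
            # Same standardize-then-sigmoid preprocessing as the forward
            # pass of the autoencoder sketch (an assumption here).
            target = torch.sigmoid((x - x.mean()) / (x.std() + 1e-8))
            _, recon = model(x)
            total += torch.mean((target - recon) ** 2, dim=(1, 2, 3)).sum().item()
            n += x.size(0)
    return total / n

# Hypothetical task relatedness from the two errors (Eq. (1) is not shown
# in the text); one common choice compares relative reconstruction error:
#   relatedness = 1.0 - (r_target - r_source) / r_source
```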
By calculating the average reconstruction error between the source domain and target domain data, and using the correlation between their features, we can control the importance of each input sample in the target domain, thereby enhancing the expressiveness and generalization performance of the autoencoder. Specifically, by computing the correlation coefficients, we select the most similar feature representations from the source domain to the target domain and use them as the prior model for learning the target domain data. Then, the learned target domain feature representation is added to the hidden layer of the autoencoder, effectively avoiding the forgetting of source domain features and providing useful knowledge for adapting to the new target domain. To further adapt to the target domain data distribution, we fine-tune the target domain data using the expanded autoencoder. Firstly, the target domain data is passed through the encoder part of the autoencoder to obtain the corresponding low-dimensional representation. Then, a new network layer is added, connected to the hidden layer of the autoencoder. This combines the source domain features with the target domain features, enriching the feature information of the network.
We update the weights and biases of the new network layer using the backpropagation algorithm, so that the new network layer can better capture the different features of the target domain and exhibit better expressiveness for different input samples. This process continues until the network adapts to the target domain data distribution and achieves optimal performance. CDE-Net dynamically expands the structure of the autoencoder, allowing it to gradually adapt to the target domain data distribution while preserving the knowledge from the source domain, and exhibiting better robustness against data perturbations such as noise. In CDE-Net, we also use a method based on class decision boundaries to avoid overfitting and misclassification on the target domain.
3.2 Low-density decision boundary
The source domain dataset is defined as \({D}_{S}\), with samples following the probability distribution \({X}_{S}\); similarly, the target domain dataset is \({D}_{T}\), with samples following the probability distribution \({X}_{T}\). First, the source domain data are divided into different categories according to their labels, and one point from each category is selected as an initial clustering center. Once the initial cluster centers are determined, an iterative clustering process begins: in each iteration, the shortest distance between each target domain data point and the currently existing cluster centers is calculated, and the target domain points are assigned to the corresponding clusters. By minimizing the clustering loss function, we ensure that the distance between the data points within each cluster and the cluster centers is minimized, thereby adapting to the target domain's data distribution. Let the data points in the target domain be \({p}_{i}\) \((i=1,2,...,k)\) and the initial cluster centers be \({c}_{j}\) \((j=1,2,...,n)\). The objective function of the clustering loss can be represented as

$$L_{c} = \sum_{i=1}^{k} \min_{j \in \{1,...,n\}} \left\| {p}_{i} - {c}_{j} \right\|^{2} \quad (3)$$
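A minimal sketch of the clustering loss of Eq. (3) is shown below; it averages rather than sums over points so that the loss scale is independent of batch size, which is an implementation choice on our part.

```python
import torch

def clustering_loss(features: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Eq. (3): pull each target-domain point toward its nearest cluster center.

    features: (k, d) target-domain feature vectors p_i
    centers:  (n, d) cluster centers c_j
    """
    d2 = torch.cdist(features, centers, p=2) ** 2  # (k, n) squared distances
    return d2.min(dim=1).values.mean()             # nearest-center distance per point
```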
The smaller the value of the clustering loss function, the closer the data points within each cluster lie to their cluster centers. We update the cluster centers to ensure the effectiveness of the clustering. After clustering is completed, to further adapt to the target domain's data distribution, we minimize the conditional entropy of the target domain distribution so that the class decision boundaries are positioned in regions where data are sparse. The conditional entropy is defined as follows:

$$L_{e}(\delta, {X}_{T}) = -\,\mathbb{E}_{x \sim {X}_{T}}\left[ {f}_{\delta }(x)^{\top } \ln {f}_{\delta }(x) \right] \quad (4)$$
where \({f}_{\delta }\) represents a classifier parameterized by \(\delta\), and \(X\) represents the input. We select the gradient step size \(\Delta \delta\) according to Eqs. (5) and (6) in order to minimize \({L}_{e}(\delta ,{X}_{T})\).
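The conditional entropy term of Eq. (4) can be implemented directly from the classifier's logits; below is a short sketch (the gradient-step selection of Eqs. (5) and (6) is not reproduced here).

```python
import torch
import torch.nn.functional as F

def conditional_entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Eq. (4): entropy of the predictive distribution f_delta(x), averaged
    over unlabeled target samples. Minimizing it sharpens predictions and
    pushes the decision boundary into low-density regions."""
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)  # numerically stable log f_delta(x)
    return -(p * log_p).sum(dim=1).mean()
```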
By minimizing the conditional entropy, the classifier can learn more patterns and information from the labeled data, increasing its confidence when classifying unlabeled target domain data. This improves the accuracy and generalization capability of the classifier even in the absence of labels and keeps its decision boundary away from dense regions of the data. Additionally, to prevent the classifier from changing its predictions arbitrarily near training data points, we impose a local Lipschitz constraint on the classifier, ensuring its stability and reliability. The cross-entropy loss obtained from training on the source domain is

$$L_{ce}(\delta; {X}_{S}, {Y}_{S}) = -\,\mathbb{E}_{(x,Y) \sim {X}_{S}}\left[ Y^{\top } \ln {f}_{\delta }(x) \right] \quad (7)$$
where \(Y\) denotes the one-hot label. Finally, the total objective function combines the cross-entropy loss from the source domain with the clustering loss and the conditional entropy loss from the target domain. It can be expressed as

$$L = L_{ce} + \lambda L_{c} + \beta L_{e} \quad (8)$$
where \(\lambda\) and \(\beta\) are weight hyperparameters that balance the clustering loss and the conditional entropy loss, respectively. The selection of these weights is crucial, as they directly affect the performance and generalization ability of the model. In general, the balance between the two losses should be adjusted to the specific scenario: if the target domain distribution differs substantially from the source domain distribution, the weight of the clustering loss should be relatively higher to adapt to the target domain; if the two distributions are similar, the weight of the conditional entropy loss should be relatively higher to better exploit the unlabeled target data. By adjusting these two hyperparameters, the relationship between the loss functions can be balanced, optimizing the overall performance and generalization ability of the model. Their selection is analyzed further in the ablation experiments.
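Putting the three terms together, one optimization step over the total objective of Eq. (8) might look like the sketch below, with \(\lambda\) = 0.1 and \(\beta\) = 0.02 as defaults (see Sect. 4.3). The assumption that `model` returns both features and logits, and the helper names `clustering_loss` and `conditional_entropy_loss`, carry over from the sketches above.

```python
import torch.nn.functional as F

def train_step(model, optimizer, src_x, src_y, tgt_x, centers, lam=0.1, beta=0.02):
    """One update on the total objective L = L_ce + lam * L_c + beta * L_e (Eq. (8))."""
    model.train()
    optimizer.zero_grad()
    _, src_logits = model(src_x)                      # labeled source batch
    tgt_feats, tgt_logits = model(tgt_x)              # unlabeled target batch
    loss_ce = F.cross_entropy(src_logits, src_y)      # source cross-entropy, Eq. (7)
    loss_clu = clustering_loss(tgt_feats, centers)    # clustering loss, Eq. (3)
    loss_ent = conditional_entropy_loss(tgt_logits)   # conditional entropy, Eq. (4)
    loss = loss_ce + lam * loss_clu + beta * loss_ent
    loss.backward()
    optimizer.step()
    return loss.item()
```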
Finally, the model parameters are updated by minimizing the total objective function until convergence. Through the aforementioned approach, we can simultaneously consider both the source domain data and the target domain data, achieving unsupervised domain adaptation from the source domain to the target domain.
4 Experiments and results
4.1 Dataset and evaluation indicators
A total of six datasets from medical imaging were used in this study, namely, CheXpert [36] and Chest X-Ray14 [37] for chest radiography, CQ500 and RSNA for intracranial hemorrhage, and VinDr-Mammo [38] and CMMD [39] for mammography. The CheXpert dataset consists of 224,316 chest radiography images from 65,240 patients. This dataset includes uncertain medical labels and reference standard evaluation sets annotated by radiologists, which can be used to predict the probabilities of 14 different observations from multi-view chest radiography images. Among the images, 179,452 were used for training, 22,432 for validation, and 22,432 for testing. The Chest X-Ray14 dataset contains 112,120 frontal chest radiography images from 30,805 patients, with radiology reports indicating 14 common diseases. Among the images, 89,696 were used for training, 11,212 for validation, and 11,212 for testing. For the classification study, we selected five common pathologies shared by both chest radiography datasets: Atelectasis, Cardiomegaly, Effusion, Consolidation, and Edema.
The CQ500 dataset is a publicly available dataset provided by CARING in New Delhi, India. It contains 491 scans with a total of 193,317 slice data. Three radiologists with 8, 12, and 20 years of experience in interpreting cranial CT scans annotated the CT scans in the CQ500 dataset for hemorrhage. Slices that did not match the bounding box labels were filtered out, and a subset of 17,836 samples was selected for training, 2,230 samples for validation, and 2,230 samples for testing, considering the proportion of each subtype of hemorrhage in the dataset. The RSNA dataset is the second stage training data provided on the Kaggle platform. It includes 750,000 slices of cranial CT scans and corresponding pre-labeled six-class labels by radiology experts. A subset of 80,000 samples was extracted for training, 10,000 samples for validation, and 10,000 samples for testing, considering the proportion of each subtype of hemorrhage. For both intracranial hemorrhage datasets, five common pathologies were selected for classification research: Epidural, Intraparenchymal, Intraventricular, Subarachnoid, and Subdural.
The VinDr-Mammo dataset, referred to as VinDr, consists of 5000 breast X-ray examinations, each with four standard views, totaling 20,000 images. This dataset provides both breast-level assessment and extensive lesion-level annotations. Among the images, 16,000 were used for training, 2000 for validation, and 2000 for testing. The CMMD dataset contains 3728 breast X-ray images from 1775 patients. Among them, 2984 were used for training, 372 for validation, and 372 for testing. For the two mammography datasets, two common pathologies were selected for classification research: Calcification and Mass.
AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were used as the six performance metrics to assess the classification performance of the models.
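For reference, the six metrics can be computed per pathology label from a confusion matrix; the sketch below uses scikit-learn for the AUC and assumes predictions binarized at a fixed threshold.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> dict:
    """Six evaluation metrics for one binary pathology label."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "Sensitivity": tp / (tp + fn),   # true positive rate
        "Specificity": tn / (tn + fp),   # true negative rate
        "PPV": tp / (tp + fp),           # positive predictive value
        "NPV": tn / (tn + fn),           # negative predictive value
    }
```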
4.2 Data pre-processing
The data preprocessing module consists of two parts: normalization and data augmentation. Firstly, normalization is performed using the mean and standard deviation of the images to make the data distribution conform to the standard normal distribution. There are three types of data augmentation techniques. The first one is random scaling, where images are randomly scaled by a certain proportion, and then a random region is cropped from the scaled image. The second one is horizontal flipping, which is applied to images with a 50% probability. The third one is random rotation, where images are randomly rotated by an angle of up to ± 25 degrees. During the training process, these augmentation techniques are applied randomly to different images to improve the model's robustness and generalization ability.
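With torchvision, the preprocessing described above could be sketched as follows; the crop size, scale range, and normalization statistics are illustrative assumptions, since the text does not specify them.

```python
from torchvision import transforms

# Training-time pipeline: random scale-and-crop, 50% horizontal flip,
# rotation of up to +/- 25 degrees, then normalization.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random scaling + random crop
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=25),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.25]),         # per-dataset mean/std in practice
])
```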
4.3 Implementation details
On the chest radiography datasets, we used CheXpert as the source domain and Chest X-Ray14 as the target domain, and compared CDE-Net against three representative unsupervised domain adaptation methods from recent years to demonstrate its advantages.
Experimental environment We implemented the experiments in Python with the PyTorch deep learning framework. Training was conducted on a CentOS workstation equipped with four NVIDIA GTX 1080Ti GPUs. We used ResNet50 as the backbone network of CDE-Net and the Adam optimizer with a first-moment decay (momentum) of 0.9. The initial learning rate was set to 0.0001, with a decay rate of 0.0001. We trained the model for 100 epochs with a batch size of 16 and selected the model with the best performance on the validation set for testing.
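In code, this setup might be configured as follows; mapping the stated momentum of 0.9 to Adam's first-moment coefficient \(\beta_1\) and the decay rate to weight decay are our assumptions, as is the five-way classification head for the shared chest pathologies.

```python
import torch
import torchvision

# ResNet50 backbone with a 5-way head for the shared chest pathologies.
model = torchvision.models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 5)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,                 # initial learning rate
    betas=(0.9, 0.999),      # first-moment decay 0.9 ("momentum")
    weight_decay=1e-4,       # decay rate
)
EPOCHS, BATCH_SIZE = 100, 16
```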
Hyperparameter setting In order to analyze the impact of hyperparameters \(\lambda\) and \(\beta\) on the model's performance, we conducted multiple tests by adjusting their values to study their sensitivity. We tested the model with values ranging from 0.05 to 0.25 for \(\lambda\) and from 0.01 to 0.05 for \(\beta\), analyzing their influence on the classification performance on the target domain test set. The results are shown in Fig. 4.
From Fig. 4a, it can be observed that the highest AUC on the target domain test set is achieved at \(\lambda\) = 0.1; once \(\lambda\) exceeds 0.15, the AUC drops markedly. Similarly, Fig. 4b shows that the best performance on the target domain test set is obtained at \(\beta\) = 0.02. Appropriate values of \(\lambda\) and \(\beta\) thus effectively improve the classification AUC, so we set \(\lambda\) = 0.1 and \(\beta\) = 0.02 as the default values in this study. With these two auxiliary terms properly weighted, the model's generalization performance is enhanced.
4.4 Experimental results and performance analysis
For each epoch, we calculated the AUC value of the model on the validation set. Based on the performance on the validation set, we determined whether the current trained model was the best. In the model evaluation phase, we loaded the parameters of the best model obtained during training. We standardized the test images and performed classification predictions to calculate various evaluation metrics, thereby assessing the model's classification performance.
On chest radiograph images, we compared CDE-Net with several typical unsupervised domain adaptation methods. The first is Co-Regularized Domain Alignment (Co-DA), proposed by Kumar et al. [35]. The second is an unsupervised domain adaptation (UDA) approach using self-supervised auxiliary tasks, proposed by Sun et al. [33]. The third is the De-noised Maximum Classifier Discrepancy (D-MCD) method proposed by Chu et al. [28]. Figure 5 shows the ROC curves of these three methods and ours on the Chest X-Ray14 test set. Table 1 provides a detailed comparison of the classification evaluation metrics between the three unsupervised domain adaptation methods and our method on the CheXpert → Chest X-Ray14 domain adaptation task. The bold font represents the highest value within the same evaluation indicator.
Based on the experimental results, our proposed method exhibits superior performance in unsupervised domain adaptation tasks. Our method outperforms the other three methods in all evaluation metrics, demonstrating significantly better performance. Specifically, our method achieves an AUC of 0.787, accuracy of 0.705, sensitivity of 0.750, specificity of 0.694, positive predictive value of 0.415, and negative predictive value of 0.869. These results are 3.6%, 2.3%, 2.9%, 3.0%, 5.3%, and 1.2% higher than the best results of the other three methods, respectively. These experimental findings indicate that our method better adapts to the feature distribution of the target domain, resulting in improved generalization and adaptability of the model. Specifically, CDE-Net's better performance is reflected in the following aspects: identifying and classifying samples more accurately, reducing the risk of misclassification, and being able to classify samples more reliably in practical applications. It can better capture positive samples, improve the recall rate of positive samples, and reduce the risk of missed diagnosis. Achieving a better balance between precision and recall further emphasizes the advantages of the CDE-Net method in handling differences between different domains and dataset feature information. This is attributed to the unique design of our method, which dynamically expands the network structure based on autoencoders, preserving the source domain's feature information while gradually adapting to the target domain's data distribution and learning useful target domain features. Moreover, by minimizing the clustering loss and conditional entropy loss, CDE-Net can uncover the intrinsic structure of the data and push the class decision boundaries away from dense data regions. This comprehensive training objective enables CDE-Net to better capture the characteristics and distribution of data, thereby improving the performance of unsupervised domain adaptation tasks. To sum up, the comparative experiments thoroughly demonstrate the superiority of CDE-Net in unsupervised domain adaptation tasks. Through its unique network structure and multi-objective training strategy, CDE-Net becomes an effective method to solve unsupervised domain adaptation problems.
During the experiments we also found that the model is weakest at identifying the Consolidation lesion type and strongest at identifying the Cardiomegaly category. Cardiomegaly usually appears as a marked enlargement of the cardiac region on imaging, so its characteristics are relatively easy to identify and clearly distinct from those of other lesion types. Consolidation, in contrast, usually appears as high-density shadows of various shapes with blurred borders, often accompanied by blurring or loss of lung parenchyma. Consolidation may also be irregular in shape and size, obscure the underlying lung markings, and migrate over time. The complexity of these features makes Consolidation relatively difficult to recognize, giving the model a lower identification ability for this category than for other lesion types.
It is also worth noting from Table 1 that the model's positive predictive value on the test set is relatively low, owing to the imbalance in the number of samples across lesion types in the dataset. An imbalanced data distribution challenges both model training and performance evaluation: during training, the model may favor lesion types with more samples while neglecting those with fewer, leading to poor predictions for under-represented lesion types. In the future, new techniques can be explored to overcome the negative impact of imbalanced sample distributions on model prediction, so as to better fit real application scenarios.
4.5 Ablation studies
4.5.1 Domain adaptation method
We selected three representative medical imaging dataset pairs for the ablation experiments: the chest radiography datasets, the intracranial hemorrhage datasets, and the mammography datasets. CheXpert serves as the source domain and Chest X-Ray14 as the target domain for chest radiography; CQ500 as the source and RSNA as the target for intracranial hemorrhage; and VinDr as the source and CMMD as the target for mammography. These datasets cover a wide range of application scenarios and have practical value, so experiments across them allow a more comprehensive evaluation of the versatility and applicability of the CDE-Net method and of its performance on different datasets. Figure 6 compares the performance without domain adaptation against the proposed CDE-Net method, showing their ROC curves on the respective datasets. Table 2 details the classification evaluation indicators of the three experiments. The bold font represents the highest value within the same evaluation indicator.
Firstly, from the overall shape of the ROC curves, it can be observed that the ROC curve of the CDE-Net method performs better and is closer to the ideal state at the top-left corner. On the other hand, the ROC curve without domain adaptation exhibits more fluctuations and instability. This indicates that the CDE-Net method outperforms traditional methods, as it can better handle the feature information and domain differences in the dataset. Secondly, looking at the specific classification evaluation metrics, it can be seen that the CDE-Net method not only has a higher average AUC value compared to the method without domain adaptation but also performs better in the other five classification metrics. This demonstrates that the CDE-Net method shows superior performance across different datasets, effectively handling domain differences and utilizing the feature information of the datasets. This confirms the effectiveness of CDE-Net in addressing unsupervised domain adaptation problems.
4.5.2 Hidden layer size
In the autoencoder, the size of the hidden layer is a crucial parameter that directly affects the performance and generalization ability of the model. We experimented with five different sizes of hidden layers: 20, 60, 100, 200, and 400. Figure 7 shows the comparison of their AUC values on the Chest X-Ray14 test set.
The experimental results show that the model achieves its best performance with a hidden layer size of 100, reaching an AUC of 0.787 on the Chest X-Ray14 test set. With a hidden layer size of 20, the AUC is lowest at 0.768. Increasing the hidden layer size to 200 or 400 improves the AUC over the smallest setting, but yields no meaningful gain over the 100-unit model. Under the dataset and model structure used in this study, a hidden layer size of 100 therefore offers the best performance and generalization ability: a hidden layer that is too small cannot capture sufficient feature information and underfits, while one that is too large risks overfitting and reduced generalization. We therefore recommend a hidden layer size of 100 in the model.
4.5.3 Loss function
In order to better illustrate the contribution of the components of the CDE-Net loss function, we conducted experiments on the combination of cross-entropy loss, clustering loss, and conditional entropy loss. Table 3 lists the experimental results on the Chest X-Ray14 test set under different loss function settings. The bold font represents the highest value within the same evaluation indicator.
The results in the table show that using only the source domain cross-entropy loss yields the lowest AUC on the test set, 0.634. In this case the model focuses mainly on maximizing classification accuracy on the source domain data and cannot account for the deeper structure and conditional distribution of the target domain data, so its generalization to unseen target data is weak. In comparison, combining the cross-entropy loss with the clustering loss raises the AUC on the test set to 0.753, indicating that joining the classification and clustering objectives helps the model exploit the characteristics and structural information of the target domain data and strike a better balance between accuracy and structural understanding. Combining the cross-entropy loss with the conditional entropy loss instead gives a slightly lower AUC of 0.742; although this falls short of the cross-entropy-plus-clustering combination, it still improves performance to a certain extent, as the conditional entropy loss encourages the network to produce confident, informative outputs that provide additional information for classification and prediction. With the complete CDE-Net loss function, the AUC reaches its highest value of 0.787, indicating that the model can better mine the intrinsic structure and distribution of the target domain data and predict sample categories more accurately. This further verifies the effectiveness of CDE-Net and demonstrates the complementary roles of the clustering loss and conditional entropy loss in joint training.
5 Research limitations and future research directions
Although CDE-Net has shown promising performance in unsupervised domain adaptation tasks on medical images, there are still some research limitations that need to be addressed for practical applications. Firstly, CDE-Net still has certain limitations in robustness to noise and abnormal data in the target domain. In real-world applications, the target domain may contain various types of noise and anomalies, such as poor image quality, different shooting angles, organ deformations, etc., which can affect the model's generalization performance. Therefore, future research needs to explore how to improve the model's robustness to better handle complex scenarios.
Additionally, in our experiments, CDE-Net was primarily applied in the field of medical imaging, and its application and performance in other domains need further exploration and validation. Although the dynamic expanding network structure and clustering loss methods in CDE-Net have certain generality, there are still significant differences and challenges across different data domains. Therefore, future research should further explore and validate the applicability and performance of CDE-Net in other domains.
In conclusion, while CDE-Net exhibits promising performance in unsupervised domain adaptation tasks, there are still some research limitations. Future studies should further explore and address these issues to better support practical applications and advance related fields.
6 Conclusion
In this paper, we propose CDE-Net, an unsupervised domain adaptation method that leverages class decision boundaries and a dynamically expandable network. Specifically, our method dynamically expands an autoencoder-based network structure to preserve source domain feature information while gradually adapting to the target domain data distribution and learning useful target domain features. By minimizing a clustering loss and a conditional entropy loss, CDE-Net can uncover the underlying structure of the data and push the class decision boundaries away from densely populated regions, thereby enhancing model generalization and adaptability. We validate CDE-Net on three medical image datasets: a chest X-ray dataset, an intracranial hemorrhage dataset, and a mammography dataset. The experimental results demonstrate the efficiency and applicability of CDE-Net, showing significant improvements in classification accuracy over existing unsupervised domain adaptation methods. Furthermore, CDE-Net offers a new approach to the challenge of obtaining labeled medical image data, providing valuable insights and assistance for research and practice in the field of medical imaging.
Data availability
The datasets generated and/or analysed during the current study are available in the following public repositories: CheXpert (https://stanfordmlgroup.github.io/competitions/chexpert/); Chest X-Ray14 (https://www.kaggle.com/datasets/nih-chest-xrays/data); CQ500 (http://headctstudy.qure.ai/); RSNA (https://www.kaggle.com/competitions/rsna-intracranial-hemorrhage-detection); VinDr-Mammo (https://www.physionet.org/content/vindr-mammo/1.0.0/); CMMD (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230508).
References
Huo, X., Xie, L., Hu, H., et al.: Domain-agnostic prior for transfer semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7075–7085 (2022)
Che, T., Liu, X., Li, S., et al.: Deep verifier networks: verification of deep discriminative models with deep generative models. In: Proceedings of the AAAI Conference on Artificial Intelligence. 35(8), 7002–7010 (2021)
Liu, X., Liu, X., Hu, B., et al.: Subtype-aware unsupervised domain adaptation for medical diagnosis. In: Proceedings of the AAAI Conference on Artificial Intelligence. 35(3), 2189–2197 (2021)
Liu, X., Xing, F., You, J., et al.: Subtype-aware dynamic unsupervised domain adaptation. IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3192315
Nibali, A., He, Z., Wollersheim, D.: Pulmonary nodule classification with deep residual networks. Int. J. Comput. Assist. Radiol. Surg. 12, 1799–1808 (2017)
Zhang, Y., Miao, S., Mansi, T., et al.: Task driven generative modeling for unsupervised domain adaptation: application to x-ray image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, Cham, pp. 599–607 (2018)
Yang, J., Vetterli, T., Balte, P.P., et al.: Unsupervised domain adaption with adversarial learning (UDAA) for emphysema subtyping on cardiac CT scans: the MESA study. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE, pp. 289–293 (2019)
Mahapatra, D., Ge, Z.: Training data independent image registration using generative adversarial networks and domain adaptation. Pattern Recogn. 100, 107109 (2020)
Loey, M., Smarandache, F., Khalifa, N.E.M.: Within the lack of chest COVID-19 X-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry 12(4), 651 (2020)
Mahapatra, D., Tennakoon, R.: GCN-based unsupervised domain adaptation with feature disentanglement for medical image classification (2021)
Tang, Y., Tang, Y., Sandfort, V., et al.: TUNA-Net: task-oriented unsupervised adversarial network for disease recognition in cross-domain chest x-rays. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part VI 22. Springer International Publishing, pp. 431–440 (2019)
Wu, S., Zhang, H.R., Ré, C.: Understanding and improving information transfer in multi-task learning. arXiv preprint arXiv:2005.00944 (2020)
Rozantsev, A., Salzmann, M., Fua, P.: Beyond sharing weights for deep domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 801–814 (2018)
Kang, G., Jiang, L., Yang, Y., et al.: Contrastive adaptation network for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4893–4902 (2019)
Liu, X., Han, Y., Bai, S., et al.: Importance-aware semantic segmentation in self-driving with discrete wasserstein training. In: Proceedings of the AAAI Conference on Artificial Intelligence. 34(07), 11629–11636 (2020)
Ge, Y., Li, S., Li, X., et al.: Embedding semantic hierarchy in discrete optimal transport for risk minimization. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 2835–2839 (2021)
Naik, A., Rosé, C.: Towards open domain event trigger identification using adversarial domain adaptation. arXiv preprint arXiv:2005.11355 (2020)
Du, Y., Tan, Z., Chen, Q., et al.: Dual adversarial domain adaptation. arXiv preprint arXiv:2001.00153 (2020)
Rangwani, H., Aithal, S.K., Mishra, M., et al.: A closer look at smoothness in domain adversarial training. In: International Conference on Machine Learning. PMLR, pp 18378–18399 (2022)
Li, Y., Wang, N., Shi, J., et al.: Adaptive batch normalization for practical domain adaptation. Pattern Recogn. 80, 109–117 (2018)
Zhang, J., Qi, L., Shi, Y., et al.: Generalizable semantic segmentation via model-agnostic learning and target-specific normalization. arXiv preprint arXiv:2003.12296 (2020)
Liu, X., Xing, F., Yang, C., et al.: Adapting off-the-shelf source segmenter for target medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24. Springer International Publishing, pp. 549–559 (2021)
Liu, X., Xing, F., El Fakhri, G., et al.: Memory consistent unsupervised off-the-shelf model adaptation for source-relaxed medical image segmentation. Med. Image Anal. 83, 102641 (2023)
Lv, F., Zhang, J., Yang, G., et al.: Learning cross-domain semantic-visual relationships for transductive zero-shot learning. Pattern Recogn. 141, 109591 (2023)
Zhang, J., Yang, G., Hu, P., et al.: Semantic consistent embedding for domain adaptive zero-shot learning. IEEE Trans. Image Process. (2023). https://doi.org/10.1109/TIP.2023.3293769
Mei, K., Zhu, C., Zou, J., et al.: Instance adaptive self-training for unsupervised domain adaptation. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16. Springer International Publishing, pp. 415–430 (2020)
You, F., Li, J., Zhu, L., et al.: Domain adaptive semantic segmentation without source data. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 3293–3302 (2021)
Chu, T., Liu, Y., Deng, J., et al.: Denoised maximum classifier discrepancy for source-free unsupervised domain adaptation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 36(1), 472–480 (2022)
Xu, T., Chen, W., Wang, P., et al.: CDTrans: cross-domain transformer for unsupervised domain adaptation. arXiv preprint arXiv:2109.06165 (2021)
Bohdal, O., Li, D., Hu, S.X., et al.: Feed-forward source-free latent domain adaptation via cross-attention. arXiv preprint arXiv:2207.07624 (2022)
Kothandaraman, D., Shekhar, S., Sancheti, A., et al.: SALAD: Source-free active label-agnostic domain adaptation for classification, segmentation and detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 382–391 (2023)
Kolesnikov, A., Zhai, X., Beyer, L.: Revisiting self-supervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1920–1929 (2019)
Sun, Y., Tzeng, E., Darrell, T., et al.: Unsupervised domain adaptation through self-supervision. arXiv preprint arXiv:1909.11825 (2019)
Kim, D., Saito, K., Oh, T.H., et al.: Cross-domain self-supervised learning for domain adaptation with few source labels. arXiv preprint arXiv:2003.08264 (2020)
Kumar, A., Sattigeri, P., Wadhawan, K., et al.: Co-regularized alignment for unsupervised domain adaptation. In: Advances in Neural Information Processing Systems 31 (2018)
Irvin, J., Rajpurkar, P., Ko, M., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence. 33(01), 590–597 (2019)
Wang, X., Peng, Y., Lu, L., et al.: ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106 (2017)
Nguyen, H.T., Nguyen, H.Q., Pham, H.H., et al.: VinDr-Mammo: a large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography. Sci. Data 10(1), 277 (2023)
Cai, H., Wang, J., Dan, T., et al.: An online mammography database with biopsy confirmed types. Sci. Data 10(1), 123 (2023)
Funding
Science and Technology Plan Project of Hangzhou, China (No. 2021WJCY258)
Author information
Contributions
Guarantors of integrity of entire study, all authors; study concepts/study design, Bishi He, Yuanjiao Chen; data acquisition, Darong Zhu, Diao Wang; data analysis and interpretation, Bishi He, Yuanjiao Chen, Darong Zhu; manuscript drafting or manuscript revision for important intellectual content, Bishi He, Zhe Xu, Yuanjiao Chen; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; experimental studies, all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Chen, Y., Wang, D., Zhu, D. et al.: Unsupervised domain adaptation of dynamic extension networks based on class decision boundaries. Multimedia Systems 30, 80 (2024). https://doi.org/10.1007/s00530-024-01278-z