4.1. Experimental Dataset
To validate the effectiveness of our approach, we utilized the hyperspectral Chikusei dataset as the source domain dataset and the Indian Pines, Pavia University, and Salinas datasets [43,44] as the target domain datasets. The pseudo-color images and ground-truth land cover maps of these datasets are shown in Figure 3 and Figure 4.
The Chikusei dataset has a spectral wavelength range of 343–1080 nm and a spatial resolution of approximately 2.5 m. It consists of 128 spectral bands and includes 77,592 labeled ground pixels, categorized into 19 distinct land cover classes.
The Indian Pines dataset covers a spectral wavelength range of 400–2500 nm, with a spatial resolution of about 20 m. The image is 145 × 145 pixels with 200 spectral bands and encompasses 16 land cover classes.

The Salinas dataset has a spectral wavelength range of 400–2500 nm and a spatial resolution of approximately 3.7 m. The image is 512 × 217 pixels with 224 spectral bands; however, because water vapor absorption degrades certain bands, only 204 bands are retained. The dataset covers 16 categories of agricultural land cover, including corn, wheat, soybeans, grasslands, and vineyards.

The Pavia University dataset has a spectral wavelength range of 430–860 nm and a spatial resolution of approximately 1.3 m. The dataset originally contains 115 spectral bands; during preprocessing, 13 noisy bands were removed. The scene contains nine land cover classes: asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, bricks, and shadows.
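Discarding degraded bands, as done for the Salinas scene, amounts to selecting the retained band indices along the spectral axis of the data cube. A minimal sketch (the listed band indices are illustrative placeholders, not the actual water-absorption bands):

```python
import numpy as np

# Hypothetical cube shaped (rows, cols, bands); the Salinas scene is
# 512 x 217 with 224 raw bands, 20 of which are discarded to leave 204.
cube = np.zeros((512, 217, 224), dtype=np.float32)

# Placeholder indices for the discarded bands (20 in total); the real
# water-absorption bands depend on the sensor's band-to-wavelength map.
noisy_bands = set(range(107, 112)) | set(range(153, 167)) | {223}
keep = [b for b in range(cube.shape[-1]) if b not in noisy_bands]

clean = cube[:, :, keep]
print(clean.shape)  # (512, 217, 204)
```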
4.3. Experimental Results and Analysis
To validate the effectiveness of the proposed method, it was compared with both non-few-shot and few-shot learning methods. Among non-few-shot methods, the proposed method was compared with SVM, 3D-CNN [45], and SSRN [46]. Among few-shot methods, it was compared with DFSL + NN [37], DFSL + SVM [47,48], RN-FSL [49], Gai-CFSL [50], DPGN [51], DCFSL [52], SCFormer-R, and SCFormer-S [41]. In each comparison experiment, the same training protocol as the few-shot methods was employed: five labeled samples from each class in the target domain dataset were randomly selected for transferring the model trained on the source domain to the target domain, and the remaining target domain samples were used as test data. For the few-shot methods under comparison, we randomly selected 200 labeled source domain samples from each class to learn transferable knowledge, following the same setup throughout.

To verify the effectiveness of the Mish function and batch normalization (BN) added to the model, a comparative performance analysis was performed against the DCFSL method. In this comparison, the Mish + BN component was removed while the rest of the network structure was kept unchanged, serving as an ablation experiment. The results of the ablation experiments are presented in the “MFSC” rows of the tables, where the activation function is Softmax, consistent with the DCFSL method. In contrast, the results in the “Ours” rows were obtained under the MFSC framework with Mish + BN incorporated and the original Softmax activation function replaced. For the IP, UP, and Salinas datasets, the classification performance of the different methods was evaluated using three metrics: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. The comparative results are shown in Table 1, Table 2 and Table 3.
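All three metrics can be derived from a confusion matrix over the test set. A minimal sketch (the function name and toy labels are ours, for illustration only):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Compute OA, AA, and the Kappa coefficient from label vectors."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                 # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)  # per-class accuracy
    aa = per_class.mean()                     # average accuracy
    # expected chance agreement from row/column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)              # chance-corrected accuracy
    return oa, aa, kappa

oa, aa, kappa = classification_metrics([0, 0, 1, 1, 2, 2],
                                       [0, 0, 1, 0, 2, 2], 3)
print(f"OA={oa:.3f} AA={aa:.3f} kappa={kappa:.3f}")
# OA=0.833 AA=0.833 kappa=0.750
```

OA weights every test pixel equally, so it is dominated by large classes; AA averages per-class accuracies, so rare classes count as much as common ones; Kappa discounts agreement expected by chance.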
Table 1, Table 2 and Table 3 present the results of comparative experiments on the target datasets IP, UP, and Salinas, with five labeled samples per class. The tables show that the few-shot learning methods achieve higher overall accuracy than the non-few-shot methods, indicating that the episodic training strategy is better suited to classification tasks with limited labeled samples. On the IP dataset, the proposed few-shot learning method improves significantly over the traditional SVM classifier, with gains of 25.64% in OA, 21.95% in AA, and 28.13% in Kappa. Compared to deep learning-based methods such as 3D-CNN and SSRN when the number of labeled samples is five, the proposed method achieves significant increases in OA, with improvements of 16.73%, 19.35%, and 6.34% on IP, and 10.13%, 8.83%, and 4.15% on UP and Salinas, respectively. This indicates that the meta-learning training strategy allows the model to learn transferable knowledge and features from the source-class data, thus aiding prediction on the target-class data. The relatively low performance of the non-few-shot methods in Table 1, Table 2 and Table 3 shows that these methods extract shallow features with weaker discriminative capability across target categories, and that the limited labeled samples are insufficient to train them into effective classifiers.
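The five-labels-per-class transfer setup can be expressed as a per-class split of the target-domain indices into a small support set and a query (test) set. A minimal sketch under that assumption (function name and seed are ours):

```python
import numpy as np

def sample_support_query(labels, k_shot=5, seed=0):
    """For each class, randomly pick k_shot labeled indices as the
    support set; all remaining indices form the query/test set."""
    rng = np.random.default_rng(seed)
    support, query = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:])
    return np.array(support), np.array(query)

# Toy target domain: 3 classes with 10 labeled pixels each.
labels = np.repeat(np.arange(3), 10)
support, query = sample_support_query(labels, k_shot=5)
print(len(support), len(query))  # 15 15
```

Episodic training repeats such draws on the source domain so that each episode mimics the label-scarce conditions the model will face on the target domain.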
Among the few-shot classification methods, the method proposed in this paper also demonstrates significant improvements in accuracy. Compared to the DFSL + NN, DFSL + SVM, RN-FSL, Gai-CFSL, DCFSL, SCFormer-R, and SCFormer-S methods, the proposed method achieves OA improvements of 12.95%, 10.91%, 14.43%, 8.83%, 5.79%, 7.59%, and 7.65% on IP; 8.27%, 6.39%, 5.84%, 2.9%, 2.37%, 3.71%, and 2.19% on UP; and 3.92%, 4.02%, 6.86%, 3.14%, 1.63%, 1.67%, and 2.15% on Salinas, respectively, when labeled samples in the target domain are few. With only a small number of labeled target-domain samples, the proposed method uses the ResDenseNet network to reduce data distribution differences and learn a more discriminative feature space; compared to the other methods, this yields a better feature space and improves the classification performance on target domain samples. The classification results on the IP, UP, and Salinas datasets show that the proposed method achieves overall accuracies (OA) of 72.60%, 86.02%, and 90.97%, respectively, strongly confirming the effectiveness and robustness of the ResDenseNet model for few-shot hyperspectral data classification. Additionally, incorporating the Mish function and batch normalization (BN) not only effectively mitigates the vanishing gradient problem but also enhances the model’s generalization capability. Furthermore, because the Mish function is smoother than the ReLU function, it improves training stability and average accuracy.
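Mish is defined as x · tanh(softplus(x)): unlike ReLU, it is smooth everywhere and lets small negative activations pass through. A minimal NumPy sketch contrasting the two:

```python
import numpy as np

def mish(x):
    # mish(x) = x * tanh(softplus(x)); smooth and non-monotonic,
    # with softplus(x) = log(1 + exp(x))
    return x * np.tanh(np.log1p(np.exp(x)))

def relu(x):
    # hard zero for all negative inputs, with a kink at x = 0
    return np.maximum(0.0, x)

print(relu(-5.0), mish(-5.0))  # ReLU kills the value; Mish keeps a small negative one
```

The smooth, non-zero gradient of Mish on negative inputs is what the text credits with mitigating the vanishing gradient problem relative to ReLU.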
Table 4, Table 5 and Table 6 report the detailed classification results of the different algorithms on the UP, IP, and Salinas datasets, respectively. The last columns of the tables give the per-class classification accuracy and standard deviation over multiple runs. Table 4 shows that, compared to the other algorithms, the proposed method achieved the highest recognition rates in three of the nine categories. It also accurately classified the “Bricks”, “Bitumen”, “Metal sheets”, and “Trees” categories, which were challenging for the other methods. For three categories of the UP dataset, “Gravel”, “Meadows”, and “Asphalt”, the proposed method falls somewhat short of the best results among the comparison methods. The UP dataset has the highest spatial resolution of the three datasets but the lowest spectral resolution, and these three categories are the most prone to the problem of different materials exhibiting similar spectra. Table 5 and Table 6 show that, compared to the other algorithms, the proposed method achieved the highest recognition rates in 11 of 16 categories and 10 of 16 categories, respectively. It markedly improved the classification accuracy for categories such as “Grapes_untrained”, “Vinyard_untrained”, and “Soil_vinyard_develop” in the Salinas dataset, where the other methods had relatively low accuracy. Furthermore, the proposed method also substantially increased the classification accuracy for categories such as “Grass-pasture”, “Corn”, “Corn-mintill”, “Corn-notill”, and “Woods” in the IP dataset.
Figure 5, Figure 6 and Figure 7 display the classification maps of the proposed method and the comparative methods on the IP, UP, and Salinas datasets. The figures show that the method proposed in this paper produces fewer misclassifications, whereas the SVM-based method misclassifies the most objects. Compared to SVM, the 3D-CNN and SSRN methods misclassify less, mainly owing to the stronger representation learning capability of deep learning. However, deep learning methods require a large number of training samples, and their classification accuracy drops significantly when the training samples are reduced: with limited labeled samples, the extracted features are not discriminative enough, lowering accuracy on objects with similar spectral characteristics. In the few-shot setting, constructing ResDenseNet with a few-shot learning approach significantly improves classification accuracy over the SVM method and over deep learning methods such as 3D-CNN and SSRN.
In complex scenes, objects within a given area are rarely composed of a single material. Typically, varying amounts of other material categories are present, introducing spectral noise from those categories into the spectral signature of the primary material. Additionally, at the boundary between two land cover types, there is inevitably interference from the neighboring category’s spectral feature vectors. This makes it difficult to accurately extract both the spatial and spectral information of land cover, yielding subtle differences between different land cover types and significant variation within the same type, and causing misclassification of certain areas at the boundaries. In the few-shot setting, although methods such as DFSL + NN, DFSL + SVM, and RN-FSL account for the scarcity of labeled samples in hyperspectral imagery, their accuracy on challenging classes still lags behind that of the method proposed in this paper.
The experimental results shown in the figures indicate that when land cover features are relatively easy to distinguish and the feature vectors are distinct, the classification method employed in this paper, as well as the other few-shot learning methods, achieves good results. For example, classes such as “Oats” and “Grass-Trees” in the IP dataset (Figure 5); “Asphalt” and “Shadow” in the UP dataset (Figure 6); and “Celery”, “Stubble”, “Fallow_smooth”, “Lettuce_romaine_5wk”, and “Brocoli_green_weeds_1” in the Salinas dataset (Figure 7) have feature vectors that are relatively easy to separate in the feature space. With only a small number of labeled samples, traditional machine learning methods such as SVM, and general few-shot learning methods, can also classify these classes well. In contrast, deep learning methods that require a large number of training samples are prone to overfitting, which lowers their classification accuracy.
For land cover categories with similar features and small feature vector distances that tend to be misclassified, such as “Meadows” and “Alfalfa” in the UP dataset; “Vinyard_untrained”, “Vinyard_vertical_trellis”, and “Corn_senesced_green_weeds” in the Salinas dataset; and “Stone-Steel-Tower”, “Hay-windrowed”, “Woods”, and “Soybean-mintill” in the IP dataset, the classification results depend more heavily on effective extraction of land cover features. The classification results show that the method proposed in this paper achieves relatively good accuracy for such categories; MFSC follows, and DCFSL misclassifies less than SVM, 3D-CNN, and SSRN. On the one hand, this indicates that meta-learning training strategies enhance knowledge transfer and improve classification performance. On the other hand, it demonstrates that the residual dense connection network designed in this paper can reduce data distribution differences, yielding a better feature space with higher interclass discriminability, and that its effectiveness and robustness under small-sample training conditions are superior to those of other methods. Furthermore, the method proposed in this paper has fewer misclassified points than DCFSL, indicating that the network model generalizes well, can extract deeper and more discriminative features, and achieves better classification results on classes that are difficult to classify accurately.