Article

Multi-Source Image Matching Algorithms for UAV Positioning: Benchmarking, Innovation, and Combined Strategies

1 National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China
2 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
3 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
* Author to whom correspondence should be addressed.
Submission received: 14 June 2024 / Revised: 11 August 2024 / Accepted: 15 August 2024 / Published: 18 August 2024

Abstract

The accuracy and reliability of unmanned aerial vehicle (UAV) visual positioning systems are dependent on the performance of multi-source image matching algorithms. Despite many advancements, targeted performance evaluation frameworks and datasets for UAV positioning are still lacking. Moreover, existing consistency verification methods such as Random Sample Consensus (RANSAC) often fail to entirely eliminate mismatches, affecting the precision and stability of the matching process. The contributions of this research include the following: (1) the development of a benchmarking framework accompanied by a large evaluation dataset for assessing the efficacy of multi-source image matching algorithms; (2) the results of this benchmarking framework indicate that combinations of multiple algorithms significantly enhance the Match Success Rate (MSR); (3) the introduction of a novel Geographic Geometric Consistency (GGC) method that effectively identifies mismatches within RANSAC results and accommodates rotational and scale variations; and (4) the implementation of a distance threshold iteration (DTI) method that, according to experimental results, achieves an 87.29% MSR with a Root Mean Square Error (RMSE) of 1.11 m (2.22 pixels) while maintaining runtime at only 1.52 times that of a single execution, thus optimizing the trade-off between MSR, accuracy, and efficiency. Furthermore, when compared with existing studies on UAV positioning, the multi-source image matching algorithms demonstrated a sub-meter positioning error, significantly outperforming the comparative method. These advancements are poised to enhance the application of advanced multi-source image matching technologies in UAV visual positioning.

1. Introduction

As unmanned aerial vehicle (UAV) technology becomes increasingly utilized in fields such as agriculture, urban planning, and emergency response [1,2,3,4], its positioning capabilities have become critical for performing various tasks. Initially, UAV positioning relied primarily on the Inertial Navigation System (INS) as a standalone method. With the rapid advancement of satellite navigation technology, modern UAV positioning often combines the INS with Global Navigation Satellite Systems (GNSSs) [5]. This combination significantly improves the accuracy and stability of positioning compared to using the INS alone. However, GNSS signals may be blocked or interfered with in denied environments, leading to reduced localization performance. To address this challenge, researchers have begun exploring alternative positioning solutions, among which UAV visual positioning has emerged as an important alternative [6,7,8]. This method achieves positioning by matching images captured in real time by the UAV with existing geographically referenced high-resolution satellite imagery [9].
The effectiveness of UAV visual positioning is dependent on the performance of image matching algorithms. Due to the differences in sensors and imaging modes between UAVs and satellites, there are significant appearance differences or nonlinear radiometric variations between their images. Traditional image matching algorithms, such as SIFT [10] and SURF [11], often struggle to handle these disparities effectively. To overcome these challenges, researchers have developed a variety of multi-source image matching algorithms. These algorithms are primarily categorized into three types: region-based, deep-learning-based, and feature-based algorithms [12,13].
Region-based algorithms achieve image matching by measuring the similarity between image blocks [14,15]. For instance, Van Dalen et al. [16] proposed a method for calculating UAV positions using normalized cross-correlation, which resulted in a maximum positioning error of 12.5 m. Yol et al. [17] utilized mutual information as a criterion for positional similarity, achieving horizontal positioning errors of 6.56 m and 8.02 m. Wan et al. [18] developed a method using illumination-invariant phase correlation, achieving an average positioning error of 1.31 m. This method outperformed the normalized cross-correlation error of 2.19 m and the mutual information error of 3.08 m, highlighting the effectiveness of the phase correlation approach. The performance of these methods depends largely on the initial position estimate and the extent of image overlap, and it degrades significantly under large image rotation angles.
With the advancement of deep learning technology, deep-learning-based multi-source image matching algorithms have also gained extensive research interest in UAV visual positioning [19]. For example, Zhang et al. [20] proposed a deep-learning-based local feature matching algorithm that extracts features from UAV infrared and satellite imagery, achieving positioning accuracies ranging from 1.92 to 7.78 m. Mughal et al. [21] developed a deep-learning-based feature point extraction method that utilizes point-to-point template matching to achieve a positioning error of 3.7 m. Wu et al. [22] introduced a multi-stage Lucas-Kanade deep learning algorithm that leverages global texture features of UAV imagery, with positioning accuracies between 4.33 and 9.8 m. Although deep-learning-based image matching algorithms demonstrate significant potential in UAV positioning, their performance is constrained by large geometric deformations and large data volumes.
Feature-based multi-source image matching algorithms offer a structured and highly flexible workflow, with the core components including detector, descriptor, matching, and consistency verification [23]. For instance, Shan et al. [24] developed a positioning method combining a histogram of oriented gradients, particle filters, and optical flow, achieving a positioning error of 6.77 m. Chiu et al. [25] introduced a positioning method that combines Inertial Measurement Units (IMUs) with geographic image registration, achieving a positioning error of 9.83 m under GPS-denied conditions. Mantelli et al. [26] used the abBRIEF descriptor to match UAV images with satellite maps for UAV positioning, with a positioning error of 17.78 m. These studies in feature-based multi-source image matching primarily focused on the domain of positioning and navigation, with less exploration in the field of remote sensing. In recent years, the remote sensing field has seen the emergence of advanced structural-feature-based multi-source image matching algorithms, providing new possibilities for UAV visual positioning.
To the best of our knowledge, although the existing literature extensively discusses the principles of multi-source image matching algorithms [23,27], there is a scarcity of experimental studies, and a unified evaluation of the core components and overall performance of these algorithms is lacking, particularly concerning the Match Success Rate (MSR), a critical indicator determining the reliability of UAV visual positioning systems. Moreover, due to the lack of standardization in datasets, evaluation criteria, and parameter settings, significant discrepancies exist between different research outcomes, which limits their application in UAV positioning.
For this purpose, this paper first introduces a benchmarking framework to evaluate the core components and overall performance of current advanced multi-source image matching algorithms using a uniform standard. The multi-source image matching algorithms we tested include four significant methods from recent years: HAPCG [28], LNIFT [29], RIFT [30], and WSSF [31]. HAPCG is a frequency-domain matching algorithm that considers anisotropic weighted moments and absolute phase gradient histograms. It uses anisotropic filtering to compute the nonlinear scale image space and employs Log-Gabor filters to generate maximum and minimum moments, producing anisotropic weighted moment maps. It then captures keypoints in the moment space using the Harris detector and utilizes phase consistency feature values as image gradient characteristics, computing feature vectors via a Log-Polar coordinate description framework. LNIFT is a spatial domain multi-source image matching algorithm that transforms an image into an intermediate modal image using local normalization filters, detects ORB keypoints, optimizes keypoint distribution with non-maximum suppression strategies, and describes keypoints using a HOG-like descriptor. RIFT is a frequency-domain multi-source image matching algorithm that calculates the maximum and minimum moments of an image through phase consistency, uses the FAST detector to form keypoints, constructs the maximum index map using a Log-Gabor sequence, and obtains descriptions using a SIFT-like descriptor. WSSF is a structural-saliency-based multi-source image matching algorithm that initially constructs a scale space, then generates feature maps using phase consistency computation and local normalization filters, extracts keypoints using the KAZE detector and non-maximum suppression strategy, and finally retrieves feature vectors using an improved gradient location and orientation histogram (GLOH) descriptor.
Furthermore, to address the issue of mismatches in consistency verification methods, we propose a Geographic Geometric Consistency (GGC) method that effectively identifies mismatches in RANSAC results, thereby enhancing the reliability of matching. Based on GGC, we have also introduced a distance threshold iteration (DTI) method. This method incrementally lowers the distance threshold to improve the MSR while ensuring both the precision and efficiency of the execution.

2. Materials and Methods

2.1. Evaluation Dataset

To meet the evaluation requirements and establish uniform standards for future research, this study developed a large-scale dataset of UAV and satellite RGB imagery, covering an area of 286 square kilometers (km2). The dataset encompasses three regions: Jingjin New City (JJ), Yongxing (YX), and Shanshui (SS), as depicted in Figure 1. JJ, located in northern China, spans an area of 33 km2 and primarily includes artificial structures, farmland, and water bodies, with a total of 204 image pairs created. YX, situated in western China, covers 53 km2, predominantly consisting of artificial buildings and farmland, with 338 image pairs produced. SS, situated in southern China, extends over 200 km2 and comprises artificial structures, farmland, forest land, and water bodies, with 1367 image pairs generated. The total dataset includes 1909 image pairs, each with dimensions of 530 × 530 pixels and a ground spatial resolution of 0.5 m. All reference satellite imagery was obtained from Google Earth, which also has a spatial resolution of 0.5 m. Given that the resolution of UAV imagery is typically known, the dataset does not include multi-scale experiments; nor does it account for large rotation angles.
To quantitatively assess the performance of algorithms, we manually provide a true geometric transformation matrix H for each image pair as a standard. The H matrix is a 3 × 3 matrix, represented by the following equation:
\[ \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \]
Here, (x, y, 1) and (x′, y′, 1) are the homogeneous coordinates in the sensed (UAV) and reference (satellite) images, respectively, and h11 to h33 are the parameters of the H matrix. These parameters can be derived using at least four pairs of manually matched keypoints.
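As a minimal illustration of this step (our own sketch, not the authors' code), the following Python/OpenCV snippet estimates H from four manually matched keypoint pairs and maps a sensed-image point into the reference image; the coordinate values and the helper name map_point are placeholders introduced only for the example, and the homogeneous result is divided by its third component, as required for a general projective H.

```python
# Hypothetical example: estimate H from 4 manually matched pairs and apply it.
import numpy as np
import cv2

# Manually matched keypoints: sensed (UAV) image -> reference (satellite) image, in pixels.
uav_pts = np.array([[10, 12], [500, 20], [480, 510], [15, 505]], dtype=np.float64)
sat_pts = np.array([[32, 40], [522, 44], [505, 530], [36, 528]], dtype=np.float64)

# With exactly four pairs, method=0 solves for the 3x3 matrix directly (no RANSAC needed).
H, _ = cv2.findHomography(uav_pts, sat_pts, method=0)

def map_point(H, x, y):
    """Apply the 3x3 transformation to the homogeneous point (x, y, 1)."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]  # normalize by the third homogeneous component

print(map_point(H, 265.0, 265.0))  # predicted location in the satellite image
```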

2.2. Methods

The benchmarking workflow is illustrated in Figure 2, comprising three steps. Initially, keypoints are identified using a detector, and common keypoints are filtered using a distance threshold to calculate the repeatability. Subsequently, descriptors are employed to generate feature vectors for the keypoints, followed by calculating the Feature Distance Ratio (FDR) of the feature vectors of common and non-common keypoints to evaluate the performance of descriptors. Finally, the Match Success Rate (MSR), Root Mean Square Error (RMSE), and runtime of image matching are determined through the matching method and consistency verification. The first two steps of this process assess the core components of algorithms, while the last step evaluates the overall performance of the algorithms. It is important to note that in this study, the efficiency experiments were conducted strictly using the MATLAB R2023a code provided by the original authors of each algorithm, without considering whether these algorithms utilized parallel strategies.

2.2.1. Detectors and Repeatability

Common keypoints are those in a pair of images (reference and sensed) whose positions correspond across the two keypoint sets. The repeatability is defined as the proportion of common keypoints relative to the total number of keypoints. Studies have shown that higher repeatability increases the probability of correct matches, and thus, we use this metric to evaluate the performance of different algorithms at the detector stage.
The formula for calculating common keypoints is as follows [32]:
\[ E_{loc} = \left\| ref(x, y) - H \times sen(x, y) \right\| \]
Here, ref(x, y) and sen(x, y) represent the keypoint positions in the reference image and the sensed image, respectively, H is the true geometric transformation matrix between the two images, and E_loc represents the positional error. If this error is less than a distance threshold ε (set to 3 in this study), the pair of keypoints is considered common. The total number of common keypoints is divided by the total number of keypoints to calculate the repeatability. To ensure consistency in evaluation, the number of keypoints is limited to 5000 for all algorithms.
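The repeatability computation can be sketched as follows (an illustrative NumPy version, not the benchmark's MATLAB implementation); dividing by the larger of the two keypoint counts is our reading of "the total number of keypoints", and the array names are assumptions.

```python
# Illustrative sketch: count common keypoints under the true transformation H
# and compute repeatability with the distance threshold eps = 3 pixels.
import numpy as np

def repeatability(ref_kpts, sen_kpts, H, eps=3.0):
    """ref_kpts: (N, 2) reference keypoints; sen_kpts: (M, 2) sensed keypoints;
    H: true 3x3 matrix mapping sensed coordinates to reference coordinates."""
    sen_h = np.hstack([sen_kpts, np.ones((len(sen_kpts), 1))])   # homogeneous (M, 3)
    proj = (H @ sen_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]                            # back to pixel coordinates
    # E_loc between every reference keypoint and every projected sensed keypoint
    e_loc = np.linalg.norm(ref_kpts[:, None, :] - proj[None, :, :], axis=2)
    common = int(np.sum(e_loc.min(axis=1) < eps))                # reference keypoints with a partner
    return common / max(len(ref_kpts), len(sen_kpts))
```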

2.2.2. Descriptors and Feature Distance Ratio

Descriptors generate feature vectors based on keypoint information such as location, scale, and orientation. In assessing the performance of descriptors, some studies directly quantify their performance by calculating the similarity between feature vectors for common keypoints in image pairs, while others indirectly assess performance through matching results. We have opted for a direct assessment method. Previous research typically used the similarity between common keypoints to measure the performance of descriptors. However, we believe that relying solely on the similarity between common keypoints is insufficient for a comprehensive assessment of descriptor performance: distinguishing between good and poor descriptors requires both high similarity among common keypoints and low similarity among non-common keypoints. Therefore, we propose a new assessment method that computes the Feature Distance Ratio (FDR), the ratio of feature-vector distances between common and non-common keypoints, to more comprehensively measure the performance of descriptors. The specific calculation steps are as follows:
(1)
First, calculate the average distance of feature vectors for common keypoints, C d i s :
\[ C_{dis} = \frac{1}{N} \sum_{i=1}^{N} d(v_i, w_i) \]
Here, v_i and w_i are the feature vectors of common keypoints between the two images, d(v_i, w_i) is the distance between the feature vectors (we use the Euclidean distance), and N is the number of common keypoints.
(2)
Calculate the average distance of feature vectors for non-common keypoints, N C d i s :
\[ NC_{dis} = \frac{1}{M} \sum_{j=1}^{M} d(v_j, w_j) \]
Here, v_j and w_j are the feature vectors of non-common keypoints between the two images, d(v_j, w_j) is the distance between the feature vectors (again the Euclidean distance), and M is the number of non-common keypoints.
(3)
Use the FDR as the metric for the feature descriptor:
\[ FDR = C_{dis} / NC_{dis} \]
The smaller the FDR, the greater the difference between the descriptors of common and non-common keypoints, indicating stronger descriptor capability. This method not only provides a quantitative metric to assess the consistency and discriminative power of descriptors across different images but also helps deepen the understanding of the performance characteristics of descriptors.
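A compact sketch of the FDR computation is given below (our illustration; the pairing of common and non-common keypoints is assumed to have been established beforehand from the true matrix H, and the array names are placeholders).

```python
# Illustrative FDR computation: Euclidean distances between paired feature vectors.
import numpy as np

def feature_distance_ratio(common_ref, common_sen, noncommon_ref, noncommon_sen):
    """Each argument is an (N, D) or (M, D) array of descriptors with row-aligned pairs."""
    c_dis = np.mean(np.linalg.norm(common_ref - common_sen, axis=1))         # C_dis
    nc_dis = np.mean(np.linalg.norm(noncommon_ref - noncommon_sen, axis=1))  # NC_dis
    return c_dis / nc_dis                                                    # FDR

# A smaller ratio means common-keypoint descriptors are much closer to each other
# than non-common ones, i.e., stronger descriptive power.
```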

2.2.3. Matching and Consistency Verification

Matching and consistency verification primarily involve comparing feature vectors derived from descriptors to identify correspondences and eliminate outliers. For this process, we utilize a combination of FLANN and RANSAC. FLANN (Fast Library for Approximate Nearest Neighbors) is a library extensively used for fast approximate nearest neighbor searches, particularly in feature point matching and image recognition within the fields of computer vision and machine learning. RANSAC is a robust parameter estimation method effective in dealing with datasets contaminated with a significant amount of noise and outliers, commonly applied in applications such as line fitting, camera calibration, and image registration. This combination method first utilizes FLANN to quickly identify a large number of potential matching points, including some possible mismatches. Subsequently, RANSAC is used to validate these matches, eliminating inconsistent matching points and retaining only those that support common geometric transformations. This strategy not only accelerates the matching speed but also significantly improves the accuracy and robustness of the matching.
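For illustration, this combination can be sketched with OpenCV as below; note that the benchmark itself runs the original authors' MATLAB implementations, so this Python version, the ratio-test value of 0.8, and the variable names are assumptions made only for the example.

```python
# Illustrative FLANN + RANSAC step: approximate nearest-neighbor matching followed
# by homography-based outlier rejection.
import numpy as np
import cv2

def flann_ransac(kp_ref, desc_ref, kp_sen, desc_sen, ransac_thresh=3.0):
    """kp_*: (N, 2) keypoint positions; desc_*: corresponding descriptor arrays."""
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
    knn = flann.knnMatch(desc_sen.astype(np.float32), desc_ref.astype(np.float32), k=2)
    # Ratio test keeps only distinctive candidate matches (0.8 is an assumed value)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
    if len(good) < 4:
        return None, None, None
    src = np.float32([kp_sen[m.queryIdx] for m in good])   # sensed-image points
    dst = np.float32([kp_ref[m.trainIdx] for m in good])   # reference-image points
    # RANSAC discards candidates inconsistent with a common geometric transformation
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    if H is None:
        return None, None, None
    keep = mask.ravel().astype(bool)
    return H, dst[keep], src[keep]   # model plus inlier reference/sensed points
```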
In assessing the performance of algorithms, we employ two primary metrics: the MSR and the RMSE, the latter indicating the precision of the matches. The distance threshold in the RANSAC method significantly affects the results of consistency verification; therefore, we have experimented with various distance threshold settings (3, 5, 7, and 9) to explore their impact on the metrics and evaluate performance at each threshold. These experiments provide deeper insights into the method’s performance variations under different settings.

2.2.4. Geographic Geometric Consistency and Distance Threshold Iteration

Geographic Geometric Consistency

To enhance the reliability of matches and address occasional failures of automatic error detection by RANSAC, we propose a Geographic Geometric Consistency (GGC) method. The GGC method utilizes the geographic geometric information of remote sensing data to identify mismatches by comparing the consistency of distances between matched keypoints. The steps are shown in Figure 3:
(1)
Define two sets of keypoints: Assume P1 and P2 represent the initial matched keypoints of the reference image and the sensed image, respectively.
(2)
Calculate geometric differences between matched keypoints:
  • For each point i in P1, compute the distance vector to all other keypoints:
    \[ d_1(i) = \left\| P_1 - P_{1,i} \right\|_2 \]
  • For the corresponding keypoints in P2, considering a scale ratio k (known or computed in real time), compute the distance vector:
    \[ d_2(i) = k \cdot \left\| P_2 - P_{2,i} \right\|_2 \]
  • Compute the difference between the two sets of distance vectors for each keypoint i:
    \[ d(i) = \left| d_1(i) - d_2(i) \right| \]
(3)
Calculate the proportion exceeding the difference threshold: For each point i, compute the proportion of differences that exceed the threshold T1:
\[ \pi_i = \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}\left( d_j(i) > T_1 \right) \]
Here, n is the total number of matched keypoints and T1 is the geometric difference threshold, set here to 5 pixels.
(4)
Calculate the global proportion exceeding the threshold:
\[ F_T = \mathbb{1}\left( \frac{1}{n} \sum_{i=1}^{n} \pi_i > T_2 \right) \]
Here, T2 is the threshold for judging the global difference proportion, set in this study at 0.5, and F_T is the index of a mismatch or a correct match. This means that if more than half of the geometric consistency differences between matched points exceed the threshold, the match result is likely a mismatch.
The settings of thresholds T1 and T2 are crucial for accurate image matching. T1 serves as the threshold for image geometric differences, determining whether the geometric discrepancy of a single point falls within a reasonable range. T2 is used to judge the proportion of global differences and assesses the reliability of global matching. These thresholds are primarily influenced by the initial matching algorithm, image deformation, and image spatial resolution. A robust initial matching algorithm typically results in smaller geometric differences among matching points, allowing for lower settings of T1 and T2. Conversely, if the initial algorithm is less effective, it may be necessary to increase the values of T1 and T2. Furthermore, if images have undergone preprocessing to eliminate geometric distortions, T1 and T2 can be set lower. If not, these thresholds should be increased to accommodate uncorrected distortions. Additionally, high-resolution images, which render geometric differences more detectable, require higher settings of T1 and T2. In contrast, for low-resolution images where geometric discrepancies are typically larger, T1 and T2 can be set lower.
This method effectively identifies and corrects mismatches overlooked by the RANSAC methods, thereby enhancing the precision and reliability of the overall matching process.
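A minimal sketch of the GGC check is given below, following the steps above as we read them (scale ratio k = 1 for equal-resolution image pairs, T1 = 5 pixels, T2 = 0.5); the aggregation of the per-point proportions into the global flag is our interpretation of the description, not the authors' code.

```python
# Illustrative GGC mismatch check on an initial matched keypoint set.
import numpy as np

def ggc_is_mismatch(P1, P2, k=1.0, T1=5.0, T2=0.5):
    """P1, P2: (n, 2) matched keypoints in the reference and sensed images.
    Returns True if the match result is judged to be a mismatch."""
    n = len(P1)
    pi = np.empty(n)
    for i in range(n):
        d1 = np.linalg.norm(P1 - P1[i], axis=1)       # distances within the reference image
        d2 = k * np.linalg.norm(P2 - P2[i], axis=1)   # scaled distances within the sensed image
        diff = np.abs(d1 - d2)                        # geometric consistency differences d(i)
        pi[i] = np.mean(diff > T1)                    # pi_i: proportion exceeding T1
    return np.mean(pi) > T2                           # F_T: global proportion check
```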

Distance Threshold Iteration

A distance threshold iteration (DTI) method based on GGC was introduced to enhance the MSR and RMSE of matches. This method iteratively adjusts the distance threshold in RANSAC to optimize the matching results. Specifically, the RANSAC distance thresholds are sequentially set at 3, 5, 7, and 9. In each iteration, we evaluate whether the matching results meet the criteria set by the GGC method: if the results pass the consistency verification at any given threshold, the iteration stops and the results are accepted. If no satisfactory matches are found by the time the threshold reaches 9, the image pair is deemed unsuccessfully matched. This distance threshold iteration method not only improves the Match Success Rate but also has proven effective in maintaining matching accuracy in our experiments.
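The DTI loop can be sketched as follows, reusing the illustrative flann_ransac() and ggc_is_mismatch() helpers from the sketches above (again an assumption-laden illustration rather than the authors' implementation).

```python
# Illustrative DTI loop: relax the RANSAC distance threshold only when GGC rejects the result.
def match_with_dti(kp_ref, desc_ref, kp_sen, desc_sen):
    for thresh in (3.0, 5.0, 7.0, 9.0):                  # RANSAC distance thresholds
        H, ref_in, sen_in = flann_ransac(kp_ref, desc_ref, kp_sen, desc_sen,
                                         ransac_thresh=thresh)
        if H is None or len(ref_in) < 4:
            continue                                     # too few inliers, try a looser threshold
        if not ggc_is_mismatch(ref_in, sen_in):          # passes consistency verification
            return H, thresh                             # accept the result and stop iterating
    return None, None                                    # image pair deemed unsuccessfully matched
```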

2.2.5. Combined Strategies

Combining the core components of different algorithms may enhance overall performance while integrating these algorithms at a holistic level could also increase the MSR, thereby improving the stability of UAV visual positioning systems. This strategy is crucial for both engineering implementation and algorithmic innovation. Initially, we combined the core components that demonstrated advantages during testing and assessed their combined performance within a unified framework. Subsequently, we conducted a comprehensive analysis of the overall matching results from different algorithms to explore performance variations across integrated algorithmic approaches. This analysis will provide a solid theoretical foundation and practical guidance for future applications of algorithmic combinations in this field.
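As an illustration of the whole-algorithm combination (e.g., "HAPCG + WSSF"), a simple fallback scheme can be sketched as below; the callables run_hapcg and run_wssf are hypothetical wrappers around the respective algorithms, and match_with_dti() is the sketch from the previous subsection.

```python
# Illustrative whole-algorithm combination: fall back to the next algorithm only
# when the previous one fails DTI + GGC verification on the image pair.
def combined_match(img_ref, img_sen, algorithms):
    for extract in algorithms:                 # e.g., algorithms = [run_hapcg, run_wssf]
        kp_ref, desc_ref, kp_sen, desc_sen = extract(img_ref, img_sen)
        H, thresh = match_with_dti(kp_ref, desc_ref, kp_sen, desc_sen)
        if H is not None:
            return H                           # first algorithm that succeeds wins
    return None                                # all algorithms failed on this pair
```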

2.2.6. Runtime

The efficiency of matching algorithms is crucial for practical UAV operations. We compared the efficiency of each algorithm using runtime as the metric. Runtime refers to the time taken from loading an image pair to outputting the matching results. The experiments were conducted using a Lenovo Y9000K laptop equipped with an Intel i7-11800H CPU, an NVIDIA RTX 3080 Ti GPU, and 32 GB of Samsung RAM. The laptop is produced by Lenovo Group Ltd., headquartered in Beijing, China. The software used for this analysis was MATLAB R2023b.

3. Experimental Evaluations

3.1. Detectors and Repeatability

This section discusses the repeatability of detectors. Figure 4 shows the distribution of repeatability for each image pair across different datasets, obtained using four types of detectors. For clarity, the data have been smoothed. Overall, the repeatability scores of HAPCG’s Harris and LNIFT’s ORB are very similar and higher than those of RIFT’s FAST and WSSF’s KAZE, with WSSF’s KAZE scoring notably lower than the others.
Table 1 presents the statistical results (mean values) of the four detectors across different datasets. Observing from the dataset perspective, the performance variations among all methods were minimal across different datasets, with differences within 0.02, except for RIFT’s FAST, which was slightly lower in the YX dataset. From the methods perspective, the average repeatability scores for HAPCG’s Harris and LNIFT’s ORB are both 0.48, that of RIFT’s FAST is 0.44, and that of WSSF’s KAZE is the lowest at 0.39. By examining the distribution of each image pair and the overall dataset statistics, it is evident that HAPCG’s Harris and LNIFT’s ORB demonstrate superior performance in repeatability.

3.2. Descriptors and Feature Distance Ratio

This section evaluates the performance of the four descriptors by measuring the FDR, with a smaller value indicating stronger expressive power. Figure 5 presents the FDR for each image pair; the data have been smoothed for clearer visualization. The results show that on the JJ and SS datasets, the values for RIFT’s SIFT-like are significantly higher than those for the other descriptors, while they are comparable on the YX dataset. The distributions of the other three descriptors—HAPCG’s Log-Polar, LNIFT’s HOG-like, and WSSF’s GLOH—are very tight and overlap significantly.
Table 2 lists the average FDR for different descriptors. From the perspective of the datasets, the variations in LNIFT’s HOG-like and WSSF’s GLOH are minor, fluctuating only within 0.04; however, the maximum differences between datasets for HAPCG’s Log-Polar and RIFT’s SIFT-like reach 0.07 and 0.11, respectively, indicating significant variability. Considering the average values across all data, HAPCG’s Log-Polar, LNIFT’s HOG-like, and WSSF’s GLOH perform closely, whereas the average value of RIFT’s SIFT-like is notably higher than those of the other three descriptors. This statistical result suggests that RIFT’s SIFT-like has comparatively weaker feature descriptor capability than the other three methods, which is consistent with the subsequent matching results.

3.3. Matching and Consistency Verification

This section presents the results of matching and consistency verification based on FLANN and RANSAC. To assess the accuracy of the matches, each result is verified against a visual judgment standard to determine its correctness. Figure 6 and Table 3 display the distribution and statistical results for the MSR and the RMSE, where the RMSE is calculated as the average for all correctly matched image pairs. Figure 6 shows that as the RANSAC distance threshold increases, the MSR for all four algorithms improves, but this is accompanied by an increase in the RMSE. Notably, the increase in the MSR is most significant when the threshold is raised from 3 to 5, with diminishing returns as the threshold increases from 5 to 9. Despite the increase in the MSR, its rate of increase is lower than that of the RMSE. The statistics in Table 3 reveal that when the distance threshold is increased from 3 to 5, the average MSR across all methods increases by 6.66% whereas the increases from 5 to 7 and 7 to 9 are 2.3% and 1.25%, respectively. This indicates that appropriately relaxing the distance threshold can significantly enhance the MSR, especially when the threshold is increased from 3 to 5. Simultaneously, the increase in the threshold significantly impacts the RMSE, with an increase of 1.09 pixels when the threshold is raised from 3 to 5. Comparing the performances of different algorithms, HAPCG and WSSF outperform LNIFT and RIFT. Given RIFT’s lower MSR, this results in a higher RMSE. These findings demonstrate that the setting of the RANSAC distance threshold significantly affects the matching performance of each algorithm. MSR and RMSE present a trade-off relationship: improving the Match Success Rate usually comes at the expense of precision and vice versa. Therefore, balancing these factors is crucial for optimizing algorithm performance.

3.4. Geographic Geometric Consistency and Distance Threshold Iteration

3.4.1. Geographic Geometric Consistency

The Geographic Geometric Consistency (GGC) method effectively identifies mismatches in RANSAC that are incorrectly deemed correct. In practical applications, the reliability of RANSAC’s results is typically assessed automatically by the number of matching keypoints and the RMSE of the transformation model. To evaluate the performance of GGC, we set the automatic detection criteria for mismatches as an RMSE of less than 5 and more than 5 matching keypoints in the RANSAC results and compared these with the outcomes from GGC. The distance threshold for GGC was set at 5 pixels, and the T2 ratio threshold was set at 0.5, meaning that if the error exceeded 5 pixels for more than half of the matching points, the match would be deemed incorrect. Figure 7 displays examples of mismatching and correct matching. Our evaluation was based on the visual inspection of match results, and Table 4 only presents cases where the RANSAC distance threshold was 3. From the perspective of the datasets, GGC effectively identified erroneous matches within RANSAC across all datasets, with an average detection rate of 7.35%. From an algorithmic standpoint, GGC significantly impacted all four methods, with detection rates ranging from 5.08% to 9.22%. These results demonstrate that our proposed GGC method can effectively identify mismatches in RANSAC, which is crucial for enhancing the reliability of UAV positioning matches.

3.4.2. Distance Threshold Iteration

Table 5 presents the statistical results of the distance threshold iteration (DTI) method, showcasing a comprehensive analysis of all images. The results indicate that the average MSR for the four algorithms reached 87.29%, closely aligning with the 87.46% observed at a distance threshold of 9, as shown in Table 3. The average RMSE was 2.22 pixels, slightly higher than the 1.87 pixels observed at a distance threshold of 3. This demonstrates that our distance threshold iteration method maintains a high MSR while also achieving high matching accuracy, striking a good balance between the two. Additionally, the average Execution Time Ratio (ETR) was only 1.52 times that of a single execution, indicating effective control over computational costs. An average RMSE of 2.22 pixels corresponds to a ground positioning accuracy of 1.11 m, surpassing all other methods mentioned in the introduction.
In the performance evaluation of individual algorithms, HAPCG excelled in all three metrics, achieving an MSR of 92.29%, an RMSE of 2.04 pixels, and an ETR of only 1.32. These results demonstrate that our distance threshold iteration method, based on geometric consistency verification, not only significantly improves the MSR but also maintains a low execution time ratio and high accuracy, thus proving its practicality and efficiency in UAV positioning applications.

3.5. Combined Strategies

3.5.1. Combination of Advantageous Components

The results from Section 3.1 and Section 3.2 demonstrate that HAPCG’s Harris detector and WSSF’s GLOH descriptor both exhibit excellent performance. Consequently, we attempted to combine these two advantageous components to assess whether they could enhance matching performance. The statistical results of the combined components, shown in Table 6, indicate that the MSR at different thresholds was lower than when using HAPCG or WSSF individually, whereas the RMSE of the combined results was lower than that of HAPCG but slightly higher than that of WSSF. This occurs partly because the RMSE only considers correctly matched image pairs, which biases the value upward. The analysis suggests that the combination of advantageous components did not yield a significant improvement. This may be due to the high degree of interdependence among the components of these methods. Our combination experiment only altered the positions of the keypoints, overlooking crucial information such as scale and orientation. Since the existing algorithms’ code has been encrypted, we were unable to make deeper modifications, limiting the further optimization of descriptors and matching. The effective combination of advantageous components would require comprehensive code-level improvements.

3.5.2. Comprehensive Combinations of Algorithms

Table 7 summarizes the results of combining four algorithms under two scenarios: a fixed distance threshold of 3 and a DTI. In our combination tests, we evaluated combinations of two, three, and four algorithms. The primary evaluation metric was the MSR. The results indicate that the combinations “HAPCG + WSSF”, “HAPCG + LNIFT + WSSF”, and “HAPCG + LNIFT + RIFT + WSSF” demonstrated optimal performance under both the fixed and iterative distance thresholds. Notably, the high-performing individual algorithm HAPCG (refer to the results in Table 3 and Table 5) played a pivotal role in these combinations, significantly enhancing overall performance when paired with any other algorithm. These findings suggest that comprehensive combinations of different algorithms can effectively increase the match success rate. When computational costs are not a concern, combining multiple algorithms substantially enhances the stability and reliability of UAV visual positioning systems.

3.6. Runtime

Table 8 reports the average runtime of each algorithm on three datasets. The results show that the RIFT algorithm exhibits the highest efficiency, followed by HAPCG and LNIFT, while WSSF has relatively lower efficiency. These findings offer a crucial basis for selecting the most appropriate algorithm combination for practical applications.

4. Application

To evaluate the practical application performance of different multi-source image matching algorithms, GGC, DTI, and algorithm combination strategies, we used the UAV localization dataset from the paper “GPS-Denied UAV Localization Using Pre-Existing Satellite Imagery” for testing [33], as illustrated in Figure 8. The flight distance of this dataset exceeds 0.85 km, and it consists of 17 pairs of UAV and satellite images. Given a flight speed of 10 m/s, each image frame is captured at an interval of 5 s. The matching process benefits from the ability to perform coarse predictions of image direction and position through inter-frame matching, resulting in relatively minor rotation, scale, and offset differences between image pairs. The UAV images are sized at 252 × 252 pixels while the satellite images vary from 305 × 315 to 362 × 377 pixels.
Figure 9 displays the matching results for the first four image pairs in the dataset using four different matching algorithms. It is evident that all four algorithms demonstrate numerous matching points and achieve effective matching. The primary reason for this success is that the UAV localization dataset features rich textures and distinct structural characteristics, making it conducive to effective matching.
As shown in Table 9, all four matching algorithms achieved a 100% Match Success Rate (MSR) on this dataset, which was conducive to matching due to its distinct structural features. Regarding positioning accuracy, the Root Mean Square Error (RMSE) for all methods was maintained within 2 pixels, corresponding to a positioning error within 0.9 m. This performance was a substantial improvement over the 7.06 m reported in the original paper, highlighting the significant potential of heterogeneous remote-sensing image matching algorithms in UAV positioning. In terms of runtime, the average runtime for the HAPCG and RIFT algorithms was about 4 s, aligning with the practical requirement of positioning once every 5 s in this dataset (considering a flight speed of 10 m/s). However, the runtimes for the LNIFT and WSSF algorithms were 9.68 s and 17.76 s, respectively, which were somewhat inadequate. Additionally, since the dataset was relatively straightforward, the benefits of combining algorithms were not pronounced. For instance, the positioning accuracy of the combined HAPCG + RIFT algorithms improved by only 0.01 m compared to using RIFT alone, and their combined runtime did not increase significantly. This indicates that in certain application scenarios, heterogeneous image matching algorithms in the field of remote sensing indeed offer considerable advantages for UAV positioning.

5. Conclusions

Multi-source image matching algorithms are crucial for UAV visual positioning. For this purpose, we have developed a comprehensive evaluation framework and created a large dataset comprising 1909 pairs of UAV and satellite images from three regions. Our study also introduced a descriptor performance metric, the Feature Distance Ratio (FDR), which effectively quantifies the capabilities of different descriptors. Although the RANSAC method eliminates most mismatches, the occasional mismatches that remain can be fatal for UAV visual positioning. Therefore, we proposed a novel Geographic Geometric Consistency (GGC) method that effectively identifies mismatches in RANSAC results. Based on this, we developed a distance threshold iteration (DTI) method that optimizes the balance between the Match Success Rate (MSR), RMSE, and Execution Time Ratio (ETR), significantly enhancing UAV visual positioning performance.
Furthermore, we evaluated the combined effects of different core components. Although the combination did not achieve the expected results due to high inter-component coupling within the algorithms, combining different algorithms as a whole significantly enhanced performance. With sufficient computational resources, such algorithm combinations have significant potential applications in the field of UAV visual positioning. We also compared the multi-source image matching algorithms with existing research on UAV positioning and the results demonstrated that these algorithms achieved sub-meter-level positioning accuracy. To promote the application of multi-source image matching technologies in UAV visual positioning and to encourage further research by other researchers, we have made all datasets, results, and codes publicly available at the following URL: https://github.com/LJL-UAV/UAV-Positioning (accessed on 14 August 2024).

Author Contributions

Funding acquisition, J.X. and Y.L.; writing—original draft, J.L.; writing—review and editing, Y.R., F.L., H.Y. (Huanyin Yue) and H.Y. (Huping Ye). All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key Research and Development Program of China (No. 2022YFC3320802 and No. 2023YFB3905704), and Central Guiding Local Technology Development (No. 226Z5901G).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Although the authors Jianli Liu, Jincheng Xiao, Yafeng Ren, and Yingcheng Li were employed by China TopRS Technology Company Limited, the funding for this research was provided by the National Key R&D Program of China’s Ministry of Science and Technology. Additionally, this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Liu, J.; Liao, X.; Ye, H.; Yue, H.; Wang, Y.; Tan, X.; Wang, D. UAV swarm scheduling method for remote sensing observations during emergency scenarios. Remote Sens. 2022, 14, 1406. [Google Scholar] [CrossRef]
  2. Liao, X.; Zhang, Y.; Su, F.; Yue, H.; Ding, Z.; Liu, J. UAVs surpassing satellites and aircraft in remote sensing over China. Int. J. Remote Sens. 2018, 39, 7138–7153. [Google Scholar] [CrossRef]
  3. Zhang, D.; Liu, J.; Ni, W.; Sun, G.; Zhang, Z.; Liu, Q.; Wang, Q. Estimation of forest leaf area index using height and canopy cover information extracted from unmanned aerial vehicle stereo imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 471–481. [Google Scholar] [CrossRef]
  4. Yu, T.; Ni, W.; Liu, J.; Zhao, R.; Zhang, Z.; Sun, G. Extraction of tree heights in mountainous natural forests from UAV leaf-on stereoscopic imagery based on approximation of ground surfaces. Remote Sens. Environ. 2023, 293, 113613. [Google Scholar] [CrossRef]
  5. Wang, H.; Cheng, Y.; Liu, N.; Zhao, Y.; Chan, J.C.-W.; Li, Z. An Illumination-Invariant Shadow-Based Scene Matching Navigation Approach in Low-Altitude Flight. Remote Sens. 2022, 14, 3869. [Google Scholar] [CrossRef]
  6. Gyagenda, N.; Hatilima, J.V.; Roth, H.; Zhmud, V. A review of GNSS-independent UAV navigation techniques. Robot. Auton. Syst. 2022, 152, 104069. [Google Scholar] [CrossRef]
  7. Couturier, A.; Akhloufi, M.A. A review on absolute visual localization for UAV. Robot. Auton. Syst. 2021, 135, 103666. [Google Scholar] [CrossRef]
  8. Lindsten, F.; Callmer, J.; Ohlsson, H.; Törnqvist, D.; Schön, T.B.; Gustafsson, F. Geo-referencing for UAV navigation using environmental classification. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1420–1425. [Google Scholar]
  9. Mei, C.; Fan, Z.; Zhu, Q.; Yang, P.; Hou, Z.; Jin, H. A Novel Scene Matching Navigation System for UAVs Based on Vision/Inertial Fusion. IEEE Sens. J. 2023, 23, 6192–6203. [Google Scholar] [CrossRef]
  10. Cesetti, A.; Frontoni, E.; Mancini, A.; Zingaretti, P.; Longhi, S. A vision-based guidance system for UAV navigation and safe landing using natural landmarks. J. Intell. Robot. Syst. 2010, 57, 233–257. [Google Scholar] [CrossRef]
  11. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  12. Feng, R.; Shen, H.; Bai, J.; Li, X. Advances and opportunities in remote sensing image geometric registration: A systematic review of state-of-the-art approaches and future research directions. IEEE Geosci. Remote Sens. Mag. 2021, 9, 120–142. [Google Scholar] [CrossRef]
  13. Zhu, B.; Zhou, L.; Pu, S.; Fan, J.; Ye, Y. Advances and challenges in multimodal remote sensing image registration. IEEE J. Miniaturization Air Space Syst. 2023, 4, 165–174. [Google Scholar] [CrossRef]
  14. Ye, Y.; Shan, J.; Bruzzone, L.; Shen, L. Robust registration of multimodal remote sensing images based on structural similarity. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2941–2958. [Google Scholar] [CrossRef]
  15. Ye, Y.; Shan, J.; Hao, S.; Bruzzone, L.; Qin, Y. A local phase based invariant feature for remote sensing image matching. ISPRS J. Photogramm. Remote Sens. 2018, 142, 205–221. [Google Scholar] [CrossRef]
  16. Van Dalen, G.J.; Magree, D.P.; Johnson, E.N. Absolute localization using image alignment and particle filtering. In Proceedings of the Aiaa Guidance, Navigation, and Control Conference, San Diego, CA, USA, 4–8 January 2016; p. 0647. [Google Scholar]
  17. Yol, A.; Delabarre, B.; Dame, A.; Dartois, J.E.; Marchand, E. Vision-based absolute localization for unmanned aerial vehicles. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 3429–3434. [Google Scholar]
  18. Wan, X.; Liu, J.; Yan, H.; Morgan, G.L. Illumination-invariant image matching for autonomous UAV localisation based on optical sensing. ISPRS J. Photogramm. Remote Sens. 2016, 119, 198–213. [Google Scholar] [CrossRef]
  19. Yang, Z.; Dan, T.; Yang, Y. Multi-temporal remote sensing image registration using deep convolutional features. IEEE Access 2018, 6, 38544–38555. [Google Scholar] [CrossRef]
  20. Zhang, X.; He, Z.; Ma, Z.; Wang, Z.; Wang, L. Llfe: A novel learning local features extraction for uav navigation based on infrared aerial image and satellite reference image matching. Remote Sens. 2021, 13, 4618. [Google Scholar] [CrossRef]
  21. Mughal, M.H.; Khokhar, M.J.; Shahzad, M. Assisting UAV localization via deep contextual image matching. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2445–2457. [Google Scholar] [CrossRef]
  22. Wu, S.; Du, C.; Chen, H.; Jing, N. Coarse-to-fine UAV image geo-localization using multi-stage Lucas-Kanade networks. In Proceedings of the 2021 2nd Information Communication Technologies Conference (ICTC), Nanjing, China, 7–9 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 220–224. [Google Scholar]
  23. Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A review of multimodal image matching: Methods and applications. Inf. Fusion 2021, 73, 22–71. [Google Scholar] [CrossRef]
  24. Shan, M.; Wang, F.; Lin, F.; Gao, Z.; Tang, Y.Z.; Chen, B.M. Google map aided visual navigation for UAVs in GPS-denied environment. In Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 114–119. [Google Scholar]
  25. Chiu, H.P.; Das, A.; Miller, P.; Samarasekera, S.; Kumar, R. Precise vision-aided aerial navigation. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 688–695. [Google Scholar]
  26. Mantelli, M.; Pittol, D.; Neuland, R.; Ribacki, A.; Maffei, R.; Jorge, V.; Prestes, E.; Kolberg, M. A novel measurement model based on abBRIEF for global localization of a UAV over satellite images. Robot. Auton. Syst. 2019, 112, 304–319. [Google Scholar] [CrossRef]
  27. Zhang, X.; Leng, C.; Hong, Y.; Pei, Z.; Cheng, I.; Basu, A. Multimodal remote sensing image registration methods and advancements: A survey. Remote Sens. 2021, 13, 5128. [Google Scholar] [CrossRef]
  28. Yao, Y.; Zhang, Y.; Wan, Y.; Liu, X.; Guo, H. Heterologous images matching considering anisotropic weighted moment and absolute phase orientation. Geomat. Inf. Sci. Wuhan Univ. 2021, 46, 1727–1736. [Google Scholar]
  29. Li, J.; Xu, W.; Shi, P.; Zhang, Y.; Hu, Q. LNIFT: Locally normalized image for rotation invariant multimodal feature matching. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5621314. [Google Scholar] [CrossRef]
  30. Li, J.; Hu, Q.; Ai, M. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform. IEEE Trans. Image Process. 2019, 29, 3296–3310. [Google Scholar] [CrossRef] [PubMed]
  31. Wan, G.; Ye, Z.; Xu, Y.; Huang, R.; Zhou, Y.; Xie, H.; Tong, X. Multi-Modal Remote Sensing Image Matching Based on Weighted Structure Saliency Feature. IEEE Trans. Geosci. Remote Sens. 2023, 62, 4700816. [Google Scholar]
  32. Zhu, B.; Yang, C.; Dai, J.; Fan, J.; Qin, Y.; Ye, Y. R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Images via Repeatable Feature Detector and Rotation-invariant Feature Descriptor. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5606115. [Google Scholar] [CrossRef]
  33. Goforth, H.; Lucey, S. GPS-denied UAV localization using pre-existing satellite imagery. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2974–2980. [Google Scholar]
Figure 1. Evaluation dataset.
Figure 2. Workflow of the benchmarking.
Figure 3. Workflow of the Geographic Geometric Consistency (GGC) method.
Figure 4. Distribution of repeatability.
Figure 5. Distribution of FDR.
Figure 6. Distribution of results of matching and consistency verification ((ac) are for MSR; (df) are for RMSE).
Figure 7. Examples of mismatch and correct matches of RANSAC.
Figure 8. (a) Overview of the UAV localization dataset flight path. (b) Some examples of UAV images and their corresponding satellite images [33].
Figure 9. Partial matching results of four matching algorithms on UAV localization dataset.
Table 1. Statistical results (mean values) of repeatability.

Dataset | HAPCG’s Harris | LNIFT’s ORB | RIFT’s FAST | WSSF’s KAZE
JJ | 0.48 | 0.49 | 0.47 | 0.41
YX | 0.47 | 0.47 | 0.39 | 0.40
SS | 0.49 | 0.49 | 0.45 | 0.39
Total | 0.48 | 0.48 | 0.44 | 0.39
Table 2. Statistical results of FDR.

Dataset | HAPCG’s Log-Polar | LNIFT’s HOG-like | RIFT’s SIFT-like | WSSF’s GLOH
JJ | 0.54 | 0.54 | 0.63 | 0.51
YX | 0.61 | 0.55 | 0.55 | 0.53
SS | 0.57 | 0.58 | 0.66 | 0.56
Total | 0.57 | 0.57 | 0.64 | 0.55
Table 3. Statistical results of matching and consistency verification (dataset: Total; 3, 5, 7, and 9 are RANSAC distance thresholds).

Algorithms | MSR (%) @ 3 / 5 / 7 / 9 | RMSE (pixel) @ 3 / 5 / 7 / 9
HAPCG | 85.12 / 90.62 / 91.93 / 92.51 | 1.82 / 2.84 / 3.64 / 4.27
LNIFT | 70.40 / 80.78 / 85.07 / 86.48 | 1.92 / 3.18 / 4.42 / 5.25
RIFT | 71.30 / 76.74 / 78.26 / 79.72 | 1.88 / 2.80 / 3.58 / 4.29
WSSF | 82.19 / 87.48 / 89.58 / 90.99 | 1.87 / 3.01 / 3.95 / 4.75
Mean | 77.25 / 83.91 / 86.21 / 87.46 | 1.87 / 2.96 / 3.90 / 4.64
Table 4. Statistical results of GGC.

Algorithms | JJ | YX | SS | Total
HAPCG | 2.94% | 10.36% | 8.19% | 8.01%
LNIFT | 7.35% | 10.95% | 6.07% | 7.07%
RIFT | 9.80% | 11.24% | 8.63% | 9.22%
WSSF | 2.45% | 10.36% | 4.17% | 5.08%
Mean | 5.64% | 10.73% | 6.77% | 7.35%
Table 5. Statistical results of the DTI (ETR stands for Execution Time Ratio; dataset: Total).

Metrics | HAPCG | LNIFT | RIFT | WSSF | Mean
MSR (%) | 92.29 | 86.47 | 79.71 | 90.68 | 87.29
RMSE (pixel) | 2.04 | 2.51 | 2.14 | 2.17 | 2.22
ETR | 1.32 | 1.63 | 1.73 | 1.40 | 1.52
Table 6. Results of combination of advantageous components (dataset: Total; 3, 5, 7, and 9 are distance thresholds).

Algorithm | MSR (%) @ 3 / 5 / 7 / 9 | RMSE (pixel) @ 3 / 5 / 7 / 9
HAPCG’s Harris + WSSF’s GLOH | 79.62 / 86.44 / 89.00 / 89.47 | 1.88 / 2.89 / 3.76 / 4.48
Table 7. MSR values of comprehensive combinations of algorithms (3 is the fixed distance threshold; dataset: Total).

Algorithm Combinations | 3 (%) | DTI (%)
HAPCG + LNIFT | 89.11 | 95.39
HAPCG + RIFT | 89.42 | 94.24
HAPCG + WSSF | 89.58 | 95.39
LNIFT + RIFT | 82.71 | 90.73
LNIFT + WSSF | 87.22 | 93.92
RIFT + WSSF | 87.11 | 92.56
HAPCG + LNIFT + RIFT | 91.35 | 95.81
HAPCG + LNIFT + WSSF | 91.78 | 96.44
LNIFT + RIFT + WSSF | 89.31 | 94.55
HAPCG + LNIFT + RIFT + WSSF | 92.88 | 96.75
Table 8. Runtime of each algorithm (units are seconds).

Algorithms | JJ | YX | SS | Total
HAPCG | 11.68 | 10.99 | 11.43 | 11.37
LNIFT | 15.41 | 16.02 | 15.20 | 15.54
RIFT | 8.51 | 9.39 | 8.92 | 8.94
WSSF | 31.56 | 32.27 | 31.28 | 31.70
Table 9. Matching and positioning results.

Algorithms | MSR (%) | Runtime (s) | RMSE (pixel) | Positioning Error (m)
HAPCG | 100 | 4.01 | 1.84 | 0.83
LNIFT | 100 | 9.68 | 1.99 | 0.90
RIFT | 100 | 4.07 | 1.75 | 0.79
WSSF | 100 | 17.76 | 1.85 | 0.83
HAPCG + RIFT | 100 | 4.42 | 1.74 | 0.78
HAPCG + RIFT + LNIFT | 100 | 8.33 | 1.72 | 0.78
HAPCG + LNIFT + RIFT + WSSF | 100 | 15.41 | 1.72 | 0.78
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
