1. Introduction
As unmanned aerial vehicle (UAV) technology becomes increasingly utilized in fields such as agriculture, urban planning, and emergency response [1,2,3,4], its positioning capability has become critical for performing various tasks. Initially, UAV positioning relied primarily on the Inertial Navigation System (INS) as a standalone method. With the rapid advancement of satellite navigation technology, modern UAV positioning often combines the INS with Global Navigation Satellite Systems (GNSSs) [5]. This combination significantly improves the accuracy and stability of positioning compared with using the INS alone. However, GNSS signals may be blocked or interfered with in denied environments, degrading localization performance. To address this challenge, researchers have begun exploring alternative positioning solutions, among which UAV visual positioning has emerged as an important alternative method [6,7,8]. This method achieves positioning by matching images captured in real time by the UAV against existing geographically referenced high-resolution satellite imagery [9].
The effectiveness of UAV visual positioning depends on the performance of image matching algorithms. Due to the differences in sensors and imaging modes between UAVs and satellites, there are significant appearance differences or nonlinear radiometric variations between their images. Traditional image matching algorithms, such as SIFT [10] and SURF [11], often struggle to handle these disparities effectively. To overcome these challenges, researchers have developed a variety of multi-source image matching algorithms, which are primarily categorized into three types: region-based, deep-learning-based, and feature-based algorithms [12,13].
Region-based algorithms achieve image matching by measuring the similarity between image blocks [14,15]. For instance, Dalen et al. [16] proposed a method for calculating UAV positions using normalized cross-correlation, which resulted in a maximum positioning error of 12.5 m. Yol et al. [17] utilized mutual information as a criterion for positional similarity, achieving horizontal positioning errors of 6.56 m and 8.02 m. Wan et al. [18] developed a method using illumination-invariant phase correlation, achieving an average positioning error of 1.31 m; this outperformed normalized cross-correlation (2.19 m) and mutual information (3.08 m), highlighting the effectiveness of the phase correlation approach. The performance of these methods depends largely on the initial position estimate and the extent of image overlap, and it degrades significantly under large image rotation angles.
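To make the region-based idea concrete, the following minimal sketch estimates the translation between two equally sized image blocks via phase correlation. It is an illustrative implementation rather than the method of [18]; practical systems add windowing, sub-pixel peak interpolation, and illumination compensation.

```python
import numpy as np

def phase_correlation(ref, tpl):
    """Estimate the (dx, dy) translation between two equally sized image
    blocks via phase correlation: the normalized cross-power spectrum's
    inverse FFT peaks at the relative shift."""
    F_ref = np.fft.fft2(ref)
    F_tpl = np.fft.fft2(tpl)
    cross_power = F_ref * np.conj(F_tpl)
    cross_power /= np.abs(cross_power) + 1e-12   # keep phase information only
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the block size to negative offsets
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dx, dy
```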
With the advancement of deep learning technology, deep-learning-based multi-source image matching algorithms have also gained extensive research interest in UAV visual positioning [19]. For example, Zhang et al. [20] proposed a deep-learning-based local feature matching algorithm that extracts features from UAV infrared and satellite imagery, achieving positioning accuracies ranging from 1.92 to 7.78 m. Mugha et al. [21] developed a deep-learning-based feature point extraction method that utilizes point-to-point template matching to achieve a positioning error of 3.7 m. Wu et al. [22] introduced a multi-level Lucas-Kanade deep learning algorithm that leverages global texture features of UAV imagery, with positioning accuracies between 4.33 and 9.8 m. Although deep-learning-based image matching algorithms demonstrate significant potential in UAV positioning, their performance is constrained by large geometric deformations and large data volumes.
Feature-based multi-source image matching algorithms offer a structured and highly flexible workflow, with the core components including the detector, descriptor, matching, and consistency verification [23]. For instance, Shan et al. [24] developed a positioning method combining a histogram of oriented gradients, particle filters, and optical flow, achieving a positioning error of 6.77 m. Chiu et al. [25] introduced a positioning method that combines Inertial Measurement Units (IMUs) with geographic image registration, achieving a positioning error of 9.83 m under GPS-denied conditions. Mantelli et al. [26] used the abBRIEF descriptor to match UAV images with satellite maps for UAV positioning, with a positioning error of 17.78 m. These studies in feature-based multi-source image matching primarily focused on the domain of positioning and navigation, with less exploration in the field of remote sensing. In recent years, the remote sensing field has seen the emergence of advanced structural-feature-based multi-source image matching algorithms, providing new possibilities for UAV visual positioning.
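To illustrate the four core components, the sketch below wires together a generic detector, descriptor, matcher, and RANSAC-based consistency check using OpenCV. ORB and a homography model are used here purely as placeholders; the multi-source algorithms evaluated in this paper substitute modality-robust detectors and descriptors for these stages.

```python
import cv2
import numpy as np

def match_uav_to_satellite(uav_img, sat_img, ratio=0.8):
    """Generic feature-based pipeline: detector -> descriptor -> matching
    -> consistency verification (RANSAC). ORB and a homography model are
    placeholders; multi-source algorithms swap in modality-robust stages."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(uav_img, None)   # detector + descriptor
    kp2, des2 = orb.detectAndCompute(sat_img, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = []
    for pair in knn:                                   # Lowe's ratio test
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        return None, None                              # too few matches for a homography

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects matches inconsistent with a single homography
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inlier_mask
```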
To the best of our knowledge, although the existing literature extensively discusses the principles of multi-source image matching algorithms [23,27], there is a scarcity of experimental studies, and a unified evaluation of the core components and overall performance of these algorithms is lacking, particularly concerning the image matching success rate (MSR), a critical indicator determining the reliability of UAV visual positioning systems. Moreover, due to the lack of standardization in datasets, evaluation criteria, and parameter settings, significant discrepancies exist between different research outcomes, which limits their application in UAV positioning.
For this purpose, this paper first introduces a benchmarking framework to evaluate the core components and overall performance of current advanced multi-source image matching algorithms under a uniform standard. The multi-source image matching algorithms we tested include four significant methods from recent years: HAPCG [28], LNIFT [29], RIFT [30], and WSSF [31]. HAPCG is a frequency-domain matching algorithm based on anisotropic weighted moments and absolute phase gradient histograms. It uses anisotropic filtering to compute a nonlinear scale space and employs Log-Gabor filters to generate maximum and minimum moments, producing anisotropic weighted moment maps; it then detects keypoints in the moment space using the Harris detector and, taking phase congruency values as image gradient characteristics, computes feature vectors via a log-polar descriptor framework. LNIFT is a spatial-domain multi-source image matching algorithm that transforms an image into an intermediate modal image using a local normalization filter, detects ORB keypoints, optimizes the keypoint distribution with a non-maximum suppression strategy, and describes keypoints using a HOG-like descriptor. RIFT is a frequency-domain multi-source image matching algorithm that calculates the maximum and minimum moments of an image through phase congruency, detects keypoints with the FAST detector, constructs a maximum index map from a Log-Gabor filter bank, and obtains descriptions using a SIFT-like descriptor. WSSF is a structural-saliency-based multi-source image matching algorithm that first constructs a scale space, then generates feature maps using phase congruency and local normalization filtering, extracts keypoints with the KAZE detector and a non-maximum suppression strategy, and finally computes feature vectors using an improved gradient location and orientation histogram (GLOH) descriptor.
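As an example of how an intermediate modal image can be produced, the following sketch applies a local normalization filter in the spirit of LNIFT: each pixel is standardized by the mean and standard deviation of its surrounding window, which suppresses modality-dependent brightness and contrast differences. The window size and epsilon are illustrative choices, not LNIFT's published parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_normalization(img, win=11, eps=1e-6):
    """Standardize each pixel by the mean and standard deviation of its
    local window, producing an intermediate modal image that suppresses
    modality-dependent brightness and contrast differences."""
    img = img.astype(np.float64)
    mean = uniform_filter(img, size=win)            # local mean
    sq_mean = uniform_filter(img * img, size=win)   # local mean of squares
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return (img - mean) / (std + eps)
```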
Furthermore, to address mismatches that survive consistency verification, we propose a Geographic Geometric Consistency (GGC) method that effectively identifies mismatches in RANSAC results, thereby enhancing the reliability of matching. Based on GGC, we also introduce a distance threshold iteration (DTI) method, which incrementally lowers the distance threshold to improve the MSR while preserving both precision and execution efficiency.
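The following sketch shows one way the DTI loop could be organized: matching and geometric verification are repeated over a decreasing sequence of descriptor distance thresholds, and the iteration stops as soon as the GGC check accepts the RANSAC result. The injected functions `match_fn`, `ransac_fn`, and `ggc_fn`, as well as the threshold schedule, are hypothetical stand-ins for the components described in this paper, not the exact implementation.

```python
def match_with_dti(des1, des2, kp1, kp2, match_fn, ransac_fn, ggc_fn,
                   thresholds=(0.9, 0.8, 0.7, 0.6)):
    """Illustrative sketch of distance threshold iteration (DTI): matching
    and RANSAC verification are repeated with progressively lower descriptor
    distance thresholds until the GGC check accepts the result.
    match_fn / ransac_fn / ggc_fn are caller-supplied placeholders."""
    for t in thresholds:
        matches = match_fn(des1, des2, max_dist=t)     # descriptor matching at threshold t
        model, inliers = ransac_fn(kp1, kp2, matches)  # RANSAC geometric verification
        if model is not None and ggc_fn(model, inliers):
            return model, inliers, t                   # accepted by the GGC check
    return None, None, None                            # matching failed at all thresholds
```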
4. Application
To evaluate the practical application performance of the different multi-source image matching algorithms, GGC, DTI, and the algorithm combination strategies, we used the UAV localization dataset from the paper “GPS-Denied UAV Localization using Pre-existing Satellite Imagery” for testing [33], as illustrated in Figure 8. The flight distance of this dataset exceeds 0.85 km, and the dataset consists of 17 pairs of UAV and satellite images. Given a flight speed of 10 m/s, each image frame is captured at an interval of 5 s. The matching process benefits from the ability to perform coarse predictions of image direction and position through inter-frame matching, resulting in relatively minor rotation, scale, and offset differences between image pairs. The UAV images are sized at 252 × 252 pixels, while the satellite images vary from 305 × 315 to 362 × 377 pixels.
Figure 9 displays the matching results for the first four image pairs in the dataset using the four matching algorithms. All four algorithms produce numerous matching points and achieve effective matching. The primary reason for this success is that the UAV localization dataset features rich textures and distinct structural characteristics, making it conducive to effective matching.
As shown in Table 9, all four matching algorithms achieved a 100% Match Success Rate (MSR) on this dataset, which is conducive to matching owing to its distinct structural features. Regarding positioning accuracy, the Root Mean Square Error (RMSE) for all methods was maintained within 2 pixels, corresponding to a positioning error within 0.9 m. This performance was a substantial improvement over the 7.06 m reported in the original paper, highlighting the significant potential of heterogeneous remote-sensing image matching algorithms in UAV positioning. In terms of runtime, the average runtime for the HAPCG and RIFT algorithms was about 4 s, meeting the practical requirement of positioning once every 5 s in this dataset (considering a flight speed of 10 m/s). However, the runtimes for the LNIFT and WSSF algorithms were 9.68 s and 17.76 s, respectively, which fall short of this requirement. Additionally, since the dataset was relatively straightforward, the benefits of combining algorithms were not pronounced. For instance, the positioning accuracy of the combined HAPCG + RIFT algorithms improved by only 0.01 m compared to using RIFT alone, and the combined runtime did not increase significantly. This indicates that in certain application scenarios, heterogeneous image matching algorithms from the field of remote sensing indeed offer considerable advantages for UAV positioning.
5. Conclusions
Multi-source image matching algorithms are crucial for UAV visual positioning. To this end, we developed a comprehensive evaluation framework and created a large dataset comprising 1909 pairs of UAV and satellite images from three regions. Our study also introduced a descriptor performance metric, the Feature Distance Ratio (FDR), which effectively quantifies the capabilities of different descriptors. Although the RANSAC method removes most mismatches, the occasional mismatches that remain can be fatal for UAV visual positioning. Therefore, we proposed a novel Geographic Geometric Consistency (GGC) method that effectively identifies mismatches in RANSAC results. Building on this, we developed a distance threshold iteration (DTI) method that optimizes the balance between the Match Success Rate (MSR), RMSE, and Execution Time Ratio (ETR), significantly enhancing UAV visual positioning performance.
Furthermore, we evaluated the combined effects of different core components. Although these component-level combinations did not achieve the expected results due to the high inter-component coupling within the algorithms, combining different algorithms as a whole significantly enhanced performance. With sufficient computational resources, such algorithm combinations have significant application potential in UAV visual positioning. We also compared the multi-source image matching algorithms with existing research on UAV positioning, and the results demonstrated that these algorithms achieved sub-meter-level positioning accuracy. To promote the application of multi-source image matching technologies in UAV visual positioning and to encourage further research, we have made all datasets, results, and code publicly available at the following URL:
https://github.com/LJL-UAV/UAV-Positioning (accessed on 14 August 2024).