A Multi-Level Cross-Attention Image Registration Method for Visible and Infrared Small Unmanned Aerial Vehicle Targets via Image Style Transfer
Abstract
1. Introduction
- Unlike many current methods that focus on global image registration, this paper investigates the registration of small UAV targets in localized regions. It analyzes the challenges that small UAVs pose in the registration task and proposes a robust cross-modality image registration framework designed specifically for small UAV targets.
- An innovative registration model is introduced that transforms cross-modality images into single-modality ones with SPSTN and then registers the single-modality images with MCARN. The model effectively preserves the structure of small UAVs after modality transformation and improves the extraction of important small-UAV details.
- To validate the proposed method, the network was compared with several popular image registration networks. Experimental results on common evaluation metrics show that our method outperforms other state-of-the-art methods.
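The two-stage idea above (modality transformation followed by single-modality registration) can be sketched end to end. This is a minimal illustrative toy, not the paper's SPSTN/MCARN implementation: `style_transfer` stands in for SPSTN with a simple intensity inversion, and `register_translation` stands in for MCARN with a brute-force integer-translation search; both function names are hypothetical.

```python
import numpy as np

def style_transfer(visible):
    """Stand-in for SPSTN: map a visible image to a pseudo-infrared one.
    Here a plain intensity inversion is used purely as a placeholder."""
    return 255.0 - visible

def register_translation(moving, fixed, max_shift=5):
    """Stand-in for MCARN: brute-force search for the integer translation
    that minimizes MSE between two single-modality images."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            err = np.mean((shifted - fixed) ** 2)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# Toy example: the "infrared" frame is a shifted, inverted copy of the visible one.
rng = np.random.default_rng(0)
visible = rng.uniform(0, 255, size=(32, 32))
infrared = np.roll(np.roll(255.0 - visible, 3, axis=0), -2, axis=1)

pseudo_ir = style_transfer(visible)            # cross-modality -> single-modality
dy, dx = register_translation(pseudo_ir, infrared)
print(dy, dx)  # → 3 -2
```

In the paper both stages are learned networks; the sketch only shows why collapsing the modality gap first makes the subsequent registration a single-modality similarity problem.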
2. Related Work
2.1. Cross-Modality Image Transformation
2.2. Cross-Modality Image Registration
- Low Spatial Coverage: The diminutive size of small UAV targets in imagery results in minimal pixel coverage, which restricts the availability of discernible feature information. This scarcity complicates the feature extraction process, often leading to insufficient clarity in details and a consequent difficulty in identifying robust features.
- High Variability in Viewing Geometry: The dynamic nature of UAV flight introduces substantial variability in viewing angles and elevations, which manifests as significant disparities in the perspective and geometric transformations of UAV targets within the image frame.
- Intense Environmental Interference: The occurrence of occlusions, shadows, and reflections within complex and mutable environmental contexts exerts a more pronounced influence on the registration of small UAV targets compared to other image registration scenarios, potentially leading to greater registration challenges.
3. The Cross-Modality Image Registration Network
3.1. Structure Preservation and Style Transformation Network
3.2. Cross-Attention Residual Registration Network
4. Experiments and Results
4.1. Dataset Description
4.2. Implementation Details and Metrics
4.3. Performance Analysis
4.3.1. Experiments of Modality Transformation Network
4.3.2. Experiments of Cross-Modality Registration Network
4.4. Ablation Study and Analysis
4.4.1. Ablation Experiments of the SPSTN Module
4.4.2. Ablation Experiments of the MCARN Module
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zitová, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000.
- Li, N.; Li, Y.; Jiao, J. Multimodal remote sensing image registration based on adaptive multi-scale PIIFD. Multimed. Tools Appl. 2024, 1–13.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Li, J.; Hu, Q.; Ai, M. RIFT: Multi-modal image matching based on radiation-invariant feature transform. IEEE Trans. Image Process. 2020, 29, 3296–3310.
- Xiang, Y.; Wang, F.; You, H. OS-SIFT: A robust SIFT-like algorithm for high-resolution optical-to-SAR image registration in suburban areas. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3078–3090.
- Cui, S.; Ma, A.; Wan, Y.; Zhong, Y.; Luo, B.; Xu, M. Cross-modality image matching network with modality-invariant feature representation for airborne-ground thermal infrared and visible datasets. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14.
- Deng, Y.; Ma, J. ReDFeat: Recoupling detection and description for cross-modal feature learning. IEEE Trans. Image Process. 2023, 32, 591–602.
- Tang, H.; Yuan, C.; Li, Z.; Tang, J. Learning attention-guided pyramidal features for few-shot fine-grained recognition. Pattern Recognit. 2022, 130, 108792.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Luo, Y.; Cha, H.; Zuo, L.; Cheng, P.; Zhao, Q. General cross-modality registration framework for visible and infrared UAV target image registration. Sci. Rep. 2023, 13, 12941.
- Xu, H.; Yuan, J.; Ma, J. MURF: Mutually reinforcing multi-modal image registration and fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12148–12166.
- Wang, D.; Liu, J.; Fan, X.; Liu, R. Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration. arXiv 2022, arXiv:2205.11876.
- Haskins, G.; Kruger, U.; Yan, P. Deep learning in medical image registration: A survey. Mach. Vis. Appl. 2020, 31, 8.
- Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9252–9260.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Wolterink, J.M.; Dinkla, A.M.; Savenije, M.H.; Seevinck, P.R.; van den Berg, C.A.; Išgum, I. Deep MR to CT synthesis using unpaired data. In Simulation and Synthesis in Medical Imaging: Second International Workshop, SASHIMI 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, 10 September 2017; Proceedings 2; Springer International Publishing: New York, NY, USA, 2017; pp. 14–23.
- Hu, Y.; Modat, M.; Gibson, E.; Li, W.; Ghavami, N.; Bonmati, E.; Wang, G.; Bandula, S.; Moore, C.M.; Emberton, M.; et al. Weakly-supervised convolutional neural networks for cross-modal image registration. Med. Image Anal. 2018, 49, 1–13.
- Studholme, C.; Hill, D.; Hawkes, D. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognit. 1999, 32, 71–86.
- Maes, F.; Collignon, A.; Vandermeulen, D.; Marchal, G.; Suetens, P. Cross-modality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 1997, 16, 187–198.
- Mattes, D.; Haynor, D.R.; Vesselle, H.; Lewellen, T.K.; Eubank, W. PET-CT image registration in the chest using free-form deformations. IEEE Trans. Med. Imaging 2003, 22, 120–128.
- Wells, W.M.; Viola, P.; Atsumi, H.; Nakajima, S.; Kikinis, R. Multi-modal volume registration by maximization of mutual information. Med. Image Anal. 1996, 1, 35–51.
- Myronenko, A.; Song, X. Intensity-based image registration by minimizing residual complexity. IEEE Trans. Med. Imaging 2010, 29, 1882–1891.
- Rueckert, D.; Sonoda, L.I.; Hayes, C.; Hill, D.L.G.; Leach, M.O.; Hawkes, D.J. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imaging 1999, 18, 712–721.
- Pluim, J.P.W.; Maintz, J.B.A.; Viergever, M.A. Mutual-information-based registration of medical images: A survey. IEEE Trans. Med. Imaging 2003, 22, 986–1004.
- Reddy, B.S.; Chatterji, B.N. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 1996, 5, 1266–1271.
- Wei, Z.; Jung, C.; Su, C. RegiNet: Gradient guided multispectral image registration using convolutional neural networks. Neurocomputing 2020, 415, 193–200.
- Arar, M.; Ginger, Y.; Danon, D.; Bermano, A.H.; Cohen-Or, D. Unsupervised multi-modal image registration via geometry preserving image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13410–13419.
- Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020.
| Metrics | SSIM↑ | MI↑ | MSE↓ |
|---|---|---|---|
| CycleGAN | 0.5688 | 1.3319 | 94.0021 |
| CPSTN | 0.5718 | 1.3573 | 93.8813 |
| SPSTN | 0.5747 | 1.3919 | 93.6272 |
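The three metrics in the table above are standard and straightforward to compute. The sketch below shows common discrete estimates, assuming histogram-based MI and a single-window (global) SSIM; library implementations of SSIM use local sliding windows, so exact values will differ.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images (lower is better)."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def mutual_information(a, b, bins=32):
    """MI estimated from a joint intensity histogram (higher is better)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def ssim_global(a, b, data_range=255.0):
    """Single-window SSIM over the whole image (1.0 for identical inputs)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = np.mean((a - mu_a) * (b - mu_b))
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)))

img = np.linspace(0, 255, 64 * 64).reshape(64, 64)
print(round(mse(img, img), 4), round(ssim_global(img, img), 4))  # → 0.0 1.0
```

On identical images MSE is 0 and SSIM is 1, so in the table higher SSIM/MI and lower MSE indicate the transformed pseudo-infrared image is closer to the real infrared one.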
| Method | NCC↑ | SSIM↑ | NMI↑ | MSE↓ |
|---|---|---|---|---|
| Misaligned Input | 0.7033 | 0.5747 | 0.1538 | 93.6272 |
| NEMAR | 0.7183 | 0.4808 | 0.1599 | 93.5440 |
| GCMR | 0.7175 | 0.5144 | 0.1927 | 92.9808 |
| UMF-CMGR | 0.7079 | 0.5932 | 0.1894 | 92.7404 |
| VoxelMorph | 0.6656 | 0.4691 | 0.1769 | 96.6120 |
| Ours | 0.7256 | 0.5974 | 0.1967 | 91.8245 |
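The registration tables also report NCC and NMI. A minimal sketch of both follows; note that NMI has several normalizations in the literature and the variant below, 2·I(A;B)/(H(A)+H(B)), is only one common choice, as the paper does not spell out which one it uses.

```python
import numpy as np

def ncc(a, b):
    """Zero-normalized cross-correlation; 1.0 for identical images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a**2) * np.sum(b**2)) + 1e-12))

def nmi(a, b, bins=32):
    """Normalized mutual information, NMI = 2*I(A;B) / (H(A)+H(B))."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))   # marginal entropy H(A)
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))   # marginal entropy H(B)
    hxy = -np.sum(pxy[pxy > 0] * np.log(pxy[pxy > 0]))  # joint entropy H(A,B)
    return float((hx + hy - hxy) * 2 / (hx + hy))

rng = np.random.default_rng(1)
a = rng.uniform(0, 255, size=(64, 64))
print(round(ncc(a, a), 4), round(nmi(a, a), 4))  # → 1.0 1.0
```

Both scores peak at 1.0 for a perfect alignment, which is why higher NCC/NMI in the table indicate better registration.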
| Metrics | SSIM↑ | MI↑ | MSE↓ |
|---|---|---|---|
| STN | 0.5750 | 1.3779 | 95.1424 |
| SPSTN | 0.5911 | 1.4023 | 93.5502 |
| Metrics | NCC↑ | SSIM↑ | NMI↑ | MSE↓ |
|---|---|---|---|---|
| Misaligned Input | 0.7033 | 0.5747 | 0.1538 | 93.6272 |
| MRN | 0.7128 | 0.5932 | 0.1755 | 92.9808 |
| MCARN | 0.7256 | 0.5974 | 0.1967 | 91.8245 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jiang, W.; Pan, H.; Wang, Y.; Li, Y.; Lin, Y.; Bi, F. A Multi-Level Cross-Attention Image Registration Method for Visible and Infrared Small Unmanned Aerial Vehicle Targets via Image Style Transfer. Remote Sens. 2024, 16, 2880. https://doi.org/10.3390/rs16162880