A New Approach for Super Resolution Object Detection Using an Image Slicing Algorithm and the Segment Anything Model
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Datasets
3.1.1. xView Dataset
3.1.2. VisDrone Dataset
3.2. Proposed Image Slicing Algorithm (ISA)
3.3. Segment Anything Model (SAM)
3.4. Proposed Super-Resolution Object Detection (SROD)
4. Experimental Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dasiopoulou, S.; Mezaris, V.; Kompatsiaris, I.; Papastathis, V.K.; Strintzis, M.G. Knowledge-assisted semantic video object detection. IEEE Trans. Circuits Syst. Video Technol. 2005, 15, 1210–1224. [Google Scholar] [CrossRef]
- Pesaresi, M.; Benediktsson, J.A. A new approach for the morphological segmentation of high-resolution satellite imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39, 309–320. [Google Scholar] [CrossRef]
- Mansour, A.; Hussein, W.M.; Said, E. Small objects detection in satellite images using deep learning. In Proceedings of the 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 8–10 December 2019; pp. 86–91. [Google Scholar]
- Chen, W.; Li, Y.; Tian, Z.; Zhang, F. 2D and 3D object detection algorithms from images: A Survey. Array 2023, 19, 100305. [Google Scholar] [CrossRef]
- Yang, C.; Huang, Z.; Wang, N. Querydet: Cascaded sparse query for accelerating high-resolution small object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13668–13677. [Google Scholar]
- Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 22 February–1 March 2022; Volume 36, pp. 1792–1800. [Google Scholar]
- Wan, D.; Lu, R.; Wang, S.; Shen, S.; Xu, T.; Lang, X. Yolo-hr: Improved yolov5 for object detection in high-resolution optical remote sensing images. Remote Sens. 2023, 15, 614. [Google Scholar] [CrossRef]
- Ming, Q.; Miao, L.; Zhou, Z.; Song, J.; Dong, Y.; Yang, X. Task interleaving and orientation estimation for high-precision oriented object detection in aerial images. ISPRS J. Photogramm. Remote Sens. 2023, 196, 241–255. [Google Scholar] [CrossRef]
- Tian, Z.; Huang, J.; Yang, Y.; Nie, W. KCFS-YOLOv5: A high-precision detection method for object detection in aerial remote sensing images. Appl. Sci. 2023, 13, 649. [Google Scholar] [CrossRef]
- Fang, Y.; Yang, S.; Wang, S.; Ge, Y.; Shan, Y.; Wang, X. Unleashing vanilla vision transformer with masked image modeling for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6244–6253. [Google Scholar]
- Bosquet, B.; Cores, D.; Seidenari, L.; Brea, V.M.; Mucientes, M.; Del Bimbo, A. A full data augmentation pipeline for small object detection based on generative adversarial networks. Pattern Recognit. 2023, 133, 108998. [Google Scholar] [CrossRef]
- Zhang, J.; Lei, J.; Xie, W.; Fang, Z.; Li, Y.; Du, Q. SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605415. [Google Scholar] [CrossRef]
- Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 966–970. [Google Scholar]
- Olamofe, J.; Dong, X.; Qian, L.; Shields, E. Performance Evaluation of Data Augmentation for Object Detection in XView Dataset. In Proceedings of the 2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA), San Antonio, TX, USA, 5–7 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 26–33. [Google Scholar]
- Shen, Y.; Liu, D.; Zhang, F.; Zhang, Q. Fast and accurate multi-class geospatial object detection with large-size remote sensing imagery using CNN and Truncated NMS. ISPRS J. Photogramm. Remote Sens. 2022, 191, 235–249. [Google Scholar] [CrossRef]
- Lin, J.; Lin, H.; Wang, F. STPM_SAHI: A Small-Target forest fire detection model based on Swin Transformer and Slicing Aided Hyper inference. Forests 2022, 13, 1603. [Google Scholar] [CrossRef]
- Shen, Y.; Liu, D.; Chen, J.; Wang, Z.; Wang, Z.; Zhang, Q. On-board multi-class geospatial object detection based on convolutional neural network for High Resolution Remote Sensing Images. Remote Sens. 2023, 15, 3963. [Google Scholar] [CrossRef]
- Pereira, A.; Santos, C.; Aguiar, M.; Welfer, D.; Dias, M.; Ribeiro, M. Improved Detection of Fundus Lesions Using YOLOR-CSP Architecture and Slicing Aided Hyper Inference. IEEE Lat. Am. Trans. 2023, 21, 806–813. [Google Scholar] [CrossRef]
- Akshatha, K.R.; Karunakar, A.K.; Shenoy, S.; Dhareshwar, C.V.; Johnson, D.G. Manipal-UAV person detection dataset: A step towards benchmarking dataset and algorithms for small object detection. ISPRS J. Photogramm. Remote Sens. 2023, 195, 77–89. [Google Scholar]
- Wang, X.; He, N.; Hong, C.; Wang, Q.; Chen, M. Improved YOLOX-X based UAV aerial photography object detection algorithm. Image Vis. Comput. 2023, 135, 104697. [Google Scholar] [CrossRef]
- Zhang, H.; Hao, C.; Song, W.; Jiang, B.; Li, B. Adaptive slicing-aided hyper inference for small object detection in high-resolution remote sensing images. Remote Sens. 2023, 15, 1249. [Google Scholar] [CrossRef]
- Muzammul, M.; Algarni, A.M.; Ghadi, Y.Y.; Assam, M. Enhancing UAV aerial image analysis: Integrating advanced SAHI techniques with real-time detection models on the VisDrone dataset. IEEE Access 2024, 12, 21621–21633. [Google Scholar] [CrossRef]
- Lam, D.; Kuzma, R.; McGee, K.; Dooley, S.; Laielli, M.; Klaric, M.; Bulatov, Y.; McCord, B. xView: Objects in context in overhead imagery. arXiv 2018, arXiv:1802.07856. [Google Scholar]
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4015–4026. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ke, L.; Ye, M.; Danelljan, M.; Tai, Y.W.; Tang, C.K.; Yu, F. Segment anything in high quality. In Proceedings of the Thirty-seventh Annual Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 36. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016; pp. 779–788. [Google Scholar]
- Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P.; Ferriday, R.; et al. ultralytics/yolov5: v3.0. Zenodo, 2020. Available online: https://ui.adsabs.harvard.edu/abs/2020zndo...3983579J/abstract (accessed on 29 June 2024).
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972. [Google Scholar]
- Wang, C.Y.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef] [PubMed]
Main characteristics of the xView dataset:

| Feature | Explanation |
|---|---|
| Scope | Diverse geographical regions and environmental conditions around the world |
| Resolution | High-resolution satellite imagery with a ground sampling distance of 30 cm per pixel |
| Labeling | More than 1 million labeled object instances |
| Object Classes | 60 object classes |
| Applications | Military, urban planning, disaster management, and environmental monitoring |
Main characteristics of the VisDrone dataset:

| Feature | Explanation |
|---|---|
| Scope and diversity | Images captured under a wide range of weather, lighting, and environmental conditions |
| Image quality and resolution | High-resolution, detailed images |
| Labeling | Hundreds of thousands of labeled instances covering more than 10 object classes |
| Tasks | Object detection, object tracking, and video analysis |
| Applications | Security monitoring, traffic management, disaster response, and urban planning |
Detection results on the xView dataset (all values in %):

| Model | Precision | Recall | mAP_0.5 | mAP_0.5:0.95 | F1-Score |
|---|---|---|---|---|---|
| ISA-YOLOv5 | 59.9 | 44.5 | 45.6 | 30.1 | 52.1 |
| ISA-SAM-YOLOv5 | 60.5 | 46.3 | 48.1 | 33.1 | 53.2 |
| ISA-SAM-SROD-YOLOv5 | 64.3 | 52.5 | 54.4 | 46.1 | 58.3 |
| ISA-YOLOv7 | 42.4 | 25.2 | 28.3 | 17.4 | 35.9 |
| ISA-SAM-YOLOv7 | 48.2 | 27.9 | 30.3 | 21.1 | 37.8 |
| ISA-SAM-SROD-YOLOv7 | 52.9 | 32.5 | 35.7 | 23.6 | 42.9 |
| ISA-YOLOv8 | 53.7 | 31.2 | 33.5 | 22.6 | 42.1 |
| ISA-SAM-YOLOv8 | 54.0 | 35.3 | 36.2 | 24.1 | 44.0 |
| ISA-SAM-SROD-YOLOv8 | 61.4 | 39.8 | 41.2 | 28.5 | 54.3 |
| ISA-YOLOv9 | 45.3 | 27.1 | 27.9 | 17.1 | 35.9 |
| ISA-SAM-YOLOv9 | 46.3 | 28.6 | 29.8 | 20.0 | 37.2 |
| ISA-SAM-SROD-YOLOv9 | 50.5 | 30.9 | 33.6 | 21.9 | 40.7 |
Detection results on the VisDrone dataset (all values in %):

| Model | Precision | Recall | mAP_0.5 | mAP_0.5:0.95 | F1-Score |
|---|---|---|---|---|---|
| ISA-YOLOv5 | 65.5 | 55.3 | 53.1 | 39.9 | 59.7 |
| ISA-SAM-YOLOv5 | 67.3 | 56.4 | 55.8 | 43.2 | 61.7 |
| ISA-SAM-SROD-YOLOv5 | 71.9 | 65.5 | 67.7 | 49.1 | 68.3 |
| ISA-YOLOv7 | 68.1 | 64.8 | 67.4 | 44.0 | 66.4 |
| ISA-SAM-YOLOv7 | 76.3 | 68.5 | 71.8 | 47.9 | 72.4 |
| ISA-SAM-SROD-YOLOv7 | 81.2 | 70.6 | 75.6 | 53.5 | 75.9 |
| ISA-YOLOv8 | 74.9 | 64.9 | 66.5 | 52.8 | 69.7 |
| ISA-SAM-YOLOv8 | 80.1 | 61.4 | 69.6 | 55.4 | 70.6 |
| ISA-SAM-SROD-YOLOv8 | 80.4 | 74.4 | 77.5 | 63.8 | 77.9 |
| ISA-YOLOv9 | 59.3 | 45.8 | 47.8 | 35.3 | 51.9 |
| ISA-SAM-YOLOv9 | 67.1 | 54.8 | 59.4 | 42.7 | 60.7 |
| ISA-SAM-SROD-YOLOv9 | 71.6 | 63.2 | 65.1 | 48.7 | 67.2 |
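The results tables above report Precision, Recall, mAP_0.5 (mean average precision at an IoU threshold of 0.5), mAP_0.5:0.95 (AP averaged over IoU thresholds from 0.5 to 0.95), and F1-Score, all in percent. As a quick illustration of how the F1-Score column relates to the Precision and Recall columns, the following minimal sketch (not the authors' evaluation code) computes the harmonic mean of the two; the published values can differ slightly because they are typically obtained per class at a model-specific confidence threshold.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example with the ISA-SAM-SROD-YOLOv5 row on xView (Precision 64.3, Recall 52.5):
print(round(f1_score(64.3, 52.5), 1))  # ~57.8, close to the reported 58.3
```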
Comparison of the proposed models with related studies on the VisDrone and xView datasets (values in %):

| Model Name | Dataset | Number of Classes | mAP_0.5 | mAP_0.5:0.95 |
|---|---|---|---|---|
| Akyon et al. [13] | VisDrone | 10 | 66.4 | 42.2 |
| Akshatha et al. [19] | VisDrone | 10 | 58.3 | – |
| Muzammul et al. [22] | VisDrone | 10 | 73.7 | 54.8 |
| ISA-SAM-SROD-YOLOv8 | VisDrone | 10 | 77.5 | 63.8 |
| Akyon et al. [13] | xView | 60 | 23.6 | 14.9 |
| Olamofe et al. [14] | xView | 6 | 44.7 | 65.9 * |
| ISA-SAM-SROD-YOLOv5 | xView | 60 | 54.4 | 46.1 |

* Reported as mAR (mean average recall) rather than mAP_0.5:0.95.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Telçeken, M.; Akgun, D.; Kacar, S.; Bingol, B. A New Approach for Super Resolution Object Detection Using an Image Slicing Algorithm and the Segment Anything Model. Sensors 2024, 24, 4526. https://doi.org/10.3390/s24144526