Lightweight Single-Stage Ship Object Detection Algorithm for Unmanned Surface Vessels Based on Improved YOLOv5
Abstract
1. Introduction
- To address the high number of parameters in the backbone of the YOLOv5s network, we introduce an improved ShuffleNetV2 network as the backbone. This modification attempts to maintain the model’s feature extraction capability while reducing the parameter count.
- We design a split-DLKA attention module that leverages large kernels to expand the receptive field and deformable convolution to adaptively adjust the sampling shape of the kernel. This enhances the network’s adaptability to targets of different shapes, making it well suited to detecting vessels of varying sizes in maritime scenarios.
- By incorporating the WIOUv3 loss function into the network, we reduce the impact of low-quality samples on the model, which improves detection accuracy.
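The channel split and channel shuffle operations at the heart of the ShuffleNetV2 unit referenced above can be illustrated with a minimal pure-Python sketch (channel indices stand in for feature-map channels; this is an illustration of the general technique, not the paper’s implementation):

```python
def channel_split(channels):
    """ShuffleNetV2 channel split: halve the channel list into two branches."""
    c = len(channels) // 2
    return channels[:c], channels[c:]

def channel_shuffle(channels, groups=2):
    """Channel shuffle: view the channels as a (groups x per_group) grid and
    read it column-wise, interleaving the branches so information can flow
    between them in the next unit."""
    per_group = len(channels) // groups
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]

# Toy example with 8 channels: split into two 4-channel branches,
# process each branch (identity here), concatenate, then shuffle.
left, right = channel_split(list(range(8)))
merged = left + right
print(channel_shuffle(merged))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

The shuffle is what lets the two branches exchange information despite each unit convolving only half of the channels.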
2. Related Work
2.1. Attention Mechanism in Ship Detection
2.2. Attention Mechanism Combined with Other Improvements
2.3. Data Enhancement to Prevent Overfitting
2.4. Lightweight Improvements
3. Introduction to YOLOv5s Algorithm
- Input: The input layer processes raw images, resizing them to a standard dimension and normalizing the pixel values. This process prepares the data for further processing by the model.
- Backbone: YOLOv5’s backbone is based on the cross-stage partial network (CSPNet) architecture. The main purpose of the backbone in YOLOv5 is to extract features from the input image through a series of convolutional layers. CSPNet helps enhance the model’s learning capability and reduces computational complexity by partitioning the feature map of the base layer into two parts and then merging them through a cross-stage hierarchy.
- Neck: The neck network of YOLOv5 employs the path aggregation network (PANet) as its feature fusion module. PANet enhances feature fusion through both top–down and bottom–up pathways, which enables the network to fully leverage feature information from different levels. This design ensures effective utilization of multi-scale features, which improves the detection of objects of various sizes and shapes.
- Head: The head of YOLOv5 contains three detection heads, which can predict objects at multiple scales simultaneously, thereby improving both accuracy and efficiency. Note that YOLOv5 uses CIOU loss as the bounding box loss function and weighted non-maximum suppression (NMS). CIOU loss improves bounding box regression by considering the overlap area, center point distance, and aspect ratio, and the weighted NMS increases the suppression weight of high-confidence bounding boxes, thereby helping to obtain more reliable results.
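The CIoU loss described above combines the IoU overlap, the normalized center-point distance, and an aspect-ratio consistency term. A minimal sketch, assuming boxes in (x1, y1, x2, y2) format:

```python
import math

def ciou_loss(box1, box2):
    """CIoU loss = 1 - IoU + rho^2/c^2 + alpha*v for boxes (x1, y1, x2, y2)."""
    # Intersection and union
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter + 1e-9)

    # Squared distance between box centers
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2

    # Squared diagonal of the smallest enclosing box
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # Aspect-ratio consistency term
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is (numerically) zero; for non-overlapping boxes it exceeds 1 because the center-distance penalty remains active even when IoU is 0, which is what makes CIoU a better regression signal than plain IoU loss.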
4. Proposed Method
4.1. Backbone
4.1.1. ShuffleNetV2 Backbone
4.1.2. Activation Function
4.2. Attention Split-DLKA Module
4.3. Loss Function
5. Experiment
5.1. Experimental Dataset
5.2. Experimental Platform and Parameter Settings
5.3. Evaluation Metrics
5.4. Experimental Results
5.5. Comparison with Other Algorithms
5.6. Ablation Experiment
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yuan, J.; Cai, Z.; Wang, S.; Kong, X. A Multitype Feature Perception and Refined Network for Spaceborne Infrared Ship Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4100311. [Google Scholar] [CrossRef]
- Iwin Thanakumar Joseph, S.; Sasikala, J.; Sujitha Juliet, D. Ship detection and recognition for offshore and inshore applications: A survey. Int. J. Intell. Unmanned Syst. 2019, 7, 177–188. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Shen, X.; Wang, H.; Cui, T.; Guo, Z.; Fu, X. Multiple information perception-based attention in YOLO for underwater object detection. Vis. Comput. 2024, 40, 1415–1438. [Google Scholar] [CrossRef]
- Liu, D.; Zhang, Y.; Zhao, Y.; Shi, Z.; Zhang, J.; Zhang, Y.; Ling, F.; Zhang, Y. AARN: Anchor-guided attention refinement network for inshore ship detection. IET Image Process. 2023, 17, 2225–2237. [Google Scholar] [CrossRef]
- Guo, Y.; Yu, H.; Ma, L.; Zeng, L.; Luo, X. THFE: A Triple-hierarchy Feature Enhancement method for tiny boat detection. Eng. Appl. Artif. Intell. 2023, 123, 106271. [Google Scholar] [CrossRef]
- Yao, T.; Zhang, B.; Gao, Y.; Ren, Y.; Wang, Z. A Feature Enhanced Scale-adaptive Convolutional Network for Ship Detection in Maritime Surveillance. In Proceedings of the 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Port Macquarie, Australia, 28 November–1 December 2023; pp. 97–104. [Google Scholar]
- Li, J.; Li, G.; Jiang, H.; Guo, W.; Gong, C. An Efficient Enhanced-YOLOv5 Algorithm for Multi-scale Ship Detection. In Proceedings of the Neural Information Processing: 30th International Conference, ICONIP 2023, Changsha, China, 20–23 November 2023; Proceedings, Part VI. Springer: Changsha, China, 2023; pp. 252–263. [Google Scholar]
- Li, Y.; Yuan, H.; Wang, Y.; Xiao, C. GGT-YOLO: A Novel Object Detection Algorithm for Drone-Based Maritime Cruising. Drones 2022, 6, 335. [Google Scholar] [CrossRef]
- Zheng, J.; Liu, Y. A Study on Small-Scale Ship Detection Based on Attention Mechanism. IEEE Access 2022, 10, 77940–77949. [Google Scholar] [CrossRef]
- Li, H.; Deng, L.; Yang, C.; Liu, J.; Gu, Z. Enhanced YOLO v3 Tiny Network for Real-Time Ship Detection from Visual Image. IEEE Access 2021, 9, 16692–16706. [Google Scholar] [CrossRef]
- Zhao, X.; Song, Y.; Shi, S.; Li, S. Improving YOLOv5n for lightweight ship target detection. In Proceedings of the 2023 IEEE 3rd International Conference on Computer Systems (ICCS), Qingdao, China, 22–24 September 2023; pp. 110–115. [Google Scholar]
- Lv, J.; Chen, J.; Huang, Z.; Wan, H.; Zhou, C.; Wang, D.; Wu, B.; Sun, L. An Anchor-Free Detection Algorithm for SAR Ship Targets with Deep Saliency Representation. Remote Sens. 2022, 15, 103. [Google Scholar] [CrossRef]
- Ye, Y.; Zhen, R.; Shao, Z.; Pan, J.; Lin, Y. A Novel Intelligent Ship Detection Method Based on Attention Mechanism Feature Enhancement. J. Mar. Sci. Eng. 2023, 11, 625. [Google Scholar] [CrossRef]
- Xing, B.; Wang, W.; Qian, J.; Pan, C.; Le, Q. A Lightweight Model for Real-Time Monitoring of Ships. Electronics 2023, 12, 3804. [Google Scholar] [CrossRef]
- Zhang, Q.; Huang, Y.; Song, R. A Ship Detection Model Based on YOLOX with Lightweight Adaptive Channel Feature Fusion and Sparse Data Augmentation. In Proceedings of the 2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Madrid, Spain, 29 November–2 December 2022; pp. 1–8. [Google Scholar]
- Gao, Z.; Zhang, Y.; Wang, S. Lightweight Small Ship Detection Algorithm Combined with Infrared Characteristic Analysis for Autonomous Navigation. J. Mar. Sci. Eng. 2023, 11, 1114. [Google Scholar] [CrossRef]
- Chen, H.; Xue, J.; Wen, H.; Hu, Y.; Zhang, Y. EfficientShip: A Hybrid Deep Learning Framework for Ship Detection in the River. CMES-Comput. Model. Eng. Sci. 2024, 138, 301. [Google Scholar] [CrossRef]
- Qiu, X.; Han, F.; Zhao, W. Anti-Attention Mechanism: A Module for Channel Correction in Ship Detection. In Proceedings of the 2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE), Changchun, China, 29–31 December 2023; pp. 498–504. [Google Scholar]
- Li, Z.; Deng, Z.; Hao, K.; Zhao, X.; Jin, Z. A Ship Detection Model Based on Dynamic Convolution and an Adaptive Fusion Network for Complex Maritime Conditions. Sensors 2024, 24, 859. [Google Scholar] [CrossRef]
- Zheng, J.; Zhao, S.; Xu, Z.; Zhang, L.; Liu, J. Anchor boxes adaptive optimization algorithm for maritime object detection in video surveillance. Front. Mar. Sci. 2023, 10, 1290931. [Google Scholar] [CrossRef]
- Shi, H.; Hu, Y.; Zhang, H. An Improved YOLOX Loss Function Applied to Maritime Video Surveillance. In Proceedings of the 2023 8th International Conference on Image, Vision and Computing (ICIVC), Dalian, China, 27–29 July 2023; pp. 633–638. [Google Scholar]
- Zhang, L.; Du, X.; Zhang, R.; Zhang, J. A Lightweight Detection Algorithm for Unmanned Surface Vehicles Based on Multi-Scale Feature Fusion. J. Mar. Sci. Eng. 2023, 11, 1392. [Google Scholar] [CrossRef]
- Zheng, Y.; Zhang, Y.; Qian, L.; Zhang, X.; Diao, S.; Liu, X.; Cao, J.; Huang, H. A lightweight ship target detection model based on improved YOLOv5s algorithm. PLoS ONE 2023, 18, e0283932. [Google Scholar] [CrossRef] [PubMed]
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 122–138. [Google Scholar]
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
- Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. SeaShips: A Large-Scale Precisely Annotated Dataset for Ship Detection. IEEE Trans. Multimed. 2018, 20, 2593–2604. [Google Scholar] [CrossRef]
- Wang, H.; Yao, M.; Jiang, G.; Mi, Z.; Fu, X. Graph-Collaborated Auto-Encoder Hashing for Multiview Binary Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 10121–10133. [Google Scholar] [CrossRef] [PubMed]
- Wang, H.; Yao, M.; Chen, Y.; Xu, Y.; Liu, H.; Jia, W.; Fu, X.; Wang, Y. Manifold-based Incomplete Multi-view Clustering via Bi-Consistency Guidance. IEEE Trans. Multimed. 2024, 3405650. [Google Scholar] [CrossRef]
Layer | Output Size | Kernel Size | Stride | Repeat | Channels |
---|---|---|---|---|---|
Input | 224 × 224 | | | | 3 |
Conv1 | 112 × 112 | 3 × 3 | 2 | 1 | 24 |
MaxPool | 56 × 56 | 3 × 3 | 2 | 1 | 24 |
Stage 2 | 28 × 28 | | 2, 1 | 1, 3 | 116 |
Stage 3 | 14 × 14 | | 2, 1 | 1, 3 | 232 |
Stage 4 | 7 × 7 | | 2, 1 | 1, 3 | 464 |
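The output sizes in the backbone table follow directly from the stride schedule: starting from the 224 × 224 input, each of the five stride-2 stages (Conv1, MaxPool, and the first unit of Stages 2–4) halves the spatial resolution:

```python
# Each stride-2 stage halves the spatial resolution of the feature map.
size = 224
sizes = []
for layer in ["Conv1", "MaxPool", "Stage 2", "Stage 3", "Stage 4"]:
    size //= 2
    sizes.append(size)

print(sizes)  # [112, 56, 28, 14, 7]
```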
Item | Environment |
---|---|
CPU | 12th Gen Intel(R) Core(TM) i7-12700F |
GPU | NVIDIA GeForce RTX 3060 |
RAM | 16 GB |
CUDA | 12.1 |
Framework | PyTorch 2.1.2 |
Evaluation Metrics | YOLOv5s | Ours |
---|---|---|
Parameters | 7.04 M | 2.02 M |
FLOPs | 15.8 G | 6.5 G |
Weight size | 14.4 MB | 4.4 MB |
Precision | 0.84 | 0.89 |
Recall | 0.83 | 0.84 |
F1 | 0.841 | 0.864 |
[email protected] | 0.885 | 0.920 |
[email protected]:0.95 | 0.545 | 0.563 |
FPS (frame/s) | 212 | 192 |
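As a quick sanity check on the table, F1 is the harmonic mean of precision and recall; for the improved model, 0.89 precision and 0.84 recall reproduce the reported 0.864:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.89, 0.84), 3))  # 0.864, matching the table
```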
Method | Preprocess | Inference | NMS |
---|---|---|---|
YOLOv5s | 0.3 ms | 3.7 ms | 0.7 ms |
Ours | 0.3 ms | 4.1 ms | 0.8 ms |
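These per-stage times are consistent with the FPS figures reported above: throughput is 1000 ms divided by the summed per-image latency, truncated to an integer:

```python
def fps(preprocess_ms, inference_ms, nms_ms):
    """Throughput implied by the summed per-image latency."""
    return int(1000 / (preprocess_ms + inference_ms + nms_ms))

print(fps(0.3, 3.7, 0.7))  # YOLOv5s: 212
print(fps(0.3, 4.1, 0.8))  # improved model: 192
```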
Model | Parameters | FLOPs | Precision | Weight Size | Recall | F1 | [email protected] | [email protected]:0.95 | FPS |
---|---|---|---|---|---|---|---|---|---|
YOLOv5m | 20.89 M | 48.3 G | 0.879 | 42.2 MB | 0.88 | 0.879 | 0.944 | 0.602 | 119 |
YOLOv5l | 46.17 M | 108.3 G | 0.889 | 92.8 MB | 0.89 | 0.889 | 0.946 | 0.609 | 66 |
YOLOv4-tiny | 5.89 M | 6.8 G | 0.878 | 24.3 MB | 0.45 | 0.595 | 0.689 | 0.303 | 370 |
Nanodet-plus | 0.95 M | 1.2 G | 0.667 | 2.44 MB | 0.62 | 0.642 | 0.641 | 0.341 | 55 |
YOLOv5s-EfficientNetV2 | 5.6 M | 5.6 G | 0.738 | 11.5 MB | 0.74 | 0.738 | 0.809 | 0.431 | 208 |
YOLOv9c | 25 M | 103.7 G | 0.98 | 51.8 MB | 0.97 | 0.974 | 0.99 | 0.82 | 58 |
Ours | 2.02 M | 6.5 G | 0.89 | 4.4 MB | 0.84 | 0.864 | 0.920 | 0.563 | 192 |
Model | Parameters | FLOPs | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|---|---|
YOLOv5s | 7.04 M | 15.8 G | 0.84 | 0.83 | 0.885 | 0.545 |
YOLOv5s + ShuffleNetV2 | 1.8 M | 3.9 G | 0.79 | 0.82 | 0.877 | 0.504 |
YOLOv5s + ShuffleNetV2 (LeakyReLU) | 1.8 M | 3.9 G | 0.85 | 0.79 | 0.888 | 0.508 |
LeakyShuffle | Split-DLKA | WIOU | Parameters | FLOPs | Precision | Recall | [email protected] | [email protected]:0.95 | F1 |
---|---|---|---|---|---|---|---|---|---|
 | | | 7.04 M | 15.8 G | 0.84 | 0.83 | 0.885 | 0.545 | 0.834 |
√ | | | 1.8 M | 3.9 G | 0.85 | 0.79 | 0.888 | 0.508 | 0.818 |
 | √ | | 7.4 M | 21.2 G | 0.84 | 0.84 | 0.907 | 0.552 | 0.840 |
 | | √ | 7.04 M | 15.8 G | 0.85 | 0.84 | 0.909 | 0.549 | 0.844 |
√ | √ | | 2.02 M | 6.5 G | 0.85 | 0.84 | 0.905 | 0.537 | 0.844 |
√ | | √ | 1.8 M | 3.9 G | 0.82 | 0.79 | 0.855 | 0.480 | 0.804 |
 | √ | √ | 7.4 M | 21.2 G | 0.86 | 0.82 | 0.919 | 0.529 | 0.839 |
√ | √ | √ | 2.02 M | 6.5 G | 0.89 | 0.84 | 0.920 | 0.563 | 0.864 |
Attention Mechanism | Parameters | FLOPs | FPS | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|---|---|---|
SE | 7.2 M | 16.6 G | 204 | 0.83 | 0.84 | 0.900 | 0.545 |
CA | 7.2 M | 16.7 G | 204 | 0.82 | 0.89 | 0.896 | 0.519 |
ECA | 7.2 M | 16.6 G | 217 | 0.82 | 0.85 | 0.901 | 0.538 |
simAM | 7.2 M | 16.6 G | 204 | 0.84 | 0.83 | 0.896 | 0.529 |
Split-DLKA | 7.4 M | 21.2 G | 188 | 0.84 | 0.84 | 0.907 | 0.552 |
Position | Parameters | FLOPs | FPS | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|---|---|---|
Position 1 | 2.02 M | 6.5 G | 192 | 0.85 | 0.84 | 0.905 | 0.537 |
Position 2 | 2.2 M | 5.2 G | 121 | 0.80 | 0.79 | 0.870 | 0.497 |
Position 3 | 2.2 M | 5.2 G | 69 | 0.77 | 0.82 | 0.867 | 0.498 |
α | δ | Precision | Recall | [email protected] | [email protected]:0.95 |
---|---|---|---|---|---|
2.5 | 2 | 0.84 | 0.82 | 0.891 | 0.539 |
1.9 | 3 | 0.85 | 0.85 | 0.912 | 0.560 |
1.6 | 4 | 0.83 | 0.85 | 0.901 | 0.555 |
1.4 | 5 | 0.83 | 0.84 | 0.894 | 0.545 |
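The rows above sweep the two WIoU v3 hyperparameters, α and δ, which shape the non-monotonic focusing coefficient r = β / (δ · α^(β−δ)) applied to each anchor’s loss, where β is the anchor’s outlier degree. A sketch of that coefficient, following the formulation in the Wise-IoU paper:

```python
def wiou_v3_gain(beta, alpha=1.9, delta=3):
    """Non-monotonic focusing coefficient r = beta / (delta * alpha**(beta - delta)).

    Assigns low gradient gain to both very easy anchors (small beta) and
    very low-quality anchors (large beta), limiting the harm from
    low-quality samples.
    """
    return beta / (delta * alpha ** (beta - delta))

# When beta == delta the gain is exactly 1, regardless of alpha.
print(wiou_v3_gain(3.0))  # 1.0
```

With the table’s best setting (α = 1.9, δ = 3), the gain peaks near β = δ and decays for large β, which is the mechanism that suppresses low-quality samples.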
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, H.; Zhang, W.; Yang, S.; Wang, H. Lightweight Single-Stage Ship Object Detection Algorithm for Unmanned Surface Vessels Based on Improved YOLOv5. Sensors 2024, 24, 5603. https://doi.org/10.3390/s24175603