Deep Learning Techniques for Vehicle Detection and Classification from Images/Videos: A Survey
Abstract
1. Introduction
- We survey the methodologies, benchmark datasets, loss and activation functions, and optimization algorithms used in deep-learning-based vehicle detection and classification.
- We review the strategies employed in vehicle detection and classification studies built on Deep Convolutional Neural Networks.
- We address the taxonomy of deep learning approaches and other functions in object detection and classification tasks (as shown in Figure 1).
- We present promising future directions and open tasks for researchers seeking to improve deep learning schemes.
2. Deep Learning Techniques
2.1. Techniques
2.1.1. Traditional Detection Methods
- Region selection
- Feature extraction, and
- Classification (a minimal sketch of this classical pipeline is given below).
- Evaluating the edge and discretizing the image;
- Removing edge sharpness.
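As a concrete illustration of the three traditional steps (region selection, feature extraction, classification), the following minimal Python sketch slides a fixed-size window over a grayscale frame, extracts a HOG descriptor for each window, and scores it with a linear SVM. The window size, stride, HOG settings, and the pre-trained classifier `svm` are illustrative assumptions, not a reproduction of any specific surveyed method.

```python
import numpy as np
from skimage.feature import hog          # hand-crafted HOG features
from sklearn.svm import LinearSVC        # classical classifier

def sliding_windows(frame, win=(64, 64), stride=32):
    """Region selection: exhaustively slide a fixed-size window over the frame."""
    h, w = frame.shape
    for y in range(0, h - win[1] + 1, stride):
        for x in range(0, w - win[0] + 1, stride):
            yield (x, y), frame[y:y + win[1], x:x + win[0]]

def detect_vehicles(frame, svm: LinearSVC):
    detections = []
    for (x, y), patch in sliding_windows(frame):
        # Feature extraction: Histogram of Oriented Gradients descriptor.
        feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
        # Classification: keep windows the linear SVM scores as "vehicle".
        if svm.decision_function([feat])[0] > 0:
            detections.append((x, y))
    return detections
```

Pipelines of this kind are limited by the hand-crafted descriptor and the exhaustive window search, which is precisely what the CNN-based detectors in the following subsections replace.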
2.1.2. CNN-Based Two-Step Algorithms
- Produce a series of candidate frames or extract region proposals from the scene;
- Classify and regress the generated candidate frames to improve the architecture’s detection accuracy.
- Produce category-independent region proposals;
- Extract a fixed-length feature vector from each region proposal;
- Compute the confidence scores to classify the object classes using class-specific support vector machines;
- Predict the bounding-box regressor for accurate bounding-box predictions, once the object class has been classified.
- Generate a convolution feature by using various convolution and max-pooling layers on the entire image;
- Extract a fixed-length feature vector from the feature map for each object proposal of Region of Interest pooling layers;
- Feed each feature vector into a sequence of FC layers that branch into two sibling output layers: one produces softmax probability estimates over the M object classes plus one catch-all background class, and the other outputs four real-valued numbers encoding the refined bounding-box coordinates for each of the M classes. Fast R-CNN utilizes a streamlined training process with a single fine-tuning step that jointly optimizes the softmax classifier and the bounding-box regressors. A hedged inference sketch with a pre-trained two-step detector follows this list.
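To make the two-step flow concrete, the sketch below runs a pre-trained Faster R-CNN (ResNet-50 FPN backbone) from torchvision on a single image and keeps only the vehicle classes. The image path, score threshold, and the chosen COCO class indices (3 = car, 6 = bus, 8 = truck) are illustrative assumptions rather than settings from any surveyed paper.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Stage 1 (RPN) proposes candidate regions; stage 2 (RoI heads) classifies
# each proposal and regresses its bounding box.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("traffic_scene.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    output = model([to_tensor(image)])[0]

VEHICLE_IDS = {3, 6, 8}  # assumed COCO categories: car, bus, truck
for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
    if float(score) > 0.5 and int(label) in VEHICLE_IDS:
        print(int(label), [round(float(v), 1) for v in box], round(float(score), 2))
```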
2.1.3. CNN-Based Single-Step Algorithms
3. Benchmark Datasets and Performance Evaluation Metrics
3.1. Benchmark Datasets
3.2. Performance Evaluation Metrics
4. Activation Functions in Deep Learning
4.1. Loss Function in Deep Learning
4.2. Classification Loss Functions in Deep Learning
4.3. Location Loss Functions in Deep Learning
5. Optimization Algorithms in Deep Learning
- The continual decay of the effective learning rate throughout the training phase;
- The requirement for a manually selected global learning rate (both drawbacks are illustrated in the sketch below).
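A minimal NumPy sketch of the AdaGrad update illustrates both drawbacks: the squared-gradient accumulator only grows, so the effective step size decays continually, and the global learning rate `lr` must still be chosen by hand. The toy quadratic loss and the numeric values are illustrative only.

```python
import numpy as np

def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    accum += grads ** 2                            # monotonically growing accumulator
    params -= lr * grads / (np.sqrt(accum) + eps)  # effective step size keeps shrinking
    return params, accum

params = np.array([1.0, -2.0])
accum = np.zeros_like(params)
for step in range(5):
    grads = 2 * params                             # gradient of the quadratic loss ||params||^2
    params, accum = adagrad_step(params, grads, accum)
    print(step, params, 0.01 / (np.sqrt(accum) + 1e-8))  # per-parameter effective learning rate
```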
6. Application of DCNN for Vehicle Detection and Classification
6.1. Difficulties and Challenges
6.2. DL in Vehicle Detection
6.3. DL in Vehicle Classification
7. Future Directions
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
- The following abbreviations are used in this manuscript:
Abbreviation | Definition |
---|---|
Adam | Adaptive Moment Estimation |
AI | Artificial Intelligence |
BP | Back-Propagation |
CV | Computer Vision |
DCNNs | Deep Convolutional Neural Networks |
DL | Deep Learning |
DNNs | Deep Neural Networks |
EL | Ensemble Learning |
FC | Fully Connected |
GD | Gradient Descent |
GPUs | Graphics Processing Units |
HOG | Histogram of Oriented Gradients |
ITS | Intelligent Transportation System |
LBP | Local Binary Pattern |
LR | Learning Rate |
ML | Machine Learning |
OAs | Optimization Algorithms |
RCNNs | Regional Convolutional Neural Networks |
SGD | Stochastic Gradient Descent |
TL | Transfer Learning |
Appendix A
Appendix A.1
Reference | Dataset Used | Network | Findings |
---|---|---|---|
Sang et al. [139] | BIT-vehicle dataset. Training 7880. Validation 1970. Testing (CompCar dataset) 800 | YOLOv2. Model-Comp YOLOv2-vehicle | YOLOv2-vehicle has higher precision and average IOU than YOLOv2 and model-Comp. Model-Comp has a higher average IOU than YOLOv2. |
Xu et al. [79] | COCO dataset | YOLOv3. improved YOLOv3. Faster RER-CNN. Modified YOLOv3 | The modified YOLOv3 has higher average precision than improved YOLOv3, YOLOv3, and Faster RER-CNN. |
Liu et al. [80] | DETRAC dataset | Faster RCNN. EB. BFEN. BFEN + 2FC. BFEN + SLPN. BFEN + SLPN + PNW. | The BFEN + SLPN + PNW has a higher mAP than Faster RCNN, EB, BFEN, BFEN + 2FC, and BFEN + SLPN. |
Mansour et al. [140] | From JF-2 and WORLD-VIEW satellites | Faster RCNN + Inceptionv2. SSD + Inceptionv2. | Faster RCNN with Inceptionv2 has higher mAP than SSD with Inceptionv2 but has a higher operation time than SSD with Inceptionv2. |
Sowmya et al. [141] | COCO test set PASCAL VOC 07 test set | ResNet101, VGG16 RCNN(Alex) RCNN (VGG16) SPPNet YOLOv4 + DA + TL. | YOLOv4 + DA + TL has higher mAP than ResNet101, VGG16, RCNN(Alex), RCNN(VGG16), and SPPNet. |
Nguyen [81] | KITTI test set LSVH test dataset. | Faster RCNN SSD MSCNN YOLO YOLOv2 improved Faster RCNN. | The improved Faster RCNN algorithm has higher AP than the original Faster RCNN, SSD, MSCNN, YOLO, and YOLOv2 on the KITTI test set. MS-CNN has higher AP than Improved Faster RCNN on the LSVH test set. |
Wang et al. [142] | DETRAC dataset. | Faster RCNN. PN + FTN + Fusion. PN + FTN + Concat. PN + FTN + Fusion + Concat. | PN + FTN + Fusion + Concat has a higher overall mAP than Faster RCNN and PN + FTN + Fusion. |
Nguyen [83] | KITTI benchmark. PASCAL VOC 07. | DPM, Fast RCNN Faster RCNN, YOLOv2 Faster RCNN with FPN backbone MS-CNN improved Faster RCNN, SINet Multitask CNN Faster RCNN with FPN + Improving RPN + multilayer enhancement module + adaptive RoI pooling. | Faster RCNN with FPN + Improving RPN + multilayer enhancement module + adaptive RoI pooling has higher AP than DPM, Fast RCNN, Faster RCNN, YOLOv2, Faster RCNN with SPP, Improved Faster RCNN, SINet and Multitask CNN on both datasets. |
Kim et al. [143] | DETRAC test set. | DPM, RCNN, ACF, Faster RCNN2, SA-FRCNN, NANO, CompACT, MSVD-SPP. | The MSVD-SPP has higher mAP than DPM, RCNN, ACF, Faster RCNN2, SA-FRCNN, NANO, and CompACT. |
Wang et al. [144] | KITTI test set. | YOLOv2, tiny YOLOv2, tiny YOLOv3. SPPNet-YOLOv3 | SPPNet-YOLOv3 has higher mean average precision than YOLOv2, Tiny YOLOv2, and Tiny YOLOv3. |
Appendix A.2
Reference | Dataset Used | Network | Findings |
---|---|---|---|
Maungmai and Nuthong [127] | Own dataset. Training = 686. Testing = 228 | CNN. Decision tree. Random forest. DNN (Densely) | The CNN architecture has higher classification accuracy than DNN (Densely), decision tree, and random forest. |
Wang et al. [128] | Caltech256 dataset | VGG-s. VGG-verydeep-16. CS-CNN. | The CS-CNN has higher accuracy than VGG-s and VGG-very deep-16. |
Jahan et al. [130] | own dataset. Training = 2240. Testing = 560 | YOLOv3. improved YOLOv3. Faster RER-CNN. Modified YOLOv3 | The modified YOLOv3 has higher average precision than improved YOLOv3, YOLOv3, and Faster RER-CNN. |
Lee and Chung [131] | MIO-CTD dataset | AlexNet ResNet18 GoogleNet ensemble learning (AlexNet + ResNet18 + GoogleNet). | The ensemble model of AlexNet, ResNet18, and GoogleNet has a lower error rate than the benchmark models. |
Liu et al. [132] | MIO-CTD dataset | ResNet50 ResNet50-BS ResNet101 ResNet101-BS ResNet152 ResNet152-BS DCEM DCEM-BS. | The DCEM-BS has higher precision than ResNet50, ResNet50-BS, ResNet101, ResNet101-BS, ResNet152, ResNet152-BS and DCEM. ResNet152-BS has higher mean recall than ResNet50, ResNet50-BS, ResNet101, ResNet101-BS, ResNet152, and DCEM. |
Liu et al. [80] | MIO-CTD dataset | ResNet50 ResNet101 ResNet152 Inceptionv4 Inceptionv3 GEM-OE. GEM-AP. | GEM-AP has higher precision than baseline networks and GEM-OE. GEM-OE has higher precision than baseline architectures. |
Jagannathan et al. [133] | MIO-CTD dataset BIT-vehicle dataset | GAN-based deep ensemble approach tiny YOLO with SVM semi-supervised CNN PCN with Softmax TC-SF-CNNLS. | Ensemble deep learning approach has higher recall than tiny YOLO with SVM, semi-supervised CNN, PCN with Softmax, and TC-SF-CNNLS. TC-SF-CNNLS has higher recall than tiny YOLO with SVM, semi-supervised CNN, and PCN with Softmax. |
References
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Berlin, Germany, 2022. [Google Scholar]
- Hassaballah, M.; Hosny, K.M. Recent advances in computer vision. Stud. Comput. Intell. 2019, 804, 1–84. [Google Scholar]
- Javaid, S.; Zeadally, S.; Fahim, H.; He, B. Medical Sensors and Their Integration in Wireless Body Area Networks for Pervasive Healthcare Delivery: A Review. IEEE Sens. J. 2022, 22, 3860–3877. [Google Scholar] [CrossRef]
- Berwo, M.A.; Fang, Y.; Mahmood, J.; Retta, E.A. Automotive engine cylinder head crack detection: Canny edge detection with morphological dilation. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; pp. 1519–1527. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Mita, T.; Kaneko, T.; Hori, O. Joint haar-like features for face detection. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 2, pp. 1619–1626. [Google Scholar]
- Zhang, G.; Huang, X.; Li, S.Z.; Wang, Y.; Wu, X. Boosting local binary pattern (LBP)-based face recognition. In Proceedings of the Chinese Conference on Biometric Recognition, Guangzhou, China, 13–14 December 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 179–186. [Google Scholar]
- Javaid, S.; Saeed, N.; Qadir, Z.; Fahim, H.; He, B.; Song, H.; Bilal, M. Communication and Control in Collaborative UAVs: Recent Advances and Future Trends. IEEE Trans. Intell. Transp. Syst. 2023, 1–21. [Google Scholar] [CrossRef]
- Fahim, H.; Li, W.; Javaid, S.; Sadiq Fareed, M.M.; Ahmed, G.; Khattak, M.K. Fuzzy Logic and Bio-Inspired Firefly Algorithm Based Routing Scheme in Intrabody Nanonetworks. Sensors 2019, 19, 5526. [Google Scholar] [CrossRef] [PubMed]
- Javaid, S.; Fahim, H.; Zeadally, S.; He, B. Self-powered Sensors: Applications, Challenges, and Solutions. IEEE Sens. J. 2023, 1. [Google Scholar] [CrossRef]
- Wen, X.; Zheng, Y. An improved algorithm based on AdaBoost for vehicle recognition. In Proceedings of the 2nd International Conference on Information Science and Engineering, Wuhan, China, 25–26 December 2010; pp. 981–984. [Google Scholar]
- Broggi, A.; Cardarelli, E.; Cattani, S.; Medici, P.; Sabbatelli, M. Vehicle detection for autonomous parking using a soft-cascade AdaBoost classifier. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium Proceedings, Ypsilanti, MI, USA, 8–11 June 2014; pp. 912–917. [Google Scholar]
- Tang, Y.; Zhang, C.; Gu, R.; Li, P.; Yang, B. Vehicle detection and recognition for intelligent traffic surveillance system. Multimed. Tools Appl. 2017, 76, 5817–5832. [Google Scholar] [CrossRef]
- Ali, A.M.; Eltarhouni, W.I.; Bozed, K.A. On-Road Vehicle Detection using Support Vector Machine and Decision Tree Classifications. In Proceedings of the 6th International Conference on Engineering & MIS 2020, Istanbul, Turkey, 4–6 July 2020; pp. 1–5. [Google Scholar]
- Javaid, S.; Wu, Z.; Fahim, H.; Fareed, M.M.S.; Javed, F. Exploiting Temporal Correlation Mechanism for Designing Temperature-Aware Energy-Efficient Routing Protocol for Intrabody Nanonetworks. IEEE Access 2020, 8, 75906–75924. [Google Scholar] [CrossRef]
- Wei, Y.; Tian, Q.; Guo, J.; Huang, W.; Cao, J. Multi-vehicle detection algorithm through combining Harr and HOG features. Math. Comput. Simul. 2019, 155, 130–145. [Google Scholar] [CrossRef]
- Shobha, B.; Deepu, R. A review on video based vehicle detection, recognition and tracking. In Proceedings of the 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, 20–22 December 2018; pp. 183–186. [Google Scholar]
- Ren, H.; Li, Z.N. Object detection using generalization and efficiency balanced co-occurrence features. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 46–54. [Google Scholar]
- Sun, Z.; Bebis, G.; Miller, R. On-road vehicle detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 694–711. [Google Scholar]
- Ren, H. Boosted Object Detection Based on Local Features. Ph.D. Thesis, Applied Sciences, School of Computing Science, Burnaby, BC, Canada, 2016. [Google Scholar]
- Neumann, D.; Langner, T.; Ulbrich, F.; Spitta, D.; Goehring, D. Online vehicle detection using Haar-like, LBP and HOG feature based image classifiers with stereo vision preselection. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 773–778. [Google Scholar]
- Wang, Z.; Zhan, J.; Duan, C.; Guan, X.; Yang, K. Vehicle detection in severe weather based on pseudo-visual search and HOG–LBP feature fusion. Proc. Inst. Mech. Eng. Part J. Automob. Eng. 2022, 7, 1607–1618. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://proceedings.neurips.cc/paper_files/paper/2016/file/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf (accessed on 25 April 2023).
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021, 51, 6400–6429. [Google Scholar] [CrossRef]
- Wang, H.; Yu, Y.; Cai, Y.; Chen, X.; Chen, L.; Liu, Q. A comparative study of state-of-the-art deep learning algorithms for vehicle detection. IEEE Intell. Transp. Syst. Mag. 2019, 11, 82–95. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Wen, H.; Dai, F. A Study of YOLO Algorithm for Multi-target Detection. J. Adv. Artif. Life Robot. 2021, 2, 70–73. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Yang, G.; Feng, W.; Jin, J.; Lei, Q.; Li, X.; Gui, G.; Wang, W. Face mask recognition system with YOLOV5 based on image recognition. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 1398–1404. [Google Scholar]
- Javaid, S.; Wu, Z.; Hamid, Z.; Zeadally, S.; Fahim, H. Temperature-aware routing protocol for Intrabody Nanonetworks. J. Netw. Comput. Appl. 2021, 183–184, 103057. [Google Scholar] [CrossRef]
- Song, X.; Gu, W. Multi-objective real-time vehicle detection method based on yolov5. In Proceedings of the 2021 International Symposium on Artificial Intelligence and its Application on Media (ISAIAM), Xi’an, China, 21–23 May 2021; pp. 142–145. [Google Scholar]
- Snegireva, D.; Kataev, G. Vehicle Classification Application on Video Using Yolov5 Architecture. In Proceedings of the 2021 International Russian Automation Conference (RusAutoCon), Sochi, Russia, 5–11 September 2021; pp. 1008–1013. [Google Scholar]
- Berwo, M.A.; Wang, Z.; Fang, Y.; Mahmood, J.; Yang, N. Off-road Quad-Bike Detection Using CNN Models. In Proceedings of the Journal of Physics: Conference Series, Nanjing, China, 25–27 November 2022; IOP Publishing: Bristol, UK, 2022; Volume 2356, p. 012026. [Google Scholar]
- Jin, X.; Li, Z.; Yang, H. Pedestrian Detection with YOLOv5 in Autonomous Driving Scenario. In Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–5. [Google Scholar]
- Li, Y.; He, X. COVID-19 Detection in Chest Radiograph Based on YOLO v5. In Proceedings of the 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China, 24–26 September 2021; pp. 344–347. [Google Scholar]
- Berwo, M.A.; Fang, Y.; Mahmood, J.; Yang, N.; Liu, Z.; Li, Y. FAECCD-CNet: Fast Automotive Engine Components Crack Detection and Classification Using ConvNet on Images. Appl. Sci. 2022, 12, 9713. [Google Scholar] [CrossRef]
- Kausar, A.; Jamil, A.; Nida, N.; Yousaf, M.H. Two-wheeled vehicle detection using two-step and single-step deep learning models. Arab. J. Sci. Eng. 2020, 45, 10755–10773. [Google Scholar] [CrossRef]
- Vasavi, S.; Priyadarshini, N.K.; Harshavaradhan, K. Invariant feature-based darknet architecture for moving object classification. IEEE Sens. J. 2020, 21, 11417–11426. [Google Scholar] [CrossRef]
- Li, Q.; Garg, S.; Nie, J.; Li, X.; Liu, R.W.; Cao, Z.; Hossain, M.S. A highly efficient vehicle taillight detection approach based on deep learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4716–4726. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Alvarez, J.M.; Gevers, T.; LeCun, Y.; Lopez, A.M. Road scene segmentation from a single image. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 376–389. [Google Scholar]
- Ros, G.; Alvarez, J.M. Unsupervised image transformation for outdoor semantic labelling. In Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Republic of Korea, 28 June–1 July 2015; pp. 537–542. [Google Scholar]
- Zhang, R.; Candra, S.A.; Vetter, K.; Zakhor, A. Sensor fusion for semantic segmentation of urban scenes. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1850–1857. [Google Scholar]
- Ros, G.; Ramos, S.; Granados, M.; Bakhtiary, A.; Vazquez, D.; Lopez, A.M. Vision-based offline-online perception paradigm for autonomous driving. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2015; pp. 231–238. [Google Scholar]
- Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 8 December 2013. [Google Scholar]
- Espinosa, J.E.; Velastin, S.A.; Branch, J.W. Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN. arXiv 2018, arXiv:1808.02299. [Google Scholar]
- Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-sign detection and classification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2110–2118. [Google Scholar]
- Li, X.; Flohr, F.; Yang, Y.; Xiong, H.; Braun, M.; Pan, S.; Li, K.; Gavrila, D.M. A new benchmark for vision-based cyclist detection. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 1028–1033. [Google Scholar]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Guerrero-Gómez-Olmedo, R.; López-Sastre, R.J.; Maldonado-Bascón, S.; Fernández-Caballero, A. Vehicle tracking by simultaneous detection and viewpoint estimation. In Proceedings of the International Work-Conference on the Interplay Between Natural and Artificial Computation, Mallorca, Spain, 10–14 June 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 306–316. [Google Scholar]
- Luo, Z.; Branchaud-Charron, F.; Lemaire, C.; Konrad, J.; Li, S.; Mishra, A.; Achkar, A.; Eichel, J.; Jodoin, P.M. MIO-TCD: A new benchmark dataset for vehicle classification and localization. IEEE Trans. Image Process. 2018, 27, 5129–5141. [Google Scholar]
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907. [Google Scholar] [CrossRef]
- Hu, X.; Xu, X.; Xiao, Y.; Chen, H.; He, S.; Qin, J.; Heng, P.A. SINet: A scale-insensitive convolutional neural network for fast vehicle detection. IEEE Trans. Intell. Transp. Syst. 2018, 20, 1010–1019. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Li, F.F.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611. [Google Scholar]
- Griffin, G.; Holub, A.; Perona, P. Caltech-256 object category dataset. 2007. Available online: https://authors.library.caltech.edu/7694/?ref=https://githubhelp.com (accessed on 25 April 2023).
- Kenk, M.A.; Hassaballah, M. DAWN: Vehicle detection in adverse weather nature dataset. arXiv 2020, arXiv:2008.05402. [Google Scholar]
- Zuraimi, M.A.B.; Zaman, F.H.K. Vehicle Detection and Tracking using YOLO and DeepSORT. In Proceedings of the 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 3–4 April 2021; pp. 23–29. [Google Scholar]
- Xu, B.; Wang, B.; Gu, Y. Vehicle detection in aerial images using modified yolo. In Proceedings of the 2019 IEEE 19th International Conference on Communication Technology (ICCT), Xi’an, China, 16–19 October 2019; pp. 1669–1672. [Google Scholar]
- Liu, W.; Liao, S.; Hu, W.; Liang, X.; Zhang, Y. Improving tiny vehicle detection in complex scenes. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
- Nguyen, H. Improving faster R-CNN framework for fast vehicle detection. Math. Probl. Eng. 2019, 2019, 3808064. [Google Scholar] [CrossRef]
- Dai, X. HybridNet: A fast vehicle detection system for autonomous driving. Signal Process. Image Commun. 2019, 70, 79–88. [Google Scholar] [CrossRef]
- Nguyen, H. Multiscale Feature Learning Based on Enhanced Feature Pyramid for Vehicle Detection. Complexity 2021, 2021, 5555121. [Google Scholar] [CrossRef]
- Fan, Q.; Brown, L.; Smith, J. A closer look at Faster R-CNN for vehicle detection. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 124–129. [Google Scholar]
- Liu, P.; Zhang, G.; Wang, B.; Xu, H.; Liang, X.; Jiang, Y.; Li, Z. Loss function discovery for object detection via convergence-simulation driven search. arXiv 2021, arXiv:2102.04700. [Google Scholar]
- Muthukumar, V.; Narang, A.; Subramanian, V.; Belkin, M.; Hsu, D.; Sahai, A. Classification vs regression in overparameterized regimes: Does the loss function matter? J. Mach. Learn. Res. 2021, 22, 1–69. [Google Scholar]
- Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of localization confidence for accurate object detection. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–799. [Google Scholar]
- Sun, R. Optimization for deep learning: Theory and algorithms. arXiv 2019, arXiv:1912.08957. [Google Scholar]
- Li, P. Optimization Algorithms for Deep Learning; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong: Hong Kong, 2017. [Google Scholar]
- Soydaner, D. A comparison of optimization algorithms for deep learning. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2052013. [Google Scholar] [CrossRef]
- Darken, C.; Chang, J.; Moody, J. Learning rate schedules for faster stochastic gradient search. In Proceedings of the Neural Networks for Signal Processing, Citeseer, 1992; Volume 2. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=9db554243d7588589569aea127d676c9644d069a (accessed on 25 April 2023).
- Nesterov, Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN USSR 1983, 269, 543–547. [Google Scholar]
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159. [Google Scholar]
- Zeiler, M.D. Adadelta: An adaptive learning rate method. arXiv 2012, arXiv:1212.5701. [Google Scholar]
- Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Mao, M.; Ranzato, M.; Senior, A.; Tucker, P.; Yang, K.; et al. Large scale distributed deep networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf (accessed on 25 April 2023).
- Mukkamala, M.C.; Hein, M. Variants of rmsprop and adagrad with logarithmic regret bounds. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2545–2553. [Google Scholar]
- Zaheer, R.; Shaziya, H. A study of the optimization algorithms in deep learning. In Proceedings of the 2019 Third International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 10–11 January 2019; pp. 536–539. [Google Scholar]
- Javaid, S.; Wu, Z.; Fahim, H.; Mabrouk, I.B.; Al-Hasan, M.; Rasheed, M.B. Feedforward Neural Network-Based Data Aggregation Scheme for Intrabody Area Nanonetworks. IEEE Syst. J. 2022, 16, 1796–1807. [Google Scholar] [CrossRef]
- Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055. [Google Scholar] [CrossRef]
- Viola, P.; Jones, M. Rapid Object Detection using a Boosted Cascade of Simple. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, Kauai, HI, USA, 8–14 December 2001. [Google Scholar]
- Haselhoff, A.; Kummert, A. A vehicle detection system based on haar and triangle features. In Proceedings of the 2009 IEEE Intelligent Vehicles Symposium, Xi’an, China, 3–5 June 2009; pp. 261–266. [Google Scholar]
- Kim, K.J.; Kim, P.K.; Chung, Y.S.; Choi, D.H. Multi-scale detector for accurate vehicle detection in traffic surveillance data. IEEE Access 2019, 7, 78311–78319. [Google Scholar] [CrossRef]
- Chen, W.; Qiao, Y.; Li, Y. Inception-SSD: An improved single shot detector for vehicle detection. J. Ambient. Intell. Humaniz. Comput. 2020, 13, 5047–5053. [Google Scholar] [CrossRef]
- Zhao, M.; Zhong, Y.; Sun, D.; Chen, Y. Accurate and efficient vehicle detection framework based on SSD algorithm. IET Image Process. 2021, 15, 3094–3104. [Google Scholar] [CrossRef]
- Zhang, L.; Wang, H.; Wang, X.; Chen, S.; Wang, H.; Zheng, K. Vehicle object detection based on improved retinanet. In Proceedings of the Journal of Physics: Conference Series, Nanchang, China, 26–28 October 2021; IOP Publishing: Bristol, UK, 2021; Volume 1757, p. 012070. [Google Scholar]
- Wang, X.; Cheng, P.; Liu, X.; Uzochukwu, B. Focal loss dense detector for vehicle surveillance. In Proceedings of the 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2–4 April 2018; pp. 1–5. [Google Scholar]
- Luo, J.q.; Fang, H.s.; Shao, F.m.; Zhong, Y.; Hua, X. Multi-scale traffic vehicle detection based on faster R–CNN with NAS optimization and feature enrichment. Def. Technol. 2021, 17, 1542–1554. [Google Scholar] [CrossRef]
- Arora, N.; Kumar, Y.; Karkra, R.; Kumar, M. Automatic vehicle detection system in different environment conditions using fast R-CNN. Multimed. Tools Appl. 2022, 81, 18715–18735. [Google Scholar] [CrossRef]
- Charouh, Z.; Ezzouhri, A.; Ghogho, M.; Guennoun, Z. A resource-efficient CNN-based method for moving vehicle detection. Sensors 2022, 22, 1193. [Google Scholar] [CrossRef] [PubMed]
- Rajput, S.K.; Patni, J.C.; Alshamrani, S.S.; Chaudhari, V.; Dumka, A.; Singh, R.; Rashid, M.; Gehlot, A.; AlGhamdi, A.S. Automatic Vehicle Identification and Classification Model Using the YOLOv3 Algorithm for a Toll Management System. Sustainability 2022, 14, 9163. [Google Scholar] [CrossRef]
- Amrouche, A.; Bentrcia, Y.; Abed, A.; Hezil, N. Vehicle Detection and Tracking in Real-time using YOLOv4-tiny. In Proceedings of the 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, 8–9 May 2022; pp. 1–5. [Google Scholar]
- Wang, Q.; Xu, N.; Huang, B.; Wang, G. Part-Aware Refinement Network for Occlusion Vehicle Detection. Electronics 2022, 11, 1375. [Google Scholar] [CrossRef]
- Farid, A.; Hussain, F.; Khan, K.; Shahzad, M.; Khan, U.; Mahmood, Z. A Fast and Accurate Real-Time Vehicle Detection Method Using Deep Learning for Unconstrained Environments. Appl. Sci. 2023, 13, 3059. [Google Scholar] [CrossRef]
- Huang, F.; Chen, S.; Wang, Q.; Chen, Y.; Zhang, D. Using deep learning in an embedded system for real-time target detection based on images from an unmanned aerial vehicle: Vehicle detection as a case study. Int. J. Digit. Earth 2023, 16, 910–936. [Google Scholar] [CrossRef]
- Qiu, Z.; Bai, H.; Chen, T. Special Vehicle Detection from UAV Perspective via YOLO-GNS Based Deep Learning Network. Drones 2023, 7, 117. [Google Scholar] [CrossRef]
- Zhang, Y.; Sun, Y.; Wang, Z.; Jiang, Y. YOLOv7-RAR for Urban Vehicle Detection. Sensors 2023, 23, 1801. [Google Scholar] [CrossRef]
- Mittal, U.; Chawla, P.; Tiwari, R. EnsembleNet: A hybrid approach for vehicle detection and estimation of traffic density based on faster R-CNN and YOLO models. Neural Comput. Appl. 2023, 35, 4755–4774. [Google Scholar] [CrossRef]
- Gupte, S.; Masoud, O.; Martin, R.F.; Papanikolopoulos, N.P. Detection and classification of vehicles. IEEE Trans. Intell. Transp. Syst. 2002, 3, 37–47. [Google Scholar] [CrossRef]
- Petrovic, V.S.; Cootes, T.F. Analysis of Features for Rigid Structure Vehicle Type Recognition. In Proceedings of the BMVC, Kingston, UK, 7–9 September 2004; Kingston University: London, UK, 2004; Volume 2, pp. 587–596. [Google Scholar]
- Psyllos, A.; Anagnostopoulos, C.N.; Kayafas, E. Vehicle model recognition from frontal view image measurements. Comput. Stand. Interfaces 2011, 33, 142–151. [Google Scholar] [CrossRef]
- Peng, Y.; Jin, J.S.; Luo, S.; Xu, M.; Au, S.; Zhang, Z.; Cui, Y. Vehicle type classification using data mining techniques. In The Era of Interactive Media; Springer: Berlin/Heidelberg, Germany, 2013; pp. 325–335. [Google Scholar]
- Dong, Z.; Wu, Y.; Pei, M.; Jia, Y. Vehicle type classification using a semisupervised convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2247–2256. [Google Scholar] [CrossRef]
- Awang, S.; Azmi, N.M.A.N.; Rahman, M.A. Vehicle type classification using an enhanced sparse-filtered convolutional neural network with layer-skipping strategy. IEEE Access 2020, 8, 14265–14277. [Google Scholar] [CrossRef]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Maungmai, W.; Nuthong, C. Vehicle classification with deep learning. In Proceedings of the 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), Singapore, 23–25 February 2019; pp. 294–298. [Google Scholar]
- Wang, K.C.; Pranata, Y.D.; Wang, J.C. Automatic vehicle classification using center strengthened convolutional neural network. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1075–1078. [Google Scholar]
- Fahim, H.; Javaid, S.; Li, W.; Mabrouk, I.B.; Hasan, M.A.; Rasheed, M.B.B. An Efficient Routing Scheme for Intrabody Nanonetworks Using Artificial Bee Colony Algorithm. IEEE Access 2020, 8, 98946–98957. [Google Scholar] [CrossRef]
- Jahan, N.; Islam, S.; Foysal, M.F.A. Real-Time Vehicle Classification Using CNN. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6. [Google Scholar]
- Taek Lee, J.; Chung, Y. Deep learning-based vehicle classification using an ensemble of local expert and global networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 47–52. [Google Scholar]
- Liu, W.; Zhang, M.; Luo, Z.; Cai, Y. An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors. IEEE Access 2017, 5, 24417–24425. [Google Scholar] [CrossRef]
- Jagannathan, P.; Rajkumar, S.; Frnda, J.; Divakarachari, P.B.; Subramani, P. Moving vehicle detection and classification using gaussian mixture model and ensemble deep learning technique. Wirel. Commun. Mob. Comput. 2021, 2021, 5590894. [Google Scholar] [CrossRef]
- Chen, W.; Chen, X.; Zhang, J.; Huang, K. A multi-task deep network for person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Liu, A.A.; Su, Y.T.; Nie, W.Z.; Kankanhalli, M. Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 102–114. [Google Scholar] [CrossRef]
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 354–370. [Google Scholar]
- Kanacı, A.; Li, M.; Gong, S.; Rajamanoharan, G. Multi-task mutual learning for vehicle re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Phillips, J.; Martinez, J.; Bârsan, I.A.; Casas, S.; Sadat, A.; Urtasun, R. Deep multi-task learning for joint localization, perception, and prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4679–4689. [Google Scholar]
- Sang, J.; Wu, Z.; Guo, P.; Hu, H.; Xiang, H.; Zhang, Q.; Cai, B. An improved YOLOv2 for vehicle detection. Sensors 2018, 18, 4272. [Google Scholar] [CrossRef] [PubMed]
- Mansour, A.; Hassan, A.; Hussein, W.M.; Said, E. Automated vehicle detection in satellite images using deep learning. In Proceedings of the International Conference on Aerospace Sciences and Aviation Technology, Cairo, Egypt, 9–11 April 2019; The Military Technical College: Cairo, Egypt, 2019; Volume 18, pp. 1–8. [Google Scholar]
- Sowmya, V.; Radha, R. Heavy-Vehicle Detection Based on YOLOv4 featuring Data Augmentation and Transfer-Learning Techniques. In Proceedings of the Journal of Physics: Conference Series, Nanchang, China, 26–28 October 2021; IOP Publishing: Bristol, UK, 2021; Volume 1911, p. 012029. [Google Scholar]
- Wang, L.; Lu, Y.; Wang, H.; Zheng, Y.; Ye, H.; Xue, X. Evolving boxes for fast vehicle detection. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1135–1140. [Google Scholar]
- Kim, K.J.; Kim, P.K.; Chung, Y.S.; Choi, D.H. Performance enhancement of yolov3 by adding prediction layers with spatial pyramid pooling for vehicle detection. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6. [Google Scholar]
- Wang, X.; Wang, S.; Cao, J.; Wang, Y. Data-driven based tiny-YOLOv3 method for front vehicle detection inducing SPP-net. IEEE Access 2020, 8, 110227–110236. [Google Scholar] [CrossRef]
Algorithms | Advantage | Disadvantage |
---|---|---|
RCNN [23] | Utilizes the selective search approach to produce region proposals. Extracts around 2000 regions per image, far fewer candidate windows than an exhaustive sliding-window CNN would evaluate. | High computational time. Slow, because several networks are used to generate predictions. Difficult to detect small-scale objects. |
Fast RCNN [26] | Each image is passed through the CNN only once, and feature maps are extracted; the selective search proposals are applied on these maps to produce predictions. | Requires a large volume of training data. High computation time. |
Faster RCNN [27] | Replaces the selective search approach with the RPN, which makes the algorithm much faster. | Requires several passes over a single image to extract all the object classes. Performance depends on how the preceding stages have performed. |
RFCN [31] | Uses position-sensitivity score maps to solve the position sensitivity problem of object classification and detection. Has less computational time compared to the rest of the algorithms, due to its property of sharing every convolutional layer. | R-FCN has a competitive mAP but it is lower than that of Faster R-CNN. |
Networks | Advantage | Disadvantage |
---|---|---|
SSD | Simple network design; computationally inexpensive. | Low detection accuracy in complex scenarios. |
RetinaNet | Enhanced detection precision on small objects; the focal loss makes it well suited to training with class imbalance. | Falls short of real-time detection speed. |
YOLOv1 | Fast compared to the two-step object detectors; trained end-to-end as a single globally optimized network; offers better generalization when evaluated on other datasets. | Poor performance on small object classes due to its grid set-up; high localization error. |
YOLOv2 | Dramatically enhances the speed and accuracy of object detection; grid- and anchor-based boundary prediction makes it easier to detect tiny or distant objects in the image. | Complex training. |
YOLOv3 | Fast, robust prediction of objects in real time; computationally inexpensive. | Weaker at detecting medium- and large-sized objects. |
YOLOv4 | Excellent detection accuracy; better training optimization. | Poor detection accuracy on small targets. |
YOLOv5 | Outstanding detection/recognition accuracy; low false detection rate; works efficiently; low computational cost; easy to set up. | Optimization can settle in local minima rather than the global minimum. |
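As a concrete single-step example for the YOLO family summarized above, the sketch below loads a pre-trained YOLOv5s model through torch.hub and keeps only the COCO vehicle classes; the image path, confidence threshold, and class selection are illustrative assumptions, not settings from any surveyed study.

```python
import torch

# Pre-trained YOLOv5s loaded from the ultralytics/yolov5 hub repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4                              # confidence threshold for kept detections

results = model("traffic_scene.jpg")          # hypothetical input image
detections = results.pandas().xyxy[0]         # one DataFrame of boxes per image
vehicles = detections[detections["name"].isin(["car", "bus", "truck", "motorcycle"])]
print(vehicles[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```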
Networks | Backbone | Dataset | Image Size | mAP@[0.5:0.95] | mAP@0.5 | FPS |
---|---|---|---|---|---|---|
RCNN | AlexNet | PASCAL VOC 12 | 224 | - | 58.50 | 0.02 |
Fast RCNN | VGG-16 | PASCAL VOC 12 | variable | - | 65.70 | 0.43 |
Faster RCNN | VGG-16 | PASCAL VOC 12 | 600 | - | 67.00 | 5 |
R-FCN | ResNet-101 | COCO 12 | 600 | 31.50 | 53.20 | 3 |
RetinaNet | ResNet-101-FPN | COCO 12 | 400 | 31.90 | 49.50 | 12 |
SSD | VGG-16 | COCO 12 | 300 | 23.20 | 41.20 | 46 |
YOLOv1 | GoogleNet | PASCAL VOC 12 | 448 | - | 57.90 | 45 |
YOLOv2 | DarkNet-19 | COCO 12 | 352 | 21.60 | 44.00 | 81 |
YOLOv3 | DarkNet-53 | COCO 12 | 320 | 28.20 | 51.50 | 45 |
YOLOv4 | CSPDarkNet-53 | COCO 12 | 512 | 43.00 | 64.90 | 31 |
References | Approach | Dataset | Evaluation Metrics |
---|---|---|---|
Zuraimi and Zaman [78] | YOLOv4 + DeepSORT. | Own dataset. | 82.08% average precision. |
Xu et al. [79] | Modified YOLOv3 classifier. | VEDAI dataset. | 91.72% of average precision. |
Liu et al. [80] | BFEN + SLPN + PNW. | DETRAC benchmark dataset. | 88.71% of mAP. |
Nguyen [81] | Soft NMS algorithm. Faster RCNN classifier. | KITTI dataset. LSVH dataset. | 83.92% average precision on the KITTI dataset. 64.72% average precision on the LSVH dataset. |
Dai [82] | Faster RCNN + SSD classifier. | KITTI dataset. PASCAL2007 car dataset. | 85.22% average precision on the KITTI dataset. 64.83% average precision on the PASCAL2007 car dataset. |
Nguyen [83] | Faster RCNN with FPN backbone. | KITTI dataset. PASCAL2007 car dataset. | 88.95% average precision on the KITTI dataset. 78.84% average precision on the PASCAL2007 car dataset. |
Fan et al. [84] | Faster RCNN classifier. | KITTI dataset. | 83.36% average precision. |
Evaluation Metrics | Mathematical Formulae |
---|---|
Precision | Precision = TP / (TP + FP) |
Recall | Recall = TP / (TP + FN) |
Frames Per Second | FPS = number of processed frames / elapsed time (s) |
Intersection over Union (IoU) | IoU = Area(B_p ∩ B_gt) / Area(B_p ∪ B_gt) |
Mean Average Precision (mAP) | mAP = (1/N) Σ_i AP_i over the N object classes |
Average Precision (AP) | AP = ∫₀¹ p(r) dr (area under the precision–recall curve) |
True Positive Rate | TPR = TP / (TP + FN) |
False Positive Rate | FPR = FP / (FP + TN) |
Accuracy | Accuracy = (TP + TN) / (TP + TN + FP + FN) |
F1-Score | F1 = 2 · Precision · Recall / (Precision + Recall) |
Area Under Curve | Area under the ROC curve of TPR plotted against FPR |
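The detection-oriented metrics in the table above can be computed directly from true/false positive counts and box overlaps. The short sketch below, assuming axis-aligned boxes given as (x1, y1, x2, y2) and illustrative counts, shows IoU, precision, recall, and F1-score.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # ~0.1429
print(precision_recall(tp=80, fp=20, fn=10))  # (0.8, ~0.889, ~0.842)
```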
Functions | Formula | Advantage | Disadvantage |
---|---|---|---|
Sigmoid | f(x) = 1 / (1 + e^(−x)) | Suitable for light networks. Used in feedforward NNs. Bounded and differentiable real-valued function. | Dramatically shrinks gradients during back-propagation; prone to gradient saturation. Slow convergence, and the non-zero-centered output causes gradient updates to propagate in various directions. |
Tanh | f(x) = (e^x − e^(−x)) / (e^x + e^(−x)) | Presents outstanding training performance for MLP NNs. Generates zero-centered output that assists the back-propagation process. | Can generate dead neurons during computation. High computational complexity. |
ReLU | f(x) = max(0, x) | Faster learning than other activations. The most successful and widely employed function. Presents outstanding performance and generalization in DL architectures compared to the sigmoid and Tanh functions. Simple to optimize. No gradient saturation problem. Low computational cost. | More prone to over-fitting than the sigmoid function. Fragile during training, which can cause some gradients (neurons) to die. Not a zero-centered function. |
ELU | f(x) = x if x > 0; α(e^x − 1) otherwise | Can solve the vanishing-gradient problem by using identity values for positive inputs. Improves the ability to learn the characteristics of DL systems. Can reduce computational complexity by shifting mean unit activations toward zero. | High computational complexity. |
Softmax | f(x_i) = e^(x_i) / Σ_j e^(x_j) | Used for multivariate (multi-class) classification tasks. | Not suitable for binary classification problems. |
Softplus | f(x) = ln(1 + e^x) | Its smooth, non-zero gradient improves the stabilization and performance of DL models, with fewer epochs to convergence during training. Can handle the vanishing-gradient problem. | High computational complexity. |
Swish | f(x) = x · sigmoid(x) | Found through automatic search approaches. Presents outstanding optimization and generalization outcomes. Does not suffer from vanishing gradients. Requires only simple scalar inputs. | High computational complexity. |
ELiSq | - | Presents excellent optimization and generalization outcomes. Does not suffer from vanishing gradients. Requires only simple scalar inputs. Reduces the vanishing-gradient problem to improve information flow. | - |
Maxout | f(x) = max(w₁ᵀx + b₁, w₂ᵀx + b₂) | Easy to generalize. | High computational complexity. |
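For reference, the sketch below implements several of the activation functions above in NumPy using their commonly cited standard forms; it is a convenience illustration rather than the exact variants used by any surveyed network.

```python
import numpy as np

def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))
def tanh(x):     return np.tanh(x)
def relu(x):     return np.maximum(0.0, x)
def elu(x, alpha=1.0):  return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
def softplus(x): return np.log1p(np.exp(x))
def swish(x):    return x * sigmoid(x)

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), elu(x), swish(x))
print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1 over the class scores
```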
Models | Hidden Layers | Output Layers |
---|---|---|
SENet | ReLU | Sigmoid |
ResNeXt | ReLU | Softmax |
AlexNet | ReLU | Softmax |
DenseNet | ReLU | Softmax |
GoogleNet | ReLU | Softmax |
EfficientNet | ReLU | Softmax |
MobileNet | ReLU | Softmax |
ResNet | ReLU | Softmax |
ImageNet | ReLU | Softmax |
SqueezeNet | ReLU | Softmax |
VGGNet | ReLU | Softmax |
Inception | ReLU | Softmax |
Loss Functions | Mathematical Formula |
---|---|
Hinge Loss | L(y, f(x)) = max(0, 1 − y·f(x)) |
Squared Hinge Loss | L(y, f(x)) = [max(0, 1 − y·f(x))]² |
Kullback–Leibler Divergence | D_KL(p ∥ q) = Σ_i p_i · log(p_i / q_i) |
Cross Entropy Loss | L = −Σ_i y_i · log(ŷ_i) |
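A short NumPy sketch of these classification losses in their standard forms follows; labels y are assumed to be ±1 for the hinge losses and probability vectors (one-hot targets) for cross entropy and KL divergence.

```python
import numpy as np

def hinge(y, score):          return np.maximum(0.0, 1.0 - y * score)
def squared_hinge(y, score):  return np.maximum(0.0, 1.0 - y * score) ** 2

def cross_entropy(p_true, p_pred, eps=1e-12):
    return -np.sum(p_true * np.log(p_pred + eps))

def kl_divergence(p_true, p_pred, eps=1e-12):
    return np.sum(p_true * np.log((p_true + eps) / (p_pred + eps)))

print(hinge(+1, 0.3), squared_hinge(+1, 0.3))                       # 0.7, 0.49
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))  # ~0.357
```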
Location Loss Functions | Mathematical Formula |
---|---|
Absolute Loss | L = ∣y − f(x)∣ |
Sum of Absolute Differences | SAD = Σ_i ∣y_i − f(x_i)∣ |
Mean Absolute Error | MAE = (1/n) Σ_i ∣y_i − f(x_i)∣ |
Mean Square Error | MSE = (1/n) Σ_i (y_i − f(x_i))² |
Huber Loss | L = ½(y − f(x))² if ∣y − f(x)∣ ≤ λ; λ∣y − f(x)∣ − ½λ², otherwise |
Loss Functions | Advantage | Disadvantage |
---|---|---|
Mean Square Error Loss | Gradient descent on this loss has a single global minimum and no local minima; it heavily penalizes the network architecture for making large mistakes. | Not robust when the samples contain outliers. |
Mean Absolute Error Loss | More robust to outliers than MSE. | Higher computational cost. Can have local minima. The gradient remains large even for small loss values. |
Huber Loss | Outliers are handled wisely. No local minima. It is differentiable at zero. | Requires tuning an extra hyperparameter (the threshold λ). |
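The robustness claims in the table above can be checked with a few lines of NumPy: a single large residual (an outlier) inflates the mean square error far more than the Huber loss. The residual values and the threshold λ (delta) are illustrative.

```python
import numpy as np

def mse(residuals):
    return np.mean(residuals ** 2)

def huber(residuals, delta=1.0):
    small = np.abs(residuals) <= delta
    quad = 0.5 * residuals ** 2                         # quadratic branch near zero
    lin = delta * np.abs(residuals) - 0.5 * delta ** 2  # linear branch for large residuals
    return np.mean(np.where(small, quad, lin))

clean = np.array([0.2, -0.3, 0.1, 0.4])
with_outlier = np.append(clean, 10.0)        # one gross localization error
print(mse(clean), mse(with_outlier))         # MSE jumps sharply
print(huber(clean), huber(with_outlier))     # Huber grows far more slowly
```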