An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment
Abstract
1. Introduction
2. Related Work
2.1. Handcrafted Feature Methods
2.2. CNN-Based Methods
3. Proposed Approach
3.1. The Multi-Column Network Structure
3.2. Training the Network with an Embedded L-Softmax Layer
3.3. Image Retrieval
4. Experimental Results and Analysis
4.1. Performance Measurements
4.2. Datasets Used in the Experiments
4.2.1. The Nordland Dataset
4.2.2. The KTH-IDOL2 Dataset
4.2.3. The KITTI Dataset
4.3. Scene Recognition with Appearance Change
4.4. Scene Recognition with Viewpoint Change
4.5. Scene Recognition with No Appearance or Viewpoint Change
4.6. Robustness Analysis
4.7. Ablation Study
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Zhang, M.; Liu, X.; Xu, D.; Cao, Z.; Yu, J. Vision-Based Target Following Guider for Mobile Robot. IEEE Trans. Ind. Electron. 2019, 66, 9360–9371.
- Dinc, S.; Fahimi, F.; Aygun, R. Vision-Based Trajectory Tracking for Mobile Robots Using Mirage Pose Estimation Method. IET Comput. Vis. 2016, 10, 450–458.
- He, W.; Li, Z.; Chen, C.P. A Survey of Human-Centered Intelligent Robots: Issues and Challenges. IEEE/CAA J. Autom. Sin. 2017, 4, 602–609.
- Clément, M.; Kurtz, C.; Wendling, L. Learning Spatial Relations and Shapes for Structural Object Description and Scene Recognition. Pattern Recognit. 2018, 84, 197–210.
- Ullah, M.M.; Pronobis, A.; Caputo, B.; Luo, J.; Jensfelt, P.; Christensen, H.I. Towards Robust Place Recognition for Robot Localization. In Proceedings of the IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; pp. 530–537.
- Chen, Z.; Liu, L.; Sa, I.; Ge, Z.; Chli, M. Learning Context Flexible Attention Model for Long-Term Visual Place Recognition. IEEE Robot. Autom. Lett. 2018, 3, 4015–4022.
- Oh, J.H.; Lee, B.H.; Jeon, J.D. Place Recognition for Visual Loop Closures Using Similarities of Object Graphs. Electron. Lett. 2015, 51, 44–46.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2012; Volume 25, pp. 1097–1105.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556v6.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 507–516.
- Yuan, Y.; Mou, L.; Lu, X. Scene Recognition by Manifold Regularized Deep Learning Architecture. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2222–2233.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
- Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
- Zhang, X.; Wang, L.; Zhao, Y.; Su, Y. Graph-Based Place Recognition in Image Sequences with CNN Features. J. Intell. Robot. Syst. 2019, 95, 389–403.
- Park, C.; Jang, J.; Zhang, L.; Jung, J.I. Light-Weight Visual Place Recognition Using Convolutional Neural Network for Mobile Robots. In Proceedings of the IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA, 12–15 January 2018; pp. 1–4.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
- Torii, A.; Arandjelovic, R.; Sivic, J.; Okutomi, M.; Pajdla, T. 24/7 Place Recognition by View Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1808–1817.
- López-Cifuentes, A.; Escudero-Viñolo, M.; Bescós, J.; García-Martín, Á. Semantic-Aware Scene Recognition. Pattern Recognit. 2020, 102, 107256.
- Zhu, H.; Weibel, J.B.; Lu, S. Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2969–2976.
- Siméoni, O.; Avrithis, Y.; Chum, O. Local Features and Visual Words Emerge in Activations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11651–11660.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Sünderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the Performance of ConvNet Features for Place Recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany, 28 September–3 October 2015; pp. 4297–4304.
- Chen, Z.; Maffra, F.; Sa, I.; Chli, M. Only Look Once: Mining Distinctive Landmarks from ConvNet for Visual Place Recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada, 24–28 September 2017; pp. 9–16.
- Guo, J.; Nie, X.; Yin, Y. Mutual Complementarity: Multi-Modal Enhancement Semantic Learning for Micro-Video Scene Recognition. IEEE Access 2020, 8, 29518–29524.
- Zhang, L.; Shi, M.; Chen, Q. Crowd Counting via Scale-Adaptive Convolutional Neural Network. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, NV, USA, 12–14 March 2018; pp. 1113–1121.
- Milford, M.J.; Wyeth, G.F. SeqSLAM: Visual Route-Based Navigation for Sunny Summer Days and Stormy Winter Nights. In Proceedings of the IEEE International Conference on Robotics and Automation, St. Paul, MN, USA, 14–18 May 2012; pp. 1643–1649.
Dataset | Environment | Appearance Change | Viewpoint Change
---|---|---|---
The Nordland dataset | train journey | severe | minor
The KTH-IDOL2 dataset | indoor | minor | severe
The KITTI dataset | outdoor | none | minor
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style
Li, Z.; Zhou, A.; Shen, Y. An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment. Sensors 2020, 20, 1556. https://doi.org/10.3390/s20061556
AMA Style
Li Z, Zhou A, Shen Y. An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment. Sensors. 2020; 20(6):1556. https://doi.org/10.3390/s20061556
Chicago/Turabian Style
Li, Zhenyu, Aiguo Zhou, and Yong Shen. 2020. "An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment" Sensors 20, no. 6: 1556. https://doi.org/10.3390/s20061556
APA Style
Li, Z., Zhou, A., & Shen, Y. (2020). An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment. Sensors, 20(6), 1556. https://doi.org/10.3390/s20061556