DOI: 10.1145/3394171.3413866
Research article

VONAS: Network Design in Visual Odometry using Neural Architecture Search

Published: 12 October 2020

Abstract

End-to-end VO (visual odometry) is a complicated task with strong temporal dependencies, yet the design of its deep networks has not been thoroughly investigated. Meanwhile, NAS (neural architecture search) has been widely studied and applied across computer vision thanks to its ability to design networks automatically. However, most existing NAS frameworks only consider single-image tasks such as image classification and neglect video (multi-frame) tasks such as VO. This paper therefore explores network design for the VO task and proposes a more general single-path one-shot NAS framework, named VONAS, which can model the sequential information needed by video-related tasks. Extensive experiments show that the network architecture matters significantly for both supervised and unsupervised VO. The models obtained by VONAS are lightweight and achieve state-of-the-art performance with good generalization.
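
As background for the abstract, the sketch below illustrates the weight-sharing, single-path one-shot NAS paradigm with uniform path sampling that VONAS builds on. It is a minimal, hypothetical PyTorch example: the ChoiceBlock and SuperNet classes, the candidate op set, the toy 6-DoF-style output head, and the random training data are all illustrative assumptions, not the authors' code or search space.

import random

import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """A searchable layer: several candidate ops whose weights all live in the supernet."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(channels, channels, 5, padding=2), nn.ReLU()),
            nn.Identity(),  # a skip connection as one candidate
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    """Weight-sharing supernet; a sampled path picks one op per ChoiceBlock."""
    def __init__(self, channels=16, num_layers=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList([ChoiceBlock(channels) for _ in range(num_layers)])
        self.head = nn.Linear(channels, 6)  # toy 6-DoF pose-style regression output

    def forward(self, x, path):
        x = self.stem(x)
        for block, choice in zip(self.blocks, path):
            x = block(x, choice)
        return self.head(x.mean(dim=(2, 3)))  # global average pooling, then regression

net = SuperNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)

# Supernet training: each step updates the shared weights of ONE uniformly sampled path.
for step in range(3):  # toy loop; random tensors stand in for image pairs and poses
    images, target = torch.randn(2, 3, 32, 32), torch.randn(2, 6)
    path = [random.randrange(len(b.ops)) for b in net.blocks]  # uniform single-path sampling
    loss = nn.functional.mse_loss(net(images, path), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After supernet training, candidate paths are typically ranked with the shared weights
# (e.g. by evolutionary search over validation error) and the best path is retrained.

VONAS follows this one-shot paradigm but, per the abstract, extends the search so the resulting architectures also model sequential information across frames; the specifics of that extension are described in the paper.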

Supplementary Material

MP4 File (3394171.3413866.mp4)

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AutoML
  2. SLAM
  3. neural architecture search
  4. visual odometry

Qualifiers

  • Research-article

Funding Sources

  • Hangzhou Science and Technology Development Program
  • Key-Area Research and Development Program of Guangdong Province
  • National Natural Science Foundation of China and Guangdong Province Scientific Research on Big Data

Conference

MM '20

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)
