DOI: 10.1145/3394171.3413866
Research article

VONAS: Network Design in Visual Odometry using Neural Architecture Search

Published: 12 October 2020

Abstract

End-to-end VO (visual odometry) is a complicated task with strong temporal dependencies, yet the design of its deep networks has not been thoroughly investigated. Meanwhile, NAS (neural architecture search) has been widely studied and applied across computer vision thanks to its ability to design networks automatically. However, most existing NAS frameworks only consider single-image tasks such as image classification and neglect video (multi-frame) tasks such as VO. This paper therefore explores network design for the VO task and proposes a more general single-path one-shot NAS framework, named VONAS, which can model the sequential information needed by video-related tasks. Extensive experiments show that the network architecture matters significantly for both supervised and unsupervised VO. The models obtained by VONAS are lightweight and achieve state-of-the-art performance with good generalization.
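
As background for the abstract, the sketch below illustrates the weight-sharing, single-path one-shot NAS paradigm with uniform path sampling that VONAS builds on. It is a minimal, hypothetical PyTorch example: the ChoiceBlock and SuperNet classes, the candidate op set, the toy 6-DoF-style output head, and the random training data are all illustrative assumptions, not the authors' code or search space.

import random

import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """A searchable layer: several candidate ops whose weights all live in the supernet."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(channels, channels, 5, padding=2), nn.ReLU()),
            nn.Identity(),  # a skip connection as one candidate
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    """Weight-sharing supernet; a sampled path picks one op per ChoiceBlock."""
    def __init__(self, channels=16, num_layers=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList([ChoiceBlock(channels) for _ in range(num_layers)])
        self.head = nn.Linear(channels, 6)  # toy 6-DoF pose-style regression output

    def forward(self, x, path):
        x = self.stem(x)
        for block, choice in zip(self.blocks, path):
            x = block(x, choice)
        return self.head(x.mean(dim=(2, 3)))  # global average pooling, then regression

net = SuperNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)

# Supernet training: each step updates the shared weights of ONE uniformly sampled path.
for step in range(3):  # toy loop; random tensors stand in for image pairs and poses
    images, target = torch.randn(2, 3, 32, 32), torch.randn(2, 6)
    path = [random.randrange(len(b.ops)) for b in net.blocks]  # uniform single-path sampling
    loss = nn.functional.mse_loss(net(images, path), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After supernet training, candidate paths are typically ranked with the shared weights
# (e.g. by evolutionary search over validation error) and the best path is retrained.

VONAS follows this one-shot paradigm but, per the abstract, extends the search so the resulting architectures also model sequential information across frames; the specifics of that extension are described in the paper.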

Supplementary Material

MP4 File (3394171.3413866.mp4)

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AutoML
  2. SLAM
  3. neural architecture search
  4. visual odometry

Qualifiers

  • Research-article

Funding Sources

  • Hangzhou Science and Technology Development Program
  • Key-Area Research and Development Program of Guangdong Province
  • National Natural Science Foundation of China and Guangdong Province Scientific Research on Big Data

Conference

MM '20

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)
