research-article

nn-METER: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices

Authors:

Yunxin Liu

GetMobile: Mobile Computing and Communications, Volume 25, Issue 4

Pages 19 - 23

https://doi.org/10.1145/3529706.3529712

Published: 30 March 2022

Abstract

Inference latency has become a crucial metric in running Deep Neural Network (DNN) models on various mobile and edge devices. To this end, latency prediction of DNN inference is highly desirable for many tasks where measuring the latency on real devices is infeasible or too costly. Yet it is very challenging, and existing approaches fail to achieve high prediction accuracy because runtime optimizations on diverse edge devices cause model-inference latency to vary. In this paper, we propose and develop nn-Meter, a novel and efficient system to accurately predict DNN inference latency on diverse edge devices. The key idea of nn-Meter is to divide a whole model inference into kernels, i.e., the execution units on a device, and to conduct kernel-level prediction. nn-Meter builds on two key techniques: (i) kernel detection, which automatically detects the execution units of model inference via a set of well-designed test cases; and (ii) adaptive sampling, which efficiently samples the most beneficial configurations from a large space to build accurate kernel-level latency predictors. nn-Meter achieves significantly high prediction accuracy on four types of edge devices.
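
The kernel-level idea in the abstract can be illustrated with a minimal sketch. It is not nn-Meter's actual implementation or API. It assumes that a model has already been split into kernels, that each kernel type has its own latency predictor, and that the model-level estimate is the sum of the per-kernel predictions (the abstract does not state the aggregation rule). The kernel names, the configuration tuple, and the linear coefficients are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical kernel configuration: (input height/width, input channels, output channels).
KernelConfig = Tuple[int, int, int]


def conv_bn_relu_latency_ms(cfg: KernelConfig) -> float:
    """Toy predictor for a fused conv-bn-relu kernel; coefficients are made up."""
    hw, cin, cout = cfg
    return 1e-6 * hw * hw * cin * cout + 0.05


def fc_latency_ms(cfg: KernelConfig) -> float:
    """Toy predictor for a fully connected kernel; coefficients are made up."""
    _, cin, cout = cfg
    return 1e-6 * cin * cout + 0.01


# One predictor per kernel type. In a real system these would be models
# trained on latencies measured on the target device.
PREDICTORS: Dict[str, Callable[[KernelConfig], float]] = {
    "conv-bn-relu": conv_bn_relu_latency_ms,
    "fc": fc_latency_ms,
}


def predict_model_latency_ms(kernels: List[Tuple[str, KernelConfig]]) -> float:
    """Assumed aggregation: model latency is the sum of per-kernel predictions."""
    return sum(PREDICTORS[name](cfg) for name, cfg in kernels)


if __name__ == "__main__":
    # A hypothetical two-kernel model, as kernel detection might output it.
    model = [("conv-bn-relu", (32, 16, 32)), ("fc", (1, 512, 10))]
    print(f"predicted latency: {predict_model_latency_ms(model):.4f} ms")
```

Keeping one predictor per kernel type means each predictor only has to cover the configuration space of a single execution unit, which is the space adaptive sampling would draw training configurations from.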

Published In

GetMobile: Mobile Computing and Communications Volume 25, Issue 4

December 2021

34 pages

ISSN:2375-0529

EISSN:2375-0537

DOI:10.1145/3529706

Editor:
Landon Cox
Microsoft Research

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 March 2022

Published in SIGMOBILE-GETMOBILE Volume 25, Issue 4
