Adaptive deep learning model selection on embedded systems

Published: 19 June 2018

Abstract

The recent ground-breaking advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for a DNN to perform inference on resource-limited embedded devices. Offloading the computation to the cloud is often infeasible due to privacy concerns, high latency, or a lack of connectivity. There is therefore a critical need to execute DNN models efficiently on the device itself.
This paper presents an adaptive scheme that determines which DNN model to use for a given input, considering both the desired accuracy and the inference time. Our approach employs machine learning to develop a predictive model that quickly selects a pre-trained DNN for a given input and optimization constraint. We achieve this by first training the predictive model off-line, and then using the learned model to select a DNN for new, unseen inputs. We apply our approach to the image classification task, evaluating it on a Jetson TX2 embedded deep learning platform with the ImageNet ILSVRC 2012 validation dataset and a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy and a 1.8x reduction in inference time over the most capable single DNN model.
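
To make the idea concrete, below is a minimal sketch in Python of how such a scheme can be structured. It reflects our own assumptions rather than the paper's exact design: the per-image features, the k-NN premodel, and the candidate model names are all illustrative. Off-line, a premodel is fit on cheap image features against labels recording which candidate DNN is the cheapest one that classifies each training image correctly; on-line, the premodel is queried once per input to pick a DNN.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Candidate DNNs, ordered cheapest first (names are illustrative).
CANDIDATE_MODELS = ["mobilenet_v1", "resnet_v1_50", "inception_v4"]

def extract_features(image):
    # Cheap per-image features; these must cost far less to compute
    # than a single DNN inference for the scheme to pay off.
    image = np.asarray(image, dtype=np.float64)
    return np.array([
        image.mean(),                            # average brightness
        image.std(),                             # contrast
        np.abs(np.diff(image, axis=0)).mean(),   # vertical edge strength
        np.abs(np.diff(image, axis=1)).mean(),   # horizontal edge strength
    ])

def train_premodel(train_images, best_model_labels):
    # Off-line phase. best_model_labels[i] is the index (into
    # CANDIDATE_MODELS) of the cheapest DNN that classifies
    # train_images[i] correctly, found by profiling every candidate.
    X = np.stack([extract_features(img) for img in train_images])
    premodel = KNeighborsClassifier(n_neighbors=5)
    premodel.fit(X, best_model_labels)
    return premodel

def select_model(premodel, image):
    # On-line phase: one cheap prediction per unseen input.
    idx = int(premodel.predict(extract_features(image)[None, :])[0])
    return CANDIDATE_MODELS[idx]

In use, select_model would be called before inference and its result dispatched to the corresponding pre-trained network; the accuracy/latency trade-off is fixed off-line by how best_model_labels is defined (e.g., "cheapest model that is correct" versus "most accurate model within a time budget").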

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 53, Issue 6
LCTES '18
June 2018
112 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3299710
Issue’s Table of Contents
  • cover image ACM Conferences
    LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
    June 2018
    112 pages
    ISBN:9781450358033
    DOI:10.1145/3211332
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines, https://www.acm.org/publications/policies/artifact-review-badging

Publication History

Published: 19 June 2018
Published in SIGPLAN Volume 53, Issue 6

Author Tags

  1. Adaptive computing
  2. Deep learning
  3. Embedded systems
