
Adaptive deep learning model selection on embedded systems

Published: 19 June 2018


The recent ground-breaking advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the computation to the cloud is often infeasible due to privacy concerns, high latency, or the lack of connectivity. As such, there is a critical need to find a way to execute DNN models effectively on the devices themselves.
This paper presents an adaptive scheme to determine which DNN model to use for a given input, by considering the desired accuracy and inference time. Our approach employs machine learning to develop a predictive model that quickly selects a pre-trained DNN for a given input and optimization constraint. We achieve this by first training a predictive model off-line, and then using the learnt model to select a DNN for new, unseen inputs. We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset. We consider a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy, and a 1.8x reduction in inference time, over the most-capable single DNN model.
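The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state which features or which learning algorithm the predictive model uses, so this example assumes a plain k-nearest-neighbour vote over two made-up image features. The model names, timings, feature values, and the helpers `train_premodel` and `select_model` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical candidate DNNs with assumed per-image inference times (seconds).
# These names and numbers are placeholders, not measurements from the paper.
CANDIDATES = {"small_dnn": 0.05, "medium_dnn": 0.30, "large_dnn": 1.00}


def train_premodel(samples):
    """Off-line step: a k-NN 'model' just stores (features, best_model) pairs.

    Each label is assumed to come from off-line profiling, e.g. the cheapest
    candidate that classified that training image correctly.
    """
    return list(samples)


def select_model(premodel, features, k=3, time_budget=None):
    """Run-time step: pick a DNN for one input by majority vote of k neighbours."""
    nearest = sorted(premodel, key=lambda s: math.dist(s[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    choice = votes.most_common(1)[0][0]

    # Assumed handling of the optimization constraint: if the voted model is
    # too slow, use the slowest candidate that still fits the budget, or the
    # fastest candidate when nothing fits.
    if time_budget is not None and CANDIDATES[choice] > time_budget:
        affordable = [m for m, t in CANDIDATES.items() if t <= time_budget]
        if affordable:
            choice = max(affordable, key=CANDIDATES.get)
        else:
            choice = min(CANDIDATES, key=CANDIDATES.get)
    return choice


# Toy training set: two invented features per image, labelled with a model.
premodel = train_premodel([
    ((0.10, 0.20), "small_dnn"), ((0.20, 0.10), "small_dnn"),
    ((0.15, 0.25), "small_dnn"), ((0.50, 0.50), "medium_dnn"),
    ((0.45, 0.55), "medium_dnn"), ((0.55, 0.50), "medium_dnn"),
    ((0.80, 0.90), "large_dnn"), ((0.90, 0.80), "large_dnn"),
    ((0.85, 0.95), "large_dnn"),
])
```

Calling `select_model(premodel, (0.12, 0.18))` picks the small model for an input that resembles the cheap training examples, while passing `time_budget=0.5` for an input near the expensive examples falls back to the medium model. The fallback policy is one possible way to honour a time constraint and is an assumption of this sketch.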





Published In

ACM SIGPLAN Notices, Volume 53, Issue 6
June 2018
112 pages
  • LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
    June 2018
    112 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines



Author Tags

  1. Adaptive computing
  2. Deep learning
  3. Embedded systems

