DOI: 10.1145/3564121.3564796

Automated Deep Learning Model Partitioning for Heterogeneous Edge Devices

Published: 16 May 2023

Abstract

Deep Neural Networks (DNNs) have made machine learning accessible to a wide range of practitioners who deploy analytics algorithms over sensor data in the field. At the same time, the focus on data privacy, low-latency inference, and sustainability has highlighted the need for efficient in-situ analytics close to the sensors, at the edge of the network. This is challenging given the constrained nature of edge platforms, including Commercial Off-the-Shelf (COTS) AI accelerators. Efficient DNN model partitioning across multiple edge nodes is a well-studied approach, but no definitive characterization exists of why partitioning improves performance, or of whether the benefits hold for currently used edge hardware and state-of-the-art DNN models. In this paper, we present a detailed study and analysis that addresses these shortcomings, and we propose a framework that automatically determines the best partitioning scheme and enhances system efficiency.
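To make the notion of a "partitioning scheme" concrete, the following is a minimal sketch of an exhaustive split-point search for a sequential model shared between two devices. It is not the framework proposed in the paper: the function name `best_split`, the two-device setup, the per-layer latency lists, and the transfer-cost model are all assumptions introduced here for illustration.

```python
from typing import Sequence, Tuple


def best_split(
    dev_a_ms: Sequence[float],
    dev_b_ms: Sequence[float],
    transfer_ms: Sequence[float],
) -> Tuple[int, float]:
    """Pick the cut k that minimizes end-to-end latency (hypothetical model).

    Layers [0, k) run on device A and layers [k, n) run on device B.
    transfer_ms[k] is the assumed cost of shipping the input of layer k
    from A to B; k == n means everything stays on A and nothing is sent.
    """
    n = len(dev_a_ms)
    if not (len(dev_b_ms) == n and len(transfer_ms) == n):
        raise ValueError("all three sequences must have one entry per layer")

    best_k, best_latency = n, float(sum(dev_a_ms))  # baseline: all on A
    for k in range(n):
        latency = sum(dev_a_ms[:k]) + transfer_ms[k] + sum(dev_b_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Illustrative, made-up per-layer timings in milliseconds.
a = [4, 6, 10, 12]      # slower device next to the sensor
b = [2, 2, 3, 3]        # faster accelerator node
t = [20, 9, 3, 8]       # cost of moving each layer's input across the link
print(best_split(a, b, t))
```

With these made-up numbers the search cuts before the third layer: the intermediate tensor there is cheap to transfer, so the later, heavier layers are worth offloading. A real system would measure these costs on the target hardware rather than assume them.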



Published In

AIMLSystems '22: Proceedings of the Second International Conference on AI-ML Systems
October 2022
209 pages
ISBN: 9781450398473
DOI: 10.1145/3564121

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning
  2. edge
  3. nas
  4. neural networks
  5. pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2022
