DOI: 10.1145/3338840.3355669
research-article

Computation offloading for fast CNN inference in edge computing

Published: 24 September 2019

Abstract

Convolutional Neural Networks (CNNs) are an important computation model for many popular mobile artificial-intelligence applications. However, CNN inference, i.e., processing input data with well-trained CNN models, is computation-intensive and imposes a heavy burden on mobile devices with limited hardware resources. In this paper, we propose to offload a portion of the CNN inference computation from mobile devices to an edge computing site. We find that batching tasks on the edge GPU can significantly reduce the average inference time. Based on this observation, we design an algorithm that jointly considers the tasks of all mobile devices and the corresponding batching benefit at the edge site, unlike existing work on collaborative inference that lets each mobile device make offloading decisions independently. Furthermore, we propose an online algorithm for the scenario in which CNN inference tasks arrive at different times; it significantly reduces the average inference time without knowledge of future task arrivals. Finally, extensive simulations evaluate the performance of the proposed algorithms, and the results show that they outperform existing work under various settings.
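To make the batching intuition concrete, the sketch below models a fixed per-batch GPU overhead amortized over the batch, plus a simple offloading test that compares local inference time against upload delay plus batched edge inference time. It is an illustration only, with assumed latency values and hypothetical function names; it is not the joint or online algorithm proposed in the paper.

```python
# Minimal sketch (not the paper's algorithm) of the intuition in the abstract:
# batching inference tasks on the edge GPU amortizes a fixed per-batch
# overhead, so average per-task time drops as the batch grows, and a task is
# worth offloading when upload delay plus the batched edge time beats local
# inference. The latency model and numbers are illustrative assumptions.

def edge_batch_time(batch_size, fixed_overhead=5.0, per_task_cost=1.0):
    """Assumed edge-GPU latency model: a fixed setup/launch overhead shared by
    the whole batch plus a small marginal cost per task (arbitrary time units)."""
    return fixed_overhead + per_task_cost * batch_size

def avg_edge_time_per_task(batch_size):
    """Average inference time per task when tasks are processed as one batch."""
    return edge_batch_time(batch_size) / batch_size

def should_offload(local_time, upload_time, batch_size):
    """Greedy check for one device: offload if joining a batch of the given
    size at the edge finishes sooner than local inference on the device."""
    return upload_time + edge_batch_time(batch_size) < local_time

if __name__ == "__main__":
    # Batching benefit: average time per task shrinks as the batch grows.
    for b in (1, 2, 4, 8, 16):
        print(f"batch={b:2d}  avg edge time per task={avg_edge_time_per_task(b):.2f}")

    # Offloading decision under assumed latencies (arbitrary units).
    local_time, upload_time = 30.0, 8.0
    for b in (1, 4, 8):
        print(f"batch={b}: offload? {should_offload(local_time, upload_time, b)}")
```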




Published In

RACS '19: Proceedings of the Conference on Research in Adaptive and Convergent Systems
September 2019
323 pages
ISBN:9781450368438
DOI:10.1145/3338840
  • Conference Chair: Chih-Cheng Hung
  • General Chair: Qianbin Chen
  • Program Chairs: Xianzhong Xie, Christian Esposito, Jun Huang, Juw Won Park, Qinghua Zhang


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. batching
  2. convolutional neural network
  3. edge computing
  4. inference
  5. online algorithm

Qualifiers

  • Research-article

Conference

RACS '19

Acceptance Rates

RACS '19 Paper Acceptance Rate: 56 of 188 submissions, 30%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%

