DOI: 10.1145/3338840.3355669
research-article

Computation offloading for fast CNN inference in edge computing

Published: 24 September 2019

Abstract

Convolutional Neural Networks (CNNs) are an important computation model for many popular mobile artificial-intelligence applications. However, CNN inference, i.e., processing input data with well-trained CNN models, is computation-intensive and imposes a heavy burden on mobile devices with limited hardware resources. In this paper, we propose to offload a portion of the CNN inference computation from mobile devices to an edge computing site. We find that batching tasks on the edge GPU can significantly reduce the average inference time. Based on this observation, we design an algorithm that jointly considers the tasks of all mobile devices and the corresponding batching benefit at the edge site, unlike existing work on collaborative inference that lets each mobile device make offloading decisions independently. Furthermore, we propose an online algorithm for the scenario in which CNN inference tasks arrive at different times; it significantly reduces the average inference time without knowledge of future task arrivals. Finally, extensive simulations evaluate the performance of the proposed algorithms, and the results show that they outperform existing work under various settings.
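To make the batching intuition concrete, the sketch below models a fixed per-batch GPU overhead amortized over the batch, plus a simple offloading test that compares local inference time against upload delay plus batched edge inference time. It is an illustration only, with assumed latency values and hypothetical function names; it is not the joint or online algorithm proposed in the paper.

```python
# Minimal sketch (not the paper's algorithm) of the intuition in the abstract:
# batching inference tasks on the edge GPU amortizes a fixed per-batch
# overhead, so average per-task time drops as the batch grows, and a task is
# worth offloading when upload delay plus the batched edge time beats local
# inference. The latency model and numbers are illustrative assumptions.

def edge_batch_time(batch_size, fixed_overhead=5.0, per_task_cost=1.0):
    """Assumed edge-GPU latency model: a fixed setup/launch overhead shared by
    the whole batch plus a small marginal cost per task (arbitrary time units)."""
    return fixed_overhead + per_task_cost * batch_size

def avg_edge_time_per_task(batch_size):
    """Average inference time per task when tasks are processed as one batch."""
    return edge_batch_time(batch_size) / batch_size

def should_offload(local_time, upload_time, batch_size):
    """Greedy check for one device: offload if joining a batch of the given
    size at the edge finishes sooner than local inference on the device."""
    return upload_time + edge_batch_time(batch_size) < local_time

if __name__ == "__main__":
    # Batching benefit: average time per task shrinks as the batch grows.
    for b in (1, 2, 4, 8, 16):
        print(f"batch={b:2d}  avg edge time per task={avg_edge_time_per_task(b):.2f}")

    # Offloading decision under assumed latencies (arbitrary units).
    local_time, upload_time = 30.0, 8.0
    for b in (1, 4, 8):
        print(f"batch={b}: offload? {should_offload(local_time, upload_time, b)}")
```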




Published In

RACS '19: Proceedings of the Conference on Research in Adaptive and Convergent Systems
September 2019
323 pages
ISBN:9781450368438
DOI:10.1145/3338840
  • Conference Chair: Chih-Cheng Hung
  • General Chair: Qianbin Chen
  • Program Chairs: Xianzhong Xie, Christian Esposito, Jun Huang, Juw Won Park, Qinghua Zhang


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. batching
  2. convolutional neural network
  3. edge computing
  4. inference
  5. online algorithm

Qualifiers

  • Research-article

Conference

RACS '19

Acceptance Rates

RACS '19 Paper Acceptance Rate: 56 of 188 submissions, 30%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%

