DOI: 10.1145/3447993.3448628
Research article
Open access

Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading

Published: 09 September 2021

Abstract

As mobile devices continuously generate streams of images and videos, a new class of mobile deep vision applications is rapidly emerging; these applications typically run deep neural networks on the multimedia data in real time. To support them, offloading the computation, especially the neural network inference, from mobile devices to edge clouds has proved effective. Existing solutions often assume there is a dedicated and powerful server to which the entire inference can be offloaded. In reality, however, such a server may not be available, and we must make do with less powerful ones. To address these more practical situations, we propose to partition the video frame and offload the partial inference tasks to multiple servers for parallel processing. This paper presents the design of Elf, a framework that accelerates mobile deep vision applications under any server provisioning through parallel offloading. Elf employs a recurrent region-proposal prediction algorithm, a region-proposal-centric frame partitioning scheme, and a resource-aware multi-offloading scheme. We implement and evaluate Elf on Linux and Android platforms using four commercial mobile devices and three deep vision applications with ten state-of-the-art models. Comprehensive experiments show that Elf speeds up the applications by 4.85x and reduces bandwidth usage by 52.6%, with less than 1% loss in application accuracy.
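To make the pipeline concrete, below is a minimal Python sketch of the two core ideas named in the abstract: region-proposal-centric frame partitioning and parallel offloading of the resulting partitions to multiple servers. This is an illustrative assumption rather than Elf's actual implementation; in particular, the band-based grouping heuristic, the server names, and the `offload` stub are hypothetical stand-ins.

```python
# Minimal sketch of Elf's high-level idea (illustrative, not the paper's code):
# partition a frame around region proposals and offload the partitions to
# multiple edge servers in parallel, then gather the partial results.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Box:
    x1: int
    y1: int
    x2: int
    y2: int

def partition_frame(frame_w, proposals, n_servers):
    """Group region proposals into at most n_servers crop rectangles.

    Elf's real partitioning is region-proposal centric and content aware;
    this band-based grouping is a simple stand-in heuristic.
    """
    band_w = max(1, frame_w // n_servers)
    groups = [[] for _ in range(n_servers)]
    for box in proposals:
        center_x = (box.x1 + box.x2) // 2
        groups[min(center_x // band_w, n_servers - 1)].append(box)
    # Each crop is the bounding rectangle of its group of proposals.
    return [Box(min(b.x1 for b in g), min(b.y1 for b in g),
                max(b.x2 for b in g), max(b.y2 for b in g))
            for g in groups if g]

def offload(server, crop):
    """Hypothetical RPC stub: ship one crop to `server` for partial inference."""
    # A real system would encode the crop, send it over the network, and run
    # the DNN remotely; here we just return a placeholder result.
    return {"server": server, "crop": crop, "detections": []}

def parallel_inference(servers, crops):
    """Offload one crop per server concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(offload, s, c) for s, c in zip(servers, crops)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    proposals = [Box(100, 200, 300, 400), Box(900, 150, 1100, 380),
                 Box(1500, 600, 1750, 900)]
    crops = partition_frame(1920, proposals, n_servers=3)
    print(parallel_inference(["edge-a", "edge-b", "edge-c"], crops))
```

In the actual system, the proposals for the current frame would come from the recurrent prediction algorithm (using proposals observed in previous frames), and the number and size of partitions would be matched to each server's available resources by the resource-aware multi-offloading scheme.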

      Published In

      MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking
      October 2021, 887 pages
      ISBN: 9781450383424
      DOI: 10.1145/3447993

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Funding Sources

      • NSF PAWR

      Conference

      ACM MobiCom '21

      Acceptance Rates

      Overall Acceptance Rate 440 of 2,972 submissions, 15%
