DOI: 10.1145/3373376.3378534
Research Article · Public Access

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Published: 13 March 2020

Abstract

With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing Deep Neural Network (DNN) inference remains challenging given its high computation and storage demands, especially when real-time performance with high accuracy is required. Weight pruning of DNNs has been proposed, but existing schemes represent two extremes of the design space: non-structured pruning is fine-grained and accurate but not hardware friendly; structured pruning is coarse-grained and hardware-efficient but suffers higher accuracy loss.
In this paper, we advance the state of the art by introducing a new dimension, fine-grained pruning patterns inside coarse-grained structures, revealing a previously unknown point in the design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency. In other words, our method achieves the best of both worlds and is desirable across the theory/algorithm, compiler, and hardware levels. The proposed PatDNN is an end-to-end framework for efficient DNN execution on mobile devices, built on a novel model-compression technique, pattern-based pruning based on an extended ADMM solution framework, together with a set of thorough, architecture-aware compiler/code-generation optimizations: filter kernel reordering, compressed weight storage, register load redundancy elimination, and parameter auto-tuning. Evaluation results demonstrate that PatDNN outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 44.5x, 11.4x, and 7.1x, respectively, with no accuracy compromise. Real-time inference of representative large-scale DNNs (e.g., VGG-16, ResNet-50) can thus be achieved on mobile devices.
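To make the core idea concrete, the sketch below illustrates pattern-based pruning in NumPy. It is a minimal illustration under stated assumptions, not the paper's implementation: the four 4-entry patterns are hypothetical placeholders (PatDNN derives its actual pattern set during ADMM-based training), and the magnitude-based projection stands in for the ADMM projection step. The recorded per-kernel pattern ids mimic the metadata that compiler-level filter kernel reordering would later exploit.

```python
import numpy as np

# Four illustrative 4-entry patterns over a 3x3 kernel (1 = weight kept).
# Hypothetical placeholders: PatDNN selects its real pattern set during
# ADMM-based training; each pattern here retains the central weight.
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def pattern_prune(weights):
    """Project each 3x3 kernel of a conv layer onto its best-fitting pattern.

    weights: array of shape (out_channels, in_channels, 3, 3).
    Returns (pruned, pattern_ids); pattern_ids records the chosen pattern
    per kernel, standing in for the metadata the compiler would use for
    filter kernel reordering.
    """
    out_ch, in_ch, kh, kw = weights.shape
    assert (kh, kw) == (3, 3), "this sketch assumes 3x3 kernels"
    pruned = np.empty_like(weights)
    pattern_ids = np.empty((out_ch, in_ch), dtype=np.int64)
    for f in range(out_ch):
        for c in range(in_ch):
            kernel = weights[f, c]
            # Keep the pattern preserving the most squared weight magnitude.
            scores = [float(((kernel * p) ** 2).sum()) for p in PATTERNS]
            best = int(np.argmax(scores))
            pattern_ids[f, c] = best
            pruned[f, c] = kernel * PATTERNS[best]
    return pruned, pattern_ids

# Example: prune a random VGG-style 3x3 conv layer.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
w_pruned, ids = pattern_prune(w)
print(w_pruned[0, 0])            # at most 4 non-zeros per kernel
print(np.bincount(ids.ravel()))  # pattern distribution across kernels
```

Connectivity pruning (removing whole kernels) and the ADMM regularization itself are omitted here; the relevant point is that grouping kernels sharing a pattern id is what lets generated code execute each group with a fixed, branch-free load sequence.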



Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020, 1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. compiler optimization
  2. deep neural network
  3. mobile devices
  4. model compression


Conference

ASPLOS '20
Overall acceptance rate: 535 of 2,713 submissions, 20%

