Optimizing DNN computation graph using graph substitutions

Published: 01 July 2020

Abstract

Deep learning has achieved great success in various real-world applications. As deep neural networks (DNNs) grow larger, the cost of DNN inference and training increases significantly. Since one round of inference or one iteration of training is typically modeled as a computation graph, existing works optimize computation graphs by applying a sequence of functionally equivalent graph substitutions, leading to higher inference and training efficiency. In this work, we formally define the Optimizing Computation Graph using Graph Substitutions (OCGGS) problem and prove it to be NP-hard and Poly-APX-complete. We develop two exact and efficient methods for the OCGGS problem: a pruning-based algorithm that eliminates the examination of redundant graph substitution sequences, and a dynamic programming with pruning algorithm that reuses previously explored graph substitutions. To further speed up the search, we propose a sampling heuristic that effectively optimizes complex computation graphs in polynomial time and space. Extensive experiments on various DNN architectures and sizes verify the effectiveness and efficiency of our proposed solutions compared with existing techniques.
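The core idea of the abstract, searching over sequences of functionally equivalent graph substitutions to reduce the cost of a computation graph, can be illustrated with a small sketch. The Python toy below is not the paper's algorithm: the graph encoding, the two substitution rules (fuse_conv_relu, split_fused), and the per-operator cost model are all hypothetical placeholders chosen for illustration. Only the overall shape, a backtracking search over substitution sequences with a visited-set prune that keeps the cheapest graph seen, mirrors what the abstract describes.

# Toy sketch of substitution-based graph optimization; all names,
# rules, and costs below are illustrative assumptions, not the
# paper's actual method.
from typing import Callable, FrozenSet, List, Tuple

Graph = FrozenSet[str]                 # a computation graph as a set of op labels
Rule = Callable[[Graph], List[Graph]]  # a rule maps a graph to equivalent graphs

def cost(g: Graph) -> int:
    """Toy cost model: one unit per operator."""
    return len(g)

def fuse_conv_relu(g: Graph) -> List[Graph]:
    """Hypothetical substitution: fuse a conv followed by a relu into one kernel."""
    if {"conv", "relu"} <= g:
        return [(g - {"conv", "relu"}) | {"conv_relu"}]
    return []

def split_fused(g: Graph) -> List[Graph]:
    """Hypothetical inverse rule: a locally worse graph can enable later wins."""
    if "conv_relu" in g:
        return [(g - {"conv_relu"}) | {"conv", "relu"}]
    return []

def search(g0: Graph, rules: List[Rule], depth: int) -> Tuple[Graph, int]:
    """Backtracking over substitution sequences up to a depth bound,
    pruning graphs already visited and tracking the cheapest graph seen."""
    best, best_cost = g0, cost(g0)
    seen = {g0}

    def dfs(g: Graph, d: int) -> None:
        nonlocal best, best_cost
        if cost(g) < best_cost:
            best, best_cost = g, cost(g)
        if d == depth:
            return
        for rule in rules:
            for g2 in rule(g):
                if g2 not in seen:      # prune revisited graphs
                    seen.add(g2)
                    dfs(g2, d + 1)

    dfs(g0, 0)
    return best, best_cost

if __name__ == "__main__":
    g = frozenset({"conv", "relu", "matmul"})
    print(search(g, [fuse_conv_relu, split_fused], depth=4))

On the toy input this prints the fused graph with cost 2 after one substitution. The paper's exact algorithms and sampling heuristic address the same search space, but at realistic scale, where the number of substitution sequences explodes and pruning and sampling become essential.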




Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020, 1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2020
Published in PVLDB Volume 13, Issue 12

Qualifiers

  • Research-article
