DOI: 10.1145/3442442.3452297
Research article

DL Inference and Training Optimization Towards Speed and Scale

Published: 03 June 2021

Abstract

Deep learning models have brought significant improvements to many services and products at Microsoft. However, providing efficient computation and memory capabilities for both DNN inference and training is challenging as model sizes and complexities keep increasing. On the serving side, many DL models suffer from long inference latency and high cost, preventing their deployment in production. On the training side, large-scale model training often requires complex model refactoring and access to prohibitively expensive GPU clusters, which are out of reach for many practitioners. We aim to deliver solid solutions and systems while exploring cutting-edge techniques to address these issues. In this talk, I will share our experience and lessons from designing and implementing optimizations for both DNN serving and training at large scale, with substantial improvements in compute and memory efficiency and reductions in infrastructure cost.
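The abstract does not detail the specific optimizations covered in the talk. As one illustrative example of the *kind* of memory-efficiency technique common in DNN serving (not necessarily one presented here), the sketch below shows symmetric int8 weight quantization in plain Python: weights are mapped to 8-bit integers, cutting storage 4x versus float32, at the cost of a bounded rounding error. All names are illustrative.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value is recovered to within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

Production systems typically layer per-channel scales, calibration, and fused int8 kernels on top of this basic idea; the point here is only the memory/accuracy trade-off.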


Published In

WWW '21: Companion Proceedings of the Web Conference 2021
April 2021
726 pages
ISBN:9781450383134
DOI:10.1145/3442442

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Compiler
  2. DNNs
  3. Efficient Inference and Training
  4. Information Retrieval

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21
Sponsor: WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Article Metrics

  • Total citations: 0
  • Total downloads: 122
  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 3

Reflects downloads up to 13 Jan 2025
