Memory-Efficient Pipeline-Parallel DNN Training

Narayanan, Deepak; Phanishayee, Amar; Shi, Kaiyu; Chen, Xie; Zaharia, Matei

arXiv:2006.09503 (cs)

[Submitted on 16 Jun 2020 (v1), last revised 22 Jul 2021 (this version, v3)]

Abstract: Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models. However, parameters and activations for such large models often do not fit in the memory of a single accelerator device; this means that it is necessary to distribute training of large models over multiple accelerators. In this work, we propose PipeDream-2BW, a system that supports memory-efficient pipeline parallelism. PipeDream-2BW uses a novel pipelining and weight gradient coalescing strategy, combined with the double buffering of weights, to ensure high throughput, low memory footprint, and weight update semantics similar to data parallelism. In addition, PipeDream-2BW automatically partitions the model over the available hardware resources, while respecting hardware constraints such as memory capacities of accelerators and interconnect topologies. PipeDream-2BW can accelerate the training of large GPT and BERT language models by up to 20$\times$ with similar final model accuracy.
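To make "weight gradient coalescing" and "double buffering of weights" more concrete, the following is a minimal toy sketch, not the PipeDream-2BW implementation. It assumes a scalar model with a quadratic loss, and it assumes that gradients for a batch are computed on the older of two stored weight versions and applied to the newer one. The class `DoubleBufferedWeights`, the helper `grad_quadratic`, and all parameter values are hypothetical names chosen for illustration only.

```python
class DoubleBufferedWeights:
    """Toy scalar model that never stores more than two weight versions."""

    def __init__(self, w0, lr, microbatches_per_update):
        self.older = w0          # version still used by in-flight microbatches
        self.newer = w0          # most recently generated version
        self.lr = lr
        self.m = microbatches_per_update
        self._grad_sum = 0.0     # coalesced (accumulated) gradient
        self._seen = 0           # microbatches accumulated so far

    def accumulate(self, grad):
        """Coalesce one microbatch gradient; update once m have arrived."""
        self._grad_sum += grad
        self._seen += 1
        if self._seen == self.m:
            avg = self._grad_sum / self.m
            # Apply the averaged gradient to the newest version, then shift
            # the buffers so that only two versions are ever kept.
            self.older, self.newer = self.newer, self.newer - self.lr * avg
            self._grad_sum, self._seen = 0.0, 0


def grad_quadratic(w, target=1.0):
    """Gradient of the assumed toy loss 0.5 * (w - target) ** 2."""
    return w - target


def train(num_updates, w0=0.0, lr=0.1, m=4):
    """Run num_updates coalesced updates of m microbatches each."""
    weights = DoubleBufferedWeights(w0, lr, m)
    for _ in range(num_updates):
        stale = weights.older  # assumed: one fixed version per batch
        for _ in range(m):
            weights.accumulate(grad_quadratic(stale))
    return weights
```

Calling `train(200)` in this toy setting drives `newer` toward the target value of 1.0 even though every gradient is computed on a weight version that is one update old, while memory holds only two weight copies and one gradient accumulator.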

Comments:	Accepted to ICML 2021
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Cite as:	arXiv:2006.09503 [cs.LG]
	(or arXiv:2006.09503v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.09503

Submission history

From: Deepak Narayanan
[v1] Tue, 16 Jun 2020 20:33:54 UTC (1,049 KB)
[v2] Thu, 18 Feb 2021 05:01:32 UTC (563 KB)
[v3] Thu, 22 Jul 2021 17:25:58 UTC (2,360 KB)
