DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

Jiang, Chenyu; Jia, Zhen; Zheng, Shuai; Wang, Yida; Wu, Chuan

doi:10.1145/3627703.3629585

arXiv:2311.10418 (cs)

[Submitted on 17 Nov 2023]

Abstract: Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at this https URL.
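
To make the idea of dynamic-programming-based micro-batch construction concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a toy cost model in which each micro-batch costs a fixed `overhead` plus its padded token count (number of samples times the longest sample in that micro-batch), and a hypothetical `max_tokens` cap standing in for a memory limit. The function name `partition_micro_batches` and all parameters are invented for this example; DynaPipe itself optimizes against its own execution-time and memory models.

```python
from typing import List, Tuple


def partition_micro_batches(
    seq_lengths: List[int], max_tokens: int, overhead: int = 0
) -> Tuple[List[List[int]], int]:
    """Split samples into variable-length micro-batches under a toy cost model.

    Assumed (hypothetical) cost of one micro-batch:
        overhead + num_samples * longest_length_in_batch
    i.e. samples are padded only to the longest sample in their own micro-batch.
    """
    # Sorting groups samples of similar length, so each micro-batch is a
    # contiguous slice of the sorted list.
    lengths = sorted(seq_lengths)
    n = len(lengths)
    inf = float("inf")

    # best[j]: minimum total cost to cover the first j sorted samples.
    # cut[j]: start index of the last micro-batch in that optimal solution.
    best = [inf] * (n + 1)
    best[0] = 0
    cut = [0] * (n + 1)

    for j in range(1, n + 1):
        for i in range(j):
            # Micro-batch holds sorted samples i..j-1, padded to lengths[j-1].
            padded = (j - i) * lengths[j - 1]
            if padded > max_tokens:
                continue  # violates the assumed per-micro-batch cap
            candidate = best[i] + overhead + padded
            if candidate < best[j]:
                best[j], cut[j] = candidate, i

    if best[n] == inf:
        raise ValueError("a single sample exceeds max_tokens")

    # Walk the cut points backwards to recover the micro-batches.
    batches: List[List[int]] = []
    j = n
    while j > 0:
        i = cut[j]
        batches.append(lengths[i:j])
        j = i
    batches.reverse()
    return batches, best[n]


if __name__ == "__main__":
    result = partition_micro_batches(
        [10, 12, 11, 100, 98, 50], max_tokens=256, overhead=20
    )
    print(result)  # ([[10, 11, 12], [50], [98, 100]], 346)
```

In this toy run, the short samples share one micro-batch and the long ones another, so the micro-batches differ both in sequence length and in sample count. Padding all six samples to length 100 would instead occupy 600 token slots. The fixed overhead term is what keeps the dynamic program from placing every sample in its own micro-batch.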

Comments:	18 pages, 18 figures
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2311.10418 [cs.DC]
	(or arXiv:2311.10418v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2311.10418
Related DOI:	https://doi.org/10.1145/3627703.3629585

Submission history

From: Chenyu Jiang
[v1] Fri, 17 Nov 2023 09:48:45 UTC (1,064 KB)
