Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

Roberts, Adam; Chung, Hyung Won; Levskaya, Anselm; Mishra, Gaurav; Bradbury, James; Andor, Daniel; Narang, Sharan; Lester, Brian; Gaffney, Colin; Mohiuddin, Afroz; Hawthorne, Curtis; Lewkowycz, Aitor; Salcianu, Alex; van Zee, Marc; Austin, Jacob; Goodman, Sebastian; Soares, Livio Baldini; Hu, Haitang; Tsvyashchenko, Sasha; Chowdhery, Aakanksha; Bastings, Jasmijn; Bulian, Jannis; Garcia, Xavier; Ni, Jianmo; Chen, Andrew; Kenealy, Kathleen; Clark, Jonathan H.; Lee, Stephan; Garrette, Dan; Lee-Thorp, James; Raffel, Colin; Shazeer, Noam; Ritter, Marvin; Bosma, Maarten; Passos, Alexandre; Maitin-Shepard, Jeremy; Fiedel, Noah; Omernick, Mark; Saeta, Brennan; Sepassi, Ryan; Spiridonov, Alexander; Newlan, Joshua; Gesmundo, Andrea

Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures.
$\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at this https URL and this https URL, respectively.

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2203.17189 [cs.LG]
	(or arXiv:2203.17189v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2203.17189
