DOI: 10.1145/3627535.3638465
Research article
Open access

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters

Published: 20 February 2024

Abstract

Protein structure prediction helps us understand gene translation and protein function, and is of growing interest and importance in structural biology. The AlphaFold model, which uses a transformer-based architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high computation and memory costs. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism (DAP), a novel model parallelism method. Additionally, we implement a series of low-level optimizations aimed at reducing communication, computation, and memory costs. These optimizations include Duality Async Operations, highly optimized kernels, and AutoChunk, an automated search algorithm that finds the best chunking strategy to reduce memory peaks. Experimental results show that FastFold scales efficiently to more GPUs using DAP, reduces overall training time from 11 days to 67 hours, and achieves a 7.5-9.5× speedup for long-sequence inference. Furthermore, AutoChunk reduces memory cost by over 80% during inference by automatically partitioning the intermediate tensors produced during computation.
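
AutoChunk, as described in the abstract, lowers peak memory by partitioning intermediate tensors and evaluating them piece by piece. The sketch below is only a hedged illustration of that underlying chunking mechanism, not FastFold's AutoChunk search itself; the helper name chunked_apply, the example function fn, and the tensor shapes are all hypothetical.

    # Illustrative sketch of the chunking idea behind AutoChunk (not FastFold's
    # actual implementation): apply a slice-independent computation chunk by
    # chunk along one axis, so the large intermediate is never materialized in full.
    import torch

    def chunked_apply(fn, x: torch.Tensor, chunk_size: int, dim: int = 0) -> torch.Tensor:
        """Apply fn to slices of x along dim and concatenate the results.

        Valid only when fn treats slices along dim independently; peak memory for
        intermediates created inside fn then scales with chunk_size instead of the
        full extent of dim.
        """
        outputs = []
        for start in range(0, x.size(dim), chunk_size):
            length = min(chunk_size, x.size(dim) - start)
            chunk = x.narrow(dim, start, length)  # a view, no copy of the input
            outputs.append(fn(chunk))
        return torch.cat(outputs, dim=dim)

    if __name__ == "__main__":
        # Hypothetical row-independent computation with a large hidden intermediate.
        w = torch.randn(256, 4096)
        fn = lambda t: torch.relu(t @ w).sum(dim=-1, keepdim=True)

        x = torch.randn(8192, 256)
        full = fn(x)                                    # materializes an 8192 x 4096 intermediate
        chunked = chunked_apply(fn, x, chunk_size=512)  # at most 512 x 4096 at a time
        assert torch.allclose(full, chunked)

AutoChunk automates this choice: it searches for where to chunk and how large the chunks should be, which the abstract reports as reducing inference memory cost by over 80%.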


Cited By

  • Easy and accurate protein structure prediction using ColabFold. Nature Protocols (2024). DOI: 10.1038/s41596-024-01060-5. Online publication date: 14 October 2024.
  • OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods 21, 8 (2024), 1514-1524. DOI: 10.1038/s41592-024-02272-z. Online publication date: 14 May 2024.


    Published In

    PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
    March 2024, 498 pages
    ISBN: 9798400704352
    DOI: 10.1145/3627535
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. distributed deep learning
    2. GPUs
    3. model parallelism
    4. protein structure prediction


    Conference

    PPoPP '24

    Acceptance Rates

    Overall acceptance rate: 230 of 1,014 submissions (23%)
