fsdp

Minimal yet high performant code for pretraining llms. Attempts to implement some SOTA features. Implements training through: Deepspeed, Megatron-LM, and FSDP. WIP

huggingface pretraining deepspeed megatron-lm llm fsdp

Updated Feb 6, 2024
Python

walln / loadax

Star

Dataloading for JAX

datasets ddp distributed-training dataloading jax xla fsdp

Updated Oct 3, 2024
Python

abhilash1910 / Framework-Optimization

Sponsor

Star

Framework, Model & Kernel Optimizations for Distributed Deep Learning - Data Hack Summit

pytorch triton codegen inductor ddp deepspeed fsdp tensorparallel pipelineparallel

Updated Aug 1, 2023
Python

ridwan-salau / transformer-xl

Star

Fully Sharded Data Parallel (FSDP) implementation of Transformer XL

pytorch transformer fsdp fully-sharded-data-parallel

Updated Apr 24, 2023
Python

HROlive / Large-Language-Models-on-Supercomputers

Star

Comprehensive exploration of LLMs, including cutting-edge techniques and tools such as parameter-efficient fine-tuning (PEFT), quantization, zero redundancy optimizers (ZeRO), fully sharded data parallelism (FSDP), DeepSpeed, and Huggingface accelerate.

python monitoring jupyter hpc slurm transformer high-performance-computing quantization evaluation-metrics tokenization peft huggingface huggingface-transformers deepspeed llm fsdp llm-training llm-inference unsloth

Updated Sep 27, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the fsdp topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the fsdp topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fsdp

Here are 10 public repositories matching this topic...

LambdaLabsML / distributed-training-guide

SohamGovande / podplex

GURPREETKAURJETHRA / Meta-LLAMA3-GenAI-UseCases-End-To-End-Implementation-Guides

AlibabaPAI / FlashModels

arawxx / FSDP-Distributed-Training-of-ConvNextV2-on-CIFAR10

SulRash / minLLMTrain

walln / loadax

abhilash1910 / Framework-Optimization

ridwan-salau / transformer-xl

HROlive / Large-Language-Models-on-Supercomputers

Improve this page

Add this topic to your repo