Stars
Start building LLM-empowered multi-agent applications more easily.
Ongoing research on training transformer models at scale
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
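As quick orientation for the vLLM entry above, here is a minimal offline-inference sketch based on vLLM's Python quickstart API. The model name (`facebook/opt-125m`), the prompt, and the sampling settings are placeholder assumptions for illustration, not anything taken from the list itself.

```python
# Minimal offline-inference sketch with vLLM.
# Model name, prompt, and sampling settings below are illustrative placeholders.
from vllm import LLM, SamplingParams

prompts = ["Explain PagedAttention in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loads the model weights and allocates the paged KV-cache pool.
llm = LLM(model="facebook/opt-125m")

# Runs generation; vLLM applies continuous batching and PagedAttention internally.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```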