Anyscale’s Post


Audio and image inputs are now supported in vLLM, with video and other modalities coming as well. Multi-modal LLMs will open up a huge number of use cases for generative AI. At this year's Ray Summit, we're hosting a vLLM track, where the creators of vLLM, key contributors, and power users will gather to share perspectives on the future of LLM infrastructure. If you're hosting open source models or interested in the future of large multi-modal models, Ray Summit is the place to be: https://lnkd.in/gDKgc-h

Richard Liaw, Anyscale

It's hard not to be bullish on open source infrastructure for AI. 🚀 With the 0.6.0 release, Roger Wang and other key contributors have landed major improvements to multi-modality support in vLLM, opening up a huge number of applications and use cases. In this release:

- 🎶 vLLM has added its first audio model (Ultravox), bringing its total supported multimodal model count to nearly a dozen
- 📸 vLLM can now accept multiple images in a single input, allowing users to provide richer context to the language model
- ✊ Tensor parallelism for vision encoders dramatically improves latency for multi-modal models
- 📽 In the near future, vLLM will support other modalities such as video inputs, and the team is currently working hard on supporting Qwen2-VL as well!

These changes open up a whole new set of applications for generative AI, including audio summarization, conversational AI, product cataloging, and more. If you're interested in learning more, the core members of the vLLM team will be at Ray Summit this year. Come by! https://lnkd.in/gRfYQ-ST
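For anyone who wants to try the new multi-image support, here is a minimal sketch using vLLM's offline LLM API. The model name, prompt template, and image files are illustrative assumptions, not taken from the post; the limit_mm_per_prompt and tensor_parallel_size arguments correspond to the multi-image and tensor-parallelism features mentioned above, so double-check the vLLM docs for the exact placeholders your model expects.

```python
# Minimal sketch: multi-image inference with vLLM's offline API.
# Model, prompt template, and image paths below are illustrative assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",  # hypothetical example model
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
    tensor_parallel_size=1,            # raise to shard the model across GPUs
)

image_1 = Image.open("photo_a.jpg")    # placeholder local files
image_2 = Image.open("photo_b.jpg")

outputs = llm.generate(
    {
        # Prompt format varies by model; this follows Phi-3.5-vision's style.
        "prompt": "<|user|>\n<|image_1|>\n<|image_2|>\nCompare the two images.<|end|>\n<|assistant|>\n",
        "multi_modal_data": {"image": [image_1, image_2]},
    },
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

Audio inputs follow the same general pattern when serving an audio-capable model such as Ultravox, with an audio entry in multi_modal_data instead of images; see the vLLM examples for the exact format.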

