Philipp Schmid

Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

Can we pre-train LLMs with Retrieval Augmentation? 🤔 RETRO was a Google DeepMind research project that integrated retrieval into the pre-training process. Now NVIDIA continues this line of work by scaling RETRO to 48B: they continued pretraining a 43B GPT model on an additional 100 billion tokens with retrieval augmentation, retrieving from a 1.2-trillion-token database. 🤯

𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻:
1️⃣ Pretrain an LLM with next-token prediction on 1.1T tokens.
2️⃣ Continue pretraining the LLM with retrieval augmentation ("Retro-fitting") on an additional 100 billion tokens, retrieving from the whole pretraining dataset (a sketch of the retrieval step is below).
3️⃣ Instruction-tune the model with retrieval augmentation, updating only the weights of the decoder.

𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
📈 Continued pretraining of LLMs with retrieval yields better decoders for QA.
🚀 Up to 10% improvement from retrieval-augmented pretraining.
📚 Pretraining with retrieval improves how the model incorporates context.
🧮 The retrieval database held 19 billion chunks of 64 tokens each (~1.2T tokens).
🐌 Retrieval pretraining is slower and more complex due to the index build and retrieval step.

Check out the full paper: https://lnkd.in/d2R4zsQ5

Remember that these are just my personal findings. Make sure always to conduct your own research and analysis. 🤗
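For intuition, here is a minimal, hypothetical sketch of the RETRO-style retrieval step described above (not the paper's actual code): the corpus is split into fixed 64-token chunks, indexed with a frozen encoder, and each input chunk fetches its nearest database chunks for the decoder to cross-attend to. The `embed` function and all sizes besides the 64-token chunk length are placeholder assumptions.

```python
# Toy sketch of RETRO-style chunk retrieval (illustrative, not the paper's code).
# Scale check: the real database holds 19B chunks x 64 tokens ~= 1.2T tokens.
import numpy as np
import faiss  # pip install faiss-cpu

CHUNK_LEN = 64   # tokens per chunk, as in RETRO
K = 2            # neighbors retrieved per input chunk (assumption)
DIM = 128        # embedding size (placeholder)

def chunk(tokens):
    """Split a token sequence into fixed 64-token chunks."""
    return [tokens[i:i + CHUNK_LEN] for i in range(0, len(tokens), CHUNK_LEN)]

def embed(chunks):
    """Placeholder for a frozen encoder (RETRO uses frozen BERT embeddings)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(chunks), DIM)).astype("float32")

# 1) Build the retrieval index over the pretraining corpus.
corpus_tokens = list(range(10_000))    # stand-in for the 1.2T-token corpus
db_chunks = chunk(corpus_tokens)
index = faiss.IndexFlatIP(DIM)         # exact inner-product search
index.add(embed(db_chunks))

# 2) During continued pretraining, each input chunk retrieves its K nearest
#    database chunks, which the decoder then cross-attends to.
input_chunks = chunk(list(range(256)))
_, neighbor_ids = index.search(embed(input_chunks), K)
neighbors = [[db_chunks[j] for j in row] for row in neighbor_ids]
print(f"{len(input_chunks)} input chunks -> {K} retrieved neighbors each")
```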

Syed Muhammad Asad

Developing Solutions for M+ Figure Organizations | Entrepreneur | Founder | AI/ML

10mo

Insightful. I'm also experimenting with fine-tuning a model (LLaMA) with retrieval, but using LoRA instead of updating only the decoder. The objective is to make the model work with real-time data and answer from fresh data while keeping its dialogue behaviour.
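A minimal sketch of what such a LoRA setup might look like with Hugging Face PEFT; the model name and target modules here are assumptions for illustration, not the commenter's actual configuration:

```python
# Hypothetical LoRA fine-tuning setup (not the commenter's actual code).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id; swap in whichever LLaMA checkpoint you use.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections (common choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)       # only the adapter weights train
model.print_trainable_parameters()
```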

Emeka Boris Ama

MLOps (Cloud - AWS) + Machine Learning Engineer transitioning to Data Engineering 💪 Making a difference with ML and Data Science

10mo

This is mind-blowing.

Ratna Sambhav

Machine Learning | Deep Learning | AI

10mo

Only 64 tokens per chunk? Still, it's interesting to see that a model can be trained for the whole RAG process end to end.

John Alexander

Strategist. Teacher. Leader. Advocate. | Passionate about applying AI in the real world.

10mo

Very cool! What was the cost of this?

Layton Perrin

Entrepreneur, Leader, Architect, Full-Stack Extreme Virtuoso: Business Analysis, Cyber Security, Data Science. ITIL BPM SLM Expert bringing Modern Approaches to drive Business Processes.

10mo

Thank you for the post. I'm thinking domain-specific retrieval augmentation :)

Nick Sciarrilli

Senior Solutions Architect @ AWS

10mo

Retrieval Augmented Pretraining (RAP) is cool, but have you heard of RASTA? Rastafari!

Lingxiao Wang

NLP Researcher | Expert in LLMs & Industrial Applications | Keep learning ^^

10mo

A huge, scalable "fine-tune", but too expensive. $_$

Ajith Vallath Prabhakar

Top Artificial Intelligence (AI) Voice | AI & ML Visionary | Generative AI Advocate | Strategic Technologist | Deloitte | Author of Ajith's AI Pulse (ajithp.com)

10mo

This is really insightful. Thanks for sharing.

Carlos Bilbao Lara

AI/ML engineer | Software Engineer | MLOps engineer

10mo

🤯🤯🤯
