Philipp Schmid

Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

Can we pre-train LLMs with Retrieval Augmentation? 🤔 RETRO was a Google DeepMind research project that integrated retrieval into the pre-training process. Now NVIDIA continues this line of work by scaling RETRO to 48B: they continued pretraining a 43B GPT model on an additional 100 billion tokens with retrieval augmentation, retrieving from a 1.2-trillion-token database. 🤯

𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻:
1️⃣ Pretrain an LLM with next-token prediction on 1.1T tokens.
2️⃣ Continue pretraining the LLM with retrieval augmentation ("Retro-fitting") on an additional 100 billion tokens, retrieving from the whole pretraining dataset (a sketch of the retrieval step is below).
3️⃣ Instruction-tune the model with retrieval augmentation, updating only the weights of the decoder.

𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
📈 Continued pretraining of LLMs with retrieval yields better decoders for QA.
🚀 Up to 10% improvement from retrieval-augmented pretraining.
📚 Pretraining with retrieval improves how the model incorporates context.
🧮 The retrieval database held 19 billion chunks of 64 tokens each (~1.2T tokens).
🐌 Retrieval pretraining is slower and more complex due to the index build and retrieval step.

Check out the full paper: https://lnkd.in/d2R4zsQ5

Remember that these are just my personal findings. Make sure always to conduct your own research and analysis. 🤗
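For intuition, here is a minimal, hypothetical sketch of the RETRO-style retrieval step described above (not the paper's actual code): the corpus is split into fixed 64-token chunks, indexed with a frozen encoder, and each input chunk fetches its nearest database chunks for the decoder to cross-attend to. The `embed` function and all sizes besides the 64-token chunk length are placeholder assumptions.

```python
# Toy sketch of RETRO-style chunk retrieval (illustrative, not the paper's code).
# Scale check: the real database holds 19B chunks x 64 tokens ~= 1.2T tokens.
import numpy as np
import faiss  # pip install faiss-cpu

CHUNK_LEN = 64   # tokens per chunk, as in RETRO
K = 2            # neighbors retrieved per input chunk (assumption)
DIM = 128        # embedding size (placeholder)

def chunk(tokens):
    """Split a token sequence into fixed 64-token chunks."""
    return [tokens[i:i + CHUNK_LEN] for i in range(0, len(tokens), CHUNK_LEN)]

def embed(chunks):
    """Placeholder for a frozen encoder (RETRO uses frozen BERT embeddings)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(chunks), DIM)).astype("float32")

# 1) Build the retrieval index over the pretraining corpus.
corpus_tokens = list(range(10_000))    # stand-in for the 1.2T-token corpus
db_chunks = chunk(corpus_tokens)
index = faiss.IndexFlatIP(DIM)         # exact inner-product search
index.add(embed(db_chunks))

# 2) During continued pretraining, each input chunk retrieves its K nearest
#    database chunks, which the decoder then cross-attends to.
input_chunks = chunk(list(range(256)))
_, neighbor_ids = index.search(embed(input_chunks), K)
neighbors = [[db_chunks[j] for j in row] for row in neighbor_ids]
print(f"{len(input_chunks)} input chunks -> {K} retrieved neighbors each")
```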

Syed Muhammad Asad

Developing Solutions for M+ Figure Organizations | Entrepreneur | Founder | AI/ML

10mo

Insightful. I'm also experimenting with fine-tuning a model (LLaMA) with retrieval, but using LoRA instead of updating only the decoder. The objective is to make the model work with real-time data and answer from fresh data while keeping its dialogue behaviour.
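A minimal sketch of what such a LoRA setup might look like with Hugging Face PEFT; the model name and target modules here are assumptions for illustration, not the commenter's actual configuration:

```python
# Hypothetical LoRA fine-tuning setup (not the commenter's actual code).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id; swap in whichever LLaMA checkpoint you use.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections (common choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)       # only the adapter weights train
model.print_trainable_parameters()
```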

Emeka Boris Ama

MLOps (Cloud - AWS) + Machine Learning Engineer transitioning to Data Engineering 💪 Making a difference with ML and Data Science

10mo

This is mind-blowing.

Ratna Sambhav

Machine Learning | Deep Learning | AI

10mo

Only 64 tokens per chunk? Still, it's interesting to see that a model can be trained for the whole RAG process end to end.

John Alexander

Strategist. Teacher. Leader. Advocate. | Passionate about applying AI in the real world.

10mo

Very cool! What was the cost of this?

Layton Perrin

Entrepreneur, Leader, Architect, Full-Stack Extreme Virtuoso: Business Analysis, Cyber Security, Data Science. ITIL BPM SLM Expert bringing Modern Approaches to drive Business Processes.

10mo

Thank you for the post. I'm thinking domain-specific retrieval augmentation :)

Nick Sciarrilli

Senior Solutions Architect @ AWS

10mo

Retrieval Augmented Pretraining (RAP) is cool, but have you heard of RASTA? Rastafari!

Lingxiao Wang

NLP Researcher | Expert in LLMs & Industrial Applications | Keep learning ^^

10mo

A huge, scalable "fine-tune", but too expensive. $_$

Ajith Vallath Prabhakar

Top Artificial Intelligence (AI) Voice | AI & ML Visionary | Generative AI Advocate | Strategic Technologist | Deloitte | Author of Ajith's AI Pulse (ajithp.com)

10mo

This is really insightful. Thanks for sharing.

Carlos Bilbao Lara

AI/ML engineer | Software Engineer | MLOps engineer

10mo

🤯🤯🤯
