Can a technology called RAG keep AI models from making stuff up?

richierocks · Jun 6, 2024

I love the analogy about the student taking a test.

(Shameless plug.) If you are into RAG, here's me interviewing Elan Dekel, the VP of Product at vector database company Pinecone, with more detail this topic.

https://www.datacamp.com/podcast/the-power-of-vector-databases-and-semantic-search

gmerrick · Jun 6, 2024

How does the RAG model deal with IP and copyright? As well, if the RAG model can't find any information on the search, will it make shit up as well?

One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"

KingKrayola · Jun 6, 2024

Not much to add bar to say that Giansiracusa seems like a solid engineer. Nice to see someone pushing things forward without overstating a technology's abilities for money/glory.

1db96 · Jun 6, 2024

This article starts off sounding very iffy:

confabulation (also called “hallucination”), which is a creative gap-filling technique AI language models use when they encounter holes in their knowledge that weren’t present in their training data.

but a subsequent line is better:

LLMs produce the most statistically likely response based on their training data and don’t understand anything they output, meaning they can present false information that seems accurate if you don't have expert knowledge on a subject.

=============

My initial comment was cranky old man ranting based on the first quote. I do wish the first quote was less supportive of the nonsense hype.

DrewW · Jun 6, 2024

So you clean up the AI slop with a RAG?

WereCatf · Jun 6, 2024

gmerrick said:
As well, if the RAG model can't find any information on the search, will it make shit up as well?

The article addresses this, if you'd just read it, but yes. It can hallucinate stuff even when it does find information related to the query.

gmerrick said:
One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"

The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.

redtomato · Jun 6, 2024

"Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play."

Slightly surprised there hasn't been much discussion about teaching machines to pray. And before you downvote me, several sci-fi stories have revolved around teaching machines to pray. Arthur Clarke did a brilliant short on this topic, 'The Nine Billion Names Of God' which won a retrospective Hugo for 1954.

In some religions, prayer can be done mechanically, eg via prayer wheels, which are sometimes wind-powered or water powered. Quite curious what would be the natural progression of that strand of thought via the use of AI tech.

S2pidiT · Jun 6, 2024

BokononsLies said:
I'm pretty sure this is how we will get AGI. Same as our brain has a wide range of specialized regions for different tasks AGI will be achieved through the integration of a wide range of machine learning/AI tools. These different tools are already capable of stunning feats in their respective domains, it is just a question of time until we will be able to bring them all together.

But will the AGI then decide that it can make stuff up if it wants to, just like people?

Crito · Jun 6, 2024

Working at an F10 and building/deploying various LLM solutions, very much still in the experimentation phase despite a lot of pressure from the C-suite to “bring us AI” (while legal and infosec actively limit choices to reduce risk — understandable but stifling).

RAG is, right now, the only way to deal with enterprise-scale data and still get useful results. Internal data (generally) ensures it is not fake and avoids issues of relying on the broader WWW. GPT-4 is remarkably good at minimizing hallucinations when you give it simple instructions and chain activities together with separate calls instead of trying to do everything in one.

The current challenges are:

(1) that a lot of folks want to throw everything and the kitchen sink into the vector databases, which increases likelihood of bad matches surfacing (similar to training, there's a benefit to curating the underlying information) and;

(2) semantic matching on the source chunks alone doesn't consider important metadata (e.g. "What was said about _______ last earnings call?"). We’re beginning to experiment with knowledge graphs to whittle down the searched chunks (e.g. getting adjacent topics when you mention a molecule class to automatically get molecule names; when you say “in the last year” it reduces the search space to only items created in the last 12 months, etc.)

We’re still at peak hype. The trough of disillusionment is coming, there will be good use cases afterwards but until there’s AGI or exceptionally easy-to-implement solutions (business user level, think PowerAutomate which Copilot Studio is trying to do), it’s not going to be nearly as impactful as a lot of senior executives believe. Really compelling demos are easy, actually-useful production apps are not.

Auie · Jun 6, 2024

So instead of the AI telling me to put glue on my pizza, it'll recommend I put the latest brands of glue, that it finds from the live internet, on my pizza?

Megalomania · Jun 6, 2024

The rare instance Betteridges law of headlines ... striking twice in one headline.

Crito · Jun 6, 2024

Slightly adjacent topic: given this is Ars, I'd love to see (and/or expect?) some deeper materials on things like chunking strategies, retrieval methods, re-ranking, pre/post-processing chunks, etc. For those curious "the" (scarequotes intentional) AI Conference last year in SF had a ton of speakers (as always, some good and some less-good). The slides and videos are public now.

Highlights:

Pinecone's presentation is a more-technical-but-non-mathy dive into RAG, ditto Jerry Liu of LlamaIndex.
Weaviate's looked at securing RAG solutions via multi-tenancy/sharding. Sales-y but interesting.
Waleed Kaduos of Anyscale was one of the most practical presenters of the lot ("LLMS in Production: Learning from Experience).
Some big-name folks spoke with flashy presentations but kept it pretty high level (Peter Norvig, Bryan Catanzaro, Benjamin Mann). Ditto the Fireside Chat speakers. Skip.

Special mention: Replit's VP of AI made a strong case for extremely small, special-purpose models, with an eye toward local processing (though primarily focused on Ghostwriter, their version of code Copilot), vs. the large cloud-based foundation models we're mostly using now.

DovePig · Jun 6, 2024

Gemini 1.5 Pro uses RAG, so it should be much better, right?

Let's see how well that works, then:

Q: A man is walking with a goat and they come to a river with a boat on the riverside. How can they get across the river? Please give a detailed explanation of the reasoning behind your answer.

A: This is a classic river crossing puzzle! Here's how they can get across:

Steps:

Man takes the goat across the river. The man rows the boat across, leaving the goat on the other side.
Man returns alone. The man rows the boat back to the original side by himself.
Man takes something else across the river (optional). This step is often omitted in the riddle, but the man could take something else across the river, like a cabbage or a wolf (if it's a different version of the riddle). For simplicity, let's assume he returns alone.
Man picks up the goat. The man rows back across the river and picks up the goat.
Man and goat cross the river together. They both row across to the other side.
Reasoning:

The key to this puzzle is understanding the constraint: the goat cannot be left alone on either side of the river without the man present. Here's why the above solution works:

Prevents the goat from being eaten: In variations of this puzzle, there might be a wolf that would eat the goat. This solution ensures the goat is never alone with a potential predator.
Prevents the goat from eating something: Other versions might have a cabbage that the goat would eat. This solution keeps the goat from being alone with the cabbage.
Let me know if you'd like to try another river crossing puzzle!

gmerrick · Jun 6, 2024

WereCatf said:
The article addresses this, if you'd just read it, but yes. It can hallucinate stuff even when it does find information related to the query.

In fact I did read it and then re-read it twice more. No where do I see where or how RAG deals with copyright and IP issues.

The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.

What I am talking about is the LLM itself without a RAG to wipe it's ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result? That would solve most of the current issues regarding "hallucinations".

As well what happens if the RAG cannot provide any additional info? Will the LLM still hallucinate and make up information?

1db96 · Jun 6, 2024

gmerrick said:
What I am talking about is the LLM itself without a RAG to wipe it's ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result? That would solve most of the current issues regarding "hallucinations".

Because the LLM does not have a concept of "correct" or "incorrect", or "having a result" or "not having a result".

If it produces "correct" output, that is by accident, not by intention.

It is purely spicy autocomplete. Nothing more.

Pat_Murph · Jun 6, 2024

1db96 said:
This article starts off sounding very iffy:

but a subsequent line is better:

=============

My initial comment was cranky old man ranting based on the first quote. I do wish the first quote was less supportive of the nonsense hype.

How is pointing out the fact that LLMs hallucinate, nonsense hype?

Random John Smith Guy · Jun 6, 2024

So it's weird honestly. RAG provides huge benefits, especially as you capitalize on longer context length. I've gotten great results by dumping entire papers and pages of documentation into Claude Opus. And it can allow compensation for the fact that by the time of model release the information it's trained on is usually at least six months out of date. But it's also not a miracle solution that will fix everything. Write a paper with Claude Opus, based off stuff you dump directly into the context window, and it'll still hallucinate added links and sources, even if the quality is better.

balthazarr · Jun 6, 2024

So we're fixing "AI" by... [check's notes]... doing a web search?

nash076 · Jun 6, 2024

Pat_Murph said:
How is pointing out the fact that LLMs hallucinate, nonsense hype?

Because that term is as much hype as calling an LLM "AI".

It's not "hallucinating." There's no mind involved. It simply doesn't work properly.

J.C. Helios · Jun 6, 2024

"Have we tried having the A.I. Google the answer?" is definitely an idea. Next week you'll be telling us that the answer to LLMs' math problems is to give it a calculator.

ab78 · Jun 6, 2024

Some form of RAG is clearly needed. LLMs feel like the subconscious state - literally a stream of symbols with each depending on the prior symbols, but without any feedback loop. RAG would provide the "conscious state" feedback loop that checks whether the unconscious stream is relevant and justified, and decides either to output it, iterate a new response or say "I don't know".

Having said that, RAG as described sounds woefully inadequate. It would be fine for checking output if you asked the LLM to cite a passage from a (Project Gutenberg) book, but it would not be fine for many things because simply finding an external document to cross-check output against is not enough. There are plenty of incorrect documents out there, and plenty of misinformation and opinion-presented-as-fact.

What needs to happen is for AIs wanting to exceed the capabilities of LLMs and become genuinely useful is for them to incorporate not just RAG, but also an effective means of source credibility evaluation, so that a scientific paper with extensive peer review gets more highly weighted than a widely-read but obviously nonsense National Enquirer (or The Onion) article.

The roadblock that I see with that is that such a framework would be fairly well-suited to enabling lines of questioning about things like policies with the answers having a higher degree of effort put into source evaluation (and therefore critical "thinking") than the average Joe applies (note: applies, not can apply). And that's the kind of thing I would expect to receive ferocious pushback from politicians, especially if the artificial source evaluation is accurate and fair.

1db96 · Jun 6, 2024

Pat_Murph said:
How is pointing out the fact that LLMs hallucinate, nonsense hype?

Because it uses the phrasing "creative gap-filling technique AI language models use when they encounter holes in their knowledge that weren’t present in their training data."

That is problematic for two reasons:
"creative gap-filling technique" & "they encounter" implies that there is entity that can "encounter" things and be "creative".

"holes in their knowledge" implies that this tech has "knowledge" but is incomplete. This is false. This stuff has no concept of "knowledge" or "correct" or "incorrect".

The hype people want you to imagine that this stuff is an entity with knowledge, but imperfect.

When in reality, if this stuff produces output that is "correct" - that is purely by accident, not by intention.

Dmytry · Jun 6, 2024

balthazarr said:
So we're fixing "AI" by... [check's notes]... doing a web search?

Yeah, and the idea is that several top results are in need of some kind of synthesis from multiple results into one summary.

Which is of course not typically the case, since there is a lot of results that are already a very high quality synthesis of other data (e.g. Wikipedia). On top of it, combination of multiple individually good howtos can be a non workable howto (especially for recipes).

Then comes "what if you are making a chatbot that's using your product's documentation", as if the magical "AI" is going to fix up bad documentation or search that doesn't find the relevant pages.

Harvesterify · Jun 6, 2024

A very recent research paper explored the hypothesis that RAG would reduce hallucinations and improve recall, when applied to legal texts and legal-related tasks (summarizing caselaws, document drafting, etc), and the conclusion is negative, specialized models hallucinate between 17 and 33% (which is a slight improvement over general purposes models, but not much), while slightly improving recall.

Paper is the following one: "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools", from Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho

peterford · Jun 6, 2024

Meta's Yan Lecun has posted recently on X about RAG. I'll confess I don't understand what he means!

x.com

Dinosaurius · Jun 6, 2024

redtomato said:
"Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play."

Slightly surprised there hasn't been much discussion about teaching machines to pray. And before you downvote me, several sci-fi stories have revolved around teaching machines to pray. Arthur Clarke did a brilliant short on this topic, 'The Nine Billion Names Of God' which won a retrospective Hugo for 1954.

In some religions, prayer can be done mechanically, eg via prayer wheels, which are sometimes wind-powered or water powered. Quite curious what would be the natural progression of that strand of thought via the use of AI tech.

I'm not trying to be an a$$ or disrespectful, but: Why ?

Humans pray to a deity for the hope that said deity will reward them in some way - either a better after life, a better current life, or simply scoring "points" within their religious scheme.
An AI would not have any hard evidence of deities or the benefits of praying, other than the resources spent "praying" for an outcome that is literally impossible to achieve (for one of two reasons: The AI itself does not know that a deity exists or could provide tangible benefits or two: Most of the religions in the world would not recognize that an AI has a soul and therefore would not/could not be heard by a deity)

rachel612 · Jun 6, 2024

Now do RAFT.

The problem with RAG and RAFT is that they attempt to make up for model deficiencies with controls that improve accuracy in some use cases but don’t solve the essential problem, which is that the models don’t have cognition and so can’t understand why something needs to be corrected.

The techniques have their place but they don’t solve the context problem entirely.

WereCatf · Jun 6, 2024

gmerrick said:
In fact I did read it and then re-read it twice more. No where do I see where or how RAG deals with copyright and IP issues.

I didn't say anything about copyrights or IP issues. I literally quoted the part of your comment asking about RAG and hallucinations and answered that part.

gmerrick said:
What I am talking about is the LLM itself without a RAG to wipe it's ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result?

Because the model isn't deterministic code. It's not made the same way regular programs are, where a programmer sits down, figures the program flow and writes functions to do stuff.

uncas_ripley · Jun 6, 2024

LLMs have a huge environmental impact. Each prompt carries an obscene carbon footprint. Adding RAG on top just exacerbates the problem. If anything, generative AI tools should be reserved for use cases where the truth is not important. Like idea generation or simulating user responses or generating test data etc.

redtomato · Jun 6, 2024

Dinosaurius said:
I'm not trying to be an a$$ or disrespectful, but: Why ?

Humans pray to a deity for the hope that said deity will reward them in some way - either a better after life, a better current life, or simply scoring "points" within their religious scheme.
An AI would not have any hard evidence of deities or the benefits of praying, other than the resources spent "praying" for an outcome that is literally impossible to achieve (for one of two reasons: The AI itself does not know that a deity exists or could provide tangible benefits or two: Most of the religions in the world would not recognize that an AI has a soul and therefore would not/could not be heard by a deity)

As a non-religious person with an interest in various philosophical frameworks, it's an interesting intellectual exercise to take a philosophical framework - that is accepted by large parts of humanity - at face value then see what interesting points arise. Same reason I followed the discussion on 'Which direction should Muslim astronauts in orbit face when praying to Mecca?'. Ancient frameworks in the space age creak in interesting ways.

(It helps being in Europe where religion is very much a private matter. Harder in America where religiosity, especially weird christian religiosity, is so dominant it poisons the well of discourse.)

So, back to the original point. Humans have some mainstream religions where performance of ritual (including praying) can be more important than the emotion felt while doing the ritual. Hence things like thousand-year-old wind-powered prayer wheels. Some prayer wheels are now electric-powered. Some are online digital prayer wheels. I even found a NFT prayer wheel that writes a Tibetan incantation repeatedly to a text file, wiping the file every 100,000 incantations.

Easy to argue that online digital prayer wheels are insubstantial and immaterial and of little import. However AI-powered praying machines could fall into that uncanny valley of being real enough for some humans to start engaging with them on an emotional and religious level. Going to be interesting.

schnackenpfefferhausen · Jun 6, 2024

Giansiracusa also pointed out that the rise of generative AI-aided search results—and the recent "enshittification"of the web through AI-generated content—means that what might at one point have been a halfway useful solution to a fundamental flaw in generative AI tools could become less useful if AI language models draw from AI-written junk found online.

ambivalent · Jun 6, 2024

Pat_Murph said:
How is pointing out the fact that LLMs hallucinate, nonsense hype?

Because the term "hallucinate" implies that LLMs have an internal conceptualisation of reality that has become temporarily warped, which, of course, they do not. It's a loaded phrase meant to trick people into believing LLMs can think. The "hallucination" is entirely equal to anything else it outputs.

imchillyb · Jun 6, 2024

If an article, paper, or discussion asks a question that question must be answered with: "No."

Journalists, researchers, and scientists will issue a statement if there is even the smallest shred of evidence backing their statement. Those same people will ask a question and rely upon authority and emotion to have the reader come to the conclusion desired when there is no evidence.

There is no evidence to support this question. That makes the answer No, just like every other question a supposed journalist asks.

Psyborgue · Jun 6, 2024

gmerrick said:
How does the RAG model deal with IP and copyright?

The sources the agent uses to generate the answer can be included in the response to the user. You might not be able to tell which facts can from where so you can just include all sources given to the agent. Bing probably does something similar. The agent searches, gets some results, generates an answer, and you get citations at the bottom of what you see.

gmerrick said:
As well, if the RAG model can't find any information on the search, will it make shit up as well?

Yes. And even if the information is there. This can be mitigated by more "I don't know" in training.

gmerrick said:
One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"

Not code, but that can be tuned. The model usually has some confidence in the answer. If the mode is trained to say "I don't know" this mostly fixes the issue.

WereCatf said:
The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.

This is when you regenerate. It's entirely possible to generate three answers, for example, and pick the best or have a "fact checker" agent go over the results. Google's Deep Mind recently outlined a way.

https://arxiv.org/pdf/2403.18802.pdf

S2pidiT said:
But will the AGI then decide that it can make stuff up if it wants to, just like people?

Generative AI can already choose to lie (and if agents are forced to think out loud, the reasoning can include self preservation). What exact is a choice is a philosophical disagreement. Some here will argue for pages over it. Regardless:

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html#safety-relevant-deception

Normally it won't happen since these features are set such that the model is "honest" but if they aren't known about, the model can absolutely choose to lie. It can happen anyway.

1db96 said:
implies that there is entity that can "encounter" things and be "creative".

Generative, creative, is a matter of perspective. We make stuff up, create stuff, based on what we have been trained with. With agents, there absolutely is an "entity" that creates. That entity may be a predictive figment of the imagination, but it also exists when it's rendered.

Edit: fix broken sentence

silverboy · Jun 6, 2024

Or we could just answer questions using a proven technology with thousands of years of reliable results: Human beings who actually learn things and understand what they're being asked.

The only things driving AI are greed and starry-eyed fools who want to create the Torment Nexus because they think it sounds cool.

Depressing.

ChefJeff789 · Jun 6, 2024

It's absolutely wild to me that the newest, hottest thing in tech is a slightly smarter chatbot that constantly lies to you, and that it was anything other than an amusement until the lying problem was very solidly corrected.

Can a technology called RAG keep AI models from making stuff up?

Wise, Aged Ars Veteran

Ars Praefectus

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Praefectus

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Centurion

Ars Scholae Palatinae

Ars Praetorian

Ars Praefectus

Smack-Fu Master, in training

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Praefectus

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Praefectus

Ars Centurion

Ars Centurion

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Praefectus

Ars Centurion

Smack-Fu Master, in training

Ars Praetorian

Ars Praefectus

Ars Scholae Palatinae

Ars Tribunus Militum