Can a technology called RAG keep AI models from making stuff up?


richierocks

Wise, Aged Ars Veteran
163
How does the RAG model deal with IP and copyright? As well, if the RAG model can't find any information on the search, will it make shit up as well?

One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"
 
Upvote
51 (59 / -8)

1db96

Smack-Fu Master, in training
42
Subscriptor
This article starts off sounding very iffy:

confabulation (also called “hallucination”), which is a creative gap-filling technique AI language models use when they encounter holes in their knowledge that weren’t present in their training data.

but a subsequent line is better:
LLMs produce the most statistically likely response based on their training data and don’t understand anything they output, meaning they can present false information that seems accurate if you don't have expert knowledge on a subject.

=============

My initial comment was cranky old man ranting based on the first quote. I do wish the first quote was less supportive of the nonsense hype.
 
Upvote
45 (60 / -15)

WereCatf

Ars Tribunus Militum
2,374
As well, if the RAG model can't find any information on the search, will it make shit up as well?
The article addresses this, if you'd just read it, but yes. It can hallucinate stuff even when it does find information related to the query.
One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"
The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.
 
Upvote
40 (44 / -4)
"Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play."

Slightly surprised there hasn't been much discussion about teaching machines to pray. And before you downvote me, several sci-fi stories have revolved around teaching machines to pray. Arthur Clarke did a brilliant short on this topic, 'The Nine Billion Names Of God' which won a retrospective Hugo for 1954.

In some religions, prayer can be done mechanically, eg via prayer wheels, which are sometimes wind-powered or water powered. Quite curious what would be the natural progression of that strand of thought via the use of AI tech.
 
Upvote
-17 (20 / -37)

S2pidiT

Ars Scholae Palatinae
1,495
I'm pretty sure this is how we will get AGI. Just as our brain has a wide range of specialized regions for different tasks, AGI will be achieved through the integration of a wide range of machine learning/AI tools. These different tools are already capable of stunning feats in their respective domains; it is just a question of time until we are able to bring them all together.
But will the AGI then decide that it can make stuff up if it wants to, just like people?
 
Upvote
14 (14 / 0)

Crito

Ars Scholae Palatinae
638
Subscriptor++
I work at an F10 building/deploying various LLM solutions, and we're very much still in the experimentation phase despite a lot of pressure from the C-suite to “bring us AI” (while legal and infosec actively limit choices to reduce risk; understandable but stifling).

RAG is, right now, the only way to deal with enterprise-scale data and still get useful results. Internal data (generally) ensures it is not fake and avoids issues of relying on the broader WWW. GPT-4 is remarkably good at minimizing hallucinations when you give it simple instructions and chain activities together with separate calls instead of trying to do everything in one.
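To make "chain activities together with separate calls" concrete, here's a toy sketch of the pattern (not our actual code; llm() is just a stand-in for whatever completion API you call):

```python
# Toy sketch of chaining narrow calls instead of one do-everything prompt.
# llm() is a placeholder for any chat-completion call: prompt in, text out.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for your completion API")

def answer_question(question: str, documents: list[str]) -> str:
    # Step 1: one extraction-only call per retrieved document.
    notes = []
    for doc in documents:
        note = llm(
            "Extract only the passages relevant to the question below. "
            "If nothing is relevant, reply exactly 'NONE'.\n\n"
            f"Question: {question}\n\nDocument:\n{doc}"
        )
        if note.strip() != "NONE":
            notes.append(note)

    # Step 2: a separate call that may only use the extracted notes.
    if not notes:
        return "I don't know; nothing relevant was found in the provided documents."
    return llm(
        "Answer the question using ONLY the notes below. "
        "If they are insufficient, say so.\n\n"
        f"Question: {question}\n\nNotes:\n" + "\n---\n".join(notes)
    )
```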

The current challenges are:

(1) a lot of folks want to throw everything and the kitchen sink into the vector databases, which increases the likelihood of bad matches surfacing (similar to training, there's a benefit to curating the underlying information); and

(2) semantic matching on the source chunks alone doesn't consider important metadata (e.g. "What was said about _______ last earnings call?"). We’re beginning to experiment with knowledge graphs to whittle down the searched chunks (e.g. getting adjacent topics, so mentioning a molecule class automatically pulls in molecule names; or "in the last year" reduces the search space to only items created in the last 12 months, etc.). A rough sketch of that filtering idea is below.
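Toy sketch of that metadata filtering; the Chunk fields and helper names here are made up, standing in for whatever your vector store actually exposes:

```python
# Toy sketch: combine semantic similarity with metadata filters so that
# "in the last year" narrows the candidate chunks before ranking.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Chunk:
    text: str
    embedding: list[float]
    created_at: datetime
    topics: set[str] = field(default_factory=set)  # e.g. filled from a knowledge graph

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def search(query_embedding: list[float], chunks: list[Chunk],
           topics: set[str] | None = None,
           max_age_days: int | None = None,
           k: int = 5) -> list[Chunk]:
    cutoff = datetime.now() - timedelta(days=max_age_days) if max_age_days else None
    candidates = [
        c for c in chunks
        if (cutoff is None or c.created_at >= cutoff)    # time-window filter
        and (topics is None or c.topics & topics)        # knowledge-graph topics
    ]
    # Only the filtered candidates get semantically ranked.
    return sorted(candidates,
                  key=lambda c: cosine(query_embedding, c.embedding),
                  reverse=True)[:k]
```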

We’re still at peak hype. The trough of disillusionment is coming; there will be good use cases afterwards, but until there's AGI or exceptionally easy-to-implement solutions (at the business-user level; think PowerAutomate, which Copilot Studio is trying to be), it's not going to be nearly as impactful as a lot of senior executives believe. Really compelling demos are easy; actually useful production apps are not.
 
Last edited:
Upvote
106 (110 / -4)

Crito

Ars Scholae Palatinae
638
Subscriptor++
Slightly adjacent topic: given this is Ars, I'd love to see (and/or expect?) some deeper materials on things like chunking strategies, retrieval methods, re-ranking, pre/post-processing chunks, etc. For those curious, "the" (scare quotes intentional) AI Conference last year in SF had a ton of speakers (as always, some good and some less good). The slides and videos are public now.

Highlights:
  • Pinecone's presentation is a more-technical-but-non-mathy dive into RAG, ditto Jerry Liu of LlamaIndex.
  • Weaviate's looked at securing RAG solutions via multi-tenancy/sharding. Sales-y but interesting.
  • Waleed Kadous of Anyscale was one of the most practical presenters of the lot ("LLMs in Production: Learning from Experience").
  • Some big-name folks spoke with flashy presentations but kept it pretty high level (Peter Norvig, Bryan Catanzaro, Benjamin Mann). Ditto the Fireside Chat speakers. Skip.
Special mention: Replit's VP of AI made a strong case for extremely small, special-purpose models, with an eye toward local processing (though primarily focused on Ghostwriter, their version of code Copilot), vs. the large cloud-based foundation models we're mostly using now.
 
Upvote
35 (35 / 0)
Gemini 1.5 Pro uses RAG, so it should be much better, right?

Let's see how well that works, then:
Q: A man is walking with a goat and they come to a river with a boat on the riverside. How can they get across the river? Please give a detailed explanation of the reasoning behind your answer.

A: This is a classic river crossing puzzle! Here's how they can get across:

Steps:

1. Man takes the goat across the river. The man rows the boat across, leaving the goat on the other side.
2. Man returns alone. The man rows the boat back to the original side by himself.
3. Man takes something else across the river (optional). This step is often omitted in the riddle, but the man could take something else across the river, like a cabbage or a wolf (if it's a different version of the riddle). For simplicity, let's assume he returns alone.
4. Man picks up the goat. The man rows back across the river and picks up the goat.
5. Man and goat cross the river together. They both row across to the other side.

Reasoning:

The key to this puzzle is understanding the constraint: the goat cannot be left alone on either side of the river without the man present. Here's why the above solution works:

  • Prevents the goat from being eaten: In variations of this puzzle, there might be a wolf that would eat the goat. This solution ensures the goat is never alone with a potential predator.
  • Prevents the goat from eating something: Other versions might have a cabbage that the goat would eat. This solution keeps the goat from being alone with the cabbage.
Let me know if you'd like to try another river crossing puzzle!
🤣🤣🤣
 
Upvote
80 (82 / -2)
The article addresses this, if you'd just read it, but yes. It can hallucinate stuff even when it does find information related to the query.

In fact I did read it and then re-read it twice more. Nowhere do I see where or how RAG deals with copyright and IP issues.

The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.

What I am talking about is the LLM itself without a RAG to wipe its ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result? That would solve most of the current issues regarding "hallucinations".

As well what happens if the RAG cannot provide any additional info? Will the LLM still hallucinate and make up information?
 
Upvote
-3 (13 / -16)

1db96

Smack-Fu Master, in training
42
Subscriptor
What I am talking about is the LLM itself without a RAG to wipe its ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result? That would solve most of the current issues regarding "hallucinations".

Because the LLM does not have a concept of "correct" or "incorrect", or "having a result" or "not having a result".

If it produces "correct" output, that is by accident, not by intention.

It is purely spicy autocomplete. Nothing more.
 
Upvote
100 (109 / -9)

Pat_Murph

Wise, Aged Ars Veteran
190
This article starts off sounding very iffy:

[…]

but a subsequent line is better:

[…]

My initial comment was cranky old man ranting based on the first quote. I do wish the first quote was less supportive of the nonsense hype.
How is pointing out the fact that LLMs hallucinate, nonsense hype?
 
Upvote
9 (11 / -2)
So it's weird, honestly. RAG provides huge benefits, especially as you capitalize on longer context lengths. I've gotten great results by dumping entire papers and pages of documentation into Claude Opus, and it can compensate for the fact that, by the time a model is released, the information it was trained on is usually at least six months out of date. But it's also not a miracle solution that will fix everything. Write a paper with Claude Opus, based off stuff you dump directly into the context window, and it'll still hallucinate added links and sources, even if the quality is better.
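For anyone curious, "dumping it into the context window" is about as simple as this sketch (llm() is a placeholder, not any particular vendor's API):

```python
# Toy sketch of long-context stuffing: put whole documents into the prompt and
# ask the model to answer only from them. llm() is a placeholder for any
# chat/completion API call; it is not a specific vendor SDK.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for your completion API")

def ask_with_documents(question: str, documents: list[str]) -> str:
    # Even with everything in context, cited links/sources still need checking:
    # the model can invent them.
    context = "\n\n=== DOCUMENT ===\n\n".join(documents)
    return llm(
        "Using ONLY the documents below, answer the question and name the "
        "document you relied on. If they don't contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```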
 
Upvote
4 (5 / -1)

ab78

Ars Tribunus Militum
1,791
Some form of RAG is clearly needed. LLMs feel like the subconscious state - literally a stream of symbols with each depending on the prior symbols, but without any feedback loop. RAG would provide the "conscious state" feedback loop that checks whether the unconscious stream is relevant and justified, and decides either to output it, iterate a new response or say "I don't know".
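Roughly the loop I have in mind, as a toy sketch (retrieve, generate, and is_supported are hypothetical stand-ins, not any particular framework):

```python
# Toy sketch of the "conscious" feedback loop: draft, cross-check against
# retrieved sources, then output, retry, or admit ignorance.
# retrieve(), generate() and is_supported() are hypothetical stand-ins.

def retrieve(question: str) -> list[str]:
    raise NotImplementedError("stand-in: fetch external documents")

def generate(question: str, sources: list[str]) -> str:
    raise NotImplementedError("stand-in: the raw LLM 'stream of symbols'")

def is_supported(draft: str, sources: list[str]) -> bool:
    raise NotImplementedError("stand-in: check the draft against the sources")

def answer(question: str, max_attempts: int = 3) -> str:
    sources = retrieve(question)              # external documents to check against
    for _ in range(max_attempts):
        draft = generate(question, sources)   # the unconscious stream
        if is_supported(draft, sources):      # the conscious feedback check
            return draft                      # justified enough to output
    return "I don't know."                    # give up rather than confabulate
```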

Having said that, RAG as described sounds woefully inadequate. It would be fine for checking output if you asked the LLM to cite a passage from a (Project Gutenberg) book, but it would not be fine for many things because simply finding an external document to cross-check output against is not enough. There are plenty of incorrect documents out there, and plenty of misinformation and opinion-presented-as-fact.

What needs to happen, for AIs to exceed the capabilities of LLMs and become genuinely useful, is for them to incorporate not just RAG but also an effective means of source-credibility evaluation, so that a scientific paper with extensive peer review gets weighted more highly than a widely read but obviously nonsense National Enquirer (or The Onion) article.

The roadblock I see is that such a framework would be fairly well suited to enabling lines of questioning about things like policies, with answers that put a higher degree of effort into source evaluation (and therefore critical "thinking") than the average Joe applies (note: applies, not can apply). And that's the kind of thing I would expect to receive ferocious pushback from politicians, especially if the artificial source evaluation is accurate and fair.
 
Upvote
5 (10 / -5)

1db96

Smack-Fu Master, in training
42
Subscriptor
How is pointing out the fact that LLMs hallucinate, nonsense hype?
Because it uses the phrasing "creative gap-filling technique AI language models use when they encounter holes in their knowledge that weren’t present in their training data."

That is problematic for two reasons:
"creative gap-filling technique" & "they encounter" implies that there is entity that can "encounter" things and be "creative".

"holes in their knowledge" implies that this tech has "knowledge" but is incomplete. This is false. This stuff has no concept of "knowledge" or "correct" or "incorrect".

The hype people want you to imagine that this stuff is an entity with knowledge, but imperfect.

When in reality, if this stuff produces output that is "correct" - that is purely by accident, not by intention.
 
Last edited:
Upvote
48 (60 / -12)

Dmytry

Ars Legatus Legionis
10,435
So we're fixing "AI" by... [check's notes]... doing a web search?
Yeah, and the idea is that the several top results are in need of some kind of synthesis into one summary.

Which is of course not typically what's needed, since a lot of results are already a very high-quality synthesis of other data (e.g. Wikipedia). On top of that, a combination of multiple individually good how-tos can be a non-workable how-to (especially for recipes).

Then comes "what if you are making a chatbot that's using your product's documentation," as if the magical "AI" is going to fix up bad documentation or a search that doesn't find the relevant pages.
 
Upvote
16 (16 / 0)

Harvesterify

Smack-Fu Master, in training
1
A very recent research paper explored the hypothesis that RAG would reduce hallucinations and improve recall when applied to legal texts and legal-related tasks (summarizing case law, document drafting, etc.), and the conclusion is negative: specialized models still hallucinate between 17 and 33% of the time (a slight improvement over general-purpose models, but not much), while slightly improving recall.

The paper is "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," by Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho.
 
Last edited:
Upvote
46 (46 / 0)

Dinosaurius

Ars Centurion
360
Subscriptor++
"Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play."

Slightly surprised there hasn't been much discussion about teaching machines to pray. And before you downvote me, several sci-fi stories have revolved around teaching machines to pray. Arthur Clarke did a brilliant short on this topic, 'The Nine Billion Names Of God' which won a retrospective Hugo for 1954.

In some religions, prayer can be done mechanically, eg via prayer wheels, which are sometimes wind-powered or water powered. Quite curious what would be the natural progression of that strand of thought via the use of AI tech.
I'm not trying to be an a$$ or disrespectful, but: Why?

Humans pray to a deity in the hope that said deity will reward them in some way: either a better afterlife, a better current life, or simply scoring "points" within their religious scheme.
An AI would not have any hard evidence of deities or the benefits of praying, other than the resources spent "praying" for an outcome that is literally impossible to achieve, for one of two reasons: one, the AI itself does not know that a deity exists or could provide tangible benefits; or two, most of the religions in the world would not recognize that an AI has a soul, and therefore it would not/could not be heard by a deity.
 
Upvote
15 (16 / -1)

rachel612

Ars Centurion
321
Subscriptor++
Now do RAFT.

The problem with RAG and RAFT is that they attempt to make up for model deficiencies with controls that improve accuracy in some use cases but don’t solve the essential problem, which is that the models don’t have cognition and so can’t understand why something needs to be corrected.

The techniques have their place but they don’t solve the context problem entirely.
 
Upvote
26 (27 / -1)

WereCatf

Ars Tribunus Militum
2,374
In fact I did read it and then re-read it twice more. Nowhere do I see where or how RAG deals with copyright and IP issues.
I didn't say anything about copyrights or IP issues. I literally quoted the part of your comment asking about RAG and hallucinations and answered that part.
What I am talking about is the LLM itself without a RAG to wipe its ass. Why doesn't the LLM code base have code that provides a null result when it doesn't have a result?
Because the model isn't deterministic code. It's not made the same way regular programs are, where a programmer sits down, figures the program flow and writes functions to do stuff.
 
Upvote
19 (20 / -1)

uncas_ripley

Smack-Fu Master, in training
13
LLMs have a huge environmental impact. Each prompt carries an obscene carbon footprint. Adding RAG on top just exacerbates the problem. If anything, generative AI tools should be reserved for use cases where the truth is not important. Like idea generation or simulating user responses or generating test data etc.
 
Upvote
-3 (13 / -16)
I'm not trying to be an a$$ or disrespectful, but: Why?

Humans pray to a deity in the hope that said deity will reward them in some way: either a better afterlife, a better current life, or simply scoring "points" within their religious scheme.
An AI would not have any hard evidence of deities or the benefits of praying, other than the resources spent "praying" for an outcome that is literally impossible to achieve, for one of two reasons: one, the AI itself does not know that a deity exists or could provide tangible benefits; or two, most of the religions in the world would not recognize that an AI has a soul, and therefore it would not/could not be heard by a deity.

As a non-religious person with an interest in various philosophical frameworks, it's an interesting intellectual exercise to take a philosophical framework - that is accepted by large parts of humanity - at face value then see what interesting points arise. Same reason I followed the discussion on 'Which direction should Muslim astronauts in orbit face when praying to Mecca?'. Ancient frameworks in the space age creak in interesting ways.

(It helps being in Europe where religion is very much a private matter. Harder in America where religiosity, especially weird christian religiosity, is so dominant it poisons the well of discourse.)

So, back to the original point. Humans have some mainstream religions where performance of ritual (including praying) can be more important than the emotion felt while doing the ritual. Hence things like thousand-year-old wind-powered prayer wheels. Some prayer wheels are now electric-powered. Some are online digital prayer wheels. I even found an NFT prayer wheel that writes a Tibetan incantation repeatedly to a text file, wiping the file every 100,000 incantations.

Easy to argue that online digital prayer wheels are insubstantial and immaterial and of little import. However, AI-powered praying machines could fall into that uncanny valley of being real enough for some humans to start engaging with them on an emotional and religious level. Going to be interesting.
 
Last edited:
Upvote
20 (21 / -1)
Giansiracusa also pointed out that the rise of generative AI-aided search results—and the recent "enshittification" of the web through AI-generated content—means that what might at one point have been a halfway useful solution to a fundamental flaw in generative AI tools could become less useful if AI language models draw from AI-written junk found online.
HUMANCENTiPAD.jpg
 
Upvote
-1 (7 / -8)

ambivalent

Smack-Fu Master, in training
54
How is pointing out the fact that LLMs hallucinate, nonsense hype?
Because the term "hallucinate" implies that LLMs have an internal conceptualisation of reality that has become temporarily warped, which, of course, they do not. It's a loaded phrase meant to trick people into believing LLMs can think. The "hallucination" is entirely equal to anything else it outputs.
 
Upvote
17 (28 / -11)

imchillyb

Ars Praetorian
468
Subscriptor
If an article, paper, or discussion asks a question, that question must be answered with "No."

Journalists, researchers, and scientists will issue a statement if there is even the smallest shred of evidence backing it. Those same people will ask a question, relying on authority and emotion to lead the reader to the desired conclusion, when there is no evidence.

There is no evidence to support this question. That makes the answer No, just like every other question a supposed journalist asks.
 
Upvote
-9 (1 / -10)

Psyborgue

Ars Praefectus
5,272
Subscriptor++
How does the RAG model deal with IP and copyright?
The sources the agent uses to generate the answer can be included in the response to the user. You might not be able to tell which facts came from where, so you can just include all the sources given to the agent. Bing probably does something similar: the agent searches, gets some results, generates an answer, and you get citations at the bottom of what you see.
As well, if the RAG model can't find any information on the search, will it make shit up as well?
Yes. And even if the information is there. This can be mitigated by more "I don't know" in training.
One other thing that I don't think's been mentioned is why can't there be code that requires the LLM to provide a null answer if it doesn't have the data. "I'm sorry Dave, I can't do that"
Not code, but that can be tuned. The model usually has some confidence in the answer, and if the model is trained to say "I don't know," this mostly fixes the issue.
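To illustrate the "confidence" part, here's a crude sketch that assumes your API returns per-token log-probabilities (training the model to refuse is the more robust fix, as said):

```python
import math

# Crude sketch: treat the geometric mean of the token probabilities as a
# confidence score and abstain below a threshold. Assumes the completion API
# can return per-token log-probabilities alongside the generated text.

def answer_or_abstain(text: str, token_logprobs: list[float],
                      threshold: float = 0.75) -> str:
    if not token_logprobs:
        return "I don't know."
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    return text if confidence >= threshold else "I don't know."
```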
The code runs after the model has already come up with an answer, so it has no way of telling whether the model actually had any good information to draw from or not.
This is when you regenerate. It's entirely possible to generate three answers, for example, and pick the best, or have a "fact checker" agent go over the results. Google DeepMind recently outlined a way to do this.
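Something like this toy sketch of regenerate-and-check (generate() and judge() are hypothetical stand-ins for two separate model calls, not DeepMind's actual method):

```python
# Toy sketch of best-of-N with a separate "fact checker" pass.
# generate() and judge() are hypothetical stand-ins for two separate model calls.

def generate(question: str, sources: list[str]) -> str:
    raise NotImplementedError("stand-in: one sampled answer from the generator")

def judge(question: str, answer: str, sources: list[str]) -> float:
    raise NotImplementedError("stand-in: checker scores support on a 0-1 scale")

def best_of_n(question: str, sources: list[str], n: int = 3) -> str:
    candidates = [generate(question, sources) for _ in range(n)]
    scored = [(judge(question, c, sources), c) for c in candidates]
    score, best = max(scored)      # keep the answer the checker rates highest
    return best if score >= 0.5 else "I don't know."
```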



But will the AGI then decide that it can make stuff up if it wants to, just like people?
Generative AI can already choose to lie (and if agents are forced to think out loud, the reasoning can include self-preservation). What exactly counts as a choice is a philosophical disagreement; some here will argue for pages over it. Regardless:


Normally it won't happen since these features are set such that the model is "honest" but if they aren't known about, the model can absolutely choose to lie. It can happen anyway.
implies that there is an entity that can "encounter" things and be "creative".
Generative, creative, is a matter of perspective. We make stuff up, create stuff, based on what we have been trained with. With agents, there absolutely is an "entity" that creates. That entity may be a predictive figment of the imagination, but it also exists when it's rendered.

Edit: fix broken sentence
 
Last edited:
Upvote
-4 (11 / -15)

silverboy

Ars Scholae Palatinae
1,233
Subscriptor++
Or we could just answer questions using a proven technology with thousands of years of reliable results: Human beings who actually learn things and understand what they're being asked.

The only things driving AI are greed and starry-eyed fools who want to create the Torment Nexus because they think it sounds cool.

Depressing.
 
Upvote
-4 (10 / -14)