Google’s “AI Overview” can give false, misleading, and dangerous answers


If you use Google regularly, you may have noticed the company's new AI Overviews providing summarized answers to some of your questions in recent days. If you use social media regularly, you may have come across many examples of those AI Overviews being hilariously or even dangerously wrong.

Factual errors can pop up in existing LLM chatbots as well, of course. But the potential damage that can be caused by AI inaccuracy gets multiplied when those errors appear atop the ultra-valuable web real estate of the Google search results page.

"The examples we've seen are generally very uncommon queries and aren’t representative of most people’s experiences," a Google spokesperson told Ars. "The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web."

After looking through dozens of examples of Google AI Overview mistakes (and replicating many ourselves for the galleries below), we've noticed a few broad categories of errors that seemed to show up again and again. Consider this a crash course in some of the current weak points of Google's AI Overviews and a look at areas of concern for the company to improve as the system continues to roll out.

Treating jokes as facts

Some of the funniest examples of Google's AI Overview failing come, ironically enough, when the system doesn't realize a source online was trying to be funny. An AI answer that suggested using "1/8 cup of non-toxic glue" to stop cheese from sliding off pizza can be traced back to someone who was obviously trying to troll an ongoing thread. A response recommending "blinker fluid" for a turn signal that doesn't make noise can similarly be traced back to a troll on the Good Sam advice forums, which Google's AI Overview apparently trusts as a reliable source.

In regular Google searches, these jokey posts from random Internet users probably wouldn't be among the first answers someone saw when clicking through a list of web links. But with AI Overviews, those trolls were integrated into the authoritative-sounding data summary presented right at the top of the results page.

What's more, there's nothing in the tiny "source link" boxes below Google's AI summary to suggest either of these forum trolls is anything other than a good source of information. Sometimes, though, glancing at the source can save you some grief, such as when you see a response calling running with scissors "cardio exercise that some say is effective" (that came from a 2022 post from Little Old Lady Comedy).

Bad sourcing

Sometimes Google's AI Overview offers an accurate summary of a non-joke source that happens to be wrong. When asked how many Declaration of Independence signers owned slaves, for instance, Google's AI Overview accurately summarizes a Washington University in St. Louis library page saying that one-third "were personally enslavers." But the response ignores contradictory sources like a Chicago Sun-Times article saying the real answer is closer to three-quarters. I'm not enough of a history expert to judge which authoritative-seeming source is right, but at least one historian online took issue with the Google AI's answer sourcing.

Other times, a source that Google trusts as authoritative is really just fan fiction. That's the case for a response that imagined a 2022 remake of 2001: A Space Odyssey, directed by Steven Spielberg and produced by George Lucas. A savvy web user would probably do a double-take before citing Fandom's "Idea Wiki" as a reliable source, but a careless AI Overview user might not notice where the AI got its information.

Answering a different question

One of the most frustrating types of errors in Google's AI Overviews is when the system gives a somewhat correct answer to a slightly different question. Searching for the southernmost point in mainland Alaska, for instance, got us a response about the southernmost part of the Aleutian island chain instead. A careful reader should probably notice that an island is, by definition, not part of the mainland, but someone who blindly trusts Google's AI might not be so careful.

Asking about animals involved with sports teams sometimes causes this type of error, too. When we asked about dogs that have played in the NHL, we got a response about a promotional dog who runs around at some game-adjacent activities. When we asked about dogs in the NBA, we similarly got an answer about a dog that merely sat courtside at a recent Lakers game. Reading these answers carefully can show the discrepancy between question and answer, but focusing on the part highlighted by Google might give a searcher the wrong idea.

Math and reading comprehension

Like some other LLMs, Google's AI search system can sometimes struggle with basic math problems and equations. Asking about the relative value of dollars in the year 2000, for instance, returns a nonsensical response about "a cumulative price increase of -43.49%" between 2000 and 2023 (prices actually went up 77 percent in that time, according to the inflation calculator Google itself cites). In another example, the AI bafflingly told us there are 738,523 days between October 2024 and January 2025 (in reality, there are fewer).
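For readers who want to check the arithmetic themselves, here is a minimal Python sketch using only the figures quoted above. The exact date endpoints are our assumption (the widest reasonable reading of "between October 2024 and January 2025"), and the purchasing-power formula is a standard calculation, not something Google or the cited inflation calculator has confirmed using. Interestingly, a 77 percent price increase implies a dollar buys roughly 43.5 percent less, which is close to the "-43.49%" in the AI's answer; whether the system simply mislabeled that number is a guess on our part.

```python
from datetime import date

# Figure quoted in the article: prices rose about 77 percent from 2000 to 2023.
price_increase = 0.77

# Change in what one dollar buys after that price increase.
# (Standard purchasing-power formula; an illustration, not Google's method.)
purchasing_power_change = 1 / (1 + price_increase) - 1

# Widest reading of "between October 2024 and January 2025".
# These endpoints are an assumption; the original query gave no exact dates.
start = date(2024, 10, 1)
end = date(2025, 1, 31)
days_between = (end - start).days

print(f"Purchasing power change: {purchasing_power_change:.1%}")
print(f"Days between {start} and {end}: {days_between}")
```

Even with the most generous endpoints, the day count lands in the low hundreds, nowhere near 738,523.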

Questions about the letters that make up words can also trip Google's AI up, as when the system tried to tell us that Catania was an Italian city starting with K.

The same name game

If you have the same name as a US president, Google's AI system seems liable to confuse you with your famous namesake. Asking about presidents that died in volcanoes, for instance, brings up accurate information about Harry R. Truman, a man who died in the 1980 Mount St. Helens explosion and who is decidedly not President Harry S. Truman (who died in 1972 of non-volcano-related causes). Google's AI also told us that President John F. Kennedy attended Brown University, mixing him up with his son JFK, Jr. (President Kennedy went to Harvard University).

It gets worse, though. Google's AI Overview also told users that 13 US presidents had earned 59 degrees from the University of Wisconsin-Madison in a trend dating back hundreds of years. The source for that mistake, funnily enough, was a page that was specifically about UW alumni who shared a name with famous presidents (by the time we tested the same prompt, Google told us that AI Overview "is not available" for this query).

Not wrong, exactly...

Even when Google's AI Overviews aren't wrong, per se, they can sometimes be surprisingly weird. When we asked about ducks that played in the NBA, for instance, we got AI responses about an Oregon Ducks alumnus who went pro, as well as Utah Jazz player Donald Edgar "Duck" Williams. Considering that Google has no way of knowing what we meant precisely by "duck," this is actually a pretty clever response.

We were also tickled when a list of AI recommendations for a healthy breakfast listed one smoothie recipe as "my kids' favorite," drawing from similar phrasing from source Kristine's Kitchen Blog. When an AI system starts giving itself theoretical kids, you begin to see the problems that can happen when you start summarizing web content en masse.

Getting better all the time

While we were able to generate all of the Google AI Overview mistakes you see in the galleries above, it's worth mentioning that there are plenty of other examples that are floating around social media that we were not able to replicate for ourselves. That doesn't mean those posts aren't legitimate, though; it's likely that Google simply updated its system to fix those errors in the intervening days (either through human review or automated tuning).

In at least one case, we saw this kind of daily update for ourselves. On Wednesday morning, Google's AI Overview told us that the Sony PlayStation and Sega Saturn were available for purchase in 1993 (seemingly misinterpreting a Wikipedia page describing a console generation spanning 1993 to 1998). When we re-ran the same query on Thursday morning, the AI Overview correctly indicated that the Atari Jaguar launched in 1993, with competition from Sony and Sega launching later.

A Google spokesperson told Ars, "We conducted extensive testing before launching this new experience and will use these isolated examples as we continue to refine our systems overall." Google also said that it uses adversarial testing to improve the results in terms of both factuality and safety and that the company doesn't plan to show AI summaries for certain explicit or dangerous topics.

While seeing a bunch of AI search errors like this can be striking, it's worth remembering that social media posters are less likely to call attention to the frequent examples where Google's AI Overview worked as intended by providing concise and accurate information culled from the web. Still, when a new system threatens to alter something as fundamental to the Internet as Google search, it's worth examining just where that system seems to be failing.

Listing image: Getty Images

Kyle Orland Senior Gaming Editor

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.


Staff Picks

invertedpanda

It should be noted that the issue is actually rarely the AI itself, but Google's ranking system and featured snippets.

In most cases where I've tested these "bad AI results", the actual problem is that the AI is just re-phrasing the top result that creates the featured snippets. As an example, the "How many rocks should I eat per day" one that's been making the rounds is actually just some gray-hat SEO of a featured snippet for a fracking company that cites an Onion article (SEO is complicated, folks).

So, the problem already existed in the form of featured snippets: Now it's just rephrased with AI.

Of course, Google has some real issues with handling these shifty SEO strategies, and it plays whack-a-mole constantly. I actually shut down one of my own websites because of Google not being able or willing to handle low-effort content farms that just gobble up content like mine and rewrite it using AI while doing a handful of additional techniques to edge out ranking over my original content.

May 24, 2024 at 11:20 am

torp

View attachment 81471

This is a great example of:

garbage in, garbage out. Even the LLM says it's from a Reddit post.
people having unrealistic expectations about LLMs. Perhaps this will convince everyone that they're parroting what they're fed and have no understanding or self consciousness.
google shooting themselves in the foot. It's one thing to give a result like the Reddit suggestion as a link to the original post on Reddit. It's another one entirely to get it in this overview where it sounds like it's endorsed by Google.

May 24, 2024 at 11:26 am

MichaelHurd

When it comes to treating jokes as factual, nothing beats The Onion!

May 24, 2024 at 12:03 pm