If you use Google regularly, you may have noticed the company's new AI Overviews providing summarized answers to some of your questions in recent days. If you use social media regularly, you may have come across many examples of those AI Overviews being hilariously or even dangerously wrong.
Factual errors can pop up in existing LLM chatbots as well, of course. But the potential damage that can be caused by AI inaccuracy gets multiplied when those errors appear atop the ultra-valuable web real estate of the Google search results page.
"The examples we've seen are generally very uncommon queries and aren’t representative of most people’s experiences," a Google spokesperson told Ars. "The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web."
After looking through dozens of examples of Google AI Overview mistakes (and replicating many ourselves for the galleries below), we've noticed a few broad categories of errors that seemed to show up again and again. Consider this a crash course in some of the current weak points of Google's AI Overviews and a look at areas of concern for the company to improve as the system continues to roll out.
Treating jokes as facts
Some of the funniest examples of Google's AI Overview failing come, ironically enough, when the system doesn't realize a source online was trying to be funny. An AI answer that suggested using "1/8 cup of non-toxic glue" to stop cheese from sliding off pizza can be traced back to someone who was obviously trying to troll an ongoing thread. A response recommending "blinker fluid" for a turn signal that doesn't make noise can similarly be traced back to a troll on the Good Sam advice forums, which Google's AI Overview apparently trusts as a reliable source.
In regular Google searches, these jokey posts from random Internet users probably wouldn't be among the first answers someone saw when clicking through a list of web links. But with AI Overviews, those trolls were integrated into the authoritative-sounding data summary presented right at the top of the results page.
In most cases where I've tested these "bad AI results", the actual problem is that the AI is just re-phrasing the top result that creates the featured snippets. As an example, the "How many rocks should I eat per day" one that's been making the rounds is actually just some gray-hat SEO of a featured snippet for a fracking company that cites an Onion article (SEO is complicated, folks).
So, the problem already existed in the form of featured snippets. Now it's just rephrased with AI.
Of course, Google has some real issues with handling these shifty SEO strategies, and it plays whack-a-mole constantly. I actually shut down one of my own websites because Google wasn't able or willing to handle low-effort content farms that just gobble up content like mine, rewrite it using AI, and use a handful of additional techniques to outrank my original content.
This is a great example of: