Google admits its AI Overviews need work, but we’re all helping it beta test
Google is embarrassed about its AI Overviews, too. After a deluge of dunks and memes over the past week, which cracked on the poor quality and outright misinformation that arose from the tech giant's underbaked new AI-powered search feature, the company on Thursday issued a mea culpa of sorts. Google, a company whose name is synonymous with searching the web and whose brand focuses on "organizing the world's information" and putting it at users' fingertips, actually wrote in a blog post that "some odd, inaccurate or unhelpful AI Overviews certainly did show up."
The admission of failure, penned by Google VP and Head of Search Liz Reid, reads as testimony to how the drive to mash AI technology into everything has somehow made Google Search worse.
In the post titled "About last week" (this got past PR?), Reid spells out the many ways AI Overviews make mistakes. While they don't "hallucinate" or make things up the way other large language models (LLMs) may, she says, they can get things wrong for "other reasons," like "misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available."
Reid also noted that some of the screenshots shared on social media over the past week were faked, while others were for nonsensical queries, like "How many rocks should I eat?", something no one ever really searched for before. Since there's little factual information on this topic, Google's AI guided a user to satirical content. (In the case of the rocks, the satirical content had been published on a geological software provider's website.)
It's worth pointing out that if you had Googled "How many rocks should I eat?" and were presented with a set of unhelpful links, or even a jokey article, you wouldn't be surprised. What people are reacting to is the confidence with which the AI spouted back that "geologists recommend eating at least one small rock per day" as if it were a factual answer. It may not be a "hallucination" in technical terms, but the end user doesn't care. It's insane.
What's unsettling, too, is that Reid claims Google "tested the feature extensively before launch," including with "robust red-teaming efforts."
Does no one at Google have a sense of humor then? No one thought of prompts that would generate poor results?
In addition, Google downplayed the AI feature's reliance on Reddit user data as a source of knowledge and truth. Although people have regularly appended "Reddit" to their searches for so long that Google finally made it a built-in search filter, Reddit is not a body of factual knowledge. And yet the AI would point to Reddit forum posts to answer questions, without an understanding of when first-hand Reddit knowledge is helpful and when it is not, or worse, when it is a troll.
Reddit today is making bank by offering its data to companies like Google, OpenAI and others to train their models, but that doesn't mean users want Google's AI deciding when to search Reddit for an answer, or suggesting that someone's opinion is a fact. There's nuance to learning when to search Reddit, and Google's AI doesn't understand that yet.
Reid admits as much: "forums are often a great source of authentic, first-hand information, but in some cases can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza," she said, referencing one of the AI feature's more spectacular failures over the past week.
Google AI overview suggests adding glue to get cheese to stick to pizza, and it turns out the source is an 11 year old Reddit comment from user F*cksmith pic.twitter.com/uDPAbsAKeO
— Peter Yang (@petergyang) May 23, 2024
If last week was a disaster, though, at least Google is iterating quickly as a result, or so it says.
The company says it has looked at examples from AI Overviews and identified patterns where it could do better, including: building better detection mechanisms for nonsensical queries; limiting the use of user-generated content in responses that could offer misleading advice; adding triggering restrictions for queries where AI Overviews were not helpful; not showing AI Overviews for hard news topics, "where freshness and factuality are important"; and adding additional triggering refinements to its protections for health searches.
With AI companies building ever-improving chatbots every day, the question is not whether they will ever outperform Google Search at helping us understand the world's information, but whether Google Search will ever get up to speed on AI to challenge them in return.
As ridiculous as Google's mistakes may be, it's too soon to count the company out of the race, especially given the massive scale of its beta-testing crew, which is essentially anybody who uses search.
"There's nothing quite like having millions of people using the feature with many novel searches," says Reid.