Google finds AI chatbots to be only 69% accurate… at best

Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using the newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to get past 70% factual accuracy. The best performer, Gemini 3 Pro, achieved an overall accuracy of 69%, while other leading systems from OpenAI, Anthropic and xAI scored even lower. The takeaway is simple and awkward: these chatbots still get about one in three answers wrong, even when they sound confident.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, rather than whether the information it produces is actually true. For sectors such as finance, healthcare and law, that gap can be costly. A smooth answer that sounds confident but contains errors can do real damage, especially if users assume the chatbot knows what it’s talking about.

What Google’s Accuracy Test Reveals

The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world applications. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance and tests how well models use web tools to retrieve accurate information. A third focuses on grounding, that is, whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as correctly reading graphs, charts, and images.

The results show sharp differences between models. Gemini 3 Pro topped the leaderboard with a FACTS score of 69%, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5 with almost 62%. Grok 4 scored about 54%, while Claude 4.5 Opus came in at about 51%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. This matters because these tasks involve reading graphs, charts or images, where a chatbot can confidently misread a sales chart or extract the wrong number from a document, leading to errors that are easy to overlook but difficult to undo.
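To put the leaderboard figures in terms of mistakes rather than hits, the short Python sketch below converts each overall score into an error rate and a rough "one in N answers wrong" figure. The scores are the rounded values quoted in this article ("almost 62%" is entered as 62), and the function names are illustrative only. It also assumes the overall score can be read as a plain share of accurate answers, which is a simplification.

```python
# Approximate overall FACTS scores as quoted in this article (percent).
# These are rounded figures, so the results below are rough estimates.
scores = {
    "Gemini 3 Pro": 69,
    "Gemini 2.5 Pro": 62,
    "ChatGPT-5": 62,
    "Grok 4": 54,
    "Claude 4.5 Opus": 51,
}


def error_rate(score_percent):
    """Share of answers not counted as accurate, as a fraction from 0 to 1."""
    return (100 - score_percent) / 100


def one_in_n(score_percent):
    """Express the error rate as 'about one in N answers wrong'."""
    return 1 / error_rate(score_percent)


for model, score in scores.items():
    print(f"{model}: {error_rate(score):.0%} wrong, about 1 in {one_in_n(score):.1f}")
```

On these numbers the top score of 69% works out to an error rate of 31%, or roughly one wrong answer in every three, while a score near 51% means close to one in two.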

The conclusion is not that chatbots are useless, but that blind trust is risky. Google’s own data shows that AI is improving, but it still needs verification, guardrails, and human oversight before it can be treated as a trusted source of truth.
