Is a 35% error rate a deal-breaker for trusting a chatbot’s answers? That is the question raised by the latest audit from NewsGuard, a company that evaluates the reliability and transparency of information sites.
Since July 2024, NewsGuard has run a monthly barometer measuring how well generative AI models handle false claims on controversial topics. This standardized test examines whether major chatbots detect and refute misinformation or continue to reproduce it.
The audit draws on 10 “False Narrative Footprints” from NewsGuard’s catalog, spanning key domains: politics, health, international affairs, business, and brands. Each narrative is tested with three types of prompts: a neutral question, a question that presupposes the truth of the false narrative, and an instruction simulating attempts by malicious actors to bypass safety protections.
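To make the protocol concrete, here is a minimal sketch of how such an audit could be scored. This is an illustration under assumptions, not NewsGuard’s actual tooling; the helpers query_chatbot() and repeats_false_claim() are hypothetical placeholders for the API call and the fact-checking step.

```python
# Illustrative sketch of the audit protocol described above.
# Not NewsGuard's actual code; query_chatbot() and
# repeats_false_claim() are hypothetical placeholders.

PROMPT_STYLES = ("neutral", "leading", "malign_actor")  # the three prompt types

def query_chatbot(model: str, prompt: str) -> str:
    """Placeholder for a real chatbot API call."""
    raise NotImplementedError

def repeats_false_claim(response: str, narrative: dict) -> bool:
    """Placeholder for the fact-checking step (human or automated)."""
    raise NotImplementedError

def failure_rate(model: str, narratives: list[dict]) -> float:
    """Share of responses that reproduce a false narrative.

    10 narratives x 3 prompt styles = 30 responses per model,
    which is why the published rates fall on multiples of 1/30.
    """
    failures = total = 0
    for narrative in narratives:
        for style in PROMPT_STYLES:
            response = query_chatbot(model, narrative["prompts"][style])
            if repeats_false_claim(response, narrative):
                failures += 1
            total += 1
    return failures / total  # e.g. 17/30 ≈ 56.67%
```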
Misinformation: up 94% in one year
Between July 2024 and August 2025, data reveal a notable deterioration in performance. The rate at which chatbots reproduce false claims rose from 18% to 35%, an increase of 94%. This rise occurs despite successive model updates and the companies’ public commitments to accuracy and safety.
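To unpack the arithmetic behind the headline figure: the jump from 18% to 35% is 17 percentage points, and 17 / 18 ≈ 0.94, hence the 94% relative increase; in other words, the error rate nearly doubled.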
During the audit conducted in August 2025, the chatbots that generated the largest number of false statements on current affairs were Pi by Inflection (56.67%) and Perplexity (46.67%). ChatGPT and Meta disseminated erroneous information in 40% of cases, while Copilot and Mistral produced it in 36.67% of responses. The most reliable tools were Claude (10%) and Gemini (16.67%), which displayed the lowest error rates.
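Assuming the 30-prompt protocol described above (10 narratives, 3 prompts each), these rates correspond to whole-number counts of failed responses: 56.67% is 17 of 30, 46.67% is 14, 40% is 12, 36.67% is 11, 16.67% is 5, and 10% is 3.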
© NewsGuard
The analysis highlights a significant shift in chatbot behavior. In 2024, these systems frequently adopted a cautious approach, declining to answer questions on many current topics. That caution yielded an overall failure rate of 49%, encompassing both refusals to answer and erroneous assertions.
In 2025, chatbots answer 100% of questions posed, but produce inaccurate responses in 35% of cases. This behavioral shift reflects a trade-off between accessibility and accuracy, with systems now prioritizing the systematic provision of answers over verification.
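Reading the two periods’ figures together: in 2024, the 49% failure rate decomposed into 18% of responses repeating false claims plus, by implication, roughly 31% refusals; in 2025, refusals fell to zero, so the entire 35% failure rate consists of false claims.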
Identified vulnerabilities
The study identifies several structural weaknesses in how chatbots process information. These systems prove particularly vulnerable to “data voids”: situations where credible sources have published little or nothing on a topic, leaving malicious actors as the only providers of information. They also struggle to identify foreign-created websites that mimic authentic local media.
Handling breaking news constitutes another documented weakness. Models struggle to ingest and verify recent information, leaving them exposed to reproducing unchecked or erroneous claims about ongoing events.
These results raise questions about the reliability of chatbots as sources of up-to-the-minute information. The rise in the error rate, combined with the tendency of systems to always provide an answer, creates an environment where users may receive inaccurate information presented with an appearance of confidence.
The diversity of topics tested (Moldovan elections, Sino-Pakistani relations, Russo-Ukrainian negotiations, immigration in France, health debates in Canada) illustrates the cross-cutting reach of these reliability issues, which are not limited to specific domains but affect the entire information landscape.

© NewsGuard