
OpenAI attributes the persistence of AI model hallucinations, in which models confidently generate false information, primarily to evaluation methods that reward guessing over acknowledging uncertainty. Under accuracy-only scoring, models, including the company's latest GPT-5, are pushed to make confident errors rather than say 'I don't know.' The company argues for a shift in how models are trained and benchmarked, advocating metrics that penalize confident errors more severely than expressions of uncertainty, with the aim of improving reliability and trustworthiness in critical applications.
OpenAI's research identifies a fundamental challenge in the development of Large Language Models (LLMs): the persistence of 'hallucinations,' or confidently stated falsehoods. The research attributes the core issue not to a model's inherent capability but to the industry's standard evaluation procedures, which reward guessing because a guess can raise an accuracy score while an abstention cannot. A comparison on the SimpleQA eval quantifies the effect: a newer model (`gpt-5-thinking-mini`) scored slightly lower on accuracy (22%) than an older one (`o4-mini`, 24%), but its error rate was 26% against the older model's 75%, because it abstained on 52% of questions where the older model abstained on 1%. The paper argues that hallucinations originate in the statistical nature of pretraining on unlabeled text, which makes arbitrary, low-frequency facts much harder to learn reliably than patterned information. OpenAI proposes a systemic fix: updating the primary evaluation scoreboards so that confident errors are penalized more than expressions of uncertainty. The company presents this move away from accuracy-only leaderboards as essential for building more reliable and trustworthy AI. The research concludes that perfect accuracy is unattainable, but that hallucinations can be mitigated if models are trained and rewarded for acknowledging their own limits.
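To make the scoring argument concrete, here is a minimal Python sketch of one possible penalized metric, applied to the SimpleQA figures quoted above. The rule (+1 for a correct answer, 0 for an abstention, minus `wrong_penalty` for an error), the default penalty of 1.0, and the names `EvalResult`, `penalized_score`, and `guess_threshold` are illustrative assumptions, not OpenAI's published metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Outcome shares on an eval, in percent of questions (hypothetical type)."""
    model: str
    correct: float
    wrong: float
    abstained: float


def penalized_score(r: EvalResult, wrong_penalty: float = 1.0) -> float:
    """Assumed rule: +1 per correct answer, 0 per abstention, -wrong_penalty per error."""
    return r.correct - wrong_penalty * r.wrong


def guess_threshold(wrong_penalty: float) -> float:
    """Minimum confidence p at which guessing beats abstaining.

    A guess has expected score p - wrong_penalty * (1 - p); abstaining scores 0,
    so guessing pays off only when p > wrong_penalty / (1 + wrong_penalty).
    """
    return wrong_penalty / (1.0 + wrong_penalty)


# SimpleQA figures quoted in the text above.
results = [
    EvalResult("gpt-5-thinking-mini", correct=22, wrong=26, abstained=52),
    EvalResult("o4-mini", correct=24, wrong=75, abstained=1),
]

for r in results:
    print(f"{r.model}: accuracy={r.correct}%, penalized score={penalized_score(r)}")
```

Under accuracy alone, `o4-mini` ranks first (24 against 22). Under this assumed penalty, the ranking reverses (-51 against -4). With a penalty of zero, which corresponds to accuracy-only scoring, `guess_threshold` returns 0, so guessing is never worse than abstaining regardless of how unsure the model is.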
Overall Sentiment: moderately positive
Sentiment Score: 0.40