OpenAI's latest AI models, o3 and o4-mini, show concerning increases in hallucination rates, reversing the industry's progress in reducing AI fabrications. While these models reportedly excel at complex reasoning tasks like math and coding, they fabricate information significantly more often than their predecessors, a setback that undermines their reliability for practical applications and runs counter to the expectation that each new generation of models improves on the last.
The big picture: OpenAI's new reasoning models hallucinate at dramatically higher rates than previous versions, with the company's internal testing showing the o3 model fabricating information 33% of the time and o4-mini reaching a troubling 48% hallucination rate.
Key details: The models' hallucination problems are particularly pronounced in computer code generation, where independent testing by the nonprofit AI research lab Transluce found the o3 model fabricating actions it claimed to have taken, such as asserting it had run code in environments it cannot actually access.
Behind the numbers: OpenAI suggests that o4-mini’s worse performance may partly stem from it being a smaller model with “less world knowledge,” which increases its tendency to fabricate information.
What they’re saying: “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” OpenAI spokesperson Niko Felix told TechCrunch.
Why this matters: Hallucinations have long been a fundamental limitation of large language models, significantly undermining their practical utility and trustworthiness for critical applications.