OpenAI’s newest AI models, o3 and o4-mini, are exhibiting an unexpected and concerning trend: higher hallucination rates than their predecessors. The regression in factual reliability comes at an awkward moment, since these models are built for more complex reasoning tasks; it risks undermining trust among enterprise clients and raises questions about how AI progress is being measured. The company has acknowledged the issue in its technical report but admits it doesn’t fully understand the underlying causes.
The hallucination problem: OpenAI’s technical report reveals that the o3 model hallucinated in response to 33% of questions on the company’s PersonQA evaluation, approximately double the rate of previous reasoning models.
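For context, a hallucination rate of this kind is simply the share of evaluated questions on which a grader flags the model’s answer as containing a fabricated claim. The sketch below is a hypothetical illustration of that tally, not OpenAI’s actual evaluation harness; the `EvalRecord` structure and the grading step are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    question: str
    answer: str
    contains_hallucination: bool  # judged by a grader (human or model-based); assumed here

def hallucination_rate(records: list[EvalRecord]) -> float:
    """Fraction of evaluated questions whose answer was flagged as hallucinated."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r.contains_hallucination)
    return flagged / len(records)

# Illustrative only: a rate of 0.33 means the model hallucinated on
# roughly one in three evaluated questions.
sample = [
    EvalRecord("Who founded Acme Corp?", "Jane Doe, in 1987.", True),
    EvalRecord("Where is Acme Corp headquartered?", "Springfield.", False),
    EvalRecord("What does Acme Corp sell?", "Industrial anvils.", False),
]
print(f"hallucination rate: {hallucination_rate(sample):.2f}")
```

Under this framing, “double the rate” means twice as many flagged answers over the same question set, which is why even a modest-looking percentage shift matters for reliability.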
Why this matters: The increased hallucination rate runs counter to the expected evolutionary path of AI models, where newer iterations typically demonstrate improved factual reliability alongside other capabilities.
Behind the numbers: OpenAI admits in its technical report that “more research is needed to understand the cause of this result,” suggesting the company is struggling to identify why its newer models are regressing in this specific dimension.
Reading between the lines: The hallucination increase reveals the complex trade-offs inherent in AI development, where optimizing for certain capabilities might inadvertently compromise others.
Where we go from here: With both models now released to the public, OpenAI may be hoping that widespread usage and feedback will help identify patterns and potentially resolve the hallucination issue through further training.