OpenAI’s new o3 model is putting up monster scores on the industry’s toughest tests

The artificial intelligence research company OpenAI has announced a new AI model called o3 that demonstrates unprecedented performance on complex technical benchmarks in mathematics, science, and programming.

Key breakthrough: OpenAI’s o3 model has achieved remarkable results on FrontierMath, a benchmark of expert-level mathematics problems, scoring 25% accuracy compared to the previous state-of-the-art performance of approximately 2%.

  • Leading mathematician Terence Tao had predicted these problems would resist AI solutions for several years
  • The problems were specifically designed to be novel and unpublished to prevent data contamination
  • Epoch AI's director Jaime Sevilla noted that the results far exceeded their expectations

Technical achievements: The o3 model has demonstrated exceptional performance across multiple specialized benchmarks, setting new standards in various technical domains.

  • Achieved 88% accuracy on the GPQA Diamond benchmark of PhD-level science questions, surpassing both previous AI models and human domain experts
  • Ranked equivalent to 175th place globally among human competitors on Codeforces programming challenges
  • Scored 72% on SWE-Bench, a benchmark built from real-world software engineering issues, significantly improving on the previous best of 55% from early December

Computational costs: While o3 shows impressive capabilities, its operation comes with substantial computational expenses.

  • Testing costs range from $17 to thousands of dollars per problem on the ARC-AGI benchmark
  • Human solutions to similar problems typically cost between $5 and $10
  • OpenAI researchers expect token prices to decrease over time, potentially making the technology more economically viable

Progress indicators: The o3 model’s performance suggests a new phase in AI development, particularly in specialized technical domains.

  • The model excels at complex mathematical reasoning while still showing limitations in more common programming tasks
  • This pattern indicates uneven progress across different types of problems and capabilities
  • The advances demonstrate that AI development continues to accelerate in specific areas while possibly stagnating in others

Looking forward: The mixed performance profile of o3 across different domains suggests a nuanced future for AI development where progress may be concentrated in specific technical areas while other capabilities develop more gradually.

  • The model’s success in highly technical domains could accelerate scientific research and AI development
  • Researchers will likely continue to observe varying levels of progress across different types of tasks
  • The economic viability of such computationally intensive models remains an important consideration for their practical implementation
