OpenAI’s new o3 model is putting up monster scores on the industry’s toughest tests

The artificial intelligence research company OpenAI has announced a new AI model called o3 that demonstrates unprecedented performance on complex technical benchmarks in mathematics, science, and programming.

Key breakthrough: OpenAI’s o3 model has achieved remarkable results on FrontierMath, a benchmark of expert-level mathematics problems, scoring 25% accuracy compared to the previous state-of-the-art performance of approximately 2%.

  • Leading mathematician Terence Tao had predicted these problems would resist AI solutions for several years
  • The problems were specifically designed to be novel and unpublished to prevent data contamination
  • Epoch AI's director Jaime Sevilla noted that the results far exceeded their expectations

Technical achievements: The o3 model has demonstrated exceptional performance across multiple specialized benchmarks, setting new standards in various technical domains.

  • Achieved 88% accuracy on the GPQA Diamond benchmark of PhD-level science questions, surpassing both previous AI models and human domain experts
  • Ranked equivalent to 175th place globally among human competitors on Codeforces programming challenges
  • Scored 72% on SWE-Bench, a benchmark built from real-world software engineering issues, significantly improving on the previous best of 55% from early December

Computational costs: While o3 shows impressive capabilities, its operation comes with substantial computational expenses.

  • Testing costs range from $17 to thousands of dollars per problem on the ARC-AGI benchmark
  • Human solutions to similar problems typically cost between $5 and $10
  • OpenAI researchers expect token prices to decrease over time, potentially making the technology more economically viable

Progress indicators: The o3 model’s performance suggests a new phase in AI development, particularly in specialized technical domains.

  • The model excels at complex mathematical reasoning while still showing limitations in more common programming tasks
  • This pattern indicates uneven progress across different types of problems and capabilities
  • The advances demonstrate that AI development continues to accelerate in specific areas while possibly stagnating in others

Looking forward: The mixed performance profile of o3 across different domains suggests a nuanced future for AI development where progress may be concentrated in specific technical areas while other capabilities develop more gradually.

  • The model’s success in highly technical domains could accelerate scientific research and AI development
  • Researchers will likely continue to observe varying levels of progress across different types of tasks
  • The economic viability of such computationally intensive models remains an important consideration for their practical implementation
