OpenAI’s o3 model is acing AI reasoning tests, but it’s still not AGI

The race for artificial general intelligence (AGI) continues as OpenAI’s latest o3 model achieves remarkable scores on a key reasoning test, though experts maintain it falls short of true human-level intelligence.

Breaking development: OpenAI’s new o3 model has achieved a breakthrough score of 75.7% on the Abstraction and Reasoning Corpus (ARC) Challenge, a test designed to evaluate AI systems’ pattern recognition and reasoning capabilities.

  • The model demonstrated task-adaptation abilities not previously seen in GPT-family models
  • The official score was achieved within the competition’s computing cost limit of $20 per puzzle task
  • An unofficial score of 87.5% was reached using significantly more computing power, surpassing the typical human score of 84%

Technical details and constraints: The ARC Challenge tests AI systems’ ability to identify patterns in colored grid puzzles while operating within specific computational limitations.

  • The “semi-private” test, used for public rankings, allows computing costs up to $10,000 total
  • A more stringent “private” test, used for determining grand prize winners, limits computing costs to 10 cents per task
  • o3’s unofficial high score required 172 times more computing power than its official attempt, with costs reaching thousands of dollars per task
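Assuming dollar cost scales roughly linearly with compute used (an assumption on our part, not a claim from the competition), the reported 172× figure squares with the "thousands of dollars per task" estimate:

```python
# Rough cost arithmetic implied by the reported figures.
# Assumes cost scales linearly with compute -- an illustrative
# simplification, not something the competition confirms.

OFFICIAL_COST_PER_TASK = 20.00  # dollars: the competition's per-task limit
COMPUTE_MULTIPLIER = 172        # reported compute increase for the unofficial run

unofficial_cost_per_task = OFFICIAL_COST_PER_TASK * COMPUTE_MULTIPLIER
print(f"~${unofficial_cost_per_task:,.0f} per task")  # ~$3,440 per task
```

At roughly $3,440 per task, a full evaluation set would run well into six figures, which is why observers treat the 87.5% score as a demonstration of capability rather than a practical result.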

Expert perspectives: Leading AI researchers and competition organizers maintain that while impressive, this achievement does not constitute AGI.

  • François Chollet, ARC Challenge creator, describes it as an important milestone but not AGI
  • Melanie Mitchell of the Santa Fe Institute argues that solving tasks through computational brute force defeats the challenge’s purpose
  • Thomas Dietterich from Oregon State University notes that commercial AI systems still lack crucial components of human cognition, including episodic memory and meta-cognition

Industry implications: The achievement comes during a period of perceived slowdown in AI advancement compared to the rapid developments of 2023.

  • The results suggest AI models could soon legitimately beat the competition benchmark
  • Multiple submissions have already scored above 81% on the private evaluation test set
  • Competition organizers are planning a more challenging benchmark test for 2025

Looking ahead: While o3’s performance represents significant progress in AI capabilities, key questions remain about the model’s methodology and true understanding of the tasks it completes.

  • Researchers await open-source replication to fully evaluate the achievement’s significance
  • The ARC Prize 2025 challenge continues until someone achieves the grand prize with an open-source solution
  • The gap between computational problem-solving and true human-like reasoning remains a central challenge in AI development
