OpenAI’s o3 model is acing AI reasoning tests, but it’s still not AGI

The race for artificial general intelligence (AGI) continues as OpenAI’s latest o3 model achieves remarkable scores on a key reasoning test, though experts maintain it falls short of true human-level intelligence.

Breaking development: OpenAI’s new o3 model has achieved a breakthrough score of 75.7% on the Abstraction and Reasoning Corpus (ARC) Challenge, a test designed to evaluate AI systems’ pattern recognition and reasoning capabilities.

  • The model demonstrated task-adaptation abilities not previously seen in GPT-family models
  • The official score was achieved within the competition’s computing cost limit of $20 per puzzle task
  • An unofficial score of 87.5% was reached using significantly more computing power, surpassing the typical human score of 84%

Technical details and constraints: The ARC Challenge tests AI systems’ ability to identify patterns in colored grid puzzles while operating within specific computational limitations.

  • The “semi-private” test, used for public rankings, allows computing costs up to $10,000 total
  • A more stringent “private” test, used for determining grand prize winners, limits computing costs to 10 cents per task
  • o3’s unofficial high score required 172 times more computing power than its official attempt, pushing costs into the thousands of dollars per task (roughly 172 × $20 ≈ $3,400)
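To make the test format above concrete: ARC tasks are distributed as JSON-like structures containing a few "train" input/output pairs of small grids, where each cell is an integer 0–9 encoding a color. A solver must infer the transformation from the training pairs and apply it to the test input, matching the expected output cell-for-cell. The sketch below uses a made-up toy rule (swap colors 1 and 2), not an actual ARC task:

```python
# Illustrative ARC-style task: grids are 2-D lists of integers 0-9
# (each integer encodes a color). The transformation here is a
# hypothetical toy rule for demonstration only.
task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[2, 0], [0, 1]]},
        {"input": [[1, 1], [2, 2]], "output": [[2, 2], [1, 1]]},
    ],
    "test": [{"input": [[0, 1], [2, 0]]}],
}

def apply_rule(grid):
    """Toy inferred rule: swap colors 1 and 2, leave all others unchanged."""
    swap = {1: 2, 2: 1}
    return [[swap.get(cell, cell) for cell in row] for row in grid]

# Check that the inferred rule reproduces every training pair exactly,
# mirroring ARC scoring, where a prediction must match cell-for-cell.
for pair in task["train"]:
    assert apply_rule(pair["input"]) == pair["output"]

prediction = apply_rule(task["test"][0]["input"])
print(prediction)  # [[0, 2], [1, 0]]
```

The difficulty of ARC lies not in applying such a rule but in inferring it from only a handful of examples, which is why the benchmark is treated as a proxy for abstraction and reasoning rather than pattern memorization.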

Expert perspectives: Leading AI researchers and competition organizers maintain that while impressive, this achievement does not constitute AGI.

  • François Chollet, ARC Challenge creator, describes it as an important milestone but not AGI
  • Melanie Mitchell of the Santa Fe Institute argues that solving tasks through computational brute force defeats the challenge’s purpose
  • Thomas Dietterich from Oregon State University notes that commercial AI systems still lack crucial components of human cognition, including episodic memory and meta-cognition

Industry implications: The achievement comes during a period of perceived slowdown in AI advancement compared to the rapid developments of 2023.

  • The results suggest AI models could soon legitimately beat the competition benchmark
  • Multiple submissions have already scored above 81% on the private evaluation test set
  • Competition organizers are planning a more challenging benchmark test for 2025

Looking ahead: While o3’s performance represents significant progress in AI capabilities, key questions remain about the model’s methodology and true understanding of the tasks it completes.

  • Researchers await open-source replication to fully evaluate the achievement’s significance
  • The ARC Prize 2025 challenge continues until someone achieves the grand prize with an open-source solution
  • The gap between computational problem-solving and true human-like reasoning remains a central challenge in AI development
