The race for artificial general intelligence (AGI) continues as OpenAI’s latest o3 model achieves remarkable scores on a key reasoning test, though experts maintain it falls short of true human-level intelligence.
Breaking development: OpenAI’s new o3 model has achieved a breakthrough score of 75.7% on the Abstraction and Reasoning Corpus (ARC) Challenge, a test designed to evaluate AI systems’ pattern recognition and reasoning capabilities.
- The model demonstrated task-adaptation abilities not seen in earlier GPT-family models
- The official score was achieved within the competition’s computing cost limit of $20 per puzzle task
- An unofficial score of 87.5% was reached using significantly more computing power, surpassing the typical human score of 84%
Technical details and constraints: The ARC Challenge tests AI systems’ ability to infer a transformation rule from a handful of example colored-grid puzzles and apply it to new grids, all while operating within strict computational limits.
- The “semi-private” test, used for public rankings, allows computing costs up to $10,000 total
- A more stringent “private” test, used for determining grand prize winners, limits computing costs to 10 cents per task
- o3’s unofficial high score required 172 times more computing power than its official attempt; scaling the roughly $20-per-task official cost by that factor works out to about $3,400, in line with the reported thousands of dollars per task
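For context on what is being scored: each public ARC task is a small JSON file with a few “train” input/output grid pairs and one or more “test” grids; a solver must infer the transformation from the examples and reproduce each test output exactly, cell for cell. The Python sketch below illustrates the task format and the all-or-nothing scoring; the function names and the placeholder solver are illustrative, not any competitor’s actual code.

```python
import json

def load_task(path):
    # One public ARC task file has the shape:
    #   {"train": [{"input": grid, "output": grid}, ...],
    #    "test":  [{"input": grid, "output": grid}, ...]}
    # where a grid is a list of rows of ints 0-9, each int encoding a colour.
    with open(path) as f:
        return json.load(f)

def solve(input_grid, train_pairs):
    # Placeholder: a real entry must infer the rule from train_pairs
    # and apply it to input_grid. Echoing the input is just a stand-in.
    return input_grid

def task_solved(task):
    # A task counts only if every test output is reproduced exactly;
    # there is no partial credit for nearly-right grids.
    return all(
        solve(pair["input"], task["train"]) == pair["output"]
        for pair in task["test"]
    )

# A benchmark score is the fraction of tasks solved, e.g. 0.757 for o3's
# official run on the semi-private evaluation set.
```

The placeholder solve is where all the difficulty lives; the scoring itself is deliberately simple, which is partly why extra test-time compute can translate into higher scores.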
Expert perspectives: Leading AI researchers and competition organizers maintain that while impressive, this achievement does not constitute AGI.
- François Chollet, ARC Challenge creator, describes it as an important milestone but not AGI
- Melanie Mitchell of the Santa Fe Institute argues that solving tasks through computational brute force defeats the challenge’s purpose
- Thomas Dietterich from Oregon State University notes that commercial AI systems still lack crucial components of human cognition, including episodic memory and meta-cognition
Industry implications: The achievement comes during a period of perceived slowdown in AI advancement compared to the rapid developments of 2023.
- The results suggest AI models could soon beat the competition benchmark while staying within its official cost limits
- An ensemble of existing low-compute submissions can already score about 81% on the private evaluation test set
- Competition organizers are planning a more challenging benchmark test for 2025
Looking ahead: While o3’s performance represents significant progress in AI capabilities, key questions remain about the model’s methodology and true understanding of the tasks it completes.
- Researchers await open-source replication to fully evaluate the achievement’s significance
- The ARC Prize 2025 challenge continues until someone achieves the grand prize with an open-source solution
- The gap between computational problem-solving and true human-like reasoning remains a central challenge in AI development