AI excels at identifying geographical locations but struggles with objects in retro games

Current visual AI shows a curious inconsistency: strong geographic recognition alongside weak performance on pixelated video games. Models such as OpenAI’s o3 can identify locations from photographs with minimal visual cues, yet they struggle with seemingly simpler tasks like recognizing objects in vintage games. The discrepancy suggests that these models handle different kinds of visual information unevenly and may have unexpected blind spots.

The puzzle: Current AI models demonstrate contradictory visual recognition abilities that don’t align with human intuition.

Large language models like o3 perform remarkably well at GeoGuessr, identifying global locations even from seemingly featureless landscapes.
Yet these same models struggle with visually simpler tasks, such as identifying staircases or doors in Pokémon Red screenshots, even when those objects are clearly marked.

Possible explanations: The contradiction likely stems from differences in training data and visual processing approaches.

Geographic images were likely abundant in training data, allowing models to recognize subtle regional differences in vegetation, terrain, and environmental features.
Retro game visuals represent a specialized domain with pixel art that may be underrepresented in training datasets despite appearing visually simpler to humans.

Human vs. AI perception: This discrepancy highlights fundamental differences in how humans and AI systems process visual information.

Humans find navigating stylized game worlds intuitive because we understand symbolic representation and easily grasp visual abstractions.
AI models may excel at tasks they’ve been extensively exposed to through training data while showing surprising weaknesses in domains that seem simpler but are less represented.

The significance: These contradictory capabilities point to limits in how current AI visual systems generalize.

The uneven performance across different visual domains suggests AI visual understanding remains brittle and domain-specific rather than generalizable.
This phenomenon demonstrates how AI capabilities don’t necessarily develop along the same trajectory as human visual cognition, creating unexpected strengths and weaknesses.

Source: What's up with AI's vision (LessWrong)
