These AI models outperform open-source peers but lag behind humans

AI’s struggle with visual reasoning puzzles: Recent research from the USC Viterbi School of Engineering’s Information Sciences Institute (ISI) tested the ability of multimodal large language models (MLLMs) to solve abstract visual puzzles similar to those found on human IQ tests, revealing significant limitations in AI’s cognitive abilities.

  • The study, presented at the Conference on Language Modeling (COLM 2024) in Philadelphia, focused on evaluating the nonverbal abstract reasoning abilities of both open-source and closed-source MLLMs.
  • Researchers used puzzles developed from Raven’s Progressive Matrices, a standard type of abstract reasoning test, to challenge the AI models’ visual perception and logical reasoning skills.
  • The tests required AI models to identify patterns and apply them to new scenarios, such as recognizing that a yellow circle turning into a blue triangle represents a specific transformation (a minimal sketch of this evaluation setup follows the list).
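
To make the setup concrete, here is a minimal sketch of how such a puzzle could be posed to a vision-capable model through the OpenAI chat API. The prompt wording, the `ask_puzzle` helper, the file paths, and the choice of model are illustrative assumptions, not the study's actual evaluation harness:

```python
import base64

from openai import OpenAI  # official openai package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: str) -> str:
    """Base64-encode a puzzle image so it can be sent inline to the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Hypothetical prompt; the study's exact instructions are not reproduced here.
PROMPT = (
    "This image shows a 3x3 abstract reasoning matrix with the bottom-right "
    "cell missing, followed by labeled answer options. Identify the pattern "
    "across rows and columns, then reply with only the letter of the option "
    "that completes the matrix."
)


def ask_puzzle(image_path: str, model: str = "gpt-4o") -> str:
    """Send one Raven's-style puzzle image and return the model's answer."""
    b64 = encode_image(image_path)
    response = client.chat.completions.create(
        model=model,  # stand-in for any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Accuracy over a labeled batch, e.g. puzzles = [("p1.png", "C"), ("p2.png", "A")]:
# accuracy = sum(ask_puzzle(p).strip() == a for p, a in puzzles) / len(puzzles)
```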

Performance disparities between AI models: The study revealed significant differences in performance between open-source and closed-source AI models, with the latter demonstrating superior capabilities in visual reasoning tasks.

  • Open-source models generally struggled more with visual reasoning puzzles compared to their closed-source counterparts.
  • GPT-4V, a closed-source model, showed relatively good reasoning abilities, although it still fell short of human-level performance.
  • Researchers attribute the better performance of closed-source models to factors such as specialized development, training on larger datasets, and access to greater computing resources from private companies.

Improving AI performance: The research team explored methods to enhance the AI models’ problem-solving abilities, with some success in guiding their reasoning processes.

  • Chain-of-thought prompting, a technique that guides models step by step through the reasoning required by the test, helped improve performance for some AI models (a sketch of such a prompt follows this list).
  • This approach demonstrates the potential for developing more effective strategies to enhance AI’s cognitive abilities in the future.
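
The contrast between direct and step-by-step prompting can be illustrated with two prompt variants. The exact wording used in the study is an assumption here, but the structure mirrors how chain-of-thought prompting typically decomposes these puzzles:

```python
# Direct prompting: ask only for the final answer.
DIRECT_PROMPT = (
    "Look at the 3x3 matrix with one missing cell and reply with the "
    "letter of the answer option that completes it."
)

# Chain-of-thought prompting: walk the model through the same stages a
# human test-taker would use before it commits to an answer.
COT_PROMPT = (
    "Look at the 3x3 matrix with one missing cell.\n"
    "Step 1: Describe the shape, color, and count in each cell.\n"
    "Step 2: State the transformation applied across each row "
    "(e.g., a yellow circle becoming a blue triangle).\n"
    "Step 3: State the transformation applied down each column.\n"
    "Step 4: Apply those transformations to predict the missing cell.\n"
    "Step 5: Reply with the letter of the matching answer option."
)
```

Swapping `PROMPT` for `COT_PROMPT` in the earlier sketch is the only change; the idea is that forcing the model to externalize its intermediate reasoning steps can improve the final answer, as the researchers observed for some models.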

Implications for AI development: The study’s findings highlight the current limitations of AI in abstract reasoning tasks and underscore the importance of continued research in this area.

  • Jay Pujara, research associate professor and study author, emphasized the need to understand AI models’ limitations to make them better, safer, and more useful.
  • By identifying weaknesses in AI’s reasoning abilities, this research can help direct future efforts to develop more advanced and capable AI systems.
  • The goal of achieving human-level logic in AI remains a significant challenge, with current models still far from matching human cognitive abilities in complex reasoning tasks.

Broader context of AI capabilities: This study contributes to the ongoing assessment of AI’s strengths and weaknesses across various cognitive domains.

  • While AI has shown remarkable progress in certain areas, such as natural language processing and image recognition, abstract reasoning remains a significant hurdle.
  • The research highlights the complexity of human cognition and the challenges involved in replicating these abilities in artificial systems.
  • As AI continues to advance, understanding its limitations becomes crucial for responsible development and deployment in real-world applications.

Looking ahead to challenges and opportunities: The study’s results open up new avenues for AI research and development, while also raising important questions about the future of artificial intelligence.

  • Researchers may focus on developing more sophisticated training methods and architectures to improve AI’s abstract reasoning capabilities.
  • The disparity between open-source and closed-source models’ performance may fuel discussions about access to resources and the potential for a widening gap in AI capabilities.
  • As AI systems become more advanced, ongoing evaluation of their cognitive abilities will be essential to ensure they are deployed safely and effectively in various domains.