These AI models outperform open-source peers but lag behind humans

AI’s struggle with visual reasoning puzzles: Recent research from the USC Viterbi School of Engineering’s Information Sciences Institute (ISI) tested the ability of multimodal large language models (MLLMs) to solve abstract visual puzzles similar to those found on human IQ tests, revealing significant limitations in AI’s cognitive abilities.

  • The study, presented at the Conference on Language Modeling (COLM 2024) in Philadelphia, focused on evaluating the nonverbal abstract reasoning abilities of both open-source and closed-source MLLMs.
  • Researchers used puzzles developed from Raven’s Progressive Matrices, a standard type of abstract reasoning test, to challenge the AI models’ visual perception and logical reasoning skills.
  • The tests required AI models to identify patterns and apply them to different scenarios, such as recognizing that a yellow circle turning into a blue triangle represents a specific transformation.
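
The kind of pattern transfer described above can be illustrated with a toy sketch. The attribute names, shapes, and rule representation here are hypothetical illustrations, not the study's actual puzzle format:

```python
# Toy illustration of a Raven's-style transformation rule.
# Attributes and shapes are hypothetical, not the study's dataset.

def infer_rule(before, after):
    """Record how each attribute changes across an example pair
    (assumes both descriptions share the same attribute keys)."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}

def apply_rule(shape, rule):
    """Apply each recorded attribute change wherever it matches."""
    out = dict(shape)
    for attr, (old, new) in rule.items():
        if out.get(attr) == old:
            out[attr] = new
    return out

# Example pair: a yellow circle becomes a blue triangle.
rule = infer_rule({"color": "yellow", "form": "circle"},
                  {"color": "blue", "form": "triangle"})

# The model must transfer the same rule to a new scenario:
# a yellow square's color changes, but its form doesn't match the rule.
print(apply_rule({"color": "yellow", "form": "square"}, rule))
# {'color': 'blue', 'form': 'square'}
```

The point of the task is exactly this transfer step: inferring the rule from one example pair and applying it to an unseen panel.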

Performance disparities between AI models: The study revealed significant differences in performance between open-source and closed-source AI models, with the latter demonstrating superior capabilities in visual reasoning tasks.

  • Open-source models generally struggled more with visual reasoning puzzles compared to their closed-source counterparts.
  • GPT-4V, a closed-source model, showed relatively good reasoning abilities, although it still fell short of human-level performance.
  • Researchers attribute the better performance of closed-source models to factors such as specialized development, training on larger datasets, and access to greater computing resources from private companies.

Improving AI performance: The research team explored methods to enhance the AI models’ problem-solving abilities, with some success in guiding their reasoning processes.

  • Chain of Thought prompting, a technique that guides models step-by-step through the reasoning portion of the test, helped improve performance for some AI models.
  • This approach demonstrates the potential for developing more effective strategies to enhance AI’s cognitive abilities in the future.
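
Chain of Thought prompting works by asking the model to spell out intermediate reasoning before committing to an answer. A minimal sketch of how such a prompt might be assembled (the wording and step structure are illustrative, not the study's actual protocol):

```python
# Illustrative Chain-of-Thought prompt construction; the exact
# wording and steps are assumptions, not the researchers' protocol.

def build_cot_prompt(puzzle_description, choices):
    """Assemble a prompt that asks the model to reason step by step
    before picking an answer choice."""
    steps = [
        "1. Describe the pattern in the completed rows of the matrix.",
        "2. State the transformation rule connecting the panels.",
        "3. Apply that rule to the incomplete row.",
        "4. Pick the answer choice matching the predicted panel.",
    ]
    return (
        f"Puzzle: {puzzle_description}\n"
        f"Choices: {', '.join(choices)}\n"
        "Let's think step by step:\n" + "\n".join(steps)
    )

prompt = build_cot_prompt(
    "A 3x3 matrix where shapes rotate 90 degrees across each row; "
    "the bottom-right panel is missing.",
    ["A", "B", "C", "D"],
)
print(prompt)
```

Compared with asking directly for the answer, this decomposition gives the model explicit intermediate targets, which is the mechanism the researchers found helped some models.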

Implications for AI development: The study’s findings highlight the current limitations of AI in abstract reasoning tasks and underscore the importance of continued research in this area.

  • Jay Pujara, research associate professor and study author, emphasized the need to understand AI models’ limitations to make them better, safer, and more useful.
  • By identifying weaknesses in AI’s reasoning abilities, this research can help direct future efforts to develop more advanced and capable AI systems.
  • The goal of achieving human-level logic in AI remains a significant challenge, with current models still far from matching human cognitive abilities in complex reasoning tasks.

Broader context of AI capabilities: This study contributes to the ongoing assessment of AI’s strengths and weaknesses across various cognitive domains.

  • While AI has shown remarkable progress in certain areas, such as natural language processing and image recognition, abstract reasoning remains a significant hurdle.
  • The research highlights the complexity of human cognition and the challenges involved in replicating these abilities in artificial systems.
  • As AI continues to advance, understanding its limitations becomes crucial for responsible development and deployment in real-world applications.

Looking ahead: Challenges and opportunities: The study’s results open up new avenues for AI research and development, while also raising important questions about the future of artificial intelligence.

  • Researchers may focus on developing more sophisticated training methods and architectures to improve AI’s abstract reasoning capabilities.
  • The disparity between open-source and closed-source models’ performance may fuel discussions about access to resources and the potential for a widening gap in AI capabilities.
  • As AI systems become more advanced, ongoing evaluation of their cognitive abilities will be essential to ensure they are deployed safely and effectively in various domains.