AI models lack true understanding of the world, despite impressive output

Generative AI’s limitations in understanding the world: Recent research reveals that large language models (LLMs) lack a coherent understanding of the world, despite their impressive capabilities in various tasks.

  • A study conducted by researchers from MIT, Harvard, and Cornell shows that even advanced LLMs fail to form accurate internal models of the world and its rules.
  • The research focused on transformer models, which form the backbone of popular LLMs like GPT-4.
  • These findings have significant implications for the deployment of generative AI in real-world applications, as models may unexpectedly fail when faced with slight changes in tasks or environments.

Testing AI’s world understanding: Researchers developed new metrics to evaluate whether LLMs have formed accurate world models, going beyond simple prediction accuracy.

  • The team created two new metrics, sequence distinction and sequence compression, to probe a transformer’s world model.
  • They applied these metrics to two problem domains that can be formulated as deterministic finite automata (DFAs): navigating New York City streets and playing the board game Othello.
  • The metrics assess whether a model can tell apart sequences that lead to different states (distinction) and whether it recognizes that sequences reaching the same state allow the same possible next steps (compression); a minimal sketch of both checks follows this list.
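To make the two metrics concrete, here is a minimal Python sketch of how such checks could look on a toy DFA. The grid, state labels, and helper names (run, model_next_moves, and so on) are invented for illustration, and the "model" queried here is simply the ground-truth DFA, so it trivially passes both checks; in the study, a transformer's predicted valid next moves would take its place.

```python
# Toy DFA: states are labeled intersections, inputs are moves.
# This is an illustrative stand-in, not the researchers' actual
# NYC-navigation or Othello setup.
TRANSITIONS = {
    ("A", "east"): "B",
    ("A", "south"): "C",
    ("B", "south"): "D",
    ("C", "east"): "D",
}
START = "A"


def run(moves, state=START):
    """Follow a move sequence through the DFA; return the final state,
    or None if any move is invalid from the current state."""
    for move in moves:
        state = TRANSITIONS.get((state, move))
        if state is None:
            return None
    return state


def true_next_moves(state):
    """Ground-truth set of moves the DFA allows from a state."""
    return {move for (s, move) in TRANSITIONS if s == state}


def model_next_moves(moves):
    """Stand-in for the model under test: given a move sequence, return the
    set of next moves it considers valid. Here we just consult the DFA, so
    this 'model' is perfect by construction; in the study this would be a
    transformer's predictions."""
    return true_next_moves(run(moves))


def passes_compression(seq_a, seq_b):
    """Sequences reaching the SAME state should get the same predicted next moves."""
    if run(seq_a) == run(seq_b):
        return model_next_moves(seq_a) == model_next_moves(seq_b)
    return True  # check does not apply to this pair


def passes_distinction(seq_a, seq_b):
    """Sequences reaching DIFFERENT states should get different predicted next moves."""
    if run(seq_a) != run(seq_b):
        return model_next_moves(seq_a) != model_next_moves(seq_b)
    return True  # check does not apply to this pair


if __name__ == "__main__":
    # ["east", "south"] and ["south", "east"] both end at state D -> compression applies.
    print(passes_compression(["east", "south"], ["south", "east"]))  # True
    # ["east"] ends at B, ["south"] ends at C -> distinction applies.
    print(passes_distinction(["east"], ["south"]))  # True
```

A transformer that generates valid moves with high accuracy can still fail checks of this kind, and that gap between surface accuracy and internal consistency is exactly what the study measures.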

Surprising findings: The study revealed unexpected results regarding the formation of coherent world models in transformer-based AI systems.

  • Transformers trained on randomly generated data formed more accurate world models than those trained on data produced by an agent following a strategy.
  • Despite high accuracy in generating directions and valid game moves, most models failed to form coherent world models for navigation and Othello.
  • When the researchers added detours to the New York City map, the navigation models’ performance plummeted, with accuracy dropping from nearly 100% to 67% after just 1% of possible streets were closed.

Implications for AI development: The research highlights the need for a different approach in building LLMs that can capture accurate world models.

  • The study demonstrates that transformers can perform well at certain tasks without truly understanding the underlying rules or structure.
  • These findings challenge the assumption that impressive performance in language-based tasks necessarily translates to a deep understanding of the world.
  • Researchers emphasize the importance of careful evaluation and not relying solely on intuition when assessing AI models’ comprehension of the world.

Future research directions: The team plans to expand their investigation into AI’s world understanding capabilities.

  • Researchers aim to tackle a more diverse set of problems, including those with partially known rules.
  • They intend to apply their evaluation metrics to real-world scientific problems to further test and refine their approach.
  • The work is supported by several institutions, including the Harvard Data Science Initiative, the National Science Foundation, and the MacArthur Foundation.

Broader implications: This research raises important questions about the current state and future development of artificial intelligence.

  • The study highlights the gap between AI’s performance in specific tasks and its actual understanding of the world, challenging assumptions about AI’s cognitive capabilities.
  • These findings may impact how researchers and developers approach the creation of more robust and truly intelligent AI systems, potentially leading to new methodologies in AI training and evaluation.
  • As AI continues to be integrated into various aspects of society, understanding its limitations becomes crucial for ensuring safe and reliable deployment in critical applications.