LLMs don’t reason — new Apple research shows why that’s a big problem

Apple researchers challenge LLM reasoning capabilities: A new study from Apple’s AI researchers has cast doubt on the formal reasoning abilities of large language models (LLMs), suggesting their performance is based more on pattern matching than true reasoning.

  • The study, conducted by six AI researchers at Apple, found no evidence of formal reasoning in language models, indicating that their behavior is better explained by sophisticated pattern matching.
  • Merely swapping the names used in a problem could shift a model’s results by roughly 10%, underscoring how fragile the apparent reasoning is.
  • The researchers also developed a new task, GSM-NoOp, which adds irrelevant statements to math problems and showed how easily LLMs are derailed by distracting information (a minimal sketch of both perturbations follows this list).
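
The perturbations behind both findings are straightforward to reproduce. Below is a minimal Python sketch of the idea, assuming a templated GSM8K-style word problem; the template, names, and quantities are illustrative stand-ins rather than the study’s actual materials, though the “smaller kiwis” distractor echoes an example the paper itself discusses.

```python
import random

# Template in the spirit of GSM-Symbolic: one problem, with the surface
# details (name, quantities) left as variables to be re-sampled.
TEMPLATE = (
    "{name} picks {k} kiwis per day for {d} days. "
    "{distractor}How many kiwis does {name} have?"
)

# A GSM-NoOp-style distractor: numerically irrelevant, yet models were
# observed to "use" it anyway (e.g., subtracting the smaller kiwis).
NOOP = "Five of the kiwis picked on the last day were a bit smaller than average. "

def make_variant(with_distractor: bool) -> tuple[str, int]:
    """Sample one surface variant; the correct answer never changes."""
    name = random.choice(["Sophie", "Liam", "Mei", "Omar"])
    k, d = random.randint(30, 60), random.randint(3, 7)
    text = TEMPLATE.format(
        name=name, k=k, d=d,
        distractor=NOOP if with_distractor else "",
    )
    return text, k * d

if __name__ == "__main__":
    for flag in (False, True):
        problem, answer = make_variant(flag)
        print(problem, "->", answer)
```

Scoring a model across many such variants, where only the surface form changes and the correct answer does not, is what exposes the roughly 10% swings described above.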

Historical context and previous research: The Apple study’s findings align with earlier research that questioned LLMs’ reasoning abilities and their susceptibility to irrelevant information.

  • A 2017 study by Robin Jia and Percy Liang of Stanford University found a similar weakness, showing that the neural question-answering models of the era could be easily misled by irrelevant sentences inserted into a passage.
  • Those findings were cited in the 2019 book “Rebooting AI” by Gary Marcus and Ernest Davis, indicating that concerns about these models’ reasoning capabilities have persisted for years.

Performance limitations on complex tasks: Recent analyses have revealed that LLMs’ performance tends to deteriorate as problems become more complex or larger in scale.

  • A study of OpenAI’s o1 model by Subbarao Kambhampati’s team showed that while the model performed adequately on small problems, its performance declined rapidly as problem complexity increased.
  • Similar patterns appear in integer arithmetic: LLM accuracy falls off as multiplication problems involve more digits, unlike calculators, which stay exact at any size (a sketch of such a scaling test follows this list).
  • Even advanced models like o1 exhibit this limitation, suggesting the issue persists across generations of LLMs.
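
One way to see this scaling failure concretely is to score a model on exact integer multiplication at increasing digit counts, comparing its answers against Python’s arbitrary-precision arithmetic. The harness below is a sketch; query_model is a hypothetical placeholder for whatever model API is under test, not a real client.

```python
import random

def query_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real LLM API call here."""
    raise NotImplementedError

def accuracy_at_digits(n_digits: int, trials: int = 20) -> float:
    """Fraction of n-digit x n-digit products the model answers exactly."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    correct = 0
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = query_model(f"What is {a} * {b}? Reply with only the number.")
        # Python integers are arbitrary-precision, so a * b is exact;
        # that is the calculator-like consistency LLMs are contrasted with.
        if reply.strip().replace(",", "") == str(a * b):
            correct += 1
    return correct / trials

# Typical pattern reported for LLMs: near-perfect at 2-3 digits,
# falling off sharply well before 10.
# for d in range(2, 11):
#     print(d, accuracy_at_digits(d))
```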

Implications for real-world applications: The observed limitations in LLMs’ reasoning abilities raise concerns about their reliability in critical real-world applications.

  • Self-driving cars, such as Elon Musk’s proposed robotaxis, may struggle to reason abstractly in uncommon or complex situations, potentially compromising safety.
  • The lack of transparency from companies developing these technologies makes it difficult to assess the full extent of these limitations and their potential impact.

Broader perspective on AI development: The study’s findings support long-standing critiques of neural network architectures and their ability to perform formal reasoning tasks.

  • Gary Marcus, a prominent AI researcher, has been highlighting the limitations of neural networks in extrapolation and formal reasoning since the late 1990s.
  • Marcus argues that symbol manipulation, in which knowledge is represented abstractly through variables and operations over those variables, is crucial for advancing AI capabilities (a toy sketch of the distinction follows this list).
  • The concept of neurosymbolic AI, which combines symbolic reasoning with neural networks, is proposed as a potential path forward in addressing these limitations.
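
To make the contrast concrete, here is a toy Python sketch of the distinction Marcus draws, with the memorized table and the rule chosen purely for illustration: pattern matching can only answer instances it has stored, while a rule stated over a variable holds for every binding of that variable.

```python
# Pattern matching: a lookup over memorized instances. It can only
# answer cases that happen to be in the table.
memorized = {(2, 0): 2, (7, 0): 7}

def pattern_match(a: int, b: int) -> int | None:
    return memorized.get((a, b))

# Symbol manipulation: the rule "x + 0 = x" quantifies over a variable x,
# so it covers every integer, including ones never seen before.
def symbolic_rule(a: int, b: int) -> int | None:
    if b == 0:           # matches the abstract pattern (x, 0)
        return a         # returns whatever value x is bound to
    return None

print(pattern_match(123456, 0))  # None: this instance was never memorized
print(symbolic_rule(123456, 0))  # 123456: follows from the rule
```

Neurosymbolic systems aim to let a neural network handle perception and pattern recognition while delegating this kind of variable-based inference to a symbolic component.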

Analyzing deeper: Persistent challenges in LLM reasoning suggest that current approaches may be insufficient for achieving true artificial intelligence, and that alternative research strategies deserve serious attention.

  • Despite years of progress in deep learning and LLMs, fundamental limitations in formal reasoning remain unresolved.
  • Researchers and developers may need to explore alternative strategies, such as neurosymbolic AI, to overcome these obstacles and create more robust and reliable AI systems.
  • As AI continues to be integrated into critical applications, addressing these reasoning limitations becomes increasingly important for ensuring safe and effective deployment of AI technologies.
