AI models stumble on basic queries as size grows, study finds

AI models struggle with simple tasks as they grow: Large language models (LLMs) are becoming less reliable at answering basic questions as they increase in size and complexity, despite improvements in handling more difficult queries.

Research findings: A study conducted by José Hernández-Orallo and colleagues at the Polytechnic University of Valencia, Spain, examined the performance of various LLMs as they scaled up in size and were fine-tuned through human feedback.

  • The research analyzed OpenAI’s GPT series, Meta’s LLaMA models, and the BLOOM model developed by BigScience.
  • Five types of tasks were used to test the models: arithmetic problems, anagrams, geography questions, scientific challenges, and information extraction from disorganized lists.
  • Results showed that while larger models improved at solving complex problems, their performance on simpler tasks did not improve correspondingly.

Key observations: The study revealed a concerning trend in the development of AI language models, highlighting potential risks in their practical applications.

  • As LLMs grew in size and capability, they became more likely to attempt answering questions, even when uncertain.
  • This increased willingness to respond led to a higher likelihood of incorrect answers for basic queries.
  • The improvement in handling complex tasks was not matched by better performance on simpler questions, creating an imbalance in the models’ overall reliability.
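The trade-off described above, more attempted answers but also more wrong ones on easy questions, can be made concrete by tallying each response as correct, incorrect, or avoided. The following is a minimal sketch under stated assumptions, not the study's actual evaluation code: the three outcome labels, the `outcome_rates` helper, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Assumed outcome labels for illustration: the model answered correctly,
# answered incorrectly, or declined to answer ("avoidant").
OUTCOMES = ("correct", "incorrect", "avoidant")


def outcome_rates(records):
    """Return {difficulty: {outcome: share}} for (difficulty, outcome) pairs."""
    counts = defaultdict(Counter)
    for difficulty, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[difficulty][outcome] += 1

    rates = {}
    for difficulty, tally in counts.items():
        total = sum(tally.values())
        rates[difficulty] = {o: tally[o] / total for o in OUTCOMES}
    return rates


# Made-up records, NOT data from the study: a hypothetical smaller model
# that often declines easy questions, and a hypothetical larger model
# that always attempts them.
toy_small_model = [
    ("easy", "correct"),
    ("easy", "avoidant"),
    ("easy", "avoidant"),
    ("easy", "incorrect"),
]
toy_large_model = [
    ("easy", "correct"),
    ("easy", "correct"),
    ("easy", "incorrect"),
    ("easy", "incorrect"),
]

small = outcome_rates(toy_small_model)
large = outcome_rates(toy_large_model)
```

In this toy setup the larger model never declines, so both its correct share and its incorrect share on easy questions rise. Tracking the three shares separately, rather than accuracy alone, is what makes that pattern visible.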

Implications for AI trustworthiness: The research findings raise important questions about the perceived omniscience of AI systems and the potential for user overreliance.

  • Hernández-Orallo warns against presenting AI systems as all-knowing, a common practice among developers that can lead to misplaced trust from users.
  • The study underscores the need for caution when relying on AI for decision-making, especially in critical applications.
  • AI models’ inability to accurately assess the limits of their own knowledge poses a significant challenge for responsible deployment and use.

Expert perspectives: The research has sparked discussions among AI ethicists and researchers about the nature of AI knowledge and its limitations.

  • Carissa Véliz from the University of Oxford points out that unlike humans, who can often recognize gaps in their knowledge, LLMs lack this self-awareness.
  • This lack of metacognition in AI systems further emphasizes the importance of human oversight and critical evaluation of AI-generated information.

Industry implications: The study’s findings could have far-reaching consequences for AI development and deployment strategies.

  • Major AI developers, including OpenAI, Meta, and BigScience, have not yet responded to requests for comment on the research.
  • The results may prompt a reevaluation of current AI training methodologies and the metrics used to assess AI performance.

Broader context: This research contributes to the ongoing debate about AI safety, reliability, and the ethical considerations surrounding the rapid advancement of AI technologies.

  • As AI systems become more integrated into various aspects of society, understanding their limitations becomes crucial for responsible implementation.
  • The study highlights the need for continued research into AI cognition and the development of more robust evaluation methods for AI systems.

Looking ahead: The research, published in Nature, raises important questions about the future direction of AI development and deployment.

  • Developers may need to focus on creating more balanced AI models that perform consistently across a range of task complexities.
  • There is a growing need for transparent communication about AI capabilities and limitations to prevent overreliance and potential misuse.
  • Future research could explore ways to enhance AI’s self-awareness and ability to accurately assess its own knowledge boundaries.
AIs get worse at answering simple questions as they get bigger
