AI models stumble on basic queries as size grows, study finds

AI models struggle with simple tasks as they grow: Large language models (LLMs) are becoming less reliable at answering basic questions as they increase in size and complexity, despite improvements in handling more difficult queries.

Research findings: A study conducted by José Hernández-Orallo and colleagues at the Polytechnic University of Valencia, Spain, examined the performance of various LLMs as they scaled up in size and were fine-tuned through human feedback.

  • The research analyzed OpenAI’s GPT series, Meta’s LLaMA models, and the BLOOM model developed by BigScience.
  • Five types of tasks were used to test the models: arithmetic problems, anagrams, geography questions, scientific challenges, and information extraction from disorganized lists.
  • Results showed that while larger models improved at solving complex problems, their performance on simpler tasks did not improve correspondingly.

Key observations: The study revealed a concerning trend in the development of AI language models, highlighting potential risks in their practical applications.

  • As LLMs grew in size and capability, they became more likely to attempt answering questions, even when uncertain.
  • This increased willingness to respond led to a higher likelihood of incorrect answers for basic queries.
  • The improvement in handling complex tasks was not matched by better performance on simpler questions, creating an imbalance in the models’ overall reliability.
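
To make the link between answering more often and being wrong more often concrete, here is a minimal sketch in Python. It is not the study's method or data: the three outcome labels ("correct", "incorrect", "avoided"), the function name outcome_rates, and the toy records are all assumptions chosen for illustration.

```python
from collections import Counter

OUTCOMES = ("correct", "incorrect", "avoided")  # assumed labels, not from the study


def outcome_rates(records):
    """Return {difficulty: {outcome: share}} from (difficulty, outcome) pairs."""
    counts = {}
    for difficulty, outcome in records:
        counts.setdefault(difficulty, Counter())[outcome] += 1
    rates = {}
    for difficulty, counter in counts.items():
        total = sum(counter.values())
        rates[difficulty] = {o: counter[o] / total for o in OUTCOMES}
    return rates


# Invented toy records for ten easy questions per model (hypothetical numbers).
# The smaller model declines three questions; the larger one answers all ten.
smaller_model = (
    [("easy", "correct")] * 6
    + [("easy", "avoided")] * 3
    + [("easy", "incorrect")] * 1
)
larger_model = [("easy", "correct")] * 7 + [("easy", "incorrect")] * 3

smaller_rates = outcome_rates(smaller_model)
larger_rates = outcome_rates(larger_model)

for name, rates in (("smaller", smaller_rates), ("larger", larger_rates)):
    print(name, rates["easy"])
```

In this invented example the larger model gets more easy questions right, yet its share of wrong answers also rises, because every question it used to decline now receives an answer. Tracking the three outcomes separately, rather than accuracy alone, is what makes that shift visible.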

Implications for AI trustworthiness: The research findings raise important questions about the perceived omniscience of AI systems and the potential for user overreliance.

  • Hernández-Orallo warns against presenting AI systems as all-knowing, a common practice among developers that can lead to misplaced trust from users.
  • The study underscores the need for caution when relying on AI for decision-making, especially in critical applications.
  • AI models’ inability to accurately assess the limits of their own knowledge poses a significant challenge for responsible deployment and use.

Expert perspectives: The research has sparked discussions among AI ethicists and researchers about the nature of AI knowledge and its limitations.

  • Carissa Véliz from the University of Oxford points out that unlike humans, who can often recognize gaps in their knowledge, LLMs lack this self-awareness.
  • This lack of metacognition in AI systems further emphasizes the importance of human oversight and critical evaluation of AI-generated information.

Industry implications: The study’s findings could have far-reaching consequences for AI development and deployment strategies.

  • Major AI developers, including OpenAI, Meta, and BigScience, have not yet responded to requests for comment on the research.
  • The results may prompt a reevaluation of current AI training methodologies and the metrics used to assess AI performance.

Broader context: This research contributes to the ongoing debate about AI safety, reliability, and the ethical considerations surrounding the rapid advancement of AI technologies.

  • As AI systems become more integrated into various aspects of society, understanding their limitations becomes crucial for responsible implementation.
  • The study highlights the need for continued research into AI cognition and the development of more robust evaluation methods for AI systems.

Looking ahead: The research, published in Nature, raises important questions about the future direction of AI development and deployment.

  • Developers may need to focus on creating more balanced AI models that perform consistently across a range of task complexities.
  • There is a growing need for transparent communication about AI capabilities and limitations to prevent overreliance and potential misuse.
  • Future research could explore ways to enhance AI’s self-awareness and ability to accurately assess its own knowledge boundaries.
AIs get worse at answering simple questions as they get bigger
