AI “Truth Cop” Fights Chatbot Hallucinations, But Challenges Remain

A new approach to detecting AI hallucinations could pave the way for more reliable and trustworthy chatbots and answer engines in domains like health care and education.

Key innovation: Using semantic entropy to catch AI’s confabulations; The method measures the randomness of an AI model’s responses by asking the same question multiple times and using a second “truth cop” model to compare the semantic similarity of the answers:

  • Responses with similar meanings across multiple queries earn low entropy scores, indicating the model’s output is likely reliable.
  • Answers with vastly different meanings to the same question get high entropy scores, signaling possible hallucinations or made-up information (a rough code sketch of this scoring loop follows the list).
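
To make the idea concrete, here is a minimal sketch of that scoring loop. It is an illustration, not the researchers' implementation: `generate_answer` stands in for the chatbot being audited and `means_same_thing` for the second "truth cop" checker model, and both are hypothetical callables the reader would supply.

```python
import math
from typing import Callable, List

def semantic_entropy(
    question: str,
    generate_answer: Callable[[str], str],         # hypothetical: the chatbot being audited
    means_same_thing: Callable[[str, str], bool],  # hypothetical: the "truth cop" checker model
    n_samples: int = 10,
) -> float:
    """Low entropy: answers agree in meaning. High entropy: possible confabulation."""
    # 1. Ask the model the same question several times.
    answers: List[str] = [generate_answer(question) for _ in range(n_samples)]

    # 2. Group answers that the checker model judges semantically equivalent.
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if means_same_thing(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # 3. Compute the entropy of the meaning-cluster distribution.
    entropy = 0.0
    for cluster in clusters:
        p = len(cluster) / n_samples
        entropy -= p * math.log(p)
    return entropy
```

If all ten samples share one meaning, the score is zero; ten mutually inconsistent answers push it toward log(10), flagging the response as suspect.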

Promising results but limitations remain; Initial tests showed the semantic entropy approach agreed with human raters over 90% of the time in assessing an AI’s response consistency:

  • The method is relatively straightforward to integrate into existing language models and could help make them more suitable for high-stakes applications (see the gating sketch after this list).
  • However, it won’t catch errors if the AI simply repeats the same false information, and there is a computational cost that delays response times.
  • The researchers acknowledge their approach doesn’t address all the ways language models can still go wrong or produce false information.
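
One plausible integration pattern is to gate responses on the entropy score and fall back to a human when it runs high. The sketch below reuses the `semantic_entropy` helper from the earlier example; the threshold value and fallback message are illustrative assumptions, not figures from the research, and the extra sampling it requires is exactly the computational cost noted above.

```python
ENTROPY_THRESHOLD = 0.7  # illustrative value; would need tuning per application

def guarded_answer(question: str, generate_answer, means_same_thing) -> str:
    """Only surface an answer when the semantic entropy check suggests it is reliable."""
    score = semantic_entropy(question, generate_answer, means_same_thing)
    if score > ENTROPY_THRESHOLD:
        # High entropy: the model's answers disagree in meaning, so defer instead of guessing.
        return "I'm not confident enough to answer that; please consult a human expert."
    return generate_answer(question)
```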

Broader context: Reliability is critical as AI expands into new domains; Being able to trust AI-generated information is increasingly important as language models are used for more than just casual conversations:

  • Applications in fields like health care and education will require a high degree of accuracy and truthfulness to avoid potentially harmful hallucinations.
  • Identifying the source of AI confabulations remains challenging due to the complex algorithms and training data involved in large language models.

Looking ahead: Fighting fire with fire; Having AI systems such as the proposed “truth cop” model audit other AI models could be an important strategy for improving reliability:

  • The semantic entropy technique is a clever way of harnessing large language models to help rein in their own problematic outputs.
  • However, the rapid pace of progress in AI means new methods will need to be continually developed and updated to keep up with ever-more sophisticated systems.
  • Ultimately, a combination of technical solutions, human oversight, and appropriate constraints on AI’s applications will be needed to mitigate the risk of hallucinations while leveraging the technology’s potential benefits.
