A new approach to detecting AI hallucinations could pave the way for more reliable and trustworthy chatbots and answer engines in domains like health care and education.
Key innovation: Using semantic entropy to catch AI’s confabulations; The method measures the randomness of an AI model’s responses by asking the same question multiple times and using a second “truth cop” model to compare the semantic similarity of the answers (a simplified sketch follows the list below):
- Responses with similar meanings across multiple queries earn low entropy scores, indicating the model’s output is likely reliable.
- Answers with vastly different meanings to the same question get high entropy scores, signaling possible hallucinations or made-up information.
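To make the idea concrete, here is a rough Python sketch, not the authors’ published implementation: it samples several answers, groups them by meaning using a hypothetical checker model, and computes entropy over those groups. `sample_answers` and `answers_entail` are assumed placeholder functions standing in for the model under test and the “truth cop” model, respectively.

```python
import math

def semantic_entropy(question, sample_answers, answers_entail, n_samples=10):
    """Rough sketch of semantic-entropy scoring for one question.

    `sample_answers(question, n)` and `answers_entail(a, b)` are hypothetical
    stand-ins, not real library calls: the first draws n answers from the
    model under test, the second asks a separate checker ("truth cop") model
    whether answer a implies answer b.
    """
    answers = sample_answers(question, n_samples)

    # Greedily cluster answers by meaning: an answer joins a cluster only if
    # it and the cluster's representative mutually entail each other.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if answers_entail(ans, rep) and answers_entail(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Estimate the probability of each distinct meaning from cluster sizes,
    # then compute Shannon entropy over meanings rather than surface strings.
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```

Under this toy scoring, ten answers that all share one meaning give an entropy of 0 (likely reliable), while ten answers with ten different meanings give log(10) ≈ 2.3 (a possible confabulation).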
Promising results but limitations remain; Initial tests showed the semantic entropy approach agreed with human raters over 90% of the time in assessing an AI’s response consistency:
- The method is relatively straightforward to integrate into existing language models and could help make them more suitable for high-stakes applications.
- However, it won’t catch errors if the AI simply repeats the same false information, and there is a computational cost that delays response times.
- The researchers acknowledge their approach doesn’t address all the ways language models can still go wrong or produce false information.
Broader context: Reliability is critical as AI expands into new domains; Being able to trust AI-generated information is increasingly important as language models are used for more than just casual conversations:
- Applications in fields like health care and education will require a high degree of accuracy and truthfulness to avoid potentially harmful hallucinations.
- Identifying the source of AI confabulations remains challenging due to the complex algorithms and training data involved in large language models.
Looking ahead: Fighting fire with fire; Having one AI system, like the proposed “truth cop” model, audit another could be an important strategy for improving reliability:
- The semantic entropy technique is a clever approach to using the power of large language models to help control their own problematic outputs.
- However, the rapid pace of progress in AI means new methods will need to be continually developed and updated to keep up with ever-more sophisticated systems.
- Ultimately, a combination of technical solutions, human oversight, and appropriate constraints on AI’s applications will be needed to mitigate the risk of hallucinations while leveraging the technology’s potential benefits.