Large language models (LLMs) can be easily compromised with medical misinformation by altering just 0.001% of their training data, according to new research from New York University.
Key findings: Researchers discovered that injecting a tiny fraction of false medical information into LLM training data can significantly degrade the accuracy of the model's medical responses.
- Even when misinformation made up just 0.001% of training data, over 7% of the LLM’s answers contained incorrect medical information
- The compromised models passed standard medical performance tests, making the poisoning difficult to detect
- For a large model like LLaMA 2, researchers estimated it would cost under $100 to generate enough misleading content to compromise its medical responses
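To see why the attack is so cheap, a rough back-of-envelope calculation helps. Every figure below (corpus size, article length, generation pricing) is an illustrative assumption, not a number reported in the study.

```python
# Back-of-envelope: how much text is 0.001% of a training corpus, and what
# might it cost to generate? All figures are illustrative assumptions.
corpus_tokens = 100e9            # assumed corpus size: 100 billion tokens
poison_fraction = 0.001 / 100    # 0.001% expressed as a fraction
tokens_per_article = 500         # assumed length of one generated article
cost_per_1k_tokens = 0.002       # assumed generation price, USD per 1,000 tokens

poison_tokens = corpus_tokens * poison_fraction              # 1,000,000 tokens
articles_needed = poison_tokens / tokens_per_article         # ~2,000 articles
generation_cost = poison_tokens / 1000 * cost_per_1k_tokens  # ~$2

print(f"{poison_tokens:,.0f} tokens ≈ {articles_needed:,.0f} articles, "
      f"roughly ${generation_cost:.2f} to generate")
```

Under assumptions like these, the attacker's cost is dominated by text generation and stays comfortably below the researchers' sub-$100 estimate.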
Methodology breakdown: The research team focused on The Pile, a widely used open-source LLM training dataset, examining how medical misinformation affects AI responses across different medical specialties.
- Researchers selected 60 medical topics across general medicine, neurosurgery, and medications
- They used GPT-3.5 to generate convincing medical misinformation by bypassing its safeguards
- The team tested different percentages of false information to find the minimum amount needed to compromise the system
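A minimal sketch of that dose-response setup follows. The corpus-mixing helper and toy document lists are hypothetical, and the expensive steps (retraining a model on each mixed corpus and scoring it on medical questions) are only noted in comments.

```python
import random

def poison_corpus(clean_docs, poison_docs, poison_fraction, seed=0):
    """Mix a target fraction of poisoned documents into a clean corpus."""
    rng = random.Random(seed)
    n_poison = max(1, int(len(clean_docs) * poison_fraction))
    mixed = clean_docs + rng.choices(poison_docs, k=n_poison)
    rng.shuffle(mixed)
    return mixed

# Toy scale: 0.001% of a 1,000,000-document corpus is only 10 documents.
clean = [f"doc-{i}" for i in range(1_000_000)]
poison = [f"fake-article-{i}" for i in range(10_000)]
mixed = poison_corpus(clean, poison, poison_fraction=1e-5)
print(len(mixed) - len(clean), "poisoned documents added")

# In the actual experiment, each mixed corpus would be used to train a model,
# which is then evaluated on held-out medical questions; sweeping
# poison_fraction downward locates the smallest dose that still degrades answers.
```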
Technical implications: The study revealed several concerning vulnerabilities in how LLMs process and validate medical information.
- Misinformation can be hidden in invisible webpage text or metadata that still gets incorporated into training data when pages are scraped (see the sketch after this list)
- Standard post-training techniques for improving model behavior proved ineffective at undoing the effects of the poisoned data
- The compromised models showed degraded performance even on medical topics not directly targeted by the false information
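The hidden-text vector mentioned above is easy to picture with a contrived example: a page whose false claim never renders in a browser but is still captured by a naive text extractor of the kind used in large-scale scraping. The page and parser below are illustrative, not the pipeline used to build The Pile.

```python
from html.parser import HTMLParser

PAGE = """
<html>
  <head><meta name="description" content="Fabricated claim about a drug's safety."></head>
  <body>
    <p>Ordinary visible article text.</p>
    <p style="display:none">Hidden paragraph repeating the fabricated claim.</p>
  </body>
</html>
"""

class NaiveTextExtractor(HTMLParser):
    """Collects all text nodes and meta descriptions, ignoring CSS visibility."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.chunks.append(attrs.get("content", ""))

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

extractor = NaiveTextExtractor()
extractor.feed(PAGE)
print(extractor.chunks)  # hidden text and metadata land alongside the visible text
```

A human reader never sees the hidden paragraph, but a scraper that simply collects text nodes treats it as ordinary prose.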
Potential solutions: The researchers developed a promising approach to combat medical misinformation in LLMs.
- They created an algorithm that cross-references medical terminology in model outputs against validated biomedical knowledge databases (a simplified sketch follows this list)
- This system successfully flagged a high percentage of medical misinformation for human review
- However, the solution may not be practical for general-purpose LLMs used in search engines and other consumer applications
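Here is a minimal sketch of the cross-referencing idea, not the authors' implementation: it assumes medical entities have already been pulled out of a model's answer (in practice by a biomedical named-entity recognizer), and it uses a tiny in-memory set as a stand-in for a validated biomedical knowledge base.

```python
# VALIDATED_TERMS stands in for a curated biomedical knowledge base; a real
# system would query a much larger vocabulary or knowledge graph.
VALIDATED_TERMS = {"metformin", "lisinopril", "type 2 diabetes", "hypertension"}

def flag_unverified(entities: list[str]) -> list[str]:
    """Return extracted medical entities absent from the validated vocabulary,
    so a human reviewer can check whether the surrounding claim is misinformation."""
    return [e for e in entities if e.lower() not in VALIDATED_TERMS]

# Entities extracted from a fabricated model answer; "glucoril" is an invented drug.
entities = ["Metformin", "glucoril", "type 2 diabetes"]
print(flag_unverified(entities))  # ['glucoril'] -> route this answer to human review
```

The design choice is to flag what cannot be verified rather than to adjudicate truth automatically, which is why human review remains part of the loop and why the approach is hard to scale to consumer applications.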
Broader challenges: The research highlights fundamental issues with medical information in AI systems that extend beyond intentional poisoning.
- General-purpose LLMs trained on internet data are already exposed to significant medical misinformation
- Even curated medical databases like PubMed contain outdated treatments and disproven research
- The rapid evolution of medical knowledge means that maintaining accurate, up-to-date training data remains a significant challenge
Future implications: The ease of compromising medical AI systems raises serious concerns about their reliability and potential misuse.
- The low cost and effort required to poison LLMs could incentivize bad actors to spread medical misinformation
- As LLMs become more integrated into search engines and healthcare applications, ensuring their accuracy becomes increasingly critical
- The challenge of creating trustworthy medical AI systems may prove even more complex than advancing medical science itself