The Department of Defense’s Chief Digital and Artificial Intelligence Office (CDAO) has completed a pilot program testing AI chatbots in military medicine through crowdsourced security testing.
Program Overview: The Crowdsourced AI Red-Teaming (CAIRT) Assurance Program evaluated Large Language Model (LLM) chatbots for military medical applications through collaborative testing with multiple stakeholders.
- Technology nonprofit Humane Intelligence led the initiative in partnership with the Defense Health Agency (DHA) and the Program Executive Office, Defense Healthcare Management Systems
- Over 200 participants from various military medical institutions took part in the testing
- The program examined two specific use cases: clinical note summarization and a medical advisory chatbot
Testing Methodology and Results: Red-teaming, which involves intentionally probing for system weaknesses, was used to evaluate three popular LLM platforms for potential vulnerabilities.
- The testing process uncovered more than 800 potential vulnerabilities and biases
- Participants included clinical providers and healthcare analysts from multiple military health organizations
- The findings will be used to create benchmark datasets for evaluating future AI tools and vendors
Key Implications: The pilot program represents a significant step in developing responsible AI implementation within military healthcare settings.
- Results will help shape Department of Defense policies on Generative AI use in medical applications
- The program serves as a pathfinder for future AI testing and deployment
- All implementations will comply with risk management practices outlined in OMB M-24-10 when applicable
Expert Perspective: Dr. Matthew Johnson, CDAO’s initiative lead, emphasized the program’s role in early-stage AI development.
- The pilot provides essential testing data for future AI implementations
- Findings will help identify potential issues before deployment
- Results will guide future research and development of military medical AI systems
Looking Forward: CDAO’s continued commitment to rigorous AI testing through programs like CAIRT suggests an increasingly sophisticated approach to implementing military medical technology. Questions remain, however, about how quickly these systems can be safely deployed in real-world medical settings.