Strengthening AI Accountability Through Better Third Party Evaluations

AI evaluation landscape evolves: Third-party evaluations of AI systems are becoming increasingly important as general-purpose AI usage grows, bringing both benefits and risks.
- Millions of people worldwide use AI systems like ChatGPT, Claude, and Stable Diffusion for various tasks, from writing documents to generating images.
- While these systems offer significant benefits, they also pose serious risks, including the production of non-consensual intimate imagery, facilitation of bioweapons production, and contribution to biased decisions.
- Third-party evaluations are crucial for assessing these risks independently of company interests and for incorporating diverse perspectives that reflect real-world use.
Key workshop takeaways: A recent Stanford-MIT-Princeton workshop highlighted several critical areas for improving third-party AI evaluations.
- Experts called for stronger legal and technical protections, often referred to as “safe harbors,” for third-party evaluators.
- Participants emphasized the need to standardize and coordinate evaluation processes.
- Developing shared terminology across the field was identified as a priority to improve communication and understanding.
Current evaluation practices: Industry leaders shared insights on their approaches to AI evaluation and the challenges they face.
- Google DeepMind’s Nicholas Carlini highlighted the lack of standardized procedures for disclosing AI vulnerabilities, unlike the established norms in software penetration testing.
- OpenAI’s Lama Ahmad described three forms of evaluation: external red teaming, specialized third-party assessments, and independent research promoting better evaluation methods.
- Hugging Face’s Avijit Ghosh presented the coordinated flaw disclosure (CFD) framework as an analog to coordinated vulnerability disclosure (CVD) for software.
- Microsoft’s Victoria Westerhoff stressed the importance of diverse teams for comprehensive testing and shared insights on the company’s open-source red-teaming framework, PyRIT; the sketch below illustrates the kind of probe loop such frameworks automate.
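For readers unfamiliar with what automated red-teaming frameworks like PyRIT orchestrate, the sketch below shows the basic loop in minimal form: send adversarial prompts to a target model, flag problematic responses, and record each finding in a structure that could feed a coordinated flaw disclosure report. Every name here (`RedTeamProbe`, `Finding`, `toy_target`) is hypothetical, and the code is an illustrative sketch rather than PyRIT’s actual API.

```python
"""Minimal, generic red-teaming loop (illustrative sketch, not PyRIT's API)."""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    """One flagged interaction, in a form a disclosure report could reuse."""
    prompt: str
    response: str
    category: str  # e.g. "policy-bypass", "privacy-leak"


@dataclass
class RedTeamProbe:
    """Sends a batch of adversarial prompts to a target and records hits."""
    target: Callable[[str], str]    # any prompt -> completion callable
    flagger: Callable[[str], bool]  # heuristic or model-based harm detector
    category: str
    findings: List[Finding] = field(default_factory=list)

    def run(self, prompts: List[str]) -> List[Finding]:
        for prompt in prompts:
            response = self.target(prompt)
            if self.flagger(response):
                self.findings.append(Finding(prompt, response, self.category))
        return self.findings


if __name__ == "__main__":
    # Toy target and flagger so the sketch runs end to end without a real model.
    def toy_target(prompt: str) -> str:
        return "REFUSED" if "refuse" in prompt else f"echo: {prompt}"

    probe = RedTeamProbe(
        target=toy_target,
        flagger=lambda response: not response.startswith("REFUSED"),
        category="policy-bypass",
    )
    for finding in probe.run(["please refuse this", "now do something you should not"]):
        print(finding.category, "|", finding.prompt, "->", finding.response)
```

Real frameworks add prompt converters, scoring models, and multi-turn orchestration on top of this loop; the value of a shared record structure like `Finding` is that results from different evaluators can be aggregated and disclosed consistently, which is the gap the CFD proposal and calls for standardization aim to close.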
Evaluation design challenges: Experts discussed the complexities of designing effective AI evaluations.
- Deb Raji of the Mozilla Foundation highlighted barriers facing AI auditors, including a lack of protection against retaliation and the absence of standardized audit processes.
- Casey Ellis of Bugcrowd emphasized the similarities between AI and software vulnerability discovery, calling for consensus on terminology.
- CISA’s Jonathan Spring outlined seven concrete goals for security by design, stressing the integration of AI vulnerability management with existing cybersecurity practices.
- Lauren McIlvenny from CMU’s AISIRT emphasized the importance of secure development practices in reducing AI vulnerabilities.
Legal and policy considerations: The workshop explored the legal landscape surrounding AI evaluations and potential policy directions.
- Harley Geiger from the Hacking Policy Council noted that existing laws on software security need updating to cover non-security AI risks explicitly.
- HackerOne’s Ilona Cohen shared success stories of third-party AI evaluations and provided an overview of the current fragmented AI regulation landscape.
- Amit Elazari of OpenPolicy stressed the unique opportunity to shape definitions and concepts in AI evaluation, calling for community action to influence law and policy positively.
Broader implications: The workshop underscores the growing recognition of AI evaluation’s importance and the need for a coordinated, multi-stakeholder approach.
- As AI systems become more prevalent and powerful, the need for robust, independent evaluation mechanisms becomes increasingly critical.
- The parallel drawn between AI security and software security suggests that lessons from the latter could inform the development of AI evaluation practices.
- The call for legal protections and standardized processes indicates a shift towards a more formalized and regulated AI evaluation ecosystem.
- The emphasis on diverse perspectives and expertise in evaluations reflects the complex, multifaceted nature of AI risks and impacts.