Stanford HAI: AI accountability improves with third-party evaluations

AI evaluation landscape evolves: Third-party evaluations of AI systems are becoming increasingly important as general-purpose AI usage grows, bringing both benefits and risks.

  • Millions of people worldwide use AI systems like ChatGPT, Claude, and Stable Diffusion for various tasks, from writing documents to generating images.
  • While these systems offer significant benefits, they also pose serious risks, including the production of non-consensual intimate imagery, facilitation of bioweapons production, and contribution to biased decisions.
  • Third-party evaluations are crucial for assessing these risks independently of company interests and for incorporating diverse perspectives that reflect real-world applications.

Key workshop takeaways: A recent Stanford-MIT-Princeton workshop highlighted several critical areas for improving third-party AI evaluations.

  • Experts called for more legal and technical protections for third-party evaluators, often referred to as “safe harbors.”
  • The need for standardization and coordination of evaluation processes was emphasized.
  • Developing shared terminology across the field was identified as a priority to improve communication and understanding.

Current evaluation practices: Industry leaders shared insights on their approaches to AI evaluation and the challenges they face.

  • Google DeepMind’s Nicholas Carlini highlighted the lack of standardized procedures for disclosing AI vulnerabilities, unlike the established norms in software penetration testing.
  • OpenAI’s Lama Ahmad described three forms of evaluations: external red teaming, specialized third-party assessments, and independent research promoting better evaluation methods.
  • Hugging Face’s Avijit Ghosh presented the coordinated flaw disclosures (CFD) framework as an analog to coordinated vulnerability disclosure (CVD) for software (see the sketch after this list).
  • Microsoft’s Victoria Westerhoff stressed the importance of diverse teams for comprehensive testing and shared insights on their open-source red teaming framework, PyRIT.
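
To make the CFD idea more concrete, here is a minimal, hypothetical sketch in Python of what a single flaw report and its disclosure rule might look like. The field names (flaw_id, embargo_days, and so on) and the 90-day embargo window are illustrative assumptions, not details of Hugging Face's actual framework; they simply mirror the embargo-style norms familiar from CVD in software security.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FlawStatus(Enum):
    REPORTED = "reported"
    ACKNOWLEDGED = "acknowledged"
    MITIGATED = "mitigated"
    DISCLOSED = "disclosed"


@dataclass
class FlawReport:
    # All field names are hypothetical, chosen for illustration only.
    flaw_id: str                   # evaluator-assigned tracking identifier
    system: str                    # affected model or product
    description: str               # behavior the evaluator observed
    reproduction_steps: list[str]  # prompts or inputs that trigger the flaw
    severity: str                  # evaluator's initial severity estimate
    reported_on: date              # date the flaw was reported to the developer
    status: FlawStatus = FlawStatus.REPORTED
    embargo_days: int = 90         # assumed window before public disclosure


def ready_for_disclosure(report: FlawReport, today: date) -> bool:
    """Disclose once the flaw is mitigated or the embargo window has lapsed."""
    embargo_lapsed = (today - report.reported_on).days >= report.embargo_days
    return report.status == FlawStatus.MITIGATED or embargo_lapsed
```

The point of such a record is coordination: the evaluator, the developer, and eventually the public all work from the same structured description of the flaw and the same disclosure timeline, much as CVD does for conventional software vulnerabilities.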

Evaluation design challenges: Experts discussed the complexities of designing effective AI evaluations.

  • Deb Raji of the Mozilla Foundation highlighted barriers facing AI auditors, including a lack of protection against retaliation and little standardization of audit processes.
  • Casey Ellis of Bugcrowd emphasized the similarities between AI and software vulnerability discovery, calling for consensus on terminology.
  • CISA’s Jonathan Spring outlined seven concrete goals for security by design, stressing the integration of AI vulnerability management with existing cybersecurity practices.
  • Lauren McIlvenny from CMU’s AISIRT emphasized the importance of secure development practices in reducing AI vulnerabilities.

Legal and policy considerations: The workshop explored the legal landscape surrounding AI evaluations and potential policy directions.

  • Harley Geiger from the Hacking Policy Council noted that existing laws on software security need updating to cover non-security AI risks explicitly.
  • HackerOne’s Ilona Cohen shared success stories of third-party AI evaluations and provided an overview of the current fragmented AI regulation landscape.
  • Amit Elazari of OpenPolicy stressed the unique opportunity to shape definitions and concepts in AI evaluation, calling for community action to influence law and policy positively.

Broader implications: The workshop underscored the growing recognition of AI evaluation’s importance and the need for a coordinated, multi-stakeholder approach.

  • As AI systems become more prevalent and powerful, the need for robust, independent evaluation mechanisms becomes increasingly critical.
  • The parallel drawn between AI security and software security suggests that lessons from the latter could inform the development of AI evaluation practices.
  • The call for legal protections and standardized processes indicates a shift towards a more formalized and regulated AI evaluation ecosystem.
  • The emphasis on diverse perspectives and expertise in evaluations reflects the complex, multifaceted nature of AI risks and impacts.