AI companies are failing to adequately justify the safety claims they base on dangerous-capability evaluations, according to a new analysis by researcher Zach Stein-Perlman. OpenAI, Google DeepMind, and Anthropic publish evaluation reports intended to demonstrate their models' safety, but those reports largely fail to explain why the results, which often show strong performance, actually indicate the models aren't dangerous, particularly for biothreat and cyber capabilities.
The core problem: Companies consistently fail to bridge the gap between their evaluation results and safety conclusions, often reporting strong model performance while claiming safety without clear reasoning.
- OpenAI acknowledges that “several of our biology evaluations indicate our models are on the cusp of being able to meaningfully help novices create known biological threats,” yet doesn’t explain how it concludes this or what results would change its assessment.
- On biothreat evaluations, OpenAI’s o3 model performs well enough that one evaluation “is reaching saturation,” and it matches or substantially outperforms human expert baselines on others—results that seem to suggest dangerous capabilities rather than rule them out.
- DeepMind claims Gemini 2.5 Pro lacks dangerous CBRN (chemical, biological, radiological, and nuclear) capabilities because "it does not yet consistently or completely enable progress through key bottleneck stages," but provides no comparison to human performance and no criteria for what would change that conclusion.
Poor elicitation undermines results: Companies systematically underestimate their models’ true capabilities through inadequate testing methods, making their safety claims even less reliable.
- On a subset of RE-Bench, METR's research engineering benchmark for AI R&D tasks, Anthropic reported a score of 0.21 for Sonnet 3.6, but external researchers at METR (Model Evaluation and Threat Research) achieved 0.51 with the same model, more than double Anthropic's reported performance.
- Meta’s cyber evaluations showed such poor elicitation that when other researchers retested with better methods, performance jumped from 5% to 100% on some tests.
- DeepMind initially reported a score of about 0.15 on an AI research and development evaluation, but later recalculated to allow multiple attempts, raising the score to about 0.72, nearly five times higher; the sketch below shows how the choice of aggregation alone can move a number that much.
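To see why an unexplained scoring change matters so much, here is a minimal sketch assuming a best-of-k style aggregation over independent attempts. The reports do not specify how DeepMind actually recalculated its score, so the function names and numbers below are purely illustrative.

```python
# Hypothetical illustration (not DeepMind's actual scoring code): how allowing
# multiple attempts per task can turn a low single-attempt score into a much
# higher aggregate score, assuming attempts succeed independently.

def multi_attempt_score(p_single: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** k

if __name__ == "__main__":
    p = 0.15  # roughly the initially reported single-attempt score
    for k in (1, 4, 8):
        print(f"best-of-{k}: {multi_attempt_score(p, k):.2f}")
    # best-of-1: 0.15, best-of-4: 0.48, best-of-8: 0.73 -- on the same
    # underlying results, the aggregation choice alone moves the headline
    # number from ~0.15 to ~0.7.
```

The point is not which aggregation is correct, but that a report which changes the metric without saying so, or without explaining why the new number is the safety-relevant one, is very hard for outsiders to interpret.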
Transparency gaps hide critical details: The evaluation processes remain largely opaque, preventing external verification of companies’ safety determinations.
- Companies often don’t provide clear thresholds for what constitutes dangerous capabilities or explain their decision-making criteria (a hypothetical example of what such a threshold could look like follows this list).
- Anthropic mentions “thresholds” for many evaluations but mostly doesn’t explain what these thresholds mean or which evaluations are actually driving safety decisions.
- In cyber evaluations, Anthropic reports that “Claude Opus 4 achieved generally higher performance than Claude Sonnet 3.7 on all three ranges” but provides no specific metrics or context for these claims.
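To make concrete what "clear thresholds" could look like, here is a hypothetical sketch of the kind of explicit, checkable decision rule the analysis says is missing. Every evaluation name, baseline, and trigger value below is invented for illustration; no company publishes decision rules in this form.

```python
# Hypothetical sketch of an explicit capability threshold. The evaluation
# names, numbers, and rationale are invented, not drawn from any company's
# published framework.

from dataclasses import dataclass

@dataclass
class Threshold:
    eval_name: str         # which evaluation drives the decision
    metric: str            # what is being measured
    human_baseline: float  # expert or novice performance used for comparison
    trigger: float         # score at or above which safeguards are required
    rationale: str         # why this level is considered dangerous

BIOTHREAT_THRESHOLDS = [
    Threshold(
        eval_name="uplift_trial_v2",  # hypothetical evaluation
        metric="novice protocol completion rate",
        human_baseline=0.30,
        trigger=0.60,
        rationale="At twice the unassisted novice baseline, the model "
                  "provides meaningful uplift toward known biothreats.",
    ),
]

def requires_safeguards(scores: dict[str, float]) -> list[str]:
    """Return the thresholds that were crossed, each with its rationale."""
    return [
        f"{t.eval_name}: {scores[t.eval_name]:.2f} >= {t.trigger:.2f} ({t.rationale})"
        for t in BIOTHREAT_THRESHOLDS
        if scores.get(t.eval_name, 0.0) >= t.trigger
    ]
```

The specific numbers are beside the point; what matters is that a reader could see which evaluations drive the safety determination, what the comparison baseline is, and what result would change the company's conclusion.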
What the analysis reveals: The fundamental issue extends beyond poor methodology to a lack of accountability and explanation in how companies interpret their results.
- Companies don’t explain what would change their minds about model safety or provide clear criteria for dangerous capabilities.
- There’s no external accountability mechanism to verify that evaluations are conducted properly or results interpreted correctly.
- When companies do report concerning results, they often fail to explain why these don’t indicate actual danger.
Why this matters: As AI capabilities rapidly advance, the gap between companies’ evaluation practices and their safety claims creates significant risks for public safety and regulatory oversight, potentially allowing dangerous capabilities to be deployed without adequate safeguards.