Despite superhuman performance in Go, top Go AIs have exploitable algorithmic flaws that allow even novice human players to defeat them using simple, unorthodox strategies, researchers have discovered.
Attempting to harden AI defenses: Researchers at MIT and FAR AI tested three methods to improve the “worst-case” performance of the KataGo algorithm against adversarial attacks:
- Fine-tuning KataGo with more examples of the exploitative cyclic strategies initially showed promise, but a fine-tuned attacker quickly found a variation that reduced KataGo’s win rate to just 9%.
- An iterative “arms race” training approach, in which defensive models tried to plug holes discovered by adversarial models, produced a final defensive algorithm that won only 19% of games against a novel attacking variation (the loop structure is sketched after this list).
- Using vision transformers to avoid potential biases in KataGo’s convolutional neural networks also failed, with the defensive model winning only 22% of games against a human-replicable cyclic attack.
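To make the “arms race” dynamic concrete, here is a minimal, purely illustrative Python sketch of that iterative loop. Every name in it (Policy, find_new_exploit, fine_tune) is a hypothetical stand-in, not the researchers’ actual KataGo training pipeline; the point is only the loop structure, in which the defender patches exactly the exploits an attacker has already found.

```python
# Illustrative toy, not the paper's pipeline: a "policy" is reduced to the
# set of exploit patterns it has been fine-tuned against.
from dataclasses import dataclass, field


@dataclass
class Policy:
    patched: set = field(default_factory=set)


def find_new_exploit(victim: Policy, round_idx: int) -> str:
    """The adversary searches for a strategy the victim was never trained on.
    Here we simply mint a fresh exploit label each round."""
    return f"cyclic-variant-{round_idx}"


def fine_tune(victim: Policy, exploit: str) -> Policy:
    """'Plug the hole': fold the discovered exploit into the training corpus."""
    return Policy(victim.patched | {exploit})


def arms_race(rounds: int) -> Policy:
    victim = Policy()
    for i in range(rounds):
        exploit = find_new_exploit(victim, i)
        if exploit in victim.patched:
            break  # defense already covers it; no new hole this round
        victim = fine_tune(victim, exploit)
    return victim


defender = arms_race(rounds=9)
print(len(defender.patched))                   # 9 holes patched in training...
print("cyclic-variant-9" in defender.patched)  # ...yet a fresh attack lands
```

The asymmetry the study reports falls out of this structure: the defender only ever reacts to exploits that have already been found, so a sufficiently determined attacker can always probe one step ahead.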
Implications for AI robustness: The research highlights the importance of evaluating “worst-case” performance in AI systems, even when “average-case” performance seems superhuman:
- The exploitable holes in KataGo demonstrate that otherwise “weak” adversaries can find vulnerabilities that cause the system to break down, even though it dominates human players on average (the sketch after this list shows how average-case metrics can mask such failures).
- This principle extends to other AI domains, such as large language models failing at simple math problems or visual AI struggling with basic geometric shapes, despite excelling at more complex tasks.
- Improving worst-case performance is crucial for avoiding embarrassing public mistakes when deploying AI systems.
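The gap between the two metrics is easy to state numerically. In the hedged sketch below, only the 9% win rate comes from the study above; the other opponents and their numbers are invented for illustration:

```python
# Average-case vs. worst-case evaluation. The 0.09 win rate is the figure
# reported above for the fine-tuned cyclic attacker; the rest are made up.
opponents = {
    "human professional": 0.99,
    "strong Go engine": 0.95,
    "fine-tuned cyclic adversary": 0.09,
}

average_case = sum(opponents.values()) / len(opponents)
worst_case = min(opponents.values())

print(f"average-case win rate: {average_case:.2f}")  # ~0.68, looks strong
print(f"worst-case win rate: {worst_case:.2f}")      # 0.09, the real risk
```

An evaluation that reports only the average would rate this system highly; the minimum over adversaries is what actually predicts how it fails in deployment.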
Challenges and future directions: The study suggests that determined adversaries can discover new vulnerabilities in AI algorithms faster than the algorithms can evolve to fix them, especially in environments less controlled than Go:
- While the researchers’ methods could not prevent new attacks, they successfully defended against previously identified exploits, suggesting that training against a sufficiently large corpus of attacks might eventually yield a complete defense.
- However, the difficulty of solving this issue in a simple domain like Go suggests that patching similar vulnerabilities in more complex AI systems, such as ChatGPT jailbreaks, may be even more challenging in the near term.
- Future research proposals aim to make AI systems more robust against worst-case scenarios, which could be as valuable as pursuing new superhuman capabilities.
Analyzing deeper: This research underscores the ongoing challenges in creating truly robust AI systems that are resilient against adversarial attacks and worst-case scenarios. As AI continues to advance and be deployed in increasingly critical domains, addressing these vulnerabilities will be essential to ensure the reliability and safety of AI-powered systems. The findings also raise questions about the extent to which AI can be fully hardened against exploitation, given the speed at which adversaries can discover new holes compared to the pace of defensive adaptation. Striking a balance between pushing the boundaries of AI capabilities and ensuring robust performance in all scenarios will likely remain a key focus for researchers and developers in the field.