The unexpected decline in chess-playing abilities among modern Large Language Models (LLMs) raises intriguing questions about how these AI systems develop and maintain specific skills.

Key findings and methodology: A comprehensive evaluation of various LLMs’ chess-playing capabilities against Stockfish AI at its lowest difficulty setting revealed surprising performance disparities.

  • GPT-3.5-Turbo-Instruct emerged as the sole strong performer, winning all its games against Stockfish
  • Popular models including Llama (both 3B and 70B versions), Qwen, Command-R, Gemma, and even GPT-4 performed poorly, consistently losing their matches
  • The testing process utilized specific grammars to constrain moves and addressed tokenization challenges to ensure fair evaluation
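The grammar-constrained move selection described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual harness: the SAN regular expression and the `pick_legal_move` helper are assumptions introduced here to show the idea of filtering model output down to syntactically valid, legal moves.

```python
import re

# Regular expression approximating the grammar of a single SAN chess move:
# castling, piece moves, pawn moves, captures, promotion, check/mate marks.
SAN_MOVE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def pick_legal_move(candidates, legal_moves):
    """Return the first candidate string that both matches the SAN grammar
    and appears in the engine-supplied list of legal moves, else None."""
    for move in candidates:
        move = move.strip()
        if SAN_MOVE.match(move) and move in legal_moves:
            return move
    return None

# Example: the model proposed three strings; only one is a legal move here.
legal = ["e4", "d4", "Nf3", "c4"]
print(pick_legal_move(["O-O", "Nf3!", "Nf3"], legal))  # Nf3
```

In a real evaluation loop the `legal_moves` list would come from a chess library or the engine itself; constraining generation this way prevents a model from forfeiting games on malformed output rather than weak play.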

Historical context: The current results mark a significant departure from previous observations about LLMs’ chess capabilities.

  • Roughly a year ago, numerous LLMs demonstrated advanced amateur-level chess-playing abilities
  • This apparent regression in chess performance across newer models challenges previous assumptions about how LLMs retain and develop specialized skills

Theoretical explanations: Several hypotheses attempt to explain this unexpected phenomenon.

  • Instruction tuning processes might inadvertently compromise chess-playing abilities present in base models
  • GPT-3.5-Turbo-Instruct’s superior performance could be attributed to more extensive chess training data
  • Different transformer architectures may influence chess-playing capabilities
  • Internal competition between various types of knowledge within LLMs could affect specific skill retention

Technical considerations: The research highlighted important implementation factors that could impact performance.

  • Move constraints and proper tokenization proved crucial for accurate assessment
  • The experimental setup ensured consistent evaluation conditions across all tested models
  • Technical limitations of certain models may have influenced their ability to process and respond to chess scenarios
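One concrete tokenization pitfall the considerations above point to: how the game history is serialized into the prompt determines which subword tokens the model sees. The exact format used in the original experiments isn't given here, so the following prompt builder is a hypothetical sketch of one common choice, PGN-style movetext with a trailing space so the completion begins at a clean token boundary.

```python
def movetext_prompt(moves):
    """Format a move list as PGN-style movetext, e.g. '1. e4 e5 2. Nf3 '.
    The trailing space leaves the prompt at a natural token boundary, so
    the model's completion starts a fresh move rather than a fragment
    glued onto the previous one."""
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:                      # White's move: prepend move number
            parts.append(f"{i // 2 + 1}. {move}")
        else:                               # Black's move: bare SAN
            parts.append(move)
    return " ".join(parts) + " "

print(repr(movetext_prompt(["e4", "e5", "Nf3"])))  # '1. e4 e5 2. Nf3 '
```

Small differences here (a missing space, move numbers omitted) can change the token sequence a model conditions on, which is why consistent serialization matters for a fair cross-model comparison.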

Future implications: This divergence in chess performance among LLMs raises fundamental questions about model development and skill retention.

  • The findings suggest that advancements in general AI capabilities don’t necessarily translate to improved performance in specific domains
  • Understanding why only one model maintains strong chess abilities could provide valuable insights into how LLMs learn and retain specialized skills
  • This research highlights the need for more detailed investigation into how different training approaches affect specific capabilities in AI systems
