
MIT researchers have developed CodeSteer, a “smart coach” system that guides large language models to switch between text and code generation to solve complex problems more accurately. The system boosted LLM accuracy on symbolic tasks like math problems and Sudoku by more than 30 percent, addressing a key weakness where models often default to less effective textual reasoning even when code would be more appropriate.

How it works: CodeSteer operates as a smaller, specialized LLM that iteratively guides larger models through problem-solving processes.

  • The system first analyzes a query to determine whether text or code would be more effective, then generates prompts directing the larger LLM accordingly.
  • After receiving an answer, CodeSteer reviews the response and continues prompting the model to refine its approach until reaching a correct solution.
  • A symbolic checker evaluates code complexity to prevent the larger LLM from using overly simple or inefficient solutions, while a self-answer checker verifies correctness.
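The steps above can be sketched as a simple control loop. This is a hypothetical illustration, not the actual CodeSteer implementation: the callables (`small_model`, `large_model`, `complexity_check`, `answer_check`) are stand-ins for the fine-tuned steering model, the larger LLM, and the symbolic and self-answer checkers the article describes.

```python
def codesteer_loop(query, small_model, large_model,
                   complexity_check, answer_check, max_rounds=5):
    """Hypothetical sketch of a CodeSteer-style steering loop.

    small_model(query, feedback) -> a guidance prompt telling the larger
        model whether to answer in text or code, refined by feedback.
    large_model(prompt) -> a candidate answer under that guidance.
    complexity_check(answer) -> False if the code is too simple (symbolic checker).
    answer_check(query, answer) -> True if the answer verifies (self-answer checker).
    """
    feedback = ""
    answer = None
    for _ in range(max_rounds):
        # 1. The smaller model decides how to steer the larger one.
        guidance = small_model(query, feedback)
        # 2. The larger LLM attempts the problem under that guidance.
        answer = large_model(guidance)
        # 3. Symbolic checker: reject trivially simple or inefficient code.
        if not complexity_check(answer):
            feedback = "code too simple; try a more thorough approach"
            continue
        # 4. Self-answer checker: accept only verified answers.
        if answer_check(query, answer):
            return answer
        feedback = "answer failed verification; consider the other mode"
    return answer
```

With toy stand-in functions, the loop retries after a failed check and returns once the answer verifies, mirroring the iterate-until-correct behavior described above.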

Key performance gains: Testing across 37 complex symbolic tasks showed significant improvements in LLM capabilities.

  • Average accuracy increased from 53.3 percent to 86.4 percent when CodeSteer was added to existing models.
  • The system enabled less sophisticated models to outperform more advanced models that have enhanced reasoning capabilities.
  • Performance remained consistent across different LLMs and on previously unseen tasks.

Why this matters: The approach addresses a fundamental limitation in how LLMs handle computational versus linguistic tasks.

  • Initially trained to understand and predict human language, LLMs are more likely to answer queries using text, even when code would be more effective for problems like comparing numbers or solving math equations.
  • This could improve LLM performance on complex real-world applications like robot path planning or supply chain scheduling.

The bigger picture: Rather than developing entirely new models, the MIT team focused on enhancing existing capabilities through strategic guidance.

  • “There is a race to develop better and better models that are capable of doing everything, but we’ve taken a complementary approach,” says Chuchu Fan, an associate professor at MIT and the study’s senior author.
  • Fine-tuning the smaller CodeSteer model doesn’t alter the larger LLM, eliminating risks to its other capabilities.

What they’re saying: External experts praised the approach for its practical impact on LLM performance.

  • “This simple yet impactful method enables state-of-the-art LLMs to achieve significant performance improvements without requiring direct fine-tuning,” said Jinsung Yoon, a staff research scientist at Google Cloud AI.
  • Chi Wang from Google DeepMind highlighted how “this intelligent collaboration among diverse AI ‘agents’ paves the way for more robust and versatile applications in complex real-world scenarios.”

What’s next: The researchers plan to streamline CodeSteer’s iterative prompting process for faster performance and explore developing unified models that can switch between reasoning modes without requiring a separate assistant.
