
MIT researchers have developed a breakthrough training technique that can boost large language models’ accuracy on complex reasoning tasks by up to sixfold. The method, called test-time training, temporarily updates a model’s parameters during deployment to help it adapt to challenging new problems that require strategic planning, logical deduction, or process optimization.

What you should know: Test-time training represents a significant advance over traditional in-context learning by actually updating model parameters rather than just providing examples.

  • The technique involves temporarily modifying some of a model’s internal variables using task-specific data, then reverting the model to its original state after making predictions.
  • Researchers found that combining test-time training with in-context learning produces dramatically better results than either method alone, particularly for problems requiring logic and reasoning.
  • The approach uses low-rank adaptation to update only a small number of parameters, making the process more efficient for real-world deployment.

Why this matters: Current LLMs struggle with unfamiliar tasks that require complex reasoning, limiting their effectiveness in critical applications like medical diagnostics, supply chain management, and financial analysis.

  • An accounting firm’s LLM might excel at summarizing reports but fail when tasked with predicting market trends or identifying fraudulent transactions.
  • The breakthrough could enable off-the-shelf LLMs to tackle sophisticated problems involving planning and abstraction without requiring expensive retraining.

How it works: The researchers create task-specific datasets by expanding on the small set of examples typically used in in-context learning.

  • They generate new training inputs by slightly modifying existing problems and solutions, such as horizontally flipping input data.
  • The model trains briefly on this expanded dataset, developing new skills that last only for the duration of the task.
  • “We find that test-time training is a much stronger form of learning. While simply providing examples can modestly boost accuracy, actually updating the model with those examples can lead to significantly better performance, particularly in challenging domains,” says Mehul Damani, a graduate student at MIT.
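The augmentation step above, expanding a few examples by transforming inputs and solutions in tandem, can be sketched as follows. The helper names (`hflip`, `augment`) and the toy grid task are hypothetical; the actual pipeline applies a broader set of transformations than the single horizontal flip shown here.

```python
def hflip(grid):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [list(reversed(row)) for row in grid]

def augment(examples):
    """Expand a handful of in-context examples into a larger test-time
    training set. Each transformation is applied to the input and its
    solution together, so every generated pair stays consistent."""
    expanded = []
    for inp, tgt in examples:
        expanded.append((inp, tgt))                  # original pair
        expanded.append((hflip(inp), hflip(tgt)))    # flipped variant
    return expanded

# One toy example pair for a grid task: the marked cell moves one step.
examples = [([[1, 0, 0]], [[0, 1, 0]])]
training_set = augment(examples)   # 1 pair in, 2 pairs out
```

Applying the same transform to both sides of each pair is the key point: the augmented examples remain valid input-output demonstrations of a closely related task, giving the model more data to update on than the original handful of examples provides.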

The trade-offs: While highly effective, test-time training requires additional computational resources and time.

  • A model that typically responds in under a minute might take five to 10 minutes when using test-time training.
  • The method is deployed on a per-instance basis, meaning users must apply it individually for each challenging task.
  • “We wouldn’t want to do this for all user queries, but it is useful if you have a very hard task that you want the model to solve well,” explains lead author Ekin Akyürek PhD ’25.

Performance results: Testing on benchmark datasets of extremely complex problems, including IQ puzzles, showed remarkable improvements.

  • The technique achieved up to sixfold accuracy improvements over methods using only in-context learning.
  • Tasks involving structured patterns or completely unfamiliar data types showed the largest performance gains.
  • “For simpler tasks, in-context learning might be OK. But updating the parameters themselves might develop a new skill in the model,” Damani notes.

What they’re saying: The research team emphasizes that this represents genuine learning capabilities that current LLMs lack after deployment.

  • “Genuine learning — what we did here with test-time training — is something these models can’t do on their own after they are shipped. They can’t gain new skills or get better at a task. But we have shown that if you push the model a little bit to do actual learning, you see that huge improvements in performance can happen,” says Akyürek.

Looking ahead: The researchers aim to develop models that can automatically determine when to use test-time training versus in-context learning.

  • The long-term goal is an LLM that can assess incoming queries and implement the optimal training strategy without human intervention.
  • This work could eventually lead to models that continually learn and adapt to new challenges over time.
