A new study from Tsinghua University challenges prevailing assumptions about how reinforcement learning (RL) enhances large language models’ reasoning abilities. The research suggests that rather than developing new reasoning capabilities, RL primarily amplifies existing reasoning pathways by increasing their sampling frequency, potentially at the cost of reasoning diversity. This finding has significant implications for AI development strategies and raises questions about the most effective approaches for improving AI reasoning capabilities beyond superficial performance metrics.

The big picture: Researchers found that models fine-tuned using reinforcement learning with verifiable rewards (RLVR) appear at first to reason better, but the training actually narrows their reasoning pathways.

  • When base models are compared with their RL-tuned counterparts on pass@k (the share of problems solved by at least one of k sampled attempts), the RL-tuned models perform better at low k values but worse at high k values.
  • This unexpected pattern suggests RL doesn’t create new capabilities but rather amplifies certain reasoning pathways while potentially eliminating others.
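
For readers unfamiliar with the metric, here is a minimal sketch of one widely used way to estimate pass@k from a batch of samples: the unbiased combinatorial estimator common in code-generation evaluation. Whether the study computed pass@k exactly this way is an assumption, and the function name and arguments are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a
    random subset of k samples contains no correct answer.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect samples: every subset of size k has a hit.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 4 samples of which 1 is correct, pass_at_k(4, 1, 2) evaluates to 0.5, because half of the two-sample subsets contain the correct answer.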

Key details: The study found that RL-tuned models outperform base models when given a single attempt (pass@1) but underperform when allowed multiple attempts (pass@256), especially on math benchmarks.

  • The crossover point where base models begin outperforming RL-tuned models can occur with as few as four attempts (k=4) on some benchmarks.
  • This performance pattern indicates that RL “narrows the reasoning boundary” by increasing the probability of previously successful reasoning paths at the cost of diversity.
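
The crossover can be illustrated with a toy model. The sketch below assumes each problem has a fixed per-attempt success probability p, so the chance of solving it within k attempts is 1 - (1 - p)^k. All probabilities are invented for illustration and are not taken from the study: a hypothetical base model solves every problem occasionally, while a hypothetical RL-tuned model solves most problems reliably and the rest never.

```python
def expected_pass_at_k(success_probs, k):
    """Average over problems of the chance that at least one of k tries succeeds."""
    return sum(1.0 - (1.0 - p) ** k for p in success_probs) / len(success_probs)


# Hypothetical per-attempt success rates on 10 problems (made-up numbers).
base_probs = [0.1] * 10              # broad but unreliable
rl_probs = [0.8] * 6 + [0.0] * 4     # reliable on 6 problems, never solves 4


def first_crossover(base, tuned, max_k=256):
    """Return the smallest k at which the base model's pass@k exceeds the tuned model's."""
    for k in range(1, max_k + 1):
        if expected_pass_at_k(base, k) > expected_pass_at_k(tuned, k):
            return k
    return None  # no crossover within max_k attempts


if __name__ == "__main__":
    for k in (1, 4, 16, 256):
        print(k, round(expected_pass_at_k(base_probs, k), 3),
              round(expected_pass_at_k(rl_probs, k), 3))
    print("crossover at k =", first_crossover(base_probs, rl_probs))
```

In this toy setup the tuned model leads at pass@1 (0.48 versus 0.10), and the base model overtakes it at k=9. The tuned model can never exceed 0.6 because it has zero probability on four problems, which is the kind of narrowing the study describes.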

Behind the numbers: When scored by the base model, generations from RL-tuned models have low perplexity, which suggests that RL-tuned outputs remain within the distribution of responses the base model could already generate.

  • This finding reinforces the theory that RL primarily redistributes probability mass toward certain reasoning paths rather than creating new reasoning capabilities.
  • Sanity checks revealed that for most questions, base models can generate at least one correct reasoning trace, confirming they possess the capability being amplified by RL.
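
As a rough guide to the perplexity argument, the sketch below computes perplexity from per-token log-probabilities: the exponential of the negative mean log-likelihood. The log-probability values are made up, and which model assigns them is left to the caller; a low value means the scoring model finds the text unsurprising.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sequence given natural-log probabilities of its tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_log_likelihood = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_log_likelihood)


# Hypothetical scores: the same scoring model rates two different responses.
familiar_response = [math.log(0.5)] * 4    # each token gets probability 0.5
unfamiliar_response = [math.log(0.05)] * 4  # each token gets probability 0.05

if __name__ == "__main__":
    print(perplexity(familiar_response))
    print(perplexity(unfamiliar_response))
```

A response whose tokens each receive probability 0.5 has a perplexity of about 2, while one whose tokens each receive 0.05 has a perplexity of about 20, so the second would look far less like something the scoring model would produce itself.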

Counterpoints: The researchers acknowledge several limitations that might affect the generalizability of their findings.

  • RL may still enable emergent capabilities on long-horizon tasks, which this study did not explore.
  • Testing was limited to specific domains (math, code generation, and mathematical visual reasoning) and models up to 32B parameters.
  • Effects might differ for larger models or different types of reasoning tasks.

Why this matters: The study suggests that current RL approaches may not be optimal for developing more capable reasoning in AI systems, despite improving surface-level metrics.

  • These findings connect to previous work on “mode collapse” in RLHF, suggesting a broader pattern in how reinforcement learning affects language models.
  • In contrast, distillation from stronger teacher models was found to “expand the reasoning boundary,” indicating alternative approaches may better develop genuine reasoning capabilities.

Reading between the lines: The research challenges the narrative that reinforcement learning necessarily improves a model’s fundamental capabilities, suggesting AI developers may need to rethink evaluation metrics and development strategies.

  • By relying on pass@1 metrics alone, researchers might overestimate the improvements from RL while missing its potential narrowing effect on reasoning diversity.
  • The findings suggest a more nuanced approach to model evaluation that considers both reliability and the breadth of reasoning capabilities.
