DeepMind, Berkeley Show How to Make AI Models Better, Not Bigger

Optimizing LLM performance through inference-time compute: Researchers from DeepMind and UC Berkeley have explored innovative ways to enhance large language model (LLM) performance by strategically allocating compute resources during inference, potentially reducing the need for larger models or extensive pre-training.

  • The study investigates how to maximize LLM performance using a fixed amount of inference-time compute, comparing different methods and their effectiveness against larger pre-trained models.
  • This approach aims to enable the deployment of smaller LLMs while achieving comparable performance to larger, more computationally expensive models.

Key strategies for inference-time compute optimization: The researchers focused on two main approaches to improve LLM performance without increasing model size or pre-training.

  • The first strategy modifies the proposal distribution, i.e., the distribution of candidate responses the LLM generates for a prompt. This is achieved by fine-tuning the LLM to iteratively revise its own answers in complex reasoning-based settings.
  • The second strategy optimizes the verifier, the mechanism used to select the best answer from a set of generated responses. This is done by training a process-based reward model (PRM) that evaluates the correctness of each individual step in an answer.
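The two strategies can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `generate`, `revise`, and `verifier_score` are hypothetical stand-ins for fine-tuned models (in the paper, PaLM-2 variants and a trained process-based reward model).

```python
import random

random.seed(0)

# Hypothetical stand-ins for an LLM and a process-based reward model (PRM).
def generate(prompt: str, seed: int) -> str:
    """Toy 'LLM' returning one candidate answer."""
    return f"answer-{seed} to {prompt!r}"

def revise(prompt: str, previous: str) -> str:
    """Strategy 1: modify the proposal distribution by conditioning
    the model on its own earlier attempt."""
    return previous + " (revised)"

def verifier_score(prompt: str, answer: str) -> float:
    """Strategy 2: score candidates. A real PRM would score each
    reasoning step and aggregate; here a random score stands in."""
    return random.random()

def sequential_revisions(prompt: str, budget: int) -> str:
    """Spend the whole budget on one chain of iterative revisions."""
    answer = generate(prompt, seed=0)
    for _ in range(budget - 1):
        answer = revise(prompt, answer)
    return answer

def best_of_n(prompt: str, budget: int) -> str:
    """Spend the budget on parallel samples, then let the verifier pick."""
    candidates = [generate(prompt, seed=i) for i in range(budget)]
    return max(candidates, key=lambda a: verifier_score(prompt, a))

print(sequential_revisions("2+2?", budget=4))
print(best_of_n("2+2?", budget=4))
```

Both functions consume the same budget of model calls; the paper's question is which way of spending that budget wins for a given problem.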

Experimental findings: The researchers conducted experiments on the challenging MATH benchmark using PaLM-2 models to evaluate their approach.

  • For easier problems, allowing the model to iteratively refine its initial answer proved more effective than generating multiple samples in parallel.
  • For more difficult problems requiring exploration of different solution strategies, resampling multiple responses in parallel or deploying tree-search against a process-based reward model was more effective.
  • The efficacy of a particular test-time compute strategy was found to depend critically on both the nature of the specific problem and the base LLM used.

Performance improvements: The study demonstrated significant performance gains through strategic allocation of test-time compute.

  • By appropriately allocating test-time compute, the researchers were able to surpass the best-of-N baseline while using only about 25% of the computation.
  • This finding highlights the potential for developing adaptive “compute-optimal” strategies that select specific approaches based on the prompt to make the best use of additional computation.
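A compute-optimal strategy of this kind amounts to routing each prompt to the approach best suited to its difficulty. The sketch below is an assumption-laden illustration: `estimate_difficulty` is a crude length heuristic, whereas the paper bins questions by the base model's pass rate, and the two strategy functions are placeholders.

```python
def estimate_difficulty(prompt: str) -> float:
    # Hypothetical proxy in [0, 1]; the paper instead estimates difficulty
    # from the base model's pass rate on the question.
    return min(len(prompt) / 80.0, 1.0)

def sequential_revisions(prompt: str, budget: int) -> str:
    return f"revise x{budget}"            # placeholder for the revision chain

def parallel_search(prompt: str, budget: int) -> str:
    return f"sample x{budget} + verifier" # placeholder for best-of-N / tree search

def compute_optimal(prompt: str, budget: int) -> str:
    """Route each prompt to the strategy the study found most effective
    for its difficulty: revisions for easy questions, search for hard ones."""
    if estimate_difficulty(prompt) < 0.5:
        return sequential_revisions(prompt, budget)
    return parallel_search(prompt, budget)

print(compute_optimal("2+2?", budget=8))
print(compute_optimal("Prove that every planar graph is 4-colorable.", budget=8))
```

The key design choice is that the routing decision is made per prompt before any expensive sampling, so the full budget goes to the chosen strategy.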

Comparing test-time compute to pre-training: The researchers also investigated how test-time computation compares to additional pre-training in terms of performance improvements.

  • For easier and medium-difficulty questions, a smaller model with additional test-time compute performed comparably to a 14x larger model with more pre-training.
  • However, for the most challenging questions, additional pre-training compute proved to be more effective, indicating that current approaches to scaling test-time compute may not be a perfect substitute for scaling pre-training in all scenarios.
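The trade-off can be made concrete with back-of-the-envelope arithmetic, using the standard approximations of roughly 6·N·D FLOPs to train an N-parameter model on D tokens and roughly 2·N FLOPs per generated token at inference. This is not the paper's exact accounting, and the model sizes and token counts below are illustrative assumptions.

```python
# Standard FLOPs approximations (not the paper's exact accounting):
# training ~ 6 * N * D; inference ~ 2 * N per generated token.

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def infer_flops(n_params: float, tokens_generated: float) -> float:
    return 2 * n_params * tokens_generated

small, large = 1e9, 14e9   # assumed: a 1B model vs. a 14x larger one
d = 100e9                  # assumed pre-training token count

# Extra pre-training compute the large model requires over the small one:
extra_pretrain = train_flops(large, d) - train_flops(small, d)

# Test-time budget the small model spends per query, e.g. 64 samples
# of 512 tokens each:
per_query_budget = infer_flops(small, tokens_generated=64 * 512)

# Number of queries at which the small model's extra test-time compute
# equals the large model's extra pre-training compute:
break_even = extra_pretrain / per_query_budget
print(f"break-even after {break_even:,.0f} queries")
```

Under these assumptions the break-even point runs into the hundreds of millions of queries, which is why shifting FLOPs from pre-training to inference can pay off at moderate deployment volumes, provided test-time compute actually closes the quality gap.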

Future research directions: The study suggests several avenues for further exploration in optimizing LLM performance through inference-time compute.

  • Exploring more complex strategies that combine different revision and search techniques.
  • Developing more efficient methods for estimating question difficulty to better tailor the compute allocation strategy.
  • Investigating how to balance the allocation of compute resources between pre-training and inference for optimal performance across different types of tasks and difficulty levels.

Implications for AI development: This research points to a potential shift in how AI models are developed and deployed in the future.

  • The findings suggest that allocating more computational resources to inference rather than pre-training could be a more efficient approach in some scenarios.
  • This could lead to a future where fewer FLOPs (floating-point operations) are spent during pre-training and more are allocated to inference, potentially changing the landscape of AI model development and deployment.