DeepMind’s “PEER” Architecture Scales Language Models While Keeping Costs Low

DeepMind’s new Mixture-of-Experts (MoE) architecture, PEER, scales language models to millions of tiny experts, improving performance while keeping computational costs down.

Key innovation: Parameter Efficient Expert Retrieval (PEER), DeepMind’s novel MoE architecture, introduces a learned index that efficiently routes input data to a vast pool of millions of tiny experts, enabling significant scaling without slowing down inference:

  • PEER replaces the fixed router used in traditional MoE with a fast initial computation that produces a shortlist of candidate experts, then activates only the top-scoring ones.
  • Unlike previous MoE architectures built around large experts, PEER uses tiny single-neuron experts in the hidden layer, which allows knowledge to be shared among experts and improves parameter efficiency.
  • To compensate for the small expert size, PEER employs a multi-head retrieval approach, similar to the multi-head attention in transformer models (a minimal sketch of the overall mechanism follows this list).
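
To make the retrieve-then-activate pattern concrete, here is a minimal PyTorch sketch. It is not DeepMind’s code: the class and parameter names (TinyExpertPEERSketch, num_experts, top_k) are invented, and the flat key table stands in for the paper’s more efficient product-key retrieval, which keeps scoring sub-linear in the number of experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyExpertPEERSketch(nn.Module):
    """Toy PEER-style layer: retrieve top-k single-neuron experts per token.

    Illustrative only; a flat key table replaces the paper's product-key index.
    """

    def __init__(self, d_model: int, num_experts: int, num_heads: int = 8, top_k: int = 16):
        super().__init__()
        self.num_heads, self.top_k = num_heads, top_k
        # One query per retrieval head, analogous to multi-head attention.
        self.query_proj = nn.Linear(d_model, num_heads * d_model)
        # Learned index: one key per expert, scored against the query.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)
        # Single-neuron experts: one "down" and one "up" weight vector each.
        self.down = nn.Embedding(num_experts, d_model)
        self.up = nn.Embedding(num_experts, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape                                        # (batch, seq, d_model)
        q = self.query_proj(x).view(b, s, self.num_heads, d)
        scores = torch.einsum("bshd,ed->bshe", q, self.expert_keys)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)    # shortlist of candidates
        gate = top_scores.softmax(dim=-1)                        # router weights over the shortlist
        w_down = self.down(top_idx)                              # (b, s, heads, k, d)
        w_up = self.up(top_idx)
        hidden = F.gelu(torch.einsum("bsd,bshkd->bshk", x, w_down)) * gate
        out = torch.einsum("bshk,bshkd->bshd", hidden, w_up)
        return out.sum(dim=2)                                    # combine heads


# Tiny usage example (small sizes so it runs anywhere).
layer = TinyExpertPEERSketch(d_model=64, num_experts=4096, num_heads=4, top_k=8)
y = layer(torch.randn(2, 10, 64))                                # -> torch.Size([2, 10, 64])
```

Because each expert contributes only a single hidden neuron, the gated sum over the retrieved experts behaves like a sparsely assembled feedforward layer whose effective width is chosen per token.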

Overcoming MoE scaling limitations: Current MoE techniques are limited to a relatively small number of experts because their fixed routers must be readjusted whenever new experts are added:

  • Studies suggest that increasing MoE “granularity” (the number of experts) can improve performance, especially as model size and training data grow.
  • High-granularity MoE also enables models to learn new knowledge more efficiently, by adding new experts and applying proper regularization.
  • PEER addresses these challenges, allowing MoE to scale to millions of experts without the limitations of fixed routers.

Outperforming dense models and other MoEs: Experiments show PEER models achieve a better performance-compute tradeoff than both transformer models with dense feedforward layers and other MoE architectures:

  • PEER reaches lower perplexity scores than its counterparts at the same computational budget (a brief note on perplexity follows this list).
  • Increasing the number of experts in PEER leads to further perplexity reduction, challenging the belief that MoE efficiency peaks with a limited number of experts.
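
For context, perplexity is the exponential of the model’s average per-token cross-entropy loss, so a lower value means the model assigns higher probability to the held-out text. A minimal illustration with made-up logits and labels:

```python
import torch
import torch.nn.functional as F

vocab = 32000
logits = torch.randn(4, 10, vocab)           # stand-in for model outputs (batch, seq, vocab)
labels = torch.randint(0, vocab, (4, 10))    # stand-in next-token targets
loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1))
print(loss.exp())                            # perplexity; ~vocab size for random logits
```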

Broader implications for large language models: PEER’s approach can help further reduce the cost and complexity of training and serving very large language models:

  • PEER might be used in DeepMind’s Gemini 1.5 models, which reportedly use “a new Mixture-of-Experts (MoE) architecture.”
  • The architecture shows potential for dynamically adding new knowledge and features to LLMs by adapting PEER to select parameter-efficient fine-tuning adapters at runtime (a hypothetical sketch follows this list).
  • As the race to scale LLMs continues, PEER represents a significant step in improving the performance-compute tradeoff, positioning it as a competitive alternative to dense layers in foundation models.
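
That last possibility is speculative, but one way to picture it is to reuse the same retrieve-then-gate machinery over a pool of low-rank (LoRA-style) adapters instead of single-neuron experts. The sketch below is hypothetical; the class, shapes, and names are invented for illustration.

```python
import torch
import torch.nn as nn


class AdapterRetrieverSketch(nn.Module):
    """Hypothetical: PEER-style retrieval over a pool of LoRA-style adapters."""

    def __init__(self, d_model: int, num_adapters: int, rank: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.keys = nn.Parameter(torch.randn(num_adapters, d_model) * 0.02)   # learned index
        self.A = nn.Parameter(torch.randn(num_adapters, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_adapters, rank, d_model))       # zero-init like LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model), e.g. a pooled representation of the prompt.
        scores = x @ self.keys.t()                            # score every adapter
        top_scores, idx = scores.topk(self.top_k, dim=-1)     # shortlist, as in PEER
        gate = top_scores.softmax(dim=-1)                     # (batch, top_k)
        A, B = self.A[idx], self.B[idx]                       # gather the chosen adapters
        delta = torch.einsum("bd,bkdr,bkre->bke", x, A, B)    # each adapter's low-rank update
        return x + torch.einsum("bk,bke->be", gate, delta)    # gated residual correction
```

In this framing, adding a capability would mean training and registering new adapters rather than fine-tuning the base model, which is the kind of modular updating the article alludes to.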

Source: DeepMind’s PEER scales language models with millions of tiny experts
