×
Google’s new LLM architecture cuts costs with memory separation
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

Large language models (LLMs) are getting a significant upgrade through Google’s new Titans architecture, which reimagines how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
  • By segregating memory functions, Titans can process sequences up to 2 million tokens in length
  • Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information.

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component employs neural memory for storing important information
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base

Memory management innovation: Titans employs a sophisticated “surprise” mechanism to determine which information deserves long-term storage.

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.

Google’s new neural-net LLM architecture separates memory components to control exploding costs of capacity and compute

Recent News

AI’s energy demands set to triple, but economic gains expected to surpass costs

Economic gains from AI will reach 0.5% of global GDP annually through 2030, outweighing environmental costs despite data centers potentially consuming as much electricity as India.

AI-generated dolls spark backlash from traditional art community

Human artists rally against viral AI doll portrait trend that threatens custom figure makers and raises questions about artistic authenticity.

The impact of LLMs on problem-solving in software engineering

Developing deep expertise in a specific domain remains more valuable than general AI skills as technology continues to reshape technical professions.