×
Google’s new LLM architecture cuts costs with memory separation
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

Large language models (LLMs) are getting a significant upgrade through Google’s new Titans architecture, which reimagines how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
  • By segregating memory functions, Titans can process sequences up to 2 million tokens in length
  • Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information.

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component employs neural memory for storing important information
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base

Memory management innovation: Titans employs a sophisticated “surprise” mechanism to determine which information deserves long-term storage.

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.

Google’s new neural-net LLM architecture separates memory components to control exploding costs of capacity and compute

Recent News

AI pushes construction towards zero-incident workplaces

Advanced sensors and predictive algorithms are helping construction firms identify hazards before accidents occur, building on safety improvements that have already reduced fatalities by 90% since the 1940s.

Legal AI use skyrockets as firms prioritize workflow efficiency

Legal professionals rapidly adopt AI tools to eliminate administrative burdens and streamline workflows, with usage surging from 19% to 79% in one year.

Exponential decay offers insight into AI weaknesses

AI systems show consistent failure patterns over time, with success rates declining exponentially the longer they work on complex research tasks.