Google’s new LLM architecture cuts costs with memory separation

Large language models (LLMs) are getting a significant upgrade through Google’s new Titans architecture, which reimagines how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
  • By segregating memory functions, Titans can process sequences up to 2 million tokens in length
  • Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information; a minimal code sketch follows the list below.

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component employs neural memory for storing important information
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base
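
The article includes no code, but the three-part split can be illustrated. The following is a minimal, hypothetical PyTorch skeleton, not the official Titans implementation; all class names, parameter names, and shapes are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class TitansStyleBlock(nn.Module):
    """Hypothetical sketch of the three-part memory split; not the official Titans code."""

    def __init__(self, dim: int, num_heads: int = 8, num_persistent: int = 16):
        super().__init__()
        # "Core": short-term memory via standard attention over the current window.
        self.core_attention = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # "Long-term memory": a neural module whose weights act as storage
        # (updated at test time in the paper; a plain MLP stand-in here).
        self.long_term_memory = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        # "Persistent memory": input-independent tokens learned during training
        # and kept fixed afterwards, serving as a stable knowledge base.
        self.persistent_tokens = nn.Parameter(torch.randn(num_persistent, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.shape[0]
        # Retrieve from long-term memory and prepend the persistent tokens, so the
        # attention core sees context beyond the current window at fixed extra cost.
        retrieved = self.long_term_memory(x)
        persistent = self.persistent_tokens.unsqueeze(0).expand(batch, -1, -1)
        context = torch.cat([persistent, retrieved, x], dim=1)
        out, _ = self.core_attention(x, context, context)
        return out

# Usage: a batch of 4 sequences, 128 tokens each, model width 256.
block = TitansStyleBlock(dim=256)
print(block(torch.randn(4, 128, 256)).shape)  # torch.Size([4, 128, 256])
```

The design point this illustrates: the attention core only ever attends over a short window plus a fixed number of memory tokens, which is what keeps compute manageable as the overall context grows.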

Memory management innovation: Titans employs a sophisticated “surprise” mechanism to determine which information deserves long-term storage; a toy sketch follows the list below.

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context
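
The Titans paper frames “surprise” in terms of the gradient of an associative-memory loss: inputs the memory predicts poorly produce large gradients and are written more strongly. The toy sketch below captures that idea under stated assumptions; the gradient-norm score, the threshold rule, and all names are illustrative, not the paper's exact update:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy illustration of surprise-gated memory writes; names, the gradient-norm
# score, and the threshold rule are assumptions, not the paper's exact update.
memory = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))
optimizer = torch.optim.SGD(memory.parameters(), lr=0.1)

def surprise_and_maybe_store(key: torch.Tensor, value: torch.Tensor,
                             threshold: float = 0.5) -> float:
    """Score how surprising (key, value) is; write to memory only if surprising."""
    loss = F.mse_loss(memory(key), value)  # how badly memory predicts value
    loss.backward()
    grad_norm = torch.cat(
        [p.grad.flatten() for p in memory.parameters()]
    ).norm().item()                        # momentary surprise ~ gradient size
    if grad_norm > threshold:
        optimizer.step()                   # surprising: update the memory weights
    optimizer.zero_grad()
    return grad_norm

key, value = torch.randn(8, 64), torch.randn(8, 64)
print(surprise_and_maybe_store(key, value))
```

Gating writes on gradient magnitude means routine tokens leave the memory untouched, which is the selectivity the bullets above describe.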

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • An open-source release would let researchers and developers build on and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.

