WEKA’s new AI data platform cuts inference costs by up to 24% through NVIDIA partnership

WEKA’s newest AI data platform features target enterprises deploying advanced AI systems. The company’s expanded partnership with NVIDIA introduces capabilities designed to address the memory bottlenecks that have limited inference performance for large language models. By combining WEKA’s data platform software with NVIDIA’s accelerated computing, the companies aim to reduce costs and improve efficiency for organizations deploying increasingly complex AI reasoning and agent-based systems.

The big picture: WEKA has strengthened its NVIDIA partnership through integration with the NVIDIA AI Data Platform reference design and achieved new storage certifications for NVIDIA’s cloud and enterprise ecosystems.

  • The integration positions WEKA as a key infrastructure provider for companies deploying agentic AI and reasoning models that require optimized data pipelines.
  • These certifications signal WEKA’s readiness to support large-scale AI implementations through NVIDIA’s partner network and certified systems.

Key innovation: WEKA’s new Augmented Memory Grid capability combines its data platform software with NVIDIA accelerated computing to solve critical bottlenecks in AI inference workloads.

  • The technology accelerates AI inference by maximizing tokens processed per second and improving token efficiency for large language models.
  • The capability extends the memory available for large-model inference by “three orders of magnitude,” a significant step for deploying memory-intensive AI applications.

By the numbers: WEKA reports performance gains for the Augmented Memory Grid that could substantially reduce inference costs.

  • When processing 105,000 tokens, the system cut time to first token by a factor of 41 compared to standard implementations.
  • The technology can lower token throughput costs by up to 24% for inference systems, addressing a major economic challenge for enterprises deploying large language models.
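
To make the two figures concrete, the arithmetic can be sketched as follows. This is only an illustration: the 41x and 24% values come from the figures above, while the baseline latency and baseline cost are hypothetical placeholders, not numbers published by WEKA or NVIDIA.

```python
def ttft_after_speedup(baseline_seconds: float, speedup: float = 41.0) -> float:
    """Time to first token after a given speedup factor (41x per the article)."""
    return baseline_seconds / speedup


def cost_after_reduction(baseline_cost: float, reduction: float = 0.24) -> float:
    """Cost after a fractional reduction (up to 24% per the article)."""
    return baseline_cost * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical baselines chosen only for illustration.
    baseline_ttft_s = 20.5   # assumed seconds to first token on a long prompt
    baseline_cost = 100.0    # assumed cost units per fixed batch of tokens

    print(f"TTFT: {baseline_ttft_s:.1f}s -> {ttft_after_speedup(baseline_ttft_s):.2f}s")
    print(f"Cost: {baseline_cost:.2f} -> {cost_after_reduction(baseline_cost):.2f} (best case)")
```

Because the 24% figure is an upper bound ("up to"), the cost result is a best case rather than an expected value.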

What’s next: WEKA’s new capabilities and certifications will roll out through spring 2025.

  • The NCP reference architecture for NVIDIA Blackwell systems will become available later this month.
  • The WEKA Augmented Memory Grid, which promises the most significant performance gains, will be generally available in spring 2025.

Source: WEKA Expands NVIDIA Integrations and Certifications, Unveils Augmented Memory Grid at GTC 2025