Nvidia has unveiled the Rubin CPX GPU, a compute-focused graphics processor with 128GB of GDDR7 memory, designed specifically for enterprise AI inference workloads. The announcement positions Nvidia to address the growing demand for long-context AI applications in software development, research, and high-definition video generation, with shipments planned for late 2026.
What you should know: The Rubin CPX represents Nvidia’s first GPU to reach 128GB memory capacity, delivering up to 30 petaFLOPs of NVFP4 compute performance.
• The GPU integrates hardware attention acceleration that Nvidia claims is three times faster than the GB300 NVL72.
• Four NVENC and four NVDEC units are built in to accelerate video workflows.
• This is explicitly not a gaming GPU—it’s engineered purely for compute-intensive inference tasks.
The big picture: Nvidia is implementing a disaggregated inference strategy where different processors handle specific AI workload phases.
• Rubin CPX focuses on the compute-heavy context phase of AI processing.
• Other Rubin GPUs and Vera CPUs handle generation tasks.
• Nvidia’s Dynamo software manages low-latency cache transfers and routing across components behind the scenes.
In plain English: Think of this like a restaurant kitchen where different chefs specialize in different courses. Instead of one chef making the entire meal, Rubin CPX handles the heavy preparation work (understanding context), while other processors focus on the final presentation (generating responses). This specialization makes the entire process faster and more efficient.
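The two-phase split described above can be sketched in a few lines of Python. This is an illustrative toy, not Nvidia's actual software stack: the worker classes, method names, and the echo-style "generation" are all hypothetical, standing in for the context (prefill) phase Rubin CPX would handle and the generation (decode) phase other Rubin GPUs would handle.

```python
# Illustrative sketch of disaggregated inference (hypothetical names, not
# Nvidia's API): a context worker processes the full prompt once, then hands
# its key/value cache to a generation worker that produces output tokens.

from dataclasses import dataclass

@dataclass
class KVCache:
    """Stand-in for the attention key/value cache built during prefill."""
    tokens: list

class ContextWorker:
    """Compute-heavy phase: ingest the entire prompt (Rubin CPX's role)."""
    def prefill(self, prompt: str) -> KVCache:
        return KVCache(tokens=prompt.split())

class GenerationWorker:
    """Latency-sensitive phase: emit tokens one at a time (standard Rubin's role)."""
    def decode(self, cache: KVCache, max_tokens: int) -> str:
        # Toy "generation": echo the last words of the cached context.
        return " ".join(cache.tokens[-max_tokens:])

def route_request(prompt: str) -> str:
    cache = ContextWorker().prefill(prompt)      # phase 1: context processing
    return GenerationWorker().decode(cache, 3)   # phase 2: token generation

print(route_request("the quick brown fox jumps over the lazy dog"))
# → the lazy dog
```

In a real deployment the cache handoff between the two workers is the hard part; per the article, that is what Nvidia's Dynamo software manages.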
Massive deployment scale: The flagship Vera Rubin NVL144 CPX rack represents Nvidia’s largest deployment configuration.
• Each rack integrates 144 Rubin CPX GPUs, 144 standard Rubin GPUs, and 36 Vera CPUs.
• Combined performance delivers 8 exaFLOPs of NVFP4 compute, 100TB of high-speed memory, and 1.7PB/s of memory bandwidth.
• Connectivity comes through Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-9 SuperNICs.
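A quick back-of-the-envelope check shows how the CPX GPUs alone contribute to the rack totals above. Only the per-GPU CPX figures (30 petaFLOPs NVFP4, 128GB GDDR7) come from the article; Nvidia has not disclosed how the remaining compute and memory split across the standard Rubin GPUs, so this sketch computes only the CPX share.

```python
# CPX-only contribution to the Vera Rubin NVL144 CPX rack (figures from the
# article); the balance comes from the 144 standard Rubin GPUs and 36 Vera CPUs.

CPX_GPUS_PER_RACK = 144
PFLOPS_PER_CPX = 30       # NVFP4 compute per Rubin CPX GPU
GDDR7_GB_PER_CPX = 128    # memory per Rubin CPX GPU

cpx_exaflops = CPX_GPUS_PER_RACK * PFLOPS_PER_CPX / 1000     # PFLOPs -> EF
cpx_memory_tb = CPX_GPUS_PER_RACK * GDDR7_GB_PER_CPX / 1000  # GB -> decimal TB

print(f"CPX compute: {cpx_exaflops:.2f} EF of the 8 EF rack total")
print(f"CPX memory:  {cpx_memory_tb:.1f} TB of the 100 TB rack total")
# → CPX compute: 4.32 EF of the 8 EF rack total
# → CPX memory:  18.4 TB of the 100 TB rack total
```

So the CPX side supplies roughly half the rack's NVFP4 compute but under a fifth of its memory, consistent with its role as a compute-dense context processor.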
Timeline and roadmap: Rubin CPX and NVL144 CPX racks are scheduled to ship in late 2026 following recent tape-out at TSMC.
• Rubin Ultra is expected in 2027 with higher density modules.
• Feynman architecture is slated for 2028, featuring HBM4E memory and faster networking.
• The roadmap extends the Rubin architecture with progressive performance improvements.
Why this matters: By concentrating specialized hardware on context processing tasks, Nvidia aims to improve AI inference throughput while reducing deployment costs for high-value enterprise applications that require processing large amounts of contextual information.