Microsoft Azure has unveiled the world’s first production-scale NVIDIA GB300 NVL72 supercomputing cluster, specifically designed for OpenAI’s most demanding AI workloads. This milestone represents a significant leap in AI infrastructure capabilities, featuring over 4,600 NVIDIA Blackwell Ultra GPUs that will power next-generation reasoning models and agentic AI systems while reinforcing American leadership in artificial intelligence.
What you should know: The new NDv6 GB300 VM series delivers unprecedented computational power through liquid-cooled, rack-scale systems that integrate cutting-edge hardware into a unified supercomputer.
- Each rack contains 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs, providing 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM.
- The cluster connects over 4,600 GPUs using NVIDIA’s Quantum-X800 InfiniBand networking platform, creating a massive unified memory space essential for complex AI model training and inference.
- Microsoft applied “radical engineering to memory and networking” to achieve the massive scale of compute required for high-performance reasoning models.
Performance breakthrough: Recent benchmarks demonstrate the system’s exceptional capabilities, particularly for large language models and reasoning AI.
- In MLPerf Inference v5.1 benchmarks, the GB300 NVL72 systems delivered up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared to NVIDIA’s previous Hopper architecture.
- The platform achieved leadership performance across all newly introduced benchmarks, including the Llama 3.1 405B model.
- NVIDIA Blackwell Ultra supports new formats like NVFP4 for breakthrough training performance and compiler technologies like NVIDIA Dynamo for optimal inference performance.
The networking architecture: A sophisticated two-tiered approach enables both scale-up performance within racks and scale-out performance across the entire cluster.
- Within each GB300 NVL72 rack, fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct, all-to-all bandwidth between the 72 Blackwell Ultra GPUs.
- The NVIDIA Quantum-X800 InfiniBand platform delivers 800 Gb/s of bandwidth per GPU, featuring ConnectX-8 SuperNICs and advanced adaptive routing capabilities.
- NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v4 accelerates operations to significantly boost efficiency of large-scale training and inference.
Why this matters: This deployment marks a crucial step toward building the infrastructure needed for frontier AI development and positions the United States at the forefront of AI innovation.
- The achievement required reimagining every layer of Microsoft’s data center infrastructure, from custom liquid cooling and power distribution to reengineered software stacks.
- As Azure scales toward deploying hundreds of thousands of NVIDIA Blackwell Ultra GPUs, this milestone enables customers like OpenAI to access unprecedented computational resources for developing next-generation AI systems.
What they’re saying: Industry leaders emphasize the collaborative engineering effort required to achieve this scale of AI infrastructure.
- “Delivering the industry’s first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon — it reflects Microsoft Azure and NVIDIA’s shared commitment to optimize all parts of the modern AI data center,” said Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure.
- “Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed.”
Microsoft Azure Unveils World’s First NVIDIA GB300 NVL72 Supercomputing Cluster for OpenAI