Huawei unveiled CloudMatrix-Infer, a massive AI inference system featuring 384 Ascend 910C NPUs and 192 Kunpeng CPUs interconnected through a 2.8Tbps Unified Bus architecture. The system represents Huawei’s most ambitious challenge to Nvidia’s AI infrastructure dominance, delivering higher inference throughput than Nvidia’s H100 on comparable workloads while demonstrating the company’s complete vertical AI stack, from silicon to software.

What you should know: CloudMatrix-Infer delivers impressive performance metrics that outpace Nvidia’s H100 chips on similar workloads.

  • The system achieves 6,688 tokens/sec per NPU in prefill and 1,943 tokens/sec per NPU in decode, surpassing Nvidia H100 performance on comparable tasks.
  • Huawei compensates for individual chip limitations by networking five times more processors than typical Nvidia configurations, turning quantity into a performance advantage.
  • The architecture supports advanced features such as disaggregated prefill, decode, and caching stages, INT8 quantization, and microbatch pipelining optimized for large-scale inference (the quantization step is sketched below).
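
Of those features, INT8 quantization is the easiest to show concretely. The snippet below is a minimal, framework-agnostic sketch of symmetric per-channel weight quantization; it assumes nothing about Huawei's actual kernels, and every function name is illustrative.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-row INT8 quantization (illustrative, not Huawei's code).

    Each output channel gets its own scale so its largest weight maps to 127.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Apply the quantized layer and rescale back to float.

    Simulated in float here; real INT8 kernels also quantize activations
    and accumulate in INT32 before rescaling.
    """
    return (x @ q.astype(np.float32).T) * scale.T

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)  # a tiny weight matrix
x = rng.normal(size=(2, 8)).astype(np.float32)  # a tiny activation batch

q, scale = quantize_int8(w)
err = np.abs(x @ w.T - int8_matmul(x, q, scale)).max()
print(f"max quantization error: {err:.4f}")  # small relative to the activations
```

Halving the bytes per weight roughly doubles the tokens a bandwidth-bound decode step can serve, which is why quantization figures prominently in the throughput numbers above.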

The big picture: Huawei’s vertical integration strategy positions the company as a credible alternative to Nvidia’s CUDA ecosystem, particularly for massive AI model deployment.

  • The company’s CANN (Compute Architecture for Neural Networks) stack has matured to version 7.0, mirroring CUDA’s layered structure across driver, runtime, and libraries.
  • CloudMatrix demonstrates system-level optimization specifically designed for next-generation inference workloads, especially models exceeding 700B parameters.
  • The hybrid design pairs Kunpeng CPUs with Ascend NPUs to handle control-plane tasks like distributed resource scheduling and fault recovery, preventing non-AI workloads from bottlenecking performance.

Technical advantages: The Unified Bus architecture represents a fundamental departure from Nvidia’s NVLink approach, implementing an all-to-all interconnect topology.

  • Huawei uses 6,912 optical modules to create a “flat” network connecting all 384 NPUs and 192 CPUs across 16 racks, eliminating hierarchical communication hops.
  • CANN supports massive-scale expert parallelism (EP320), allowing one expert per NPU die—a capability that even CUDA ecosystems struggle to achieve.
  • The architecture is purpose-built for massive Mixture of Experts (MoE) models with bandwidth-first design principles.
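
To make the EP320 idea concrete, here is a toy routing sketch in the same spirit: each token picks its top-k experts, and tokens are bucketed by destination so that each device, owning one expert, receives only its share. All sizes and names below are illustrative stand-ins for the real 320-way, Unified-Bus-backed exchange.

```python
import numpy as np

NUM_EXPERTS = 8  # stand-in for 320 experts, one per NPU die, in EP320
TOP_K = 2        # each token is routed to its two highest-scoring experts

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))           # 16 tokens, hidden size 32
gate_w = rng.normal(size=(32, NUM_EXPERTS))  # router (gating) weights

# 1. Gate: score every token against every expert, keep the top-k ids.
logits = tokens @ gate_w
topk_ids = np.argsort(logits, axis=1)[:, -TOP_K:]

# 2. Dispatch: bucket token ids by destination expert. With one expert per
#    die, each bucket is the payload of one all-to-all message to one device.
buckets = {e: [] for e in range(NUM_EXPERTS)}
for tok_id, experts in enumerate(topk_ids):
    for e in experts:
        buckets[int(e)].append(tok_id)

# 3. Each device runs its expert on just its bucket; a second all-to-all
#    (not shown) returns the outputs for weighted combination.
for e, ids in buckets.items():
    print(f"expert {e} (die {e}) processes tokens {ids}")
```

The flat topology matters here because step 2 is a dense all-to-all: every device potentially sends to every other device at every MoE layer, so hierarchical hops would multiply latency.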

In plain English: Think of traditional AI chip networks like a corporate hierarchy where messages must travel up and down through managers. Huawei’s Unified Bus creates a flat organization where every chip can talk directly to every other chip, eliminating the bottlenecks. This is particularly useful for massive AI models that split their “expertise” across many different specialized components—like having a team of specialists who can all communicate instantly rather than passing notes through a chain of command.
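
A hop count makes the analogy measurable. The toy topologies below are assumptions for illustration only, not the actual NVLink or Unified Bus designs:

```python
# Toy hop counts: two-tier hierarchy vs. flat all-to-all (illustrative only).

def hierarchical_hops(src_rack: int, dst_rack: int) -> int:
    """Two-tier tree: same-rack traffic crosses one switch (2 hops);
    cross-rack traffic climbs to a spine switch and back down (4 hops)."""
    return 2 if src_rack == dst_rack else 4

def flat_hops() -> int:
    """Flat all-to-all fabric: every chip reaches every other chip directly."""
    return 1

print(hierarchical_hops(0, 5), "hops up-and-down vs.", flat_hops(), "hop direct")
```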

Remaining challenges: Despite impressive performance gains, Huawei faces significant hurdles in competing with Nvidia’s established ecosystem.

  • CloudMatrix draws roughly 3.9× the power of Nvidia’s GB200, creating viability concerns in regions with expensive energy or strict carbon targets.
  • CUDA maintains advantages in ecosystem maturity, documentation quality, third-party library support, and global developer community size.
  • Code migration from CUDA to CANN still requires significant effort, despite existing PyTorch and TensorFlow adapters (see the device-swap sketch after this list).
  • Community trust and familiarity continue to favor Nvidia, particularly outside Asia, with some organizations legally restricted from using Huawei technology.
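
For the simplest models, the migration in question can be close to a device-string swap. The sketch below follows the pattern commonly documented for Ascend's torch_npu adapter; the import name and device string are assumptions that may vary across versions:

```python
import torch
import torch_npu  # Ascend's PyTorch adapter; assumed installed, registers the "npu" device

# CUDA original:           device = torch.device("cuda:0")
# CANN/Ascend counterpart (assumed, per torch_npu's documented pattern):
device = torch.device("npu:0")

model = torch.nn.Linear(1024, 1024).to(device)  # weights land in NPU memory
x = torch.randn(8, 1024, device=device)
y = model(x)  # forward pass executes through CANN kernels
print(y.shape, y.device)
```

The friction the bullet describes begins past this point: hand-written CUDA kernels, Triton code, and CUDA-only library calls have no drop-in CANN equivalent and must be rewritten or replaced.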

Why this matters: The development signals intensifying competition in AI infrastructure as geopolitical tensions drive demand for alternatives to Nvidia’s dominance.

  • Chinese organizations increasingly view Huawei as the de facto choice for AI infrastructure, while Western markets remain largely committed to Nvidia.
  • The system’s optimization for massive MoE models positions Huawei to compete in the next generation of AI applications requiring unprecedented computational scale.
  • For regions where CloudMatrix is available, the architecture presents a viable option for deploying 700B+ parameter models, potentially reshaping AI infrastructure decision-making.

What the experts say: Forrester analysts emphasize that winning the AI race requires more than just faster chips.

  • “Huawei isn’t just catching up with its CANN stack and the new CloudMatrix architecture, it’s redefining how AI infrastructure works,” the analysts noted.
  • “As someone who’s built applications (in my past life), I’d rely on CUDA’s mature, frictionless ecosystem,” they acknowledged, highlighting the practical challenges facing Huawei’s adoption.
  • The analysts concluded that diversifying AI infrastructure “is no longer optional — it’s a strategic imperative” given ongoing geopolitical uncertainty.
