AI chip startup Cerebras outperforms NVIDIA’s Blackwell in Llama 4 test

Cerebras has set a new world record for AI inference speed with Meta’s flagship large language model. By delivering more than 2,500 tokens per second on the 400B parameter Llama 4 Maverick model, Cerebras has demonstrated that specialized AI hardware can significantly outpace even the most advanced GPU solutions, reshaping performance expectations for enterprise AI deployments.

The big picture: Cerebras has set a world record for LLM inference speed, achieving over 2,500 tokens per second with Meta’s 400B parameter Llama 4 Maverick model.

  • Independent benchmark firm Artificial Analysis measured Cerebras at 2,522 tokens per second per user, more than doubling the 1,038 tokens per second recently announced by NVIDIA’s flagship Blackwell GPUs.
  • This performance milestone makes Cerebras the first and only inference solution to break the 2,500 TPS barrier with the largest and most powerful model in the Llama 4 family.

Why this matters: Inference speed is critical for advanced AI applications, particularly for agents, code generation, and complex reasoning tasks where responsiveness determines usability.

By the numbers: Artificial Analysis tested multiple vendors against the same model, revealing a significant performance gap:

  • Cerebras: 2,522 tokens per second
  • NVIDIA Blackwell: 1,038 tokens per second
  • SambaNova: 794 tokens per second
  • Groq: 549 tokens per second
  • Amazon: 290 tokens per second
  • Google: 125 tokens per second
  • Microsoft Azure: 54 tokens per second
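
Measuring it yourself: The figures above describe single-user output throughput, i.e. how many tokens a model streams back per second to one requester. As a rough illustration only (not Artificial Analysis’s methodology), the Python sketch below times a streaming request against a hypothetical OpenAI-compatible endpoint and approximates one token per streamed chunk; the base URL, API key, model name, and that approximation are all assumptions for the example.

```python
# Rough sketch: measure single-user output tokens/second from a streaming
# chat completion. Endpoint URL, API key, and model name are placeholders,
# and "one streamed chunk ~= one token" is an approximation.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical inference endpoint
    api_key="YOUR_API_KEY",
)


def measure_tps(prompt: str, model: str = "llama-4-maverick") -> float:
    """Stream one completion and return approximate output tokens per second."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    first_token_at = None
    chunks = 0
    for chunk in stream:
        # Some servers send a final chunk with no choices; skip empty deltas.
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1  # treat each content chunk as roughly one token

    if first_token_at is None or chunks == 0:
        return 0.0
    elapsed = time.perf_counter() - first_token_at
    return chunks / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    tps = measure_tps("Explain why inference speed matters for AI agents.")
    print(f"approx. {tps:.0f} output tokens/second")
```

Time to first token is excluded from the throughput figure, since per-user speed benchmarks typically report the steady-state generation rate; a production measurement would use the provider’s actual token counts rather than a per-chunk approximation.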

What they’re saying: “Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis.

Competitive advantage: Cerebras claims its solution is optimized specifically for Llama 4 and is available immediately without requiring special software optimizations.

Chart: Cerebras beats NVIDIA Blackwell on Llama 4 Maverick inference
