More setbacks for NVIDIA as Blackwell chips overheat in servers

Increasing adoption of artificial intelligence is creating surging demand for high-performance computing chips, leading to technical challenges as manufacturers push the boundaries of what’s possible.

Critical Development: Nvidia’s next-generation Blackwell GPUs are experiencing overheating issues in server configurations, potentially causing further delays to their planned release.

  • The server racks, designed to connect up to 72 GPUs simultaneously, are creating thermal management challenges that require ongoing redesign efforts
  • This setback could impact the scheduled openings of new data centers for major tech companies including Google, Microsoft, and Meta
  • A previous design flaw had already pushed back the launch from its initial Q2 2024 target

Technical Context: GPU performance and heat generation are intrinsically linked, creating unique challenges for high-density computing environments.

  • GPUs consume substantial energy during operation, with more powerful chips typically generating more heat
  • The cryptocurrency mining industry has faced similar challenges, sometimes employing immersion cooling techniques where hardware is submerged in liquid
  • Nvidia claims the Blackwell chips will be up to 30 times faster than the previous generation on some AI workloads, which suggests significantly higher power requirements

Industry Impact: The delays could have cascading effects across the AI industry and its infrastructure.

  • Tech giants are already struggling to secure adequate power supplies for their AI data centers
  • Companies like Meta, Microsoft, and Google have begun exploring nuclear power options to meet growing energy demands
  • Nvidia’s stock has surged over 180% in the past year despite these challenges, while competitor AMD has recently initiated layoffs

Nvidia’s Response: The company maintains that the ongoing engineering changes are part of normal development processes.

  • A company spokesperson told Reuters that Nvidia is working closely with cloud service providers as part of its engineering process
  • The statement suggests Nvidia is actively working on new server designs to address the thermal management issues
  • The company has not provided updated timeline estimates for the Blackwell GPU release

Broader Energy Implications: The situation highlights growing concerns about AI’s expanding energy footprint and infrastructure requirements.

  • Experts predict possible power shortages for AI data centers as soon as next year
  • The rate of data center construction is outpacing the addition of new power sources to the grid
  • Traditional power purchase agreements may not adequately address the fundamental energy challenges facing the AI industry
Nvidia's Delayed Blackwell AI Chips Overheating in Servers
