×
Meta unveils open-source AI hardware strategy
Written by
Published on
Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage
Join Now

The evolution of Meta’s AI infrastructure: Meta’s journey in scaling its AI capabilities has led to significant advancements in hardware design and infrastructure optimization to support increasingly complex AI models and workloads.

  • Meta has been integrating AI into its core products for years, including features like Feed and its advertising system.
  • The company’s latest AI model, Llama 3.1 405B, boasts 405 billion parameters and required training across more than 16,000 NVIDIA H100 GPUs.
  • Meta’s AI training clusters have rapidly scaled from 128 GPUs to two 24,000-GPU clusters in just over a year, with expectations for continued growth.

Networking challenges and solutions: The scale of Meta’s AI operations necessitates advanced networking solutions to ensure optimal performance and scalability.

  • AI clusters require tightly integrated high-performance computing systems and isolated high-bandwidth compute networks.
  • Meta anticipates needing injection bandwidth of around one terabyte per second per accelerator in the coming years, representing a tenfold increase from current capabilities.
  • To meet these demands, the company is developing a high-performance, multi-tier, non-blocking network fabric with modern congestion control mechanisms.

Open hardware initiatives: Meta is championing open hardware solutions to accelerate AI innovation and foster collaboration within the industry.

  • The company announced Catalina, a new high-powered rack designed for AI workloads, based on the NVIDIA Blackwell platform and capable of supporting up to 140kW of power.
  • Meta has expanded its Grand Teton AI platform to support AMD Instinct MI300X accelerators, offering greater compute capacity and memory for large-scale AI inference workloads.
  • The new Disaggregated Scheduled Fabric (DSF) for next-generation AI clusters aims to overcome limitations in scale, component supply options, and power density.

Collaboration with industry partners: Meta’s partnership with Microsoft and other tech giants is driving open innovation in AI infrastructure.

  • Meta and Microsoft have collaborated on various OCP initiatives, including the Switch Abstraction Interface (SAI) and Open Accelerator Module (OAM) standard.
  • The companies are currently working on Mount Diablo, a new disaggregated power rack featuring a scalable 400 VDC unit for enhanced efficiency and scalability.

The importance of open source in AI development: Meta emphasizes the critical role of open source in advancing AI technology and ensuring its benefits are widely accessible.

  • Open source software frameworks are essential for driving model innovation, ensuring portability, and promoting transparency in AI development.
  • Standardized models help leverage collective expertise, make AI more accessible, and work towards minimizing biases in AI systems.
  • Open AI hardware systems are crucial for delivering high-performance, cost-effective, and adaptable infrastructure necessary for AI advancement.

Looking ahead: Meta’s vision for the future of AI infrastructure emphasizes collaboration and open innovation to unlock the full potential of AI technology.

  • The company encourages engagement with the OCP community to address AI’s infrastructure needs collectively.
  • By fostering an open ecosystem for AI hardware and software development, Meta aims to make the benefits and opportunities of AI accessible to people worldwide.
Meta’s open AI hardware vision

Recent News

How AI will change scientific research forever

AI is transforming scientific research from broad theories into data-driven insights that can predict individual outcomes with greater precision.

Nielsen: Consumers find AI-generated advertising annoying, boring and confusing

As AI-produced ads struggle to connect with viewers, studies reveal consumers find them less engaging and authentic than traditional commercials.

Microsoft buys 2X more Nvidia AI chips than its competitors

Major tech companies pour billions into AI hardware as Microsoft leads with nearly half a million Nvidia chip purchases, signaling an unprecedented arms race for computing power.