Meta unveils open-source AI hardware strategy

The evolution of Meta’s AI infrastructure: Meta’s journey in scaling its AI capabilities has led to significant advancements in hardware design and infrastructure optimization to support increasingly complex AI models and workloads.

  • Meta has been integrating AI into its core products for years, including features like Feed and its advertising system.
  • The company’s latest AI model, Llama 3.1 405B, has 405 billion parameters and was trained on more than 16,000 NVIDIA H100 GPUs.
  • Meta’s AI training clusters have rapidly scaled from 128 GPUs to two 24,000-GPU clusters in just over a year, with expectations for continued growth.

Networking challenges and solutions: The scale of Meta’s AI operations necessitates advanced networking solutions to ensure optimal performance and scalability.

  • AI clusters require tightly integrated high-performance computing systems and isolated high-bandwidth compute networks.
  • Meta anticipates needing injection bandwidth of around one terabyte per second per accelerator in the coming years, representing a tenfold increase from current capabilities.
  • To meet these demands, the company is developing a high-performance, multi-tier, non-blocking network fabric with modern congestion control mechanisms.
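
To put the bandwidth figures above in perspective, here is a rough back-of-the-envelope sketch. It takes the roughly 1 TB/s per-accelerator target and the tenfold growth from the text; the cluster size of 24,000 accelerators is an assumption borrowed from the cluster sizes mentioned earlier, and the function names are purely illustrative. Decimal units (1 TB = 1,000 GB) are assumed.

```python
# Figures taken from the article text.
TARGET_INJECTION_TBPS = 1.0   # ~1 TB/s per accelerator (anticipated)
GROWTH_FACTOR = 10            # "tenfold increase" over current capabilities

# Assumption: reuse the 24,000-GPU cluster size mentioned earlier in the article.
ACCELERATORS = 24_000


def implied_current_gbps(target_tbps: float, growth: float) -> float:
    """Per-accelerator bandwidth today, in GB/s, implied by the growth factor."""
    return target_tbps * 1000 / growth


def aggregate_injection_tbps(per_accel_tbps: float, accelerators: int) -> float:
    """Total injection bandwidth the fabric must absorb, in TB/s."""
    return per_accel_tbps * accelerators


if __name__ == "__main__":
    print(implied_current_gbps(TARGET_INJECTION_TBPS, GROWTH_FACTOR), "GB/s today")
    print(aggregate_injection_tbps(TARGET_INJECTION_TBPS, ACCELERATORS), "TB/s aggregate")
```

Under these assumptions, a single cluster would inject on the order of tens of petabytes per second into the network, which helps explain the emphasis on a non-blocking, multi-tier fabric.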

Open hardware initiatives: Meta is championing open hardware solutions to accelerate AI innovation and foster collaboration within the industry.

  • The company announced Catalina, a new high-powered rack designed for AI workloads, based on the NVIDIA Blackwell platform and capable of supporting up to 140 kW.
  • Meta has expanded its Grand Teton AI platform to support AMD Instinct MI300X accelerators, offering greater compute capacity and memory for large-scale AI inference workloads.
  • The new Disaggregated Scheduled Fabric (DSF) for next-generation AI clusters aims to overcome limitations in scale, component supply options, and power density.

Collaboration with industry partners: Meta’s partnership with Microsoft and other tech giants is driving open innovation in AI infrastructure.

  • Meta and Microsoft have collaborated on various OCP initiatives, including the Switch Abstraction Interface (SAI) and Open Accelerator Module (OAM) standard.
  • The companies are currently working on Mount Diablo, a new disaggregated power rack featuring a scalable 400 VDC unit intended to improve efficiency.
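
One way to see why a higher distribution voltage can help efficiency is the basic relation I = P / V: for the same power, a higher voltage means less current, and resistive losses grow with the square of the current. The sketch below is illustrative only. It pairs the 140 kW figure quoted for Catalina with Mount Diablo's 400 VDC as a hypothetical load, and the 48 V comparison point is an assumption chosen for contrast, not a figure from the article.

```python
def current_amps(power_watts: float, voltage_volts: float) -> float:
    """Current drawn at a given DC voltage for a given power (I = P / V)."""
    return power_watts / voltage_volts


def relative_resistive_loss(v_low: float, v_high: float) -> float:
    """Ratio of I^2 * R losses at v_low versus v_high for equal power and resistance."""
    return (v_high / v_low) ** 2


RACK_POWER_W = 140_000   # hypothetical load, using the 140 kW quoted for Catalina
HIGH_VOLTAGE = 400.0     # Mount Diablo's 400 VDC
LOW_VOLTAGE = 48.0       # assumption: a lower-voltage busbar for comparison

if __name__ == "__main__":
    print(current_amps(RACK_POWER_W, HIGH_VOLTAGE), "A at 400 V")
    print(round(current_amps(RACK_POWER_W, LOW_VOLTAGE), 1), "A at 48 V")
    print(round(relative_resistive_loss(LOW_VOLTAGE, HIGH_VOLTAGE), 1), "x loss ratio")
```

This ignores conversion stages and conductor sizing, so it should be read as a rough intuition rather than a description of the actual rack design.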

The importance of open source in AI development: Meta emphasizes the critical role of open source in advancing AI technology and ensuring its benefits are widely accessible.

  • Open source software frameworks are essential for driving model innovation, ensuring portability, and promoting transparency in AI development.
  • Standardized models help leverage collective expertise, make AI more accessible, and work towards minimizing biases in AI systems.
  • Open AI hardware systems are crucial for delivering high-performance, cost-effective, and adaptable infrastructure necessary for AI advancement.

Looking ahead: Meta’s vision for the future of AI infrastructure emphasizes collaboration and open innovation to unlock the full potential of AI technology.

  • The company encourages engagement with the OCP community to address AI’s infrastructure needs collectively.
  • By fostering an open ecosystem for AI hardware and software development, Meta aims to make the benefits and opportunities of AI accessible to people worldwide.
Meta’s open AI hardware vision
