Meta unveils open-source AI hardware strategy

The evolution of Meta’s AI infrastructure: Meta’s journey in scaling its AI capabilities has led to significant advancements in hardware design and infrastructure optimization to support increasingly complex AI models and workloads.

  • Meta has been integrating AI into its core products for years, including features like Feed and its advertising system.
  • The company’s Llama 3.1 405B model has 405 billion parameters and was trained across more than 16,000 NVIDIA H100 GPUs.
  • Meta’s AI training capacity has scaled rapidly, from jobs running on 128 GPUs to two 24,000-GPU clusters in just over a year, and the company expects that growth to continue.

Networking challenges and solutions: The scale of Meta’s AI operations necessitates advanced networking solutions to ensure optimal performance and scalability.

  • AI clusters require tightly integrated high-performance computing systems and isolated high-bandwidth compute networks.
  • Meta anticipates needing injection bandwidth on the order of one terabyte per second per accelerator in the coming years, more than a tenfold increase over today’s networks.
  • To meet these demands, the company is developing a high-performance, multi-tier, non-blocking network fabric with modern congestion control mechanisms.
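
To put these bandwidth targets in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only figures from this article (about 1 TB/s per accelerator, growth of roughly a factor of ten, and a 24,000-GPU cluster). The function names, the decimal units, and the pairing of the 1 TB/s target with a 24,000-GPU cluster are illustrative assumptions, not anything Meta has published.

```python
# Back-of-the-envelope arithmetic only; helper names are hypothetical.
TB = 10**12  # bytes per decimal terabyte (assumed unit convention)
GB = 10**9   # bytes per decimal gigabyte
PB = 10**15  # bytes per decimal petabyte


def aggregate_injection_bandwidth(accelerators: int, per_accelerator: int) -> int:
    """Total bytes/s that all accelerators together can inject into the fabric."""
    return accelerators * per_accelerator


def implied_current_bandwidth(target: int, growth_factor: float) -> float:
    """Per-accelerator bytes/s today, if the target is growth_factor times larger."""
    return target / growth_factor


if __name__ == "__main__":
    target = 1 * TB    # ~1 TB/s per accelerator (from the article)
    cluster = 24_000   # size of one of the two clusters (from the article)
    growth = 10        # assumed baseline growth factor

    total = aggregate_injection_bandwidth(cluster, target)
    today = implied_current_bandwidth(target, growth)

    print(f"Aggregate injection bandwidth: {total / PB:.0f} PB/s")
    print(f"Implied per-accelerator bandwidth today: {today / GB:.0f} GB/s")
```

Under these assumptions, a single 24,000-accelerator cluster would inject about 24 PB/s into the fabric, and a growth factor of ten would put today’s per-accelerator figure at about 100 GB/s. The sketch says nothing about topology or congestion; it only shows the scale a non-blocking fabric would have to carry.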

Open hardware initiatives: Meta is championing open hardware solutions to accelerate AI innovation and foster collaboration within the industry.

  • The company announced Catalina, a new high-powered rack designed for AI workloads, based on the NVIDIA Blackwell platform and capable of supporting up to 140kW of power.
  • Meta has expanded its Grand Teton AI platform to support AMD Instinct MI300X accelerators, offering greater compute capacity and memory for large-scale AI inference workloads.
  • The new Disaggregated Scheduled Fabric (DSF) for next-generation AI clusters aims to overcome limitations in scale, component supply options, and power density.

Collaboration with industry partners: Meta’s partnership with Microsoft and other tech giants is driving open innovation in AI infrastructure.

  • Meta and Microsoft have collaborated on several Open Compute Project (OCP) initiatives, including the Switch Abstraction Interface (SAI) and the Open Accelerator Module (OAM) standard.
  • The companies are currently working on Mount Diablo, a new disaggregated power rack featuring a scalable 400 VDC unit intended to improve efficiency.

The importance of open source in AI development: Meta emphasizes the critical role of open source in advancing AI technology and ensuring its benefits are widely accessible.

  • Open source software frameworks are essential for driving model innovation, ensuring portability, and promoting transparency in AI development.
  • Standardized models help leverage collective expertise, make AI more accessible, and work towards minimizing biases in AI systems.
  • Open AI hardware systems are crucial for delivering high-performance, cost-effective, and adaptable infrastructure necessary for AI advancement.

Looking ahead: Meta’s vision for the future of AI infrastructure emphasizes collaboration and open innovation to unlock the full potential of AI technology.

  • The company encourages engagement with the OCP community to address AI’s infrastructure needs collectively.
  • By fostering an open ecosystem for AI hardware and software development, Meta aims to make the benefits and opportunities of AI accessible to people worldwide.
Meta’s open AI hardware vision
