AI Enthusiast Builds 192GB VRAM Server for Llama 3.1 in Basement

Cutting-edge AI infrastructure: A tech enthusiast has built a powerful LLM server in their basement, featuring eight RTX 3090 GPUs (24GB each) for a total of 192GB of VRAM, designed to run Meta’s Llama 3.1 405B model.

  • The project was motivated by the builder’s previous 48GB setup, which had become insufficient for their LLM experiments.
  • The custom-built server represents a significant investment in high-end hardware, reflecting the growing demand for powerful computing resources in AI research and development.

Key components and specifications: The LLM server boasts impressive hardware specifications, carefully selected to maximize performance and capability for running large language models.

  • The system is built around an Asrock Rack ROMED8-2T motherboard, offering seven PCIe 4.0 x16 slots and 128 PCIe 4.0 lanes.
  • An AMD EPYC 7713 “Milan” CPU (64 cores/128 threads) provides the necessary processing power.
  • The server includes 512GB of DDR4-3200 3DS RDIMM memory for handling large datasets and model parameters.
  • Power is supplied by three 1600-watt power supply units to meet the high energy demands of the system.
  • The centerpiece is the set of eight RTX 3090 GPUs, paired via four NVLink bridges that give each linked pair a data transfer rate of roughly 112GB/s (see the sketch after this list).
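To put those numbers in context, here is a back-of-the-envelope sketch (Python, illustrative figures only) of how much VRAM the Llama 3.1 405B weights would need at common precisions versus the 192GB this build provides. The bytes-per-parameter values are nominal and ignore KV-cache and activation overhead.

```python
# Back-of-the-envelope VRAM math: Llama 3.1 405B vs. this 192GB build.
# Rough rule of thumb: 1B parameters at N bytes each ~= N GB of weights.
# Real serving also needs KV-cache and activation memory on top of this.

TOTAL_VRAM_GB = 8 * 24  # eight RTX 3090s at 24GB each -> 192GB
PARAMS_B = 405          # Llama 3.1 405B parameter count, in billions

bytes_per_param = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    weights_gb = PARAMS_B * nbytes
    verdict = "fits" if weights_gb <= TOTAL_VRAM_GB else "does not fit"
    print(f"{precision:>9}: ~{weights_gb:.0f}GB of weights -> {verdict} in {TOTAL_VRAM_GB}GB")
```

Notably, even at 4-bit quantization the 405B weights alone come to roughly 203GB, just over the 192GB of VRAM, which is why sub-4-bit quantization or partial CPU offload typically enters the picture for a model this size.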

Challenges and learning experiences: The project involved overcoming various technical hurdles and gaining insights into advanced computing concepts.

  • The builder faced physical challenges such as drilling holes in metal frames and adding high-amperage electrical circuits to support the system’s power requirements.
  • They learned about the limitations of PCIe risers and the importance of using specialized components such as SAS device adapters, redrivers, and retimers for stable PCIe connections.
  • The project provided hands-on experience with concepts such as NVLink speeds, PCIe bandwidth, and VRAM transfer rates; a rough comparison of those link rates follows below.
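For a sense of why those interconnect details matter, the sketch below compares nominal link rates and the time each would take to move one card’s worth of tensors. These are headline numbers, not measured throughput, and the degraded-riser entry is an assumed failure mode for illustration, not one the builder reported.

```python
# Nominal interconnect rates relevant to this build (headline numbers,
# not measured throughput). The x8 entry models a riser that trains at
# half width -- an assumed example, not one reported by the builder.

links_gb_s = {
    "PCIe 4.0 x16": 32,                # ~32GB/s per direction, nominal
    "PCIe 4.0 x8 (marginal riser)": 16,
    "NVLink bridge (3090 pair)": 112,  # the ~112GB/s pair rate cited above
}

shard_gb = 24  # moving one full 24GB card's worth of tensors

for name, bw in links_gb_s.items():
    print(f"{name:<30} {bw:>3}GB/s -> {shard_gb / bw:.2f}s to move {shard_gb}GB")
```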

Future content and knowledge sharing: The builder plans to document their experience and insights in a series of blog posts, covering various aspects of the project.

  • Upcoming posts will detail the assembly process, hardware selection rationale, and potential pitfalls to avoid.
  • The series will explore different inference engines supporting Tensor Parallelism, including TensorRT-LLM, vLLM, and Aphrodite Engine (a minimal vLLM sketch follows this list).
  • Guides on training and fine-tuning custom LLMs will be shared, making the knowledge accessible to other AI enthusiasts and researchers.
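As a preview of what tensor parallelism looks like in practice, here is a minimal sketch using vLLM’s Python API. The model ID is a hypothetical stand-in (a 70B checkpoint whose 16-bit weights actually fit within 192GB), and the sampling settings are illustrative, not the builder’s documented configuration.

```python
# Minimal sketch: tensor-parallel inference with vLLM across eight GPUs.
# Model choice and settings are illustrative assumptions, not the
# builder's documented setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # hypothetical stand-in
    tensor_parallel_size=8,       # shard each layer across all eight 3090s
    gpu_memory_utilization=0.90,  # leave headroom for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why build an LLM server in your basement?"], params)
print(outputs[0].outputs[0].text)
```

With tensor parallelism, each transformer layer’s weight matrices are split across all eight cards, which is precisely where the NVLink and PCIe bandwidth discussed earlier becomes the bottleneck.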

Reflections on technological progress: The project has prompted the builder to contemplate the rapid advancement of technology over the past two decades.

  • They draw a comparison between their excitement over a 60GB HDD in 2004 and the current system’s 192GB of VRAM, highlighting the exponential growth in computing capabilities.
  • This reflection underscores the motivation behind the project: contributing to the development of future technologies and inspiring others in the field.

Looking ahead: The basement LLM server project serves as a testament to the democratization of AI research and the potential for individual contributions to the field.

  • By sharing their experience and insights, the builder aims to lower the barriers to entry for others interested in experimenting with large language models.
  • The project raises questions about the future of AI infrastructure and the potential for even more powerful systems in the coming decades.
Source: Serving AI From The Basement
