AI Enthusiast Builds 192GB VRAM Server for Llama 3.1 in Basement

Cutting-edge AI infrastructure: A tech enthusiast has built a powerful LLM server in their basement, featuring eight RTX 3090 GPUs (24GB each) for a total of 192GB of VRAM, designed to run Meta’s Llama 3.1 405B model.

  • The project was motivated by the builder’s previous 48GB setup, which had become insufficient for their LLM experiments.
  • The custom-built server represents a significant investment in high-end hardware, reflecting the growing demand for powerful computing resources in AI research and development.

Key components and specifications: The LLM server boasts impressive hardware specifications, carefully selected to maximize performance and capability for running large language models.

  • The system is built around an Asrock Rack ROMED8-2T motherboard, offering seven PCIe 4.0 x16 slots and 128 PCIe 4.0 lanes.
  • An AMD EPYC 7713 “Milan” CPU (64 cores/128 threads) provides the necessary processing power.
  • The server includes 512GB of DDR4-3200 3DS RDIMM memory for handling large datasets and model parameters.
  • Power is supplied by three 1600-watt power supply units to meet the high energy demands of the system.
  • The centerpiece is the set of eight RTX 3090 GPUs, paired via four NVLink bridges that give each linked pair a data transfer rate of roughly 112GB/s (see the sketch after this list).
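To put those numbers in context, here is a back-of-the-envelope sketch (Python, illustrative figures only) of how much VRAM the Llama 3.1 405B weights would need at common precisions versus the 192GB this build provides. The bytes-per-parameter values are nominal and ignore KV-cache and activation overhead.

```python
# Back-of-the-envelope VRAM math: Llama 3.1 405B vs. this 192GB build.
# Rough rule of thumb: 1B parameters at N bytes each ~= N GB of weights.
# Real serving also needs KV-cache and activation memory on top of this.

TOTAL_VRAM_GB = 8 * 24  # eight RTX 3090s at 24GB each -> 192GB
PARAMS_B = 405          # Llama 3.1 405B parameter count, in billions

bytes_per_param = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    weights_gb = PARAMS_B * nbytes
    verdict = "fits" if weights_gb <= TOTAL_VRAM_GB else "does not fit"
    print(f"{precision:>9}: ~{weights_gb:.0f}GB of weights -> {verdict} in {TOTAL_VRAM_GB}GB")
```

Notably, even at 4-bit quantization the 405B weights alone come to roughly 203GB, just over the 192GB of VRAM, which is why sub-4-bit quantization or partial CPU offload typically enters the picture for a model this size.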

Challenges and learning experiences: The project involved overcoming various technical hurdles and gaining insights into advanced computing concepts.

  • The builder faced physical challenges such as drilling holes in metal frames and adding high-amperage electrical circuits to support the system’s power requirements.
  • They learned about the limitations of PCIe risers and the importance of using specialized components such as SAS device adapters, redrivers, and retimers for stable PCIe connections.
  • The project provided hands-on experience with concepts such as NVLink speeds, PCIe bandwidth, and VRAM transfer rates; a rough comparison of those link rates follows below.
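For a sense of why those interconnect details matter, the sketch below compares nominal link rates and the time each would take to move one card’s worth of tensors. These are headline numbers, not measured throughput, and the degraded-riser entry is an assumed failure mode for illustration, not one the builder reported.

```python
# Nominal interconnect rates relevant to this build (headline numbers,
# not measured throughput). The x8 entry models a riser that trains at
# half width -- an assumed example, not one reported by the builder.

links_gb_s = {
    "PCIe 4.0 x16": 32,                # ~32GB/s per direction, nominal
    "PCIe 4.0 x8 (marginal riser)": 16,
    "NVLink bridge (3090 pair)": 112,  # the ~112GB/s pair rate cited above
}

shard_gb = 24  # moving one full 24GB card's worth of tensors

for name, bw in links_gb_s.items():
    print(f"{name:<30} {bw:>3}GB/s -> {shard_gb / bw:.2f}s to move {shard_gb}GB")
```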

Future content and knowledge sharing: The builder plans to document their experience and insights in a series of blog posts, covering various aspects of the project.

  • Upcoming posts will detail the assembly process, hardware selection rationale, and potential pitfalls to avoid.
  • The series will explore different inference engines supporting Tensor Parallelism, including TensorRT-LLM, vLLM, and Aphrodite Engine (a minimal vLLM sketch follows this list).
  • Guides on training and fine-tuning custom LLMs will be shared, making the knowledge accessible to other AI enthusiasts and researchers.
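As a preview of what tensor parallelism looks like in practice, here is a minimal sketch using vLLM’s Python API. The model ID is a hypothetical stand-in (a 70B checkpoint whose 16-bit weights actually fit within 192GB), and the sampling settings are illustrative, not the builder’s documented configuration.

```python
# Minimal sketch: tensor-parallel inference with vLLM across eight GPUs.
# Model choice and settings are illustrative assumptions, not the
# builder's documented setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # hypothetical stand-in
    tensor_parallel_size=8,       # shard each layer across all eight 3090s
    gpu_memory_utilization=0.90,  # leave headroom for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why build an LLM server in your basement?"], params)
print(outputs[0].outputs[0].text)
```

With tensor parallelism, each transformer layer’s weight matrices are split across all eight cards, which is precisely where the NVLink and PCIe bandwidth discussed earlier becomes the bottleneck.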

Reflections on technological progress: The project has prompted the builder to contemplate the rapid advancement of technology over the past two decades.

  • They draw a comparison between their excitement over a 60GB HDD in 2004 and the current system’s 192GB of VRAM, highlighting the exponential growth in computing capabilities.
  • This reflection underscores the motivation behind the project: contributing to the development of future technologies and inspiring others in the field.

Looking ahead: The basement LLM server project serves as a testament to the democratization of AI research and the potential for individual contributions to the field.

  • By sharing their experience and insights, the builder aims to lower the barriers to entry for others interested in experimenting with large language models.
  • The project raises questions about the future of AI infrastructure and the potential for even more powerful systems in the coming decades.
Source: Serving AI From The Basement
