Cutting-edge AI infrastructure: A tech enthusiast has built a powerful LLM server in their basement, featuring 8 RTX 3090 GPUs with a combined 192GB of VRAM (8 × 24GB), designed to run Meta's Llama 3.1 405B model.
- The project was motivated by the builder's previous 48GB setup having become insufficient for their LLM experiments, driving the need for far more VRAM capacity.
- The custom-built server represents a significant investment in high-end hardware, reflecting the growing demand for powerful computing resources in AI research and development.
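To put the headline number in context, here is a back-of-the-envelope, weights-only memory estimate for a 405B-parameter model at a few common precisions (a rough sketch; real usage adds KV cache, activations, and framework overhead on top of the raw weights):

```python
# Weights-only memory for a 405B-parameter model at common precisions.
# This ignores KV cache, activations, and framework overhead, all of
# which add on top of the raw weights.
PARAMS = 405e9

for label, bytes_per_param in [("FP16/BF16", 2.0), ("INT8", 1.0), ("4-bit", 0.5)]:
    print(f"{label:>9}: ~{PARAMS * bytes_per_param / 1e9:,.0f} GB")

# FP16/BF16: ~810 GB  -> far beyond 192GB of VRAM
#      INT8: ~405 GB  -> still doesn't fit
#     4-bit: ~203 GB  -> just over 192GB, so fitting 405B on this box
#                        implies sub-4-bit quantization or CPU offload
```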
Key components and specifications: The LLM server boasts impressive hardware specifications, carefully selected to maximize performance and capability for running large language models.
- The system is built around an ASRock Rack ROMED8-2T motherboard, offering 7 PCIe 4.0 x16 slots and 128 PCIe lanes.
- An AMD EPYC 7713 (Milan) CPU (64 cores/128 threads) provides the necessary processing power.
- The server includes 512GB of DDR4-3200 3DS RDIMM memory for handling large datasets and model parameters.
- Power is supplied by three 1600-watt power supply units to meet the high energy demands of the system.
- The centerpiece is a set of 8 RTX 3090 GPUs joined in pairs by 4 NVLink bridges, giving roughly 112GB/s of bandwidth within each bridged pair (a quick way to verify the topology is sketched below).
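The pairing matters in software as well as in hardware: frameworks only get the fast NVLink path between GPUs that expose direct peer access to each other. A minimal sketch, assuming PyTorch with CUDA available, that prints which device pairs report peer access (on a build like this, the four bridged pairs should stand out):

```python
import torch

# List which GPU pairs report direct peer-to-peer access. On an
# 8x RTX 3090 build with 4 NVLink bridges, the four bridged pairs
# are expected to report peer access; the rest communicate over
# PCIe and the CPU's root complex.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(i + 1, n):
        p2p = torch.cuda.can_device_access_peer(i, j)
        print(f"GPU{i} <-> GPU{j}: {'peer access' if p2p else 'no peer access'}")
```

Running `nvidia-smi topo -m` gives the same picture from the driver's point of view, including the number of NVLinks per connection.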
Challenges and learning experiences: The project involved overcoming various technical hurdles and gaining insights into advanced computing concepts.
- The builder faced physical challenges such as drilling holes in metal frames and adding high-amperage electrical circuits to support the system’s power requirements.
- They learned about the limitations of PCIe risers and the importance of specialized signal-integrity components such as SAS device adapters, redrivers, and retimers for stable PCIe connections.
- The project provided hands-on experience with concepts such as NVLink speeds, PCIe bandwidth, and VRAM transfer rates (rough peak figures are worked out below).
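For reference, the theoretical peak figures behind those concepts line up roughly as follows (a sketch using published spec-sheet numbers; achieved throughput is always lower):

```python
# Rough theoretical peak bandwidths relevant to this build.
pcie4_x16 = 16e9 * 16 * (128 / 130) / 8 / 1e9  # 16 GT/s x 16 lanes, 128b/130b encoding
nvlink_3090 = 112.5    # GB/s between a bridged RTX 3090 pair
gddr6x_3090 = 936.0    # GB/s on-card VRAM bandwidth of an RTX 3090

print(f"PCIe 4.0 x16 : ~{pcie4_x16:.1f} GB/s per direction")
print(f"NVLink (3090): ~{nvlink_3090:.1f} GB/s per bridged pair")
print(f"VRAM (3090)  : ~{gddr6x_3090:.0f} GB/s on-card")
# On-card VRAM is ~8x faster than NVLink, which is in turn ~3.5x faster
# than a PCIe 4.0 x16 link -- hence the emphasis on keeping tensors
# resident on-GPU and pairing GPUs over NVLink.
```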
Future content and knowledge sharing: The builder plans to document their experience and insights in a series of blog posts, covering various aspects of the project.
- Upcoming posts will detail the assembly process, hardware selection rationale, and potential pitfalls to avoid.
- The series will explore inference engines that support tensor parallelism, including TensorRT-LLM, vLLM, and Aphrodite Engine; a minimal example follows this list.
- Guides on training and fine-tuning custom LLMs will be shared, making the knowledge accessible to other AI enthusiasts and researchers.
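As a taste of what the inference-engine posts might cover, here is a minimal sketch of sharding a model across all eight GPUs with vLLM's tensor parallelism; the model ID is a hypothetical placeholder, not the builder's actual configuration:

```python
from vllm import LLM, SamplingParams

# Shard one model across all 8 GPUs via tensor parallelism.
# The model name below is a placeholder; a 405B model on 192GB of
# VRAM would additionally require aggressive quantization.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # hypothetical choice
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Why does tensor parallelism need fast GPU interconnects?"],
    SamplingParams(max_tokens=128, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```

Tensor parallelism splits each layer's weight matrices across the GPUs, so every decoding step incurs all-reduce traffic between them; that communication pattern is exactly where NVLink bandwidth pays off.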
Reflections on technological progress: The project has prompted the builder to contemplate the rapid advancement of technology over the past two decades.
- They draw a comparison between their excitement over a 60GB HDD in 2004 and the current system’s 192GB of VRAM, highlighting the exponential growth in computing capabilities.
- This reflection underscores the motivation behind the project: contributing to the development of future technologies and inspiring others in the field.
Looking ahead: The basement LLM server project serves as a testament to the democratization of AI research and the potential for individual contributions to the field.
- By sharing their experience and insights, the builder aims to lower the barriers to entry for others interested in experimenting with large language models.
- The project raises questions about the future of AI infrastructure and the potential for even more powerful systems in the coming decades.