The development of distributed AI training methods marks a significant shift in how large language models can be created, potentially democratizing access to AI development beyond major tech companies and specialized data centers.
Key breakthrough: Nous Research is pre-training a 15-billion-parameter large language model using machines distributed across the internet, departing from traditional centralized data center approaches.
- The training process is being livestreamed on distro.nousresearch.com, showing real-time evaluation benchmarks and hardware locations across the U.S. and Europe
- The project utilizes Nous DisTrO (Distributed Training Over-the-Internet), reducing inter-GPU communication bandwidth requirements by up to 10,000x
- The system can operate over relatively modest internet connections of 100 Mbps download and 10 Mbps upload (see the back-of-envelope timing sketch after this list)
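To put those connection speeds in context, here is a rough back-of-envelope calculation (not from the article) showing why compression is the enabling factor. The payload sizes are the figures reported for the Llama 2 tests in the next section, and the link speed is the 10 Mbps uplink quoted above; real DisTrO traffic patterns and per-step payloads may differ.

```python
# Rough sync-time estimate over the 10 Mbps uplink quoted above.
# Payload sizes are the article's reported figures for the Llama 2 tests;
# GB is treated as 1000 MB and protocol overhead is ignored.

UPLOAD_MBPS = 10  # stated upload speed, in megabits per second

def upload_seconds(payload_mb: float, mbps: float = UPLOAD_MBPS) -> float:
    """Seconds to push `payload_mb` megabytes through an `mbps` uplink."""
    return payload_mb * 8 / mbps  # 8 bits per byte

uncompressed = upload_seconds(74.4 * 1000)  # 74.4 GB of conventional gradient exchange
compressed = upload_seconds(86.8)           # 86.8 MB after DisTrO-style compression

print(f"uncompressed: {uncompressed / 3600:.1f} hours")  # ~16.5 hours
print(f"compressed:   {compressed:.0f} seconds")         # ~69 seconds
```

In other words, exchanging conventional full-gradient traffic over a consumer uplink would take many hours per synchronization, while the compressed payload fits in roughly a minute.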
Technical innovation: Nous DisTrO’s efficiency gains represent a fundamental advancement in distributed AI training methods.
- The technology compressed the data exchanged between GPUs from 74.4 gigabytes to just 86.8 megabytes in tests using the Llama 2 architecture
- DisTrO builds on Decoupled Momentum Optimization (DeMo), an open-source algorithm designed to maintain training performance while reducing inter-GPU communication (a simplified sketch of the idea follows this list)
- The pre-training process involves hardware contributions from partners including Oracle, Lambda Labs, Northern Data Group, Crusoe Cloud, and the Andromeda Cluster
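For intuition about what "decoupled momentum" means in practice, below is a deliberately simplified sketch, not the actual DeMo or DisTrO code: each worker keeps its momentum buffer local and shares only a small, fast-moving slice of it with peers instead of full gradients. The real algorithm extracts that slice differently (the DeMo paper describes a frequency-domain transform), and the function name, the top-k selection, and the hyperparameters here are illustrative assumptions.

```python
import torch

def demo_style_step(param, grad, momentum, peers_shared, lr=3e-4, beta=0.9, k=1024):
    """Illustrative decoupled-momentum update for one worker.

    `peers_shared` stands in for the sparse slices received from other workers;
    in a real system this would arrive via a bandwidth-light collective op.
    """
    # 1. Fold the local gradient into a momentum buffer that never leaves this worker.
    momentum.mul_(beta).add_(grad)

    # 2. Pick out only the k largest-magnitude momentum entries to transmit.
    flat = momentum.view(-1)            # a view, so edits below hit `momentum` itself
    k = min(k, flat.numel())
    idx = torch.topk(flat.abs(), k).indices
    shared = torch.zeros_like(flat)
    shared[idx] = flat[idx]

    # 3. Remove the transmitted part from the local buffer; the residual stays here
    #    and keeps accumulating across steps (the "decoupling").
    flat[idx] = 0.0

    # 4. Apply the update built from everyone's shared slices.
    update = (shared + peers_shared).view_as(param)
    param.data.add_(update, alpha=-lr)
    return shared  # what this worker would send to its peers this step

# Toy usage with a single tensor and a silent peer:
p = torch.randn(4096)
m = torch.zeros_like(p)
g = torch.randn_like(p)
sent = demo_style_step(p, g, m, peers_shared=torch.zeros_like(p))
```

Replacing a dense all-reduce of full gradients with this kind of sparse exchange is, conceptually, what shrinks per-step traffic from tens of gigabytes to tens of megabytes in the figures above.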
Industry significance: This development could fundamentally alter the landscape of AI model development.
- The technology enables training of frontier-class LLMs without requiring expensive supercomputer clusters or low-latency transmission
- Smaller institutions and independent researchers with consumer-grade internet access could potentially train large models
- Notable AI researcher Diederik P. Kingma, co-inventor of the Adam optimizer, has joined as a collaborator on the project
Current status and implementation: The pre-training process has demonstrated promising initial results.
- As of publication, the training run was over 75% complete with approximately 57 hours remaining
- The project follows Nous Research’s earlier release of Hermes 3, a Meta Llama 3.1 variant
- While the current run uses high-end Nvidia H100 GPUs, future applications could extend to less specialized hardware
Future implications: The democratization of AI training could reshape the power dynamics in artificial intelligence development.
- The technology opens possibilities for decentralized federated learning and for training other kinds of AI models, including image-generation models
- Questions remain about scalability to less specialized hardware and potential applications beyond language models
- The success of this project could shift AI development away from corporate control toward a more distributed, collaborative ecosystem