Answer.AI has released an open-source system that, for the first time, enables training 70-billion-parameter language models on consumer gaming GPUs. The breakthrough combines FSDP (Fully Sharded Data Parallel) and QLoRA techniques, making it possible to train massive AI models on two 24GB RTX 3090 or 4090 graphics cards: hardware costing under $10,000, compared with hundreds of thousands of dollars for data center equipment.

The big picture: This development democratizes large language model training by making it accessible to individual researchers, small labs, and the broader open-source community rather than limiting it to well-funded tech companies with expensive data center hardware.

Why this matters: Gaming GPUs offer similar performance to data center cards that cost 10x more, but have been unusable for training large models due to memory constraints—until now.

  • Data center GPUs like the H100 have 80GB of RAM and cost around $40,000 each, while gaming cards top out at 24GB at a fraction of the price.
  • The system addresses a key barrier that has kept high-quality model creation largely inaccessible to most people.
  • “With this capability we can take huge models to new heights locally, and gigantic, hundreds of billions of parameter models are now accessible by small labs,” said Teknium, creator of the popular OpenHermes models.

How the technology works: The system cleverly combines two existing techniques to overcome memory limitations while maintaining training efficiency.

  • QLoRA uses 4-bit quantization to compress model weights from 16 bits to 4 bits, reducing a 70B model from 140GB to 35GB (see the quick memory math after this list).
  • FSDP shards the compressed model across multiple GPUs, allowing parallel processing instead of sequential layer-by-layer computation.
  • Additional optimizations include gradient checkpointing, CPU offloading, and Flash Attention 2 to maximize memory efficiency.
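To make the article's numbers concrete, here is a quick back-of-the-envelope memory calculation (a sketch for illustration only; training also needs memory for activations, gradients, and optimizer state beyond the weights):

```python
# Weight-memory math from the figures above (illustrative sketch).
params = 70e9                               # 70B parameters
gb_16bit = params * 2 / 1e9                 # 16-bit weights: 2 bytes each
gb_4bit = params * 0.5 / 1e9                # 4-bit weights: half a byte each
print(f"16-bit model weights: {gb_16bit:.0f} GB")        # 140 GB
print(f"4-bit model weights:  {gb_4bit:.0f} GB")         # 35 GB
print(f"Per GPU, sharded 2 ways: {gb_4bit / 2:.1f} GB")  # 17.5 GB, under a 3090's 24 GB
```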

In plain English: Think of training an AI model like trying to fit a massive library into your house. Normally, you’d need an enormous mansion (expensive data center hardware) to store all the books. This new system is like having a super-efficient compression and organization method that lets you fit the same library across multiple regular rooms (gaming GPUs) in a standard house, with all rooms working together simultaneously rather than one at a time.

Technical breakthrough details: Answer.AI’s team solved several complex integration challenges that prevented these techniques from working together effectively.

  • They developed a method to store quantized parameters in the same data type as the model's computation type (see the sketch after this list).
  • The team fixed FSDP’s inability to sync quantization metadata between GPUs by quantizing models on each GPU individually.
  • They created layer-by-layer loading and quantization to avoid needing the entire model on a single GPU during setup.
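A minimal sketch of the storage-dtype idea, not Answer.AI's actual code; it assumes a bitsandbytes release new enough for Linear4bit to accept a quant_storage argument, which was added as part of this work:

```python
import torch
import bitsandbytes as bnb

# Keep the packed 4-bit weights in a bf16-typed buffer that matches the
# computation dtype, so FSDP sees a single uniform parameter dtype and can
# shard the layer like any ordinary tensor.
layer = bnb.nn.Linear4bit(
    4096, 4096,
    bias=False,
    compute_dtype=torch.bfloat16,   # dtype the weights are dequantized to for the matmul
    quant_storage=torch.bfloat16,   # dtype of the buffer holding the packed 4-bit weights
)
```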

Key collaborations: The project represents a partnership between Answer.AI, University of Washington’s Tim Dettmers (QLoRA creator), and Hugging Face engineers Titus von Koeller and Sourab Mangrulkar.

  • The work builds on Meta’s FSDP library and integrates with Hugging Face’s ecosystem including PEFT, Transformers, and bitsandbytes.
  • Support is already being integrated into major open-source tools like Accelerate, TRL, and the Axolotl finetuning library.

What makes Answer.AI different: The organization positioned itself uniquely to solve a problem that academia, big tech, and startups each had reasons to avoid.

  • Academic researchers struggle to justify work that combines existing tools, since it rarely yields the novel results needed for publication.
  • Big tech companies already own expensive hardware and don’t need consumer GPU solutions.
  • Startups face investor pressure to focus on short-term gains rather than open-source public research.
  • Answer.AI operates as a public benefit company with a charter to produce long-term AI benefits through open-source work.

Getting started: Users need multiple GPUs and can rent dual 3090 systems from cloud providers for around $0.60/hour if they don’t own the hardware.

  • The system requires installing the latest versions of Transformers, PEFT, and bitsandbytes.
  • Training a llama2-7b model on dual 24GB cards takes a single command; the full invocation, together with the install step, appears after this list.
  • The team describes this as an alpha/preview release with more optimizations and benchmarking guidance coming soon.
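For convenience, here is the setup and launch sequence assembled from the points above; the pip line assumes the standard PyPI package names, and train.py is the training script from Answer.AI's released repository:

```bash
# Install the latest versions of the required libraries (assumed PyPI names).
pip install --upgrade transformers peft bitsandbytes

# Train a llama2-7b model with QLoRA on dual 24GB cards (command from the article).
python train.py --train_type qlora --dataset alpaca --batch_size 8 \
  --gradient_accumulation_steps 2 --output_dir qlora_output --log_to wandb
```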

Future implications: This represents just the first step toward making AI model creation more accessible, with the team planning additional improvements and expecting community contributions to further reduce training costs.
