
Open source LLMs go local: maxing out the new RTX 5090

If you’re frustrated with subscription-based AI services or concerned about data privacy, there’s an exciting alternative: running large language models (LLMs) locally on your own computer. In a recent exploration, tech enthusiasts have pushed the boundaries of what’s possible with the new RTX 5090 graphics card, and the results are impressive.

Why run AI models locally?

Running LLMs on your own computer offers several advantages:

  • No subscription fees
  • Complete privacy (no data sent to third parties)
  • 24/7 access without an internet connection
  • Freedom from usage restrictions

While open-source models might not match the capabilities of proprietary giants like ChatGPT, they’re surprisingly capable and improving rapidly.
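The article doesn't name a specific tool, but the simplest way to try this yourself is with a local model runner such as Ollama or LM Studio. Below is a minimal sketch, assuming Ollama is installed and a model has already been pulled; the model tag and prompt are placeholders, not details from the demonstration:

```python
# Minimal sketch: querying a locally hosted model through Ollama's REST API.
# Assumes Ollama is running and a model has been pulled, e.g. `ollama pull deepseek-r1:7b`.
# The model name and prompt are illustrative, not taken from the article.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ask_local_model(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Send a single prompt to the local model and return its full response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    response = requests.post(OLLAMA_URL, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    # Nothing leaves your machine: the request only hits localhost.
    print(ask_local_model("Explain what VRAM is in two sentences."))
```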

The hardware matters

The RTX 5090’s massive 32GB of VRAM makes it possible to run sophisticated AI models that would choke lesser graphics cards. The demonstration showed how the GPU handles models of various sizes (a sketch for checking VRAM usage yourself follows the list):

  • DeepSeek R1 7B (used 10GB VRAM)
  • DeepSeek R1 14B (used 18GB VRAM)
  • DeepSeek R1 32B (used 20-32GB VRAM depending on settings)
  • Gemma 2 7B Vision model (used 24.4GB VRAM)
  • Tiny 360M parameter models (barely used 1GB VRAM)
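The article doesn't say how these memory figures were captured, but you can reproduce them on your own machine by polling GPU memory while a model is loaded. The sketch below simply queries nvidia-smi, which ships with the NVIDIA driver:

```python
# Minimal sketch: watching GPU memory while a local model is loaded and prompted.
# Poll every few seconds in one window while you run the model in another.
import subprocess
import time

def vram_used_mib() -> int:
    """Return current GPU memory usage in MiB as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip().splitlines()[0])  # first GPU only

if __name__ == "__main__":
    for _ in range(10):
        print(f"VRAM in use: {vram_used_mib()} MiB")
        time.sleep(5)
```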

Performance is impressive

The generation speeds, particularly with the smaller models, were remarkable (a sketch for measuring this on your own setup follows the list):

  • DeepSeek R1 7B: ~78 tokens per second
  • DeepSeek R1 14B: ~
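To benchmark your own hardware the same way, you can compute tokens per second from a single generation. This sketch again assumes Ollama for illustration, whose /api/generate response includes the generated token count (eval_count) and the time spent generating (eval_duration, in nanoseconds); the model tags are placeholders:

```python
# Minimal sketch: estimating tokens per second for a locally hosted model via Ollama.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def tokens_per_second(model: str, prompt: str) -> float:
    """Run one generation and compute tokens/sec from the response metadata."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    data = requests.post(OLLAMA_URL, json=payload, timeout=600).json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    for model in ["deepseek-r1:7b", "deepseek-r1:14b"]:  # illustrative model tags
        rate = tokens_per_second(model, "Write a haiku about graphics cards.")
        print(f"{model}: ~{rate:.0f} tokens per second")
```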
