DeepSeek launches reasoning AI models with reinforcement learning breakthroughs

DeepSeek has released its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, along with six distilled variants, offering new approaches to AI reasoning capabilities through reinforcement learning.

Key innovations: DeepSeek-R1-Zero represents a breakthrough in AI development by achieving strong reasoning capabilities through pure reinforcement learning, without requiring supervised fine-tuning.

  • The model demonstrates advanced capabilities including self-verification, reflection, and generating complex chains of thought
  • Despite its achievements, DeepSeek-R1-Zero faces challenges with repetition, readability, and language mixing
  • To address these limitations, researchers developed DeepSeek-R1, which incorporates a small amount of cold-start supervised data before reinforcement learning

Technical specifications: The DeepSeek-R1 series comprises multiple models with varying parameters and capabilities, built on different base architectures.

  • The flagship models, DeepSeek-R1-Zero and DeepSeek-R1, each feature 671B total parameters with 37B activated parameters
  • Both models support a context length of 128K tokens
  • The distilled variants range from 1.5B to 70B parameters, based on either Qwen or Llama architectures

Practical applications: DeepSeek has made these models accessible through multiple channels for both research and commercial use.

  • Users can interact with DeepSeek-R1 through the official chat website (chat.deepseek.com) using the “DeepThink” feature
  • An OpenAI-compatible API is available through the DeepSeek Platform
  • Local deployment options are supported, with recommended temperature settings between 0.5 and 0.7 for optimal performance
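
Because the API is OpenAI-compatible, a request is just a standard chat-completions payload with the recommended sampling temperature applied. The sketch below builds such a payload without sending it; the model identifier ("deepseek-reasoner") and endpoint URL are assumptions to verify against the DeepSeek Platform documentation:

```python
import json

# Assumed endpoint for DeepSeek's OpenAI-compatible API -- confirm
# against the DeepSeek Platform docs before use.
BASE_URL = "https://api.deepseek.com"


def build_request(prompt: str, temperature: float = 0.6) -> dict:
    """Build a chat-completions payload using the sampling range
    DeepSeek recommends for the R1 models (0.5-0.7)."""
    if not 0.5 <= temperature <= 0.7:
        raise ValueError("temperature outside the recommended 0.5-0.7 range")
    return {
        "model": "deepseek-reasoner",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


payload = build_request("Prove that the square root of 2 is irrational.")
print(json.dumps(payload, indent=2))
```

In practice this payload would be POSTed to the chat-completions route with an API key, or passed through the official OpenAI client pointed at the DeepSeek base URL.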

Accessibility and licensing: The models are released under permissive licensing terms that encourage both academic research and commercial applications.

  • All models are available through HuggingFace, with MIT License coverage for the core repository
  • The distilled variants inherit licensing terms from their base models (Apache 2.0 for Qwen-based models, specific licenses for Llama-based versions)
  • Commercial use, modifications, and derivative works are explicitly permitted

Looking ahead: The success of DeepSeek’s pure reinforcement learning approach opens new possibilities for AI model development, while the effectiveness of model distillation demonstrates that smaller models can achieve impressive reasoning capabilities when trained on data from larger models. These developments could significantly influence future AI research and development strategies.

