China-based DeepSeek just released a powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that selectively activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance (a minimal sketch of this routing idea follows the list below).

  • The model introduces an auxiliary loss-free load-balancing strategy that optimizes expert utilization without compromising performance
  • A new multi-token prediction feature allows the model to generate 60 tokens per second, three times faster than previous versions
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
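To make the sparse-activation idea concrete, here is a minimal top-k expert-routing sketch in plain NumPy. It illustrates the general mixture-of-experts technique, not DeepSeek's actual implementation; all names and sizes are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector through the top-k of n experts."""
    logits = x @ gate_w                       # gating scores, shape (n_experts,)
    top_k = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k expert matrices participate in the computation below, so the
    # "active" parameter count is a small fraction of the total.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)                    # one token's hidden state
gate_w = rng.standard_normal((d, n_experts))  # router / gating weights
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = moe_forward(x, gate_w, experts, k=2)      # only 2 of 8 experts run for this token
print(y.shape)                                # (16,)
```

At DeepSeek-V3's proportions, this style of routing means roughly 37B of the 671B total parameters participate in any given token's forward pass.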

Technical specifications: The model was trained on 14.8T high-quality tokens and supports a context window of up to 128K tokens.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various optimizations, including FP8 mixed precision training and the DualPipe algorithm
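FP8 training requires dedicated hardware support, but the core mixed-precision pattern can be sketched briefly: keep a full-precision master copy of the weights, run the expensive matrix multiplications in a narrower format, and apply updates back to the master copy. The sketch below uses float16 as a stand-in for FP8 and a dummy loss; it illustrates the pattern, not DeepSeek's training stack.

```python
import numpy as np

rng = np.random.default_rng(0)
w_master = rng.standard_normal((64, 64)).astype(np.float32)  # full-precision master weights
x = rng.standard_normal((8, 64)).astype(np.float32)          # a dummy batch

for step in range(3):
    w_low = w_master.astype(np.float16)       # quantize weights for the matmul
    y = (x.astype(np.float16) @ w_low).astype(np.float32)    # low-precision compute
    loss = y.mean()                           # dummy scalar loss
    d_y = np.full_like(y, 1.0 / y.size)       # dL/dy for the mean loss
    grad = x.T @ d_y                          # gradient accumulated in float32
    w_master -= 0.1 * grad                    # update the full-precision copy
    print(f"step {step}: loss = {loss:.4f}")
```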

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training process required approximately 2.788 million H800 GPU hours, costing about $5.57 million
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
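The headline figure is easy to reproduce assuming the roughly $2-per-GPU-hour H800 rental rate that DeepSeek's report reportedly used for the estimate:

```python
gpu_hours = 2_788_000      # ~2.788 million H800 GPU hours reported for the full run
usd_per_gpu_hour = 2.00    # assumed rental rate underlying the estimate
total = gpu_hours * usd_per_gpu_hour
print(f"${total / 1e6:.3f}M")   # -> $5.576M, i.e. about $5.57 million
```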

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen 2.5-72B
  • It shows particular strength in Chinese-language and mathematical tasks, scoring 90.2 on the MATH-500 benchmark
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via API for commercial applications
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8
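For developers, DeepSeek's API follows the OpenAI-compatible chat-completions format, so a request takes only a few lines. The base URL and model name below match DeepSeek's public documentation at launch and may change; the API key is a placeholder.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; base URL and model name
# follow its public docs at launch and may change over time.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",   # DeepSeek-V3 behind the chat endpoint
    messages=[{"role": "user", "content": "Summarize DeepSeek-V3 in one line."}],
)
print(response.choices[0].message.content)
```

At the listed rates, a call consuming 1,000 input and 1,000 output tokens would cost about $0.00137.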

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.
