China-based DeepSeek just released a very powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that selectively activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance.

  • The model introduces an auxiliary-loss-free load-balancing strategy that optimizes expert utilization without compromising performance (a toy sketch follows this list)
  • A new multi-token prediction feature allows the model to generate 60 tokens per second, three times faster than previous versions
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
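
To make the routing idea concrete, here is a minimal sketch of top-k mixture-of-experts selection with a bias-based, auxiliary-loss-free balancing step in the spirit DeepSeek describes: a per-expert bias steers which experts are chosen (but not how their outputs are weighted), and the bias is nudged down for overloaded experts and up for underloaded ones. The dimensions, names, and softmax gating here are illustrative assumptions, not DeepSeek's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # toy scale; DeepSeek-V3 routes across far more experts
TOP_K = 2         # experts activated per token
HIDDEN = 16

# Router projection and a per-expert bias used only for expert *selection*.
router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS))
bias = np.zeros(NUM_EXPERTS)

def route(tokens, bias, update_rate=0.01):
    """Pick TOP_K experts per token; return choices, gates, updated bias."""
    scores = tokens @ router_w                        # (batch, NUM_EXPERTS)
    # The bias shifts which experts get *chosen*...
    chosen = np.argsort(scores + bias, axis=1)[:, -TOP_K:]
    # ...but gate weights come from the raw scores, so outputs stay unbiased.
    raw = np.take_along_axis(scores, chosen, axis=1)
    gates = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
    # Auxiliary-loss-free balancing: instead of adding a balance loss term,
    # lower the bias of overloaded experts and raise it for underloaded ones.
    load = np.bincount(chosen.ravel(), minlength=NUM_EXPERTS)
    target = chosen.size / NUM_EXPERTS
    new_bias = bias - update_rate * np.sign(load - target)
    return chosen, gates, new_bias

tokens = rng.normal(size=(32, HIDDEN))
chosen, gates, bias = route(tokens, bias)
print("expert load:", np.bincount(chosen.ravel(), minlength=NUM_EXPERTS))
```

The notable design choice is that balancing happens through the selection bias rather than through an extra loss term, so the training objective for model quality is left untouched.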

Technical specifications: The model underwent extensive training on 14.8T high-quality tokens and supports a context window of up to 128K tokens.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various optimizations, including FP8 mixed precision training and the DualPipe algorithm (a simulated FP8 sketch follows this list)
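
FP8 mixed precision, loosely, means running the bulk matrix multiplies in 8-bit floating point with per-block scaling factors while keeping accumulation and master weights in higher precision. The snippet below merely simulates that effect in NumPy (per-block scaling plus coarse mantissa rounding); the function name, block size, and rounding scheme are assumptions for illustration, and DeepSeek's real kernels and the DualPipe pipeline schedule are far more involved.

```python
import numpy as np

def quantize_fp8_sim(x, block=32):
    """Simulate per-block FP8 (e4m3-style) quantization: scale each block so
    its max magnitude maps near the FP8 max (~448), then round coarsely to
    mimic a ~3-bit mantissa. Purely a NumPy simulation, not a real kernel."""
    flat = x.ravel()
    pad = (-flat.size) % block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 448.0
    scale[scale == 0] = 1.0
    scaled = blocks / scale
    exponent = np.floor(np.log2(np.abs(scaled) + 1e-30))
    step = 2.0 ** (exponent - 3)          # quantization step per element
    q = np.round(scaled / step) * step
    return (q * scale).ravel()[: x.size].reshape(x.shape)

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 64))
b = rng.normal(size=(64, 64))

# Low-precision operands, full-precision accumulation (the mixed-precision idea).
out = quantize_fp8_sim(a) @ quantize_fp8_sim(b)
rel_err = np.abs(out - a @ b).max() / np.abs(a @ b).max()
print(f"max relative error vs full precision: {rel_err:.3%}")
```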

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training process required approximately 2.788 million H800 GPU hours, costing about $5.57 million (the arithmetic is checked below)
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
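
The headline figure is straightforward to reproduce: DeepSeek's technical report prices H800 time at roughly $2 per GPU hour (an assumed rental rate, not a measured spend), and 2.788 million hours at that rate gives the quoted total.

```python
# Back-of-the-envelope check of the reported training cost.
GPU_HOURS = 2_788_000      # ~2.788M H800 GPU hours, as reported
RATE_PER_HOUR = 2.00       # assumed H800 rental price in USD per GPU hour

total = GPU_HOURS * RATE_PER_HOUR
print(f"estimated training cost: ${total / 1e6:.3f}M")  # -> $5.576M
```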

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen2.5-72B
  • It shows particular strength in Chinese language and mathematical tasks, scoring 90.2 on the MATH-500 test
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via API for commercial applications
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8 (a quick cost example follows)
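
To see what those rates mean in practice, here is a tiny cost estimator using the post-February-8 prices quoted above; the workload numbers are made up for the example.

```python
# Tiny cost estimator from the post-promotional API prices quoted above.
INPUT_PRICE = 0.27 / 1_000_000    # USD per input token
OUTPUT_PRICE = 1.10 / 1_000_000   # USD per output token

def api_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical workload: 5M input tokens, 1M output tokens.
print(f"${api_cost(5_000_000, 1_000_000):.2f}")  # 5*0.27 + 1*1.10 = $2.45
```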

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.
