China-based DeepSeek just released a very powerful ultra-large AI model

DeepSeek, a Chinese AI startup, has released DeepSeek-V3, a new ultra-large AI model with 671B parameters that outperforms leading open-source competitors while approaching the capabilities of prominent closed-source models.

Key innovations: DeepSeek-V3 employs a mixture-of-experts architecture that selectively activates only 37B of its 671B parameters for each token, enabling efficient processing while maintaining high performance.

  • The model introduces an auxiliary loss-free load-balancing strategy that optimizes expert utilization without compromising performance
  • A new multi-token prediction feature allows the model to generate 60 tokens per second, three times faster than previous versions
  • The system uses multi-head latent attention (MLA) and DeepSeekMoE architectures for efficient training and inference
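The mixture-of-experts idea above can be sketched in a few lines: a router scores every expert for the incoming token, only the top-k experts actually run, and their outputs are combined with softmax weights. This is a minimal, generic illustration, not DeepSeek's implementation; the dimensions, expert networks, and `k` here are toy values chosen for clarity.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route a token through the top-k experts of a mixture-of-experts layer.

    x: (d,) token hidden state; gate_w: (d, n_experts) router weights;
    experts: list of callables, one per expert network.
    """
    logits = x @ gate_w                      # router score for each expert
    top_k = np.argsort(logits)[-k:]          # indices of the k best-scoring experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only the chosen experts execute, so compute scales with k, not n_experts
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
gate_w = rng.normal(size=(d, n_experts))
# Toy experts: independent linear maps standing in for the expert FFNs
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, m=m: x @ m for m in expert_mats]

out = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(out.shape)  # (8,)
```

Because only k of the n_experts networks run per token, total parameter count and per-token compute decouple, which is how a 671B-parameter model can operate with 37B active parameters.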

Technical specifications: The model underwent extensive training on 14.8T high-quality tokens and features significant context length capabilities.

  • DeepSeek-V3’s context length was extended in two stages, first to 32K and then to 128K
  • The training process included supervised fine-tuning and reinforcement learning to align with human preferences
  • The company implemented various optimizations, including FP8 mixed precision training and the DualPipe algorithm

Cost efficiency: DeepSeek achieved remarkable cost savings in the training process compared to industry standards.

  • The entire training process required approximately 2.788 million H800 GPU hours, costing about $5.57 million
  • This represents a significant reduction from typical training costs, such as the estimated $500 million spent on Llama-3.1
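The quoted cost follows directly from the GPU-hour figure at an assumed rental rate of $2 per H800 GPU hour (the rate DeepSeek used for its own estimate):

```python
gpu_hours = 2_788_000      # ~2.788M H800 GPU hours for the full training run
rate_per_hour = 2.00       # assumed H800 rental rate in USD per GPU hour
total_cost = gpu_hours * rate_per_hour
print(f"${total_cost / 1e6:.3f}M")  # → $5.576M, i.e. the ~$5.57 million cited
```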

Performance benchmarks: DeepSeek-V3 demonstrates superior performance across multiple evaluation metrics.

  • The model outperforms open-source competitors like Llama-3.1-405B and Qwen2.5-72B
  • It shows particular strength in Chinese language and mathematical tasks, scoring 90.2 on the MATH-500 benchmark
  • While matching or exceeding GPT-4o in most areas, it falls behind in specific English-focused tests like SimpleQA and FRAMES

Accessibility and pricing: The model is available through multiple channels with a competitive pricing structure.

  • The code is accessible via GitHub under an MIT license
  • Users can access the model through DeepSeek Chat or via API for commercial applications
  • API pricing is set at $0.27/million input tokens and $1.10/million output tokens after February 8
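At the listed per-token rates, estimating a bill is simple multiplication. The token counts below are hypothetical workload numbers chosen purely for illustration:

```python
INPUT_RATE = 0.27 / 1_000_000   # USD per input token (post-February-8 pricing)
OUTPUT_RATE = 1.10 / 1_000_000  # USD per output token

def api_cost(input_tokens, output_tokens):
    """Estimated DeepSeek-V3 API cost for one workload at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical month: 5M input tokens and 1M output tokens
print(f"${api_cost(5_000_000, 1_000_000):.2f}")  # → $2.45
```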

Market implications: The emergence of DeepSeek-V3 signals a significant shift in the competitive landscape between open-source and closed-source AI models, potentially democratizing access to advanced AI capabilities while challenging the dominance of established players in the field.

