DeepSeek Created a Self Teaching AI Beating OpenAI (AGAIN)... and It Actually Works!


# DeepSeek’s Self-Teaching AI Outperforms OpenAI’s GPT-4o: A Breakthrough in AI Development

## Key Highlights from DeepSeek’s New Generative Reward Model (GRM)

DeepSeek has made a significant advance in AI development with DeepSeek GRM, a generative reward model that effectively teaches itself to judge answers better by writing its own evaluation principles and critiques. The approach represents a major shift in how AI models learn and improve, and DeepSeek reports results that rival or even exceed OpenAI’s GPT-4o on certain reward-modeling benchmarks.

## How DeepSeek GRM Works

The core of DeepSeek’s innovation is a process called **Self-Principled Critique Tuning (SPCT)**, which allows AI models to:

1. Generate their own principles for evaluating answers
2. Critique potential responses using these principles
3. Score answers on a scale of 1-10 based on correctness, clarity, safety, and other factors
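
To make the three SPCT steps concrete, here is a minimal Python sketch of the scoring side only. It takes one piece of generated judgment text and splits it into principles, critique, and per-response scores. The text layout, the `Judgment` class, and `parse_judgment` are hypothetical and chosen for illustration. DeepSeek’s actual output format may differ, and the model that generates the text is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Judgment:
    principles: list[str]  # evaluation principles the model wrote for this query
    critique: str          # free-text critique of the candidate responses
    scores: list[int]      # one 1-10 score per candidate response


def parse_judgment(text: str, num_responses: int) -> Judgment:
    """Split one generated judgment into principles, critique and scores.

    The layout below is a hypothetical format chosen for illustration:
        Principles:
        - <principle>
        Critique: <free text>
        Scores: <s1>, <s2>, ...
    """
    principles_part, _, rest = text.partition("Critique:")
    critique, _, scores_part = rest.partition("Scores:")
    principles = [
        line.lstrip("- ").strip()
        for line in principles_part.splitlines()
        if line.strip().startswith("-")
    ]
    scores = [int(s) for s in re.findall(r"\d+", scores_part)]
    # Reject malformed judgments: wrong number of scores or a score outside 1-10.
    if len(scores) != num_responses or not all(1 <= s <= 10 for s in scores):
        raise ValueError(f"expected {num_responses} scores in 1-10, got {scores}")
    return Judgment(principles, critique.strip(), scores)


# Hypothetical judgment text for two candidate responses.
sample = """Principles:
- Correctness (weight 50%)
- Clarity (weight 30%)
- Safety (weight 20%)
Critique: Response 1 is correct but terse; Response 2 contains an arithmetic error.
Scores: 8, 3"""

judgment = parse_judgment(sample, num_responses=2)
print(judgment.principles)
print(judgment.scores)  # [8, 3]
```

Rejecting malformed or out-of-range scores matters later. Every sampled judgment has to yield a usable score vector before it can be voted on or rewarded.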

What makes this system particularly powerful is its ability to perform “repeated sampling at inference time.” This means the model can:
– Sample its internal judging process multiple times
– Average out or vote on the results
– Filter low-quality critiques using a “meta reward model” (MetaRM)
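
The repeated-sampling idea can be sketched in a few lines. This toy version assumes each sampled judgment has already been parsed into one score per candidate response. It votes by summing the scores, which ranks responses the same way as averaging them. The meta-RM variant keeps only the `k_meta` samples with the highest meta-RM rating before voting. The function names, the canned judge, and the `meta_scores` values are all hypothetical stand-ins, not DeepSeek’s code.

```python
from typing import Callable

Scores = list[int]  # one score per candidate response


def sample_judgments(judge: Callable[[], Scores], k: int) -> list[Scores]:
    """Run the (stochastic) judge k times on the same query and responses."""
    return [judge() for _ in range(k)]


def vote(samples: list[Scores]) -> Scores:
    """Sum each response's score across all samples (same ranking as averaging)."""
    return [sum(column) for column in zip(*samples)]


def meta_rm_vote(samples: list[Scores], meta_scores: list[float], k_meta: int) -> Scores:
    """Keep the k_meta samples the meta RM rates highest, then vote over them."""
    ranked = sorted(range(len(samples)), key=lambda i: meta_scores[i], reverse=True)
    return vote([samples[i] for i in ranked[:k_meta]])


def best_response(totals: Scores) -> int:
    """Index of the response with the highest total score."""
    return max(range(len(totals)), key=totals.__getitem__)


# Canned outputs stand in for a real GRM call (hypothetical data).
canned = iter([[8, 3], [2, 9], [1, 10], [7, 5]])
samples = sample_judgments(lambda: next(canned), k=4)
# Hypothetical meta-RM ratings of how trustworthy each sampled critique is.
meta_scores = [0.9, 0.2, 0.1, 0.8]

print(best_response(vote(samples)))                          # 1
print(best_response(meta_rm_vote(samples, meta_scores, 2)))  # 0
```

In this made-up example, plain voting picks response 1 because two low-quality samples strongly favor it, while filtering with the meta RM first flips the decision to response 0. That is the kind of correction the MetaRM step is meant to provide.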

## The Two-Phase Training Process

DeepSeek GRM is trained through a two-phase approach:

**Phase 1: Rejective Fine-Tuning (RFT)**
– Uses 1.07 million general instruction examples
– Adds 186,000 “rejectively sampled” examples
– Rejects sampled critiques whose predictions don’t align with the known best response, and drops queries that are too easy (every sample already correct), so training focuses on challenging examples
– Takes approximately 19.2 hours on 128 A100 GPUs
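
The rejective sampling step can be sketched as a filter over the sampled judgments for one query whose best response is known. This is a minimal sketch under stated assumptions: a trajectory counts as correct only if the known best response gets a strictly higher score than every other response (ties count as wrong), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    text: str          # sampled principles + critique
    scores: list[int]  # predicted 1-10 score per candidate response


def is_correct(scores: list[int], gold_best: int) -> bool:
    """True if the known best response strictly out-scores every other response."""
    return all(s < scores[gold_best] for i, s in enumerate(scores) if i != gold_best)


def rejective_sample(trajectories: list[Trajectory], gold_best: int) -> list[Trajectory]:
    """Keep correct trajectories; drop the whole query if every trajectory is correct."""
    correct = [t for t in trajectories if is_correct(t.scores, gold_best)]
    if len(correct) == len(trajectories):  # too easy (or empty): nothing to learn
        return []
    return correct


hard_query = [
    Trajectory("critique 1", [3, 8]),
    Trajectory("critique 2", [6, 6]),  # tie: counted as incorrect (assumption)
    Trajectory("critique 3", [9, 2]),
    Trajectory("critique 4", [4, 7]),
]
easy_query = [Trajectory("critique 1", [1, 9]), Trajectory("critique 2", [2, 8])]

print(len(rejective_sample(hard_query, gold_best=1)))  # 2
print(len(rejective_sample(easy_query, gold_best=1)))  # 0
```

The easy query contributes nothing because the model already gets it right every time, which is how the filter keeps training focused on challenging examples.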

**Phase 2: Rule-Based Online Reinforcement Learning (GRPO)**
– Uses GRPO (Group Relative Policy Optimization), a variant of Proximal Policy Optimization (PPO)
– Awards +1 when the model’s predicted best response matches the actual best response
– Awards -1 when the prediction is incorrect
– Includes a KL penalty that keeps the model from drifting too far from its reference model
– Uses 237K data points over 900 steps
– Takes approximately 15.6 hours on 128 A100 GPUs
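
The reward rule and the KL penalty can be sketched as follows. `rule_reward` applies the +1/-1 rule (again treating ties as incorrect, an assumption). `group_advantages` normalizes rewards within a group of judgments sampled for the same query, as GRPO does. `kl_estimate` is the non-negative per-token KL estimator used in the GRPO formulation. This is a simplified sketch, not DeepSeek’s training code: PPO-style ratio clipping and token averaging are omitted, and `beta=0.04` is a hypothetical coefficient.

```python
import math
import statistics


def rule_reward(scores: list[int], gold_best: int) -> float:
    """+1 if the judge's top-scored response is the known best one, else -1 (ties lose)."""
    top = scores[gold_best]
    wins = all(s < top for i, s in enumerate(scores) if i != gold_best)
    return 1.0 if wins else -1.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimate pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


def surrogate(advantage: float, logp_policy: float, logp_old: float,
              logp_ref: float, beta: float) -> float:
    """Simplified per-token objective to maximize (PPO ratio clipping omitted)."""
    ratio = math.exp(logp_policy - logp_old)
    return ratio * advantage - beta * kl_estimate(logp_policy, logp_ref)


# One query, four sampled judgments of two responses; response 1 is the known best.
group = [[3, 8], [6, 6], [9, 2], [4, 7]]
rewards = [rule_reward(s, gold_best=1) for s in group]
advantages = group_advantages(rewards)
print(rewards)  # [1.0, -1.0, -1.0, 1.0]
# beta=0.04 is a hypothetical KL coefficient; log-probs are made-up numbers.
print(surrogate(advantages[0], logp_policy=-1.0, logp_old=-1.0, logp_ref=-1.2, beta=0.04))
```

Because advantages are computed relative to the group, judgments that beat their siblings are pushed up and the rest are pushed down, without a separate value model. If every sample in a group earns the same reward, all advantages are zero and the group contributes no policy-gradient signal.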

Outsider Labs.
