
# Meta’s Llama 4 Release: Impressive Claims, Mixed Reality

Meta recently released its new Llama 4 family of AI models, and despite some impressive technical specifications, the reception has been mixed. The release has sparked both excitement and controversy in the AI community.

## The Llama 4 Family: Massive but Specialized

Meta introduced three new mixture-of-experts (MoE) models:

- **Llama 4 Scout**: 109 billion parameters with 17 billion active parameters and 16 experts
- **Llama 4 Maverick**: 400 billion parameters with 17 billion active parameters and 128 experts
- **Llama 4 Behemoth**: 2 trillion parameters with 288 billion active parameters and 16 experts (still in training)

Unlike traditional dense models where all parameters activate for each token, MoE models only activate a fraction of their parameters at any time, allowing for more knowledge storage with reduced computational demands.
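To make the idea concrete, here is a minimal sketch of top-k expert routing; this is an illustrative toy, not Llama 4's actual architecture, and the sizes and gating scheme are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16  # Llama 4 Scout uses 16 experts
TOP_K = 1         # experts activated per token (illustrative choice)
D_MODEL = 8       # toy hidden dimension

# Each "expert" is a small feed-forward weight matrix; together the experts
# hold most of the layer's parameters, but each token uses only a few.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))  # learned gating weights

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of NUM_EXPERTS experts."""
    logits = token @ router
    chosen = np.argsort(logits)[-TOP_K:]          # highest-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                      # softmax over chosen experts
    # Only the selected experts' parameters are touched for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
```

With `TOP_K = 1` of 16 experts, each token activates roughly 1/16 of the expert parameters per layer, which is how a model like Maverick can store 400B parameters while computing with only 17B per token.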

## Headline Features

- **Massive context window**: Llama 4 Scout supports a 10 million token context window, roughly 78 times larger than most open models
- **Multimodal capabilities**: Built from the ground up for text and image understanding (though it cannot generate images)
- **High benchmark scores**: Claims an impressive 1417 Elo score on LM Arena

## The Controversies

Despite the impressive specifications, several issues have emerged:

1. **Poor instruction following**: Independent testers found Llama 4 models frequently failing to follow even basic instructions, such as counting letters in a word

2. **Benchmark discrepancies**: Significant gaps between Meta’s claimed benchmark results and independent testing

3. **Misleading marketing**: Some community members felt the marketing around “active parameters” was intentionally misleading

4. **Context quality issues**: On Fiction Live Bench (which tests comprehension rather than search ability), performance declined dramatically after just 400 tokens

5. **Different benchmark models**: Meta acknowledged using a “version optimized for conversations” for LM Arena benchmarks, not the actual released models

## Current Strengths

Despite the criticisms, Llama 4 does excel in vision understanding, performing well on vision benchmarks.
