Meta launches Llama 4 with advanced MoE models now available on Hugging Face

Join our daily newsletter for breaking news, product launches and deals, research breakdowns, and other industry-leading AI coverage

Join Now

Meta has launched Llama 4, a breakthrough generation of large language models featuring two new MoE architectures: Maverick (~~400B) and Scout (~~109B). These natively multimodal models represent a significant advancement in AI capability while maintaining efficient 17B active parameter design. Their arrival on Hugging Face, with full integration into the platform’s ecosystem from day one, marks an important milestone in making powerful, multilingual AI models accessible to developers and researchers worldwide.

The big picture: Meta has released two new Mixture of Experts (MoE) models under the Llama 4 series with immediate availability on Hugging Face, offering advanced capabilities while maintaining computational efficiency.

Llama 4 Maverick uses 17B active parameters out of a massive ~400B total parameter count with 128 experts, positioning it as Meta’s most capable model to date.
The more efficient Llama 4 Scout also employs 17B active parameters but from a smaller ~109B total parameter pool with just 16 experts, providing a balance of performance and resource usage.

Key capabilities: Both models feature native multimodality with early fusion architecture, allowing them to process both text and images seamlessly in a single workflow.

The models were trained on up to 40 trillion tokens spanning 200 languages, with specialized fine-tuning for 12 languages including Arabic, Spanish, German, and Hindi.
Hugging Face has ensured full support for all multimodal capabilities through its transformers library integration from launch day.

Implementation details: Hugging Face has created a comprehensive ecosystem to support immediate adoption of these models.

Both model checkpoints are available directly on the Hugging Face Hub under the meta-llama organization with optimized support through Text Generation Inference (TGI).
The platform offers quantization support for efficient model usage and Xet Storage capabilities for improved model uploads and downloads.

Why this matters: The immediate availability of these state-of-the-art models on Hugging Face democratizes access to advanced AI capabilities that combine massive parameter counts with efficient activation designs.

The native multimodality eliminates the need for separate text and image models, streamlining development workflows.
Full integration with Hugging Face’s tools reduces technical barriers to implementation, allowing developers to begin using these models immediately.

Welcome Llama 4 Maverick & Scout on Hugging Face

huggingface