
Apple’s new open-source language models showcase the company’s AI prowess, with the 7B model outperforming leading open models and the 1.4B version surpassing competitors in its category.

Introducing DCLM models: As part of the DataComp for Language Models (DCLM) project, Apple’s research team released a family of open DCLM models on Hugging Face, including a 7-billion-parameter model and a 1.4-billion-parameter model:

  • The models were trained using a high-quality dataset, DCLM-Baseline, assembled through model-based filtering, demonstrating the effectiveness of this data curation technique.
  • The project is truly open-source, with the release of model weights, training code, and the pretraining dataset, fostering transparency and collaboration in AI research.
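To make "model-based filtering" concrete, here is a minimal Python sketch of the general idea: score each document with a quality model and keep only the top-scoring fraction. This is not the DCLM pipeline itself. The function names, the toy scoring heuristic, and the `keep_fraction` value are all assumptions made for illustration; a real pipeline would score documents with a trained classifier.

```python
from typing import Callable, Iterable, List


def filter_top_fraction(
    docs: Iterable[str],
    score_fn: Callable[[str], float],
    keep_fraction: float = 0.25,  # illustrative value, not taken from the article
) -> List[str]:
    """Keep the highest-scoring fraction of documents according to score_fn."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    doc_list = list(docs)
    if not doc_list:
        return []
    # Rank documents from highest to lowest quality score.
    ranked = sorted(doc_list, key=score_fn, reverse=True)
    # Always keep at least one document so small inputs are not emptied.
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def toy_quality_score(doc: str) -> float:
    """Hypothetical stand-in for a learned quality classifier.

    Rewards lexical variety and (capped) length; a real system would
    return a trained model's score instead.
    """
    words = doc.split()
    if not words:
        return 0.0
    variety = len(set(words)) / len(words)
    return variety * min(len(words), 50)


corpus = [
    "buy buy buy buy",
    "The quick brown fox jumps over the lazy dog",
    "",
    "click here click here",
]
kept = filter_top_fraction(corpus, toy_quality_score, keep_fraction=0.25)
print(kept)
```

Swapping `toy_quality_score` for a classifier's predicted probability is the only change needed to turn this heuristic filter into a model-based one; the ranking and cutoff logic stay the same.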

Impressive performance of the 7B model: The larger DCLM model, trained on 2.5 trillion tokens, delivers competitive results on benchmarks compared to leading open models:

  • It achieves 63.7% 5-shot accuracy on MMLU, a 6.6 percentage point improvement over the previous state-of-the-art open-data language model, MAP-Neo, while using 40% less compute for training.
  • The model’s performance is close to that of Mistral-7B-v0.3, Llama3 8B, Google’s Gemma, and Microsoft’s Phi-3, despite these models using closed data.
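A percentage-point gain is not the same as a relative improvement. The short Python calculation below uses only the two MMLU figures quoted above to back out the baseline score they imply and the corresponding relative gain. The variable names are illustrative, and the derived baseline is inferred from the article's numbers, not quoted from a separate source.

```python
# Figures quoted in the article.
dclm_7b_mmlu = 63.7   # 5-shot MMLU accuracy of the 7B model, in percent
gain_points = 6.6     # reported improvement, in percentage points

# Baseline score implied by the two figures above.
implied_baseline = round(dclm_7b_mmlu - gain_points, 1)

# The same gain expressed relative to the implied baseline.
relative_gain_pct = round(gain_points / implied_baseline * 100, 1)

print(f"Implied baseline MMLU: {implied_baseline}%")
print(f"Relative improvement: {relative_gain_pct}%")
```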

Smaller model outshines competitors: The 1.4B version of the DCLM model, trained jointly with Toyota Research Institute on 2.6 trillion tokens, also showcases impressive performance:

  • It scores 41.9% on the 5-shot MMLU test, considerably higher than other models in its category, such as Hugging Face’s SmolLM, Qwen-1.5B, and Phi-1.5B.
  • The smaller model is released under the Apache 2.0 license, allowing for commercial use, distribution, and modification.

Broader implications: The release of Apple’s DCLM models highlights the importance of dataset design in training high-performing language models and serves as a starting point for further research on data curation. However, these models are early research and may exhibit biases or produce harmful responses. As Apple expands its AI offerings, its commitment to open-source development and collaborative research could drive significant advances in the field, while also raising questions about user privacy and the competitive landscape of AI technology.

Apple shows off open AI prowess: new models outperform Mistral and Hugging Face offerings
