Microsoft’s Small Phi-3.5 Model Outperforms Gemini and GPT-4o in STEM

Microsoft unveils Phi-3.5 small language model: Microsoft has released Phi-3.5, the latest iteration of its small language model family, in three sizes that are all free to download.

Model specifications and performance: Phi-3.5 comes in 3.8 billion, 4.15 billion, and 41.9 billion parameter versions, with the smallest model trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days.

  • The model excels in reasoning tasks, second only to GPT-4o-mini among leading small models.
  • Phi-3.5 significantly outperforms Llama and Gemini on math benchmarks.
  • A vision-capable version of the model can process and understand images.
  • The largest version uses a mixture-of-experts (MoE) architecture that combines 16 expert sub-models of 3.8 billion parameters each, and was trained on 4.9 trillion tokens over 23 days.
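
The training figures above imply a rough compute budget. The following is a back-of-the-envelope sketch in Python, not an official accounting: the function names are illustrative, and the 512-GPU count for the largest model is an assumption, since the GPU count above is stated only for the smallest model.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gpu_hours(num_gpus, days):
    """Total GPU-hours consumed by a training run."""
    return num_gpus * days * 24


def tokens_per_gpu_second(tokens, num_gpus, days):
    """Average training tokens processed per GPU per second."""
    return tokens / (num_gpus * days * SECONDS_PER_DAY)


# Figures quoted in the article for the smallest (3.8 billion parameter) model.
mini = {"tokens": 3.4e12, "num_gpus": 512, "days": 10}

# Figures for the largest model. The GPU count is an ASSUMPTION (same cluster
# size as the smallest model); the article gives only tokens and days.
moe = {"tokens": 4.9e12, "num_gpus": 512, "days": 23}

for name, run in (("smallest", mini), ("largest", moe)):
    hours = gpu_hours(run["num_gpus"], run["days"])
    rate = tokens_per_gpu_second(run["tokens"], run["num_gpus"], run["days"])
    print(f"{name}: {hours:,.0f} GPU-hours, about {rate:,.0f} tokens per GPU-second")
```

Under these assumptions the largest model would have used a bit more than twice the GPU-hours of the smallest while seeing fewer than 1.5 times as many tokens, which is consistent with each training step being more expensive.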

Advanced features and architecture: Phi-3.5 incorporates innovative elements to enhance its capabilities and efficiency.

  • The mixture-of-experts approach splits work across different sub-networks (experts), potentially improving performance and adaptability.
  • The inclusion of a vision-capable version expands the model’s applicability to multimodal tasks.
  • The model’s strong performance in reasoning and math suggests potential advantages in specialized applications.
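
To make the mixture-of-experts idea concrete, here is a minimal, dependency-free sketch of routing one input vector to a few sub-networks. It is a toy illustration, not Microsoft's implementation: the expert networks are random linear maps, and the choice of sending each input to the top 2 experts (`top_k=2`) is an assumption made for the example. All names (`make_expert`, `moe_forward`, `router_weights`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed, dim):
    """Build a toy expert: a fixed random linear map from dim to dim."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, router_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # The router produces one score (logit) per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts; the rest are skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalize the selected scores into mixing weights that sum to 1.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


DIM, NUM_EXPERTS = 4, 16  # 16 experts, matching the count in the article
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(seed, DIM) for seed in range(NUM_EXPERTS)]

output, used = moe_forward([0.5, -1.0, 0.25, 2.0], router_weights, experts)
print("experts used:", used)
print("output:", output)
```

Because only the selected experts run for a given input, a model built this way can hold many parameters in total while spending compute on a fraction of them per input.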

Real-world testing and limitations: Despite impressive benchmark results, hands-on testing of the 3.8 billion parameter version revealed some discrepancies between expected and actual performance.

  • Some testers reported awkward phrasing and failures on simple tests.
  • These limitations suggest that real-world applications may require careful consideration and potentially the use of larger model versions.
  • The mixture-of-experts version is expected to address some of these shortcomings, judging by its benchmark results.

Comparative strengths: Phi-3.5 demonstrates particular advantages in specific domains when compared to other small language models.

  • The model outperforms GPT-4o-mini specifically in STEM and social science subjects.
  • This specialized strength could make Phi-3.5 particularly valuable for applications in these fields.

Broader implications for AI development: The release of Phi-3.5 highlights ongoing trends and challenges in the development of small language models.

  • The focus on creating more efficient, smaller models reflects the industry’s push for accessible AI that can run on less powerful hardware.
  • The discrepancy between benchmark performance and real-world testing underscores the importance of comprehensive evaluation methods for AI models.
  • Microsoft’s decision to make all versions freely available promotes open research and development in the AI community.

Looking ahead: While Phi-3.5 shows promise in certain areas, its real-world performance suggests that small language models still face challenges in matching the capabilities of their larger counterparts. The continued development and refinement of these models will likely focus on bridging the gap between benchmark results and practical applications, potentially leading to more efficient and specialized AI tools in the near future.

Microsoft reveals Phi-3.5 — this new small AI model outperforms Gemini and GPT-4o
