Microsoft’s Small Phi-3.5 Model Outperforms Gemini and GPT-4o in STEM

Microsoft unveils Phi-3.5 small language model: Microsoft has released its latest iteration of small language models, Phi-3.5, available in three sizes and free to download.
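For context on what "free to download" looks like in practice, here is a minimal sketch of pulling the smallest model from Hugging Face and running a single prompt. The model identifier, chat format, and library calls are assumptions based on the public model card, not details from this article.

```python
# Minimal sketch: load the 3.8B Phi-3.5-mini-instruct checkpoint with Hugging Face
# transformers and run one chat-style prompt. The model ID and chat format are
# assumptions taken from the public model card.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face identifier

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # pick bf16/fp16 where the hardware supports it
    device_map="auto",         # place weights on a GPU if one is available
    trust_remote_code=True,    # may be needed on older transformers versions
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "In two sentences, why are small language models useful on modest hardware?"}
]
output = generator(messages, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"][-1]["content"])
```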

Model specifications and performance: Phi-3.5 comes in 3.8 billion, 4.15 billion, and 41.9 billion parameter versions, with the smallest model trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days.

  • The model excels in reasoning tasks, second only to GPT-4o-mini among leading small models.
  • Phi-3.5 significantly outperforms Llama and Gemini on math benchmarks.
  • A vision-capable version of the model can process and understand images.
  • The largest version uses a mixture-of-experts architecture made up of 16 experts of 3.8 billion parameters each, trained on 4.9 trillion tokens over 23 days.

Advanced features and architecture: Phi-3.5 incorporates innovative elements to enhance its capabilities and efficiency.

  • The mixture-of-experts approach splits learning tasks across different sub-networks ("experts"), potentially improving performance and adaptability (see the sketch after this list).
  • The inclusion of a vision-capable version expands the model’s applicability to multimodal tasks.
  • The model’s strong performance in reasoning and math suggests potential advantages in specialized applications.
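To make the mixture-of-experts idea concrete, the sketch below shows the general technique: a small router scores each token and sends it to a few of many expert feed-forward networks. The dimensions, number of active experts, and layer layout are arbitrary illustrative choices, not Microsoft's actual Phi-3.5-MoE implementation.

```python
# Illustrative top-k mixture-of-experts layer in PyTorch. Sizes and routing
# details are arbitrary; this is a sketch of the technique, not Phi-3.5-MoE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router that scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.gate(x)                               # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # each token picks its top_k experts
        weights = F.softmax(weights, dim=-1)                # normalize the chosen experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 10 token embeddings of width 512 pass through the sparse layer.
layer = SparseMoELayer()
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

Because only a couple of experts run per token, a model with many total parameters can keep its per-token compute close to that of a much smaller dense model, which is the efficiency argument behind this design.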

Real-world testing and limitations: Despite impressive benchmark results, hands-on testing of the 3.8 billion parameter version revealed some discrepancies between expected and actual performance.

  • Early hands-on testers reported awkward phrasing in responses and failures on simple tests.
  • These limitations suggest that real-world applications may require careful consideration and potentially the use of larger model versions.
  • The mixture-of-experts version is expected to address some of these shortcomings, based on its benchmark results.

Comparative strengths: Phi-3.5 demonstrates particular advantages in specific domains when compared to other small language models.

  • The model outperforms GPT-4o-mini specifically in STEM and the social sciences.
  • This specialized strength could make Phi-3.5 particularly valuable for applications in these fields.

Broader implications for AI development: The release of Phi-3.5 highlights ongoing trends and challenges in the development of small language models.

  • The focus on creating more efficient, smaller models reflects the industry’s push for accessible AI that can run on less powerful hardware.
  • The discrepancy between benchmark performance and real-world testing underscores the importance of comprehensive evaluation methods for AI models.
  • Microsoft’s decision to make all versions freely available promotes open research and development in the AI community.

Looking ahead: While Phi-3.5 shows promise in certain areas, its real-world performance suggests that small language models still face challenges in matching the capabilities of their larger counterparts. The continued development and refinement of these models will likely focus on bridging the gap between benchmark results and practical applications, potentially leading to more efficient and specialized AI tools in the near future.

