Microsoft’s Small Phi-3.5 Model Outperforms Gemini and GPT-4o in STEM

Microsoft unveils Phi-3.5 small language model: Microsoft has released its latest iteration of small language models, Phi-3.5, available in three sizes and free to download.

Model specifications and performance: Phi-3.5 comes in 3.8 billion (mini), 4.15 billion (vision), and 41.9 billion (MoE) parameter versions, with the smallest model trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days.

  • The model excels in reasoning tasks, second only to GPT-4o-mini among leading small models.
  • Phi-3.5 significantly outperforms Llama and Gemini on math benchmarks.
  • A vision-capable version of the model can process and understand images.
  • The largest version uses a mixture-of-experts architecture, combining sixteen 3.8 billion parameter expert models trained on 4.9 trillion tokens over 23 days.

Advanced features and architecture: Phi-3.5 incorporates innovative elements to enhance its capabilities and efficiency.

  • The mixture-of-experts approach splits learning tasks across different sub-networks, potentially improving performance and adaptability.
  • The inclusion of a vision-capable version expands the model’s applicability to multimodal tasks.
  • The model’s strong performance in reasoning and math suggests potential advantages in specialized applications.
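To make the mixture-of-experts idea concrete, here is a minimal sketch of the routing mechanism: a gating network scores every expert, only the top-scoring few process the input, and their outputs are blended. The expert count mirrors the sixteen experts described above, but the dimensions, top-k choice, and all function names are illustrative assumptions, not Phi-3.5's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer (illustrative only, not Phi-3.5's design).
NUM_EXPERTS = 16   # mirrors the sixteen experts in Phi-3.5's largest version
DIM = 8            # toy hidden size
TOP_K = 2          # experts activated per input

gate_w = rng.normal(size=(DIM, NUM_EXPERTS))       # gating network weights
expert_w = rng.normal(size=(NUM_EXPERTS, DIM, DIM))  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single input vector x (shape [DIM]) through the top-k experts."""
    scores = x @ gate_w                    # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Weighted sum of the selected experts' outputs; the remaining
    # experts are skipped entirely, which is where the efficiency comes from.
    return sum(w * (x @ expert_w[i]) for w, i in zip(weights, top))

out = moe_forward(rng.normal(size=DIM))
print(out.shape)  # (8,)
```

The key property this sketch shows is sparse activation: although all sixteen experts exist in memory, only two matrix multiplications run per input, so compute cost grows with the number of *active* experts rather than the total parameter count.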

Real-world testing and limitations: Despite impressive benchmark results, hands-on testing of the 3.8 billion parameter version revealed some discrepancies between expected and actual performance.

  • Hands-on testers reported awkward phrasing in the model's responses and failures on simple tests.
  • These limitations suggest that real-world applications may require careful consideration and potentially the use of larger model versions.
  • Benchmark results suggest the larger mixture-of-experts version may address some of these shortcomings.

Comparative strengths: Phi-3.5 demonstrates particular advantages in specific domains when compared to other small language models.

  • The model outperforms GPT-4o-mini specifically in STEM and social sciences areas.
  • This specialized strength could make Phi-3.5 particularly valuable for applications in these fields.

Broader implications for AI development: The release of Phi-3.5 highlights ongoing trends and challenges in the development of small language models.

  • The focus on creating more efficient, smaller models reflects the industry’s push for accessible AI that can run on less powerful hardware.
  • The discrepancy between benchmark performance and real-world testing underscores the importance of comprehensive evaluation methods for AI models.
  • Microsoft’s decision to make all versions freely available promotes open research and development in the AI community.

Looking ahead: While Phi-3.5 shows promise in certain areas, its real-world performance suggests that small language models still face challenges in matching the capabilities of their larger counterparts. The continued development and refinement of these models will likely focus on bridging the gap between benchmark results and practical applications, potentially leading to more efficient and specialized AI tools in the near future.

