Microsoft’s latest AI model ‘Magma’ controls software and robots

Microsoft has developed Magma, an integrated AI foundation model that combines visual processing, language understanding, and physical control capabilities. The system marks a significant advance in multimodal AI because it can interpret several types of data and also act directly, both in digital interfaces and in physical environments.

The breakthrough approach: Magma distinguishes itself from previous AI models by integrating perception and control capabilities into a single foundation model, rather than requiring separate systems for each function.

  • The model represents a collaboration between Microsoft Research and several academic institutions, including KAIST, the University of Maryland, and others
  • Unlike traditional vision-language models that focus solely on perception, Magma can actively manipulate objects and navigate interfaces
  • The system can formulate and execute multi-step plans to achieve specified goals

Technical innovations: Microsoft has introduced two key components that enable Magma’s unique capabilities.

  • Set-of-Mark technology identifies interactive elements in an environment by assigning numeric labels to clickable buttons or graspable objects
  • Trace-of-Mark learns movement patterns from video data to enable physical interactions
  • The model combines Transformer-based language model technology with these new spatial intelligence features
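To make the Set-of-Mark idea concrete, here is a minimal Python sketch of the labeling step described above. It is an illustration only, not Magma's code: the `Element` type, the `assign_marks` and `click_point` helpers, the sample screen, and the choice to number elements in top-to-bottom, left-to-right order are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A detected on-screen region (hypothetical structure for this sketch)."""
    name: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    interactive: bool


def assign_marks(elements):
    """Give each interactive element a numeric label, starting at 1.

    Ordering by (top, left) is an assumption of this sketch.
    """
    candidates = [e for e in elements if e.interactive]
    candidates.sort(key=lambda e: (e.box[1], e.box[0]))
    return {mark: e for mark, e in enumerate(candidates, start=1)}


def click_point(marks, mark):
    """Resolve a numeric label back to the center of its bounding box."""
    left, top, right, bottom = marks[mark].box
    return ((left + right) // 2, (top + bottom) // 2)


screen = [
    Element("logo", (10, 10, 110, 50), interactive=False),
    Element("search box", (130, 10, 430, 50), interactive=True),
    Element("submit button", (440, 10, 520, 50), interactive=True),
    Element("settings link", (10, 70, 110, 100), interactive=True),
]

marks = assign_marks(screen)
for mark, element in marks.items():
    print(mark, element.name)
print(click_point(marks, 2))
```

The point of the numbering is that a model only has to name a label (for example, "2") rather than predict raw pixel coordinates; the label is then mapped back to a location on screen.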

Performance metrics: Early testing shows promising results across various benchmarks and practical applications.

  • Magma-8B achieved an 80.0 score on the VQAv2 visual question-answering benchmark, surpassing GPT-4V’s 77.2
  • The model leads all compared systems with a POPE score of 87.4
  • In robot manipulation tasks, Magma has demonstrated superior performance compared to OpenVLA

Current limitations and next steps: Microsoft acknowledges that Magma still faces some technical challenges.

  • Complex multi-step decision-making remains a limitation for the system
  • Microsoft plans to release Magma’s training and inference code on GitHub for external researchers
  • The company is continuing research to improve the model’s capabilities

Shifting industry perspective: The development and reception of Magma reflect evolving attitudes toward AI agents.

  • Previous concerns about autonomous AI systems have given way to more mainstream acceptance of agentic AI research
  • Other major tech companies, including OpenAI and Google, are actively developing similar agent-based systems
  • The field has matured to a point where autonomous AI capabilities are viewed as a natural progression rather than a cause for alarm
Microsoft’s new AI agent can control software and robots
