Microsoft’s latest AI model ‘Magma’ controls software and robots

Microsoft has developed Magma, an integrated AI foundation model that combines visual processing, language understanding, and physical control. The system is a notable step for multimodal AI because it can process several types of data and also act directly in digital interfaces and physical environments.

The breakthrough approach: Magma distinguishes itself from previous AI models by integrating perception and control capabilities into a single foundation model, rather than requiring separate systems for each function.

  • The model represents a collaboration between Microsoft Research and several academic institutions, including KAIST, the University of Maryland, and others
  • Unlike traditional vision-language models that focus solely on perception, Magma can actively manipulate objects and navigate interfaces
  • The system can formulate and execute multi-step plans to achieve specified goals

Technical innovations: Microsoft has introduced two key components that enable Magma’s unique capabilities.

  • Set-of-Mark technology identifies interactive elements in an environment by assigning numeric labels to clickable buttons or graspable objects
  • Trace-of-Mark learns movement patterns from video data to enable physical interactions
  • The model combines Transformer-based language model technology with these new spatial intelligence features
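
The labeling idea behind Set-of-Mark can be illustrated with a small sketch. This is not Magma's code or API: the `Element` type, the `assign_marks` and `click_point` helpers, and the sample bounding boxes are all hypothetical, and the sketch assumes some earlier step has already detected the interactive elements. It only shows how numeric labels let a model answer with a number that is then mapped back to a location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A hypothetical interactive element detected in a screenshot."""
    name: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Give each element a numeric label, starting at 1, in reading order."""
    # Sort top-to-bottom, then left-to-right (an assumed ordering).
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return {mark: element for mark, element in enumerate(ordered, start=1)}


def click_point(marks: dict[int, Element], chosen_mark: int) -> tuple[int, int]:
    """Translate a chosen mark back into the pixel centre of its element."""
    left, top, right, bottom = marks[chosen_mark].box
    return ((left + right) // 2, (top + bottom) // 2)


# Made-up detections, standing in for the output of an element detector.
elements = [
    Element("submit_button", (200, 300, 300, 340)),
    Element("search_field", (50, 20, 400, 60)),
    Element("cancel_button", (80, 300, 180, 340)),
]
marks = assign_marks(elements)
for mark, element in marks.items():
    print(mark, element.name)
```

If a model were shown the labeled screenshot and answered "3", `click_point(marks, 3)` would return the centre of the third labeled element, which the controlling program could then click.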

Performance metrics: Early testing shows promising results across various benchmarks and practical applications.

  • Magma-8B achieved an 80.0 score on the VQAv2 visual question-answering benchmark, surpassing GPT-4V’s 77.2
  • The model leads all compared systems with a POPE score of 87.4
  • In robot manipulation tasks, Magma has demonstrated superior performance compared to OpenVLA

Current limitations and next steps: Microsoft acknowledges that Magma still faces some technical challenges.

  • Complex multi-step decision-making remains a limitation for the system
  • Microsoft plans to release Magma’s training and inference code on GitHub for external researchers
  • The company is continuing research to improve the model’s capabilities

Shifting industry perspective: The development and reception of Magma reflect evolving attitudes toward AI agents.

  • Previous concerns about autonomous AI systems have given way to more mainstream acceptance of agentic AI research
  • Other major tech companies, including OpenAI and Google, are actively developing similar agent-based systems
  • The field has matured to a point where autonomous AI capabilities are viewed as a natural progression rather than a cause for alarm
