Microsoft’s latest AI model ‘Magma’ controls software and robots

Microsoft has developed Magma, an AI foundation model that combines visual processing, language understanding, and physical control in a single system. It marks a significant advance in multimodal AI: beyond processing several types of data, it can act directly in digital interfaces and in physical environments.

The breakthrough approach: Magma distinguishes itself from previous AI models by integrating perception and control capabilities into a single foundation model, rather than requiring separate systems for each function.

  • The model represents a collaboration between Microsoft Research and several academic institutions, including KAIST, the University of Maryland, and others
  • Unlike traditional vision-language models that focus solely on perception, Magma can actively manipulate objects and navigate interfaces
  • The system can formulate and execute multi-step plans to achieve specified goals

Technical innovations: Microsoft has introduced two key components that enable Magma’s unique capabilities.

  • Set-of-Mark technology identifies interactive elements in an environment by assigning numeric labels to clickable buttons or graspable objects
  • Trace-of-Mark learns movement patterns from video data to enable physical interactions
  • The model combines Transformer-based language model technology with these new spatial intelligence features
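The Set-of-Mark idea described above can be illustrated with a small sketch. This is not Microsoft's implementation: the `Element` class, the `assign_marks` and `click_point` helpers, and the sample screen layout are all hypothetical, and the example assumes some earlier detection step has already produced bounding boxes. It shows only the core idea of giving each interactive element a numeric label that a model can then refer to when choosing an action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A hypothetical detected UI element or object with a bounding box."""
    name: str
    box: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    interactive: bool


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Give each interactive element a numeric label, starting at 1."""
    actionable = [e for e in elements if e.interactive]
    # Sort top-to-bottom, then left-to-right, so labels are stable and readable.
    actionable.sort(key=lambda e: (e.box[1], e.box[0]))
    return {mark: e for mark, e in enumerate(actionable, start=1)}


def click_point(marks: dict[int, Element], mark: int) -> tuple[int, int]:
    """Resolve a chosen mark back to the center of its bounding box."""
    x, y, w, h = marks[mark].box
    return (x + w // 2, y + h // 2)


# Illustrative screen contents (made up for this example).
screen = [
    Element("logo", (10, 10, 80, 40), interactive=False),
    Element("search box", (120, 10, 300, 40), interactive=True),
    Element("submit button", (440, 10, 90, 40), interactive=True),
    Element("settings link", (10, 400, 100, 20), interactive=True),
]

marks = assign_marks(screen)
for mark, element in marks.items():
    print(mark, element.name)

# If a model answers "click mark 2", the label maps back to pixel coordinates.
print(click_point(marks, 2))
```

Under this framing, the model only has to output a label such as "2" rather than raw coordinates, and the surrounding system translates that label into a click or a grasp target.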

Performance metrics: Early testing shows promising results across various benchmarks and practical applications.

  • Magma-8B scored 80.0 on the VQAv2 visual question-answering benchmark, ahead of GPT-4V’s 77.2
  • The model leads all compared systems with a POPE score of 87.4
  • In robot manipulation tasks, Magma outperformed OpenVLA

Current limitations and next steps: Microsoft acknowledges that Magma still faces some technical challenges.

  • Complex multi-step decision-making remains a limitation for the system
  • Microsoft plans to release Magma’s training and inference code on GitHub for external researchers
  • The company says it will continue research to improve the model’s capabilities

Shifting industry perspective: The development and reception of Magma reflect evolving attitudes toward AI agents.

  • Previous concerns about autonomous AI systems have given way to more mainstream acceptance of agentic AI research
  • Other major tech companies, including OpenAI and Google, are actively developing similar agent-based systems
  • The field has matured to a point where autonomous AI capabilities are viewed as a natural progression rather than a cause for alarm
