The Allen Institute for AI (AI2) has released MolmoAct 7B, an open-source robotics AI model that enables robots to “reason in space” and “think” in three dimensions. This Action Reasoning Model challenges existing offerings from tech giants like Nvidia and Google by giving robots stronger spatial understanding; in benchmarking tests it achieved a 72.1% task success rate, outperforming models from Google, Microsoft, and Nvidia.
What makes it different: MolmoAct represents a significant departure from traditional vision-language-action (VLA) models by incorporating genuine 3D spatial reasoning capabilities.
- “MolmoAct has reasoning in 3D space capabilities versus traditional vision-language-action (VLA) models,” AI2 explained, noting that “most robotics models are VLAs that don’t think or reason in space, but MolmoAct has this capability, making it more performant and generalizable from an architectural standpoint.”
- The model is based on AI2’s open-source Molmo foundation and comes with an Apache 2.0 license, while its training datasets are licensed under CC BY 4.0.
How it works: MolmoAct processes physical environments through a multi-step pipeline that translates visual input into robot actions (a hypothetical sketch of the flow follows this list).
- The model outputs “spatially grounded perception tokens” produced by a vector-quantized variational autoencoder, which converts inputs such as video frames into discrete tokens rather than the text-based tokens that traditional VLAs consume.
- These tokens enable the model to estimate distances between objects and predict sequences of “image-space” waypoints to create navigation paths.
- Once waypoints are established, MolmoAct outputs specific physical actions, such as lowering an arm by a few inches or extending it outward.
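To make the pipeline concrete, here is a minimal Python sketch of the perception-token, waypoint, and action stages described above. All names (PerceptionTokens, plan_waypoints, decode_action, and so on) are illustrative assumptions rather than MolmoAct’s actual API, and the stage bodies are simple stand-ins for the trained model components.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data types mirroring the three stages described above.
# None of these names come from MolmoAct's actual codebase.

@dataclass
class PerceptionTokens:
    """Discrete, spatially grounded tokens from a VQ-VAE-style encoder."""
    codes: List[int]              # indices into a learned codebook
    depth_estimates: List[float]  # per-region distance estimates (meters)

@dataclass
class Waypoint:
    """A point in image space the robot's end effector should pass through."""
    x: float  # pixel column
    y: float  # pixel row

@dataclass
class ArmAction:
    """A low-level command for a specific robot embodiment."""
    joint_deltas: List[float]  # change per joint, in radians

def encode_perception_tokens(frame) -> PerceptionTokens:
    """Stage 1: tokenize a camera frame into spatially grounded codes.
    A real system would run a trained VQ-VAE here; this stub returns fixed values."""
    return PerceptionTokens(codes=[17, 4, 251], depth_estimates=[0.42, 0.87, 1.30])

def plan_waypoints(tokens: PerceptionTokens) -> List[Waypoint]:
    """Stage 2: predict a sequence of image-space waypoints forming a path."""
    # Toy heuristic: one waypoint per token, spaced by estimated depth.
    return [Waypoint(x=100.0 * i, y=50.0 * d)
            for i, d in enumerate(tokens.depth_estimates, start=1)]

def decode_action(waypoints: List[Waypoint]) -> ArmAction:
    """Stage 3: translate the planned path into embodiment-specific commands,
    e.g. lowering the arm a few inches or extending it outward."""
    dx = waypoints[-1].x - waypoints[0].x
    dy = waypoints[-1].y - waypoints[0].y
    return ArmAction(joint_deltas=[dx * 1e-3, dy * 1e-3, 0.0])

if __name__ == "__main__":
    frame = object()  # placeholder for a camera image
    tokens = encode_perception_tokens(frame)
    path = plan_waypoints(tokens)
    print(path)
    print(decode_action(path))
```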
Real-world applications: AI2 positions MolmoAct as particularly valuable for home robotics, where environments are irregular and constantly changing.
- “MolmoAct could be applied anywhere a machine would need to reason about its physical surroundings,” the company said, adding that “we think about it mainly in a home setting because that’s where the greatest challenge lies for robotics.”
- Researchers demonstrated that the model can adapt to different robot embodiments—whether mechanical arms or humanoid robots—with minimal fine-tuning required.
What experts are saying: Industry professionals view AI2’s research as a meaningful step forward in physical AI development, though they emphasize room for continued improvement.
- Alan Fern, professor at Oregon State University College of Engineering, described the research as “a natural progression in enhancing VLMs for robotics and physical reasoning” while noting that “these benchmarks still fall short of capturing real-world complexity and remain relatively controlled and toyish in nature.”
- Daniel Maturana, co-founder of startup Gather AI, praised the model’s accessibility: “This is great news because developing and training these models is expensive, so this is a strong foundation to build on and fine-tune for other academic labs and even for dedicated hobbyists.”
Competitive landscape: MolmoAct enters a rapidly expanding physical AI market where major tech companies are racing to merge large language models with robotics capabilities.
- Google Research’s SayCan helps robots reason about tasks using LLMs to determine movement sequences, while Meta and New York University’s OK-Robot uses visual language models for movement planning and object manipulation.
- Nvidia, which has proclaimed physical AI as the next major trend, recently released several models including Cosmos-Transfer1 to accelerate robotic training.
- Hugging Face has democratized access with a $299 desktop robot designed to make robotics development more accessible.
Why this matters: The shift from manually coding robot movements to LLM-based decision-making represents a fundamental change in how robots interact with their physical environments.
- Before LLMs, scientists had to program every single robotic movement by hand, a labor-intensive process that limited the range of possible actions.
- LLM-based methods now allow robots to determine subsequent actions based on objects they interact with, moving the industry closer to achieving general physical intelligence.
- As Fern noted, “large physical intelligence models are still in their early stages and are much more ripe for rapid advancements, which makes this space particularly exciting.”