Researchers from UC Berkeley, the University of Warsaw and Stanford have developed a new technique called Embodied Chain-of-Thought (ECoT) reasoning to enhance the decision-making capabilities of vision-language-action (VLA) models used in robotic control systems.

Key Takeaways: ECoT enables robots to reason about their actions in a way that is grounded in their perception of the environment, combining semantic reasoning about tasks with “embodied” reasoning about the robot’s state and surroundings:

  • By generating intermediate reasoning steps, ECoT allows VLAs to better map the relationships between different parts of a problem and come up with more accurate solutions, similar to how Chain-of-Thought (CoT) prompting has improved the performance of large language models (LLMs).
  • ECoT goes beyond just breaking down tasks into sub-tasks, as is common in CoT for LLMs, by also requiring the model to reason about the environment, spatial relationships, and how the robot’s available actions can help achieve the goal.
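The reason-then-act idea described above can be sketched as a small loop in which each reasoning step is generated from the instruction, the observation, and all earlier steps, and the action is produced last. This is only an illustrative sketch: the step names, the `reason_then_act` function, and the `stub_generate` stand-in for a VLA are hypothetical and not taken from the researchers' code.

```python
from typing import Callable, List, Tuple

# Hypothetical step names, loosely following the article's description.
REASONING_STEPS = ["TASK", "PLAN", "SUBTASK", "VISIBLE OBJECTS"]

# A generator takes (context so far, name of the step to produce) and returns text.
Generate = Callable[[str, str], str]


def reason_then_act(
    generate: Generate, instruction: str, observation: str
) -> Tuple[List[Tuple[str, str]], str]:
    """Produce each reasoning step conditioned on the earlier ones, then the action."""
    context = f"INSTRUCTION: {instruction}\nOBSERVATION: {observation}"
    chain: List[Tuple[str, str]] = []
    for step in REASONING_STEPS:
        text = generate(context, step)
        chain.append((step, text))
        # Append the new step so later steps (and the action) can condition on it.
        context += f"\n{step}: {text}"
    action = generate(context, "ACTION")
    return chain, action


def stub_generate(context: str, step: str) -> str:
    """Canned stand-in for a real model, used only to make the sketch runnable."""
    canned = {
        "TASK": "put the cup on the plate",
        "PLAN": "reach cup; grasp cup; move to plate; release",
        "SUBTASK": "reach cup",
        "VISIBLE OBJECTS": "cup, plate",
        "ACTION": "move_left",
    }
    return canned[step]


chain, action = reason_then_act(stub_generate, "Place the cup on the plate.", "<image>")
for name, text in chain:
    print(f"{name}: {text}")
print(f"ACTION: {action}")
```

Because the action is generated only after the reasoning text is in the context, a reader can inspect the chain to see what the policy believed about the task and the scene before it moved.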

Overcoming challenges in applying CoT to robotics: Directly applying CoT techniques used in LLMs to VLAs posed several challenges that the researchers had to address:

  • Current VLAs rely on relatively small, open-source vision-language models (VLMs) that are less proficient at reasoning than the larger LLMs used in language applications.
  • Robotic tasks require the model to reason not only about the task itself but also about the environment and the robot’s own state, necessitating a more “embodied” form of reasoning.

Generating synthetic training data for ECoT: To enable VLA models to perform ECoT reasoning, the researchers created a pipeline to generate annotated training data:

  • Pre-trained object detectors, LLMs, and VLMs are used to annotate existing robot datasets with information that can be used for reasoning, such as object bounding boxes and spatial relationships.
  • Google’s Gemini model is then employed to generate the final reasoning chain, which includes rephrasing the instruction, outlining sub-tasks, identifying the current focus based on the environment and robot state, and predicting pixel locations of key elements.
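To make the shape of such an annotation concrete, the sketch below models one annotated timestep and flattens it into a reasoning string that could precede the action in a training example. The `EcotAnnotation` class, its field names, and the text layout are assumptions made for illustration; the article does not specify the actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in image pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class EcotAnnotation:
    """Hypothetical record for one annotated timestep of a robot episode."""

    task: str                      # rephrased instruction
    plan: List[str]                # ordered sub-tasks
    current_subtask: str           # current focus given scene and robot state
    gripper_pixel: Tuple[int, int]  # predicted pixel location of the gripper
    objects: Dict[str, BBox] = field(default_factory=dict)  # detector output

    def to_reasoning_text(self) -> str:
        """Flatten the record into the text a model would learn to emit."""
        plan_text = "; ".join(f"{i}. {s}" for i, s in enumerate(self.plan, 1))
        objects_text = ", ".join(
            f"{name} {list(box)}" for name, box in sorted(self.objects.items())
        )
        return "\n".join(
            [
                f"TASK: {self.task}",
                f"PLAN: {plan_text}",
                f"SUBTASK: {self.current_subtask}",
                f"GRIPPER: {list(self.gripper_pixel)}",
                f"VISIBLE OBJECTS: {objects_text}",
            ]
        )


example = EcotAnnotation(
    task="Put the cup on the plate.",
    plan=["reach the cup", "grasp the cup", "move to the plate", "release"],
    current_subtask="reach the cup",
    gripper_pixel=(120, 84),
    objects={"plate": (150, 90, 230, 160), "cup": (40, 60, 90, 130)},
)
print(example.to_reasoning_text())
```

In this sketch the detector-style fields (`objects`, `gripper_pixel`) carry the grounded information, while `task`, `plan`, and `current_subtask` carry the semantic steps that a language model would fill in.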

Impressive performance gains and improved interpretability: Evaluating ECoT on a robotic manipulation setup using OpenVLA yielded significant improvements:

  • ECoT raised the task success rate by 28% (reported as an absolute gain) over the baseline model and generalized well to new objects, scenes, viewpoints and instructions not present in the training data.
  • Expressing the reasoning steps in natural language made it much easier to understand why the model failed in certain situations, enabling humans to provide feedback and correct the policy’s behavior more effectively.

Broader implications for foundation models in robotics: ECoT is part of a growing trend of integrating foundation models, such as LLMs and VLMs, into robotic control systems to fill in gaps and enhance capabilities:

  • Foundation models’ ability to ingest large amounts of unlabeled data from the internet allows them to contribute to various parts of the robotics stack, from designing reward functions to reasoning about the environment and planning actions.
  • As the industry moves toward foundation models optimized for robotics, techniques like ECoT could help make robot control policies more robust, interpretable, and generalizable.
