
Researchers from UC Berkeley, the University of Warsaw and Stanford have developed a new technique called Embodied Chain-of-Thought (ECoT) reasoning to enhance the decision-making capabilities of vision-language-action (VLA) models used in robotic control systems.

Key Takeaways: ECoT enables robots to reason about their actions in a way that is grounded in their perception of the environment, combining semantic reasoning about tasks with “embodied” reasoning about the robot’s state and surroundings:

  • By generating intermediate reasoning steps, ECoT allows VLAs to better map the relationships between different parts of a problem and come up with more accurate solutions, similar to how Chain-of-Thought (CoT) prompting has improved the performance of large language models (LLMs).
  • ECoT goes beyond just breaking down tasks into sub-tasks, as is common in CoT for LLMs, by also requiring the model to reason about the environment, spatial relationships, and how the robot’s available actions can help achieve the goal.
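To make the idea of "reasoning before acting" concrete, here is a minimal sketch of what a single model output might look like if the intermediate reasoning is emitted as text ahead of the action. The class `EmbodiedReasoning`, the function `build_target`, the field labels (`TASK`, `PLAN`, `SUBTASK`, `GRIPPER`, `ACTION`) and the example values are all assumptions chosen for illustration, not the researchers' actual format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EmbodiedReasoning:
    """Hypothetical container for one step of intermediate reasoning."""
    task: str                    # rephrased instruction (semantic reasoning)
    plan: List[str]              # ordered sub-tasks (semantic reasoning)
    current_subtask: str         # what to focus on right now
    gripper_px: Tuple[int, int]  # gripper location in the image (embodied reasoning)


def build_target(reasoning: EmbodiedReasoning, action: List[float]) -> str:
    """Serialize the reasoning first and the action last, as one text target."""
    lines = [
        f"TASK: {reasoning.task}",
        "PLAN: " + "; ".join(reasoning.plan),
        f"SUBTASK: {reasoning.current_subtask}",
        f"GRIPPER: {reasoning.gripper_px[0]}, {reasoning.gripper_px[1]}",
        "ACTION: " + " ".join(f"{a:.2f}" for a in action),
    ]
    return "\n".join(lines)


# Illustrative values only; the action vector layout is an assumption.
example = EmbodiedReasoning(
    task="Put the carrot on the plate",
    plan=["reach the carrot", "grasp the carrot", "move to the plate", "release"],
    current_subtask="reach the carrot",
    gripper_px=(112, 96),
)
target = build_target(example, [0.02, -0.01, 0.00, 0.0, 0.0, 0.0, 1.0])
print(target)
```

Placing the action at the end means every action token is generated with the task, plan and scene description already in context, which is the same ordering that CoT prompting relies on for LLMs.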

Overcoming challenges in applying CoT to robotics: Directly applying CoT techniques used in LLMs to VLAs posed several challenges that the researchers had to address:

  • Current VLAs rely on relatively small, open-source vision-language models (VLMs) that are less proficient at reasoning than the larger LLMs used in language applications.
  • Robotic tasks require the model to reason not only about the task itself but also about the environment and the robot’s own state, necessitating a more “embodied” form of reasoning.

Generating synthetic training data for ECoT: To enable VLA models to perform ECoT reasoning, the researchers created a pipeline to generate annotated training data:

  • Pre-trained object detectors, LLMs, and VLMs are used to annotate existing robot datasets with information that can be used for reasoning, such as object bounding boxes and spatial relationships.
  • Google’s Gemini model is then employed to generate the final reasoning chain, which includes rephrasing the instruction, outlining sub-tasks, identifying the current focus based on the environment and robot state, and predicting pixel locations of key elements.
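As a rough illustration of the annotation step, the sketch below turns object bounding boxes into simple spatial-relationship statements of the kind that could be fed into a reasoning chain. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with the y axis pointing down. The function names, the center-based rule and the example detections are hypothetical and stand in for whatever the actual pipeline computes.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def spatial_relation(a: Box, b: Box) -> str:
    """Describe where box a sits relative to box b, based on box centers.

    The dominant axis decides the relation; image y grows downward,
    so a smaller y means "above". Identical centers fall back to "right of".
    """
    (ax, ay), (bx, by) = center(a), center(b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


def annotate(detections: Dict[str, Box]) -> List[str]:
    """Produce one relation statement per unordered pair of detected objects."""
    names = sorted(detections)
    return [
        f"{a} is {spatial_relation(detections[a], detections[b])} {b}"
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]


# Made-up detections, as an object detector might return them.
detections = {"carrot": (40, 120, 80, 150), "plate": (160, 110, 230, 170)}
print(annotate(detections))
```

In a real pipeline these statements would be one input among several when a language model composes the final reasoning chain; the center-based rule here is only the simplest possible stand-in.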

Impressive performance gains and improved interpretability: Evaluating ECoT on a robotic manipulation setup using OpenVLA yielded significant improvements:

  • ECoT increased the absolute task success rate by 28% compared to the baseline OpenVLA model, with strong generalization to new objects, scenes, viewpoints and instructions not present in the training data.
  • Expressing the reasoning steps in natural language made it much easier to understand why the model failed in certain situations, enabling humans to provide feedback and correct the policy’s behavior more effectively.
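Because the reasoning is plain text, a human correction can be as simple as editing one step and passing the result back to the policy. The sketch below shows that idea under the assumption that the reasoning chain is a series of `KEY: value` lines. The helper names, the labels and the example output are hypothetical, and how a real policy would consume the corrected chain is not shown.

```python
from typing import Dict


def parse_chain(text: str) -> Dict[str, str]:
    """Split a 'KEY: value' reasoning chain into an ordered dictionary."""
    chain: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            chain[key.strip()] = value.strip()
    return chain


def correct_step(chain: Dict[str, str], step: str, new_value: str) -> Dict[str, str]:
    """Return a copy of the chain with one reasoning step overridden by a human."""
    if step not in chain:
        raise KeyError(f"unknown reasoning step: {step}")
    fixed = dict(chain)
    fixed[step] = new_value
    return fixed


def serialize(chain: Dict[str, str]) -> str:
    """Turn the chain back into text, preserving the original step order."""
    return "\n".join(f"{key}: {value}" for key, value in chain.items())


# Made-up model output in which the sub-task names the wrong object.
model_output = "TASK: Put the carrot on the plate\nSUBTASK: grasp the spoon"
chain = parse_chain(model_output)
corrected = correct_step(chain, "SUBTASK", "grasp the carrot")
print(serialize(corrected))
```

Reading the parsed chain also shows where a failure originated: here the task was understood correctly and only the sub-task was wrong, which is the kind of diagnosis a raw action vector does not allow.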

Broader implications for foundation models in robotics: ECoT is part of a growing trend of integrating foundation models, such as LLMs and VLMs, into robotic control systems to fill in gaps and enhance capabilities:

  • Foundation models’ ability to ingest large amounts of unlabeled data from the internet allows them to contribute to various parts of the robotics stack, from designing reward functions to reasoning about the environment and planning actions.
  • As the industry moves toward foundation models optimized for robotics, techniques like ECoT could play an important role in enabling more robust, interpretable, and generalizable robot control policies.
Researchers develop technique to give robots “embodied reasoning” abilities
