Alter3, a GPT-4 powered humanoid robot, showcases the potential of combining advanced language models with robotics to create more realistic and adaptable robot behaviors.
Harnessing the power of large language models: Alter3 leverages GPT-4’s vast knowledge to directly map natural language commands to robot actions, simplifying the process of controlling the robot’s 43 axes:
- Researchers at the University of Tokyo and Alternative Machine have designed Alter3 to take advantage of GPT-4’s capabilities, enabling it to perform complex tasks like taking a selfie or mimicking a ghost.
- GPT-4 acts as a planner, first determining the steps required to perform the desired action and then, using in-context learning, generating the commands the robot executes at each step (a rough sketch of this two-stage prompting follows below).
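To make the pipeline concrete, here is a minimal sketch of the two-stage prompting idea described above. It is illustrative rather than the researchers' actual code: the prompt wording, the `set_axis(axis_id, value)` helper, the 0-255 value range, and the `instruction_to_commands` function name are all assumptions; only the 43-axis figure comes from the article.

```python
# Illustrative sketch: GPT-4 plans a movement, then translates it into
# low-level axis commands for a 43-axis humanoid body.
from openai import OpenAI

client = OpenAI()

PLAN_PROMPT = (
    "You control a humanoid robot with 43 movable axes. "
    "Break the following instruction into a short numbered list of body movements:\n{instruction}"
)

CODE_PROMPT = (
    "Convert each movement step into Python calls of the form set_axis(axis_id, value), "
    "where axis_id is 0-42 and value is 0-255. Return only code.\n\nSteps:\n{plan}"
)

def instruction_to_commands(instruction: str) -> str:
    # Stage 1: GPT-4 acts as a planner and decomposes the natural-language instruction.
    plan = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PLAN_PROMPT.format(instruction=instruction)}],
    ).choices[0].message.content

    # Stage 2: GPT-4 maps each step to axis commands; in practice the prompt would
    # also carry a few in-context examples of valid command sequences.
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": CODE_PROMPT.format(plan=plan)}],
    ).choices[0].message.content

if __name__ == "__main__":
    print(instruction_to_commands("Take a selfie with your right hand."))
```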
Refining actions through human feedback: Since language may not always precisely describe physical poses, Alter3 incorporates a feedback loop that allows humans to provide corrections, further improving the robot’s performance:
- Users can provide feedback such as “Raise your arm a bit more,” which is sent to another GPT-4 agent that reasons over the code, makes necessary corrections, and returns the updated action sequence to the robot.
- The refined action recipe and code are stored in a database for future use, enabling Alter3 to learn and adapt its behaviors over time (see the sketch after this list).
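The following sketch shows one way this human-in-the-loop refinement could work, under stated assumptions: the `REFINE_PROMPT` text, the `refine_action` function, and the in-memory dict standing in for the persistent action database are all hypothetical, chosen only to mirror the loop the article describes.

```python
# Illustrative sketch: a correction such as "Raise your arm a bit more" is passed
# to a second GPT-4 agent together with the current action code; the revised code
# is stored under a label so the refined motion can be replayed later.
from openai import OpenAI

client = OpenAI()
motion_memory: dict[str, str] = {}  # stand-in for the persistent action database

REFINE_PROMPT = (
    "Here is robot action code made of set_axis(axis_id, value) calls:\n{code}\n\n"
    "Apply this human correction and return only the updated code:\n{feedback}"
)

def refine_action(label: str, code: str, feedback: str) -> str:
    # Ask GPT-4 to reason over the existing code and apply the human correction.
    revised = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": REFINE_PROMPT.format(code=code, feedback=feedback)}],
    ).choices[0].message.content
    motion_memory[label] = revised  # keep the refined recipe for future reuse
    return revised
```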
Demonstrating emotional expression and realistic behaviors: GPT-4’s extensive knowledge about human behaviors and actions enables Alter3 to create more realistic behavior plans and even mimic emotions:
- Experiments show that Alter3 can mimic emotions such as embarrassment and joy, even when emotional expressions are not explicitly stated in the text instructions.
- GPT-4’s linguistic representations of movements can be accurately mapped onto Alter3’s body, resulting in more natural and human-like behaviors.
The growing trend of foundation models in robotics: Alter3 is part of a growing body of research that combines the power of foundation models with robotics systems:
- Other projects, such as Figure, RT-2-X, and OpenVLA, also utilize foundation models as reasoning and planning modules in robotics control systems, showcasing the potential of this approach.
- As multi-modality becomes the norm in foundation models, robotics systems will become better equipped to reason about their environment and choose their actions.
Analyzing deeper: While the integration of advanced language models like GPT-4 with robotics systems is a significant step forward, there are still challenges to be addressed:
- Projects like Alter3 often sidestep the foundational challenges of building robots that can perform primitive tasks such as grasping objects, maintaining balance, and moving around.
- Fine-tuned foundation models specifically designed for robotics commands, such as RT-2-X and OpenVLA, may produce more stable results and generalize better to various tasks and environments, but they require technical skills and are more expensive to create.
- The lack of data for low-level robot tasks remains a significant hurdle in the development of more advanced and adaptable robotics systems.