The University of Tokyo, in collaboration with Alternative Machine, has developed Alter3, a humanoid robot system that maps natural language commands directly to robot actions. This marks significant progress in research that combines foundation models with robotic systems. Alter3's underlying model is GPT-4, which enables it to complete a range of tasks, from taking a simple selfie to more elaborate behaviors such as pretending to be a ghost, demonstrating considerable application potential. Although still at the research stage, this technology points the way for the future development of robotics.
Researchers from the University of Tokyo in Japan, working with Alternative Machine, have made a new breakthrough: Alter3, a humanoid robot system that maps natural language commands directly to robot actions. Its underlying model is GPT-4, and it can perform a range of tasks such as taking a selfie or pretending to be a ghost.

This is one of a growing number of research results that combine foundation models with robotic systems. Although these systems have not yet matured into scalable commercial solutions, they have advanced robotics research in recent years and show great potential.
Alter3 uses GPT-4 as its underlying model to receive natural language instructions describing actions, or situations to which the robot needs to respond. First, the model uses an agent framework to plan the sequence of action steps the robot must take to accomplish the goal. Second, a coding agent generates the commands the robot needs to execute each step. Because GPT-4 was not trained on Alter3's programming commands, the researchers used its in-context learning capability to adapt its output to the robot's API; a sketch of the planning stage follows.
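The paper's exact prompts are not reproduced here, but a minimal sketch of the first, planning stage could look like the following, assuming the OpenAI Python SDK; the prompt wording and the function name plan_action_steps are illustrative, not the researchers' actual implementation.

```python
# Sketch of the planning stage: GPT-4 decomposes a natural language
# instruction into an ordered list of motion steps for the robot.
# Assumes the OpenAI Python SDK; the prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

PLANNER_SYSTEM_PROMPT = (
    "You control the humanoid robot Alter3. Given an instruction, "
    "break it into a short, numbered sequence of physical motion steps "
    "(head, arms, torso, facial expression). Output one step per line."
)

def plan_action_steps(instruction: str) -> list[str]:
    """Ask GPT-4 to turn an instruction into discrete motion steps."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PLANNER_SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

# Example call (output is illustrative):
# plan_action_steps("Take a selfie with your phone")
# -> ["1. Raise the right arm and extend it forward", "2. Tilt the head slightly", ...]
```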

The prompt therefore contains a list of available commands and a set of examples showing how each command is used. The model then maps each planned step to one or more API commands, which are sent to the robot for execution.
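A minimal sketch of this second, command-mapping stage is shown below. The robot API used in the few-shot examples (a set_axis command) is purely hypothetical; Alter3's real control commands are not documented in this article, and the point is only to illustrate how in-context examples teach GPT-4 an API it was never trained on.

```python
# Sketch of the command-mapping stage: a few-shot prompt teaches GPT-4 the
# robot's control API via in-context learning. The set_axis command below
# is a hypothetical stand-in for Alter3's real API.
from openai import OpenAI

client = OpenAI()

CODER_SYSTEM_PROMPT = (
    "You translate motion steps for the Alter3 robot into API calls.\n"
    "Available command:\n"
    "  set_axis(axis_id: int, value: float)  # drive one joint axis, value in [0, 1]\n"
    "Examples:\n"
    "  Step: 'Raise right arm' -> set_axis(18, 0.9)\n"
    "  Step: 'Open mouth slightly' -> set_axis(3, 0.3)\n"
    "Return only the API calls, one per line."
)

def steps_to_commands(steps: list[str]) -> list[str]:
    """Map each planned step to one or more robot API commands."""
    commands: list[str] = []
    for step in steps:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": CODER_SYSTEM_PROMPT},
                {"role": "user", "content": f"Step: {step}"},
            ],
        )
        commands.extend(
            line.strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()
        )
    return commands
```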
The researchers also added functionality so that humans can provide feedback, such as "raise your arm a little higher." These instructions are sent to another GPT-4 agent, which reasons over the generated code, makes the necessary corrections, and returns the revised sequence of actions to the robot. The improved action recipes and code are stored in a database for future use.
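A sketch of how such a feedback-and-memory loop might be wired up is shown below. The corrector prompt and the SQLite storage schema are assumptions for illustration; the article does not specify how the researchers store their action recipes.

```python
# Sketch of the feedback loop: a second GPT-4 "corrector" agent revises the
# command sequence based on verbal feedback, and the improved action recipe
# is stored for later reuse. The storage schema is an assumption.
import json
import sqlite3

from openai import OpenAI

client = OpenAI()

def refine_commands(commands: list[str], feedback: str) -> list[str]:
    """Ask GPT-4 to correct the command sequence given human feedback."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Revise the robot command sequence according to the user's "
                "feedback. Return only the corrected commands, one per line."
            )},
            {"role": "user", "content": (
                "Commands:\n" + "\n".join(commands) + f"\nFeedback: {feedback}"
            )},
        ],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

def store_recipe(db_path: str, instruction: str, commands: list[str]) -> None:
    """Persist the improved action recipe so it can be reused later."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS recipes (instruction TEXT, commands TEXT)")
    con.execute("INSERT INTO recipes VALUES (?, ?)", (instruction, json.dumps(commands)))
    con.commit()
    con.close()
```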

The researchers ran multiple tests on Alter3, covering everyday actions, such as taking a selfie and drinking tea, as well as imitative actions, such as pretending to be a ghost or a snake. They also tested the model's ability to handle situations that require carefully planned actions. GPT-4's broad understanding of human behavior and motion enables more realistic behavior plans for humanoid robots such as Alter3. The experiments also showed that the robot could mimic emotions such as shame and joy.
Highlights:
- Alter3 is the latest humanoid robot to use GPT-4 for reasoning, mapping natural language instructions directly to the robot's actions.
- The researchers leveraged GPT-4's in-context learning capability to adapt its output to the robot's API, allowing the robot to execute the desired sequence of action steps.
- Adding human feedback and memory could improve Alter3's performance, and the experiments also showed that the robot could mimic emotions such as shame and joy.
The success of Alter3 demonstrates GPT-4's great potential in the field of robot control, paving the way for smarter and more flexible robot systems in the future. This research breakthrough heralds a new revolution in human-computer interaction.