“Our model is trained on a dataset of human-robot interactions, where an expert operator is asked to vary the interactions and mood of the robot, while the operator commands as well as the pose of the human and robot are recorded,” Disney Research explains.
“Our approach learns to predict continuous operator commands through a diffusion process and discrete commands through a classifier, all unified within a single transformer architecture,” the researchers continue.
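The paper's code isn't reproduced in the article, but the quoted description, one transformer backbone feeding both a diffusion head for continuous commands and a classifier head for discrete ones, can be sketched in broad strokes. The following PyTorch snippet is a hypothetical illustration of that layout; the dimensions, layer counts, and token arrangement are all assumptions, not Disney Research's actual implementation.

```python
import torch
import torch.nn as nn

class CommandPolicy(nn.Module):
    """Illustrative sketch: a single transformer backbone with two heads,
    a diffusion (denoising) head for continuous operator commands and a
    classifier head for discrete commands. All sizes are placeholders."""

    def __init__(self, obs_dim=64, cmd_dim=8, num_discrete=10, d_model=128):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)    # human + robot poses
        self.cmd_proj = nn.Linear(cmd_dim, d_model)    # noised continuous command
        self.time_proj = nn.Linear(1, d_model)         # diffusion timestep embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.noise_head = nn.Linear(d_model, cmd_dim)      # predicts added noise
        self.class_head = nn.Linear(d_model, num_discrete) # discrete command logits

    def forward(self, obs, noisy_cmd, t):
        # Stack observation, noised command, and timestep as a 3-token sequence.
        tokens = torch.stack([
            self.obs_proj(obs),
            self.cmd_proj(noisy_cmd),
            self.time_proj(t),
        ], dim=1)
        h = self.backbone(tokens)
        eps_pred = self.noise_head(h[:, 1])  # denoising target (continuous commands)
        logits = self.class_head(h[:, 0])    # classification (discrete commands)
        return eps_pred, logits
```

Sharing one backbone between the two heads is what "unified within a single transformer architecture" suggests: both output types are conditioned on the same learned representation of the human and robot poses.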
From concept to reality
Once the interactions were recorded, the researchers fed the demonstrations to the AI system, which processed each one, learning the relationship between the operator's commands and the movements and reactions of both the human and the robot. Over time, it learned to replicate the operator's instinctive ability to make the robot move, respond, and behave in ways that felt remarkably lifelike.
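Continuing the hypothetical CommandPolicy sketch above, one imitation-learning step on such a recorded batch might look like this. The linear noising schedule and the combined loss (mean-squared error on the predicted noise plus cross-entropy on the discrete label) are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

policy = CommandPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# One training step on a recorded batch (random placeholder tensors here;
# in practice these would come from the teleoperation logs).
obs = torch.randn(32, 64)            # human + robot poses
cmd = torch.randn(32, 8)             # expert's continuous commands
label = torch.randint(0, 10, (32,))  # expert's discrete commands

t = torch.rand(32, 1)                    # random diffusion timestep in [0, 1]
noise = torch.randn_like(cmd)
noisy_cmd = (1 - t) * cmd + t * noise    # simple linear noising schedule (assumed)

eps_pred, logits = policy(obs, noisy_cmd, t)
loss = F.mse_loss(eps_pred, noise) + F.cross_entropy(logits, label)
opt.zero_grad(); loss.backward(); opt.step()
```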