Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer

Gemini Robotics Team, Google DeepMind¹

General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We bring together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enable it to learn from heterogeneous, multi-embodiment robot data and make the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This lets the robot "think before acting", which notably improves its ability to decompose and execute complex, multi-step tasks and also makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state of the art in embodied reasoning, i.e., in the reasoning capabilities that are critical for robots,
such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think, and then act so they can solve complex multi-step tasks.

1. Introduction

Truly general robots will require a deep understanding of the physical world. Our previous work, Gemini Robotics (Gemini-Robotics-Team et al., 2025), established a strong foundation by leveraging Gemini's rich world knowledge to create a Vision-Language-Action (VLA) model that exhibits impressive interactivity, generality, and dexterity in direct r