GR-3 Technical Report

ByteDance Seed

Full Author List in Contributions and Acknowledgements

Abstract

We report our recent progress towards building generalist robot policies, the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, showcasing robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we introduce ByteMini, a versatile bi-manual mobile robot designed with exceptional flexibility and reliability, capable of accomplishing a wide range of tasks when integrated with GR-3. Through extensive real-world experiments, we show GR-3 surpasses the state-of-the-art baseline method, π0, on a wide variety of challenging tasks. We hope GR-3 can serve as a step towards building generalist robots capable of assisting humans in daily life.

Date: July 23, 2025
Correspondence:
Project Page: https:/

The pursuit of intelligent generalist robots that are
capable of assisting humans with daily tasks has been a long-standing goal in robotics research [3, 7, 9–11, 13, 67]. A key challenge stems from the immense diversity of the real world, requiring robot policies to possess strong generalization capabilities to handle a wide range of novel scenarios. Additionally,