2026-6-22

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at and huggingface.co/collections/nvidia/cosmos3. The project website is available at

Code

Model Checkpoint
Cosmos3-Super: huggingface.co/nvidia/Cosmos3-Super
Cosmos3-Nano: huggingface.co/nvidia/Cosmos3-Nano
Cosmos3-Super-Text2Image: huggingface.co/nvidia/Cosmos3-Super-Text2Image
Cosmos3-Super-Image2Video: huggingface.co/nvidia/Cosmos3-Super-Image2Video
Cosmos3-Nano-Policy-DROID: huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID

Open Synthetic Dataset
SDG-PhyxSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Physical-Interaction-Scenes
SDG-RobotSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Embodied-Robot-Scenes
SD