2026-6-22

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at […] and huggingface.co/collections/nvidia/cosmos3. The project website is available at […].

Code: […]

Model Checkpoint
Cosmos3-Super: huggingface.co/nvidia/Cosmos3-Super
Cosmos3-Nano: huggingface.co/nvidia/Cosmos3-Nano
Cosmos3-Super-Text2Image: huggingface.co/nvidia/Cosmos3-Super-Text2Image
Cosmos3-Super-Image2Video: huggingface.co/nvidia/Cosmos3-Super-Image2Video
Cosmos3-Nano-Policy-DROID: huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID

Open Synthetic Dataset
SDG-PhyxSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Physical-Interaction-Scenes
SDG-RobotSim: huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Embodied-Robot-Scenes
SD