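Trustworthy Large Visual Generative Models Under Physical Constraints
Siyu Zhu, Fudan University

Speaker
Siyu Zhu, Professor, Fudan University
Siyu Zhu is a researcher, tenured full professor, and doctoral supervisor at the Institute of Artificial Intelligence Innovation and Industry, Fudan University. He received his bachelor's degree from Zhejiang University and his PhD from the Hong Kong University of Science and Technology. During his PhD he co-founded the 3D vision company Altizure, which was later acquired by Apple. From 2017 to 2023 he was a director at the Alibaba Cloud Artificial Intelligence Lab. Since 2023 he has been a researcher and doctoral supervisor at the Institute of Artificial Intelligence Innovation and Industry, Fudan University. His main research interests are video and 3D generative models, covering vision-based reconstruction, generation, understanding, and simulation of 3D scenes and video. He has published more than 60 papers in leading computer vision and machine learning conferences and journals, including CVPR, ICCV, ICLR, and TPAMI, and his work includes video generation models with notable industry impact such as Hallo, Champ, and AnimateAnything. He has ranked first in more than 40 international computer vision competitions and leaderboards.

Visual generative model
VAE: maximize the variational lower bound
[Figure: input and output]

Video generative methods
GAN: adversarial training
VAE: maximize the variational lower bound
Flow-based models: invertible transform of distributions
Diffusion models: gradually add Gaussian noise and then reverse the process

The field of video generation has seen rapid development, reaching several milestones.

Diffusion for visual generation (1)
Denoising Diffusion Probabilistic Models (DDPMs)

Diffusion for visual generation (2)
Stochastic Differential Equations (Score SDEs)

Key elements of visual diffusion models
Pixel diffusion (original input)
Latent space diffusion
U-Net
Transformer

Sora, breakthrough
Consistency: consistency in 3D rendering, long-range coherence, and object permanence.
High fidelity.
Surprising length: extended video length capability (Sora: 1 minute vs. previous systems: seconds).
Flexible resolution: generation of videos across various durations, aspect ratios, and resolutions.

Sora, key technologies
The DiT framework by Meta (2022.12) is designed for video processing.
Google's MAGViT (2022.12) focuses on video tokenization.
Google DeepMind introduced NaViT (2023.07) to support various resolutions and aspect ratios.
OpenAI's DALL-E 3 (2023.09) enhances video caption generation for improved conditioned video creation.

Modeling the physical world
We know that a real physical model is very complicated.
Probabilistic Bayesian inference; probabilistic graphical models. determi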