Alex Ramirez, Platforms Performance, Google DeepMind

Delivering performant AI solutions
- Larger ML models are better, but larger models require more compute performance.
- Delivering more and more compute performance means delivering more accelerators, which consume more power, and doing it fast!

Compute capacity is key to delivering the next wave of AI innovations
- Making the best use of the available compute resources matters.
- Make forward progress in the presence of more frequent failures.
- Distribute the compute budget: model size vs. training examples.
- Scaling performance to the full system: mapping the ML model to the system.
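One way to make the "model size vs. training examples" trade-off concrete: for dense transformers, training compute is commonly approximated as C ≈ 6·N·D FLOPs (N parameters, D training tokens), and compute-optimal scaling results suggest spending a fixed budget with roughly D ≈ 20·N. The sketch below uses those published rules of thumb; the constants and the budget value are assumptions for illustration, not numbers from this talk.

```python
# Rough compute-budget split for a fixed training budget, using the
# common approximation C ~= 6 * N * D (N = parameters, D = tokens)
# and the compute-optimal heuristic D ~= r * N with r ~= 20.
# These constants are published rules of thumb, not from this talk.

def split_budget(flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that spend `flops` with D = r * N."""
    # C = 6 * N * (r * N)  =>  N = sqrt(C / (6 * r))
    n_params = (flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    budget = 1e24  # FLOPs; an assumed budget for illustration
    n, d = split_budget(budget)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Doubling the budget under this rule grows both the model and the dataset by about 1.4x each, rather than putting everything into model size.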
In this presentation
- "To train the best model we can": that is not the same as "to train the largest ML model we can", but we will still take the best model and make it as large as we can.

Larger models are better
[Figure: training compute (FLOPs) over time, 1960-2020, spanning the pre-deep-learning era, the deep learning era, and the large model era. Image source: "Compute trends across three eras of machine learning", Jaime Sevilla et al.]

Performance demand grows faster than accelerator performance
- The number of accelerators per system is steadily increasing.
[Figure: increasing number of accelerators over time.]
- Dennard scaling is over: more devices means more power.
- Provisioning for that increasing number of accelerators becomes a critical part of performance delivery.

More accelerators means more power
[Figure: number of devices, power per device, and total power, each growing over time.]
- Demand for higher compute performance requires larger systems with more accelerators, which take more power.
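The trend on this slide is simply the product of two growing factors: total power = number of devices × power per device. A toy sketch of that compounding, with made-up starting values and growth rates (the talk gives no numbers):

```python
# Toy illustration of the slide's point: total power = devices x power/device.
# Even modest growth in each factor compounds into steep total-power growth.
# The base values and annual growth rates below are illustrative assumptions.

def total_power_kw(year_index: int,
                   base_devices: int = 1000,
                   device_growth: float = 1.5,    # devices: 1.5x/year (assumed)
                   base_power_w: float = 300.0,
                   power_growth: float = 1.2):    # power/device: 1.2x/year (assumed)
    devices = base_devices * device_growth ** year_index
    watts_per_device = base_power_w * power_growth ** year_index
    return devices * watts_per_device / 1000.0

if __name__ == "__main__":
    for y in range(4):
        print(f"year {y}: total power ~ {total_power_kw(y):,.0f} kW")
```

Even with these modest assumed rates, total power grows 1.8x per year, which is why provisioning becomes the critical constraint.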
- Two options, both of them challenging (and not exclusive):
  a) Keep power density constant: larger systems that need more space.
  b) Increase power density: cooling and power delivery challenges.

Power provisioning becomes critical
- Delivering more performance matters but