AIM252
How customers build AI at scale with AWS AI infrastructure

Andrea Klein, Director Engineering
Henri Dwyer, Sr Director Engineering
Shaunak Godbole, Field CTO
Aniruddha Deodhar, Principal Specialist

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

AWS
    What drives performance, cost and scale
    Scale out, Scale up, and Optimize AI stack
Arm
    Scaling out with GPU clusters
    Containerization, scheduling, orchestration
Genentech
    Scaling up model training
    Agentic Lab-in-the-loop
Fireworks
    Principles of building Agents
    Optimizing across the inference stack
Q&A

Key workloads demanding rapid scaling

GenAI training
    LLMs (1B+ to 100B+ params)
    Frontier models (1T+ params)
    Multi-modal models
    Mixture of Experts (MoE)
GenAI inference
    Chat/voice bots
    Image/video generation
    Code assistants
    Agents
High performance compute
    Scientific simulations
    Autonomous driving
    Protein folding

Massive amount of parallel computing for 1B to 1T+ parameter models
Process petabytes of data
Perform complex mathematical calculations

What drives performance, cost, and scale?

Training
    KPI: Time to train
    Longer duration, episodic runs
    Large resilient clusters of latest high-performance accelerators
    Tightly coupled, I/O intensive, latency sensitive networking
    [Diagram labels: Placement group, Availability Zone, Amazon FSx for Lustre]

[Diagram labels: EKS node group, Availability Zone 1, Availability Zone 2]

Inference
    KPI: Latency, Throughput, $/token
    Geo scaling with high availability
    Spiky or sporadic, but long-term
    Loosely coupled, fast I/O
    Query latency sensitive

Customization
    KPI: Iteration time
    Short duration frequent cycles
    Heterogeneous compute
    Latency sensitive, I/O intensive
    [Diagram labels: Availability Zone, GPU, GPU, CPU, Amazon FSx for Lustre]

2025,Amazon Web Services,Inc.or its affili