
Driving Toward Super-intelligence with Large-scale AI Infrastructure.pdf

Uploaded by: 明**** ID: 1011505 2025-12-21 14 pages 3.72MB

Jasmeet Bagga
Software Engineer, Network Infrastructure

Driving Toward Super-intelligence with Large-scale AI Infrastructure

TODAY
Meta is an AI company
[Chart: AI Cluster Size, number of connected accelerators; timeline 2006, 2024, 2026, 2028, 2030]

Since 2022, we've seen a major AI infrastructure change
2022: 6K clusters, job size 128-512 GPUs
2023: 16-24K clusters, job size 16K GPUs
2024-2025: 100K+ clusters
Software Infra, Physical Infrastructure, Models
AI at Meta

Personalized Super-intelligence requires large-scale infra
Prometheus: 1GW+ cluster by 2026
Hyperion: 5GW over next few years

Driving innovation for AI scale and performance
Data, Software, Compute, Network

Scale Up Domain Growing
Increased Power Envelope
Efficiency and Safety
Scaleup Domains: TODAY, 2030

AI Cluster Driven Opportunities for Rack Scale System Design
100 Accelerators, 70 KW
100 Accelerators, 0.5 MW
+/-400V DC

Path to Bigger Scale Up Domain
Dimensional Drivers: 1200mm by 1200mm
Power and Cooling Needs
Weight Considerations
Scale-up network: high-bw, low-latency, more accelerators
Things to Consider

Scale-out, AI Fabrics
Disaggregated Scheduled Fabric (DSF):
Lossless/reliable fabric of switches
Tuned for AI
Provides flexibility & speed across multiple generations and types of accelerators and NICs

Next Step for Scale-out, AI Fabrics
Today: Sharing 2-stage DSF that scales to 18K accelerators! (4x)
DSF Dual stage (Building, 4x AI ZONE)
Size of single DSF L2 zone: 18K x 800G GPUs
[Diagram labels: SDSW, R3-stage 2, 1128; ZONE 1, ZONE 2, ZONE 3, ZONE 4; Non Blocking]

Next step for DSF Scale-out Fabric
Want the details? 1:40 PM, "Evolving FBOSS to support Generative AI Network Workloads"

Physical Infrastructure, Software Infra, Models
LLAMA
PyTorch, MAST, Tectonic
Silicon

Only 1% done - exciting times ahead!
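
The rack-scale and DSF slides quote a few raw figures (100 accelerators at 70 KW, 100 accelerators at 0.5 MW, and an L2 zone of 18K GPUs at 800G). The short Python sketch below works out what those numbers imply per accelerator and per zone. It is only back-of-the-envelope arithmetic, not anything taken from the talk: the function names are hypothetical, "18K" is assumed to mean 18,000, "800G" is assumed to mean 800 Gb/s per GPU, and the two power figures are assumed to describe whole racks of 100 accelerators each.

```python
# Back-of-the-envelope arithmetic on figures quoted in the slides.
# Assumptions (not stated in the source): "18K" = 18,000 GPUs,
# "800G" = 800 Gb/s per GPU, and each power figure covers one
# rack-scale system of 100 accelerators.


def watts_per_accelerator(rack_power_watts: float, accelerators: int) -> float:
    """Average power budget per accelerator, in watts."""
    return rack_power_watts / accelerators


def aggregate_bandwidth_pbps(gpus: int, gbps_per_gpu: float) -> float:
    """Total GPU-facing bandwidth of a zone, in petabits per second."""
    return gpus * gbps_per_gpu / 1_000_000  # 1 Pb/s = 1,000,000 Gb/s


today = watts_per_accelerator(70_000, 100)    # 70 KW system
future = watts_per_accelerator(500_000, 100)  # 0.5 MW system
zone = aggregate_bandwidth_pbps(18_000, 800)  # single DSF L2 zone

print(f"70 KW system:  {today:.0f} W per accelerator")
print(f"0.5 MW system: {future:.0f} W per accelerator")
print(f"Power growth:  {future / today:.1f}x")
print(f"L2 zone:       {zone:.1f} Pb/s aggregate")
```

Under these assumptions the per-accelerator budget rises from about 700 W to about 5 kW, which gives some context for the slide's mention of +/-400V DC distribution and of power and cooling needs.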

