
Photonic Networks for AI: Challenges and Opportunities.pdf

Uploaded by: 明**** ID: 1011389 2025-12-21 14 pages 1.94 MB

Photonic networks for AI: Challenges and Opportunities
George Zervas, Oriole Networks
SPECIAL FOCUS: PHOTONICS

AI performance doubles every 9 months
Jensen: "AI Revenues are Power-Limited"

xPUs and Network I/O
- Training xPUs: NVIDIA GB300, Amazon AWS Trainium2 chip, AMD MI325X. Multiple Tb/s, evolving to 10 Tb/s and beyond.
- Inference xPUs: Amazon AWS Inferentia2, GroqCard LPU, Google Ironwood. Sub Tb/s to one or a few Tb/s.
- Significant difference (2-7x)

Training and Inference: Network requirements
Training
- Training completion time per MW is KEY.
- Workloads tend to be long-lasting with a fixed computational graph: A) new model training and B) tuning an existing model.
- Distributed training uses both scale-up (10s of nodes) and scale-out (10K-100K) networks, with very different performance and scaling abilities.
Inference
- Tokens/s/kW is fundamental.
- Workloads are very short-lived, and patterns are less deterministic than in training.
- MoE and reasoning could trigger a variable set of xPU network connectivity and collectives per LLM prompt and within its lifetime.
- Collective communication networking is key.
Both networks benefit from fully connected, deterministically synchronous, lossless networks with fast reconfiguration = "zero" collective communication tail latency.

Google's deployment of OCS in Data Centers
- Can we carry on scaling the EPS networks? Diminishing returns in power consumption for future electronic switches and optics.
- OCS is used at the aggregation layer and supports reconfiguration times of 10s of ms.
- Source: Jupiter Evolving: Transforming Google's Datacenter Network via Optical Circuit Switches and Software-Defined Networking, SIGCOMM '22

Hypercube OCS Architecture
- ML 4096 TPU cluster: 64 racks, 64 TPUs/rack
- A rack of 64 TPUs with a 3D torus connectivity and
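The cluster arithmetic on the Hypercube OCS slide can be sketched in a few lines of Python. This is only an illustration: the slide text gives 64 racks and 64 TPUs per rack with 3D torus connectivity, while the 4x4x4 layout (`DIMS`) and the helper names `torus_neighbors` and `count_links` are assumptions introduced here, not taken from the deck.

```python
from itertools import product

RACKS = 64           # from the slide
TPUS_PER_RACK = 64   # from the slide
DIMS = (4, 4, 4)     # ASSUMPTION: 64 TPUs laid out as a 4x4x4 3D torus


def torus_neighbors(coord, dims=DIMS):
    """Return the directly connected neighbors of `coord` in a torus.

    Each axis has wraparound links, so stepping past the last
    position on an axis returns to position 0.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.add(tuple(n))
    neighbors.discard(coord)  # only matters for axes of size 1
    return sorted(neighbors)


def count_links(dims=DIMS):
    """Count the undirected links in the torus."""
    links = set()
    for c in product(*(range(d) for d in dims)):
        for n in torus_neighbors(c, dims):
            links.add(frozenset((c, n)))
    return len(links)


total_tpus = RACKS * TPUS_PER_RACK

if __name__ == "__main__":
    print("TPUs in cluster:", total_tpus)
    print("Neighbors of (0, 0, 0):", torus_neighbors((0, 0, 0)))
    print("Links per rack under the 4x4x4 assumption:", count_links())
```

Under the assumed 4x4x4 layout every TPU has six neighbors, two per axis. A different factorization of 64 would change the neighbor and link counts but not the 4096 total.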

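Based on the report, the main content is summarized as follows:
- **AI performance is rising fast**: AI performance doubles every 9 months, and AI revenues are power-limited.
- **Training and inference network requirements**: Training completion time is key; distributed training combines scale-up and scale-out networks.
- **Advantages of photonic networks**: Photonic networks provide high-speed, low-power, highly reliable connectivity.
- **Google's photonic network deployment**: Optical circuit switching (OCS) and software-defined networking (SDN) are used in data centers and machine learning applications.
- **OCS network challenges**: Existing OCS networks reconfigure slowly, which may affect performance.
- **Oriole Networks**: Focuses on photonic networks to accelerate AI; headquartered in London, with an engineering center in Paignton and an office in Palo Alto.

Key points:
- AI performance doubles every 9 months.
- Distributed training combines scale-up and scale-out networks.
- Photonic networks provide high-speed, low-power connectivity.
- Google uses OCS and SDN.
- Existing OCS networks reconfigure slowly.
- Oriole Networks focuses on photonic networks to accelerate AI.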