
In Network Collective Acceleration for AI Fabrics.pdf

Uploaded by: 明**** | ID: 1011952 | 2025-12-21 | PDF | 13 pages | 885.50 KB

Part of the collection: 2025 OCP Global Summit speaker presentation slides


In Network Collective Acceleration for AI Fabrics
Surendra Anubolu - Broadcom
Nikhil Shetty - Oracle
Networking

Agenda
- Bandwidth and latency needs for AI fabric
- Challenges
- Proposed In Network Collectives for Ethernet fabric
- Share data: Tomahawk Ultra In Network collective performance
- Call to action: infrastructure, APIs

Collectives account for 90+% of the bandwidth
- All Reduce
- All to All
- All Gather
- Reduce Scatter
- Large model sizes: very high bandwidth
- Inference: low-latency completions
- MoE: k of N multicast

AI fabric bandwidth and latency challenges
- Collectives consume most of the fabric bandwidth
- Tensor Parallel and Expert Parallel have communications that are exposed

AI workload traffic patterns

| Parallelism       | Collective       | Fabric load |
|-------------------|------------------|-------------|
| Tensor Parallel   | AllReduce        | 50%         |
| Expert Parallel   | AllToAll, Gather | 1 to 10%    |
| Sequence Parallel | All Gather       | 30 to 40%   |
| Data Parallel     | All Reduce       | 5%          |
| Pipeline Parallel | P2P              | 0.2%        |

Example collective data transfer usage

Why offload
- High-bandwidth communication is a major component of collectives
- Network switches have one or two orders of magnitude more fabric bandwidth than end points
- Predictable latency + tail latency
- Collectives require very little compute
  - 51 Tbps requires only 3 TFLOPS of BF16 adders
  - Some collectives like k of N do not require any compute
- Fabric is a natural place to accelerate collectives

In Network Collectives - offload
- GPU: 1000 TFLOPS, 400 G to 7 Tbps
- Switch with INC: 3 TFLOPS, 50 Tbps
- Switch participates in the collective
- Offloads the collective compute such as all_reduce
- At the start of the job, INC Manager allocates switch compute resources and builds a tree
- Tree can be reused for multiple collectives
- INC Manager can work with load sharing facility to reserve resources
- xCCL, libfabric and MPI plugins

Architecture fo
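The offload flow described in the slide text (a manager builds a tree at job start, switches sum the data passing through them, and the tree is reused across collectives) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Endpoint`, `Switch`, `build_tree` and `all_reduce` are hypothetical names invented here, the tree shape is a simple fixed-radix grouping, and nothing below reflects the actual INC Manager, Tomahawk Ultra, xCCL, libfabric or MPI interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Union

Vector = List[float]


@dataclass
class Endpoint:
    """Hypothetical GPU/NIC that contributes one vector to the collective."""
    data: Vector


@dataclass
class Switch:
    """Hypothetical switch that sums whatever its children send upward."""
    children: List[Union["Switch", Endpoint]] = field(default_factory=list)

    def reduce_up(self) -> Vector:
        # Element-wise sum of the partial results from every child.
        partials = [
            c.reduce_up() if isinstance(c, Switch) else c.data
            for c in self.children
        ]
        return [sum(vals) for vals in zip(*partials)]

    def broadcast_down(self, result: Vector) -> None:
        # Push the final result back to every endpoint below this switch.
        for c in self.children:
            if isinstance(c, Switch):
                c.broadcast_down(result)
            else:
                c.data = list(result)


def build_tree(endpoints: List[Endpoint], radix: int) -> Switch:
    """Group nodes under switches, `radix` per switch, until one root remains."""
    if not endpoints or radix < 2:
        raise ValueError("need at least one endpoint and radix >= 2")
    level: list = list(endpoints)
    while True:
        level = [Switch(level[i:i + radix]) for i in range(0, len(level), radix)]
        if len(level) == 1:
            return level[0]


def all_reduce(root: Switch) -> Vector:
    """Sum on the way up the tree, then broadcast the result on the way down."""
    result = root.reduce_up()
    root.broadcast_down(result)
    return result


gpus = [Endpoint([1.0, 2.0]), Endpoint([3.0, 4.0]),
        Endpoint([5.0, 6.0]), Endpoint([7.0, 8.0])]
tree = build_tree(gpus, radix=2)   # built once, at "job start"
print(all_reduce(tree))            # [16.0, 20.0]
print(all_reduce(tree))            # same tree reused: [64.0, 80.0]
```

In this model each endpoint only hands its vector to the tree and receives the result; all additions happen inside `Switch.reduce_up`, which is the sense in which the collective compute is offloaded to the fabric.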


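According to the report, the main content is summarized as follows:
- **In-network collective acceleration**: Collectives account for more than 90% of the bandwidth in AI fabrics and are used for high-bandwidth communication and low-latency completion.
- **Bandwidth and latency challenges**: AI fabrics face bandwidth and latency challenges, especially for large models and inference workloads.
- **In-network collective performance**: Tomahawk Ultra supports in-network collectives, providing predictable performance and low tail latency, with collective completion time improved by 2x.
- **INC architecture**: In Network Collectives (INC) has the switch participate in the collective computation, reducing the compute required at the endpoints.
- **OCI performance challenges**: Oracle Cloud Infrastructure (OCI) faces challenges in collective algorithm execution and data transfer.
- **INC benefits**: INC relieves upper-layer congestion, lowers completion time, improves tail latency, and frees up compute resources.
- **OCI encourages INC**: OCI encourages implementing INC through open interfaces, with support for configurability and monitoring, and for scalable fabrics.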
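Two of the quantitative claims can be sanity-checked with back-of-envelope arithmetic. The sketch below first estimates the adder rate a switch needs to reduce BF16 data at line rate, assuming one addition per 16-bit element received (an assumption made here, not stated in the slides). It then compares the bytes each endpoint sends in a standard ring all-reduce, 2(N-1)/N times the buffer size, with the single buffer an endpoint sends when the switch performs the reduction. The function names are hypothetical, and the source does not say how its 2x completion-time figure was obtained; the comparison only shows that endpoint traffic alone gives a ratio approaching 2.

```python
BF16_BITS = 16  # size of one BF16 element in bits


def bf16_adds_per_second(ingress_bps: float) -> float:
    """Additions per second to reduce BF16 data arriving at `ingress_bps`.

    Assumes one add per 16-bit element received.
    """
    return ingress_bps / BF16_BITS


def ring_all_reduce_sent(size_bytes: float, n: int) -> float:
    """Bytes each endpoint sends in a ring all-reduce over `n` endpoints
    (reduce-scatter followed by all-gather)."""
    return 2 * (n - 1) / n * size_bytes


def in_network_all_reduce_sent(size_bytes: float) -> float:
    """Bytes each endpoint sends when the switch does the reduction:
    the buffer goes up the tree once."""
    return size_bytes


# 51 Tbps of BF16 traffic works out to roughly 3.2e12 additions per second.
print(bf16_adds_per_second(51e12) / 1e12)

# Per-endpoint send volume for a 1 GB buffer across 8 endpoints.
ring = ring_all_reduce_sent(1e9, 8)
inc = in_network_all_reduce_sent(1e9)
print(ring / inc)  # 1.75, approaching 2 as the endpoint count grows
```

The first result is in line with the slide's statement that 51 Tbps requires only about 3 TFLOPS of BF16 adders, which is small next to the 1000 TFLOPS quoted for a GPU.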
