
Scale-up AI Networking Options: UALink, SUE, and NVLink Compared.pdf

Uploaded by: 明**** | ID: 1011479 | 2025-12-21 | 15 pages | 710.50 KB

What Matters in Scale-up Fabrics?
Sharada Yeluri, AVP of Engineering, Astera Labs
Connecting GPUs and AI Accelerators
NETWORKING

Agenda
- Scale-Up Fundamentals
- Fabric Requirements
- Scale-up Technologies
- Open Ecosystem Comparison
- Summary & Action

Scale-up Fundamentals
- **Purpose-built connectivity**: Designed for GPUs and AI accelerators.
- **Rack-level scale**: Connects XPUs within a single rack or server.
- **Higher bandwidth**: 6-12x more than scale-out networks. Needed for high-bandwidth collective communication.

Scale-up Fundamentals
- **Collective Communication**: XPUs exchange partial results from tensor/pipeline or sequence parallel operations. Tight coupling: limited opportunities to hide communication latencies.
- **Load/Store (LS) Semantics**: No SW overhead. Aggregates memory across all XPUs. LS is exactly identical for remote or local memory.
- **Low Latency/Low Jitter**: Longer latency means XPU threads stall. For inference, XPU stalls have a multiplicative effect for reasoning models. Compiler scheduling relies on stable and deterministic communication latencies, so low jitter is important.

Scale-up Fundamentals: High Radix
- Larger radix enables a single-stage fabric with lower latency.
- M switches, each N x N, connect N XPUs to the fabric using M switching planes.
- Example in diagram: N=8 and M=4
- Deployment example: N=288, M=8

Scale-up Fundamentals: Lossless + Non-Blocking Fabric
- Fabric packet drops are fatal to the XPU process threads.
- Lossless fabric with hop-by-hop credit-based flow control (link as well as application level).
- Reliable links with FEC and link-level retries are a must.
- Non-blocking fabric to eliminate deadlocks: use different VCs/traffic classes for requests and responses.
- [Diagram: Cells to be Transmitted]

Fabric Requirements

| Requirement | Purpose |
| --- | --- |
| Ultra-Low Latency & Jitter | Minimize stalls; predictable communication |
| Memory Ordering | Preserve consistency; avoid reordering |

256B aligned memory must not
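
The high-radix description in the slide text (M switches, each N x N, connecting N XPUs over M switching planes) can be sketched as a small calculation. This is a minimal sketch, assuming each XPU attaches one link to every switching plane, which is one reading of the slide rather than something it states. The names `FabricShape` and `single_stage_fabric` are hypothetical and do not come from the presentation; the N and M values are the two examples quoted in the slide text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricShape:
    xpus: int            # N: XPUs attached to the fabric
    planes: int          # M: parallel switching planes (one switch each)
    links_per_xpu: int   # assumed: one link from each XPU to every plane
    total_links: int     # XPU-to-switch links across the whole fabric
    switch_hops: int     # switches crossed between any two XPUs


def single_stage_fabric(n: int, m: int) -> FabricShape:
    """Describe a single-stage fabric of M switches, each N x N.

    Assumption (not stated in the slides): every XPU has exactly one
    link to each of the M switches, so any XPU pair is one switch apart.
    """
    if n < 2 or m < 1:
        raise ValueError("need at least 2 XPUs and 1 switching plane")
    return FabricShape(
        xpus=n,
        planes=m,
        links_per_xpu=m,
        total_links=n * m,
        switch_hops=1,
    )


diagram = single_stage_fabric(n=8, m=4)       # "Example in diagram"
deployment = single_stage_fabric(n=288, m=8)  # "Deployment example"
print(diagram)
print(deployment)
```

Under this assumption the radix of each switch sets how many XPUs sit one hop apart, while the number of planes scales per-XPU bandwidth without adding a second switching stage.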

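Based on the report, the main content is summarized as follows:
- **Scale-up fundamentals**: Connectivity purpose-built for GPUs and AI accelerators, providing high-bandwidth collective communication and supporting tight coupling between XPUs.
- **Key requirements**: Ultra-low latency and jitter, a non-blocking fabric, direct memory semantics with no software overhead, and high bandwidth efficiency.
- **Scale-up technologies**: UALink, Ethernet, NVLink + NVSwitch, and PCIe.
- **Open ecosystem comparison**: UALink outperforms Ethernet in performance, bandwidth efficiency, latency, and jitter, but is more complex to implement.
- **Key data**: UALink's bandwidth efficiency exceeds 90%, and its latency is under 1 microsecond.
- **Outlook**: Open accelerator interfaces (such as UPLI) are critical to rack-scale AI networking, and UALink is key to enabling a multi-vendor AI ecosystem.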
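
The lossless, non-blocking requirement rests on hop-by-hop credit-based flow control with separate virtual channels (VCs) for requests and responses, as the slide text describes. The idea can be sketched as follows. This is a minimal sketch, assuming a single link and one credit per receiver buffer slot; the class `CreditLink`, its methods, and the buffer sizes are hypothetical illustrations and do not model the actual behaviour of UALink, SUE, or NVLink.

```python
from collections import deque


class CreditLink:
    """One hop with an independent credit pool per virtual channel (VC)."""

    def __init__(self, buffer_slots: dict[str, int]) -> None:
        # Sender-side credit counters: one credit per free receiver slot.
        self.credits = dict(buffer_slots)
        # Receiver-side buffers, one queue per VC.
        self.rx = {vc: deque() for vc in buffer_slots}

    def send(self, vc: str, packet: str) -> bool:
        """Transmit only if a credit is available; never drop."""
        if self.credits[vc] == 0:
            return False  # sender holds the packet (backpressure)
        self.credits[vc] -= 1
        self.rx[vc].append(packet)
        return True

    def drain(self, vc: str) -> str:
        """Receiver consumes a packet and returns one credit to the sender."""
        packet = self.rx[vc].popleft()
        self.credits[vc] += 1
        return packet


link = CreditLink({"request": 2, "response": 2})
sent = [link.send("request", f"read-{i}") for i in range(3)]
# The request VC is now out of credits, yet responses still flow on their own VC.
response_ok = link.send("response", "data-0")
link.drain("request")                      # frees one request slot
retry_ok = link.send("request", "read-2")  # the held packet can now go
print(sent, response_ok, retry_ok)
```

The third request is held rather than dropped, and the response still goes through because it draws from a separate credit pool. That separation is what keeps a backlog of requests from blocking the responses that would clear it.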