
Scaling OCP NIC to 1.6T and Beyond for AI.pdf

Uploaded by: 明**** | ID: 1011782 | 2025-12-21 | 17 pages | 760.32 KB

Scaling OCP NIC to 1.6T and beyond for AI
Damien Chong, Hardware Tech Lead, Meta
Hemal Shah, Distinguished Engineer and Architect, Broadcom
SERVER: AI HW SW CO-DESIGN / NIC / HPC

Preview
This presentation discusses:
- NIC in AI Backend Network
- NIC 1.6T+ Characteristics
- Path & challenges to OCP NIC 1.6T and beyond

AI Infrastructure Network Connectivity
[Diagram labels: Scale Out Network, Scale Up Network, Internal Connectivity, CPU, XPU, NIC, NVMe SSD]
- High-Bandwidth: 800G and above
- Large scale: 100K-1M XPUs
- Messaging Semantics
- Ultra Low Latency
- Supports Peer-2-Peer Data Transfer
- XPU-XPU Connectivity
- Memory Semantics

NICs in AI systems
[Diagram labels: CPU, NIC, GPU]
CPU Front-End Network
- Send & Receive Data
- Pre-process Data
- Schedule Jobs
GPU Scale-Out Network
- Parallel GPU-GPU Compute beyond single rack
- Model Training
* GPU Scale-Up Network within the rack is typically not served by a NIC and is not part of the discussion today.

Zoom into Front-End NIC for AI systems
CPU Front-End Network
- Send & Receive Data
- Pre-process Data
- Schedule Jobs
Medium traffic intensity is satisfied by 400G/800G NICs, which are well supported by OCP NIC SFF/TSFF.
Next-gen expands to 1.6T and/or Liquid Cooling.

Zoom into Back-End NIC for AI systems
GPU Scale-Out Network
- AI racks are becoming increasingly dense because the scale-up in-rack network provides much higher bandwidth than the scale-out rack-to-rack network
- A dense AI rack also squeezes the area available for the scale-out network solution
- Desire and demand for high GPU-to-GPU interconnectivity drive high scale-out bandwidth
* GPU Scale-Up Network within the rack is typically not served by a NIC and is not part of the discussion today.
Bandwidth per Area efficiency is Important
[Diagram labels: GPU, NIC]

PCIe Gen 6 and above host interface (x16 or x32 or x48 or x64 lanes co…

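Based on the marked content, the document mainly discusses the use of OCP network interface cards (NICs) in AI infrastructure and their future direction. Key points:
1. **AI back-end network requirements**: AI systems need high-bandwidth, low-latency network connectivity that scales to 100K-1M XPUs.
2. **OCP NIC 1.6T+ characteristics**: bandwidth of 800G and above, a PCIe Gen 6 host interface, multiple network ports, OSFP connectors, and so on.
3. **Path and challenges**: moving from 800G to 1.6T+ raises challenges in market fragmentation, size, power, cooling, and PCIe switch integration.
4. **Back-end NIC use cases**: these drive the demand for higher bandwidth, such as high-bandwidth GPU-to-GPU connectivity.
5. **Form factors in the market**: several PCIe interface form factors exist, so they need to be unified or a new form factor definition proposed.
6. **Outlook**: the talk explores concepts such as 64-lane PCIe, PCIe Gen7, and OSFP connectivity, and asks for feedback and collaboration.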
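The lane counts and PCIe generations named above can be related to NIC port speed with simple arithmetic. The sketch below is illustrative and not taken from the presentation: it assumes raw per-lane signaling rates of 64 GT/s for PCIe Gen 6 and 128 GT/s for PCIe Gen 7, treats one transfer as one bit per lane per direction, and ignores FLIT, FEC, and protocol overhead, so real usable throughput is lower. The function names `raw_pcie_gbps` and `min_lanes` are hypothetical helpers.

```python
# Raw per-lane signaling rate in GT/s, per direction.
# Assumption: one bit per transfer per lane; encoding and protocol overhead ignored.
PCIE_GTPS_PER_LANE = {"Gen6": 64, "Gen7": 128}

# Link widths mentioned in the slide text (x16, x32, x48, x64).
LANE_WIDTHS = (16, 32, 48, 64)


def raw_pcie_gbps(gen: str, lanes: int) -> int:
    """Raw one-direction link bandwidth in Gb/s for a generation and lane count."""
    return PCIE_GTPS_PER_LANE[gen] * lanes


def min_lanes(gen: str, nic_gbps: int, widths=LANE_WIDTHS):
    """Smallest listed width whose raw bandwidth covers the NIC rate, or None."""
    for lanes in sorted(widths):
        if raw_pcie_gbps(gen, lanes) >= nic_gbps:
            return lanes
    return None


if __name__ == "__main__":
    for nic_gbps in (800, 1600, 3200):
        for gen in ("Gen6", "Gen7"):
            lanes = min_lanes(gen, nic_gbps)
            width = f"x{lanes}" if lanes is not None else "no listed width"
            print(f"{nic_gbps}G NIC on PCIe {gen}: {width}")
```

Under these assumptions, a Gen 6 x16 link carries 1024 Gb/s raw per direction, which covers 800G but not 1.6T. That is consistent with the talk looking at wider links or a newer PCIe generation for 1.6T and beyond.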