
Short-Reach Optical Interface Scale Connectivity for Optimized Data Movement Through Memory in AI/ML Applications.pdf

Uploaded by: 明**** | ID: 1011924 | 2025-12-21 | 42 pages | 1.80 MB

Short-reach Optical Interface (SROI) Scale Connectivity for Optimized Data-movement Through Memory for AI/ML Applications
Siamak Tavallaei, Sr. Principal Engineer, Samsung Semiconductor, Inc.
FTI Workshop: Short Reach Optical Interconnects

Outline
- Trending requirement and building blocks to help
- Stargate Challenge
- How a tool may guide the focus on architectural decisions
- How short-reach optics may help
- Call-to-action for the SROI team

Baseline Server Node
- SRAM/Cache: T0
- CPU-Mem: T1
- Local Node Storage: T3
- Storage on DC Network: T4
Legend: M: Local DDRx Memory; C: CPU; S: NVMe/PCIe SSD Storage; N: NIC

AI/ML-optimized Memory Hierarchy
[Figure labels: SRAM/Cache T0; GPU-HBM T1; CPU-Mem(+CXL) T2 (T2+); Storage on SO T3-SO; GPU-HBM-SU T1-SU; CPU-Mem-SU(+CXL) T2(T2+)-SU; Storage T3; Storage on SU T3-SU; Storage on DC Network T4; AI Infrastructure Memory SO T2-SO; Scale-up (SU); Scale-out (SO)]
- Data-movement is through memory
- Physically Disaggregate, Logically Compose

- Traditional scale-up fabrics couple CPUs to build large symmetric multi-processing systems (SMP)
- Run large, parallel processing workloads under one OS with efficient protocols for Load/Store
- Memory/cache-coherence in hardware with low-latency for small payloads
- CXL fulfills these requirements (Major CPU manufacturers Root Ports)
- Emerging scale-up fabrics for AI/ML natively couple xPUs to switches
- Use the same protocol as native xPUs for distributed processing paradigm
- Not requiring the hardware-based cache-coherence or the low-latency features
- Require high-throughput interconnects for emerging software to move data ahead of use
- Hybrid fabric switches emerge to "bridge" between native scale-up fabric protocols
- Bridge xLink used by xPUs and the standard CXL protocol
- Scale-up disaggregated memory pooling and sharing

Stretched Pyramid
Increase Capacity at each Tier
[Figure labels: SRAM/Cache T0; GPU-HBM T1; CPU-Mem(+CXL) T2(

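Based on the report, the main points can be summarized as follows:
- **AI/ML application demands**: As AI/ML applications become more data-intensive, the need for efficient data movement keeps growing.
- **Memory-optimized architecture**: The report examines an optimized memory hierarchy, including SRAM/Cache, GPU-HBM, and CPU-Mem, to support AI/ML applications.
- **Scaling solutions**: It proposes scale-up (SU) and scale-out (SO) approaches, along with short-reach optical interfaces (SROI) to improve data-transfer efficiency.
- **Super-cluster challenge**: It poses the "Stargate" challenge: building a super cluster of 1 million efficiently connected processing units.
- **Connectivity and bandwidth**: It stresses connectivity and bandwidth, including high bit rates, high-radix connections, efficient shoreline density, and low bit-error rates.
- **Tooling needs**: It calls for tools such as Super Cluster Builder for topology exploration, complexity analysis, and cost and performance analysis.
- **Short-reach optical interface (SROI)**: It presents the advantages of SROI: lower energy per transmitted bit, reduced insertion loss, better temperature stability, a small physical footprint, and lower cost.

Key figures:
- 1 million processing units
- 800 Gb/s interconnect bandwidth
- 16:1 aggregator gear ratio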
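
To give a feel for the scale behind those key figures, here is a rough back-of-envelope sketch. Only the three constants come from the summary. The assumption of one 800 Gb/s link per processing unit, and the reading of the 16:1 gear ratio as 16 equal lanes aggregated into one link, are illustrative guesses and are not stated in the report; the function names are hypothetical.

```python
# Back-of-envelope scale estimate. The constants come from the summary;
# everything else (one link per unit, equal-lane reading of the gear
# ratio) is an assumption made for illustration only.
PROCESSING_UNITS = 1_000_000   # "1 million processing units"
LINK_GBPS = 800                # "800 Gb/s interconnect bandwidth"
GEAR_RATIO = 16                # "16:1 aggregator gear ratio"


def aggregate_bandwidth_gbps(units: int, link_gbps: int, links_per_unit: int = 1) -> int:
    """Total injection bandwidth if every unit drives `links_per_unit` links."""
    return units * link_gbps * links_per_unit


def per_lane_gbps(link_gbps: int, gear_ratio: int) -> float:
    """Per-lane rate if `gear_ratio` equal lanes are aggregated into one link."""
    return link_gbps / gear_ratio


if __name__ == "__main__":
    total = aggregate_bandwidth_gbps(PROCESSING_UNITS, LINK_GBPS)
    print(f"Aggregate: {total:,} Gb/s ({total / 1e6:.0f} Pb/s)")
    print(f"Per lane:  {per_lane_gbps(LINK_GBPS, GEAR_RATIO):.0f} Gb/s")
```

Under these assumptions the cluster would inject on the order of hundreds of petabits per second in aggregate, which is why energy per bit and shoreline density feature so prominently among the listed SROI advantages.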