
Breaking Through Memory and I/O Limits: Architectures for the GenAI Era.pdf

Uploaded by: 明**** | ID: 1012010 | 2025-12-21 | 22 pages | 2.50 MB

**Smashing Memory & I/O Limits: Architectures for the GenAI Era**
Ramin Farjadrad, CEO, Eliyan Corporation
SERVER: OPEN CHIPLET ECONOMY

**Presentation Outline**
- Performance Walls: Power Wall; Bandwidth Walls: Memory & I/O Walls
- SBD: A novel D2D technology for superior Bandwidth/Power (implementation, silicon results, comparison)
- High-performance D2D interconnects: The solution to Performance Walls
- Advantages of 2.0D D2D: Long-reach connectivity on large standard substrates
- Multi-port Custom HBM (cHBM) to boost Memory & I/O bandwidths
- Performance limitations at system level, beyond the package
- Leveraging long-reach 2.0D D2D to enable ultra-high-BW AI or Switch ASICs
  - GPU ASIC with 32 TB/s memory bandwidth & 50-100 Tbps I/O bandwidth
  - Switch ASIC with 400 Tbps of aggregate bandwidth
- Leveraging SBD technology to double electrical port bandwidth: 224G → 448G
- Call to Action

**Top Bottlenecks in AI Performance: Memory Wall & I/O Wall**
The disparity between bandwidth gains and compute gains is the top bottleneck for performance.

**Move to Chiplet-based Systems to Maximize Bandwidth/Power**
Ultra-high-performance AI systems need:
- 10s of Tbps of interconnect bandwidth between processor and memory devices
- Interconnect power that is ultra low, to be practical

Chiplet interconnects on a package substrate offer the best bandwidth/power ratio: 1 Tbps/W. AI systems with superior BW/power need a very large package substrate with as many chiplets as possible.

**A Memory BW Wall Example: Blackwell 200 (B200)**
- B200 compute performance = 10,000 TFLOPS (FP8)
- B200 max memory bandwidth = 8 TB/s
- Up to 1,250 FLOPs for every byte read from memory
- Even at a very high arithmetic intensity of 500*, B200 will be utilized at 50% (=50…
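The B200 example is a roofline calculation: dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte) a workload needs before compute, rather than memory, becomes the limit. The Python sketch below assumes the basic roofline model, attainable = min(peak, intensity × bandwidth), and plugs in the figures quoted above; the function names are hypothetical, and real utilization also depends on caching, compute/transfer overlap, and kernel efficiency.

```python
def ridge_point(peak_flops: float, mem_bw_bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOPs/byte) at which compute becomes the limit."""
    return peak_flops / mem_bw_bytes_per_s


def attainable_flops(peak_flops: float, mem_bw_bytes_per_s: float,
                     intensity: float) -> float:
    """Roofline: throughput is capped by peak compute or by intensity x bandwidth."""
    return min(peak_flops, intensity * mem_bw_bytes_per_s)


def utilization(peak_flops: float, mem_bw_bytes_per_s: float,
                intensity: float) -> float:
    """Fraction of peak compute reachable at a given arithmetic intensity."""
    return attainable_flops(peak_flops, mem_bw_bytes_per_s, intensity) / peak_flops


# Figures quoted on the slide for B200 (FP8 peak, max memory bandwidth).
PEAK_FLOPS = 10_000e12  # 10,000 TFLOPS
MEM_BW = 8e12           # 8 TB/s

if __name__ == "__main__":
    print(ridge_point(PEAK_FLOPS, MEM_BW))  # 1250.0
    for ai in (100, 500, 1250, 2000):
        # Utilization rises linearly with intensity until the ridge point.
        print(f"{ai:>5} FLOPs/B -> {utilization(PEAK_FLOPS, MEM_BW, ai):.0%}")
```

With the quoted B200 figures the ridge point is 1,250 FLOPs/byte, matching the slide. Under this simple model, utilization below the ridge is intensity divided by 1,250, so 500 FLOPs/byte gives 40%; the slide's 50% figure may rest on the footnoted (*) assumptions, which the preview cuts off.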

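Based on the report, its main content can be summarized as follows:
1. **Performance bottleneck**: AI performance is limited mainly by the memory bandwidth wall; bandwidth growth has not kept pace with compute growth.
2. **Solution**: Adopt chiplet-based systems to maximize bandwidth/power, which requires tens of Tbps of interconnect bandwidth at ultra-low power.
3. **Technical breakthrough**: Chiplet interconnects with high bandwidth/power deliver the best bandwidth-to-power ratio, e.g., 64 Gbps D2D-SP.
4. **NuLink technology**: Eliyan's NuLink D2D chiplet interconnect provides long-reach connectivity, e.g., NuLink-SP2 at 64 Gbps.
5. **Memory scaling**: Multi-port cHBM breaks through the memory wall by cascading HBMs, raising both memory bandwidth and capacity.
6. **I/O bandwidth**: Combining 64 Gbps D2D-SP with cHBM enables up to 124 Tbps of I/O bandwidth.
7. **Outlook**: With long-reach D2D-SP and cHBM, future AI ASICs will break through the memory and I/O walls, enabling very large AI chips.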
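The summary pairs per-lane rates (64 Gbps D2D-SP) with aggregate targets (up to 124 Tbps) and stresses ultra-low interconnect power. A minimal Python sketch of that back-of-envelope arithmetic, assuming raw line rates with no encoding, FEC, or protocol overhead and using the 1 Tbps/W ratio quoted in the slides; the helper names and the 20 Tbps example link are hypothetical.

```python
import math


def interconnect_power_w(bandwidth_tbps: float, efficiency_tbps_per_w: float) -> float:
    """Power (W) spent moving `bandwidth_tbps` at a given Tbps/W efficiency."""
    return bandwidth_tbps / efficiency_tbps_per_w


def energy_per_bit_pj(efficiency_tbps_per_w: float) -> float:
    """Convert Tbps/W to pJ/bit: 1 W / 1 Tbps = 1e-12 J/bit = 1 pJ/bit."""
    return 1.0 / efficiency_tbps_per_w


def lanes_needed(aggregate_tbps: float, lane_gbps: float) -> int:
    """Lanes (or ports) required for an aggregate rate, ignoring coding/FEC overhead."""
    return math.ceil(aggregate_tbps * 1000 / lane_gbps)


if __name__ == "__main__":
    # Hypothetical "tens of Tbps" link at the 1 Tbps/W ratio quoted in the slides.
    print(interconnect_power_w(20, 1.0))  # 20.0
    print(energy_per_bit_pj(1.0))         # 1.0
    # 124 Tbps of I/O carried over 64 Gbps D2D-SP lanes.
    print(lanes_needed(124, 64))          # 1938
    # 400 Tbps switch: electrical ports at 224G vs 448G.
    print(lanes_needed(400, 224), lanes_needed(400, 448))  # 1786 893
```

Under these assumptions, doubling the per-port rate from 224G to 448G roughly halves the number of electrical ports a 400 Tbps switch needs, which is why per-port bandwidth matters as much as per-bit energy.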