
Enablement of Liquid Cooled GB200 At Scale.pdf

Uploaded by: 明**** | ID: 1011794 | 2025-12-21 | 19 pages | 2.02 MB

Enablement of Liquid Cooled GB200 At Scale
Wenying Zhang, Cheng Chen
Meta Platforms, Inc.

ARTIFICIAL INTELLIGENCE (AI)

Preview

This presentation introduces how Meta managed to deploy LC GB200 at scale, and our vision into future LC hardware. The following topics will be discussed:
- Goals and Challenges
- Liquid Cooling Design
- Leakage Handling
- Ship Wet or Dry
- Risk Reduction
- Results
- Future Trend
- Need of Community Work

Goals and Challenges

Goals
- Time to production: deploy X racks within a shortened NPI cycle
- Efficiency: minimize the power spend on cooling
- Reliability: limit the disruption caused by leakage events

Challenges
- First time for many things, at such scale
- Product validation, build and process development in parallel
- Everything could break; every leak could be false

Liquid Cooling Design

- Deploy with AALC at 2:1 ratio
  - No site constraint
  - Better efficiency
- Multiple inlets/outlets
  - Lower pressure drop
  - Lower erosion risk
- Rope type Leak Sensor
  - Low lead time concern
  - Lower false alarm risk
  - Better serviceability

AALC: air assisted liquid cooling; L2A side car HX

[Diagram labels: AALC #1, AALC #2, GB200 Rack]

Liquid Cooling Design

[Diagram labels: RMC, Compute Trays, Switch Trays, Compute Trays, Heat Exchanger, RPU, Heat Exchanger, RPU, RMF-Cold, RMF-Hot; dual rack linked; 6x racks bundle; air cooled data center]

Leakage Handling

Considerations
- Everything could leak, but don't know where and how frequently
- Every detection and protection mechanism could break; but we could build redundancy
- Hardware mechanism over network, software or manual intervention
- Too late to update BMS, but never for RMC

BMS: building management system
RMC: rack management controller

- RMC is the center of leakage handling
- Manages both leak detection and response
- All connections are hard wired
- Multiple tiers of leakage detections
- Operate with…

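Key points of "Enablement of Liquid Cooled GB200 At Scale":
1. **Goals and challenges**: The goals are to shorten the time to production, improve cooling efficiency, and ensure reliability. Challenges include first-time deployment at scale, product validation, and parallel development.
2. **Liquid cooling design**: Uses air assisted liquid cooling (AALC) with GB200, with multiple inlets/outlets, low pressure drop, and low erosion risk.
3. **Leakage handling**: The RMC (rack management controller) is responsible for leak detection and response, with multiple tiers of leak detection.
4. **Ship wet or dry**: Shipping wet requires considering coolant compatibility, shipping requirements, and so on; shipping dry requires considering storage life.
5. **Risk reduction**: Risk is reduced through strict quality control and leak testing.
6. **Results**: Thousands of GB200 racks were deployed, with an extremely low leak rate and a stable system.
7. **Future trend**: Community collaboration is needed to support liquid cooling at scale, including logistics, reliability, and operations.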
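The leakage-handling points summarized above (redundant detection, and the concern that any single leak signal could be false) can be illustrated with a minimal Python sketch. This is not Meta's RMC logic, which the slides describe as hard wired and hardware-based; it only models a decision rule under stated assumptions. The names `SensorReading`, `Action`, and `decide`, and the two-sensor quorum, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"          # nothing to do
    ALERT = "alert"        # notify operators, keep the rack running
    SHUT_OFF = "shut_off"  # hypothetical response: isolate coolant, power down


@dataclass(frozen=True)
class SensorReading:
    name: str      # hypothetical sensor label
    wet: bool      # True if the sensor reports liquid
    healthy: bool  # False if the sensor itself is faulted or disconnected


def decide(readings: list[SensorReading], quorum: int = 2) -> Action:
    """Choose a response from redundant leak sensors.

    Assumption: a single wet reading, or a faulted sensor, only raises an
    alert, because any one signal could be false. At least `quorum` wet
    readings from healthy sensors trigger a shut-off.
    """
    healthy = [r for r in readings if r.healthy]
    wet = [r for r in healthy if r.wet]
    if len(wet) >= quorum:
        return Action.SHUT_OFF
    if wet or len(healthy) < len(readings):
        return Action.ALERT
    return Action.NONE


if __name__ == "__main__":
    sample = [
        SensorReading("rope-front", wet=True, healthy=True),
        SensorReading("rope-rear", wet=True, healthy=True),
    ]
    print(decide(sample))
```

Requiring agreement between independent sensors trades a slightly slower response for fewer false shut-offs; a real deployment would tune that trade-off per detection tier.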