
CSIG: Congestion Signaling in the AI Era.pdf

Uploaded by: 明**** | ID: 1011578 | 2025-12-21 | 16 pages | 1.55 MB

CSIG: Congestion Signaling in the AI era
Abhiram Ravi (Google), Jai Kumar (Broadcom)
OCP SPECIAL FOCUS: ARTIFICIAL INTELLIGENCE (AI)

AI Workloads: Era of Extreme Network Demands

Continuing trends in the AI era:
- Horizontal scaling is inevitable
- Extreme reliability, performance and efficiency requirements for scale-up and scale-out networks serving AI workloads
- AI workloads are extremely bandwidth-hungry and tail latency-intolerant

New norms for network congestion in AI workloads:
- Massive, synchronized bursts that amplify as the network fabric scales
- Congestion events that manifest at sub-millisecond timescales on network switches
- Predictable and repeating patterns of short-lived congestion

Central observation: Accurate and fine-grained congestion signals needed for observability and control

Many control loops operate at different timescales to:
- Efficiently utilize available network capacity at fine-grained timescales
- Enable tight guarantees on tail latency and throughput for collectives

Congestion control, load balancing, multipathing, scheduling, traffic engineering, provisioning

Accurately detecting congestion locally on a switch requires signal measurements at sub-millisecond timescales

High-resolution network signals are necessary

[Figure panels: Port Tx utilization (1 second resolution); Port Tx utilization (100 microsecond resolution)]

Real-world example from a GPU ToR at Google: Shifting from 1-second to 100-µsec telemetry exposes the fine-grained, repeating congestion patterns and idle gaps inherent to AI workloads

line-ra
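The resolution point above can be illustrated with a small sketch. It is not taken from the talk: the 400 Gb/s port speed, the 30% duty cycle, and the helper names `tx_utilization` and `rebin` are all hypothetical, chosen only to show how a 1-second average can hide line-rate bursts that 100-microsecond bins make visible.

```python
def tx_utilization(bytes_per_bin, bin_us, line_rate_bps):
    """Per-bin Tx utilization as a fraction of line rate.

    bin_us is the bin width in microseconds; integer math keeps the
    per-bin capacity exact.
    """
    capacity_bytes = line_rate_bps * bin_us // (8 * 1_000_000)
    return [b / capacity_bytes for b in bytes_per_bin]


def rebin(values, factor):
    """Sum consecutive groups of `factor` fine bins into coarser bins."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


# Hypothetical port and traffic pattern (assumptions, not measured data).
LINE_RATE_BPS = 400_000_000_000   # assumed 400 Gb/s port
FINE_BIN_US = 100                 # 100 microsecond telemetry bins
COARSE_BIN_US = 1_000_000         # 1 second telemetry bins
BINS_PER_SECOND = COARSE_BIN_US // FINE_BIN_US

# Bytes sent in one fine bin when the port runs at line rate.
full_bin_bytes = LINE_RATE_BPS * FINE_BIN_US // (8 * 1_000_000)

# Repeating pattern: 300 us at line rate, then 700 us idle, for one second.
pattern = [full_bin_bytes] * 3 + [0] * 7
fine_bytes = pattern * (BINS_PER_SECOND // len(pattern))

fine_util = tx_utilization(fine_bytes, FINE_BIN_US, LINE_RATE_BPS)
coarse_util = tx_utilization(
    rebin(fine_bytes, BINS_PER_SECOND), COARSE_BIN_US, LINE_RATE_BPS
)

print("peak utilization at 100 us:", max(fine_util))
print("idle 100 us bins:", sum(1 for u in fine_util if u == 0.0))
print("utilization at 1 s:", coarse_util[0])
```

In this synthetic trace the fine-grained series alternates between line rate and idle, while the single 1-second bin reports only the average, so the bursts and gaps are invisible at the coarser resolution.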
