Elastic Management Framework for the AI Cluster
Karl Chiang, Director, Wiwynn Engineering Transformation Team

Outline
1. Challenges of Next-Gen AI Data Center
2. New Equipment for Rack-wise Liquid Cooling Management
3. The Software Architecture for AI Cluster Data Management
4. A Real Use Case From Wiwynn

One of the Challenges of the Next-Gen AI Data Center
High Power Consumption → Liquid Cooling Infrastructure → Nightmare?

Rack-wise Liquid Cooling Management System
Facility and in-row equipment: Facility Water, Modulating Valve, Flow Meter & Sensors, In-Row CDU, In-Row UMS
In-rack monitoring: In-Rack UMS (Universal Management System), System Monitoring, Performance Monitoring, Power Shelf Monitoring
In-rack leakage detection: Server, Switch, Drip Tray, Rack Catch Pan, and Manifold Leakage Detection
The In-Row UMS manages one CDU/sidecar and its related mechanical devices.
The In-Rack UMS manages leakage detection and all device statistics in the rack.

The Architecture for Wiwynn AI Cluster Data Management
Legacy Datacenter: AI Server, Network Switch, Generic Server, Storage
Next-Gen AI Datacenter: Datacenter Infrastructure, 2-PIC Tank, Sidecar, In-Row/In-Rack CDU, Fluid Controller, Fluid Monitor, Liquid Cooling GPU Server
Data Collector: gathers data over Redfish, RESTful, and Modbus interfaces
Time-Series Data Cluster
Data Visualizer, Operation Support System, SCADA, Notification Manager

A Real Use Case From Wiwynn: Thanos-Prometheus
GPU Racks: Prometheus instances, each with a Thanos Sidecar and Exporters
Infrastructure: Network, CDU, Valve, Sensor, Power, BMS, High Speed Storage, Ctrl+User Storage
Collection protocols: Redfish and RESTful
Thanos components: Querier, Compactor, Rules, Object Store (OSS)
Visualization and alerting: Grafana (PromQL queries), AlertManager → IM Application (Teams, Slack, etc.) → Administrator

Thank You!
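The Data Collector pulls rack and CDU readings over Redfish. A minimal sketch of the parsing step, assuming the collector has already fetched a JSON body shaped like the Redfish Chassis Thermal resource, where "Temperatures" is a list of objects with "Name" and "ReadingCelsius". The function name thermal_to_samples, the metric name rack_temperature_celsius, and the sensor names are hypothetical, and the HTTP fetch and authentication are deliberately left out.

```python
from __future__ import annotations

import json
from typing import Any


def thermal_to_samples(payload: dict[str, Any], rack: str) -> list[tuple[str, dict[str, str], float]]:
    """Flatten a Redfish Thermal-style payload into (metric, labels, value) samples.

    Assumption: the payload follows the Redfish Thermal resource shape.
    Sensors whose ReadingCelsius is missing or null are skipped.
    """
    samples = []
    for sensor in payload.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        if reading is None:
            continue  # sensor absent or not currently reporting
        labels = {"rack": rack, "sensor": sensor.get("Name", "unknown")}
        # Hypothetical metric name used only for this illustration.
        samples.append(("rack_temperature_celsius", labels, float(reading)))
    return samples


if __name__ == "__main__":
    body = '{"Temperatures": [{"Name": "CDU Inlet", "ReadingCelsius": 32.5}, {"Name": "CDU Outlet", "ReadingCelsius": null}]}'
    print(thermal_to_samples(json.loads(body), "rack-01"))
```

Returning plain (metric, labels, value) tuples keeps the Redfish parsing separate from the storage side, so the same samples could be handed to an exporter or written elsewhere.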
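In the Thanos-Prometheus use case, each Prometheus scrapes Exporters that expose metrics in the Prometheus text exposition format. The following stdlib-only sketch renders leak-detector states as a gauge in that format. The metric name rack_leak_detected and its labels are hypothetical. A production exporter would normally use an official client library such as prometheus_client instead of hand-formatting text.

```python
from __future__ import annotations

import math


def _escape_help(text: str) -> str:
    # HELP text escapes backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def escape_label_value(value: str) -> str:
    # Label values also escape double quotes.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_value(value: float) -> str:
    # The text format spells special floats as NaN, +Inf and -Inf.
    v = float(value)
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "+Inf" if v > 0 else "-Inf"
    return repr(v)


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge family (HELP, TYPE, then one line per sample)."""
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sorted label order gives stable output for diffs and tests.
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {format_value(value)}")
        else:
            lines.append(f"{name} {format_value(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge(
        "rack_leak_detected",
        "1 if a leak is reported, else 0.",
        [({"rack": "rack-01", "location": "drip_tray"}, 0),
         ({"rack": "rack-01", "location": "manifold"}, 1)],
    ), end="")
```

Modeling each leakage point (drip tray, manifold, catch pan, and so on) as a label on one gauge lets a single PromQL expression such as `max by (rack) (rack_leak_detected)` summarize every detector in a rack.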
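Grafana and other consumers read the data through the Thanos Querier, which serves PromQL through the same HTTP query API as Prometheus (`/api/v1/query`). A minimal sketch of building an instant query URL and parsing a vector result follows. The querier address is a hypothetical placeholder, and the network call itself is omitted so that only the request and response handling is shown.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus-compatible instant query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> list[tuple[dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-query response body."""
    resp = json.loads(body)
    if resp.get("status") != "success":
        raise RuntimeError(f"query failed: {resp.get('error', 'unknown error')}")
    data = resp["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is a [timestamp, "string value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


if __name__ == "__main__":
    # Hypothetical querier address.
    print(build_instant_query_url("http://thanos-querier.example:9090", "max by (rack) (rack_leak_detected)"))
    sample = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{"rack":"rack-01"},"value":[1700000000.0,"1"]}]}}'
    print(parse_vector(sample))
```

Because the querier fans out to every Thanos Sidecar and to the object store, one query like this can cover all GPU racks, instead of requiring a separate query against each Prometheus.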
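AlertManager forwards alerts to IM applications such as Teams or Slack. Alertmanager can post to Slack directly through its `slack_configs` receiver. A small relay behind a `webhook_configs` receiver is one option when a custom message format is needed. The following sketch assumes the standard Alertmanager webhook body (top-level `status` and an `alerts` list whose entries carry `status`, `labels`, and `annotations`) and turns it into a Slack-style `{"text": ...}` message. The alert name ManifoldLeak, the `rack` label, and the `summary` annotation are hypothetical, and the outgoing POST is omitted.

```python
from __future__ import annotations

import json


def alerts_to_chat_text(webhook_body: str) -> str:
    """Summarize an Alertmanager webhook payload as plain chat text."""
    payload = json.loads(webhook_body)
    alerts = payload.get("alerts", [])
    lines = [f"[{payload.get('status', 'unknown').upper()}] {len(alerts)} alert(s)"]
    for alert in alerts:
        labels = alert.get("labels", {})
        # Assumption: rules attach a "summary" annotation and a "rack" label.
        summary = alert.get("annotations", {}).get("summary", "")
        line = f"- {labels.get('alertname', 'unnamed')} on {labels.get('rack', 'unknown rack')} ({alert.get('status', '?')})"
        if summary:
            line += f": {summary}"
        lines.append(line)
    return "\n".join(lines)


def to_slack_payload(text: str) -> bytes:
    """Encode text as the JSON body accepted by a Slack incoming webhook."""
    return json.dumps({"text": text}).encode("utf-8")


if __name__ == "__main__":
    body = json.dumps({
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "ManifoldLeak", "rack": "rack-01"},
            "annotations": {"summary": "Manifold leak sensor tripped"},
        }],
    })
    print(alerts_to_chat_text(body))
```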