AI Infrastructure Rack Management: Exploring Scalable Solutions Through Open-source Collaboration
Brian Vandecoevering, AMI
Anil Agrawal, META
SYSTEMS MANAGEMENT

What is Rack Management

[Diagram labels: Data Center Management Software; Rack Manager (Discover, Location, BMC Proxy/Firewall, Health, Power/Thermal/Liquid Cooling, Composition, Telemetry/Aggregation, Firmware Update, Attestation/Security); Platform Root of Trust (PRoT); UEFI FW (Boot); Baseboard Management Controller (BMC); Satellite Mgmt Controller (SMC); GPU Mgmt]

Responsible for managing the entire rack:
- Compute nodes, including accelerators
- Networking & storage
- Power & cooling
- Legacy

Benefits:
- Simplifies management
  - Single point of management
  - Single protocol
- Improves scale-out management
  - Enables group operations
  - Adds a layer of aggregation
  - More responsive
- Improves security

Where does RMC reside

A rack manager can sit just about anywhere:
- Dedicated compute system
- Dedicated RMC
- Top of rack management switch
- Power shelf PMC
- CDU
- BMC on one of the nodes
- Outside of the rack (row/pod manager)

Rack as a whole is seen as a single unit (FRU).

[Diagram labels: TOR Mgmt Switch; Dedicated RM; Power Shelf; Compute BMC; CDU]

OpenRMC Overview

- OCP sub-project under Hardware Management.
- Primary goal is to define the northbound interface.
- Not prescriptive of hardware or software.
- Current implementation is 1.1. Features included:
  - Hardware inventory
  - Rack/node power mgmt.
  - Node health, firmware update (BIOS/BMC only)
  - Group operations
  - Authentication, others
- 1.2: Active work features include Telemetry and Composability.
- 1.3: Future planned features include Scale Up, Advanced Power Control, Policies, and Layers of aggregation.

[Diagram labels: Rack Manager; Redfish (OpenRMC); Redfish; IPMI; SNMP; ModBus; Others]

Much higher single rack density with i