Latest Academic Research for Composable Memory

Session: Latest Academic Research for Composable Memory
Moderator: Brian Hirano, XCENA

Invited Researchers
Jongse Park, Associate Professor, KAIST. Research area: HW/SW co-design for AI systems.
Antonis Psistakis, PhD Student, University of Illinois. Research areas: Distributed Systems, Fault Tolerance, CXL.

Talk 1. LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jongse Park, Associate Professor, KAIST
Track: SERVER: COMPOSABLE MEMORY SYSTEMS (CMS)

LLM Serving and Its Resource Requirements
GPT-oss-120b: 117B model params; 10,000 users; 8,096 context; 1,024 tokens; 1 sec

Hardware for LLM Serving

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving
Our simulator code is available: https:/

Extensions
Composable Architecture: CXL-based memory pooling; device, host, and remote memory hierarchy; prefill-decode split with prefix and chunked caching.
Near-Memory Computing: Processing-in-Memory (PIM); Processing-near-Memory (PNM); computational storage integration.
Multi-Instance System: parallel serving across multiple clusters; scalable resource utilization.

Talk 2. Towards Efficient and Fault Tolerant CXL Sharing
Antonis Psistakis, Chloe Alverti, Josep Torrellas, University of Illinois Urbana-Champaign
Track: SERVER: COMPOSABLE MEMORY SYSTEMS (CMS)

Compute Express Link revives Distributed Shared Memory
[Figure: Compute Nodes 0, 1, ..., N, each with its own Operating System, CPU, Caches, and Local DRAM, reach Shared CXL memory through Load/Store.]

Serverless Platform
[Figure: many instances of Func A() running across Compute Nodes 0, 1, 2, ..., N.]
Redundancy: Opportunity for data and state sharing across nodes!
Remote Fork and high concurrency workloads

CXLfork key idea: Share (Meta)Data over CXL
[Figure: CPU and Caches per compute node, starting with Compute Node 0 ...]
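
The resource figures on the "LLM Serving and Its Resource Requirements" slide of Talk 1 (117B parameters, 10,000 users, 8,096 context) can be turned into a rough sizing exercise. The sketch below is only back-of-the-envelope arithmetic, not part of LLMServingSim: the function names, the bytes-per-parameter choice, and especially `HYPOTHETICAL_KV_BYTES_PER_TOKEN` are assumptions made for illustration. The slide does not state the numeric precision or the KV-cache size per token.

```python
# Back-of-the-envelope sizing from the slide's figures.
# All helper names are hypothetical; nothing here comes from LLMServingSim.

MODEL_PARAMS = 117_000_000_000   # slide: 117B model params
USERS = 10_000                   # slide: 10,000 users
CONTEXT_TOKENS = 8_096           # slide: 8,096 context (as printed)

# ASSUMPTION: placeholder value, not a property of GPT-oss-120b.
HYPOTHETICAL_KV_BYTES_PER_TOKEN = 64 * 1024


def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the model weights at a given precision."""
    return params * bytes_per_param


def total_context_tokens(users: int, context_tokens: int) -> int:
    """Tokens resident in context if every user holds a full context."""
    return users * context_tokens


def kv_cache_bytes(tokens: int, kv_bytes_per_token: int) -> int:
    """KV-cache footprint for a number of resident tokens."""
    return tokens * kv_bytes_per_token


if __name__ == "__main__":
    # ASSUMPTION: 2 bytes per parameter (16-bit weights).
    w = weight_bytes(MODEL_PARAMS, 2)
    t = total_context_tokens(USERS, CONTEXT_TOKENS)
    kv = kv_cache_bytes(t, HYPOTHETICAL_KV_BYTES_PER_TOKEN)
    print(f"weights:  {w / 1e9:.0f} GB")
    print(f"tokens:   {t:,}")
    print(f"KV cache: {kv / 1e12:.2f} TB (with the placeholder per-token size)")
```

Even with a placeholder per-token size, this shows how the aggregate context across users could exceed the weight footprint, which is the kind of pressure that CXL-based memory pooling is meant to relieve.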
