
Latest Academic Research for Composable Memory.pdf

Uploaded by: 明**** | ID: 1011803 | 2025-12-21 | 17 pages | 1.44 MB

Latest Academic Research for Composable Memory
SERVER: COMPOSABLE MEMORY SYSTEMS (CMS)

Moderator: Brian Hirano, XCENA

Invited Researchers
- Jongse Park, Associate Professor, KAIST: HW/SW co-design for AI systems
- Antonis Psistakis, PhD Student, University of Illinois: Distributed Systems, Fault Tolerance, CXL

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jongse Park, Associate Professor, KAIST

LLM Serving and Its Resource Requirements
- GPT-oss-120b: 117B model params
- 10,000 users
- 8,096 context
- 1,024 tokens
- 1 sec

Hardware for LLM Serving

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving
Our simulator code is available: https:/

Extensions
- Composable Architecture
  - CXL-based memory pooling
  - Device-host-remote memory hierarchy
  - Prefill-decode split with prefix & chunked caching
- Near-Memory Computing
  - Processing-in-Memory (PIM)
  - Processing-near-Memory (PNM)
  - Computational storage integration
- Multi-Instance System
  - Parallel serving across multiple clusters
  - Scalable resource utilization

Towards Efficient and Fault Tolerant CXL Sharing
Antonis Psistakis, Chloe Alverti, Josep Torrellas, University of Illinois Urbana-Champaign

Compute Express Link revives Distributed Shared Memory
[Figure: Compute Nodes 0, 1, ..., N, each with its own CPU, caches, local DRAM and operating system (Operating System 0, 1, ..., N), access shared CXL memory through load/store.]

Redundancy Opportunity for data and state sharing across nodes!
Remote Fork and high concurrency workloads
[Figure: a serverless platform with repeated instances of Func A() across Compute Nodes 0, 1, 2, ..., N.]

CXLfork key idea: Share (Meta)Data over CXL
[Figure: CPU and caches on each compute node, starting with Compute Node 0 ...]

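Summary of the report's main content:
- **Research topic**: composable memory systems (CMS) and hardware/software co-design for AI systems.
- **Key figures** (GPT-oss-120b serving scenario):
  - Model parameters: 117B
  - Users: 10,000
  - Context: 8,096
  - Tokens: 1,024
  - Time: 1 sec
- **Key points**:
  - The invited researchers are Jongse Park (Associate Professor, KAIST) and Antonis Psistakis (PhD student, University of Illinois Urbana-Champaign).
  - LLMServingSim: a HW/SW co-simulation infrastructure for LLM inference serving at scale.
  - CXL revives distributed shared memory; the second talk aims to make CXL sharing efficient and fault tolerant.
  - CXLfork: fast remote fork over CXL.
  - Diktamo: a replication-based fault-tolerance mechanism that improves the reliability of CXL systems.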
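To get a feel for why the serving scenario above (117B parameters, 10,000 users, 8,096 context) motivates pooled memory, the following back-of-the-envelope sketch estimates the memory footprint. Only those three numbers come from the slides. The bytes per parameter and the KV-cache bytes per token are placeholder assumptions chosen for illustration, and the helper names are hypothetical. The sketch also pessimistically assumes that every user holds a full context at the same time.

```python
GIB = 1024 ** 3

# Figures taken from the slide.
NUM_PARAMS = 117e9        # GPT-oss-120b model parameters
USERS = 10_000            # concurrent users
CONTEXT_TOKENS = 8_096    # context length per user, as listed on the slide

# Assumptions for illustration only (not from the slides).
BYTES_PER_PARAM = 2.0             # assumes 16-bit weights
KV_BYTES_PER_TOKEN = 64 * 1024    # hypothetical KV-cache cost per token


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights."""
    return num_params * bytes_per_param


def kv_cache_bytes(users: int, context_tokens: int, kv_bytes_per_token: int) -> int:
    """Worst-case KV-cache memory if every user fills the whole context."""
    return users * context_tokens * kv_bytes_per_token


weights_gib = weight_bytes(NUM_PARAMS, BYTES_PER_PARAM) / GIB
kv_gib = kv_cache_bytes(USERS, CONTEXT_TOKENS, KV_BYTES_PER_TOKEN) / GIB

print(f"weights:  {weights_gib:,.1f} GiB")
print(f"KV cache: {kv_gib:,.1f} GiB")
```

Under these assumptions the KV cache, not the weights, dominates the footprint. Changing the two assumed constants shifts the totals, so the output should be read as an order-of-magnitude estimate rather than a measurement.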