
Towards Memory-efficient RAG Pipelines with CXL Technology.pdf

Uploaded by: 明**** | ID: 1011717 | 2025-12-21 | 25 pages | 2.76 MB

Towards Memory-efficient RAG pipelines with CXL technology
Siamak Tavallaei, Arun George, Kyumin Park
Samsung
Open Systems for AI-IT

Outline: Memory-efficient RAG with CXL
- A brief on how RAG pipelines use memory
- How effective adding memory is in increasing performance
- How CXL Memory Expansion helps
- How CXL-enabled Memory Pooling helps dynamic pipelines

Memory usages in RAG pipeline
- We focus on the memory usage scenarios
- In particular, Vector DB use cases are interesting
- We observe memory demand scenarios
  1. Data Ingestion phase
     - Huge transient memory demand
     - Large memory required for creation and handling of Vector embeddings
  2. Query phase
     - Memory pressure based on workload
- These scenarios qualify for memory expansion (consistent/on-demand)

[Figure: RAG pipeline. Labels: Local Knowledge Base, Vectorization, DB Insertion, Vector DB, Query generation, Semantic search, DB Search, Prompt engineering, LLM, Answer. "Memory demand" is marked at two points.]
1. Data Ingestion: Convert knowledge base into vector embeddings. Local documents: text, PDFs etc.
2. Query: User asks a question. Perform semantic search on Vector DB and get similar results (Context). Augment the question with this context. Question + context is used in the model to find the answer.

Exploration: How can CXL help?
- Consistent memory demand: CXL-based memory expansion can help expand memory
  - CMM-D provides memory expansion
- Transient memory demand: CXL-based memory pooling can handle transient memory spikes
  - CXL 3.0 Dynamic Capacity Devices (DCD) provide the mechanism
- Solutions: CXL Switch + CMM-D (CXL-connected DRAM)

Vertical/Horizontal scaling with CMM-D
- CMM-D can deliver price, performance, and scalability benefits in vertical and horizontal scaling
[Figure: server diagrams labeled CPU, RDIMM, and CMM-D for Server #1 through Server #5. Panel titles: "Vertical scaling with CMM-D", "Horizont..."]
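The two phases described in the slides (data ingestion, then query) can be sketched in a few lines of Python. This is only an illustrative toy, not the pipeline used in the talk: the hashed bag-of-words `embed` function, the `ToyVectorDB` class, the `DIM` width, and the sample documents are all assumptions made for this example. What it does show is why a vector DB creates memory demand: every embedding stays resident in RAM, so the footprint grows with the number of documents times the embedding width.

```python
import hashlib
import math
import re

DIM = 1024  # assumed toy embedding width, chosen only for illustration


def embed(text: str) -> list[float]:
    """Toy embedding (assumption): hash each token into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorDB:
    """Hypothetical in-memory vector store: all embeddings are kept in RAM."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[list[float]] = []

    def insert(self, doc: str) -> None:
        # Data ingestion: vectorize the document and keep the vector resident.
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Query phase: brute-force cosine similarity (vectors are unit length).
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(self.vectors)]
        scored.sort(reverse=True)
        return [self.docs[i] for _, i in scored[:k]]

    def embedding_bytes(self, bytes_per_value: int = 4) -> int:
        # Footprint if each value were stored as a 4-byte float (assumption).
        return len(self.vectors) * DIM * bytes_per_value


def build_prompt(question: str, context: list[str]) -> str:
    # Augment the question with the retrieved context before calling an LLM.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


db = ToyVectorDB()
for doc in [
    "CXL memory expansion adds capacity beyond local DRAM",
    "Vector databases store embeddings for semantic search",
    "Bananas are rich in potassium",
]:
    db.insert(doc)

question = "How does CXL add memory capacity?"
context = db.search(question, k=1)
prompt = build_prompt(question, context)
```

The resulting `prompt` is what would be handed to the model in the final step. A production vector DB would use a learned embedding model and an index structure rather than brute-force search, both of which add to the memory demand during ingestion.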

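Based on the document's content, the following is a concise summary of its main points:
1. **Memory usage in RAG pipelines**: The document examines memory usage in RAG (Retrieval-Augmented Generation) pipelines, particularly the use of vector databases (Vector DB).
2. **Memory expansion and performance**: CXL® technology, specifically CXL Memory Expansion and CXL-enabled Memory Pooling, can effectively add memory and thereby improve performance.
3. **Advantages of CXL Memory Expansion**: CXL Memory Expansion provides consistent memory expansion, while CXL-based memory pooling handles transient memory demand.
4. **Performance data**: Using CMM-D (CXL-connected DRAM) improved vector database search performance by up to 19%.
5. **Cost effectiveness**: Compared with conventional DDR memory, using CMM-D can reduce TCO (total cost of ownership) by 41.67%.
6. **Memory pooling**: CXL 3.0 supports Dynamic Capacity Devices (DCD), which help handle on-demand memory needs.
7. **Call for collaboration**: The document closes with a call to collaborate on advancing CXL in memory pooling and sharing environments.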
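The distinction the document draws between consistent demand (a case for memory expansion) and transient demand (a case for memory pooling) can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a method from the talk: the function name `split_demand`, the use of the median as the "typical" demand, and every number in the sample trace are hypothetical.

```python
def split_demand(trace_gib: list[float], local_dram_gib: float) -> dict[str, float]:
    """Split a memory-demand trace into a steady part and a transient part.

    steady_over_local_gib: how far the typical (median) demand exceeds local DRAM.
    transient_spike_gib:   how far the peak exceeds the larger of median and local DRAM.
    """
    ordered = sorted(trace_gib)
    median = ordered[len(ordered) // 2]
    peak = ordered[-1]
    return {
        "steady_over_local_gib": max(0.0, median - local_dram_gib),
        "transient_spike_gib": max(0.0, peak - max(median, local_dram_gib)),
    }


# Made-up samples in GiB; the 900.0 entry stands in for an ingestion burst.
trace = [300.0, 310.0, 305.0, 900.0, 320.0, 315.0, 310.0]
plan = split_demand(trace, local_dram_gib=256.0)
```

Under these invented numbers, the steady overshoot is the portion that permanently attached CXL memory would cover, while the spike is the portion that would only need to be borrowed from a pool for the duration of the burst.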