
黄瀛-PCP auto-tuning.pdf

Uploaded by: Zhang** | ID: 159300 | 2024-04-05 | 23 pages | 285.50 KB

PCP Size Auto-tuning for Page Allocator Scalability
Huang, Ying
2024 March

Agenda
- Problems and background
- Design and implementation
- Performance evaluation
- Conclusion

Problems and Background

Buddy System
[Diagram: one node with several LCPUs, one PCP per LCPU, and a single buddy; alloc/free]
- One buddy per zone (node), protected by one zone lock: scalability issue!
- One PCP (Per-CPU Pageset) per LCPU (logical CPU)

Buddy System - Continued
- Physical memory management: node - zone
- One buddy (page allocator) per zone (per node in practice)
- All logical CPUs of one NUMA node share one zone lock: scalability issue!
- More and more cores in one NUMA node in the future
- PCP (Per-CPU pageset) can reduce zone lock contention
  - Batching allocation/freeing
  - Less allocation/freeing in zone

Possible Solution 1: Fake NUMA Node
[Diagram: the LCPUs of one node split across fake nodes 0, 1, 2, each with its own buddy (Buddy 0, Buddy 1, Buddy 2); "Prefer"]
- Very good scalability: zone, reclaim, compaction, etc.
- Easy to implement
- More management burden

Possible Solution 2: Splitting Buddy
[Diagram: one zone split into Region0, Region1, Region2, served by Buddy0, Buddy1, Buddy2; "Prefer"]
- Refused by community for now
  https:/lore.kernel.org/linux-mm/20230511065607.37407-1-
- [...] be revisited in the future if necessary

Possible Solution 3: Larger Allocation Unit
- Large folios: smaller cache footprint with zone lock held
[Chart: Will-it-scale page allocate/free throughput (GB/s) on ICX-SP, order=2 vs. order=0]

Possible Solution 4: PCP Auto-tuning
[Diagram: one node with several LCPUs, one PCP per LCPU, and a single buddy; each PCP labeled "Auto"; alloc/free]
- Larger PCP in effect: less allocation/freeing in buddy
- Auto-tune: as large as required

Allocation Patterns - Kbuild
- Pattern 1: amplitude 100, period 1s: original PCP
- Pattern 2: amplitude 25k, period 0.5s-1s: auto-tuned PCP high
- Pattern 3: amplitude 100k, period 10s: not covered by PCP

Design and

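This document addresses the scalability of memory allocation on multi-core processors and presents an auto-tuning method for the PCP (Per-CPU Pageset). In the traditional buddy system, each node has only one buddy (page allocator), so on a node with many logical CPUs all CPUs share a single zone lock, which causes a scalability problem. The PCP gives each LCPU (logical CPU) its own pageset and reduces contention on the zone lock. Several possible solutions are discussed, including fake NUMA nodes, splitting the buddy, and larger allocation units, though some of them have not been accepted by the community. PCP auto-tuning reduces the number of page allocations and frees that go to the buddy system, which lowers zone lock contention and reduces memory waste. In a kbuild test on a 2-socket SPR-SP system, the method reduced build time by 5% and lowered the normalized share of buddy allocations from 100% to 20%. This indicates that auto-tuning PCP high effectively reduces zone lock contention while keeping memory waste under control.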
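The summary's central claim, that a larger PCP means fewer trips to the buddy allocator and therefore fewer zone lock acquisitions, can be illustrated with a toy model. The sketch below is not kernel code and does not implement the auto-tuning algorithm from the slides: the function name, the fixed `pcp_high` parameter, the batch size of 50, and the strictly alternating allocate-then-free workload are all assumptions made for illustration. It only counts how often a single CPU would have to take the zone lock under a given PCP high limit.

```python
def count_zone_lock_acquisitions(amplitude, cycles, pcp_high, batch):
    """Toy model of one CPU with a per-CPU pageset (PCP).

    Each cycle allocates `amplitude` pages and then frees them all.
    Hypothetical simplification: every refill or drain moves exactly
    `batch` pages and costs exactly one zone lock acquisition.
    """
    pcp_count = 0          # pages currently cached in the PCP
    lock_acquisitions = 0  # times the (simulated) zone lock was taken

    for _ in range(cycles):
        # Allocation phase: take pages from the PCP, refilling from buddy when empty.
        for _ in range(amplitude):
            if pcp_count == 0:
                pcp_count += batch
                lock_acquisitions += 1
            pcp_count -= 1

        # Free phase: return pages to the PCP, draining to buddy above pcp_high.
        for _ in range(amplitude):
            pcp_count += 1
            if pcp_count > pcp_high:
                pcp_count -= batch
                lock_acquisitions += 1

    return lock_acquisitions


if __name__ == "__main__":
    # Amplitude of 25k pages, as in "Pattern 2" of the Kbuild slide.
    small = count_zone_lock_acquisitions(25_000, cycles=10, pcp_high=500, batch=50)
    large = count_zone_lock_acquisitions(25_000, cycles=10, pcp_high=25_000, batch=50)
    print("small PCP high:", small)
    print("large PCP high:", large)
```

With a PCP high that covers the whole amplitude, only the first allocation phase touches the simulated buddy, and every later cycle is served from the PCP. The price in this model is that up to `pcp_high` pages sit idle in the PCP between bursts, which is the memory-waste side of the trade-off the summary mentions.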