OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling
ASP-DAC 2025
Xiaoling Yi¹, Ryan Antonio¹, Joren Dumoulin¹, Jiacong Sun¹, Josse Van Delm¹, Guilherme Paim¹,², Marian Verhelst¹
¹MICAS-ESAT, KU Leuven, Belgium; ²INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
23rd January 2025

Outline
Edge AI Computing Background and Motivation
OpenGeMM System Architecture
    Overview
    GeMM Accelerator Generator
    Mechanisms for High Utilization
    Reusability and Flexibility Summary
Experimental Results and SotA Comparison
Conclusion and Future Work

Edge AI Computing: Necessity
DNN models become pervasive while evolving rapidly
Applications: Image Classification, Video Generation, Language Assistance, Intelligent Robotics
[Figure: Model size of language models]
Edge DNN deployment challenges
1) High performance and energy efficiency: real-time application, low battery capacity
2) Flexibility: reusable across DNN models
3) Underutilization: low effective computation

Edge AI Computing SotA Works
Efficiency vs. Flexibility/Reusability
[Figure: design space with axes Efficiency (Effi.) and Flexibility (Flex.)]
Programmable platforms (CPUs/GPUs/FPGA): higher flexibility
    Reusable for diverse workloads
    Low efficiency and high control overhead
DSAs (NVDLA [1], DepFiN [3]): higher efficiency
    Tailored to specific workloads
    Limited reusability and programmability

Edge AI Computing SotA Works
Efficiency vs. Flexibility/Reusability
RISC-V AI platforms: Gemmini [2], RedMule [3]
Flexible GeMM accelerator
Effi.Fl