
024-洪培翔.pdf

Uploaded by: 山哈 | ID: 725329 | 2025-07-04 | 13 pages | 704.68 KB

AutoIREE: Automatic Performance Tuning for AI Models on RISC-V Vector Architectures
洪培翔, 張元銘, 吳奕緯
Andes Technology Corporation
RISC-V Summit China, 2024/08/22
Subject to change without notice. Copyright 2021-2024 Andes Technology

Outline
- Background Introduction
- IREE
- Andes Vector Processor Families
- Implementation and Innovation
- Experiment Results
- Future Works

IREE, MLIR-based Compiler + Runtime
- Model input: Prebuilt AI Model (PyTorch, TF, TFL)
- Targets: Andes Cores, GPUs
[Figure courtesy of IREE website]

Andes Vector Processor Families
[Figure: product roadmap toward more features; labels as extracted]
- Products: AX47MPV*, AX46MPV*, NX27V, AX45MPV, AX25-V100
- Integrated Matrix Ext. (IME)
- 8-core cluster
- 16-core cluster with private L1/L2
- HVM (High-speed Vector Memory) Interface
- ACE for RVV; ACE (Andes Automated Custom Extension)
- Data types: int4-64, fp16-64; bf16 (conv.); + bf16 (full arithmetic) + fp8
- VLEN: 128, 256, 512
- VLEN: 128, 256, 512, 1024 + 2048
- RVV 0.8
- RVV 1.0
- 5-stage single-issue
- 8-stage dual-issue with shared cache for multicore
- AX25-V100 is adopted in Meta's Training and Inference Accelerator (MTIA) v1.
*Future products subject to change

AX45MPV VPU Features
[Figure: VPU block diagram; labels: VALU, VMAC, V0, V1, ..., V31, scoreboard, VLEN, VIQ, Vector L/S Control, Forwarding/Chaining, RVV Ld/St]
- Dual Issue/Dispatch, Out-of-order execution
- Multiple Vector Functional Units (VFUs) operating independently and simultaneously
- Up to 5 DLEN results are generated per cycle
- Support precise exception (optional)

ACE-RVV VLEN/DLEN/BIU Combinations

VLEN    SIMD (DLEN)    BIU (AXI)
1024    1024           512/256/128
1024    512            512/256/128
512     512            512/256/128
512     256            256/128
256     256            256/128
256     128            128
128     128            128

Implementation
Tuner
Transform library controls:
- Size of tiling level
- Size of Ca

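This document introduces AutoIREE, a tool for automatic performance tuning of AI models on RISC-V vector architectures. Key points:
1. **IREE and Andes vector processors:** IREE is an MLIR-based compiler and runtime that can be used to optimize prebuilt AI models running on Andes vector processors.
2. **Performance optimization:** A novel subgraph similarity analysis groups 600 subgraphs into 60 categories, which reduces the number of tuning runs and improves efficiency.
3. **Experiment results:** Using a genetic algorithm for 20,000 iterations, the MobileNet fp32 model ran 2.1~2.7x faster.
4. **Experiment environment:** The Andes AX45MPV core was tested on an FPGA with MobileNet v1 INT8 and FP32 models under different configurations, with significant speed-ups.
5. **Future work:** Reduce compile/tuning time by using heuristic early stopping and an AI-based cost model to predict cycles.

Key figures cited: the subgraphs of the MobileNet fp32 model were reduced from 600 to 60 categories, with a 2.1~2.7x speed-up; AutoIREE's speed-up on RVV relative to pure scalar execution is 14.54~58.19x for INT8 and 1.53~2.42x for FP32.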
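The two ideas in the summary, grouping similar subgraphs so that only one representative per group is tuned, and searching tile sizes with a genetic algorithm, can be sketched as follows. This is a toy illustration under stated assumptions, not AutoIREE's implementation: the subgraph dictionaries, the exact-match `signature` key, `TILE_CHOICES`, and `toy_cost` (a stand-in for a measured cycle count) are all hypothetical.

```python
import random
from collections import defaultdict

TILE_CHOICES = [1, 2, 4, 8, 16, 32, 64]  # hypothetical candidate tile sizes


def signature(subgraph):
    """Similarity key (assumed): same op kind and shape share one tuning result."""
    return (subgraph["op"], tuple(subgraph["shape"]))


def group_subgraphs(subgraphs):
    """Map each signature to the names of the subgraphs that share it."""
    groups = defaultdict(list)
    for sg in subgraphs:
        groups[signature(sg)].append(sg["name"])
    return dict(groups)


def toy_cost(shape, tiles):
    """Stand-in for a measured cycle count; a real tuner would run on hardware."""
    cost = 0.0
    for dim, tile in zip(shape, tiles):
        steps = -(-dim // tile)       # ceiling division: tile iterations
        padding = steps * tile - dim  # wasted work in the last tile
        cost += steps + 0.5 * padding + 0.1 * tile
    return cost


def genetic_tune(shape, generations=50, population=16, seed=0):
    """Small genetic search over per-dimension tile sizes."""
    rng = random.Random(seed)
    n = len(shape)
    pop = [tuple(rng.choice(TILE_CHOICES) for _ in range(n))
           for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda t: toy_cost(shape, t))
        parents = pop[: population // 2]  # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.3:  # occasional mutation of one gene
                child[rng.randrange(n)] = rng.choice(TILE_CHOICES)
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda t: toy_cost(shape, t))


def tune_model(subgraphs):
    """Tune one representative per group and reuse the result for its members."""
    groups = group_subgraphs(subgraphs)
    tuned = {}
    for (_op, shape), names in groups.items():
        best = genetic_tune(shape)
        for name in names:
            tuned[name] = best
    return tuned, len(groups)
```

With this structure the number of tuning runs scales with the number of groups rather than the number of subgraphs, which is the saving the summary describes when 600 subgraphs collapse into 60 categories.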