
SemiAnalysis InferenceMAX: Benchmarking the AI Frontier.pdf

Uploaded by: 明**** | ID: 1011680 | 2025-12-21 | 36 pages | 3.93 MB

Dylan Patel
SemiAnalysis InferenceMAX: Benchmarking the AI Frontier
OCP SPECIAL FOCUS: MARKET IMPACT & OCP ADOPTION STORIES

Contents
SemiAnalysis: What We Do
Key Trends in AI Inference
Introduction to InferenceMAX
InferenceMAX v1 Results
Future Plans

SemiAnalysis: What We Do
Sector Coverage
Tokenomics, Applications, Intelligence
AI Companies, AI Models, Cloud Services
Accelerators, Networking, Optics, Datacenters
ODM and OEMs
IC Design
Foundry and IDMs
Testing and Packaging
Semicap Equipment
Subassemblies, Materials, Consumables

Team Members
42 team members globally (plus interns!)
26 in North America (offices in SF, NYC)
5 in Europe
11 in Asia

SemiAnalysis: What We Do (186,000 subscribers on free tier)
SemiAnalysis: What We Do (institutional business)
Tools
Data

LLM inference systems are not a monolith: dedicated compute and system-level optimizations are the standard.
Prefill vs. decode specialization: Treat prefill and decode as different workloads (compute-bound vs. memory/latency-bound) to cut TTFT and ITL. Use separate GPU pools for prefill and decode; transfer the KV cache between pools to raise overall utilization.
KV cache as a first-class resource: Cache-aware routing (pick the worker with the best cache hit and lowest load) and fast KV transfer are becoming core platform features.
MoE + Expert Parallelism (DeepEP, etc.): Route tokens to a few experts across GPUs; use throughput-optimized dispatch for prefill and low-latency dispatch for decode.
Dynamic rate matching: Continuously tune the Ctx:Gen GPU ratio per model/traffic to balance TTFT, ITL, and tokens per second per GPU.

AI Inference Is Becoming Increasingly Specialized
Key Trends in AI Inference

Existing Benchmarks: Lack transparency, with opaque dec
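The cache-aware routing idea in the preview above ("pick the worker with best cache hit & lowest load") can be illustrated with a small sketch. This is not taken from the report or from any particular serving framework: the `Worker` class, the `pick_worker` function, and the `load_penalty` weight are all hypothetical names chosen for illustration, and the scoring rule (reusable prefix tokens minus a per-request load penalty) is just one plausible way to trade cache hits against load.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class Worker:
    """Hypothetical view of one decode/prefill worker as seen by a router."""
    name: str
    active_requests: int  # current load on this worker
    # Token-ID prefixes whose KV cache is assumed to be resident on the worker.
    cached_prefixes: list = field(default_factory=list)


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_tokens(worker: Worker, prompt: Sequence[int]) -> int:
    """Longest cached prefix of the prompt on this worker, in tokens."""
    return max(
        (shared_prefix_len(p, prompt) for p in worker.cached_prefixes),
        default=0,
    )


def pick_worker(workers: Sequence[Worker], prompt: Sequence[int],
                load_penalty: float = 4.0) -> Worker:
    """Pick the worker with the best cache-hit / load trade-off.

    Assumed scoring rule: reusable tokens minus load_penalty per active
    request. A real router would calibrate this against measured latency.
    """
    def score(w: Worker) -> float:
        return cache_hit_tokens(w, prompt) - load_penalty * w.active_requests

    return max(workers, key=score)


if __name__ == "__main__":
    prompt = (1, 2, 3, 4, 5, 6, 7, 8, 9)
    warm = Worker("warm", active_requests=1,
                  cached_prefixes=[(1, 2, 3, 4, 5, 6, 7, 8)])
    cold = Worker("cold", active_requests=0)
    print(pick_worker([warm, cold], prompt).name)
```

With the default penalty, a lightly loaded worker that already holds most of the prompt's KV cache wins over an idle worker with a cold cache; once the warm worker is busy enough, the router falls back to the idle one.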

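Summary of the report's main content:
- **About SemiAnalysis**: SemiAnalysis is an analysis firm focused on semiconductors and AI, with 42 team members across North America, Europe, and Asia.
- **AI inference trends**: AI inference is becoming increasingly specialized and needs to be optimized for different workloads.
- **Introducing InferenceMAX™**: InferenceMAX™ is an open-source AI inference benchmark that provides transparent, real-time performance data.
- **InferenceMAX™ v1 results**: On DeepSeek R1 0528, GB200 NVL72 is more energy-efficient than H200 and B200; on GPT-OSS 120B, MI355X with vLLM is comparable to B200 with vLLM in performance per TCO at low interactivity.
- **Future plans**: InferenceMAX™ plans to add tests for Google TPU and AWS Trainium within the next two months and to continue optimizing performance.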