Shangming Cai_SGLang High-Performance Inference: Current State and Future Roadmap, a Comprehensive Overview.pdf


2026-06-20

PDF, 33 pages, 4.38 MB


Shangming Cai, SGLang Core Developer
SGLang High-Performance Inference: Milestones & Roadmap

Contents
Part 01. SGLang Overview
Part 02. Milestones & Recent Highlights
Part 03. 2026 Q2 Roadmap

01 SGLang Overview

High-Performance Serving Framework
Fast, production-grade serving for LLMs and multimodal models

Low-Latency & High-Throughput
Deeply optimized from single GPU to large-scale distributed clusters

Fully Open-Source & Widely Adopted
Incubated by LMSYS Org, widely adopted in the community and industry

02 Milestones & Recent Highlights

Large-Scale Deployment
SGLang is the first open-source system to nearly match the performance reported in the official DeepSeek blog, using PD disaggregation and large EP on 12 H100 nodes.
Performance (May 2025):
52.3k input tokens/s per node
22.3k output tokens/s per node
5x cheaper than the DeepSeek API price
Reproduced by 10+ other teams.

Hierarchical Caching Integration
Up to 6x throughput and 84% lower TTFT via hierarchical KV reuse
HiRadixTree + GPU-assisted I/O: efficient cross-tier caching (GPU/CPU/Storage)
Smart prefetch & write policies hide transfer latency and boost cache hit rate
Pluggable backends: Mooncake, 3FS; easy get/exist/set integration
Adopted by Ant Group, Novita AI, Alibaba Cloud and others

SpecBundle & SpecForge
SpecForge: a framework for training draft models that integrate natively with SGLang
Scalable Distributed Training, Memory-Efficient Training
SpecBundle: a collection of production-grade EAGLE-3 model checkpoints trained on large-scale datasets
Native Support for Advanced Model Architectures: Llama 4, DeepSeek, Qwen3 MoE, GPT-OSS
Collaborated with multiple industry partners, including Ant, Meituan, Nex-AGI, and EigenAI

RL Integrations
SGLang as the efficient rollout engine for reinforcement learning
Recent features supported: true on-policy RL, rollout routing replay, low-precision rollout, MTP training
Has been ...
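
The hierarchical caching slide mentions cross-tier caching (GPU/CPU/Storage) and pluggable backends with a get/exist/set interface. A minimal Python sketch of that idea follows. It is not SGLang's implementation: the names `StorageBackend`, `InMemoryBackend`, and `TieredKVCache` are hypothetical, keys are opaque strings rather than token prefixes in a radix tree, and prefetching and asynchronous I/O are left out. It only illustrates write-through to a backing store, LRU demotion from a fast tier to a slower one, and promotion on a hit.

```python
from collections import OrderedDict
from typing import Dict, Optional, Protocol, Tuple


class StorageBackend(Protocol):
    """Hypothetical interface echoing the get/exist/set operations on the slide."""

    def get(self, key: str) -> Optional[bytes]: ...
    def exist(self, key: str) -> bool: ...
    def set(self, key: str, value: bytes) -> None: ...


class InMemoryBackend:
    """Stand-in for a remote store; a plain dict, for illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def exist(self, key: str) -> bool:
        return key in self._data

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class TieredKVCache:
    """Toy three-tier cache: 'gpu' and 'cpu' are LRU dicts, storage is the backend."""

    def __init__(self, backend: StorageBackend, gpu_slots: int, cpu_slots: int) -> None:
        self.backend = backend
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.gpu: "OrderedDict[str, bytes]" = OrderedDict()
        self.cpu: "OrderedDict[str, bytes]" = OrderedDict()

    def _insert_cpu(self, key: str, value: bytes) -> None:
        self.cpu[key] = value
        self.cpu.move_to_end(key)
        while len(self.cpu) > self.cpu_slots:
            # Safe to drop: write-through already placed the entry in storage.
            self.cpu.popitem(last=False)

    def _insert_gpu(self, key: str, value: bytes) -> None:
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_slots:
            # Demote the least recently used entry to the CPU tier.
            old_key, old_value = self.gpu.popitem(last=False)
            self._insert_cpu(old_key, old_value)

    def put(self, key: str, value: bytes) -> None:
        self.backend.set(key, value)  # write-through policy (an assumption)
        self._insert_gpu(key, value)

    def lookup(self, key: str) -> Tuple[Optional[bytes], str]:
        """Return (value, tier that served the hit), promoting hits to the GPU tier."""
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.cpu:
            value = self.cpu.pop(key)
            self._insert_gpu(key, value)
            return value, "cpu"
        if self.backend.exist(key):
            stored = self.backend.get(key)
            if stored is not None:
                self._insert_gpu(key, stored)
                return stored, "storage"
        return None, "miss"
```

With one slot per tier, inserting three entries pushes the oldest down to storage only, so a later lookup for it is served from the backend and promoted back to the fast tier. A production system would choose write and prefetch policies by workload; write-through is used here only because it keeps eviction trivially safe.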
