Cai Shangming (蔡尚铭): A Panoramic Analysis of SGLang High-Performance Inference, Current State and Future Roadmap (33-page PDF)
Shangming Cai, SGLang Core Developer
SGLang High-Performance Inference: Milestones & Roadmap

Contents
Part 01. SGLang Overview
Part 02. Milestones & Recent Highlights
Part 03. 2026 Q2 Roadmap

01 SGLang Overview

High-Performance Serving Framework
Fast, production-grade serving for LLMs and multimodal models.

Low Latency & High Throughput
Deeply optimized, from a single GPU to large-scale distributed clusters.

Fully Open Source & Widely Adopted
Incubated by LMSYS Org and widely adopted across the community and industry.

02 Milestones & Recent Highlights

Large-Scale Deployment
SGLang is the first open-source system to nearly match the performance reported in DeepSeek's official blog, using prefill-decode (PD) disaggregation and large-scale expert parallelism (EP) on 12 H100 nodes.
Performance (May 2025): 52.3k input tokens/s per node and 22.3k output tokens/s per node, 5x cheaper than the DeepSeek API price.
Reproduced by 10+ other teams.

Hierarchical Caching Integration
Up to 6x higher throughput and 84% lower time to first token (TTFT) through hierarchical KV cache reuse.
HiRadixTree + GPU-assisted I/O: efficient cross-tier caching (GPU/CPU/storage).
Smart prefetch and write policies hide transfer latency and boost the cache hit rate.
Pluggable storage backends (Mooncake, 3FS) with easy integration through a get/exist/set interface.
Adopted by Ant Group, Novita AI, Alibaba Cloud, and others.

SpecBundle & SpecForge
SpecForge: a framework for training draft models that integrate natively with SGLang, supporting scalable distributed training and memory-efficient training.
SpecBundle: a collection of production-grade EAGLE-3 model checkpoints trained on large-scale datasets.
Native support for advanced model architectures: Llama 4, DeepSeek, Qwen3 MoE, GPT-OSS.
Developed in collaboration with multiple industry partners, including Ant, Meituan, Nex-AGI, and EigenAI.

RL Integrations
SGLang serves as an efficient rollout engine for reinforcement learning.
Recently supported features: true on-policy RL, rollout routing replay, low-precision rollout, MTP training.
Has been