Best Practices for Using and Optimizing NVIDIA's Full-Stack LLM Solution
周国峰 (Chandler), Technical R&D Manager, NVIDIA
GTC 2024 China AI Day, Mar. 19, 2024

Agenda
NVIDIA Full-Stack Solution for LLM
Best Practices of NVIDIA Megatron-Core for LLM Training
Best Practices of NVIDIA TensorRT-LLM for LLM Inference
Best Practices of NVIDIA Triton Inference Server for LLM Deployment
Conclusion and Prospect

NVIDIA Full-Stack Solution for LLM
From Training and Inference to Deployment

NVIDIA Megatron-Core (M-core) for LLM training: an open-source library of GPU-optimized techniques for LLM training, for customers to build custom LLM frameworks.
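For orientation, here is a minimal sketch of how a custom framework built on Megatron-Core might initialize its parallelism state. The parallel sizes and the torchrun launch are illustrative assumptions, not settings from this talk.

```python
# Minimal sketch: initializing Megatron-Core model parallelism in a custom
# framework. Launch one process per GPU, e.g.:
#   torchrun --nproc-per-node=8 init_mcore.py
# The parallel sizes below are illustrative assumptions.
import os

import torch
from megatron.core import parallel_state

torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Split 8 GPUs into tensor-parallel groups of 2 and pipeline stages of 2;
# the remaining factor of 2 becomes data parallelism.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,
    pipeline_model_parallel_size=2,
)

print(
    "TP rank", parallel_state.get_tensor_model_parallel_rank(),
    "| PP rank", parallel_state.get_pipeline_model_parallel_rank(),
    "| DP rank", parallel_state.get_data_parallel_rank(),
)
```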
NVIDIA TensorRT-LLM for LLM inference: an open-source library that accelerates and optimizes inference performance of the latest large language models (LLMs).
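As a taste of the workflow, a sketch of offline inference using the high-level Python API available in recent TensorRT-LLM releases; the model id and sampling values are placeholder assumptions.

```python
# Sketch: offline inference with TensorRT-LLM's high-level Python API
# (available in recent releases). The checkpoint and sampling values are
# placeholder assumptions; an optimized engine is built for the target GPU.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # HF id or local path
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for out in llm.generate(["What is in-flight batching?"], params):
    print(out.outputs[0].text)
```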
NVIDIA Triton Inference Server for LLM deployment: open-source inference serving software that standardizes AI model deployment and execution across every workload.

TensorRT-LLM + Triton Inference Server for deployment: the suggested way to deploy LLM-based services on the NVIDIA AI platform, with SOTA performance and rich functionality. The TensorRT-LLM backend is the Triton backend for TensorRT-LLM, providing in-flight batching, paged KV cache, and more.
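To illustrate the serving side, a hedged client sketch that queries a TensorRT-LLM model already deployed behind Triton. The endpoint, model name, and field names follow the tensorrtllm_backend ensemble conventions and are assumptions about your deployment.

```python
# Sketch: querying a Triton-served TensorRT-LLM model via the HTTP
# "generate" extension. Assumes the standard tensorrtllm_backend ensemble
# is deployed as "ensemble" on localhost:8000; adjust to your repository.
import requests

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": "What is paged KV cache?", "max_tokens": 64},
)
resp.raise_for_status()
print(resp.json()["text_output"])
```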
Overview of NVIDIA's Large Language Model Offerings for Training
Solutions at Each Level of the Tech Stack

Transformer Engine: Hopper-accelerated Transformer models; a dedicated acceleration library, including FP8 support on Hopper.
Megatron-LM: a lightweight reference framework showing how to use Megatron-Core to build your own LLM framework.
NeMo Framework: an easy-to-use, out-of-the-box framework with a large model collection, for enterprise users to experiment, train, and dep