《张驰-veRL for Training Coding Agent.pdf》由会员分享,可在线阅读,更多相关《张驰-veRL for Training Coding Agent.pdf(27页珍藏版)》请在三个皮匠报告上搜索。
1、veRL for Training Coding Agent姓名张驰 verl项发起巫锡斌 verl核maintainer01Introduction to verlverl in 2025-Same front-end code-Auto-backend selectionReinforcement Learning is importantReinforcement learning(RL)at Bytedance Classic Alignment with Human values Reasoning:O1/Claude-3.7 performance on math benchmar
2、ks Image/video/music generation Agentic LLM tool using Desktop operator,coding assistant,GamingIntroduction to Reinforcement LearningSupervised fine-tuning Learning from labeled examples Optimize a single modelReinforcement Learning Maximize the reward Play with multiple models in a single systemHyb
3、ridFlow Programming AbstractionMake LLM training/rollout as a service Classic Alignment with Human values Reasoning:O1/Claude-3.7 performance on math benchmarks Image/video/music generation Agentic LLM tool using Desktop operator,coding assistant,GamingAgentsAgent:software systems that use AI to rea
4、soning,planning,and memory and autonomy to make decisions,learn,and adapt.-Tool calling:Allowing the LLM to select and use various tools as needed.-Memory:Enabling the agent to retain and use information from previous steps.-Planning:Empowering the LLM to create and follow multi-step plans to achiev
5、e goals.Agent RL:training LLM to make better decisions in complex,dynamic,real world.Reinforcement Learning for Agents-AgentLoopRolloutAgentLoop:given a user prompt,execute user defined loop,output multi-turn chat history as trajectory.-Search:online web search-MCP tools:image,video edit,-Code sandb
6、ox:execute code,python,java,-Virtual machine:operate browser,ppt,excel,-Android emulator:operate app-Reinforcement Learning for Agents-SystemHighlight-Server mode:vllm/sglang AsyncLLM engine-Parallel running:asyncio loop run multiple prompts in parallel-Load balance and sticky session:better kv cach