张驰-veRL for Training Coding Agent.pdf

上传人： bu****ng

编号：1188809

2026-03-31

PDF 27页 1.61MB

《张驰-veRL for Training Coding Agent.pdf》由会员分享，可在线阅读，更多相关《张驰-veRL for Training Coding Agent.pdf（27页珍藏版）》请在三个皮匠报告上搜索。

1、veRL for Training Coding Agent姓名张驰 verl项发起巫锡斌 verl核maintainer01Introduction to verlverl in 2025-Same front-end code-Auto-backend selectionReinforcement Learning is importantReinforcement learning(RL)at Bytedance Classic Alignment with Human values Reasoning:O1/Claude-3.7 performance on math benchmar

2、ks Image/video/music generation Agentic LLM tool using Desktop operator,coding assistant,GamingIntroduction to Reinforcement LearningSupervised fine-tuning Learning from labeled examples Optimize a single modelReinforcement Learning Maximize the reward Play with multiple models in a single systemHyb

3、ridFlow Programming AbstractionMake LLM training/rollout as a service Classic Alignment with Human values Reasoning:O1/Claude-3.7 performance on math benchmarks Image/video/music generation Agentic LLM tool using Desktop operator,coding assistant,GamingAgentsAgent:software systems that use AI to rea

4、soning,planning,and memory and autonomy to make decisions,learn,and adapt.-Tool calling:Allowing the LLM to select and use various tools as needed.-Memory:Enabling the agent to retain and use information from previous steps.-Planning:Empowering the LLM to create and follow multi-step plans to achiev

5、e goals.Agent RL:training LLM to make better decisions in complex,dynamic,real world.Reinforcement Learning for Agents-AgentLoopRolloutAgentLoop:given a user prompt,execute user defined loop,output multi-turn chat history as trajectory.-Search:online web search-MCP tools:image,video edit,-Code sandb

6、ox:execute code,python,java,-Virtual machine:operate browser,ppt,excel,-Android emulator:operate app-Reinforcement Learning for Agents-SystemHighlight-Server mode:vllm/sglang AsyncLLM engine-Parallel running:asyncio loop run multiple prompts in parallel-Load balance and sticky session:better kv cach

张驰-veRL for Training Coding Agent.pdf

相关报告