Zhang Chi - GUI Agents Based on Multimodal Large Language Models.pdf

PDF, 37 pages, 5.22 MB

GUI Agents Based on Multimodal Large Language Models
Dr. Zhang Chi
Tencent QQ Guangying Lab (腾讯QQ光影实验室) / Westlake University AGI Lab

Speaker

Zhang Chi was named in the 2023 World's Top 2% Scientists list released by Stanford University.

Dr. Zhang Chi is currently a research scientist at Tencent. He will join Westlake University in fall 2024 as an assistant professor and doctoral supervisor, and will found the AGI Lab there. He received his PhD in 2021 from the School of Computer Science at Nanyang Technological University, Singapore. His doctoral research focused on efficient, low-cost machine learning and computer vision algorithms; his current research covers multimodal models and AIGC models in the era of large models. He has published multiple papers at top international conferences and journals, including CVPR, ICCV, NeurIPS, and TPAMI.

Contents

1. Background: Large Language Models and Agents
2. Difficulties and Challenges
3. User Interfaces and Human-Computer Interaction
4. AppAgent: A GUI-Based Agent

Large language models as agent

Prompt: Imagine that you are a home robot, and is asked to go to the kitchen to fetch something from the fridge. The image below shows your current position. Please plan your next action.

GPT-4V: My next action would be turn right and move forward towards the hallway, as I can see that the kitchen is likely located in that direction. I would then continue to navigate through the hallway until I reach the kitchen and locate the fridge.

Large language models as agent

Prompt: Now you are at the kitchen, this is what you can see from there. Can you see the fridge from here? What is your next action?

GPT-4V: From this position, I can see the fridge on the right side of the kitchen. My next action would be to move forward and turn slightly to the right to approach the fridge.

Large language models as agent

Prompt: Imagine that you are a robot operating a computer. You are asked to find a detailed recipe of Mapo Tofu. Below is what you see on the computer screen, predict your next move.

GPT-4V: Next move: I would move the mouse to the Google Chrome icon located on the left side of the screen, approximately 1/3 of the way down from the top. I would then click on the icon to open the web browser.

Decision, Perception, Action

Challenges

Limitation of text-only input in current agent design
Control alone is a big discipline
Insufficient knowledge in LLM & high learning cost

A Simple Agent Prototype

Significance of Interface

Advantages

Actions have clear semantic meaning, p
