Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios.pdf

No. 719006, PDF, 32 pages, 1.53 MB

Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios
Vivian (Wenwen) Xie, Arthur Dooner
June 9

About Us
Arthur Dooner, Specialist Solutions Architect
Vivian (Wenwen) Xie, Specialist Solutions Architect

Evaluation Foundations

Evaluation is a Mindset

What is Evaluation?
- Evaluation is a process of assessing the quality of an AI system in a repeatable and standardized way.
- Every evaluation you run of an AI system will have different definitions of success and "correct", as the goals of each AI system are different.

Why do we Evaluate our AI Systems?
- We need to compare one version of our AI system to another: how is the system performing compared to its previous version?
- Without evaluation, we cannot prove that our iterations amount to development progress.

Evaluation Exists in Many Forms - Identify which one you need to do!

Types of Evaluation

Classification of Accuracy
- Get F1 scores or equivalents for an AI system to ask:
  - Is it labeling this record correctly?
  - What percentage of the time is the record labeled?
  - What labels?
  - What score makes sense (F1, RMSE, etc.)?

RAG System Evaluation
- Retrieval performance
- Retrieval quality
- Data structures (are chunks connected and meaningful?)
- Prompt expansion
- System and contextual prompts
- Latency and cost

Agent System Evaluation
- Mine traces to see what tools are used
- More compound retrieval
- Tool quality/performance individually
- Tool results together
- Agent system prompt
- Determinism

Evaluation-Driven Development
Definition: EDD integrates continuous evaluation into the AI development lifecycle, ensuring models meet quality, cost, and latency benchmarks.
Importance: Facilitates iterative improvements, aligns models with user needs, and ensures compliance with organizational standards.
Prerequisites:
- Gather requirements to validate gen AI fit and identify constraints
- Design your solution archi…
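The EDD definition above (meet quality, cost, and latency benchmarks, and compare each version of the system to its previous one) can be sketched in a few lines of Python. This is a minimal illustration and not the presenters' implementation: the names `EvalResult`, `Benchmarks`, `passes`, and `is_progress`, the threshold values, and the sample labels are all hypothetical, and F1 on a single label stands in for whatever quality score fits the system being evaluated.

```python
from dataclasses import dataclass


def f1_score(expected: list[str], predicted: list[str], positive: str) -> float:
    """Binary F1 for one label, computed from paired expected/predicted labels."""
    pairs = list(zip(expected, predicted))
    tp = sum(e == positive and p == positive for e, p in pairs)
    fp = sum(e != positive and p == positive for e, p in pairs)
    fn = sum(e == positive and p != positive for e, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass
class EvalResult:
    """Aggregate metrics from one evaluation run of one system version."""
    f1: float
    avg_latency_s: float
    avg_cost_usd: float


@dataclass
class Benchmarks:
    """Hypothetical quality, latency, and cost thresholds for the project."""
    min_f1: float
    max_latency_s: float
    max_cost_usd: float


def passes(result: EvalResult, bench: Benchmarks) -> bool:
    """True when a run meets the quality, latency, and cost benchmarks."""
    return (
        result.f1 >= bench.min_f1
        and result.avg_latency_s <= bench.max_latency_s
        and result.avg_cost_usd <= bench.max_cost_usd
    )


def is_progress(candidate: EvalResult, baseline: EvalResult, bench: Benchmarks) -> bool:
    """A candidate counts as progress if it meets the benchmarks
    and its quality score is no worse than the previous version's."""
    return passes(candidate, bench) and candidate.f1 >= baseline.f1


# Illustrative data only: the same fixed evaluation set scored against two versions.
expected = ["spam", "ham", "spam", "spam", "ham"]
v1_predicted = ["spam", "spam", "ham", "spam", "ham"]
v2_predicted = ["spam", "ham", "spam", "spam", "spam"]

bench = Benchmarks(min_f1=0.8, max_latency_s=2.0, max_cost_usd=0.01)
v1 = EvalResult(f1_score(expected, v1_predicted, "spam"), avg_latency_s=1.2, avg_cost_usd=0.004)
v2 = EvalResult(f1_score(expected, v2_predicted, "spam"), avg_latency_s=1.4, avg_cost_usd=0.005)

print(f"v1 F1={v1.f1:.3f} passes={passes(v1, bench)}")
print(f"v2 F1={v2.f1:.3f} passes={passes(v2, bench)} progress={is_progress(v2, v1, bench)}")
```

Running both versions against the same fixed evaluation set is what makes the comparison repeatable; the latency and cost figures here are placeholders that a real workflow would measure during the run.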
