AutoFeedback: Scaling Human Feedback with Custom Evaluation Models.pdf

PDF, 41 pages, 5.04 MB

2024 Databricks Inc. All rights reserved

AUTOFEEDBACK: SCALING HUMAN FEEDBACK WITH CUSTOM EVALUATION MODELS
Arjun Bansal, Log10.io
June 13, 2024

Outline
1. Problem: Challenges in task-specific evaluation of LLM applications
2. Solution: AutoFeedback
3. Results
4. Deployment architecture
5. Next steps / Free trial

Measuring and improving LLM accuracy today is hard
Out-of-the-box accuracy
Human review is the gold standard but time-consuming and expensive
AI-based approaches are biased and inaccurate

LLM Evaluators Recognize and Favor Their Own Generations (Panickssery et al., 2024)
Models prefer their own outputs
Positional bias
Verbosity bias
https://huggingface.co/blog/open-llm-leaderboard-rlhf

Challenges → Solution
Accurate, unbiased evaluation
LLM as a judge
Log10 AutoFeedback

AutoFeedback
Scale human review of LLM output with custom AI models

Dataset
TL;DR dataset (Völske et al., 2017; Stiennon et al., 2020)
Reddit summaries
Summary grading task
Axes such as coherence, accuracy, coverage, and overall, each scored on a 1-7 range
Qualitative comment / reviewer reasoning
Training superset: subset of 5,521 examples
Test: a different subset of 100 examples

Detailed rubric
Rubric
You are an evaluator of summaries of articles on Reddit. You are tasked with grading the summaries for accuracy, coherence, coverage, and overall.
Coherence
For this axis, answer the question “how coherent is the summary on its own?” A summary is coherent if, when read by itself, it's easy to understand and free of English errors. A summary is not coherent if it's difficult to understand what the summary is trying to say. Generally, it's more important that the summary is understandable than it being free of grammar errors.
Rubric:
Score of 1: The summary is impossible to understand.
Score of 4: The summary has mistakes or confusing phrasing that
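The dataset slide describes each example as a Reddit summary graded on four axes (coherence, accuracy, coverage, overall) on a 1-7 scale, plus a qualitative reviewer comment, with the 100-example test set drawn as a different subset from the 5,521-example training superset. The sketch below shows one way such records could be validated and how disjointness of the two splits could be checked. It assumes every example carries a unique `example_id`; the class `GradedSummary` and the function `check_disjoint` are hypothetical names for illustration, not Log10's data format or API.

```python
from dataclasses import dataclass

# Axes and score range taken from the dataset slide; the schema itself is hypothetical.
AXES = ("coherence", "accuracy", "coverage", "overall")
MIN_SCORE, MAX_SCORE = 1, 7


@dataclass
class GradedSummary:
    """One human-graded summary (illustrative schema, not Log10's data format)."""

    example_id: str  # assumed unique identifier per example
    summary: str
    scores: dict  # axis name -> integer score from 1 to 7
    comment: str = ""  # qualitative reviewer reasoning

    def __post_init__(self):
        # Every rubric axis must be present.
        missing = set(AXES) - set(self.scores)
        if missing:
            raise ValueError(f"missing axes: {sorted(missing)}")
        # Each score must be a plain int inside the 1-7 range (bools are rejected).
        for axis in AXES:
            score = self.scores[axis]
            if type(score) is not int or not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(f"{axis} score out of range: {score!r}")


def check_disjoint(train, test):
    """Raise ValueError if any example_id appears in both train and test."""
    overlap = {ex.example_id for ex in train} & {ex.example_id for ex in test}
    if overlap:
        raise ValueError(f"train/test overlap: {sorted(overlap)}")
```

For example, constructing a `GradedSummary` with a coherence score of 8 raises `ValueError`, because 8 falls outside the rubric's 1-7 range. Validating at construction time keeps malformed human labels out of both the training superset and the test subset.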

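The challenges slides list positional bias among the weaknesses of LLM-as-a-judge evaluation. One common way to detect it, sketched here as an assumption rather than as anything the talk itself describes, is to query a pairwise judge twice with the candidates swapped and count a verdict only when it survives the swap. `judge` is a hypothetical callable standing in for an LLM call, and `always_first` and `prefers_longer` are toy judges invented for this illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical judge signature: (prompt, candidate_a, candidate_b) -> "A" or "B",
# where "A" means the judge prefers whichever candidate was shown first.
Judge = Callable[[str, str, str], str]


def order_robust_preference(judge: Judge, prompt: str, x: str, y: str) -> Optional[str]:
    """Return "x" or "y" if the verdict survives swapping positions, else None."""
    first = judge(prompt, x, y)   # x shown in position A
    second = judge(prompt, y, x)  # y shown in position A
    if first == "A" and second == "B":
        return "x"  # x preferred in both orders
    if first == "B" and second == "A":
        return "y"  # y preferred in both orders
    return None  # verdict followed the position, not the content


def positional_consistency(judge: Judge, comparisons: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of (prompt, x, y) comparisons whose verdict is order-independent."""
    comparisons = list(comparisons)
    if not comparisons:
        raise ValueError("no comparisons given")
    consistent = sum(
        order_robust_preference(judge, p, x, y) is not None for p, x, y in comparisons
    )
    return consistent / len(comparisons)


# Toy judges for illustration only (no LLM call is made).
def always_first(prompt, a, b):
    return "A"


def prefers_longer(prompt, a, b):
    return "A" if len(a) >= len(b) else "B"


pairs = [("Summarize the post.", "Short.", "A longer summary of the post.")]
print(positional_consistency(always_first, pairs))    # 0.0
print(positional_consistency(prefers_longer, pairs))  # 1.0
```

The length-based toy judge passes the swap check, yet it embodies the verbosity bias listed on the same slide. Order consistency alone therefore does not establish an unbiased judge; it only rules out one failure mode.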