2024 Databricks Inc. All rights reserved

Mitigating LLM Hallucination Risk Through Research-Backed Metrics
Vikram Chatterji
June 11, 2024

NLP at scale
Bottleneck: Input/output evaluations cost millions of dollars and took months.
AI Evaluations at Scale, powered by research-backed metrics.

Focus for today: As NLP has transitioned to GenAI, what does this mean for evaluating these new AI systems? We will discuss two new methods for high-accuracy metrics.

Non-deterministic nature of LLMs

"LLMs are dream machines"

"Dreams": feature or bug?

We are in the era of non-deterministic software, which means a new crop of concerns for enterprise AI.

McKinsey State of AI Report 2024

How AI Teams Detect/Evaluate Hallucinations Today

Quantifying LLM Hallucinations: there are 3 techniques
1. N-Gram Matching
2. Ask GPT
3. Human Evaluation

BLEU | ROUGE-N: Compare the output to one or more reference completions. The result is a score between zero and one indicating similarity to the reference, where one indicates a perfect match.
METEOR: Considers synonyms, stemming, and word order.
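To make the n-gram matching idea concrete, here is a minimal sketch of two building blocks: ROUGE-N recall (overlapping n-grams divided by the total n-grams in the references) and the BLEU-style clipped n-gram precision. This is not full BLEU, which also combines several n-gram orders and applies a brevity penalty. It assumes naive lowercase whitespace tokenization and does no stemming or synonym matching, which is exactly the gap METEOR addresses. The function names `ngrams`, `rouge_n_recall`, and `clipped_precision` are hypothetical helpers for illustration, not a library API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Assumption: lowercase whitespace tokenization; references are pooled.
    Returns a score in [0, 1]; 1.0 means every reference n-gram was matched.
    """
    cand = ngrams(candidate.lower().split(), n)
    overlap = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each reference n-gram's matches by its count in the candidate.
        overlap += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0


def clipped_precision(candidate, references, n=1):
    """BLEU-style modified n-gram precision (one component of BLEU, not full BLEU)."""
    cand = ngrams(candidate.lower().split(), n)
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for g, count in ngrams(ref.lower().split(), n).items():
            max_ref[g] = max(max_ref[g], count)
    # Candidate counts are clipped so repeating a word cannot inflate the score.
    clipped = sum(min(count, max_ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    refs = ["the cat is on the mat"]
    print(rouge_n_recall(cand, refs, n=1))
    print(rouge_n_recall(cand, refs, n=2))
    print(clipped_precision(cand, refs, n=1))
```

For the example above, ROUGE-1 recall is 5/6 (about 0.83) and ROUGE-2 recall is 3/5 = 0.6. A single substituted word still leaves most of the surface overlap intact, because these scores measure similarity to the reference wording rather than factual correctness.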