BRING TEXT-TO-SQL TO BI PRODUCTION IN LARGE ENTERPRISE
Big Data Platform Department | PCG
Jun 2024
2024 Databricks Inc. All rights reserved

TARGET AUDIENCE

The intended audience: data platform product managers, product strategists, data engineers, data scientists, and ML/AI practitioners.

The audience will benefit:
First-hand experience from Tencent's large-scale practice adapting the latest LLM to BI.
Combining the latest open source and commercial LLM, RAG pattern/framework.
Build the next-generation BI capabilities with a natural language interface.

1. Text-to-SQL capability is available with fine-tuning and prompt engineering under the hood.
2. Attempt to bridge the gap between POC Text-to-SQL tools and querying with natural language for users in Tencent.
3. We chose DeepSeekCoder-33B as the foundational model for fine-tuning using LoRA.
4. We developed a training instruction set with two primary goals:
   1. Query pattern and syntax coverage.
   2. Representing how business context is referenced, especially for multi-table queries.
5. Optimizing the performance and cost-effectiveness of the inference process.
6. Our model is evaluated and compared with GPT-3.5/4 using BIRD and a set of complex real-life queries from Tencent.
7. The model performs better (i.e., more accurately) than GPT-4 in cases requiring table joins, though it is not as capable of dealing with ambiguous expressions of query intent.
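
The comparison on join-heavy queries can be made concrete with an execution-based check: run the model's SQL and a reference SQL against the same database and compare the rows returned. The sketch below is illustrative only. The slides do not say which metric or harness was used, so the function names (`run_query`, `execution_accuracy`), the toy schema, and the choice to compare rows as unordered multisets are all assumptions, and SQLite stands in for whatever engine the real BI queries target.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return result rows as a multiset, or None if the SQL fails to execute."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result rows match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A query that fails to execute (None) never counts as a hit.
        if gold is not None and predicted == gold:
            hits += 1
    return hits / len(pairs)


def build_demo_db():
    """Hypothetical two-table schema, used only to exercise a join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
        INSERT INTO departments VALUES (1, 'Data'), (2, 'Ads');
        INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bo', 1), (3, 'Cy', 2);
        """
    )
    return conn


GOLD_HEADCOUNT = (
    "SELECT departments.name, COUNT(employees.id) FROM departments "
    "JOIN employees ON employees.dept_id = departments.id "
    "GROUP BY departments.name"
)

DEMO_PAIRS = [
    # Same result as the gold query, written with aliases: counts as correct.
    ("SELECT d.name, COUNT(*) FROM employees e "
     "JOIN departments d ON e.dept_id = d.id GROUP BY d.name",
     GOLD_HEADCOUNT),
    # Wrong join key (employee id instead of dept_id): executes, but rows differ.
    ("SELECT d.name, COUNT(*) FROM employees e "
     "JOIN departments d ON e.id = d.id GROUP BY d.name",
     GOLD_HEADCOUNT),
    # References a table that does not exist: execution error, counted as a miss.
    ("SELECT name FROM staff", "SELECT name FROM employees"),
]

if __name__ == "__main__":
    print(execution_accuracy(build_demo_db(), DEMO_PAIRS))
```

Comparing results rather than SQL text means a prediction is not penalized for using different aliases or an equivalent formulation, while a wrong join key is still caught because the returned rows differ. Whether row order or duplicate rows should matter depends on the query (for example, one with `ORDER BY`), so the multiset comparison here is a simplification.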