Towards Multi-Statement Transactions in Delta
Prakhar Jain, Databricks
2024 Databricks Inc. All rights reserved

Delta Lake commits have limitations
- No multi-statement transactions
- No multi-table transactions
- No catalog integration

Modern Day Delta Commits
- Filesystem-based commits
- Leverages atomic filesystem primitives, i.e. put-if-absent
- Commit = write a .json file in the _delta_log directory

    INSERT INTO mydb.employees
    VALUES (1, 'John Doe', '1980-01-01'), (2, 'Jane Smith', '1990-05-15'), (3, 'Bob Johnson', '1985-12-31');

    INSERT INTO mydb.employees VALUES (4, 'Mike Johnson', '1980-01-01');
    INSERT INTO mydb.employees VALUES (4, 'Mike Johnson', '1980-01-01');    [Retry Commit]

    _delta_log/ after each commit:
    0000.json
    0000.json  0001.json
    0000.json  0001.json  0002.json

NO MULTI-TABLE TRANSACTIONS
- Cloud object stores do not provide APIs to write multiple files atomically
- E.g. can't write the following two commit files together:
    145.json on table1
    124.json on table2

NO CATALOG INTEGRATION
- Catalog is updated in a best-effort manner (#2409)
- Information (e.g. schema) in the catalog could be stale
- Causes split brain between Delta and the catalog
[Diagram: query on cluster, object store, catalog (e.g. HMS)]

Delta Managed Commits
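As a recap of the filesystem-based commit path described above, the following is a minimal sketch of a put-if-absent commit with retry. It is not Delta's actual implementation: the local filesystem (`O_CREAT | O_EXCL`) stands in for an object store's conditional put, the helper names `put_if_absent` and `commit` are hypothetical, the action payloads are placeholders, and the four-digit file names simply mirror the abbreviated names on the slide.

```python
import json
import os
import tempfile


def put_if_absent(path: str, data: bytes) -> bool:
    """Create `path` only if it does not exist yet; return False if it does."""
    try:
        # O_CREAT | O_EXCL makes the create fail when the file is already there.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


def commit(log_dir: str, actions: list, read_version: int, max_retries: int = 10) -> int:
    """Try to claim the next version file, moving on when the slot is taken."""
    version = read_version + 1
    payload = json.dumps(actions).encode()
    for _ in range(max_retries):
        path = os.path.join(log_dir, f"{version:04d}.json")
        if put_if_absent(path, payload):
            return version
        # Another writer won this version. A real writer would also check
        # the winning commit for conflicts before retrying (omitted here).
        version += 1
    raise RuntimeError("too many concurrent commits")


# Hypothetical table log in a temporary directory.
log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)

v0 = commit(log_dir, [{"add": "part-0"}], read_version=-1)
# Two writers both read version 0 and both aim for 0001.json.
v1 = commit(log_dir, [{"add": "part-1"}], read_version=0)
v2 = commit(log_dir, [{"add": "part-2"}], read_version=0)  # loses, retries
print(sorted(os.listdir(log_dir)))  # ['0000.json', '0001.json', '0002.json']
```

Each call to `put_if_absent` claims exactly one file in one table's log. Nothing in this primitive can create `145.json` on table1 and `124.json` on table2 as a single atomic step, which is the multi-table limitation listed above.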