Product safe harbor statement

This information is provided to outline Databricks' general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks' discretion and may not be delivered as planned or at all.

Delta Lake Liquid Clustering: Lightning-Fast Queries on Massive Datasets
Cindy Jiang, Product Manager
Rahul Mahadev, Sr. Software Engineer

Cindy Jiang
Product Manager
Data Intelligence
Open Lakehouse
Talk to me about:
- All things Delta
- All things storage

Rahul Mahadev
Senior Software Engineer
Technical Lead, Liquid Clustering
Talk to me about:
- All things Delta
- All things storage

Customers Want Their Queries to Go Fast

Choosing the best data layout is a hard problem

    -- decision: what columns do I pick?
    -- decision: is the cardinality correct?
    CREATE TABLE tbl1 PARTITIONED BY (date) AS ...;

    -- further ZORDER by columns within the partition
    -- decision: what do you partition by vs. ZORDER by?
    -- run this regularly!
    OPTIMIZE tbl1 ZORDER BY (customerId);

Challenge

Partitioning and ZORDERing: good for performance, but complicated
- Which columns should be partitioned?
- Which columns should be ZORDERed?
- What if the column is high cardinality?
- What if things change over time?

Introducing Liquid Clustering
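
For comparison with the partition-plus-ZORDER setup shown earlier, here is a minimal sketch of how the same table might be declared with liquid clustering. It assumes a Delta Lake or Databricks runtime that supports the CLUSTER BY clause; the column list and types for tbl1 are hypothetical and chosen only to mirror the earlier example.

```sql
-- Assumption: tbl1's schema below is illustrative, not taken from the talk.
-- Clustering columns are declared once, in place of PARTITIONED BY and ZORDER BY.
CREATE TABLE tbl1 (
  date       DATE,
  customerId BIGINT,
  amount     DOUBLE
)
USING DELTA
CLUSTER BY (date, customerId);

-- Changing the clustering columns later is a single statement.
-- Data already written is not rewritten by this statement alone.
ALTER TABLE tbl1 CLUSTER BY (customerId);

-- OPTIMIZE needs no ZORDER BY clause on a clustered table;
-- it clusters data using the columns currently set on the table.
OPTIMIZE tbl1;
```

The point of the sketch is that the layout decision collapses to one column list that can be revised later, rather than a split between partition columns and ZORDER columns.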
Fast
- Faster writes and similar reads vs. well-tuned partitioned tables

Incremental
- Low write amplification

Self-tuning / skew-resistant
- Avoids over- and under-partitioning
- Produces consistent file sizes

Flexible
- Want to change the clustering columns? No problem!

Benefits of Liquid Clustering

Far simpler t