
4 Snowflake-The Evolving Landscape of Iceberg REST Catalog.pdf

Uploaded by: 哆哆 | ID: 186196 | 2024-11-01 | 26 pages | 2.15 MB

The Evolving Landscape of Iceberg REST Catalog
Yufei G
© 2024 Snowflake Inc. All Rights Reserved

Caused by: org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: s3:/some/path/to/table/metadata/13648-45c53fb2-5124-4541-ace3-c63ed91e1d26.metadata.json

Agenda
- Why REST Catalog?
- Recent Features & Future Direction
- Growing Ecosystem
- A Server Implementation: Apache Polaris (Incubating)

Storage Only
- Table still works
- Poor addressing: using path for namespace and table
- Lack of data governance, e.g., data lineage
- Performance concerns: no caching, repeated file reads for queries and metadata
- Compatibility concerns: old clients vs. new clients
[Diagram: ENGINE]

Traditional Iceberg Catalog
- Addresses tables nicely
- Provides basic data governance features
- Engine writes metadata.json files
- Engine retries for conflict resolution: lock the table
- Performance concerns: no caching
- Compatibility concern: old clients vs. new clients
[Diagram: CATALOG, ENGINE; Hive Catalog, JDBC Catalog]

Iceberg Optimistic Concurrency: Happy Path
[Diagram: Iceberg Table, Client1, Client2; versions v1, v2, v3]

Iceberg Optimistic Concurrency: Conflict Resolution
[Diagram: Iceberg Table, Client1, Client2; versions v1, v2, v3, with a second v2]

- Reliable and Low-latency Commits
- Better Conflict Resolution
- Easy Client Implementation
- Data Governance

Iceberg REST Catalog
Shifting client responsibilities to the server
A simple idea to change the way we use the Iceberg table completely
[Diagram: CATALOG, ENGINE]

REST Catalog

Shift Client Responsibilities to Server
- Server side handles commit and retry. No table lock. Faster and more reliable
- Server side writes the metadata.json file, no compatibility issue

Improved Decision Making at Commit Time
- Better conflict resolution: DDL vs. DML, don't fail an append when it confli… (preview text ends here)
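The optimistic-concurrency slides above (happy path, then a conflict when two clients both start from the same version) can be illustrated with a small in-memory sketch. This is not Iceberg code: `ToyCatalog`, `CommitConflict`, and `client_commit` are hypothetical names, and the catalog is reduced to a single version pointer per table that is updated by compare-and-swap. It only aims to show why, with a traditional catalog, the retry loop lives in the client.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the table moved on since the client read it."""


@dataclass
class ToyCatalog:
    # Hypothetical stand-in for a catalog: table name -> current metadata version.
    pointers: dict = field(default_factory=dict)

    def current(self, table):
        return self.pointers.get(table, 0)

    def swap(self, table, expected, new):
        # Compare-and-swap: succeed only if nobody committed in between.
        if self.current(table) != expected:
            raise CommitConflict(
                f"expected v{expected}, found v{self.current(table)}"
            )
        self.pointers[table] = new


def client_commit(catalog, table, max_retries=3):
    # Client-side loop: read the base version, try to swap, retry on conflict.
    for _ in range(max_retries):
        base = catalog.current(table)
        try:
            catalog.swap(table, base, base + 1)
            return base + 1
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")


catalog = ToyCatalog(pointers={"orders": 1})

# Both clients read v1 before either commits.
client1_base = catalog.current("orders")
client2_base = catalog.current("orders")

# Client1 wins the race: v1 -> v2.
catalog.swap("orders", client1_base, 2)

# Client2 still believes the table is at v1, so its swap is rejected.
try:
    catalog.swap("orders", client2_base, 2)
    conflict = False
except CommitConflict:
    conflict = True

# Client2 refreshes and retries on top of v2.
final_version = client_commit(catalog, "orders")
```

In this toy model every client has to carry the refresh-and-retry logic itself, which is the responsibility the slides propose moving to the server.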

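This document covers the current state and future direction of the Iceberg REST Catalog. The Iceberg REST Catalog is an open-source REST protocol that lets multiple compute engines, such as Apache Flink, Apache Spark, and Trino, read and write Apache Iceberg tables. Its main feature is shifting client responsibilities to the server, including resolving metadata conflicts and writing the metadata.json file, which simplifies client implementations and improves compatibility and performance.

Existing Iceberg catalogs have several problems: poor table addressing, insufficient data governance, performance concerns (no caching, repeated reads for queries and metadata), and compatibility concerns (old clients vs. new clients). The Iceberg REST Catalog was proposed to address these problems with more reliable, lower-latency commits, better conflict resolution, and clients that are easier to implement.

Looking ahead, the Iceberg REST Catalog is to add server-side metadata tables, fine-grained commits, and credential refresh, and to simplify table loading. In addition, Polaris, a server implementation of the Iceberg REST Catalog, is to offer a more complete ecosystem, supporting multiple storage options such as HDFS as well as table types other than Iceberg, such as Hive tables.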
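As a rough illustration of the server-side model summarized above, the sketch below moves conflict handling and metadata writing into a toy server. Everything here is an assumption made for illustration: `ToyRestServer`, `load`, and `commit_append` are hypothetical names, not the REST Catalog API, and the policy of re-applying a stale append on top of the latest version is one possible choice rather than the documented behavior of any implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRestServer:
    # Hypothetical in-memory server: table name -> list of metadata versions.
    # A real server would persist metadata.json files to storage instead.
    versions: dict = field(default_factory=dict)

    def load(self, table):
        history = self.versions.setdefault(
            table, [{"version": 1, "files": []}]
        )
        return history[-1]

    def commit_append(self, table, base_version, new_files):
        # The server, not the client, resolves the race: an append prepared
        # against an older version is re-applied on the latest one
        # (assumed policy for this sketch).
        latest = self.load(table)
        rebased = base_version != latest["version"]
        metadata = {
            "version": latest["version"] + 1,
            "files": latest["files"] + list(new_files),
        }
        # Stands in for the server writing the new metadata.json.
        self.versions[table].append(metadata)
        return metadata["version"], rebased


server = ToyRestServer()

# Both clients load the table at v1, then each sends one append.
base = server.load("db.events")["version"]
v_a, rebased_a = server.commit_append("db.events", base, ["a.parquet"])
v_b, rebased_b = server.commit_append("db.events", base, ["b.parquet"])
```

Neither client retries or writes metadata here; the second append is simply rebased by the server, which is the kind of commit-time decision the summary attributes to the REST Catalog model.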
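Learn how the Iceberg REST Catalog keeps improving as the technology evolves and how it adapts to growing data management needs. Explore Apache Polaris, a newly incubating project: its features, its advantages, and how it provides more efficient and reliable data management for Apache Iceberg. Discuss how the Iceberg REST Catalog enables consistent and efficient data management across different processing engines (such as Spark and Flink), and what that means for data engineers and data scientists.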