Best Practices for Building Apache Iceberg-Based Lakehouse Architectures on AWS (PDF, 37 pages)
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

ANT343: Best practices for building Apache Iceberg based lakehouse architectures on AWS

Purvaja Narayanaswamy (she/her), Senior Engineering Manager, AWS Glue Data Catalog and AWS Lake Formation
Mike Araujo (he/him), Principal Data Architect, Medidata
Srikanth Sopirala (he/him), Principal GenAI/ML Solutions Architect, AWS

Agenda
Data Lake Crisis: Journey of Apache Iceberg in Lakehouse
AWS Stack Architecture: Overview of the integrated solution
Getting Data In: Production-ready architecture patterns
Voice of the Customer: Real-world experiences and outcomes
Getting Data Out: Multi-compute integrations and optimizations

From Data Lake Crisis to Apache Iceberg Lakehouse
Traditional data lake problems: data corruption, slow queries, no ACID transactions, no history
Apache Iceberg table format layers: Iceberg table API; metadata layer (snapshots, manifests, schema, partitions); data files (Parquet/ORC/Avro)
What Iceberg adds: ACID transactions, time travel, schema evolution, fast queries, row-level updates
Engines: Spark, Flink, Trino, Athena

Apache Iceberg V3: Closing the gap

AWS Stacked Architecture
Catalog & governance: AWS Glue Data Catalog (technical catalog), AWS Lake Formation (unified access control), Amazon SageMaker Catalog (business catalog)
Process: AWS Glue, Amazon EMR, Amazon Redshift, third-party (3P) Apache Iceberg-compatible engine(s)
Ingest from data sources: operational databases, business applications, data warehouses, streaming, other data sources
Storage: Amazon S3, Amazon S3 Tables, Redshift Managed Storage (RMS)
Storage roles: data lakes, optimized storage, fully managed Ic…
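The slides above describe Iceberg as a metadata layer (snapshots, manifests, schema, partitions) over immutable data files, and credit that layer with ACID commits and time travel. As a hedged illustration only, the toy Python model below sketches those mechanics under simplifying assumptions. All names (ToyIcebergTable, commit_append, DataFile, and so on) are hypothetical and are not the Iceberg API. The S3 paths are placeholders, and the model omits schema evolution, delete files, column statistics, and the real metadata files. It shows three ideas: a new snapshot reuses existing manifests rather than rewriting them, a commit succeeds only if the current-snapshot pointer has not moved (optimistic concurrency), and reading an older snapshot gives time travel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str       # placeholder object path (hypothetical)
    partition: str  # partition value, e.g. a date string


@dataclass(frozen=True)
class Manifest:
    # Real Iceberg manifests also track per-file column statistics.
    files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    manifests: Tuple[Manifest, ...]  # stands in for the manifest list


class ToyIcebergTable:
    """Hypothetical in-memory model of Iceberg's snapshot/manifest layering."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None

    def commit_append(self, base_id: Optional[int], new_files: List[DataFile]) -> int:
        # Existing manifests are reused untouched; only one new manifest is added.
        base = self.snapshots[base_id].manifests if base_id is not None else ()
        candidate = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            parent_id=base_id,
            manifests=base + (Manifest(tuple(new_files)),),
        )
        # Optimistic concurrency: succeed only if no other writer moved the
        # current-snapshot pointer since this writer read base_id.
        if self.current_id != base_id:
            raise RuntimeError("commit conflict: table changed, retry on new base")
        # The "atomic swap": publish the snapshot and move the pointer.
        self.snapshots[candidate.snapshot_id] = candidate
        self.current_id = candidate.snapshot_id
        return candidate.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None,
             partition: Optional[str] = None) -> List[str]:
        # Time travel: any retained snapshot can be read, not only the current one.
        sid = self.current_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        files = [f for m in self.snapshots[sid].manifests for f in m.files]
        # Partition pruning: skip files whose partition value does not match.
        return [f.path for f in files if partition is None or f.partition == partition]


if __name__ == "__main__":
    t = ToyIcebergTable()
    s1 = t.commit_append(t.current_id, [DataFile("s3://bucket/tbl/a.parquet", "2025-01-01")])
    s2 = t.commit_append(t.current_id, [DataFile("s3://bucket/tbl/b.parquet", "2025-01-02")])
    print(t.scan())                        # both files, via snapshot s2
    print(t.scan(snapshot_id=s1))          # time travel: only a.parquet
    print(t.scan(partition="2025-01-02"))  # pruned: only b.parquet
    try:
        t.commit_append(s1, [DataFile("s3://bucket/tbl/c.parquet", "2025-01-03")])
    except RuntimeError as err:
        print(err)                         # stale writer is rejected
```

In real Iceberg deployments, the catalog (for example AWS Glue Data Catalog) performs the atomic update of the table's current metadata pointer. This is one reason the catalog and governance layer in the architecture above matters when several engines write to the same table.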