© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

STG413-R
Building an operational data lake using S3 Tables and SageMaker
Jonathan Wong (he/him), Senior Product Manager (S3)
Nishant Jain (he/him), Senior Software Engineer (S3)

Customer challenges: How do I
- Design the right architecture?
- Implement security and governance?
- Optimize for both query performance and storage cost?
- Integrate with my existing tools and workflows?

Agenda
- Amazon S3 Tables foundations
- Security & access patterns
- Service integrations
- Architecture whiteboarding

S3 Tables: Our data lake foundation
Fully managed Apache Iceberg tables
- Manage table maintenance by policy: Parquet files are automatically compacted to optimize performance, and orphaned data is automatically cleaned up.
- Integrated with AWS & partner engines: Works with most query engines that support Iceberg, including Athena, Redshift, EMR, Snowflake, and more.
- Every table is an AWS resource: This is a big deal. It enables table-level management with AWS tooling.

S3 Tables maintenance
1. Snapshot management: Iceberg table snapshots are managed with configurable retention, enabling time travel while optimizing storage costs.
2. Unreferenced file removal: Identifies and deletes all objects not referenced by any table snapshot.
3. Compaction: Small files are automatically merged into larger, optimized files to improve query performance.
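"Manage table maintenance by policy" can be made concrete as a handful of request payloads. The sketch below assumes the boto3 `s3tables` client operations `put_table_maintenance_configuration` (table-level compaction and snapshot settings) and `put_table_bucket_maintenance_configuration` (bucket-level unreferenced file removal). The helper names `compaction_policy`, `snapshot_policy`, `unreferenced_file_policy`, `plan_requests`, and `apply_requests` are hypothetical, and every numeric value is illustrative rather than a recommendation. Check the field names against the current S3 Tables API reference before relying on them.

```python
"""Sketch: S3 Tables maintenance expressed as policy payloads.

Assumes the boto3 "s3tables" client operations
put_table_maintenance_configuration and
put_table_bucket_maintenance_configuration. Helper names and the
numeric values used below are hypothetical/illustrative.
"""


def compaction_policy(target_file_size_mb: int) -> dict:
    # Table-level policy: merge small Parquet files toward this target size.
    return {
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": target_file_size_mb}},
    }


def snapshot_policy(min_snapshots_to_keep: int, max_snapshot_age_hours: int) -> dict:
    # Table-level policy: always keep at least N snapshots; expire older ones by age.
    return {
        "status": "enabled",
        "settings": {
            "icebergSnapshotManagement": {
                "minSnapshotsToKeep": min_snapshots_to_keep,
                "maxSnapshotAgeHours": max_snapshot_age_hours,
            }
        },
    }


def unreferenced_file_policy(unreferenced_days: int, noncurrent_days: int) -> dict:
    # Bucket-level policy (assumed semantics): objects unreferenced for
    # unreferenced_days become noncurrent, then are deleted after noncurrent_days.
    return {
        "status": "enabled",
        "settings": {
            "icebergUnreferencedFileRemoval": {
                "unreferencedDays": unreferenced_days,
                "nonCurrentDays": noncurrent_days,
            }
        },
    }


def plan_requests(bucket_arn: str, namespace: str, table: str) -> list:
    # Build (operation_name, kwargs) pairs without calling AWS, so the
    # policy can be reviewed or tested offline.
    table_ref = {"tableBucketARN": bucket_arn, "namespace": namespace, "name": table}
    return [
        ("put_table_maintenance_configuration",
         {**table_ref, "type": "icebergCompaction", "value": compaction_policy(512)}),
        ("put_table_maintenance_configuration",
         {**table_ref, "type": "icebergSnapshotManagement",
          "value": snapshot_policy(1, 120)}),
        ("put_table_bucket_maintenance_configuration",
         {"tableBucketARN": bucket_arn, "type": "icebergUnreferencedFileRemoval",
          "value": unreferenced_file_policy(3, 10)}),
    ]


def apply_requests(client, requests: list) -> None:
    # `client` is expected to behave like boto3.client("s3tables").
    for operation, kwargs in requests:
        getattr(client, operation)(**kwargs)
```

With credentials configured, a call such as `apply_requests(boto3.client("s3tables"), plan_requests(bucket_arn, "sales", "orders"))` would submit the three policies (the namespace and table names are placeholders). Keeping the planning step separate from the API calls means the policy can be reviewed and tested without touching AWS.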
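To make the three maintenance jobs concrete, here is a toy, in-memory model of what each one does to an Iceberg-style table. It illustrates the ideas under simplifying assumptions and does not show how S3 implements them. `Snapshot`, `expire_snapshots`, `unreferenced_files`, and `plan_compaction` are hypothetical names, file sizes are in MB, compaction uses simple greedy bin-packing, and any grace periods the managed service applies before deleting files are ignored.

```python
"""Toy model of snapshot expiration, unreferenced-file removal, and compaction.

Hypothetical names throughout; a simplified illustration, not the S3 Tables
implementation.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    age_hours: float
    files: frozenset  # data file keys this snapshot references


def expire_snapshots(snapshots, min_to_keep, max_age_hours):
    # Keep the newest `min_to_keep` snapshots unconditionally; of the rest,
    # keep only those no older than max_age_hours.
    newest_first = sorted(snapshots, key=lambda s: s.age_hours)
    kept = newest_first[:min_to_keep]
    kept += [s for s in newest_first[min_to_keep:] if s.age_hours <= max_age_hours]
    return kept


def unreferenced_files(all_objects, snapshots):
    # Objects that no retained snapshot references are deletion candidates.
    referenced = set().union(*(s.files for s in snapshots))
    return set(all_objects) - referenced


def plan_compaction(file_sizes_mb, target_mb):
    # Greedy bin-packing: walk small files from smallest to largest and group
    # them into rewrite batches whose combined size stays within target_mb.
    # Files already at or above the target are left as they are.
    groups, current, current_size = [], [], 0
    for name, size in sorted(file_sizes_mb.items(), key=lambda kv: kv[1]):
        if size >= target_mb:
            continue
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    # A batch of one file would just rewrite it unchanged, so drop those.
    return [group for group in groups if len(group) > 1]
```

For example, with snapshots aged 200, 50, and 1 hours, `min_to_keep=1`, and `max_age_hours=120`, only the 200-hour snapshot expires, and files referenced solely by it then become removal candidates. The ordering matters: expiring snapshots first is what turns their files into unreferenced objects for the second job.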