
Practical Experience in Building a Warehouse Inside the Lake on a Lakehouse Architecture.pdf

Uploaded by: 2*** | ID: 121882 | 2023-03-29 | 17 pages | 771.19KB

Practical Experience in Building a Warehouse Inside the Lake on a Lakehouse Architecture

Contents
1 Background and industry status
2 Reference architecture for building a warehouse inside the lake on Lakehouse
3 Typical scenarios and solutions for building a warehouse inside the lake
4 Future plans

Several misconceptions about data lakes

Wikipedia's definition: A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or is providing little value.

AWS's definition: A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.

Azure's definition: Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We've drawn on the experience of working

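This document presents practical experience in building a warehouse inside the lake on a Lakehouse architecture. It first explains the concept of a data lake and clears up several common misconceptions, such as the belief that a data lake only stores massive volumes of data or does not handle structured data. It then describes the Lakehouse architecture in detail: it combines the strengths of data lakes and data warehouses, supports data of all types and scales, and provides real-time computation and unified stream and batch processing. The document also discusses how Lakehouse is applied in data processing and analytics, including serving as the source-aligned layer, building data marts, and unifying stream and batch processing. Finally, it proposes a reference design for an integrated lake and warehouse on Lakehouse, comprising a unified compute cluster, a unified storage cluster, and diverse data marts, to achieve efficient data management and analytics.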
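The layering named in the summary (a source-aligned layer that keeps raw records, a data mart derived from it, and one transformation shared by the batch and streaming paths) can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the report's implementation: the record fields, `RAW_LAYER`, `parse_raw`, `MartAggregator`, `run_batch`, and `run_streaming` are all hypothetical names, and an in-memory list stands in for lake storage.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Hypothetical source-aligned (raw) layer: records are kept as-is, here as JSON strings.
RAW_LAYER: List[str] = [
    '{"order_id": 1, "region": "east", "amount": 10.0}',
    '{"order_id": 2, "region": "west", "amount": 5.5}',
    '{"order_id": 3, "region": "east", "amount": 4.5}',
]


def parse_raw(line: str) -> dict:
    """Apply structure on read; the raw layer itself is never modified."""
    return json.loads(line)


class MartAggregator:
    """Hypothetical data-mart table: total order amount per region."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def apply(self, record: dict) -> None:
        # The single transformation shared by the batch and streaming paths.
        self.totals[record["region"]] += record["amount"]


def run_batch(lines: Iterable[str]) -> Dict[str, float]:
    """Batch path: read the whole raw layer, then return the final mart."""
    mart = MartAggregator()
    for line in lines:
        mart.apply(parse_raw(line))
    return dict(mart.totals)


def run_streaming(lines: Iterable[str]) -> Iterator[Dict[str, float]]:
    """Streaming path: update the mart per record and emit a snapshot each time."""
    mart = MartAggregator()
    for line in lines:
        mart.apply(parse_raw(line))
        yield dict(mart.totals)


batch_result = run_batch(RAW_LAYER)
stream_snapshots = list(run_streaming(iter(RAW_LAYER)))

# Once the stream has consumed the same records, it agrees with the batch result.
assert stream_snapshots[-1] == batch_result
```

Because both paths call the same `MartAggregator.apply`, the mart logic is written once; only the way records are delivered differs. A real Lakehouse deployment would replace the in-memory list with a table format on shared storage and the loops with a compute engine.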