Uber's Batch Analytics Evolution from Hive to Spark
Kumudini, Akshayaprakash
Uber
©2024 Databricks Inc. All rights reserved

Agenda
- Hive at Uber
- Motivation To Migrate
- Migration Strategy
- Hive to SparkSQL Translation
- Shadow Testing & Data Validation
- Hive-Spark Disparity
- Results & Future Work

Batch Analytics at Uber
[Diagram labels: Hive on Spark, Spark, Hive ETL, Ingestion, Notebooks, Spark ETL, Interactive, Compliance Reporting, Financial Reporting, Planning & Forecasting, AI/ML Platform, Fraud & Risk Analysis, Adtech Analysis]

Hive at Uber
- Total ETLs: 18K
- Monthly Scheduled Queries: 5M
- Monthly Interactive Queries: 150K
- Yarn Usage: 35%
- Hive: 2.3
- Spark: 2.4.3

Hive at Uber: Architecture
[Diagram labels: Workflow DAG (P1, P2, P3), SparkSubmit, Heartbeat, Hive Payload, Poll Status, Yarn, HoS, Hive Proxy]

Example payload:
    CREATE TABLE T1 ...;
    CREATE TEMPORARY TABLE TMP_1;
    CREATE TEMPORARY TABLE TMP_2;
    INSERT OVERWRITE T1 SELECT * FROM TMP_1 JOIN TMP_2;

Hive Proxy
- REST proxy to run Hive queries asynchronously
- Spark submit HoS job with the payload
- Config guardrails & transformations
- Zero-downtime Hive release

Hive on Spark Execution
- Serverless (without HS2) execution
- Executes SQL statements in the payload
- Query planning within the Driver
- Heartbeat to the Hive Proxy

Motivation To Migrate
- Hive on Spark (HoS) has inactive OSS development: obsolete!
- Spark 3 has a vibrant OSS community
  - Active OSS development
  - Better performance
  - Unified batch analytics

Hive