How You Could Be Saving 50% of Your Spark Costs
Sponsored by: definity
Roy Daniel, CEO & Co-Founder, definity
June 2025

Context: Tectonic shifts in data engineering
New AI and data trends are reshaping the data ecosystem

- Lakehouse redefines data architecture: hybrid query engines, open table formats
- Increased scale, mission-criticality, and complexity: shift from BI to ML and agentic AI, data democratization
- Strong data cost pressures and FinOps: tightened budgets, shift to "full-stack" tools
- Intelligent metadata becomes a game-changer: actionability over alerts, context is key
Challenge: "Data Observability" is broken
Lakehouse teams are constantly firefighting

Data Platform:
- 50% wasted infra cost: no job-level visibility, hard to optimize jobs
- 10-month delayed upgrades: hard to validate changes, business & security risks
- $MM business impact

Data Engineering:
- 35% slower engineering velocity: late detection at-rest, low coverage of data & jobs
- No context of jobs & data: hard to troubleshoot and fix

Source: Customer example cases, 2023-2025
Challenge: Hidden costs in Spark data engineering
What monitoring 1B vCore-hours taught us

Typical waste, and the learning behind it:
- 50%: over-provisioning and data skew. Learning: monitoring at the cluster level masks waste; data skew is the canonical example (see the skew-detection sketch below).
- 8-12%: data distribution checks at-rest. Learning: checks run after the fact, driving added compute (see the in-flight check sketch below).
- 5-10%: post-deployment degradations. Learning: untested code changes cause avoidable inefficiencies.

Source: Customer example cases, 2023-2025
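To make the skew point concrete: cluster-level dashboards report averages, so a stage where one partition holds most of the rows can still show healthy mean CPU utilization while a single straggler task dictates the runtime and the bill. Below is a minimal PySpark sketch of surfacing that imbalance by counting rows per partition; the input path is illustrative, and this is my own example, not definity's tooling.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

# Illustrative input: any DataFrame about to be joined or aggregated.
df = spark.read.parquet("s3://example-bucket/events")

# Count rows per Spark partition. Cluster-level monitoring averages this
# away; per-partition counts expose the straggler directly.
per_partition = (
    df.withColumn("partition_id", F.spark_partition_id())
      .groupBy("partition_id")
      .count()
)

stats = per_partition.agg(
    F.max("count").alias("max_rows"),
    F.avg("count").alias("avg_rows"),
).first()

# Ratio of the heaviest partition to the mean: ~1.0 is balanced,
# >>1 means one task carries the stage while the rest sit idle.
print(f"skew ratio: {stats['max_rows'] / stats['avg_rows']:.1f}")
```

A ratio near 1 means balanced partitions; ratios of 10x or more mean a few tasks dominate the stage while provisioned executors wait. Common mitigations are key salting or, on Spark 3.x, adaptive skew-join handling via spark.sql.adaptive.skewJoin.enabled.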
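The 8-12% line refers to validation jobs that re-read tables at rest after they are written, paying for a second full scan. The deck does not say how definity restructures this, but one standard Spark-native alternative (a sketch assuming PySpark 3.3+; paths and column names are hypothetical) is the Observation API, which attaches aggregate metrics to the write action itself so the check runs in the same pass.

```python
from pyspark.sql import SparkSession, Observation, functions as F

spark = SparkSession.builder.appName("inflight-checks").getOrCreate()

# Hypothetical source table; any DataFrame works the same way.
orders = spark.read.parquet("s3://example-bucket/orders")

# Declare the distribution metrics we want as a by-product of the write.
obs = Observation("quality")
checked = orders.observe(
    obs,
    F.count(F.lit(1)).alias("rows"),
    F.sum(F.col("amount").isNull().cast("long")).alias("null_amounts"),
)

# One action: the write both produces the table and evaluates the
# metrics, so no separate at-rest job re-scans the data afterwards.
checked.write.mode("overwrite").parquet("s3://example-bucket/orders_checked")

metrics = obs.get  # e.g. {"rows": 1000000, "null_amounts": 0}
assert metrics["null_amounts"] == 0, "in-flight distribution check failed"
```

Because the metrics are collected as a by-product of the single write job, the after-the-fact scan, and its added compute, disappears.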