Saving Millions from Millions: Navigating Towards Cost-Efficiency in Pinterest's Spark Jobs
Nan Zhu, Pinterest
June 2025

About Me
Nan Zhu, Software Engineer, Pinterest
Focusing on the Batch Processing Platform: Spark on K8S, Spark job cost optimization, data infra for fast AI/ML iterations, etc.
Open Source Activities
    CodingCat on GitHub
    Involved in multiple open source projects: Apache Spark/Celeborn/Iceberg, etc.

Agenda
Introduction to Spark Platform at Pinterest
Cost Efficiency Challenges in Pinterest
Towards Cost Efficiency of Spark Jobs in Pinterest
    Cost Issue Debuggability / Platform Innovations / Fine-grained Cost Management
Summary
Q&A

Spark Platform at Pinterest

Spark Platform at Pinterest (1)

Spark Platform at Pinterest (2)
Consolidating to K8S

Spark is the backbone of data processing in Pinterest
Core ML Foundation
Ads ML Infra
Ads Retrieval Infra
Trust & Safety
Search
Ads Measurement
Ads Reporting Infra
Home feed Ranking
Long User Sequence
Ads Index
Rec GPT
ML feature offline evaluation
Content Signals
Ads Signals
Data Warehouse

Pinterest owns one of the largest Spark platforms in the world

Cost Efficiency Challenges

Cost Efficiency Challenges
Fast Growth
    Up to 3-4% usage growth per month*
Complexity
    Variety of use cases: pure SQL jobs, memory-heavy user-defined data structures, mixed Scala/C++ jobs, etc.
Scale
    Millions of jobs every week
*Pinterest internal data

Saving X of Millions of Dollars from Millions of Jobs

Holistic Framework Addressing Cost Efficiency Challenges
Improving Cost Debuggability
Platform Level Innovations
    Shuffle Optimizations with Remote Shuffle Service
    Burst-aware Memory Allocation
Fine(r)-grained Cost Management
    Job-level fine-tuning
    Managed JobSet

Towards Cost Efficiency of Spark Jobs: Cost Debuggability Improvements

Goal of Cost Debuggability Improvement
Surface Root Cause of Cost Inefficiency
For various