Scaling Identity Graph Ingestion to 1M events/sec using Spark Structured Streaming & Delta Lake
Akanksha Nagpal, Jianmei Ye
2021 Adobe. All Rights Reserved. Adobe Confidential.

Introduction
Akanksha Nagpal, Sr. Software Engineer, Adobe (LinkedIn: akanksha_nagpal)
Jianmei Ye, Sr. Software Engineer, Adobe (LinkedIn: jianmeiye)

Agenda
Overview
Adobe Identity Graph
Journey to 10x
Scaling & Optimization Techniques
Privacy Compliance Strategies
Custom Deployment Workflows
Lessons Learned & Takeaways
Q/A

Overview
Adobe Experience Platform: Real-Time CDP
Unified Customer Profiles
Actionable Audiences
Activation
Personalization at Scale

Adobe Identity Graph
What is an Identity Graph?
It unifies fragmented identifiers (e.g., emails, device IDs, cookies) into a single view.
It enables consistent consumer recognition across channels and devices.

Identity Service
Petabytes of data processed daily
1M records/sec
50+ billion identities
70+ billion records/day
These capabilities are built from the ground up to link disconnected identities into a single, unified profile that delivers consistent, connected experiences.

Journey to 10x
Evolution Phases

Initial Architecture
[Figure: Initial Architecture. Components shown: Data Collection Pipeline; Streaming Topics 1, 2, and 3; Stream Processing Pipeline; Identity Graph store; Job Manager; Job Service; Task Manager; Kubernetes Cluster]

Challenges: Why did we need to evolve?
Operational Overhead
Single pipeline
Coupled processing + storage
Fragmented logic across batch & streaming
Noisy Neighbor
Multi-tenant pipeline
We init