Scaling Identity Graph Ingestion to 1M events/sec using Spark Structured Streaming & Delta Lake
Akanksha Nagpal
Jianmei Ye
2021 Adobe. All Rights Reserved. Adobe Confidential.

Introduction
Akanksha Nagpal, Sr. Software Engineer, Adobe. LinkedIn: akanksha_nagpal
Jianmei Ye, Sr. Software Engineer, Adobe. LinkedIn: jianmeiye

Agenda
Overview
Adobe Identity Graph
Journey to 10x
Scaling & Optimization techniques
Privacy Compliance Strategies
Custom deployment workflows
Lessons Learned & Takeaways
Q/A

Overview

Adobe Experience Platform: Real-time CDP
Unified Customer Profiles
Actionable Audiences
Activation
Personalization at scale

Adobe Identity Graph
What is Identity Graph?
Unifies fragmented identifiers (e.g., emails, device IDs, cookies) into a single view
Enables consistent consumer recognition across channels and devices

Identity Service
Petabytes of data processed daily
1M records/sec
50+ billion identities
70+ billion records/day
These capabilities are built from the ground up to link disconnected identities into a single, unified profile to deliver consistent, connected experiences.

Journey to 10x
Evolution Phases

Initial Architecture
[Diagram labels: Data Collection Pipeline; Streaming Topic 1; Streaming Topic 2; Streaming Topic 3; Stream Processing Pipeline; Identity Graph store; Job Manager; Job Service; Task Manager; Kubernetes Cluster]

Challenges
Why do we need to evolve?
Operational Overhead
Single pipeline
Coupled processing + storage
Fragmented logic across batch & streaming
Noisy Neighbor
Multi-tenant pipeline

We init