Intelligent and Predictive Failover - The race against errors by marching towards Resiliency!
TEJA SWAROOP MYLAVARAPU (Lead Software Engineer - Capital One)

AGENDA
1. Terminology Definitions
2. What is Resiliency?
3. RTO, RPO and MTD
4. Infrastructure Premise with Alarms and notifications
5. Failover Mechanism - Active-Active Failover
6. Distributed System tightly intertwined with Downstream Systems
7. Failover Mechanism - Reactive Failover
8. Concept of Predictive Failover
9. Predictive Failover - Intelligent and Deep Health Checks
10. Intelligent Monitoring with ML
11. Synthetic traffic and disaster simulation
12. Chaos Engineering

AWS Infrastructure - Cluster vs EC2 vs Docker Containers
Definitions
Container: A standalone, executable package that includes all the configs needed to run an application in any environment/OS
EC2 Instance: A virtual server in Amazon Web Services (AWS). Similar to Virtual Machines in Azure
Database: A logical collection of structured information acting as a storage mechanism.

AWS Infrastructure - Lambdas, Load Balancer and Route53
Definitions
Lambda: AWS Lambda is an event-driven, serverless computing platform managed by AWS which can be used for event-driven or web based applications
Load Balancer: Distributes incoming application traffic across multiple target groups such as EC2 Instances or Lambdas
Route 53: Scalable Domain Name System (DNS) web service which connects user requests to AWS Components

What is Resiliency?
Textbook Definition: The ability of an application to resist or recover from any disasters related to Infrastructure, downstream systems, traffic spikes, network issues. It is not only the ability to recover from downtimes, but also the ability to avoid the downtimes and still be consistent with security, consistency and performance.
Resilience = Brand Value
Overall low Customer Impact, low failures and