High Availability On AWS: How to Keep Running When Things Go Wrong. 2017 AWS Specialised Partner of the Year
Peter Joseph, Cloud Architect, @cloudynetwork. Key Points: 1) Location of data.
Overview: High Level HA Approaches for AWS; Considerations for HA. Key Points: We will be looking at a few different HA architectures for AWS, along with specific considerations for HA. We can’t deep dive into specifics or the many supporting parts, but please catch up with us later…
What is High Availability anyway? Many Definitions. An Agreed Availability Above What is ‘Normal’. Key Points: There are many, many definitions. The one used here: production has an agreed availability above the ‘normal’ level for, say, dev/test.
Build for the Availability You Need, Not the Availability You Want. Key Points: Does it really have to be up all the time? A back office system for 10 users vs a website serving millions per month? And everything in between.
The True Cost of the Extra 9. From Lower Availability to Higher Availability: Single AZ / Single Server; Multi AZ / Some Multi Region; Full Multi-Region.
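To make "the extra 9" concrete, here is a minimal sketch of the arithmetic behind an availability target: how many minutes per year a service may be down at a given percentage. The function name and the 365-day year are assumptions for illustration, not part of the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year (no leap day)


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes per year a service may be down at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Each extra 9 cuts the downtime budget by a factor of ten.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% allows {allowed_downtime_minutes(target):.1f} minutes/year")
```

Each added 9 leaves a tenth of the previous downtime budget, which is why each step along the spectrum above tends to cost more than the last.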
The Good News: With AWS The Cost Of HA is Lower Than Ever Before!!!
Define Your Availability Metrics: What Does ‘Up’ or ‘Available’ Mean To You? Ping Doesn’t Cut It Anymore*; Service Measures (Response Time etc.); Successful User Transactions. * By “Anymore” I mean since 1996. Key Points: What is your availability figure based on? What do you consider available? Use real-world measures of service availability, not artificial ones. Artificial ones are easy. Really think about this…
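One way to turn "successful user transactions" into a number is sketched below: a transaction counts as good only if it succeeded and came back within a response-time budget. The `Transaction` record, the function name and the 2000 ms budget are all hypothetical; pick the measures that match what "available" means to you.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transaction:
    succeeded: bool      # did the user-facing transaction complete?
    response_ms: float   # how long the user waited


def service_availability(transactions: Sequence[Transaction],
                         max_response_ms: float = 2000.0) -> float:
    """Fraction of transactions that succeeded within the response-time budget."""
    if not transactions:
        raise ValueError("no transactions observed")
    good = sum(
        1 for t in transactions
        if t.succeeded and t.response_ms <= max_response_ms
    )
    return good / len(transactions)
```

Note that a slow success counts against availability here, which a ping check would never show.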
Consider the Entire Service: Changes; NTP; DNS; Service Response Times; Authentication (AD/LDAP); Customer Reachability; ‘The User Experience’ From Multiple Locations. Key Points: Consider the entire service, not just the parts you run. Things outside of your direct control that you rely on may impact service availability.
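A rough sketch of why the dependencies matter: if the service needs every dependency to be up, and we assume their failures are independent, the overall availability is the product of the individual figures. The dependency names and numbers in the usage note are made up for illustration.

```python
from math import prod
from typing import Mapping


def composite_availability(dependencies: Mapping[str, float]) -> float:
    """Availability of a service that needs ALL dependencies up.

    Assumes failures are independent; each value is a fraction such as 0.999.
    """
    for name, value in dependencies.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name}: availability must be between 0 and 1")
    return prod(dependencies.values())
```

For example, `composite_availability({"app": 0.999, "dns": 0.999, "auth": 0.999})` comes out below 0.999: three dependencies at three 9s each no longer add up to a three 9s service.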
Approaches: Multi AZ (Has Become The Baseline); Multi Region. Key Points: Broadly speaking, the approaches are Multi-AZ and Multi Region, i.e. deploying across multiple regions.
Multi AZ: Some Services Require You To Choose (EC2, RDS); Many Have ‘Built in Multi AZ’ (S3, DynamoDB, Lambda). Key Points: Broadly speaking, Multi-AZ and Multi Region.
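As a back-of-the-envelope illustration of what running in more than one AZ buys you, the sketch below computes the availability of redundant copies. It assumes each copy fails independently and that any single copy can carry the full load; real AZ failures are not perfectly independent, so treat the result as an upper bound. The function name and figures are hypothetical.

```python
def redundant_availability(single_copy: float, copies: int) -> float:
    """Availability when the service is up if ANY one of the copies is up.

    Assumes independent failures and that one copy can serve all traffic.
    """
    if not 0 <= single_copy <= 1:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1 - (1 - single_copy) ** copies
```

Under these assumptions, two copies at 0.99 each give roughly 0.9999.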
Multi Region: Significant Additional Complexity (If App Not Natively Capable); Potentially ‘Zero’ Downtime. Key Points: Broadly speaking, Multi-AZ and Multi Region. Databases are a specific challenge. Zero Downtime.
Be Aware of SPOFs, From Your App’s Perspective. Examples! Not An Exhaustive List! Single Region Buckets; Single Instances; Single Database. Key Points: What counts depends on your scope for HA. For single region buckets, use replication: operating in multiple regions is no good if you depend on everything in one bucket. Replicate!
Monitoring And Alerting: Knowing When It’s Down. Leverage CloudWatch + SNS; Automate Responses with CloudWatch Events. Key Points: Avoid human error. If you automate builds, you can easily audit and compare with a known good state. No snowflakes. Security: automating logging setup and analytics means you can be sure to collect everything. If the worst should happen (a compromise, etc.), then you can rebuild more rapidly and with confidence.
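A minimal sketch of the CloudWatch + SNS pairing: build the parameters for an alarm on load balancer 5xx errors that notifies an SNS topic, then hand them to boto3's `put_metric_alarm`. The helper names, the choice of the Application Load Balancer metric, and the threshold and period values are assumptions for illustration; substitute the metric that reflects your own definition of "down".

```python
def build_alarm_params(alarm_name: str, load_balancer: str,
                       sns_topic_arn: str, threshold: float = 10.0) -> dict:
    """Parameters for a CloudWatch alarm that notifies an SNS topic.

    load_balancer is the ALB's CloudWatch dimension value, and
    sns_topic_arn is the topic that should receive the alert.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,                  # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,        # breach for 3 periods in a row (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict) -> None:
    """Create the alarm; needs boto3 and AWS credentials, so import lazily."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary means the alarm definition can be reviewed and versioned like any other automated build step.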
Best Practices: Test Your Recovery; Implement Automatic Recovery; Scale Horizontally; Monitor Service Capacity. Key Points: 1) AWS capacity (limits excluded) is effectively unlimited for us mortals… but you must have the ability to monitor the capacity of your service if you want to ensure you meet your availability numbers.
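Scaling horizontally for availability means sizing so that the service still copes after a failure. The sketch below works out how many instances each AZ needs so that the remaining AZs can carry peak load if one AZ is lost. The function name, the requests-per-second model and the even spread across AZs are simplifying assumptions.

```python
import math


def instances_per_az(peak_rps: float, rps_per_instance: float,
                     az_count: int) -> int:
    """Instances to run in each AZ so peak load survives the loss of one AZ."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive losing one")
    if peak_rps <= 0 or rps_per_instance <= 0:
        raise ValueError("load figures must be positive")
    # Instances needed in total to serve peak load.
    needed_total = math.ceil(peak_rps / rps_per_instance)
    # After losing one AZ, only (az_count - 1) AZs remain to host them.
    return math.ceil(needed_total / (az_count - 1))
```

With a peak of 1000 requests per second and 100 per instance, three AZs need 5 instances each (15 in total), while two AZs need 10 each (20 in total): spreading wider lowers the overhead of surviving an AZ loss.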
Pitfalls: AWS Service Limits! (Trusted Advisor Can Help); Think BEYOND AWS for SPOFs; Overlooking the User Experience/Journey; Not Considering Changes. Key Points: AWS service limits… Trusted Advisor. VPNs, users’ connectivity, etc… local DNS…
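The kind of check that keeps service limits from becoming a surprise can be sketched as below: compare current usage with each limit and flag anything close to the ceiling. The resource names, the input dictionaries and the 80% warning ratio are assumptions for illustration; this is not how Trusted Advisor itself is implemented.

```python
from typing import List, Mapping


def limits_at_risk(usage: Mapping[str, int], limits: Mapping[str, int],
                   warn_ratio: float = 0.8) -> List[str]:
    """Names of resources whose usage is at or above warn_ratio of the limit."""
    at_risk = []
    for name, limit in limits.items():
        used = usage.get(name, 0)  # no usage recorded means nothing consumed
        if limit > 0 and used / limit >= warn_ratio:
            at_risk.append(name)
    return sorted(at_risk)
```

A scale-out event that hits a limit fails at exactly the moment you need the capacity, so it is worth running a check like this before the peak rather than during it.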
Roundup: Defining HA (For Your Needs); HA Approaches for AWS; Measuring and Reacting; Best Practices and Pitfalls.
Everything Fails, All The Time – Werner Vogels, Amazon CTO Key Points: Serverless, emphasis not on individual cattle
Anything That Can Go Wrong, Will Go Wrong – Murphy? Key Points: 1) I would extend this to ‘Cloud is a mindset, not a skillset’ too. It’s not about
Bedtime Reading: AWS Well Architected Framework. Key Points: 1) Werner Vogels, Amazon CTO, called this out during his keynote. We have mentioned this a lot.
Bedtime Reading: The Twelve-Factor App, https://12factor.net/
Thanks. Questions?