Designing for scalability and high availability on Microsoft Azure. Microsoft Ignite 2016, session BRK3207, 9/22/2018 5:47 PM. Adam Glick, Sr. Program Manager, Azure Resiliency. AGlick@Microsoft.com, @MobileGlick, https://linkedin.com/in/MrAdamGlick, adamglick. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Session objectives and takeaways. Session objectives: design highly available applications on the Azure platform; understand best practices and common pitfalls to avoid; learn the inner workings of Azure and how best to work around its hidden limitations. Note: this session focuses mostly on the design of cloud-based apps using IaaS. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
High availability: What does HA mean to you? What are the business requirements? Different requirements call for different architectures.
Resiliency: three options, in order of increasing cost and complexity and decreasing RPO/RTO. Backup (RTO >> 0) is best for data deletion, data corruption, and legal, governance, and compliance needs. Disaster recovery (RTO > 0) is best for protection from unplanned failures when you don't want to re-architect for HA or pay the cost of HA, and for large-scale failures. High availability (RTO = 0) is best for mission-critical apps, new apps, and localized failures.
Mean time between failures (MTBF): a metric used to quantify reliability. Component failures; the cost of delivering cloud infrastructure; on-premises vs. cloud. Hardware failure is inevitable.
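As a rough illustration of why redundancy matters when hardware failure is inevitable, the sketch below uses the standard steady-state formula availability = MTBF / (MTBF + MTTR), where MTTR (mean time to repair) is an extra term not on the slide. The example numbers are hypothetical, and the redundancy calculation assumes replicas fail independently, which a shared fault domain or storage stamp would violate.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N redundant replicas when any one replica is enough.

    Assumes failures are independent (e.g. replicas in different fault domains).
    """
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # Hypothetical component: fails every 1,000 hours on average, 10 hours to repair.
    a = availability(1_000, 10)
    print(f"single VM:  {a:.4%}")
    for n in (2, 3):
        print(f"{n} replicas: {parallel_availability(a, n):.6%}")
```

Adding replicas helps only as long as the independence assumption holds, which is why the later slides focus on fault domains and hidden shared dependencies.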
Designing for the cloud: no single point of failure; state stays at the edges of your app stack; loosely couple your components; build for scale; process centrally, deliver locally; automate; trust and verify; scale out, not up.
Scaling: vertical (up) vs. horizontal (out); cost; VMSS and autoscale.
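To make "scale out, not up" concrete, here is a minimal sketch of a threshold-based autoscale decision: instead of resizing one VM, it changes the number of identical instances. The thresholds, step size, and instance bounds are hypothetical values chosen for illustration, not Azure defaults, and `AutoscaleRule` / `desired_instances` are illustrative names rather than an Azure API.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    """Hypothetical threshold rule, loosely modeled on metric-based autoscale."""
    scale_out_cpu: float = 70.0  # average CPU % above which we add an instance
    scale_in_cpu: float = 30.0   # average CPU % below which we remove an instance
    min_instances: int = 2       # never drop below two, to keep redundancy
    max_instances: int = 10      # cost ceiling


def desired_instances(current: int, avg_cpu: float, rule: AutoscaleRule) -> int:
    """Return the instance count a scale set should move to (scale out/in, not up)."""
    if avg_cpu > rule.scale_out_cpu:
        target = current + 1
    elif avg_cpu < rule.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Clamp to the configured bounds.
    return max(rule.min_instances, min(rule.max_instances, target))
```

For example, `desired_instances(3, 85.0, AutoscaleRule())` asks for a fourth instance. Keeping a floor of two instances is a deliberate choice: a scale set that scales in to one VM has reintroduced a single point of failure.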
Good candidates for Azure: built versus bought software; software designed for the cloud; applications designed around a single-state datastore.
What can go wrong? Compute, networking, storage, or all three.
Compute
Availability Sets: provide fault domains and update (upgrade) domains; tied to a role in your application; required for the 99.95% VM SLA from Microsoft.
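A quick worked example of what a 99.95% SLA allows: the sketch below converts an availability percentage into the maximum downtime per period, assuming a 30-day month. The function name is illustrative; the official SLA documents define their own measurement rules.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime in a period that still meets an availability SLA."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} min per 30-day month")
```

At 99.95%, that works out to 43,200 × 0.0005 = 21.6 minutes of downtime per 30-day month.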
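Availability Sets (update and fault domains): a fault domain (VMs sharing a common power source and network switch) is a single point of failure. Update domains (also called upgrade domains) are used for patching and planned maintenance; only one update domain is rebooted at a time. ASM provides 2 fault domains; ARM provides up to 3. [Figure: two tables assigning VM1 to VM5, and VM1 to VM6, to update domains and fault domains]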
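The figure's tables can be approximated with a simplified round-robin model. The sketch assumes 5 update domains and 3 fault domains and uses 0-based indices; it mirrors the documented behavior that, with five update domains, a sixth VM lands in the same update domain as the first. Actual placement is decided by the platform, and `assign_domains` / `worst_case_loss` are hypothetical helpers for illustration only.

```python
from collections import Counter


def assign_domains(vm_count: int, update_domains: int = 5, fault_domains: int = 3):
    """Simplified round-robin placement: returns (vm_name, update_domain, fault_domain)."""
    return [
        (f"VM{i + 1}", i % update_domains, i % fault_domains)
        for i in range(vm_count)
    ]


def worst_case_loss(placement, kind: str) -> int:
    """Largest number of VMs that go down together when one domain fails or reboots."""
    idx = {"update": 1, "fault": 2}[kind]
    return max(Counter(p[idx] for p in placement).values())


if __name__ == "__main__":
    placement = assign_domains(6)
    for vm, ud, fd in placement:
        print(f"{vm}: update domain {ud}, fault domain {fd}")
    print("VMs lost in worst update-domain reboot:", worst_case_loss(placement, "update"))
    print("VMs lost in worst fault-domain failure:", worst_case_loss(placement, "fault"))
```

With six VMs, VM1 and VM6 share an update domain, so a planned reboot can take out two VMs at once. Sizing each tier so that losing any one domain still leaves enough capacity is the point of the exercise.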
Deployment scale: Availability Set
Azure VMs: [Diagram: VMs in an availability set (AVSet) in a front-end (FE) subnet of a virtual network, deployed from a Resource Manager template]
Azure VMs: [Diagram: front-end web servers, Active Directory, and a SQL Server cluster in availability sets and subnets within a virtual network, deployed from a Resource Manager template]
Azure VMs: [Diagram: the deployment spread across East US and West US within a virtual network, with Active Directory replication (AD Repl.) and asynchronous SQL Server AlwaysOn replication (SQL AO Async) between regions, plus backup, deployed from a Resource Manager template]
Virtual Machine Scale Sets (VMSS): a set of identical VMs ("cattle") built from a common base image; deployed from the portal or an ARM template using configuration management techniques; can be autoscaled.
Deployment scale: VMSS. [Diagram: a VM scale set deployed in Region 1]
Deployment scale – multi-region
Network
Layer 4 vs. layer 7: load balancers (layer 4), Application Gateway (layer 7), and third-party solutions (NGINX, HAProxy, F5, Barracuda). [Diagram: a layer 4 load balancer in front of an availability set of layer 7 appliances]
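The difference between the two layers is what the balancer can see. The sketch below contrasts a layer 4 choice made only from the connection 5-tuple (source IP, source port, destination IP, destination port, protocol), which is the default distribution mode of Azure Load Balancer, with a layer 7 choice made from the HTTP path, similar in spirit to URL path-based routing on Application Gateway. Backend names, the path table, and the SHA-256 hash are assumptions for illustration; neither product exposes this code.

```python
import hashlib

BACKENDS = ["vm1", "vm2", "vm3"]  # hypothetical layer 4 backend pool


def l4_pick(src_ip, src_port, dst_ip, dst_port, protocol, backends=BACKENDS):
    """Layer 4: pick a backend from the connection 5-tuple only (no HTTP awareness)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


# Hypothetical path-prefix routing table for a layer 7 balancer.
PATH_RULES = {"/images/": ["img1", "img2"], "/api/": ["api1", "api2"]}
DEFAULT_POOL = ["web1", "web2"]


def l7_pool(path: str):
    """Layer 7: choose a backend pool by inspecting the HTTP request path."""
    for prefix, pool in PATH_RULES.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

A deterministic hash keeps every packet of one connection on the same backend. The layer 7 function can only work because it terminates HTTP and reads the request, which is also why layer 7 devices cost more per request.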
Demo: Stages of availability
What do you do if a region fails? (a) Use a load balancer to shift to additional resources; (b) use Azure DNS to point to a paired region; (c) use Traffic Manager to change regions; (d) initiate your failover runbook in Azure Site Recovery; (e) complain to @Azure on Twitter.
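How Traffic Manager handles endpoint loss: [Figure: timeline of health probes at regular intervals. While the endpoint is healthy, each probe ("You there?") gets 200 OK ("Yup, all good"); after the endpoint stops responding to probe GETs, Traffic Manager updates DNS]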
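The timeline implies that failover is not instant: the endpoint must first fail several probes, and clients that cached the old DNS answer keep using it until its TTL expires. The sketch below gives a rough worst-case estimate under stated assumptions (the endpoint dies right after a successful probe, it is marked unhealthy after a fixed number of consecutive failed probes, and a client cached the old answer just before DNS changed). Probe timeouts and resolver behavior are ignored, and the parameter values in the usage note are hypothetical.

```python
def worst_case_failover_seconds(probe_interval_s: int,
                                failures_to_mark_unhealthy: int,
                                dns_ttl_s: int) -> int:
    """Rough upper bound on how long a client may keep resolving to a dead endpoint.

    detection: consecutive failed probes needed before the endpoint is marked unhealthy
    caching:   a client that resolved just before the DNS change holds it for one full TTL
    """
    detection = probe_interval_s * failures_to_mark_unhealthy
    return detection + dns_ttl_s
```

With a 10-second probe interval, 3 failed probes, and a 60-second TTL, the estimate is 90 seconds; with 30 seconds, 3 probes, and a 300-second TTL it grows to 390 seconds. Lowering the TTL shortens failover at the cost of more DNS queries.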
Traffic Manager: DNS-based load balancing and failover. Health probes of endpoints determine availability. The DNS time to live (TTL) affects how quickly clients see a change. Multiple traffic-routing methods: Performance, Priority, Weighted.
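The three routing methods differ only in how they choose among healthy endpoints. The sketch below is a simplified model, not Traffic Manager's implementation: Performance is reduced to "lowest hypothetical latency" rather than Traffic Manager's network-latency tables, and the fail-open branch mimics the documented behavior of treating all endpoints as healthy when every endpoint is degraded. The `Endpoint` fields and `route` function are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool
    priority: int = 1         # lower value is preferred (Priority method)
    weight: int = 1           # relative share of traffic (Weighted method)
    latency_ms: float = 0.0   # hypothetical latency from the user (Performance method)


def route(endpoints, method: str, rng=random):
    """Pick the endpoint a DNS answer would point to, skipping unhealthy endpoints."""
    live = [e for e in endpoints if e.healthy]
    if not live:
        # Fail open: if every endpoint is degraded, treat them all as healthy.
        live = list(endpoints)
    if method == "priority":
        return min(live, key=lambda e: e.priority)
    if method == "weighted":
        return rng.choices(live, weights=[e.weight for e in live], k=1)[0]
    if method == "performance":
        return min(live, key=lambda e: e.latency_ms)
    raise ValueError(f"unknown routing method: {method}")
```

Priority gives active/passive failover, Weighted gives round-robin style active/active, and Performance sends each user to the nearest region, matching the multi-region patterns later in the session.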
Storage
Storage redundancy types. LRS (locally redundant): 3 local copies, strongly consistent. ZRS (zone redundant): 3 copies across 2 or 3 facilities. GRS (geo-redundant): 3 local copies, strongly consistent, plus 3 remote copies, eventually consistent. RA-GRS (read-access geo-redundant): same as GRS, plus read-only access to the eventually consistent remote copies. Azure Storage provides tables, blobs, queues, and files.
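With RA-GRS, an application can keep serving reads when the primary is unreachable by falling back to the secondary endpoint, which follows the documented `<account>-secondary` naming convention. The sketch below assumes a caller-supplied `fetch(url)` function that raises an `OSError` subclass (such as `ConnectionError` or `urllib.error.URLError`) on failure; authentication, retries, and the Azure Storage SDK are deliberately left out, and the account and path names are hypothetical.

```python
def blob_endpoints(account: str):
    """Primary and RA-GRS read-only secondary blob endpoints for a storage account."""
    primary = f"https://{account}.blob.core.windows.net"
    secondary = f"https://{account}-secondary.blob.core.windows.net"
    return primary, secondary


def read_with_fallback(account: str, path: str, fetch):
    """Read from the primary; on a connection-level error, read from the secondary.

    Data from the secondary is eventually consistent and may be stale.
    Returns (body, which_endpoint).
    """
    primary, secondary = blob_endpoints(account)
    try:
        return fetch(f"{primary}/{path}"), "primary"
    except OSError:
        return fetch(f"{secondary}/{path}"), "secondary"
```

Returning which endpoint served the read lets the caller decide whether stale data is acceptable, for example showing a catalog page but refusing to process a payment from secondary data. Writes cannot use this fallback, because the secondary is read-only.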
Paired Regions: paired regions provide isolation, replication, region recovery order, sequential updates, and data residency. Pairs (primary / secondary): North Central US / South Central US; East US / West US; West US 2 / West Central US; East US 2 / Central US; North Europe / West Europe; Southeast Asia / East Asia; Germany Northeast / Germany Central; East China / North China; Japan East / Japan West; Brazil South / South Central US; Australia East / Australia Southeast; Canada Central / Canada East; UK South / UK West. https://msdn.microsoft.com/en-us/library/azure/hh873027.aspx
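A DR script often needs to know which region to fail over to. This sketch encodes the pairs from the slide (as of 2016; the official list has changed since) in a lookup. Brazil South is handled as a one-way pair, reflecting the documented fact that South Central US's own pair is North Central US, not Brazil South. `paired_region` is a hypothetical helper, not an Azure API.

```python
# Region pairs as listed on the slide (2016); check current documentation before relying on them.
PAIRS = [
    ("North Central US", "South Central US"),
    ("East US", "West US"),
    ("West US 2", "West Central US"),
    ("East US 2", "Central US"),
    ("North Europe", "West Europe"),
    ("Southeast Asia", "East Asia"),
    ("Germany Northeast", "Germany Central"),
    ("East China", "North China"),
    ("Japan East", "Japan West"),
    ("Australia East", "Australia Southeast"),
    ("Canada Central", "Canada East"),
    ("UK South", "UK West"),
]

# Brazil South pairs with South Central US, but not the other way around.
ONE_WAY = {"Brazil South": "South Central US"}


def paired_region(region: str) -> str:
    """Return the paired (failover) region for a primary region."""
    if region in ONE_WAY:
        return ONE_WAY[region]
    for a, b in PAIRS:
        if region == a:
            return b
        if region == b:
            return a
    raise KeyError(f"no pair known for {region!r}")
```

Keeping the one-way case separate avoids quietly sending South Central US workloads to Brazil, which could also break data residency expectations.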
Storage: a hidden point of failure? Standard vs. Premium. Disks live in a storage account; a storage account lives on a storage stamp; a storage stamp is a single point of failure, so a storage outage can span fault domains!
Demo: Storage account information tool
Storage account information: http://aka.ms/StorageAccountInfo* POST to https://storageinfo.azurewebsites.net/api/StorageInfo (yes, it's an Azure Function). Post content: {"storageUrl" : "storagetestblob1.blob.core.windows.net"} Response (JSON): { "Region":"West US", "Datacenter":"4", "Stage":"Production", "Type":"Standard", "Stamp":"07", "Version":"Primary" } *This tool is not endorsed or supported by Microsoft in any official way. Use at your own risk. Send feedback to aglick@microsoft.com.
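Given the response format shown above, a script can check whether the storage accounts behind one availability set share a stamp, which is the hidden single point of failure from the previous slide. The sketch below assumes the tool still answers at the URL on the slide (it was an unsupported personal tool and may no longer exist), so the grouping logic is kept separate and works on already-fetched responses. Function names and the example host names are hypothetical.

```python
import json
import urllib.request
from collections import defaultdict

# Endpoint shown on the slide; unsupported and possibly no longer online.
STORAGE_INFO_URL = "https://storageinfo.azurewebsites.net/api/StorageInfo"


def lookup_stamp(storage_host: str) -> dict:
    """POST {"storageUrl": host} to the tool and return the parsed JSON response."""
    body = json.dumps({"storageUrl": storage_host}).encode("utf-8")
    req = urllib.request.Request(
        STORAGE_INFO_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def shared_stamps(infos: dict) -> dict:
    """Group storage hosts by (Region, Datacenter, Stamp); keep only stamps used more than once."""
    by_stamp = defaultdict(list)
    for host, info in infos.items():
        key = (info["Region"], info["Datacenter"], info["Stamp"])
        by_stamp[key].append(host)
    return {key: sorted(hosts) for key, hosts in by_stamp.items() if len(hosts) > 1}
```

Any non-empty result from `shared_stamps` for the disks of one availability set means a single stamp outage could take down VMs that the fault domains were supposed to keep apart.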
Common architectures
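Consistency models: strong consistency, session consistency, eventual consistency. CAP theorem: Consistency, Availability, Partition tolerance; when a network partition occurs, a distributed system must choose between consistency (CP) and availability (AP).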
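Session consistency sits between the two extremes: each client is guaranteed to read its own writes, while other clients may see stale data. A common way to implement this is a session token that records the latest version the client has written. The sketch below is a single-process model with one primary and one asynchronously replicated secondary; `Replica`, `SessionClient`, and `replicate` are hypothetical names, not an Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    version: int = 0
    data: dict = field(default_factory=dict)


class SessionClient:
    """Read-your-writes: never read from a replica older than this session's last write."""

    def __init__(self, primary: Replica, secondary: Replica):
        self.primary, self.secondary = primary, secondary
        self.session_token = 0  # highest version this session has written or read

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.session_token = self.primary.version

    def read(self, key):
        # Prefer the (possibly lagging) secondary, but only if it has caught up to this session.
        if self.secondary.version >= self.session_token:
            replica = self.secondary
        else:
            replica = self.primary
        self.session_token = max(self.session_token, replica.version)
        return replica.data.get(key)


def replicate(primary: Replica, secondary: Replica):
    """One asynchronous replication step: copy the primary's state to the secondary."""
    secondary.data = dict(primary.data)
    secondary.version = primary.version
```

Before `replicate` runs, the writing client still sees its own data because its reads go to the primary, while a different client reading the lagging secondary sees nothing yet. That second client only gets eventual consistency.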
Common single-region deployment (RDBMS): [Diagram: front end and app tier in availability sets, with a database, queue, distributed cache, and blobs on locally redundant storage (LRS)]
Multi-region deployment. Active/passive: only one region is online, with DR failover to the other. Active/active: round robin (all regions accept traffic). Active performance: many regions, and each user is routed to the nearest one.
Multi-region deployment for OLTP (active/passive): [Diagram: West US (primary) and East US (secondary), each with a front end, app tier in an availability set, database, queue, distributed cache, and blobs on RA-GRS storage, with asynchronous replication from primary to secondary]
Multi-region deployment (active/active): [Diagram: the same stack in West US (primary) and East US (secondary), each on RA-GRS storage, with session consistency between the regions]
Demconvention.com: highly available; cross-regional; thousands of requests per second; under DDoS attack on Tuesday; running on Azure.
Using a CDN for availability: protection from DDoS; greater availability and scale for your origin. Other benefits: lower-latency content delivery, faster data delivery, lower data usage, and origin obfuscation.
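The availability benefit of a CDN comes from the origin seeing only a small fraction of requests. The sketch below models a single edge node as a TTL cache in front of an origin function; it is a toy illustration, not how Azure CDN is implemented, and the TTL value, paths, and class name are hypothetical. The clock is injectable so the behavior can be checked without waiting.

```python
import time


class EdgeCache:
    """Minimal TTL cache standing in for one CDN edge node in front of an origin."""

    def __init__(self, fetch_origin, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.fetch_origin = fetch_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}      # path -> (expires_at, body)
        self.origin_hits = 0  # how many requests actually reached the origin

    def get(self, path):
        now = self.clock()
        cached = self._store.get(path)
        if cached and cached[0] > now:
            return cached[1]  # served from the edge; origin untouched
        self.origin_hits += 1
        body = self.fetch_origin(path)
        self._store[path] = (now + self.ttl, body)
        return body
```

A thousand requests for the same asset within one TTL reach the origin once, which is how a CDN absorbs traffic spikes (and much of a volumetric DDoS) that would otherwise land on your servers.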
Demo: Azure CDN
Fault injection: Chaos Monkey; tests for single points of failure; automated architecture testing; simulates failure.
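Fault injection can be illustrated on a model of the deployment before touching real infrastructure. The sketch below uses a hypothetical three-tier deployment and a simple availability rule (the app is up if every tier keeps at least one instance). It runs two checks: an exhaustive one-failure-at-a-time search for single points of failure, and a Chaos Monkey style random kill. Tier and instance names are assumptions; real chaos tooling acts on live resources.

```python
import random

# Hypothetical deployment: tier name -> instance names.
DEPLOYMENT = {
    "web": ["web1", "web2", "web3"],
    "app": ["app1", "app2"],
    "sql": ["sql1"],  # only one instance: a single point of failure
}


def is_available(deployment, failed) -> bool:
    """The app is up if every tier still has at least one instance that has not failed."""
    return all(
        any(inst not in failed for inst in instances)
        for instances in deployment.values()
    )


def single_points_of_failure(deployment):
    """Fail one instance at a time and report those whose loss takes the app down."""
    all_instances = [i for instances in deployment.values() for i in instances]
    return [i for i in all_instances if not is_available(deployment, {i})]


def chaos_round(deployment, rng=random):
    """Kill one random instance and report (victim, whether the app survived)."""
    victim = rng.choice([i for instances in deployment.values() for i in instances])
    return victim, is_available(deployment, {victim})
```

The exhaustive check finds `sql1` immediately. The random version is what you run continuously in practice, because real dependencies such as a shared storage stamp are rarely captured in the model.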
Resources (Azure high availability): High Availability Checklist (https://aka.ms/AzureHaChecklist); Designing for High Availability (https://azure.microsoft.com/en-us/documentation/articles/resiliency-high-availability-azure-applications/); Overview of HA and DR (https://azure.microsoft.com/en-us/documentation/articles/resiliency-disaster-recovery-high-availability-azure-applications/).
Related content: Backup born-in-the-cloud and hybrid applications with Operations Management Suite and Azure Backup (9/27, 10:45AM-12:00PM); Review best practices for disaster recovery, from design to operations (9/27, 2:15PM-3:30PM); Reinvent disaster recovery leveraging Microsoft Azure cloud infrastructure (9/28, 2:15PM-3:30PM); Migrate and disaster recovery Azure workloads using Operations Management Suite (9/29, 9:00AM-10:15AM); Run highly available solutions on Microsoft Azure (9/29, 12:30PM-1:45PM).
Free IT Pro resources to advance your career in cloud technology. Plan your career path: Microsoft IT Pro Career Center (www.microsoft.com/itprocareercenter). Cloud role mapping; expert advice on skills needed; self-paced curriculum by cloud role; $300 Azure credits and extended trials; Pluralsight 3-month subscription (10 courses); phone support incident; weekly short videos and insights from Microsoft's leaders and engineers; connect with a community of peers and Microsoft experts. Get started with Azure: Microsoft IT Pro Cloud Essentials (www.microsoft.com/itprocloudessentials). Demos and how-to videos: Microsoft Mechanics (www.microsoft.com/mechanics). Connect with peers and experts: Microsoft Tech Community (https://techcommunity.microsoft.com).
Free IT Pro resources to advance your career in cloud technology. Plan your career path: IT Pro Career Center (http://www.microsoft.com/itprocareercenter). Get started with Azure: IT Pro Cloud Essentials (https://www.microsoft.com/itprocloudessentials). Demos and how-to videos: Microsoft Mechanics (https://www.microsoft.com/mechanics). Connect with peers and experts; ask questions, get answers, exchange ideas (https://techcommunity.microsoft.com). Azure Solutions: get started with Azure Solutions today (http://azure.com/solutions). Azure monthly webinar series: join live or watch on demand (http://aka.ms/AzureMonthlyWebinar).
Thank you!
Q&A: If you have questions, please proceed to the Q&A microphone located in your session room.
High availability checklist: use Traffic Manager; avoid single VMs; use load balancers in front of web-facing VMs; put your stateless servers in Availability Sets; use VMSS for your stateless server scaling; use Premium Storage for your production VMs; use internal load balancers (or queues) between tiers; distribute your database; use caches; contact support before a high-scale event; store static assets in Blob Storage; use a CDN in front of your static assets.