Microservice Best Practices: Lessons Learned from Azure Service Fabric
Mark Russinovich, CTO, Microsoft


1 Microservice Best Practices: Lessons Learned from Azure Service Fabric. Mark Russinovich, CTO, Microsoft Azure. @markrussinovich, markruss@microsoft.com

2 Microservice types

3 Stateful microservice replication (diagram: a Service Fabric cluster of VMs, in which a primary replica replicates state to secondary replicas)
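The slide only shows the replication diagram. As a rough illustration of the idea, the sketch below models a primary that ships each write to its secondaries and treats it as committed once a majority of the replica set (primary included) has acknowledged it. The class names and in-memory `log` lists are hypothetical; a real replicator also handles ordering, failover, and catch-up of lagging secondaries, none of which is modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    up: bool = True
    log: List[str] = field(default_factory=list)

    def receive(self, op: str) -> bool:
        """Append op to the local log and acknowledge; a down replica never acks."""
        if not self.up:
            return False
        self.log.append(op)
        return True


@dataclass
class ReplicaSet:
    primary: Replica
    secondaries: List[Replica]
    committed: List[str] = field(default_factory=list)

    def write_quorum(self) -> int:
        # Majority of all replicas, counting the primary itself.
        return (1 + len(self.secondaries)) // 2 + 1

    def write(self, op: str) -> bool:
        """Log op on the primary, ship it to every secondary, and commit it
        only once a write quorum (primary included) has acknowledged it."""
        self.primary.log.append(op)
        acks = 1  # the primary's own copy
        for secondary in self.secondaries:
            if secondary.receive(op):
                acks += 1
        if acks >= self.write_quorum():
            self.committed.append(op)
            return True
        return False  # not acknowledged to the caller; left uncommitted in this sketch


rs = ReplicaSet(Replica("P"), [Replica("S1"), Replica("S2")])
print(rs.write("put k=1"))  # True: 3 of 3 replicas acked, quorum is 2
rs.secondaries[0].up = False
print(rs.write("put k=2"))  # True: 2 of 3 acked
rs.secondaries[1].up = False
print(rs.write("put k=3"))  # False: only the primary has it
```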

4 Azure Service Fabric (diagram: runs on Microsoft Azure, on-premises private clouds, and other clouds; features & capabilities)

5 Programming models: any code (arbitrary images and DLLs), Reliable Services, Reliable Actors, ASP.NET Core. Languages: .NET and Java, more coming…
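Reliable Actors are documented as using turn-based concurrency: an actor processes one method call at a time. The sketch below imitates that with one `asyncio.Lock` per actor instance. `CounterActor` and `ActorRuntime` are hypothetical stand-ins written for illustration, not the Service Fabric actor API (which is .NET or Java).

```python
import asyncio


class CounterActor:
    """Hypothetical actor whose state is only touched inside a 'turn'."""

    def __init__(self, actor_id: str):
        self.actor_id = actor_id
        self.count = 0
        self._turn = asyncio.Lock()  # at most one call runs per actor

    async def increment(self) -> int:
        async with self._turn:
            current = self.count
            await asyncio.sleep(0)  # yield mid-update; safe because we hold the turn
            self.count = current + 1
            return self.count


class ActorRuntime:
    """Toy runtime: activates one instance per actor ID on first use."""

    def __init__(self, actor_type):
        self._actor_type = actor_type
        self._actors = {}

    def get(self, actor_id: str):
        if actor_id not in self._actors:
            self._actors[actor_id] = self._actor_type(actor_id)
        return self._actors[actor_id]


async def main() -> int:
    runtime = ActorRuntime(CounterActor)
    # 100 concurrent calls to the same actor ID are serialized into turns.
    await asyncio.gather(*(runtime.get("user-42").increment() for _ in range(100)))
    return runtime.get("user-42").count


print(asyncio.run(main()))  # 100
```

Without the per-actor lock, the `await` in the middle of the read-modify-write would let the 100 calls interleave and most increments would be lost.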

6 Demo: Visual Objects, a Java app in a container. http://aka.ms/ServiceFabricSamples

7 History of microservices at Microsoft
2003: “Windows Fabric” microservices platform project started
2007: Azure SQL DB service project starts with Windows Fabric
2009 onward: many Azure and Microsoft services use Windows Fabric
2014: Decision to make Windows Fabric a public platform; renamed to Service Fabric
April 2015: Developer SDK preview released at //build
November 2015: Public preview of Azure Service Fabric
March 2016: GA of Azure Service Fabric on Windows
September 2016: Public preview of Azure Service Fabric for Linux

8 Services built with Service Fabric
Azure Core Infrastructure: 1000s of VMs
Azure DocumentDB: billions of transactions per week
Azure Event Hubs: 4 trillion requests each week
Azure SQL Database: 1.6 million DBs
Bing Cortana: 500 million evals/sec
Microsoft Intune: 1.1M devices
… and more: IoT Hub, Skype for Business, Power BI, CRM Dynamics

9 Cluster sizes. Small clusters are bad for high availability (HA): Service Fabric recommends 5+ nodes, and teams trying to use a smaller node count have experienced pain. Lesson learned: “Embrace the yellow.” At scale, machines fail, so expect warning (yellow) health states and do not bet your ability to go home on your cluster being A-OK.
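The slide does not spell out why 5+ nodes matters. One common way to see it, assuming one replica per node and majority-quorum writes, is to count how many unplanned failures a replica set survives while one node is already down for an upgrade. The function names are hypothetical.

```python
def write_quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def failures_tolerated(replicas: int, planned_down: int = 0) -> int:
    """Unplanned failures a majority-quorum replica set can absorb while
    `planned_down` replicas are already offline (for example, during an upgrade)."""
    available = replicas - planned_down
    return max(available - write_quorum(replicas), 0)


for nodes in (3, 5, 7):
    print(nodes, failures_tolerated(nodes), failures_tolerated(nodes, planned_down=1))
# 3 1 0
# 5 2 1
# 7 3 2
```

With three nodes, taking one down for an upgrade leaves no margin for a single unplanned failure, which is consistent with the slide's warning about small clusters.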

10 RDFE: service management monolith. Red Dog Frontend (RDFE) was a large codebase working across compute, storage, networking, and other core infrastructure components. It shipped 4 releases per year with a 3 to 4 week release process, used one large blessed certificate for many operations across domains, and relied on one primary DC for worldwide management operations.

11 RDFE → Resource Provider microservices. RDFE was refactored into many microservices: compute, network, storage, … Teams think of microservices as .NET assemblies, and had to avoid splits that cause excessive chattiness. Resource providers (RPs) have regional instances, which improved availability and reduced latency: Tokyo compute uses the Tokyo CRP instead of the one in Texas. Other results: worldwide upgrades in three days or less; a security win, since each microservice operates within its limited domain; a massive performance and scalability win from collocating compute and data; efficient packing of VMs; and feature flighting thanks to parallel microservice instances. The team only rolls forward, to simplify data versioning.
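One simple way to use parallel microservice instances for feature flighting is to send a fixed, deterministic slice of subscriptions to the flighted instance. The sketch below hashes the subscription ID into 100 buckets. The endpoint names `crp-v1` and `crp-v2-flight` are hypothetical, and the slide does not say how CRP actually selects its flight cohort.

```python
import hashlib


def in_flight(subscription_id: str, percent: int) -> bool:
    """True if this subscription falls in the flighted cohort.

    The ID is hashed into one of 100 buckets, so the same ID always gets the
    same answer and a larger `percent` is a superset of a smaller one.
    """
    digest = hashlib.sha256(subscription_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def route(subscription_id: str, percent: int) -> str:
    # Hypothetical names for the stable instance and the parallel flighted instance.
    return "crp-v2-flight" if in_flight(subscription_id, percent) else "crp-v1"
```

Because the bucket depends only on the ID, raising `percent` from 10 to 50 keeps every already-flighted subscription in the flight, so customers are not bounced between versions as the rollout widens.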

12 CRP microservices design: Compute Resource Provider (CRP), with 3 stateless app types and 5+ stateful app types

13 Partition planning. Initially the team wanted each data partition to manage 1 to 10 Azure subscriptions. Benefit: fine-grained placement and load balancing. Problem: an explosion in microservice instances as subscription counts grow. Lessons learned: the partitioning scheme is the most important aspect of a design on Service Fabric. Too many partitions means more replication work and too many open DB connections; too few, and your nodes work with very large datasets. The team settled on 100 subscriptions per partition.
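The 100-subscriptions-per-partition figure translates directly into a partition count. The sketch below assumes subscriptions hash evenly into a 64-bit key space divided into equal contiguous ranges (similar in spirit to Service Fabric's uniform Int64 ranged partitioning); the function names and the SHA-256 key derivation are illustrative assumptions. Because a stateful service's partition count is chosen when the service is created, you would size it for expected growth rather than for today's count.

```python
import hashlib
import math

SUBSCRIPTIONS_PER_PARTITION = 100  # the figure the CRP team settled on


def partition_count(expected_subscriptions: int,
                    per_partition: int = SUBSCRIPTIONS_PER_PARTITION) -> int:
    """Partitions needed so that, with an even spread, none holds more
    than `per_partition` subscriptions."""
    return max(1, math.ceil(expected_subscriptions / per_partition))


def partition_for(subscription_id: str, partitions: int) -> int:
    """Hash the ID into a 64-bit key and find which of `partitions` equal,
    contiguous key ranges contains it."""
    digest = hashlib.sha256(subscription_id.encode("utf-8")).digest()
    key = int.from_bytes(digest[:8], "big")  # 0 <= key < 2**64
    range_size = 2**64 // partitions
    return min(key // range_size, partitions - 1)  # last range absorbs the remainder


print(partition_count(25_000))  # 250
```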

14 Cascading failures

15 Cascading failures lessons. Understand capacity: cascading failures are possible if a node dies within a cluster running near maximum capacity. Test with real-world workloads, and understand how losing nodes affects aggregate capacity. Watch your disk queue and seeks!
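A quick back-of-the-envelope check for this lesson: if the load of k failed nodes is redistributed evenly over the survivors, per-node utilization rises by a factor of N/(N−k). The helpers below are a hypothetical sketch of that arithmetic; real load is rarely spread evenly, which is why the slide says to test with real workloads.

```python
def load_after_failures(nodes: int, utilization: float, failed: int) -> float:
    """Per-node utilization (fraction of one node's capacity) after the load of
    `failed` nodes is spread evenly over the survivors."""
    survivors = nodes - failed
    if survivors <= 0:
        return float("inf")  # nothing left to absorb the load
    return nodes * utilization / survivors


def survives(nodes: int, utilization: float, failed: int = 1,
             ceiling: float = 1.0) -> bool:
    """True if the survivors stay at or below `ceiling` utilization."""
    return load_after_failures(nodes, utilization, failed) <= ceiling


print(survives(5, 0.70))  # True: 3.5 units of load over 4 nodes = 87.5% each
print(survives(5, 0.90))  # False: 4.5 units over 4 nodes = 112.5% each
```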

16 Counter-intuitive failures

17 Counter-intuitive failures lessons. Some actions you undertake will have counterintuitive reactions. The initial cost of replication when adding new nodes can make disk IO spike across the cluster, which can cause failures under heavy load. Incorporate throttling to lessen the load on the cluster when adding nodes near capacity.
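Throttling can be as simple as a token bucket in front of the data copy that seeds replicas on new nodes. The sketch below is a generic token bucket driven by a simulated clock; `copy_chunks` is a hypothetical replica-build loop and the rates are arbitrary. It is not how Service Fabric's own throttling is implemented.

```python
class TokenBucket:
    """Generic token bucket: refills `rate` units per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, allowing one initial burst
        self.last = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, amount: float, now: float) -> bool:
        """Spend `amount` tokens at time `now` if enough have accumulated."""
        self._refill(now)
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


def copy_chunks(chunks, bucket: TokenBucket, start: float = 0.0, tick: float = 0.125) -> float:
    """Hypothetical replica-build loop: copy each chunk (e.g. in MB) only when the
    bucket allows it, advancing a simulated clock while waiting. A power-of-two
    tick keeps the clock exact in floating point. Returns the send time of the last chunk."""
    now = start
    for size in chunks:
        if size > bucket.capacity:
            raise ValueError("chunk larger than bucket capacity would never be sent")
        while not bucket.try_acquire(size, now):
            now += tick
    return now


# Three 10 MB chunks at 10 MB/s with a 10 MB burst: the first goes at once,
# the other two wait for refills.
print(copy_chunks([10, 10, 10], TokenBucket(rate=10, capacity=10)))  # 2.0
```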

18 Thanks!
Download the Service Fabric SDK: http://aka.ms/servicefabricsdk
Sample code: http://aka.ms/servicefabricsamples
Documentation: http://aka.ms/servicefabricdocs and http://aka.ms/servicefabric
Mark Russinovich, CTO, Microsoft Azure
Twitter: @markrussinovich
E-mail: markruss@microsoft.com

19 Service Fabric Architecture: Scalable, Reliable, Managed Applications
Reliability Subsystem: reliability, availability, replication, microservice orchestration
Management Subsystem: application and microservice lifecycle; deployment, upgrade, and monitoring
Transport Subsystem: secure point-to-point communication
Federation Subsystem: federates a set of nodes to form a consistent, scalable cluster of machines
Communication Subsystem: microservice discovery
Hosting & Activation: application and microservice container activation
Application Programming Models: declarative application description, .NET and C++ APIs
Testability Subsystem: fault injection, test in production
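As an illustration of what the Communication Subsystem's discovery role means in practice, the sketch below keeps a table from (service name, partition) to the endpoint of the current primary and updates it on failover. The `fabric:/` name follows Service Fabric's service-naming convention, but `NamingTable`, the partition index, and the endpoint addresses are hypothetical; the real Naming Service is itself a replicated, partitioned system service.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class NamingTable:
    """Toy discovery table: (service name, partition) -> current primary endpoint."""
    entries: Dict[Tuple[str, int], str] = field(default_factory=dict)

    def register(self, service: str, partition: int, endpoint: str) -> None:
        # Called when a replica becomes primary, initially or after a failover.
        self.entries[(service, partition)] = endpoint

    def resolve(self, service: str, partition: int) -> str:
        """Return the endpoint clients should call right now."""
        endpoint = self.entries.get((service, partition))
        if endpoint is None:
            raise LookupError(f"no primary registered for {service} partition {partition}")
        return endpoint


naming = NamingTable()
naming.register("fabric:/CRP/Compute", 3, "10.0.0.4:8080")
naming.register("fabric:/CRP/Compute", 3, "10.0.0.9:8080")  # primary moved after a failover
print(naming.resolve("fabric:/CRP/Compute", 3))  # 10.0.0.9:8080
```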

