@mayralois Mushroom Cloud Effect Alois Mayr Technology Lead – Dynatrace.

Slides:

Advertisements

Similar presentations

How We Manage SaaS Infrastructure Knowledge Track

Advertisements

Shape of a Network.

A Ridiculously Easy & Seriously Powerful SQL Cloud Database Itamar Haber AVP Ops & Solutions.

Auto-scaling Axis2 Web Services on Amazon EC2 By Afkham Azeez.

High Availability Deep Dive What’s New in vSphere 5 David Lane, Virtualization Engineer High Point Solutions.

How Law Firms are Using AWS Cloud IaaS FOR THE LEGAL IT PRO.

Towards Autonomic Adaptive Scaling of General Purpose Virtual Worlds Deploying a large-scale OpenSim grid using OpenStack cloud infrastructure and Chef.

VoipNow Core Solution capabilities and business value.

Adding scalability to legacy PHP web applications Overview Mario A. Valdez-Ramirez.

Business Continuity and DR, A Practical Implementation Mich Talebzadeh, Consultant, Deutsche Bank

Microsoft Ignite /16/2017 2:42 PM

Features Scalability Availability Latency Lifecycle Data Integrity Portability Manage Services Deliver Features Faster Create Business Value.

Building Resilient, Scalable Services with Microsoft Azure Service Fabric Mark Fussell Principal Program Manager Vipul Modi Principal Software.

Large Scale File Distribution Troy Raeder & Tanya Peters.

DatacenterMicrosoft Azure Consistency Connectivity Code.

Running Your Database in the Cloud Eran Levin VP R&D - Xeround.

Windows 2000 Advanced Server and Clustering Prepared by: Tetsu Nagayama Russ Smith Dale Pena.

High-Availability Linux.  Reliability  Availability  Serviceability.

Protect Your Business-Critical Data in the Cloud with SoftNAS, a Full-Featured, Highly Available Solution for the Agile Microsoft Azure Platform MICROSOFT.

How AWS Pricing Works Jinesh Varia Technology Evangelist.

Building micro-service based applications using Azure Service Fabric

Securing Microservice Architectures Laura Bell M239.

Stairway to the cloud or can we take the highway? Taivo Liik.

Ashish Prabhu Douglas Utzig High Availability Systems Group Server Technologies Oracle Corporation.

Alwayson Availability Groups

Creating highly available and resilient Microservices on Microsoft Azure Service Fabric

Auto-scaling Axis2 Web Services on Amazon EC2 By Afkham Azeez.

Navigation software platform: Automating the server configuration Igor Jovic, Whitecity Soft Case Study.

Fronting Tomcat With Apache V0.1 – Nguyễn Bá Thành Software Manager, Game Platform & Integration.

Shape of a Network 10/10/07. Topology  The way the computers are cabled together  Four different layouts  Logical topology describes the way data travels.

Features Scalability Manage Services Deliver Features Faster Create Business Value Availability Latency Lifecycle Data Integrity Portability.

Mick Badran Using Microsoft Service Fabric to build your next Solution with zero downtime – Lvl 300 CLD32 5.

LHC Logging Cluster Nilo Segura IT/DB. Agenda ● Hardware Components ● Software Components ● Transparent Application Failover ● Service definition.

And scales by cloning the app on multiple servers/VMs/Containers Traditional architecture approach Microservices architecture approach A microservice.

High-Availability MySQL with DR:BD and Heartbeat: MTV Japan mobile services ©2008 MTV Networks Japan K.K.

Structured Container Delivery Oscar Renalias Accenture Container Lead (NOTE: PASTE IN PORTRAIT AND SEND BEHIND FOREGROUND GRAPHIC FOR CROP)

Deploying Docker Datacenter on AWS © 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.

MySQL HA An overview Kris Buytaert. ● Senior Linux and Open Source ● „Infrastructure Architect“ ● I don't remember when I started.

Architecting Enterprise Workloads on AWS Mike Pfeiffer.

Microservice Best Practices Lessons Learned from Azure Service Fabric Mark Russinovich CTO, Microsoft

100% Exam Passing Guarantee & Money Back Assurance

Running micro service environments is no free lunch

Dockerize OpenEdge Srinivasa Rao Nalla.

OpenLegacy Training Day Four Introduction to Microservices

Docker Birthday #3.

In-Depth Introduction to Docker

Introduction to Microservices Prepared for

Welcome to AWS Certification Exam

Microsoft Azure Service Fabric Overview

Service Fabric Patterns & Best Practices

2017 Real Questions

Get Amazon AWS-DevOps-Engineer-Professional Exam Real Questions - Amazon AWS-DevOps-Engineer-Professional Dumps Realexamdumps.com

Kubernetes Container Orchestration

Ansible and Zabbix Rushikesh Prabhune (Software Technical Consultant)

Microsoft Build /8/2018 5:15 AM © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY,

Microsoft Azure P wer Lunch

Securing Cloud-Native Applications Jason Schmitt CEO

Kubernetes intro.

Security in a Container based World

Learn. Imagine. Build. .NET Conf

How to Keep Running When Things Go Wrong

Service Virtualization

Cloud Computing Architecture

Redefinition of Business Continuity Strategies using Cloud Native Enterprise Architectures Frank Stienhans, CTO August 2016.

Johan Lindberg, inRiver

Setting up PostgreSQL for Production in AWS

Containers on Azure Peter Lasne Sr. Software Development Engineer

9/16/2019 6:55 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.

Presentation transcript:

@mayralois Mushroom Cloud Effect Alois Mayr Technology Lead – Dynatrace

@mayralois Monitoring for Highly Dynamic Container Environments

@mayralois Source: …there’s been the mushroom cloud effect oh yeah, everything screwed up

@mayralois The Mushroom Cloud Effect or What Happens When Containers Fail?

@mayralois Biggest LatAm e-commerce Company ~ U$ 2.5 billion revenue 4 sites: Americanas, Shoptime, Submarino, Soubarato ~ 150 hosts across 4 regions 5k-15k containers 1k-3k services

@mayralois TL;DR

@mayralois About Cloud-Scale Systems

@mayralois Important Aspects… Lots of (micro-)services Lots of communication between services Service dependencies Versioning and API compatibilities Zero downtime

@mayralois Platform-related Aspects Most often container-based Clustered for scalability Ephemeral containers Resilient architecture Cross AZ fail-overs SDN for communication

@mayralois Deployments are no Longer Static 7:00 a.m. Low load, service running with minimum redundancy 12:00 p.m. Scaled up service during peak load with failover of problematic node 7:00 p.m. Scaled back down to lower load, move to different geolocation

@mayralois Anatomy of dynamic environments

@mayralois All About (Service) Dependencies

@mayralois Failing containers… …may or may not have an (immediate) impact to service performance

@mayralois Cascading Failures Lead to a Mushroom Cloud Effect

@mayralois

The Hungry Container Breakdown Shared /logs partition on host No log rotation, no archiving for app logs No proper log management used for Docker environment Shared /logs partition ran out of space What was the problem?

@mayralois The Hungry Container Breakdown Container health checks failed Orchestration killed container and rescheduled new one Still no free space on /logs Termination and rescheduling /var/lib/docker ran out of space Cluster nodes were no longer able to run any containers How the problem has evolved over time?

@mayralois The Hungry Container Breakdown Services at the top of the graph Increased failure rates Lots of depending Tomcat and DB services affected How the problem affected services?

@mayralois

The Hungry Container Breakdown Log management tools for app logs --log-driver=none|syslog Remove container --rm=true /var/lib/docker deserves its own partition How the problem could have been avoided?

@mayralois The Hungry Container Breakdown Buggy Containers May Kill Your Nodes

@mayralois Try to Break Your Clusters Early (And be Prepared for Black Friday)

@mayralois Break Your Clusters Early Massive load testing! Survive three days of pain Include everything Services, Containers, Orchestration, EC2 instances

@mayralois Testing everything 13.3k containers (+nodes) 3,451 services

@mayralois

Automation Needed to Pinpoint the Root Cause of Cascading Failures!

@mayralois Want to learn more? Stop by our booth! G15

@mayralois Thank you!