Proactive RCA with Vitrage, Kubernetes, Zabbix and Prometheus

Slides:



Advertisements
Similar presentations
If you knew what I know or CloudWave - Improving services in the Cloud through collaborative adaptation Eliot Salant IBM Haifa Research.
Advertisements

Doctor Implementation Plan (Discussion) Feb. 6, 2015 Ryota Mibu, Tomi Juvonen, Gerald Kunzmann, Carlos Goncalves.
Cisco Confidential 1 © 2010 Cisco and/or its affiliates. All rights reserved. Next Generation Monitoring in Cisco Security Cloud Leon De Jager and Nitin.
Keeping our websites running - troubleshooting with Appdynamics Benoit Villaumie Lead Architect Guillaume Postaire Infrastructure Manager.
Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting Roy, N., A. Dubey, and A. Gokhale 4th IEEE International Conference.
24 February 2015 Ryota Mibu, NEC
Maintaining and Updating Windows Server 2008
1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC.
Virtualization for Cloud Computing
H-1 Network Management Network management is the process of controlling a complex data network to maximize its efficiency and productivity The overall.
VAP What is a Virtual Application ? A virtual application is an application that has been optimized to run on virtual infrastructure. The application software.
An Introduction to AlarmInsight
Windows Azure Conference 2014 Running Docker on Windows Azure.
Get More out of SQL Server 2012 in the Microsoft Private Cloud environment Guy BowermanMadhan Arumugam DBI208.
Alert Logic Security and Compliance Solutions for vCloud Air High-level Overview.
Alert Logic Security and Compliance Solutions for vCloud Air High-level Overview.
Using Virtual Servers for the CERN Windows infrastructure Emmanuel Ormancey, Alberto Pace CERN, Information Technology Department.
Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC
1 ALCATEL-LUCENT — PROPRIETARY AND CONFIDENTIAL COPYRIGHT © 2015 ALCATEL-LUCENT. ALL RIGHTS RESERVED. NFV transforms the way service providers architect.
Cloud Computing Lecture 5-6 Muhammad Ahmad Jan.
Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC
Narasimha Reddy Gopu Jisha J. Agenda Introduction to AlwaysOn * AlwaysOn Availability Groups (AG) & Listener * AlwaysOn Failover * AlwaysOn Active Secondaries.
© 2012 Eucalyptus Systems, Inc. Cloud Computing Introduction Eucalyptus Education Services 2.
Ashiq Khan NTT DOCOMO Congress in NFV-based Mobile Cellular Network Fault Recovery Ryota Mibu NEC Masahito Muroi NTT Tomi Juvonen Nokia 28 April 2016OpenStack.
Ashiq Khan NTT DOCOMO Congress in NFV-based Mobile Cellular Network Fault Recovery Ryota Mibu NEC Masahito Muroi NTT Tomi Juvonen Nokia 28 April 2016OpenStack.
2016 Global Seminar 按一下以編輯母片標題樣式 Virtualization apps simplify your IoT development Alfred Li.
Failure Inspection in Doctor utilizing Vitrage and Congress
I/Watch™ Weekly Sales Conference Call Presentation (See next slide for dial-in details) Andrew May Technical Product Manager Dax French Product Specialist.
SQL Database Management
11/19/2017 9:41 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Fault Management with OpenStack Congress and Vitrage, Based on OPNFV Doctor Framework Barcelona 2016 Ryota Mibu NEC Ohad Shamir Nokia Masahito Muroi.
Avenues International Inc.
Dockerize OpenEdge Srinivasa Rao Nalla.
Doctor + OPenStack Congress
OptiView™ XG Network Analysis Tablet
Ashiq Khan, NTT DOCOMO Ryota Mibu, NEC
Gain visibility into your apps with Azure Application Insights
Docker Birthday #3.
Cisco Data Virtualization
6/12/ :48 PM BRK3223 Analyze and Debug apps across your DevOps workflow with Azure Application Insights Evgeny Ternovsky Senior Program Manager Rahul.
OPNFV Doctor - How OPNFV project works -
Doctor PoC Booth Vitrage Demo
Outline Introduction Characteristics of intrusion detection systems
Tomi Juvonen SW Architect, Nokia
Microsoft Build /20/2018 5:17 AM © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY,
Kubernetes Container Orchestration
Vitrage Project Update, OpenStack Summit Vancouver
Securing Cloud-Native Applications Jason Schmitt CEO
Healthcare Cloud Security Stack for Microsoft Azure
Intro to Docker Containers and Orchestration in the Cloud
Virtualization Layer Virtual Hardware Virtual Networking
Is your deployment in pants-down mode?
Vitrage hands-on lab Muhamad Najjar, Eyal Bar-Ilan CloudBand, Nokia
Chapter 2: System Structures
File Transfer Issues with TCP Acceleration with FileCatalyst
If you knew what I know or CloudWave - Improving services in the Cloud through collaborative adaptation Eliot Salant IBM Haifa Research.
IBM Containers Docker in the Cloud
On the Way to Cloud Native:
Technical Capabilities
AIMS Equipment & Automation monitoring solution
Beyond FTP & hard drives: Accelerating LAN file transfers
Vitrage hands-on lab Muhamad Najjar, Marina Koushnir CloudBand, Nokia
Windows Azure Hybrid Architectures and Patterns
OpenStack Summit Berlin – November 14, 2018
Vitrage Project Update, OpenStack Summit Berlin
What’s Happening with my App, Application Insights?
Title: Robust ONAP Platform Controller for LCM in a Distributed Edge Environment (In Progress) Source: ONAP Architecture Task Force on Edge Automation.
SQL Server on Containers
Presentation transcript:

Proactive RCA with Vitrage, Kubernetes, Zabbix and Prometheus Anna Reznikov (Nokia CloudBand) Dr. Liat Pele (Nokia CloudBand) Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

We’re going to talk about… Reactive vs. Proactive monitoring & RCA Why Vitrage? Newly added data sources Demo Other ways of using Vitrage for Proactive RCA Future plans Diagnostic actions Bell Labs change detection algorithm Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Reactive vs. Proactive Monitoring and RCA Application execution is stopped Errors are not prevented  Warnings are not correlated Proactive: The end user is not affected Application execution is not interrupted Errors are prevented (manually or automatically) Warnings are correlated Diagnostics and preventive actions are triggered Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Introduction to Vitrage – Root Cause Analysis Service Project background​ Started 2.5 years ago at Nokia, during the Mitaka cycle​ Became an official project 6 months later​ First official version – Newton​ ~10 active contributors in the Queens release​ Vitrage: Advanced Use Cases, OpenStack Summit Boston May 2017

https://docs.openstack.org/vitrage/latest/ Introduction to Vitrage Vitrage is the OpenStack Root Cause Analysis project: Holistic & complete view of the system structure Organize OpenStack alarms & events Deduce alarms and states Root Cause Analysis Passing information through Vitrage notifiers https://docs.openstack.org/vitrage/latest/ Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

What does Vitrage Include? Entity Graph Represents the relationships between the different entities Topology Graph Represents system health, allowing focusing on failing resources Visualized RCA Root cause analysis between alarms in the graph Cluste r Zone Host V M Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Using Vitrage for Proactive Root Cause Analysis Decisions based on information from several data sources Monitors on both physical , virtual and applications layers Decisions based on RCA Changing states Deducing alarms Cause actions Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Using Vitrage for Proactive Root Cause Analysis Automatic corrective actions using Mistral Auto evacuation Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Using Vitrage for Proactive Root Cause Analysis Use monitoring systems’ predictive capabilities Zabbix and Prometheus Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Using Vitrage for Proactive Root Cause Analysis Execute diagnostic actions Expensive tests – example: memory scan On demand monitoring More details later on Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Newly added data sources to Vitrage Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Tech stack of cloud-native VNFs Docker and Kubernetes "Docker packages applications and their dependencies together into an isolated container making them portable to any infrastructure. Eliminate the “works on my machine” problem once and for all." source: docker.com "Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications."

Deployment methods for container based VNFs Hybrid environment VNF VNF C C VM Kubernetes Docker Kubernetes OpenStack Bare-metal Docker HW Advantages of containers over VMs: Full isolation – better security Advantages of containers over bare metal: - Native performance - Light weight - Access to hardware functionality - Full portability

https://prometheus.io/docs/introduction/overview/ What is Prometheus? Efficient time series DB Flexible query language Alerting Many exports and integrations 63% of Kubernetes clusters https://prometheus.io/docs/introduction/overview/ Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Demo Zabbix predictive alarm “high CPU” on host Cause CPU stress Vitrage deduces critical alarm on host Causes performance degradation in k8s pod Prometheus alarm “performance degradation” on VM Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Demo Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

More proactive possibilities in Vitrage Instead of deducing alarms, execute actions using Mistral Use Prometheus predictive functions Pluggable data sources Combine data from several data sources across multiple architecture layers Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Bell Labs Change Detection System Future plans: Diagnostic actions Bell Labs Change Detection System Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Diagnostic actions Alarm is raised on resource Vitrage suspects a root cause Trigger health check – Diagnostic actions Get response and present it Diagnostic action can be something that we not usually monitor, Some expensive test like memory scan. Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Bell Labs Change Detection System Background: OpenStack systems has many components and each one has a log. While errors are reflected in the logs, there are too many logs which are difficult to read. Challenge: Find the root cause of a problem and proactively notify problems, based on logs changed behavior. gil.einziger@nokia.com Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

‘Typical’ stream behavior. Current stream behavior. Change Detection System: Under the hood. Compare a small (“lead”) window to a larger (“lag”) window in terms of average & stdev. Detection threshold is deduced dynamically and autonomously  from the lag window. Space efficiency through approximate windows. Lag window: Lead window: ‘Typical’ stream behavior. Current stream behavior. Flactuations Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018

Example: keystone logs Vitrage: Proactive Root Cause Analysis , OpenStack Summit Vancouver May 2018