Failure Inspection in Doctor utilizing Vitrage and Congress

Slides:



Advertisements
Similar presentations
LACP Project Proposal.
Advertisements

High Availability Project Qiao Fu Project Progress Project details: – Weekly meeting: – Mailing list – Participants: Hui Deng
Doctor Implementation Plan (Discussion) Feb. 6, 2015 Ryota Mibu, Tomi Juvonen, Gerald Kunzmann, Carlos Goncalves.
1 Security on OpenStack 11/7/2013 Brian Chong – Global Technology Strategist.
Virtualized Infrastructure Deployment Policies (Copper) 19 February 2015 Bryan Sullivan, AT&T.
SDN in Openstack - A real-life implementation Leo Wong.
Zhipeng (Howard) Huang
Keith Wiles DPACC vNF Overview and Proposed methods Keith Wiles – v0.5.
FI-WARE – Future Internet Core Platform FI-WARE Cloud Hosting July 2011 High-level description.
24 February 2015 Ryota Mibu, NEC
(OpenStack Ceilometer)
1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC.
**DRAFT** Doctor Southbound API 14 April 2015 Ryota Mibu, NEC.
Understanding Active Directory
1 Doctor Fault Management - Updates - 30 July 2015 Ryota Mibu, NEC.
Presented by: Sanketh Beerabbi University of Central Florida COP Cloud Computing.
TOSCA Monitoring Straw-man for Initial Minimal Monitoring Use Case Roger Dev CA Technologies Revision 3 May 21, 2015.
**DRAFT** Blueprints Alignment (OpenStack Ceilometer) 4 March 2015 Ryota Mibu, NEC.
Fault Localization (Pinpoint) Project Proposal for OPNFV
EXPOSING OVS STATISTICS FOR Q UANTUM USERS Tomer Shani Advanced Topics in Storage Systems Spring 2013.
Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC
Extending OVN Forwarding Pipeline Topology-based Service Injection
Ceilometer + Gnocchi + Aodh Architecture
DPACC Management Aspects
Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC
1 OPNFV Summit 2015 Doctor Fault Management Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC.
TOSCA Monitoring Straw-man for Initial Minimal Monitoring Use Case Roger Dev CA Technologies Revision 2 May 1, 2015.
© 2015 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property. 1 VF (Virtual Functions) Event.
Ashiq Khan NTT DOCOMO Congress in NFV-based Mobile Cellular Network Fault Recovery Ryota Mibu NEC Masahito Muroi NTT Tomi Juvonen Nokia 28 April 2016OpenStack.
Ashiq Khan NTT DOCOMO Congress in NFV-based Mobile Cellular Network Fault Recovery Ryota Mibu NEC Masahito Muroi NTT Tomi Juvonen Nokia 28 April 2016OpenStack.
Secure Access and Mobility Jason Kunst, Technical Marketing Engineer March 2016 Location Based Services with Mobility Services Engine ISE Location Services.
Co-ordination & Harmonisation of Advanced e-Infrastructures for Research and Education Data Sharing Grant.
What is OPNFV? Frank Brockners, Cisco. June 20–23, 2016 | Berlin, Germany.
June 20–23, 2016 | Berlin, Germany. Copper: Configuration Policy Management in OPNFV Colorado Bryan Sullivan, AT&T.
Congress Blueprint --policy abstraction
**DRAFT** Doctor+Congress OPNFV Summit June 2016 Doctor+Congress PoC team.
NFV Infrastructure Maintenance Automation by OPNFV Doctor
Keeping My (Telco) Cloud Afloat
Master Service Orchestrator (MSO)
ONAP and MEF LSO External API Framework Functional Reference Architecture 12 July 2017 Andy Mayer, Ph.D. © 2016 AT&T Intellectual Property. All rights.
Doron Orbach UCMDB Product Manager
Fault Management with OpenStack Congress and Vitrage, Based on OPNFV Doctor Framework Barcelona 2016 Ryota Mibu NEC Ohad Shamir Nokia Masahito Muroi.
X V Consumer C1 Consumer C2 Consumer C3
Doctor + OPenStack Congress
Ashiq Khan, NTT DOCOMO Ryota Mibu, NEC
Principles of Computer Security
OPNFV Doctor - How OPNFV project works -
Doctor PoC Booth Vitrage Demo
Multi-VIM/Cloud High Level Architecture
Tomi Juvonen SW Architect, Nokia
Tomi Juvonen Software Architect, Nokia
OpenStack Ani Bicaku 18/04/ © (SG)² Konsortium.
Proactive RCA with Vitrage, Kubernetes, Zabbix and Prometheus
Northbound API Dan Shmidt | January 2017
Vitrage Project Update, OpenStack Summit Vancouver
DPACC Management Aspects
Chapter 2 Database Environment Pearson Education © 2009.
Searchlight Lei Zhang Search service for OpenStack
Vitrage hands-on lab Muhamad Najjar, Eyal Bar-Ilan CloudBand, Nokia
On the Way to Cloud Native:
Technical Capabilities
Vitrage hands-on lab Muhamad Najjar, Marina Koushnir CloudBand, Nokia
OpenStack Ceilometer Blueprints for Liberty
Vitrage Project Update, OpenStack Summit Berlin
Chapter 2 Database Environment Pearson Education © 2009.
Doctor OpenStack Controller changes Tomi Juvonen Nokia
**DRAFT** NOVA Blueprint 03/10/2015
**DRAFT** Doctor Southbound API 23 Feb 2016 Ryota Mibu, NEC.
Title: Robust ONAP Platform Controller for LCM in a Distributed Edge Environment (In Progress) Source: ONAP Architecture Task Force on Edge Automation.
Presentation transcript:

Failure Inspection in Doctor utilizing Vitrage and Congress Ryota Mibu, NEC Ohad Shamir, Nokia Masahito Muroi, NTT

Agenda Failure Inspection in OPNFV Doctor - Ryota OpenStack Vitrage - Ohad OpenStack Congress - Masa

Failure Inspection in OPNFV Doctor

Virtualized Platform VM VM VM Net Net PM PM H/W Switch H/W Switch Volume Port Port Port Volume Net Net PM PM H/W Switch H/W Switch

Virtualized Platform X X VM VM VM Net Net PM PM H/W Switch H/W Switch Volume X Port Port Port Volume Net Net PM PM H/W Switch H/W Switch

What is “failure”? Depends on So, “failure” has to be configurable Applications (VNFs) Back-end technologies used in the deployment Redundancy of the equipment/components Operator Policy Regulation Topologies of Network / Power-supply So, “failure” has to be configurable

Fault Management Architecture Designed by Doctor Project Applications Manager 0. Set Alarm 6-. Action 5. Notify Error Virtualized Infrastructure 4. Notify all Controller Controller Notifier Nova Controller Resource Map Alarm Conf. Neutron Ceilometer+Aodh Cinder 3. Update State 2. Find Affected 4. (alt) Notify Monitor Monitor Inspector Monitor Failure Policy Congress 1. Raw Fault Vitrage

Inspector Module Options OpenStack Vitrage Various data source (OpenStack, Nagios, etc.) Ability to store and refer physical topologies for correlation Holistic & complete view of the system OpenStack Congress Dynamic data collection from OpenStack services Flexible policy definition for correlation (Datalog) Well integrated with other OpenStack projects

Stained glass art produced through the combination of brilliantly colored glass in varying degrees of transparency, creating a dynamic art form which is transformed with every variation in light Stained glass window Taken alone they don’t make sense Together they create a pattern

Vitrage in a nutshell Official OpenStack project for Root Cause Analysis Vitrage Functions Root Cause Analysis – understand what causes faults to occur Deduced alarms and states – raising alarms and modifying states based on system insights Holistic & complete view of the system Architecture Highlights Resource topology graph - reflect how different entities relate to one another Multiple data sources Configurable business logic Clear visualization of Vitrage insights

Vitrage High Level Architecture Horizon Plug-in: Hierarchical view Vitrage alarm list RCA diagram per alarm Entity graph view Expose Vitrage alarms and state changes to other projects or external systems Multiple Data Sources (extendible): External monitoring tools: Nagios, Zabbix OpenStack projects Physical topology Templates for deduced alarms and RCA: Each template can contain one or more scenarios (scenario = condition + action) Human readable Configurable

How does it work? Entity Graph + Evaluator Vitrage holds the system state (resources & alarms) as a property graph Entities are represented as graph vertices, and relationships are the edges between the vertices Each vertex and edge can also have additional properties Each edge has a special “label” indicating the type of relationship Intuitive modeling of relationship/interaction data Vitrage Evaluator listens to change events in the entity graph and upon event: Retrieve templates (scenarios) relevant to event Evaluate condition against the state of the Entity Graph Execute actions for each matched condition Switch Name: sw1 id: XXX Host Name: comp 0-0 attached Instance Status: Error contains Alarm on

Vitrage User Interface (Horizon plug-in)

Vitrage Example Host NIC failure (no HA) Monitor host NIC (public/ tenant network) by Nagios/ Zabbix, raise an alarm when failed Vitrage will receive alarm from monitoring tool, add it to the entity graph, connected to the Host NIC vertex Find matching scenario (template) and perform the following actions: Raise deduce alarm on host add it to the vitrage entity graph as well Change host state in Vitrage may trigger also calling Nova API to modify state Add causal relationship between alarms Once the deduced alarm on host is added, a similar flow will occur on the hosted instances and on relevant VNF Zone 1 Alarm 1 causes on attached Host NIC Host 1 Host 2 on contains Alarm 2 causes on VM 1 Alarm 3 causes on Alarm 4 VNF

Vitrage as Doctor Inspector Push and pull interfaces to various monitoring tools (e.g. Nagios, Zabbix) and to OpenStack projects -> fast failure notification Mapping between physical and logical failures Expose more faults and changes to resources (deduced alarms and states) Provide Root Cause Analysis indicators to the application manager Can be configured differently for different systems

OpenStack Congress

What is Congress? Governance as a Service Policy Define and enforce policy for Cloud Services Policy No single definition Law/Regulations Business Rules Security Requirements Application Requirements Any Service, any Policy

Congress Architecture Policy Data Policy Policy Enforcement Congress API Policy Engine Nova DataSourceDriver Neutron DataSourceDriver Keystone DataSourceDriver Security System DataSourceDriver Nova Neutron Keystone Security System

Requirements and gaps for Congress as Inspector Congress Features Gaps Fast Failure Notification Periodical polling and policy enforcement Real-time policy enforcement Mapping of a physical failure to a logical failure Write a rule for mapping None Adaptability Change Policy rules

Congress PushType DataSource Driver Policy Policy Another Service Policy Enforcement New data flow Congress Enables services outside Congress to push data, and improves reaction time for policy enforcement API PushType DataSourceDriver Policy Engine Nova DataSourceDriver Neutron DataSourceDriver Keystone DataSourceDriver Security System DataSourceDriver Nova Neutron Keystone Security System

Congress Doctor Driver Data Policy 1. Policy Monitor Policy Enforcement New data flow Congress API 2. Doctor DataSourceDriver 4. Policy Engine 3. Nova DataSourceDriver Monitor notifies hardware failure event to Congress Doctor Driver receives failure event, insert it to event list of Doctor Data Policy Engine receives the failure event, then evaluate registered policy and enforce state correction Policy Engine instruct Nova Driver to perform host service force down and reset state of VM(s) Neutron DataSourceDriver Keystone DataSourceDriver Security System DataSourceDriver Nova Neutron Keystone Security System

Congress Doctor Driver (Detail) Driver Schema(HW failure example) +--------+-----------------------------------------------------+ | table | columns | +--------+-----------------------------------------------------+ | events | {'name': 'id', 'description': 'None'}, | | | {'name': 'time', 'description': 'None'}, | | | {'name': 'type', 'description': 'None'}, | | | {'name': 'hostname', 'description': 'None'}, | | | {'name': 'status', 'description': 'None'}, | | | {'name': 'monitor', 'description': 'None'}, | | | {'name': 'monitor_event_id', 'description': 'None'} | +--------+-----------------------------------------------------+ Event List of Doctor Data +----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+ | id | time | type | hostname | status | monitor | monitor_event_id | +----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+ | 0123-4567-89ab | 2016-03-09T07:39:27.230277464 | host.nic1.down | demo-compute0 | down | demo_monitor | 111 | +----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+

Doctor Wiki: https://wiki.opnfv.org/display/doctor Visit Doctor Booth!