1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC.

Presentation transcript:

1 Doctor Fault Management 18 May 2015 Ryota Mibu, NEC

2 Doctor Overview
One of the OPNFV Requirement Projects (Identify Requirements, Gap Analysis, Implementation Study)
Goal
– Build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure
– Valuable and acceptable framework for other industries
Status
– Initial requirement study, architecture design, gap analysis: Done (See Document [link])
– Collaborative development: Started (Blueprints are proposed to Nova and Ceilometer)
– Standardization sync: Ongoing (by NFV member efforts, joint meeting)

3 Use Case 1: Fault management

4 Use Case 2: Maintenance

5 High Level Architecture. Diagram labels: Virtualized Infrastructure; Applications; VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtual Compute; Virtual Storage; Virtual Network; Virtualization Layer; Hardware Resources; App.

6 Fault Management Sequence. Diagram labels: Virtualized Infrastructure; Applications; VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtual Compute; Virtual Storage; Virtual Network; Virtualization Layer; Hardware Resources; App; Detection; Reaction; Doctor Initial Focus.

7 Key Requirements as VIM
– Immediate notification to VIM user and administrator
– Fault notification of affected virtual resources (Correlation)
– Configurable notification by VIM admin and user (Pub/Sub)
– Catch all faults in NFVI (pluggability for various technologies and future extensions)
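The "configurable notification (Pub/Sub)" requirement can be illustrated with a minimal sketch. The slides do not define an API, so everything here is an assumption made for illustration: the `Notifier` class, its `subscribe` and `publish` methods, and the event fields are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical types and names; the slides do not specify an interface.
Handler = Callable[[dict], None]


class Notifier:
    """Delivers fault events only to handlers subscribed to the resource."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, resource_id: str, handler: Handler) -> None:
        # A user or admin registers interest in one virtual resource.
        self._subs[resource_id].append(handler)

    def publish(self, event: dict) -> int:
        # Push the event to every subscriber; return how many were notified.
        handlers = self._subs.get(event["resource_id"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


received = []
notifier = Notifier()
notifier.subscribe("vm-a1", received.append)
delivered = notifier.publish({"resource_id": "vm-a1", "state": "down"})
ignored = notifier.publish({"resource_id": "vm-b7", "state": "down"})
```

In this sketch the event for `vm-b7` reaches nobody because no one subscribed to it, which is the sense in which notification is configurable per user.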

8 Key Requirements as VIM: Immediate Notification; Consistent Resource State Awareness; Extensible Monitoring; Fault Correlation.

9 TO-BE: Functional Blocks. Diagram labels: Virtualized Infrastructure; Applications; VIM User and Administrator; VIM; Virtual Compute; Virtual Storage; Virtual Network; Virtualization Layer; Hardware Resources; App. Functional blocks: Notifier; Monitor; Controller; Inspector.

10 Fault Management Scenarios (1/2). Diagram labels: Monitor; Notifier; User-side Manager; Virtualized Infrastructure; Alarm Conf.; Applications; Controller; Resource Map; Inspector; Admin-side Manager; Failure Policy. Steps: 0. Set Alarm; 1. Raw Failure; 2. Find Affected; 3. Update State; 4. Notify all; 4. (alt) Notify; 5. Notify Error; 6-. Action.

11 Fault Management Scenarios (2/2). Diagram labels: Monitor; Notifier; User-side Manager; Virtualized Infrastructure; Alarm Conf.; Applications; Controller; Resource Map; Inspector; Admin-side Manager; Failure Policy. Steps: 0. Set Alarm; 1. Raw Failure; 2. Find Affected; 3. Update State; 4. Notify all; 4. (alt) Notify; 5. Notify Error; 6-. Action.
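Steps 1 to 4 of the scenario (raw failure, find affected, update state, notify) can be sketched as follows. This is only an illustration under assumptions: the resource map contents, the state strings, and the function names `find_affected` and `handle_raw_failure` are hypothetical and do not come from the slides.

```python
# Hypothetical resource map: physical host -> virtual machines running on it.
RESOURCE_MAP = {
    "host-a": ["vm-a1", "vm-a2", "vm-a3"],
    "host-b": ["vm-b1"],
}


def find_affected(resource_map, failed_host):
    """Step 2: look up the virtual resources that depend on the failed host."""
    return list(resource_map.get(failed_host, []))


def handle_raw_failure(resource_map, states, failed_host, notify):
    """Steps 1-4: take a raw failure, correlate, update state, notify."""
    affected = find_affected(resource_map, failed_host)
    for vm in affected:
        states[vm] = "error"      # Step 3: update the recorded state
        notify(vm, states[vm])    # Step 4: notify about each affected VM
    return affected


states = {vm: "active" for vms in RESOURCE_MAP.values() for vm in vms}
sent = []
affected = handle_raw_failure(
    RESOURCE_MAP, states, "host-a", lambda vm, state: sent.append((vm, state))
)
```

The state is updated before the notification is sent, so a consumer that queries the state after receiving a notification sees a consistent view.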

12 AS-IS: OpenStack Kilo (1/3)
How can you find faults as a tenant user?
– Keep-alive check on each VM
– Poll VM state via the Nova API
– Set an alarm on the metering service (e.g. CPU runtime)

13 AS-IS: OpenStack Kilo (2/3)
How does the metering service work?
1. A resource controller such as Nova monitors resource usage [Periodically]
2. Samples are fetched from the resource controller and registered in the DB [Periodically]
3. Alarm definitions are evaluated against the samples [Periodically]
4. An alarm is raised depending on the result of the evaluation
Diagram labels: Machine; Hypervisor; VM; Nova; Ceilometer; (Heat); Samples.
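Because the first three steps above each run periodically, their waiting times add up before an alarm can fire. The sketch below estimates that delay under simplifying assumptions: processing time is ignored, the stages run in sequence, and their phases are independent and uniformly distributed. The interval values are hypothetical and are not taken from the slides.

```python
def worst_case_latency(intervals):
    """Each periodic stage may have just run, so a fault waits a full period per stage."""
    return sum(intervals)


def expected_latency(intervals):
    """With uniformly random phase, the average wait per stage is half its period."""
    return sum(interval / 2 for interval in intervals)


# Hypothetical periods in seconds for: usage monitoring, sample collection,
# alarm evaluation. These numbers are assumptions for illustration only.
intervals = [10, 60, 60]

worst = worst_case_latency(intervals)
expected = expected_latency(intervals)
print(f"worst case: {worst} s, expected: {expected} s")
```

Under these assumptions, shortening any single interval helps only partly; an event-driven path removes the periodic waits altogether.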

14 AS-IS: OpenStack Kilo (3/3)
Notification
– OpenStack components post events to the messaging queue
– Ceilometer collects, transforms, and publishes those events, which can be used for billing
Diagram labels: NFVI; Neutron; Ceilometer; (Billing); Samples; Nova; Cinder; Queue.

15 Implementation Plan in OpenStack. Diagram labels: Ceilometer; Virtualized Infrastructure; Applications; Zabbix; VIM User and Administrator; Error Injection Plugin ?; Event Alarm; Immediate Notification; Queue; Inspector; Nova.

16 Demo (1/3) User Scenario. Diagram labels: Web Server; Load Balancer; HTTP Clients; Public Net; Private Net; Launch New VM.

17 Demo (2/3)
Demo 1: 1. Collect CPU time samples; 2. Alarm Heat if CPU runtime = 0; 3. Create New Web Server. Diagram labels: Machine; Hypervisor; VM; Nova; Ceilometer; (Heat); Samples.
Demo 2: 1. Hook; 2. Notify as Event; 3. Alarm Heat. Diagram labels: Machine; Hypervisor; VM; Nova; Ceilometer; (Heat); Agent; Alarm.

18 Demo (3/3) Results: Demo 1: 90 sec; Demo 2: 26 sec.
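The two reported figures can be compared with a short calculation. This assumes the values map in order, Demo 1 to 90 sec and Demo 2 to 26 sec; the slide does not state exactly what interval was measured, and the variable names are illustrative.

```python
# Figures as reported on the results slide (assumed mapping: Demo 1, Demo 2).
demo1_sec = 90
demo2_sec = 26

saved_sec = demo1_sec - demo2_sec
reduction_pct = round(100 * saved_sec / demo1_sec, 1)
speedup = round(demo1_sec / demo2_sec, 2)

print(f"saved {saved_sec} s, {reduction_pct}% shorter, {speedup}x faster")
```

On that reading, Demo 2 is 64 seconds shorter than Demo 1, a reduction of about 71%.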

19 Doctor Southbound API. Diagram labels: User; NFVI; Conf.; Policy; Controller; Inspector; Notifier; Admin; Monitor; Configuration; Fault Messaging; Unified Event API; Threshold; Enable.

20 Case 1: Obvious Fault. Diagram labels: User; NFVI; Conf.; Policy; Controller; Inspector; Notifier; Admin; Monitor; Zabbix; BMC; (Inspector); Nova; Ceilometer; Configuration; Fault Messaging; Enable. Messages: SNMP Trap (Power-off); HTTP POST (Host A down); HTTP POST (Host A down, VM A1-A3 down); HTTP POST (VM A1 down); HTTP POST (Alert: VM A1 down); HTTP POST (Create Alarm).
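The message chain in this case, from "Host A down" to "Alert: VM A1 down", can be sketched as a sequence of payload transformations. The slide names the messages but not their format, so the JSON field names and the functions `expand_host_fault`, `per_vm_events`, and `alerts_for` are hypothetical. No HTTP calls are made here; only the payloads are built.

```python
import json


def expand_host_fault(host, vms):
    """Turn 'Host A down' into 'Host A down, VM A1-A3 down' (correlation)."""
    return {
        "host": host,
        "state": "down",
        "vms": [{"id": vm, "state": "down"} for vm in vms],
    }


def per_vm_events(host_event):
    """Split the correlated fault into one event per VM ('VM A1 down')."""
    return [
        {"type": "vm.down", "vm": vm["id"], "host": host_event["host"]}
        for vm in host_event["vms"]
    ]


def alerts_for(events, alarmed_vms):
    """Build alert bodies only for VMs that have an alarm created for them."""
    return [
        json.dumps({"alert": f"{event['vm']} down"}, sort_keys=True)
        for event in events
        if event["vm"] in alarmed_vms
    ]


host_event = expand_host_fault("host-a", ["vm-a1", "vm-a2", "vm-a3"])
events = per_vm_events(host_event)
alerts = alerts_for(events, alarmed_vms={"vm-a1"})
```

Only `vm-a1` produces an alert in this sketch because it is the only VM with an alarm, mirroring the "Create Alarm" message sent beforehand.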

21 Case 2: Threshold Exceeded Fault (Admin Config). Diagram labels: User; NFVI; Conf.; Policy; Controller; Inspector; Notifier; Admin; Monitor; Zabbix; Monitor Agent; (Inspector); Nova; Ceilometer; Configuration; Fault Messaging; vSwitch; collectd; Threshold; Enable. Messages: HTTP POST (Switch down); HTTP POST (Host A down, VM A1-A3 down); HTTP POST (VM A1 down); HTTP POST (Alert: VM A1 down); HTTP POST (Create Alarm).
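The "Threshold" and "Enable" settings in this case suggest a monitor that reports a fault only when it is enabled and a metric crosses an admin-defined limit. The sketch below shows that idea; the `MonitorConfig` class, the `check` function, the metric, and the payload fields are all hypothetical, since the slide gives no concrete configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorConfig:
    """Hypothetical admin-side configuration for one monitored metric."""
    enabled: bool
    threshold: float


def check(config: MonitorConfig, metric_value: float) -> Optional[dict]:
    """Return a fault payload if monitoring is enabled and the limit is exceeded."""
    if not config.enabled:
        return None
    if metric_value > config.threshold:
        return {
            "fault": "threshold_exceeded",
            "value": metric_value,
            "threshold": config.threshold,
        }
    return None


# Hypothetical example: packet drop rate on a vSwitch, limit set by the admin.
config = MonitorConfig(enabled=True, threshold=0.05)
fault = check(config, 0.20)
no_fault = check(config, 0.01)
disabled = check(MonitorConfig(enabled=False, threshold=0.05), 0.20)
```

A returned payload would be what the monitor forwards upstream, for example as the body of the "Switch down" message.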

22 Backup

23 Fault Management Sequence (Optional). Diagram labels: Virtualized Infrastructure; Applications; VIM User and Administrator; Virtualized Infrastructure Manager (VIM) = OpenStack; Virtual Compute; Virtual Storage; Virtual Network; Virtualization Layer; Hardware Resources; App; Auto Reaction; Detection; Reaction.

24 Fault Management Scenarios (Optional). Diagram labels: Monitor; Notifier; User-side Manager; Virtualized Infrastructure; Alarm Conf.; Applications; Controller; Resource Map; Inspector; Admin-side Manager; Failure Policy; Auto Reaction. Steps: 0. Set Alarm; 1. Raw Failure; 2. Find Affected; 3. Update State; 4. Notify all; 4. (alt) Notify; 5. Notify Error; 6-. Action.

25 Configuration / Policy Enforcement. Option 1: Policy Service Integration. Option 2: Using Metadata in Controller. Diagram labels: User; NFVI; Conf.; Policy; Inspector; Notifier; Admin; Policy Service; Monitor; Configuration; Fault Messaging; Metadata; Threshold; Enable; Controller.

26 Case 3: Threshold Exceeded Fault (User Config). Diagram labels: User; NFVI; Conf.; Policy; Controller; Inspector; Notifier; Admin; Monitor; Zabbix; Monitor Agent; (Inspector); Nova; Ceilometer; Configuration; Fault Messaging; vSwitch; collectd; Policy Service; Congress; Metadata; Threshold; Enable. Messages: HTTP POST (Switch down); HTTP POST (Host A down, VM A1-A3 down); HTTP POST (VM A1 down); HTTP POST (Alert: VM A1 down); HTTP POST (Create Resource with Policy Label); HTTP POST (Set Policy); HTTP POST (Data).