Published by Jayson Hart. Modified over 9 years ago.
1 Doctor Fault Management
18 May 2015
Ryota Mibu, NEC
2 Doctor Overview
One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)
Goal
– Build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure
– Make the framework valuable and acceptable for other industries
Status
– Initial requirement study, architecture design, gap analysis: Done (see document [link])
– Collaborative development: Started (blueprints are proposed to Nova and Ceilometer)
– Standardization sync: On-going (by NFV member efforts, joint meeting)
3 Use Case 1: Fault management
4 Use Case 2: Maintenance
5 High Level Architecture
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator]
6 Fault Management Sequence
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator; Detection; Reaction; Doctor Initial Focus]
7 Key Requirements as VIM
– Immediate notification to VIM user and administrator
– Fault notification of affected virtual resources (correlation)
– Notification configurable by VIM admin and user (pub/sub)
– Catch all faults in NFVI (pluggability for various technologies and future extensions)
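The pub/sub requirement above can be illustrated with a minimal sketch. The `Notifier` class, its methods, and the event fields are hypothetical names chosen for illustration; they are not part of Doctor or of any OpenStack API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical types and names, for illustration only.
Callback = Callable[[dict], None]


class Notifier:
    """Delivers fault events only to subscribers of the affected resource."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, resource_id: str, callback: Callback) -> None:
        # A user or admin registers interest in one resource.
        self._subs[resource_id].append(callback)

    def publish(self, event: dict) -> int:
        # Push the event immediately; return how many subscribers got it.
        callbacks = self._subs.get(event["resource_id"], [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


received: List[dict] = []
notifier = Notifier()
notifier.subscribe("vm-a1", received.append)

delivered = notifier.publish({"resource_id": "vm-a1", "state": "down"})
ignored = notifier.publish({"resource_id": "vm-b1", "state": "down"})
```

Here the subscriber is notified at publish time rather than on its next poll, which is the point of the "immediate notification" requirement.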
8 Key Requirements as VIM
– Immediate Notification
– Consistent Resource State Awareness
– Extensible Monitoring
– Fault Correlation
9 TO-BE: Functional Blocks
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); VIM; VIM User and Administrator; functional blocks: Monitor, Inspector, Controller, Notifier]
10 Fault Management Scenarios (1/2)
[Diagram labels: Monitor; Inspector; Controller; Notifier; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure; Alarm Conf.; Resource Map; Failure Policy]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
11 Fault Management Scenarios (2/2)
[Diagram labels: Monitor; Inspector; Controller; Notifier; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure; Alarm Conf.; Resource Map; Failure Policy]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
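Steps 1 to 4 of the scenario slides (raw failure, find affected, update state, notify) can be sketched as follows. This is an illustrative model only: the `Inspector` and `Controller` classes, the resource map contents, and the `"error"` state string are assumptions, not the Doctor implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Controller:
    """Holds resource state; stands in for the VIM controller (hypothetical)."""
    states: Dict[str, str] = field(default_factory=dict)

    def update_state(self, resource_id: str, state: str) -> None:
        self.states[resource_id] = state


@dataclass
class Inspector:
    """Correlates a physical failure with the virtual resources it affects."""
    resource_map: Dict[str, List[str]]  # physical host -> virtual resources
    controller: Controller
    notifications: List[dict] = field(default_factory=list)

    def on_raw_failure(self, host: str) -> List[str]:
        affected = self.resource_map.get(host, [])       # 2. Find Affected
        for vm in affected:
            self.controller.update_state(vm, "error")    # 3. Update State
            self.notifications.append(                   # 4. Notify
                {"resource": vm, "state": "error"}
            )
        return affected


controller = Controller()
inspector = Inspector({"host-a": ["vm-a1", "vm-a2", "vm-a3"]}, controller)
affected = inspector.on_raw_failure("host-a")            # 1. Raw Failure
```

Updating state before notifying keeps the controller's view consistent with what subscribers are told, which matches the "consistent resource state awareness" requirement.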
12 AS-IS: OpenStack Kilo (1/3)
How can you find faults as a tenant user?
– Keep-alive check on each VM
– Polling VM state through the Nova API
– Setting an alarm on the metering service (e.g. CPU runtime)
13 AS-IS: OpenStack Kilo (2/3)
How does the metering service work?
1. A resource controller such as Nova monitors resource usage [periodically]
2. Samples are fetched from the resource controller and registered in the DB [periodically]
3. Alarm definitions are evaluated against the samples [periodically]
4. An alarm is raised depending on the result of the evaluation
[Diagram labels: Machine; Hypervisor; VM; Nova; Ceilometer; (Heat); Samples; arrows numbered 1 to 4]
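Because every step above runs periodically, the time at which an alarm can fire is bounded by the evaluation interval. The sketch below simulates that effect without sleeping. The function name, the sample values, and the "no CPU time progress" rule are illustrative assumptions, not Ceilometer's actual evaluation logic.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, cumulative CPU time)


def first_alarm_time(samples: List[Sample], eval_interval: int) -> Optional[int]:
    """Return the first evaluation time at which the two most recent
    samples show no CPU time progress, or None if that never happens.

    `samples` must be sorted by timestamp.
    """
    if not samples:
        return None
    end = samples[-1][0]
    t = eval_interval
    while t <= end:
        seen = [s for s in samples if s[0] <= t]  # samples stored so far
        if len(seen) >= 2 and seen[-1][1] == seen[-2][1]:
            return t  # alarm raised at this evaluation
        t += eval_interval
    return None


# Made-up data: one sample every 10 s; the VM stops consuming CPU after t=20.
samples = [(0, 0.0), (10, 5.0), (20, 10.0), (30, 10.0),
           (40, 10.0), (50, 10.0), (60, 10.0)]

alarm_fast = first_alarm_time(samples, eval_interval=30)
alarm_slow = first_alarm_time(samples, eval_interval=60)
```

With the same samples, the alarm fires at the first evaluation after the stall, so a longer evaluation interval directly delays detection.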
14 AS-IS: OpenStack Kilo (3/3)
Notification
– OpenStack components post events to a messaging queue
– Ceilometer collects, transforms and publishes those events, which can be used for billing
[Diagram labels: NFVI; Nova; Neutron; Cinder; Queue; Ceilometer; Samples; (Billing)]
15 Implementation Plan in OpenStack
[Diagram labels: Ceilometer; Virtualized Infrastructure; Applications; Zabbix; VIM User and Administrator; Error Injection; Plugin ?; Event Alarm; Immediate Notification; Queue; Inspector; Nova]
16 Demo (1/3)
User Scenario
[Diagram labels: HTTP Clients; Public Net; Load Balancer; Private Net; Web Server; Launch New VM]
17 Demo (2/3)
Demo 1
1. Collect CPU time samples
2. Alarm Heat if CPU runtime = 0
3. Create New Web Server
Demo 2
1. Hook
2. Notify as Event
3. Alarm Heat
[Diagram labels, both demos: Machine; Hypervisor; VM; Nova; Ceilometer; Agent; Alarm; (Heat); Samples]
18 Demo (3/3)
Results
– Demo 1: 90 sec
– Demo 2: 26 sec
19 Doctor Southbound API
[Diagram labels: User; Admin; NFVI; Monitor; Inspector; Controller; Notifier; Conf.; Policy; Configuration; Fault Messaging; Unified Event API; Threshold; Enable]
20 Case 1: Obvious Fault
[Diagram labels: User; Admin; NFVI; functional blocks: Monitor, Inspector, Controller, Notifier; components: BMC, Zabbix, (Inspector), Nova, Ceilometer; Conf.; Policy; Configuration; Fault Messaging; Enable]
Messages, in slide order:
– SNMP Trap (Power-off)
– HTTP POST (Host A down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Alarm)
21 Case 2: Threshold Exceeded Fault (Admin Config)
[Diagram labels: User; Admin; NFVI; functional blocks: Monitor, Inspector, Controller, Notifier; components: vSwitch, collectd, Monitor Agent, Zabbix, (Inspector), Nova, Ceilometer; Conf.; Policy; Configuration; Fault Messaging; Threshold; Enable]
Messages, in slide order:
– HTTP POST (Switch down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Alarm)
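A threshold-exceeded fault of this kind can be sketched as a monitor-side check that emits an event only when the configured threshold is enabled and crossed. The function name, the `packet_loss_pct` metric, and the event fields are hypothetical; the sketch only builds a JSON body and does not call Zabbix, collectd, or any HTTP endpoint.

```python
import json
from typing import Optional


def check_threshold(resource: str, metric: str, value: float,
                    threshold: float, enabled: bool = True) -> Optional[dict]:
    """Return a fault event if monitoring is enabled and the value exceeds
    the configured threshold; otherwise return None."""
    if not enabled or value <= threshold:
        return None
    return {
        "type": "threshold_exceeded",  # hypothetical event type
        "resource": resource,
        "metric": metric,
        "value": value,
        "threshold": threshold,
    }


# Threshold of 5.0 stands in for a value set through admin configuration.
event = check_threshold("vswitch-1", "packet_loss_pct", 12.0, threshold=5.0)
below = check_threshold("vswitch-1", "packet_loss_pct", 1.0, threshold=5.0)
disabled = check_threshold("vswitch-1", "packet_loss_pct", 12.0,
                           threshold=5.0, enabled=False)

# Body that a monitor could send upstream, e.g. in an HTTP POST.
body = json.dumps(event, sort_keys=True)
```

Keeping the threshold and the enable flag as inputs mirrors the slide's split between configuration (Threshold, Enable) and fault messaging.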
22 Backup
23 Fault Management Sequence (Optional)
[Diagram labels: Applications (App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator; Detection; Reaction; Auto Reaction]
24 Fault Management Scenarios (Optional)
[Diagram labels: Monitor; Inspector; Controller; Notifier; User-side Manager; Admin-side Manager; Applications; Virtualized Infrastructure; Alarm Conf.; Resource Map; Failure Policy; Auto Reaction]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
4. (alt) Notify
5. Notify Error
6-. Action
25 Configuration / Policy Enforcement
– Option 1: Policy Service Integration
– Option 2: Using Metadata in Controller
[Diagram labels: User; Admin; NFVI; Monitor; Inspector; Notifier; Controller; Policy Service; Conf.; Policy; Metadata; Configuration; Fault Messaging; Threshold; Enable]
26 Case 3: Threshold Exceeded Fault (User Config)
[Diagram labels: User; Admin; NFVI; functional blocks: Monitor, Inspector, Controller, Notifier; components: vSwitch, collectd, Monitor Agent, Zabbix, (Inspector), Nova, Ceilometer, Policy Service, Congress; Conf.; Policy; Metadata; Configuration; Fault Messaging; Threshold; Enable]
Messages, in slide order:
– HTTP POST (Switch down)
– HTTP POST (Host A down, VM A1-A3 down)
– HTTP POST (VM A1 down)
– HTTP POST (Alert: VM A1 down)
– HTTP POST (Create Resource with Policy Label)
– HTTP POST (Set Policy)
– HTTP POST (Data)