Congress in NFV-based Mobile Cellular Network Fault Recovery
Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia)
OpenStack Summit Austin, 28 April 2016
Agenda
- NFV Requirement [Ashiq] 8 min
- Implementation in OpenStack
  - High-level architecture for fault management [Ryota] 8 min
  - Resource state awareness [Tomi] 8 min
  - Congress-based Inspection [Masahito] 10 min
  - Demo [Ryota] 6 min
NFV Requirement
Telco Requirements
Mobile networks require high service availability. A mobile core consists of nodes such as the BTS, the Mobility Management Entity (MME) on the control plane, and the Local Datapath Gateway (S-GW) and Global Datapath Gateway (P-GW) on the data plane. Each of these nodes hosts a few thousand subscriber sessions. If one goes down, all attached mobile phones are disconnected and then try to reconnect simultaneously, creating an 'Attach' storm that leads to further congestion and failure. Failure recovery therefore needs to be performed on a sub-second order.
Functional requirements
Speedy failure detection and notification to the users
- A user could be a VNF Manager (VNFM)
A VIM (OpenStack) shall detect a failure event, find out which users are affected by the failure, and then notify those users.
(Diagram: a VNF Manager on top of a Virtualized Infrastructure Manager (VIM); VNF (ACT), VNF (ACT), and VNF (SBY) run on a hypervisor over hardware. On a failure, the VIM asks: "Who should I inform?")
What is "failure"?
It depends on:
- Applications (VNFs)
- Back-end technologies used in the deployment
- Redundancy of the equipment/components
- Operator policy
- Regulation
So "failure" has to be configurable.
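The configurability the slide calls for can be sketched as an operator-editable policy table. This is a minimal illustration, not the project's actual policy format; all fault-type names are hypothetical.

```python
# Hypothetical sketch of a configurable failure policy: the operator
# decides which raw fault types count as a "failure" worth notifying
# users about (e.g. a single redundant NIC going down may be tolerated).
FAILURE_POLICY = {
    "host.nic1.down": {"is_failure": False},    # redundant port: tolerate
    "host.nic_all.down": {"is_failure": True},  # whole set down: failure
    "hypervisor.down": {"is_failure": True},
}

def classify(raw_fault_type):
    """Return True if the raw fault should be treated as a failure."""
    entry = FAILURE_POLICY.get(raw_fault_type)
    if entry is None:
        # Unknown fault types are treated as failures (conservative default).
        return True
    return entry["is_failure"]
```

Changing the deployment's redundancy or the operator's policy then only means editing the table, not the detection code.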
High-level Architecture for Fault Management
High-Level Architecture of NFV
(Diagram: Applications (App) run on a Virtualized Infrastructure — virtual compute, virtual storage, and virtual network on a virtualization layer over hardware resources — managed by the Virtualized Infrastructure Manager (VIM). An Application Manager acts as the VIM user.)
Fault Management Flow
(Diagram: the same NFV architecture, with a fault detected in the Virtualized Infrastructure and the reaction taken by the Application Manager. The detection-to-reaction path through the VIM is our focus.)
Fault Management Functional Blocks and Sequence
Functional blocks: Monitor, Inspector, Controller, and Notifier, plus configuration stores (Alarm Conf., Failure Policy, Resource Map).
Sequence:
0. Set Alarm — the Manager registers its alarm configuration with the Notifier.
1. Raw Fault — the Monitor detects a fault in the Virtualized Infrastructure and reports it to the Inspector.
2. Find Affected Applications — the Inspector consults the Failure Policy and the Resource Map.
3. Update State — the Inspector updates the affected resource state via the Controller.
4. Notify all / (alt) Notify — the state change is passed to the Notifier.
5. Notify Error — the Notifier alarms the Manager.
6-. Action — the Manager reacts (e.g. switchover).
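The sequence above can be sketched with each functional block as a plain Python class. This is an illustrative skeleton only; real deployments map these blocks to OpenStack services, and all names here are hypothetical.

```python
# Minimal sketch of steps 1-5 of the fault-management sequence.

class Notifier:
    """Step 5: raises alarms toward the Manager (here: records them)."""
    def __init__(self):
        self.alarms = []

    def notify(self, resource, state):
        self.alarms.append((resource, state))

class Controller:
    """Step 3/4: holds resource state and forwards changes to the Notifier."""
    def __init__(self, notifier):
        self.notifier = notifier
        self.states = {}

    def update_state(self, resource, state):
        self.states[resource] = state
        self.notifier.notify(resource, state)

class Inspector:
    """Steps 1-3: maps a raw host fault to affected application resources."""
    def __init__(self, controller, resource_map):
        self.controller = controller
        self.resource_map = resource_map  # host -> affected resources

    def on_raw_fault(self, host):
        affected = self.resource_map.get(host, [])
        for res in affected:
            self.controller.update_state(res, "error")
        return affected
```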
Fault Management Functional Block Mapping
The functional blocks map onto OpenStack projects: the Controller corresponds to Nova, Neutron, and Cinder; the Notifier to Ceilometer + Aodh; and the Inspector to Congress or Vitrage. (The Monitor is not mapped to an OpenStack project.)
Challenges in OpenStack
- Introducing a fault management sequence that spans multiple OpenStack projects
- Letting OpenStack users know the corresponding resource state properly and immediately
- Building a correlation mechanism that supports various OpenStack deployment flavors and operator policies
Blueprints in Liberty/Mitaka Cycles

| Project         | Blueprint                                    | Spec Drafter              | Developer              | Status              |
|-----------------|----------------------------------------------|---------------------------|------------------------|---------------------|
| Ceilometer/Aodh | Event Alarm Evaluator                        | Ryota Mibu (NEC)          |                        | Completed (Liberty) |
| Nova            | New nova API call to mark nova-compute down  | Tomi Juvonen (Nokia)      | Roman Dobosz (Intel)   | Completed (Liberty) |
| Nova            | Support forcing service down                 | Tomi Juvonen (Nokia)      | Carlos Goncalves (NEC) | Completed (Liberty) |
| Nova            | Get valid server state                       | Tomi Juvonen (Nokia)      |                        | Completed (Mitaka)  |
| Nova            | Add notification for service status change   | Balazs Gibizer (Ericsson) |                        | Completed (Mitaka)  |
| Congress        | Push Type DataSource Driver                  | Masahito Muroi (NTT)      |                        | Completed (Mitaka)  |
Development in OpenStack
(Diagram: the functional-block picture annotated with OpenStack projects. "Resource State Awareness" covers the Controller side — Nova, Neutron, Cinder — and the Notifier — Ceilometer + Aodh. "Congress-based Inspection" covers the Inspector — Congress, alongside Vitrage.)
Resource State Awareness
Nova – Force-Down and Exposing host_state
The force-down API ("Mark Nova-Compute Down") lets a client notify Nova that a nova-compute service is no longer available, instead of waiting for the existing periodic service-state update. Combined with the service disable API, the evacuation API, and the reset server state API, this allows Nova to integrate with external monitoring services and to handle requests for the host properly.
(Diagram: an External Monitoring Service watches the host/machine — hypervisor, VMs, vSwitch, BMC — and calls nova-api directly; nova-api updates the service state in the Nova DB, bypassing the periodic updates that nova-compute sends via the queue to nova-conductor and nova-scheduler.)
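The force-down call described above was added in Nova API microversion 2.11 as `PUT /os-services/force-down`. The sketch below only builds the request; actually sending it requires a Keystone token and a live Nova endpoint, which are assumed here.

```python
import json

# Sketch of a Nova force-down request (microversion 2.11). We build the
# method/path/headers/body tuple instead of sending it, so the example
# is self-contained.

def build_force_down_request(host, forced_down=True, binary="nova-compute"):
    """Return (method, path, headers, body) to mark a compute service down."""
    headers = {
        "Content-Type": "application/json",
        # force-down requires API microversion >= 2.11
        "X-OpenStack-Nova-API-Version": "2.11",
    }
    body = json.dumps({
        "binary": binary,
        "host": host,
        "forced_down": forced_down,
    })
    return ("PUT", "/v2.1/os-services/force-down", headers, body)
```

An external monitor would issue this immediately on detecting a host failure, instead of waiting for Nova's periodic liveness check to time out.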
Ceilometer – Event Alarm
The new notification-driven alarm evaluator provides a shortcut: alarms are evaluated directly on notification events coming from Nova, Neutron, and Cinder, instead of the existing polling-based path through samples, statistics, and the audit service, before the Manager is notified.
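An event alarm of this kind is defined through the Aodh v2 alarms API with `type: "event"`. The payload below is a hedged sketch: the field names follow the Aodh API, while the instance id and the app-manager callback URL are illustrative assumptions.

```python
# Sketch of an Aodh "event" alarm definition. Such an alarm fires as soon
# as a matching notification event arrives, instead of waiting for a
# polling/evaluation cycle.

def build_event_alarm(name, instance_id):
    """Payload for POST /v2/alarms creating a notification-driven alarm."""
    return {
        "name": name,
        "type": "event",  # evaluated per incoming event, not per poll
        "event_rule": {
            "event_type": "compute.instance.update",
            "query": [{
                "field": "traits.instance_id",
                "op": "eq",
                "value": instance_id,
            }],
        },
        # Hypothetical app-manager endpoint that receives the alarm.
        "alarm_actions": ["http://app-manager.example/alarm"],
    }
```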
Doctor BP Detail: Nova – Get valid server state
'host_state' can have the following values:
- UP if nova-compute is up
- UNKNOWN if nova-compute is not reported by the service group driver
- DOWN if nova-compute is forced down
- MAINTENANCE if nova-compute is disabled
- An empty string indicates there is no host for the server
This attribute appears in the response only if the policy permits. The default is admin-only, but in the NFV case the owner should also be enabled.
Server APIs that expose 'host_state':
- GET /v2.1/{tenant_id}/servers/detail
- GET /v2.1/{tenant_id}/servers/{server_id}
(Diagram: nova-api reads the service state from the Nova DB, which is kept current by the force-down API and the service disable API as well as the periodic updates from nova-compute.)
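The value rules above can be restated as a pure function over the nova-compute service record. This is an illustrative sketch (Nova computes this server-side); the field names and the precedence between forced-down and disabled are assumptions.

```python
# Sketch of the host_state derivation rules from the slide, applied to a
# dict standing in for a nova-compute service record.

def host_state(service):
    """Map a nova-compute service record to the server's host_state."""
    if service is None:
        return ""             # no host for this server
    if service.get("forced_down"):
        return "DOWN"         # marked down via the force-down API
    if service.get("disabled"):
        return "MAINTENANCE"  # nova-compute is disabled
    if not service.get("reported"):
        return "UNKNOWN"      # not reported by the service group driver
    return "UP"
```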
Congress-based Inspection
What is Congress?
Governance as a Service: define and enforce policy for cloud services.
"Policy" has no single definition; it can mean:
- Laws/regulations
- Business rules
- Security requirements
- Application requirements
Congress targets any service and any policy.
Congress Architecture
(Diagram: the Congress API sits on top of the Policy Engine, which holds Policy and Data. DataSourceDrivers for Nova, Neutron, Keystone, and a security system pull data from the corresponding services; policy enforcement flows back out to those same services.)
Requirements and Gaps for Congress as Inspector

| Requirement                                     | Congress Feature                          | Gap                           |
|-------------------------------------------------|-------------------------------------------|-------------------------------|
| Fast failure notification                       | Periodic polling and policy enforcement   | Real-time policy enforcement  |
| Mapping of a physical failure to a logical one  | Write a rule for the mapping              | None                          |
| Adaptability                                    | Change policy rules                       | None                          |
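The "write a rule for the mapping" feature can be made concrete with a Congress-style Datalog sketch. This is illustrative only: the table and column names are loosely modeled on the Doctor datasource shown later, and the exact rule syntax varies across Congress releases.

```
// Sketch: map a physical host failure to a logical VM failure.
host_down(host) :-
    doctor:events(hostname=host, type="host.nic_all.down", status="down")

vm_error(vm) :-
    host_down(host),
    nova:servers(id=vm, host_name=host)
```

When a matching event row appears in the `doctor:events` table, every VM hosted on the failed host is derived into `vm_error`, which enforcement rules can then act on.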
Congress PushType DataSource Driver
(Diagram: the Congress architecture as before, with a new data flow — a PushType DataSourceDriver through which another service pushes data into Congress.)
This enables services outside Congress to push data, and improves reaction time for policy enforcement.
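Pushing rows into such a datasource goes through the Congress API. The sketch below is hedged: the row-update path is modeled on the Congress v1 API, and the datasource/table names are assumptions; we build the request rather than send it.

```python
import json

# Sketch of pushing rows into a push-type Congress datasource: replace
# the contents of a datasource table with the given rows.

def build_push_request(datasource_id, table, rows):
    """Return (method, path, body) for a push-table row update."""
    path = "/v1/data-sources/%s/tables/%s/rows" % (datasource_id, table)
    return ("PUT", path, json.dumps(rows))
```

Because the data arrives by push instead of the periodic poll, the policy engine can react to an event as soon as the monitor reports it.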
Congress Doctor Driver
(Diagram: a Doctor DataSourceDriver is added to Congress, with a new push data flow from an external Monitor.)
1. The Monitor notifies Congress of a hardware failure event.
2. The Doctor Driver receives the failure event and inserts it into the event list of the Doctor data.
3. The Policy Engine receives the failure event, evaluates the registered policy, and enforces state correction.
4. The Policy Engine instructs the Nova driver to force the host service down and reset the state of the affected VM(s).
Congress Doctor Driver (Detail)

Driver schema (HW failure example):

+--------+-----------------------------------------------------+
| table  | columns                                             |
+--------+-----------------------------------------------------+
| events | {'name': 'id', 'description': 'None'},              |
|        | {'name': 'time', 'description': 'None'},            |
|        | {'name': 'type', 'description': 'None'},            |
|        | {'name': 'hostname', 'description': 'None'},        |
|        | {'name': 'status', 'description': 'None'},          |
|        | {'name': 'monitor', 'description': 'None'},         |
|        | {'name': 'monitor_event_id', 'description': 'None'} |
+--------+-----------------------------------------------------+

Event list of Doctor data:

+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| id             | time                          | type           | hostname      | status | monitor      | monitor_event_id |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
| 0123-4567-89ab | 2016-03-09T07:39:27.230277464 | host.nic1.down | demo-compute0 | down   | demo_monitor | 111              |
+----------------+-------------------------------+----------------+---------------+--------+--------------+------------------+
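A monitor event destined for this table must carry exactly the schema's columns, in order. The helper below is an illustrative sketch that enforces that; the sample values mirror the event list above.

```python
# Sketch: build a Doctor event row matching the driver schema's columns.

DOCTOR_EVENT_COLUMNS = [
    "id", "time", "type", "hostname", "status", "monitor",
    "monitor_event_id",
]

def make_event(**fields):
    """Build an ordered event row, insisting on exactly the schema columns."""
    missing = [c for c in DOCTOR_EVENT_COLUMNS if c not in fields]
    extra = [k for k in fields if k not in DOCTOR_EVENT_COLUMNS]
    if missing or extra:
        raise ValueError("bad columns: missing=%s extra=%s" % (missing, extra))
    return [fields[c] for c in DOCTOR_EVENT_COLUMNS]
```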
Demo
Demo Scenarios
Scenario 1: one of a set of redundant NIC ports goes down.
Scenario 2: the whole set of redundant NIC ports goes down.
Sequence:
1. NIC port(s) go down.
2. Detect and notify the failure:
   - The Monitor detects the HW failure event and notifies Congress.
   - Congress updates the state of the affected VMs to error.
   - Ceilometer and Aodh notify the app manager of the VM failure.
3. Healing:
   - The app manager switches active-standby, so that the service can continue.
(Demo setup: a controller running Nova, Congress, etc.; Compute1 hosting the active video server and Compute2 the standby, each behind a switch, plus a management switch and a router connecting an end user.)
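The healing step can be sketched as the app manager's alarm callback: on a VM-error alarm for the active instance, promote the standby. Class and VM names here are illustrative, not the demo's actual implementation.

```python
# Sketch of active/standby switchover driven by a VM-error alarm.

class AppManager:
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def on_alarm(self, vm, state):
        """Alarm callback: switch roles if the active VM has failed."""
        if state == "error" and vm == self.active:
            self.active, self.standby = self.standby, self.active
        return self.active
```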
Conclusions
- Resource state awareness has been improved by the state correction API enhancements, immediate notification to the user, and exposing the host state.
- Flexible inspection is available with Congress.
- The fault event API opens the way to supporting various back-end technologies.