Congress in NFV-based Mobile Cellular Network Fault Recovery
Ashiq Khan (NTT DOCOMO), Ryota Mibu (NEC), Masahito Muroi (NTT), Tomi Juvonen (Nokia)
28 April 2016, OpenStack Summit Austin

Agenda
NFV Requirement [Ashiq] 8 min
Implementation in OpenStack
– High-level architecture for fault management [Ryota] 8 min
– Resource state awareness [Tomi] 8 min
– Congress-based Inspection [Masahito] 10 min
– Demo [Ryota] 6 min

NFV Requirement

Telco Requirements
Mobile networks require high service availability.
[Diagram: BTS, Mobility Management Entity (MME), Local Datapath Gateway (S-GW), Global Datapath Gateway (P-GW); control and data paths]
Each of these nodes hosts a few thousand subscriber sessions. If one goes down, all attached mobile phones are disconnected and then try to reconnect simultaneously, creating an ‘Attach’ storm that leads to further congestion or failure.
Failure recovery needs to be performed in sub-second order.

Functional Requirements
Speedy failure detection and notification to the users
– A user could be a VNF Manager (VNFM)
[Diagram: hardware and hypervisor hosting VNF (ACT) and VNF (SBY) instances managed by a VNF Manager; the Virtualized Infrastructure Manager (VIM) asks “Who should I inform?”]
A VIM (OpenStack) shall detect a failure event, find the users affected by the failure, and then notify those users.

What is “failure”?
It depends on:
– Applications (VNFs)
– Back-end technologies used in the deployment
– Redundancy of the equipment/components
– Operator policy
– Regulation
So, “failure” has to be configurable.

High-level Architecture for Fault Management

High-Level Architecture of NFV
[Diagram: applications (App) and an Application Manager (VIM user) run on top of the Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources), which is managed by the Virtualized Infrastructure Manager (VIM).]

Fault Management Flow
[Diagram: the same NFV architecture. A fault in the Virtualized Infrastructure is detected by the VIM (Detection), which notifies the Application Manager so it can react (Reaction). The VIM-side detection and notification is our focus.]

Fault Management Functional Blocks and Sequence
[Diagram: Monitor, Inspector (configured with a Failure Policy), Controller (with a Resource Map) and Notifier (with Alarm Conf.) inside the VIM; the Manager and the Virtualized Infrastructure outside.]
0. Set Alarm
1. Raw Fault (the Monitor detects a raw fault in the Virtualized Infrastructure)
2. Find Affected Applications
3. Update State
4. Notify all / 4.(alt) Notify
5. Notify Error
6-. Action (the Manager reacts)

Fault Management Functional Block Mapping
[Diagram: the same functional blocks mapped onto OpenStack projects: Nova, Neutron and Cinder as the Controller, Ceilometer + Aodh as the Notifier, and Congress or Vitrage as the Inspector.]

Challenges in OpenStack
– Introducing a fault management sequence that spans multiple OpenStack projects
– Letting OpenStack users know the corresponding resource state properly and immediately
– Building a correlation mechanism that supports various OpenStack deployment flavors and operator policies

Blueprints in Liberty/Mitaka Cycles
| Project | Blueprint | Spec Drafter | Developer | Status |
| Ceilometer/Aodh | Event Alarm Evaluator | Ryota Mibu (NEC) | | Completed (Liberty) |
| Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | Completed (Liberty) |
| Nova | Support forcing service down | Tomi Juvonen (Nokia) | Carlos Goncalves (NEC) | Completed (Liberty) |
| Nova | Get valid server state | Tomi Juvonen (Nokia) | | Completed (Mitaka) |
| Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | | Completed (Mitaka) |
| Congress | Push Type DataSource Driver | Masahito Muroi (NTT) | | Completed (Mitaka) |

Development in OpenStack
[Diagram: the same functional block mapping, highlighting the two areas developed: Resource State Awareness (Nova, Ceilometer + Aodh) and Congress-based Inspection (Congress).]

Resource State Awareness

Nova – Force-Down and Exposing host_state
[Diagram: a host/machine running the hypervisor, VMs, vSwitch, BMC and nova-compute; nova-api, nova-conductor, nova-scheduler, the nova DB and the message queue on the controller side; an External Monitoring Service and a client.]
Existing mechanism: nova-compute reports its service state through periodic updates.
New Force-down API (“Mark Nova-Compute Down”): notifies Nova that the nova-compute service on a host is no longer available, complementing the existing Service disable, Evacuation and Reset Server State APIs.
This allows Nova to integrate with external monitoring services and to make sure Nova handles requests for the host properly.
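As a rough sketch of how an external monitor could use the Force-down API (the endpoint URL, token handling and host name below are placeholders, not taken from the slides):

# Minimal sketch: an external monitor marking nova-compute down on a failed host.
import requests

NOVA_ENDPOINT = "http://controller:8774/v2.1"   # assumed Nova API endpoint
TOKEN = "<keystone-token>"                      # obtained from Keystone beforehand

def force_compute_down(host):
    """Tell Nova that the nova-compute service on 'host' is down (microversion >= 2.11)."""
    resp = requests.put(
        NOVA_ENDPOINT + "/os-services/force-down",
        headers={
            "X-Auth-Token": TOKEN,
            "Content-Type": "application/json",
            "X-OpenStack-Nova-API-Version": "2.11",
        },
        json={"host": host, "binary": "nova-compute", "forced_down": True},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    force_compute_down("demo-compute0")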

Ceilometer – Event Alarm
[Diagram: Nova, Neutron and Cinder emit notifications. In the existing polling-based path, samples and statistics are collected and evaluated periodically (audit service, stats). The new notification-driven alarm evaluator provides a notification-based shortcut: event alarms are evaluated as soon as the event arrives and the Manager is alerted immediately.]
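For illustration, an application manager could register such an event alarm through the Aodh API roughly as follows; the endpoint, token and webhook URL are placeholders, and the exact payload may vary per release:

# Rough sketch: registering an Aodh event alarm so the app manager is notified
# as soon as one of its servers goes to ERROR state.
import requests

AODH_ENDPOINT = "http://controller:8042/v2"     # assumed Aodh API endpoint
TOKEN = "<keystone-token>"

alarm = {
    "name": "vm-error-alarm",
    "type": "event",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [
            {"field": "traits.state", "op": "eq", "value": "error", "type": "string"},
        ],
    },
    # Webhook of the application manager (VNFM) that reacts to the failure.
    "alarm_actions": ["http://vnfm.example.com:12346/alarm"],
}

resp = requests.post(
    AODH_ENDPOINT + "/alarms",
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    json=alarm,
)
resp.raise_for_status()
print(resp.json()["alarm_id"])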

Doctor BP Detail: Nova – Get Valid Server State
‘host_state’ can have the following values:
– UP if nova-compute is up
– UNKNOWN if nova-compute is not reported by the service group driver
– DOWN if nova-compute is forced down
– MAINTENANCE if nova-compute is disabled
– An empty string indicates there is no host for the server
The attribute appears in the response only if the policy permits. The default is admin-only, but in the NFV case the owner should also be enabled.
Server APIs that expose ‘host_state’:
– GET /v2.1/{tenant_id}/servers/detail
– GET /v2.1/{tenant_id}/servers/{server_id}
[Diagram: nova-api reads the service state from the nova DB, which nova-compute keeps updated periodically and which the Force-down and Service disable APIs modify.]
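A sketch of how a VNF manager could read this attribute from the server API; note that released Nova exposes it as 'host_status' from API microversion 2.16, so the field name below follows that, and the endpoint, token and server ID are placeholders:

# Sketch: checking the host state of a server via the Nova server API.
import requests

NOVA_ENDPOINT = "http://controller:8774/v2.1"   # assumed Nova API endpoint
TOKEN = "<keystone-token>"
SERVER_ID = "<server-uuid>"

resp = requests.get(
    f"{NOVA_ENDPOINT}/servers/{SERVER_ID}",
    headers={
        "X-Auth-Token": TOKEN,
        "X-OpenStack-Nova-API-Version": "2.16",
    },
)
resp.raise_for_status()
server = resp.json()["server"]
print(server.get("host_status"))   # UP / UNKNOWN / DOWN / MAINTENANCE / ""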

Congress-based Inspection

What is Congress?
Governance as a Service
– Define and enforce policy for cloud services
Policy
– No single definition: law/regulations, business rules, security requirements, application requirements
– Any service, any policy

Congress Architecture
[Diagram: the Congress API and Policy Engine sit on top of DataSource Drivers for Nova, Neutron, Keystone and a Security System. The drivers pull data from those services into Congress; the Policy Engine evaluates policy over that data and pushes policy enforcement back to the services.]

Requirements and Gaps for Congress as Inspector
| Requirement | Congress Feature | Gap |
| Fast failure notification | Periodical polling and policy enforcement | Real-time policy enforcement |
| Mapping of a physical failure to a logical failure | Write a rule for the mapping | None |
| Adaptability | Change policy rules | None |

Congress PushType DataSource Driver
[Diagram: the same Congress architecture with a new PushType DataSourceDriver; another service pushes data into Congress through it (new data flow).]
It enables services outside Congress to push data in, and improves the reaction time for policy enforcement.
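As an illustration only (the row-update endpoint, datasource name and token handling are assumptions, not taken from the slides), a monitor could push a hardware-failure event into such a push-type datasource roughly like this:

# Illustrative sketch: a monitor pushing a hardware-failure event into a
# push-type Congress datasource over the REST API.
import json
from datetime import datetime, timezone

import requests

CONGRESS_ENDPOINT = "http://controller:1789/v1"   # assumed Congress API endpoint
TOKEN = "<keystone-token>"
DATASOURCE = "doctor"    # assumed name of the push-type datasource for Doctor events
TABLE = "events"

# Row layout follows the Doctor driver schema shown two slides below:
# (id, time, type, hostname, status, monitor, monitor_event_id)
row = [
    "ab",
    datetime.now(timezone.utc).isoformat(),
    "host.nic1.down",
    "demo-compute0",
    "down",
    "demo_monitor",
    "111",
]

resp = requests.put(
    f"{CONGRESS_ENDPOINT}/data-sources/{DATASOURCE}/tables/{TABLE}/rows",
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    data=json.dumps([row]),
)
resp.raise_for_status()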

Congress Doctor Driver
[Diagram: the same Congress architecture with a Doctor DataSourceDriver; the Monitor pushes failure events into Congress through it (new data flow).]
1. The Monitor notifies a hardware failure event to Congress.
2. The Doctor Driver receives the failure event and inserts it into the event list of the Doctor data.
3. The Policy Engine receives the failure event, then evaluates the registered policy and enforces state correction.
4. The Policy Engine instructs the Nova Driver to force the host service down and to reset the state of the affected VM(s).
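The Datalog policy evaluated in step 3 could look roughly like the rules below, registered via the Congress policy API; the table names, action names and policy name are illustrative assumptions, not the exact policy used in the demo:

# Illustrative sketch: registering Datalog rules of the kind described above.
import requests

CONGRESS_ENDPOINT = "http://controller:1789/v1"   # assumed Congress API endpoint
TOKEN = "<keystone-token>"
POLICY = "classification"                          # assumed policy name

rules = [
    # Derive a host_down fact from a pushed Doctor event (assumed table/column names).
    'host_down(host) :- doctor:events(hostname=host, type="compute.host.down", status="down")',
    # Force the compute service down on that host (step 4 above, assumed action name).
    'execute[nova:services.force_down(host, "nova-compute", "True")] :- host_down(host)',
    # Reset affected servers to error so their owners get notified (assumed action name).
    'execute[nova:servers.reset_state(vm, "error")] :- host_down(host), nova:servers(id=vm, host_name=host)',
]

for rule in rules:
    resp = requests.post(
        f"{CONGRESS_ENDPOINT}/policies/{POLICY}/rules",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        json={"rule": rule},
    )
    resp.raise_for_status()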

Congress Doctor Driver (Detail)
Driver schema (HW failure example):
| table | columns |
| events | id, time, type, hostname, status, monitor, monitor_event_id |
Event list of the Doctor data:
| id | time | type | hostname | status | monitor | monitor_event_id |
| ab | T07:39: | host.nic1.down | demo-compute0 | down | demo_monitor | 111 |

Demo

Demo Scenarios
Scenario 1: one of the redundant NIC ports goes down → failure
Scenario 2: the whole set of redundant NIC ports goes down → failure
Sequence:
1. NIC port(s) go down.
2. Detect and notify the failure
– The Monitor detects the HW failure event and notifies it to Congress
– Congress updates the state of the affected VMs to error
– Ceilometer and Aodh notify the VM failure to the app manager
3. Healing
– The app manager switches active/standby so that the service can continue
[Diagram: Controller (Nova, Congress, etc.); Compute1 hosting the Video Server (ACT) and Compute2 hosting the Video Server (SBY), connected through switches, a management switch and a router to the end user.]

Conclusions
– Resource state awareness has been improved by the state-correction API enhancements, immediate notification to users, and exposing the host state.
– Flexible inspection is available with Congress.
– The fault event API opens up the way to support various back-end technologies.