Doctor Tech Deep Dive
Tomi Juvonen, Nokia; Ryota Mibu, NEC


Fault Management Flow
(Figure: fault Detection and Reaction across the stack. Labels: Applications (App); Application Manager (VIM User); Virtualized Infrastructure Manager (VIM); Virtualized Infrastructure with Virtual Compute, Virtual Storage and Virtual Network; Virtualization Layer; Hardware Resources. One area is marked "Our Focus".)

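Fault Management Architecture
Functional blocks designed by the Doctor project, with the OpenStack components mapped to them:
- Monitor: detects raw faults in the virtualized infrastructure.
- Inspector (Vitrage, Congress): holds the Resource Map and the Failure Policy.
- Controller (Nova, Neutron, Cinder): holds the state of the virtual resources.
- Notifier (Ceilometer+Aodh): holds the alarm configuration (Alarm Conf.).
- Manager: the VIM user that consumes the notifications.
Sequence:
0. Set Alarm: the Manager configures an alarm in the Notifier.
1. Raw Fault: the Monitor reports a raw fault to the Inspector.
2. Find Affected: the Inspector uses the Resource Map to find the affected virtual resources.
3. Update State: the Inspector updates the state of those resources in the Controller.
4. Notify all: the Controller notifies the Notifier of the state change; alternatively (4. alt), the Inspector notifies the Notifier directly.
5. Notify Error: the Notifier sends the alarm to the Manager.
6-. Action: the Manager takes recovery actions.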
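The numbered sequence above can be illustrated with a toy, in-process simulation. This is a sketch under loud assumptions: the class names, the resource map and the callback wiring are hypothetical stand-ins for the Monitor, Inspector (e.g. Vitrage or Congress), Controller (e.g. Nova) and Notifier (e.g. Ceilometer+Aodh), and no OpenStack API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for the Doctor functional blocks; no OpenStack calls.


@dataclass
class Controller:
    """Plays the role of Nova: holds the state of each VM."""
    vm_state: Dict[str, str] = field(default_factory=dict)
    listeners: List[Callable[[str, str], None]] = field(default_factory=list)

    def update_state(self, vm: str, state: str) -> None:
        # Step 3: the Inspector updates the state; step 4: notify all listeners.
        self.vm_state[vm] = state
        for listener in self.listeners:
            listener(vm, state)


@dataclass
class Notifier:
    """Plays the role of Aodh: fires alarms configured by the Manager."""
    alarms: Dict[str, Callable[[str, str], None]] = field(default_factory=dict)

    def set_alarm(self, vm: str, action: Callable[[str, str], None]) -> None:
        # Step 0: the Manager (VIM user) configures an alarm for its VM.
        self.alarms[vm] = action

    def on_state_change(self, vm: str, state: str) -> None:
        # Step 5: notify the error to the Manager that owns the alarm.
        if vm in self.alarms and state == "error":
            self.alarms[vm](vm, state)


@dataclass
class Inspector:
    """Plays the role of Vitrage/Congress: maps raw faults to affected VMs."""
    resource_map: Dict[str, List[str]]  # host -> VMs running on it
    controller: Controller

    def on_raw_fault(self, host: str) -> None:
        # Step 1 arrives here from the Monitor; step 2: find affected VMs.
        for vm in self.resource_map.get(host, []):
            self.controller.update_state(vm, "error")


received = []
controller = Controller()
notifier = Notifier()
controller.listeners.append(notifier.on_state_change)  # Controller -> Notifier
# Step 0; the lambda stands in for the Manager's step-6 action.
notifier.set_alarm("vm1", lambda vm, state: received.append((vm, state)))
inspector = Inspector(resource_map={"compute2": ["vm1", "vm2"]}, controller=controller)
inspector.on_raw_fault("compute2")  # step 1: the Monitor reports a host fault
print(received)  # [('vm1', 'error')]
```

Only vm1 produces a notification because only its owner set an alarm in step 0; vm2 is still marked as error in the Controller. The "4. (alt)" path, where the Inspector notifies the Notifier directly, is left out for brevity.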

Challenges in OpenStack
- Introducing a fault management sequence across multiple OpenStack projects
- Letting OpenStack users know the corresponding resource state properly and immediately
- Building a correlation mechanism that supports various OpenStack deployment flavors and operator policies

Blueprints in Liberty/Mitaka Cycles
Project         | Blueprint                                   | Spec Drafter / Developer                      | Status
Ceilometer/Aodh | Event Alarm Evaluator                       | Ryota Mibu (NEC)                              | Completed (Liberty)
Nova            | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) / Roman Dobosz (Intel)   | Completed (Liberty)
Nova            | Support forcing service down                | Tomi Juvonen (Nokia) / Carlos Goncalves (NEC) | Completed (Liberty)
Nova            | Get valid server state                      | Tomi Juvonen (Nokia)                          | Completed (Mitaka)
Nova            | Add notification for service status change  | Balazs Gibizer (Ericsson)                     | Completed (Mitaka)
Congress        | Push Type DataSource Driver                 | Masahito Muroi (NTT)                          | Completed (Mitaka)

Nova – Objects
(Figure: objects and actors around the Nova API/DB.)
- Server (Instance): vm_state, …
- Service / Host: status, enable, force_down, …
Actors: User, Admin, External Monitoring Service, and the Nova Compute process on the host.
Operations: Boot/Shutdown, Enable/Disable, Force Down, Reset State, Report, Fencing, Exec (e.g. Evacuate).

Nova – Force-Down & Exposing Service State
(Figure: a Host / Machine runs the Hypervisor, VMs, nova-compute and a vSwitch, and has a BMC; the controller side has nova-api, nova-conductor, nova-scheduler, the queue and the nova DB; an External Monitoring Service and a Client call the Nova APIs.)
- EXISTING: nova-compute periodically updates its service state.
- Force-down API [Mark Nova-Compute Down]: notifies Nova that the nova-compute service is no longer available.
- A service.update notification exposes the service state change.
- Other APIs in the flow: Service disable API, Evacuation API, Reset Server State API.
This allows Nova to integrate with external monitoring services and makes sure Nova handles requests for the host properly.
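A client or external tool can react to the service.update notification. The sketch below assumes Nova's versioned notification layout, with the service fields under payload["nova_object.data"]; the message transport (for example an oslo.messaging listener) is left out, and the sample message is hypothetical and trimmed. Check the notification samples of your Nova release before relying on these field names.

```python
import json
from typing import Optional, Tuple


def parse_service_update(raw: str) -> Optional[Tuple[str, str, bool]]:
    """Return (host, binary, usable) for a service.update notification.

    ASSUMPTION: the message uses Nova's versioned-notification layout, where
    the service fields sit under payload["nova_object.data"].
    """
    msg = json.loads(raw)
    if msg.get("event_type") != "service.update":
        return None  # some other notification, not a service state change
    data = msg["payload"]["nova_object.data"]
    # Consider the service usable only if it is neither disabled nor forced down.
    usable = not data.get("disabled", False) and not data.get("forced_down", False)
    return data["host"], data["binary"], usable


# Hypothetical, trimmed message as a forced-down nova-compute might produce.
sample = json.dumps({
    "event_type": "service.update",
    "payload": {"nova_object.data": {
        "host": "compute2", "binary": "nova-compute",
        "disabled": False, "forced_down": True,
    }},
})
print(parse_service_update(sample))  # ('compute2', 'nova-compute', False)
```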

Nova – Force-Down API
~]$ curl controller -g -i -X PUT \
    -H "X-OpenStack-Nova-API-Version: 2.11" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "X-Auth-Token: $OS_AUTH_TOKEN" \
    -d '{"binary": "nova-compute", "host": "compute2", "forced_down": true}'
HTTP/1.1 200 OK
Content-Length: 80
Content-Type: application/json
X-Openstack-Nova-Api-Version: 2.11
Vary: X-OpenStack-Nova-API-Version
X-Compute-Request-Id: req-a92108ee-0cd4-487e-8beb-c6b5b55a819c
Date: Thu, 09 Jun … :52:02 GMT

{"service": {"binary": "nova-compute", "host": "compute2", "forced_down": true}}
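The same call can be prepared from Python with only the standard library. This is a sketch: the endpoint URL and token are placeholders, and the /os-services/force-down path is taken from the Nova compute API reference for microversion 2.11, since the curl line elides the full URL. The request is built but not sent.

```python
import json
import urllib.request


def build_force_down_request(endpoint: str, token: str, host: str,
                             forced_down: bool = True,
                             binary: str = "nova-compute") -> urllib.request.Request:
    """Build (but do not send) a force-down request for one service.

    ASSUMPTION: `endpoint` is the compute API base URL (placeholder below);
    the path follows the Nova API reference for microversion 2.11.
    """
    body = {"binary": binary, "host": host, "forced_down": forced_down}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/os-services/force-down",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-OpenStack-Nova-API-Version": "2.11",  # force-down needs >= 2.11
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,
        },
    )


req = build_force_down_request("http://controller:8774/v2.1", "TOKEN", "compute2")
print(req.get_method(), req.full_url)
# PUT http://controller:8774/v2.1/os-services/force-down
# To actually send it: urllib.request.urlopen(req)
```

Passing forced_down=False builds the equivalent of `nova service-force-down --unset`.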

Nova – Force-Down Command
~]$ nova help service-force-down
usage: nova service-force-down [--unset] <hostname> <binary>

Force service to down. (Supported by API versions '2.11' - '2.latest') [hint: use '--os-compute-api-version' flag to show help message for proper version]

Positional arguments:
  <hostname>  Name of host.
  <binary>    Service binary.

Optional arguments:
  --unset     Unset the force state down of service.

~]$ nova service-force-down --unset compute2 nova-compute
| Host     | Binary       | Forced down |
| compute2 | nova-compute | False       |

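Ceilometer/Aodh – Event Alarm
(Figure: notification-driven alarm evaluator.)
- EXISTING (polling-based): Nova, Neutron and Cinder emit notifications; Ceilometer turns them into samples and statistics, which Aodh evaluates periodically before alerting the Manager.
- NEW shortcut (notification-based): Aodh evaluates the notification events from these services as they arrive and alerts the Manager immediately.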

Ceilometer/Aodh – Create Event Alarm
POST :8777/v2/alarms/
{
  "type": "event",
  "name": "Standard test2: InstanceStatusAlarm",
  "description": "Standard test: an event alarm",
  "enabled": true,
  "alarm_actions": [":8000/alarm"],
  "repeat_actions": false,
  "severity": "moderate",
  "event_rule": {
    "event_type": "compute.instance.update",
    "query": [
      {"field": "traits.instance_id", "type": "string", "value": "c1b55d43-9cd9-49ba-ba41-1dc5628ddfe7", "op": "eq"},
      {"field": "traits.state", "type": "string", "value": "stopped", "op": "eq"}
    ]
  }
}
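To make the event_rule semantics concrete, the sketch below checks one event against the alarm defined above. It is not Aodh's evaluator: it assumes an exact event_type match, supports only the "eq" operator on string values, and ignores alarm state, repeat_actions and the other fields.

```python
from typing import Any, Dict

# The alarm definition from the POST above, trimmed to the fields used here.
ALARM: Dict[str, Any] = {
    "type": "event",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [
            {"field": "traits.instance_id", "type": "string",
             "value": "c1b55d43-9cd9-49ba-ba41-1dc5628ddfe7", "op": "eq"},
            {"field": "traits.state", "type": "string",
             "value": "stopped", "op": "eq"},
        ],
    },
}


def event_matches(alarm: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: exact event_type plus 'eq' conditions on traits."""
    rule = alarm["event_rule"]
    if event["event_type"] != rule["event_type"]:
        return False
    # Traits arrive as [name, type_code, value] triples; index them by name.
    traits = {name: value for name, _code, value in event["traits"]}
    for cond in rule["query"]:
        if cond["op"] != "eq":
            raise NotImplementedError("this sketch only handles 'eq'")
        name = cond["field"].split(".", 1)[1]  # "traits.state" -> "state"
        if name not in traits or str(traits[name]) != cond["value"]:
            return False
    return True


event = {
    "event_type": "compute.instance.update",
    "traits": [["instance_id", 1, "c1b55d43-9cd9-49ba-ba41-1dc5628ddfe7"],
               ["state", 1, "stopped"]],
}
print(event_matches(ALARM, event))  # True
```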

Aodh – Alarm Notification
POST :8000/alarm
{
  "alarm_id": "718a79b5-f066-4f60-aa37-df…e02",
  "alarm_name": "Standard test: InstanceStatusAlarm",
  "current": "alarm",
  "previous": "insufficient data",
  "reason": "Event (message_id=cbe…c16-490a-bc18-12a1292f03ab) hit the query of alarm (id=718a79b5-f066-4f60-aa37-df…e02)",
  "reason_data": {
    "event": {
      "event_type": "compute.instance.update",
      "generated": "…T06:22:…",
      "message_id": "cbe…c16-490a-bc18-12a1292f03ab",
      "message_signature": "938198b933f640560a0d88a0f72739d95a21bb6a362b12fe026ba847c46303f7",
      "raw": {},
      "traits": [
        ["state", 1, "stopped"],
        ["user_id", 1, "47de9baced284660b63a7ee6c218f058"],
        ["service", 1, "compute"],
        ["disk_gb", 2, 20],
        ["instance_type", 1, "m1.small"],
        ["tenant_id", 1, "abf…c4154abcdc23e033d1f3b"],
        ["root_gb", 2, 20],
        ["ephemeral_gb", 2, 0],
        ["instance_type_id", 2, 5],
        ["vcpus", 2, 1],
        ["memory_mb", 2, 2048],
        ["instance_id", 1, "415f624b-9a4e-4448-bce0-05ee44581a81"],
        ["host", 1, "opnfv-demo-controller"],
        ["request_id", 1, "req…bc-c9a3-4fa…d9479d15a"],
        ["project_id", 1, "abf…c4154abcdc23e033d1f3b"],
        ["launched_at", 4, "…T00:09:07"]
      ]
    },
    "type": "event"
  },
  "severity": "moderate"
}
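On the receiving side, the alarm_actions URL gets this JSON as an HTTP POST. A minimal receiver using Python's standard http.server is sketched below; the port, the /alarm path check and the chosen fields are assumptions for illustration. In each trait, the middle element is a type code; in Ceilometer's event model 1 is text, 2 integer, 3 float and 4 datetime.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def summarize_alarm(body: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields a Manager would typically act on."""
    event = body["reason_data"]["event"]
    # Each trait is [name, type_code, value]; keep name -> value.
    traits = {name: value for name, _code, value in event["traits"]}
    return {
        "alarm_id": body["alarm_id"],
        "transition": (body["previous"], body["current"]),
        "event_type": event["event_type"],
        "instance_id": traits.get("instance_id"),
        "state": traits.get("state"),
    }


class AlarmHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver for an alarm_actions URL ending in /alarm."""

    def do_POST(self) -> None:
        if self.path != "/alarm":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        summary = summarize_alarm(json.loads(self.rfile.read(length)))
        # A real Manager would start its recovery action (step 6) here.
        print("alarm received:", summary)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for alarms on the given port; blocks until interrupted."""
    HTTPServer(("", port), AlarmHandler).serve_forever()
```

Calling serve() starts the listener; summarize_alarm() can be tested on its own with a parsed notification body.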

Nova – Get valid host state
The 'host_status' field can have the following values:
- UP if nova-compute is up.
- UNKNOWN if nova-compute is not reported by the service group driver.
- DOWN if nova-compute is forced down.
- MAINTENANCE if nova-compute is disabled.
- An empty string if there is no host for the server.
Server APIs that return 'host_status':
- GET /v2.1/{tenant_id}/servers/detail
- GET /v2.1/{tenant_id}/servers/{server_id}
(Figure: on the Host / Machine, nova-compute updates its service state periodically (EXISTING); the Force-down API and the Service disable API also set the service state; the Server API in nova API reads it from the nova DB. nova-conductor, nova-scheduler and the queue are also shown.)
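The list above gives the possible values but not which one wins when several conditions hold at once, for example a host that is both disabled and forced down. The sketch below encodes one plausible precedence (forced down, then disabled, then liveness), which we believe matches Nova's implementation; treat the ServiceState type and the precedence as assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    """Hypothetical subset of the nova-compute service record."""
    forced_down: bool
    disabled: bool
    reported_up: bool  # whether the service group driver reports it alive


def host_status(service: Optional[ServiceState]) -> str:
    """Derive host_status from the service record.

    ASSUMPTION: precedence is forced down > disabled > liveness.
    """
    if service is None:
        return ""  # no host for the server
    if service.forced_down:
        return "DOWN"
    if service.disabled:
        return "MAINTENANCE"
    return "UP" if service.reported_up else "UNKNOWN"


print(host_status(ServiceState(forced_down=True, disabled=True, reported_up=True)))  # DOWN
```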

Nova – Servers API with host_status
~]$ nova show vm
| Property                            | Value                                                    |
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| OS-EXT-SRV-ATTR:host                | compute2                                                 |
| OS-EXT-SRV-ATTR:hostname            | vm1                                                      |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute2                                                 |
| ...                                 |                                                          |
| hostId                              | 2ac4c517ae75cba421755a6af321a4910c83d763da85b23aa619039b |
| host_status                         | UP                                                       |
| id                                  | 8c184ded-9d88-4f8f-9d2f…feede8                           |
| image                               | cirros…x86_64-uec (8edd0cdb-4c…f6-93cb025d5a9c)          |

Note: 'host_status' is supported from API microversion 2.16. The policy file must be configured to show 'host_status' to the admin and the server owner:
"os_compute_api:servers:show:host_status": "rule:admin_or_owner"
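With microversion 2.16 or later and a permitting policy, a Manager can scan GET /servers/detail results for servers whose host is not healthy. The sketch assumes the response body has already been fetched and parsed; the "hidden" label for a missing host_status field is a hypothetical convention of this example, not a Nova value.

```python
from typing import Any, Dict, List


def servers_needing_attention(detail: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pick servers whose host is not UP from a GET /servers/detail body.

    ASSUMPTION: the call used "X-OpenStack-Nova-API-Version: 2.16" or later;
    if policy hides host_status, the field is absent and we report "hidden".
    """
    result = []
    for server in detail.get("servers", []):
        status = server.get("host_status")
        if status is None:
            status = "hidden"  # field not exposed to this caller
        if status != "UP":
            result.append({"id": server["id"], "host_status": status})
    return result


body = {"servers": [{"id": "vm-a", "host_status": "UP"},
                    {"id": "vm-b", "host_status": "DOWN"},
                    {"id": "vm-c"}]}
print(servers_needing_attention(body))
```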

Blueprints Under Work
Project | Blueprint                    | Spec Drafter / Developer | Status
Nova    | Maintenance Reason To Server | Tomi Juvonen (Nokia)     | Drafting (Ocata)

This blueprint spec will continue review in OpenStack Ocata. The basic idea is that the Nova project does not want more host-specific functionality such as maintenance, but it can hold a link to an external tool that hosts more information. A new API will set this link, and the link will be available through different APIs for users and admins. A notification will also be sent when the link changes; it can be consumed externally, and an alarm can then be raised to the tenant, for example about upcoming maintenance affecting their VMs. As the spec exposes a new field in several APIs, it also exposes 'forced_down' in the hypervisor API. This blueprint will be an enabler for implementing the maintenance part of the Doctor requirements.

Neutron – Objects (Note: under discussion)
(Figure: objects and actors around the Neutron Server.)
- Port: status, admin_state, …
- Agent: alive, admin_state, force_down?, …
Actors: User, Admin, External Monitoring Service, and the agent process on the compute host.
Operations: Up/Down, Enable/Disable, Force Down, Update status, Report, Notification, Fencing.

Doctor Wiki:
Visit the Doctor Booth!
OPNFV Doctor, OpenStack Congress