HA Scenarios.

Service Availability Levels

SAL 1
- Recovery time: 5 – 6 seconds
- Customer type: Network Operator Control Traffic; Government/Regulatory Emergency Services
- Recommendations: Redundant resources to be made available on-site to ensure fast recovery.

SAL 2
- Recovery time: 10 – 15 seconds
- Customer type: Enterprise and/or large scale Customers; Network Operators service traffic
- Recommendations: Redundant resources to be available as a mix of on-site and off-site as appropriate. On-site resources to be utilized for recovery of real-time services. Off-site resources to be utilized for recovery of data services.

SAL 3
- Recovery time: 20 – 25 seconds
- Customer type: General Consumer Public and ISP Traffic
- Recommendations: Redundant resources to be mostly available off-site. Real-time services should be recovered before data services.

Source: ETSI GS NFV-REL 001 V1.1.1
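As a rough illustration of how these levels could be applied, the sketch below maps a measured recovery time to the most demanding SAL whose range it still satisfies. The bounds are the upper ends of the recovery-time ranges listed above; the function name and the choice to treat the upper end as a hard limit are assumptions made for this example, not part of the ETSI specification.

```python
from typing import Optional

# Upper end of the recovery-time range per SAL, in seconds (taken from the slide).
SAL_MAX_RECOVERY_S = {1: 6.0, 2: 15.0, 3: 25.0}


def strictest_sal_met(recovery_time_s: float) -> Optional[int]:
    """Return the most demanding SAL whose recovery bound is met, or None."""
    if recovery_time_s < 0:
        raise ValueError("recovery time cannot be negative")
    # SAL 1 is the most demanding level, so check levels in ascending order.
    for sal in sorted(SAL_MAX_RECOVERY_S):
        if recovery_time_s <= SAL_MAX_RECOVERY_S[sal]:
            return sal
    return None
```

A recovery time that exceeds 25 seconds yields None, meaning none of the three levels is met.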

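Scenarios

State      Redundancy in VNF  Failure detection  Use case
Stateful   yes                VNF only           UC1
Stateful   yes                VNF & NFVI         UC2
Stateful   no                 VNF only           UC3
Stateful   no                 VNF & NFVI         UC4
Stateless  yes                VNF only           UC5
Stateless  yes                VNF & NFVI         UC6
Stateless  no                 VNF only           UC7
Stateless  no                 VNF & NFVI         UC8

UC9: Repeated failure in VNF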
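The scenario matrix can be expressed as a small lookup, which may help when tagging test cases or fault reports with a use case. This is only a sketch: the enum and function names are invented for the example, and UC9 is left out because it describes repeated failures rather than a cell of the matrix.

```python
from enum import Enum


class State(Enum):
    STATEFUL = "stateful"
    STATELESS = "stateless"


class Detection(Enum):
    VNF_ONLY = "VNF only"
    VNF_AND_NFVI = "VNF & NFVI"


# (state, redundancy in VNF, failure detection) -> use case, as in the matrix.
USE_CASES = {
    (State.STATEFUL, True, Detection.VNF_ONLY): "UC1",
    (State.STATEFUL, True, Detection.VNF_AND_NFVI): "UC2",
    (State.STATEFUL, False, Detection.VNF_ONLY): "UC3",
    (State.STATEFUL, False, Detection.VNF_AND_NFVI): "UC4",
    (State.STATELESS, True, Detection.VNF_ONLY): "UC5",
    (State.STATELESS, True, Detection.VNF_AND_NFVI): "UC6",
    (State.STATELESS, False, Detection.VNF_ONLY): "UC7",
    (State.STATELESS, False, Detection.VNF_AND_NFVI): "UC8",
}


def use_case_for(state: State, redundant: bool, detection: Detection) -> str:
    """Look up the use case for one combination of the three dimensions."""
    return USE_CASES[(state, redundant, detection)]
```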

UC1: Stateful VNF with Redundancy (2014-12-12)
Failure detection: VNF only

1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF repairs VNFC

*Steps 1 & 2 are simultaneous; they are separated for clarity.

Nothing new in this scenario.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) and standby (STB) VNFCs, NFVI with VMs (NFVI's services); recovery time marked]

UC1 Comments

- SAL 1 can be achieved.
- The HA mechanism may be completely embedded in the VNF or exposed to the VNFM; the latter requires additional communication toward the VNFM.
- In the VNF, different error detection mechanisms can be used to detect the VNFC failure.
- The assumption is that the fault causing the failure is in the VNFC.
- If the VNFM manages the HA, it needs to be in charge of the failure detection as well.
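The UC1 sequence (detect, isolate, fail over, repair) can be sketched for the case where the HA mechanism is embedded in the VNF. The class and function names are hypothetical, and returning the repaired VNFC as the new standby is an assumption; the slides only state that the VNF repairs the VNFC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vnfc:
    name: str
    role: str              # "ACT" (active) or "STB" (standby)
    healthy: bool = True
    isolated: bool = False


def handle_vnfc_failure(vnfcs: List[Vnfc]) -> List[str]:
    """Run UC1 steps 3-7 and return a log of the steps taken."""
    failed = next((c for c in vnfcs if c.role == "ACT" and not c.healthy), None)
    if failed is None:
        return []  # no failed active VNFC, nothing to do
    log = ["3. VNF detects the failure"]
    failed.isolated = True
    log.append("4. VNF isolates VNFC")
    standby = next((c for c in vnfcs if c.role == "STB" and c.healthy), None)
    if standby is None:
        raise RuntimeError("no healthy standby available for failover")
    standby.role = "ACT"
    log.append("5. VNF fails over")
    log.append("6. NF recovers")
    # Assumption: the repaired VNFC rejoins as the new standby.
    failed.role, failed.healthy, failed.isolated = "STB", True, False
    log.append("7. VNF repairs VNFC")
    return log
```

The network function recovers at step 6, before the repair in step 7, so the repair time does not count toward the recovery time in this sketch.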

UC2: Stateful VNF with Redundancy
Failure detection: VNF & NFVI

1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*

VNF path:
5a. VNF detects the failure
6a. VNF fails over
7a. NF recovers

NFVI path:
5b. NFVI detects the failure
6b. NFVI reports to VIM
7b. VIM reports to VNFM

8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VNF repairs VNFC

*Steps 1-4 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) and standby (STB) VNFCs, NFVI with VMs (NFVI's services); recovery time marked]

UC2 Comments

- UC1 comments apply here except for 3.a.
- The VNF and VNFM do not know the actual cause of the failure, which could be a VM, host, hypervisor, networking, etc. failure. The VIM needs to resolve the cause.
- The VNF is likely to detect the failure through heartbeating via the network.
- The VNF and NFVI may detect the failure in any order, which in turn determines the order and exchange of messages between the VNFM and VIM.
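Heartbeat-based detection, which the comments name as the likely mechanism in the VNF, can be sketched as follows. The class name, the peer identifiers and the timeout value are assumptions for illustration. Timestamps are passed in explicitly so the logic stays deterministic and easy to test.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per peer and flags peers that went silent."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, peer: str, now: float) -> None:
        """Record a heartbeat received from `peer` at time `now` (seconds)."""
        self.last_seen[peer] = now

    def failed_peers(self, now: float) -> List[str]:
        """Return peers whose last heartbeat is older than the timeout."""
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A missed heartbeat only shows that the peer is unreachable. As noted above, it does not tell the VNF whether the VM, the host, the hypervisor or the network is at fault.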

UC3: Stateful VNF with No Redundancy
Failure detection: VNF only
VNFC checkpoints its state to VD, which is HA.

1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF repairs VNFC
6. VNFC gets state
7. NF recovers

*Steps 1 & 2 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs holding state, NFVI with VMs and a VD storing the state (NFVI's services); recovery time marked]

UC3 Comments

- The assumption is that the vDisk used by the VNFC is highly available, meaning that:
  - Failures are handled by the NFVI transparently for the VNF.
  - The VNF has access to its state stored in the vDisk for at least 99.999% of the time.
- The VNFC's availability management could be:
  - Embedded in the VNF, or
  - More likely: done by the VNFM, which requires additional communication between the VNF and VNFM (see UC3-b).
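The checkpoint-and-restore pattern behind UC3 can be sketched as follows. The VirtualDisk class is an in-memory stand-in for the highly available vDisk, and the per-session counter is an invented example of VNFC state; neither reflects a real NFVI interface.

```python
import json
from typing import Dict, Optional


class VirtualDisk:
    """In-memory stand-in for the highly available vDisk (VD)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def write(self, key: str, data: str) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> Optional[str]:
        return self._blobs.get(key)


class Vnfc:
    """A stateful VNFC that checkpoints after every state change."""

    def __init__(self, name: str, disk: VirtualDisk) -> None:
        self.name = name
        self.disk = disk
        self.state: Dict[str, int] = {}

    def handle_request(self, session_id: str) -> None:
        # Hypothetical state: number of requests seen per session.
        self.state[session_id] = self.state.get(session_id, 0) + 1
        self.checkpoint()

    def checkpoint(self) -> None:
        self.disk.write(self.name, json.dumps(self.state))

    def restore(self) -> None:
        """The "VNFC gets state" step: reload the last checkpoint, if any."""
        raw = self.disk.read(self.name)
        self.state = json.loads(raw) if raw is not None else {}
```

Checkpointing on every change keeps the sketch simple. A real VNFC would trade checkpoint frequency against the amount of state it can afford to lose.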

UC3-b: Stateful VNF with No Redundancy
Failure detection: VNF only
VNFC checkpoints its state to VD, which is HA.

1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNFM isolates VNFC
6. VNFM repairs VNFC
7. VNFC gets state
8. NF recovers

*Steps 1 & 2 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs holding state, NFVI with VMs and a VD storing the state (NFVI's services); recovery time marked]

UC4: Stateful VNF with No Redundancy
Failure detection: VNF & NFVI
VNFC checkpoints its state to VD, which is HA.

1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*

VNF path:
5a. VNF detects the failure
6a. VNF reports to VNFM

NFVI path:
5b. NFVI detects the failure
6b. NFVI reports to VIM

7. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VIM informs VNFM
12. VNFM repairs VNFC
13. VNFC gets state
14. NF recovers

*Steps 1-4 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs holding state, NFVI with VMs and a VD storing the state (NFVI's services); recovery time marked]

UC4 Comments

- Comments of UC3 apply.
- Considering the time needed to repair the VM, the target SALs cannot be met without the introduction of some redundancy.
- The VNF(M) does not know the actual cause of the failure, which could be a VM, host, hypervisor, networking, etc. failure. The VIM needs to resolve the cause.
- The VNFM and NFVI may detect the failure in any order, which in turn determines the order and exchange of messages between the VNFM and VIM.
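The point about VM repair time can be made concrete with a small recovery-time budget check. This is a sketch under the assumption that the steps on the recovery path run one after another and that their durations are supplied by the caller; the function names are invented for the example, and no measured durations are implied.

```python
from typing import Dict


def total_recovery_time(step_durations_s: Dict[str, float]) -> float:
    """Sum the durations of the steps on the recovery path, in seconds."""
    return sum(step_durations_s.values())


def meets_bound(step_durations_s: Dict[str, float], bound_s: float) -> bool:
    """Check whether the summed recovery time stays within a SAL bound."""
    return total_recovery_time(step_durations_s) <= bound_s
```

In UC4 the VM repair lies on the recovery path, so its duration is part of the sum. With redundancy, as in UC2, the repair happens after the NF has already recovered and does not count toward the bound.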

UC5: Stateless VNF with Redundancy
Failure detection: VNF only
Spare VNFC may or may not be instantiated.

1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF restores redundancy

*Steps 1 & 2 are simultaneous; they are separated for clarity.

Nothing new in this scenario.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) and spare VNFCs, NFVI with VMs (NFVI's services); recovery time marked]

UC6: Stateless VNF with Redundancy
Failure detection: VNF & NFVI
Spare VNFC may or may not be instantiated.

1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*

VNF path:
5a. VNF detects the failure
6a. VNF fails over
7a. NF recovers

NFVI path:
5b. NFVI detects the failure
6b. NFVI reports to VIM
7b. VIM reports to VNFM

8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VNF restores redundancy

*Steps 1-4 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) and spare VNFCs, NFVI with VMs (NFVI's services); recovery time marked]

UC5 & UC6 Comments

- Comments are similar to UC1 & UC2.

UC7: Stateless VNF with No Redundancy
Failure detection: VNF only

1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNF isolates VNFC
6. VNF repairs VNFC
7. NF recovers

*Steps 1 & 2 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs, NFVI with VMs and a VD (NFVI's services); recovery time marked]

UC8: Stateless VNF with No Redundancy
Failure detection: VNF & NFVI

1. VM fails
2. VM Service fails
3. VNFC fails
4. NF fails*

VNF path:
5a. VNF detects the failure
6a. VNF reports to VNFM

NFVI path:
5b. NFVI detects the failure
6b. NFVI reports to VIM

7. VIM reports to VNFM
8. VNFM ok to VIM
9. VIM repairs VM
10. VM Service recovers
11. VIM informs VNFM
12. VNF repairs VNFC
13. NF recovers

*Steps 1-4 are simultaneous; they are separated for clarity.

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs, NFVI with VMs and a VD (NFVI's services); recovery time marked]

UC7 & UC8 Comments

- In UC7 the HA mechanism may be embedded in the VNF; otherwise the VNFM needs to provide it, including the failure detection.
- In addition, for UC8 the comments are similar to UC4.
- Most importantly, the target SALs cannot be achieved in the case of UC8 without redundancy.

UC9: Stateless VNF with No Redundancy
Failure detection: VNF only, but repeatedly

1. VNFC fails
2. NF fails
3. VNF detects the failure and counts
4. VNF isolates VNFC
5. VNF repairs VNFC
6. NF recovers

The sequence (marked UC7 on the slide) then repeats: VNFC fails (2), VNFC fails (3), VNFC fails (4).

Fault is not in the VNFC!

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFCs numbered 1 to 4, NFVI with VMs and a VD (NFVI's services)]

UC9: Stateless VNF with No Redundancy (continued)
Failure detection: VNF only, but repeatedly

After the repeated failures (VNFC fails 2, 3, 4), the VNF escalates:

N. VNF reports to VNFM
N+1. VNFM reports to VIM
N+2. VIM isolates VM
N+3. VIM repairs VM
N+4. VM Service recovers
N+5. VNF repairs VNFC
N+6. NF recovers

[Diagram: NFVO, VNFM and VIM; NF (VNF's services), VNF with active (ACT) VNFC, NFVI with VMs and a VD (NFVI's services)]

UC9 Comments

- Repeated failures at the VNF level within a short period of time typically indicate that the fault is not at the VNF level; it only manifests there.
- This applies to any of the cases in which the failure is detected at the VNF level only.
- Faults in the HW, host OS, or hypervisor may propagate to the tenant VMs in such a way that only the tenant can detect them.
- The VIM is the only entity that knows the relation between the physical and virtual resources and can correlate these failures observed in one or more tenants.
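The "detects the failure and counts" step of UC9 can be sketched as a sliding-window counter that switches from local repair to escalation. The threshold of four matches the four failures shown on the slide; the window length, the class name and the returned action strings are assumptions made for this example.

```python
from collections import deque
from typing import Deque


class RepeatedFailureDetector:
    """Counts VNFC failures in a time window and decides when to escalate."""

    def __init__(self, threshold: int = 4, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s      # assumed value, not from the slides
        self._times: Deque[float] = deque()

    def record_failure(self, now: float) -> str:
        """Register a failure at time `now` and return the action to take."""
        self._times.append(now)
        # Forget failures that fall outside the sliding window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()
            return "report_to_vnfm"   # step N: the fault is likely below the VNF
        return "repair_locally"       # steps 4-6, as in UC7
```

Failures that are spread far apart in time never reach the threshold, so isolated faults are still handled inside the VNF.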

To Be Considered

- HA of the VD
- HA of the network
- Different redundancy schemas
- HA of the network switches and routers
- Correlation of failures
  - Failure of multiple VMs in the same host
- Different types of failures (e.g. compute, storage…)
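Correlating failures of multiple VMs on the same host could look roughly like the sketch below. It assumes the VIM holds a VM-to-host mapping and a list of VMs reported as failed; the function name and the minimum count of two are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def suspect_hosts(
    vm_to_host: Dict[str, str],
    failed_vms: Iterable[str],
    min_failures: int = 2,
) -> Dict[str, List[str]]:
    """Group failed VMs by host and keep hosts with several failed VMs."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for vm in failed_vms:
        host = vm_to_host.get(vm)
        if host is not None:          # ignore VMs the mapping does not know
            by_host[host].append(vm)
    return {
        host: sorted(vms)
        for host, vms in by_host.items()
        if len(vms) >= min_failures
    }
```

A host that appears in the result is a candidate for a fault in the hardware, host OS or hypervisor rather than in the individual VMs.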