OpenStack Cloud DR PoC NEC India.


1 OpenStack Cloud DR PoC NEC India

2 Contents
Background: Requirement for DR Solution in Cloud; Different types of DR solutions; Selection Criteria for DR solutions
Environment Setup
Results: Summary; Test Scenarios; Functional test outcome; Performance test outcome; Issues identified
Appendix: Architecture of Trilio

3 Background
The following material describes the Disaster Recovery (DR) activity carried out for a Backup/Restore-based recovery solution, using the Trilio® v2.4 DR solution. This activity was carried out to:
Identify requirements and current market trends for cloud systems.
Examine possible solutions to these requirements, understand the solutions and how they can generate business, and create a PoC based on this understanding.
Develop talent who can write proposals, introduce the business requirements and operate the system.

4 Requirement for DR Solution in Cloud
With the migration of applications to cloud systems, cloud systems have become very important. Minor failures in these systems or human errors can cause major business impact (like [1], [2]).
A disaster recovery solution provides business continuity in case of a complete failure of the primary site. In a disaster situation, the primary site workload is switched to the Replica site, which becomes the active site so that the business is not affected.
This solution focuses on switching to the Replica site within a defined RTO/RPO, so that if such a disaster occurs, the customer knows the maximum time within which the system will come back online.
[1]: [2]:

5 Requirement for DR Solution in Cloud
Market trend for DR solutions:
Although the concept of DR in the cloud is still nascent, many SMBs are beginning to opt for DR to protect their business applications.
It is an attractive alternative for companies that are strapped for IT resources and have a secondary infrastructure which is not effectively utilized.
Having DR sites in the cloud reduces data centre costs and space, IT resources and IT infrastructure, leading to significant cost reduction.
Each leading Cloud Service Provider (AWS, Azure, OpenStack) now provides DR solutions, either on its own or via third-party software.

6 Different type of DR Solutions
Multiple options for the DR mechanism are available to fit customer requirements:
Option 1: Use backup images to recover at the DR cloud.
Option 2: Use storage replication to replicate data continuously to the secondary site for recovery.
Option 3: Combine the above two solutions into a hybrid solution.

7 Selection Criteria for DR solutions
Feature: Criteria
System Recovery: After restoration, we should have the same system that existed before the disaster.
Low Recovery Point Objective: The Recovery Point Objective (RPO) is determined by the last backed-up copy, that is, how much recent data may be lost. RPO depends on the method by which data is copied from Master to Replica (e.g. volume backup, block-level synchronization).
Low Recovery Time Objective: The Recovery Time Objective (RTO) defines how fast the system can be recovered. It depends on how the data has been managed at the Replica site and how quickly the necessary applications can be deployed there.
Supported Storages: Variety of storage types supported.
VM Granularity: Whether specific VMs can be selected for backup, which allows VM backups to be prioritized.
Recovery Points: Number of recovery points supported.
Hypervisor Compatibility: Types of hypervisors for which backup is supported.
Support for Incremental Updates: Whether backup images can be taken incrementally.
Maximum Number of VMs Supported: Maximum number of VMs which can be backed up.
Multi-site Support for DR: Whether multiple sites can be backed up to a single replica cloud.
Scaling: Whether the DR solution keeps working when the cloud is scaled up or down.
OpenStack Version Support: Whether the DR solution is limited to specific OpenStack versions.
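The RPO and RTO criteria above can be made concrete with a little date arithmetic. The following is a minimal sketch, not part of the PoC: the function names and the timestamps are illustrative assumptions, and the worst-case formula assumes a simple model in which backups start at a fixed interval and capture the state at their start time.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: time between the last usable backup and the disaster."""
    if last_backup > disaster:
        raise ValueError("last backup must not be later than the disaster")
    return disaster - last_backup


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the disaster and the service being back online."""
    if service_restored < disaster:
        raise ValueError("service cannot be restored before the disaster")
    return service_restored - disaster


def worst_case_rpo(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Assumed model: a backup captures the state at its start time and is only
    usable once finished, so the worst case is a disaster just before the next
    backup completes."""
    return backup_interval + backup_duration


# Hypothetical timeline, for illustration only.
last_backup = datetime(2017, 1, 1, 2, 0)
disaster = datetime(2017, 1, 1, 5, 30)
restored = datetime(2017, 1, 1, 7, 0)

rpo = achieved_rpo(last_backup, disaster)
rto = achieved_rto(disaster, restored)
worst = worst_case_rpo(timedelta(hours=6), timedelta(minutes=45))
print(rpo, rto, worst)
```

Under this model, shortening the backup interval or the backup duration lowers the worst-case RPO, while RTO depends only on how quickly the Replica site can be brought up.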

8 Environment Setup

9 Environment Setup
As mentioned earlier, the following material tracks the Proof of Concept work done for a DR solution using Option 1 (restoration using backup images). For this Proof of Concept, we used Trilio® DR Solution v2.4 and tested Trilio on:
Single-Site system: for functional testing
Multi-Site system: for scalability and performance testing

10 Trilio Architecture http://www.trilio.io
Trilio has 3 major components:
Trilio API: This component runs on the Controller and exposes the API for user control.
Trilio Datamover (DM): This component runs on all the Compute nodes. It is responsible for taking snapshots of the volumes and copying the data from the Compute nodes to the TrilioVault.
TrilioVault: This component runs as a VM and does most of the control and processing.
[Figure labels: Operator: send Backup/Restore command; Trilio API: initiate Backup/Restore; TrilioVault (VM): execute operations for VM and volume Backup/Restore; Trilio DM (three instances)]

11 Trilio DR Solution: High-level proposed design (Single Site)
[Figure labels: Operator: configure Trilio via GUI/CLI; execute OpenStack CLI / control via dashboard]

12 Trilio DR Solution: Single Site
[Figure: single-site deployment diagram. Nodes shown: Trilio Node, NFS Node, Controller Node, Compute Node. Labels shown: Trilio Services, Local Storage, NFS Mount, NFS FS, Mounted NFS, Rsync to 2nd Site, NTP, Glance, Keystone, mysql, RabbitMQ, Horizon, Neutron, Nova-api, nova-cond, nova-sched, Cinder, Trilio API, Trilio DM, Nova-compute, Cinder Volume, LVM, Linux Bridge, Neutron LinuxBrg Agent, HDD, Network Interfaces. Legend: Mgmt/Ext network, Data network, Control network, Trilio network.]
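The single-site diagram labels an "Rsync to 2nd Site" step from the NFS node. The deck does not give the actual command, so the following is only a sketch of how such a command line might be assembled. The backup directory, the remote host name and the choice of the `-a`, `-z` and `--delete` options are assumptions, and the command is built but not executed.

```python
from typing import List


def build_rsync_command(source_dir: str, remote_host: str, remote_dir: str,
                        delete: bool = True) -> List[str]:
    """Build an rsync argument list that mirrors a backup directory to a remote site.

    -a        archive mode (recursive, preserves permissions, times and ownership)
    -z        compress data during transfer
    --delete  remove files on the destination that no longer exist on the source
    """
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    command = ["rsync", "-a", "-z"]
    if delete:
        command.append("--delete")
    command += [source, f"{remote_host}:{remote_dir}"]
    return command


# Hypothetical path and host name, for illustration only.
cmd = build_rsync_command("/mnt/trilio-backups", "nfs.replica.example",
                          "/mnt/trilio-backups")
print(" ".join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` from a scheduled job. How often it runs bounds how stale the second site's copy can be, which feeds directly into the RPO.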

13 Single site Trilio deployment model: Hardware Spec.
Total nodes: 8 (Replica and Primary have the same configuration)
Columns: CPU, Memory (GB), HDD (GB), OS
Controller: 4, 6, 10, Ubuntu 14.04
Compute: 20, --same--
NFS: 2
Trilio: 16

14 Trilio DR Solution: Multi-Node (Primary-Site)
Layout for Multi-Node of the Primary Site (same on the secondary site)
[Figure: multi-node deployment diagram. Legend: As a VM, Baremetal. Nodes shown: Controller Node, Compute Node, Compute Node (on separate machine), NFS Node, Trilio Node. Labels shown: Trilio Services, Trilio API, Trilio DM, Keystone, Glance, mysql, RabbitMQ, Horizon, NTP, Neutron, Neutron LB Agent, Nova-api, nova-cond, nova-sched, Nova-compute, Cinder, Cinder Volume, LVM, Linux Bridge, NFS FS, Local Storage, HDD, eth1. Network: Data, Trilio, Control and External N/W.]

15 Multiple site Trilio deployment model: Hardware Spec.
Total nodes: 14 (7 on each site)
Columns: CPU, Memory (GB), Network Bandwidth, HDD (GB), OS
Controller-1: 1, 8, 10 Gbps, 20, Ubuntu 14.04
Controller-2: --same--
NFS: 4, 5, 10
Trilio: 18, 25
Compute-1: 32, 30
Compute-2: 100
Compute-3:
Note: Storage is on the Controller node.

16 Results

17 DR Test Scenarios

18 Summary of test outcomes
[Figure: ratings for DISASTER RECOVERY, PERFORMANCE and SCALABILITY; rating labels shown: GOOD, AVERAGE]
Test Execution Summary:
Columns: Type of Testing, # of tests executed, # of incomplete tests due to environment limitations, # of tests passed, # of tests failed due to Trilio issue, # of tests conditionally passed
Functional: 27, 23, 2
System: 6, 4
Reason for environmental limitations:
System: Due to lack of space, we could not recover the Trilio backup job for 12 VMs/15 VMs during system testing.

19 System Test Outcomes
[Chart: Backup size (GB) and Backup time (min)]
The following tests were conducted keeping the number of VMs constant and varying the size of the volumes. Backup time seemed to tend toward a more constant value as the volume size increased. Backup size seemed to increase with the volume size and the amount of deduplicated data.

20 System Test Outcomes
As expected, RPO, RTO and backup size increase with VM and volume size.
VM and volume size do not directly affect the backup size if the amount of duplicated data in the volume is high.
RTO could not be calculated effectively because the systems failed to restore at higher configurations (restoring a larger number of VMs).
[Chart axis: Number of Volumes]
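The observation that backup size depends on duplicated data rather than on raw volume size can be illustrated with a toy deduplication model. This is a sketch only: it uses fixed-size chunks hashed with SHA-256, which is an assumption made for illustration and not a description of how Trilio deduplicates data.

```python
import hashlib


def deduplicated_size(data: bytes, chunk_size: int = 4096) -> int:
    """Return the number of bytes stored if each distinct chunk is kept only once."""
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            # First time this chunk content is seen, so it has to be stored.
            seen.add(digest)
            stored += len(chunk)
    return stored


# Small "volume" with four distinct chunks: nothing can be deduplicated.
small_unique = b"".join(bytes([i]) * 4096 for i in range(4))

# Much larger "volume" made of one chunk repeated 100 times.
large_repetitive = bytes([7]) * 4096 * 100

small_stored = deduplicated_size(small_unique)
large_stored = deduplicated_size(large_repetitive)
print(len(small_unique), small_stored)
print(len(large_repetitive), large_stored)
```

In this model the larger volume produces the smaller backup, because only one distinct chunk has to be stored.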

21 Openstack Issues identified
Trilio: After a restoration fails, Neutron ports, Glance images and Cinder volumes may continue to exist in the system. On restarting the restore, we observed that new ports were created, causing us to reach the quota limit at an accelerated pace. If a restore fails, the created ports, images and volumes could be deleted, just as the recovered VMs are deleted when a restore fails.
In Mitaka, Glance v1 configuration files on the Controller, where Keystone is defined, can use “default” as the default-project-id. However, when accessing Keystone authentication from a different node, the exact ID needs to be specified in “default-project-id”.
Ubuntu has a problem wherein static IP addresses can be overridden by DHCP-provided IP addresses if dhclient is running. This was solved by shutting down dhclient.
RabbitMQ reported a ChannelError, which caused many agents to go out of sync in Mitaka.
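One way to find the ports, images and volumes left behind by a failed restore is to compare resource IDs recorded before and after the attempt. The sketch below works on plain sets of IDs. In practice the snapshots would come from the Neutron, Glance and Cinder APIs, which is left out here, and the function name and the IDs are hypothetical.

```python
from typing import Dict, Set


def leftover_resources(before: Dict[str, Set[str]],
                       after: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per resource kind, the IDs that exist after the failed restore
    but did not exist before it."""
    return {kind: ids - before.get(kind, set()) for kind, ids in after.items()}


# Hypothetical snapshots taken before and after a failed restore.
before = {
    "ports": {"port-1", "port-2"},
    "images": {"img-1"},
    "volumes": {"vol-1"},
}
after = {
    "ports": {"port-1", "port-2", "port-3", "port-4"},
    "images": {"img-1", "img-2"},
    "volumes": {"vol-1"},
}

orphans = leftover_resources(before, after)
for kind in sorted(orphans):
    print(kind, sorted(orphans[kind]))
```

The result is only a list of cleanup candidates. Resources created by other users during the same window would also appear in the difference, so each ID should be checked before it is deleted.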

22 Trilio DR Solution : Issues identified and dependencies
The following issues/dependencies were identified during the initial investigation:
Dependency on Nova User/Group ID: As per the deployment guide shared by Trilio, the Nova User/Group ID should be 162. While deploying OpenStack, we observed that the User ID of Nova may not always be 162, so we need to change the User ID of Nova after it is deployed, which may be seen as a restriction.
Dependency on Ubuntu 14.04: Trilio requires the Controller/Compute nodes to be on Ubuntu 14.04, while OpenStack in the current version favors It would also be beneficial to test Trilio with CentOS/RedHat.
NFS Permission: It seems that without the ``no_root_squash`` option and mode 755 on the NFS-mounted folder, Trilio cannot write data to the mount point.
Observation: When the Controller was working at 100% capacity (w.r.t. CPU utilization), the Trilio backup job was stuck, which was expected. However, the job did not recover after the Controller returned to normal processing. We think this
Backup size of incremental images: Incremental and full backup images seem to be of the same size. This may put a restriction on the overall disk size. If incremental images were smaller than full images, it would help us maintain more backup generations.
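The effect of incremental image size on the number of backup generations can be shown with simple arithmetic. The sketch below assumes one full backup followed by incrementals on a disk of fixed capacity. The function name and the sizes are illustrative and are not measurements from the PoC.

```python
def max_generations(disk_gb: float, full_gb: float, incremental_gb: float) -> int:
    """Number of backup generations (one full plus incrementals) that fit on disk."""
    if full_gb <= 0 or incremental_gb <= 0:
        raise ValueError("backup sizes must be positive")
    if disk_gb < full_gb:
        # Not even the initial full backup fits.
        return 0
    return 1 + int((disk_gb - full_gb) // incremental_gb)


# Illustrative numbers only: 100 GB of backup storage and a 10 GB full backup.
same_size = max_generations(disk_gb=100, full_gb=10, incremental_gb=10)
smaller = max_generations(disk_gb=100, full_gb=10, incremental_gb=1)
print(same_size, smaller)
```

With incrementals as large as the full image, the disk holds 10 generations in this example. With incrementals one tenth of that size, it holds 91.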

23 Conclusion
There are some issues with the user interface and the error messages, which have been reported to Trilio for improvement in a future version if not addressed in the current v2.4.
Trilio v2.4 uses OpenStack Mitaka. As Mitaka is EOL, we recommend that Trilio provide support for a newer version of OpenStack.
We verified that Trilio uses Glance v1. As Glance v1 has been deprecated, we recommend that Trilio provide support for Glance v2.
Trilio uses an older version of Ubuntu, which is not supported by recent releases of OpenStack. We recommend that Trilio provide support for a newer version of OpenStack.
Overall, we recommend using Trilio, subject to the issues and changes mentioned earlier.
