Geographically Dispersed Resiliency (GDR) for Power Systems

1 Geographically Dispersed Resiliency (GDR) for Power Systems
The VM restart HADR solution for Power Systems

2 Executive Summary
- Many shops are using manual and/or complex middleware-replication DR implementations
- Many shops do not test their DR operations due to complexity; best-of-breed shops test DR biannually
- RPO capabilities of many contemporary DR implementations are unacceptable
- IBM Geographically Dispersed Resiliency (GDR) provides an easy-to-deploy, easy-to-use DR solution
  - Significantly less expensive than contemporary options from both a CAPEX and an OPEX perspective
  - Scales from very small shops to large enterprise shops
- GDR provides total automation through integration with the PowerVM platform
  - Highly automated, consistent, reliable disaster recovery; easy to conduct compliance testing
- IBM Services offers customization of the GDR solution and deployment assistance
  - Save costs by customizing the use of Enterprise Pools and systems in the DR site
  - DR testing without disruption to the main site
- GDR along with customized services provides a complete Power Systems data center DR solution
  - Recovery Time Objective (RTO), Recovery Point Objective (RPO)
  - Repeatable, reliable, simple-to-use DR testing

3 Where is your disaster recovery profile?
- 7% Confident they can execute their D/R plan
- 62% No D/R plan, no offsite copies of data, or copies of data nearby
- 12% Regular testing, but not confident they can execute their D/R plan
- 19% D/R plan in place, copies in offsite facilities, but no D/R testing

4 Introducing: GDR for Power Systems
- Announced: Oct 11, 2016; Generally Available: Nov 18, 2016; enhancements planned for 1H 2017 & 4Q 2017
- Delivered as part of the GTS Resiliency Services offering
- New automation software: one-time charge, priced per hardware core (only those in VM restart partitions)
- Installation services and software maintenance from both Power and GTS
- BPs & distributors enabled to sell: April 18, 2017
- Two deployment models:
  - On customer premises (initial release)
  - DR as a Service: IBM Resiliency Services provides the DR infrastructure (~2017)

5 Introducing: GDR for Power Systems
VM restart based DR: a simple, reliable disaster recovery solution for Power Systems
- A disaster recovery solution for everyone
  - Automated disaster recovery management
  - Low cost to acquire and nearly no cost to manage
  - Eliminates hardware and software resources at the backup site
- Easier deployment compared to clustering or middleware replication technologies
  - VM restart technology has no OS or middleware dependencies
- Support for IBM POWER7® and POWER8® systems
- Support for heterogeneous guest OSs: AIX, Red Hat, SUSE, Ubuntu, IBM i

6 K-Sys: C(K)ontrol System LPAR
K-sys (Controller System): an AIX LPAR that orchestrates the DR operations
- Alerts the administrator about key events
- Administrator-initiated DR automation
- Scripting support: daily validations & event notifications
[Diagram: the K-sys controller LPAR oversees both sites, monitoring the HMCs, VIOSs, LPARs (VMs), and storage at Site 1 and the HMCs, VIOSs, and storage at Site 2, with networks connecting the sites and storage mirroring between them.]

7 GDR for Power Systems – how it works
- The storage subsystem at the backup host is prepared and mapped to the VIOS
- VM1 and VM2 are booted up: the VMs from Site 1 are now restarted on the backup host in Site 2
- The underlying mechanism that enables this to happen is the KSYS orchestrator at Site 2
- From a customer perspective, this operation is accomplished with a single command (see the sketch below)
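
To give a feel for the single-command experience, here is a minimal sketch that wraps a site-level move in a small helper. The ksysmgr install path and the "move site from=... to=..." argument syntax are assumptions made for illustration only; check the GDR documentation for the exact command form.

```python
import subprocess
import sys

KSYSMGR = "/opt/IBM/ksys/ksysmgr"   # assumed install path; verify on your KSYS LPAR

def move_site(src="Site1", dst="Site2", planned=True):
    """Ask the KSYS orchestrator to move all managed VMs from src to dst.

    The argument syntax below is illustrative, not authoritative.
    """
    cmd = [KSYSMGR, "move", "site",
           "from=" + src, "to=" + dst,
           "dr_type=" + ("planned" if planned else "unplanned")]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(move_site())   # planned move from the active site to the backup site
```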

8 Automation: Critical for successful Business Continuity
- Reliable, consistent recovery time; essentially eliminates human intervention and errors
- Capacity management
  - Cross-site or intra-site CPU and memory adjustments before DR
  - Enterprise Pool exploitation
- Single point of control
  - Centralized status reporting
  - Centralized administration through HMCs (e.g., centralized LPM initiations)
  - Uni-command based administration
- Validation
  - Daily verification across sites (e.g., checks for missing mirrors) with alerts to the administrator
  - Facilitates simple-to-use regular testing for repeatable results

9 GDR product licensing example
Licensing structure:
- No-charge base PID registered to the KSYS server
- Two tier features: small or medium (the DR system must be equal to or lower than the production site server processor group, or software tier)
- One quantity feature = number of processor cores to be restarted (# of restart features = number of cores to be restarted)
List price for managed cores (U.S. prices, which can vary by geo and are subject to change at any time):
- Small processor group: $1,020 per core
- Medium processor group: $1,575 per core
Implementation service package options: 80, 120, or 240 hours (engagements beyond 240 hours also available) at $23,886, $35,828, and $71,657 respectively
Example: 4 systems, 2 production and 2 DR
- Production site: 2 systems, 5 VM partitions, 14 production cores, 14 AIX LPPs, 14 PowerVM LPPs
- DR site: 2 recovery systems, 5 VM partitions, 14 DR cores, plus 1 KSYS system with one AIX LPAR for KSYS (1 AIX license)
- Licensing: 1 base PID, tier feature = small, quantity feature = 14 restart features => $14,280 plus the $23,886 implementation package = $38,166 (U.S. prices, subject to change at the discretion of IBM)
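
The arithmetic behind this example can be checked with a short sketch; the figures are taken directly from the slide (small-tier list price of $1,020 per managed core, 14 restart cores, and the 80-hour implementation package).

```python
# Worked example using the figures quoted on this slide (U.S. list prices)
SMALL_TIER_PRICE_PER_CORE = 1020   # $ per managed core, small software tier
IMPLEMENTATION_80_HOURS = 23886    # $ for the 80-hour implementation package

restart_cores = 14                 # cores to be restarted at the DR site

license_cost = restart_cores * SMALL_TIER_PRICE_PER_CORE   # 14 * 1020 = 14,280
total_cost = license_cost + IMPLEMENTATION_80_HOURS         # 14,280 + 23,886 = 38,166

print(f"GDR licenses: ${license_cost:,}")
print(f"Total with 80-hour implementation: ${total_cost:,}")
```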

10 GDR First Release Capabilities
Capability: customer value
- Supports P7 & P8 systems: enables customers to move older P7 systems to the DR site and pair them with P8 in the main data center
- Support for AIX, IBM i, and Linux guest VMs: DR for AIX, IBM i, and Linux (all major flavors) enables a uniform DR solution for the Power platform
- Enterprise Pool support: flexible capacity management to reduce costs
- Daily validation: early detection of faulty configuration and other issues
- Storage replication: offloaded, uniform data copy methods; support for EMC SRDF Async in 2016
- Customization framework: plug-in scripts to run custom checks every day and to process custom events as they occur
- Easy to deploy: a deployment of fewer than 10 steps enables simplified DR

11 GDR Prerequisites
- Guest OS in VMs:
  - AIX V6 or later
  - IBM i (June 2017)
  - Linux: Red Hat (LE/BE) 7.2 or later, SUSE (LE/BE) 12.1 or later, Ubuntu 16.04
- VIOS (2016)
- HMC V8 R8.6.0 (2016)
- Storage:
  - EMC VMAX family, Solutions Enabler SYMAPI, PowerPath
  - IBM DS8K, SVC, Storwize (June 2017)
- KSYS LPAR: AIX 7.2 TL1

12 GDR for Power: DR Roadmap
Milestones: early prototype and beta release; GA Release 1.1 (Nov 2016); Release 1.1 SP1 (Jun 2017); GA Release 1.2 (Sep 2017); GA Release 1.3 (4Q 2017); further enhancements in 2018
Capabilities delivered across the roadmap:
- Support for P7 and P8 systems; support for vSCSI and NPIV
- EMC SRDF Async support; capacity management; admin-controlled recovery
- Support for other storage replication: SVC/Storwize Sync & Async, DS8K Async, EMC Sync
- IBM i support
- Advanced DR policies (host groups, etc.); Failover Rehearsal (DR test)
- Hitachi mirror support; VLAN-per-site support; support for sub-capacity LPAR DR start
- Linux KSYS; PowerVC integration; HA for KSYS; VM agent with site-specific network support
- DS8K Sync, XIV, HP 3PAR
- Future: graphical management, NovaLink integration

13 Who should be interested in GDR?
- PowerHA on AIX Standard Edition customers with manual DR operations
- AIX customers with no HA solution, or a roll-your-own HA solution
- AIX customers using PowerVC for their datacenter HA operations
- Linux on Power customers
- IBM i customers; Storwize and DS8K support (GA June 2017)
- Customers using host-based middleware software replication for their DR solution
  - Software replication solutions tend to require human intervention to ensure synchronization of the source and target databases before a DR test, and they require additional processing overhead
  - GDR is simple to use, the production and DR VMs are in sync by definition, and the restart is simple
- Customers looking to have their DR operations hosted via DRaaS
  - IBM DRaaS (statement of direction)
  - 3rd-party DRaaS

14 GDR & DRaaS: DR as a Service (Hybrid Cloud)
IBM Cloud Centers
- Easy to deploy and manage DR solution, with central control
- Extensive automation, monitoring, and validation across sites
- Manage DR for hundreds of LPARs (AIX, Linux, IBM i)
- Non-disruptive DR testing
- Entire datacenter (hundreds of LPARs) failover in less than an hour
- No software costs for the backup site
- Reduced hardware costs (Enterprise Pool support); pair your old P7 systems with P8 for DR
- Support for EMC SRDF; 2017: Storwize, DS8K, Hitachi, etc.
Wiki, FAQ, blogs:

15 Example topology: Two-site, two-server (E870s) GDR configuration
- Two-site GDR & CBU for Enterprise Systems (CBU feature code: EB3K)
- GDR provides the failover/restart capability to the secondary location
- CBU for Enterprise Systems at the disaster recovery location for huge savings

16 Power Systems CBU for Enterprise Servers Offering
Offering for Power System E880, E880C, E870 & E870C customers with HA/DR deployments requiring fast failover via active standby memory.

Offering requirements overview:
- Primary system and CBU system must be the same machine type and model (only one CBU to one production server for registration purposes, but multiple production servers to one CBU is allowed; more than one Enterprise CBU to a primary or production server is not allowed)
- The primary can be a new or installed box; the CBU must be a new box
- A minimum of one entitlement of AIX or IBM i & PowerHA on the CBU, or, if an alternative HA/DR solution is used, as many IBM i or AIX entitlements as needed to support the workload (such as a middleware replication workload)
- 8 processor static activations on the CBU (no more, no less)
- A minimum of 25% of DIMM memory active on the CBU
- No-charge memory ECOD days must be activated upon install of the CBU system and remain active for 365 days
- Registration of the primary system and CBU prior to CBU order fulfillment (registration validates the configuration); the primary and CBU must be within the same enterprise

CBU offering features:
- Deeply discounted processor nodes matching the installed production server processor nodes
- No-charge active standby memory = 365 x (N - 8) x 32 GB, where N is the number of active mobile cores on the production server, renewable annually (a worked example follows this slide)
- Mobile processor activations are transferred from production to CBU via Enterprise Pool transfers
- The CBU system can be ordered with no-charge processor nodes (subject to the primary system configuration listed below) and one year's worth of ECOD processor days and memory; ECOD processor days will match the E880 primary system cores that are licensed with PowerHA
  - One or two nodes on the primary = 1 no-charge node on the CBU
  - Three or four nodes on the primary = 2 no-charge nodes on the CBU

Entitlement transfer (primary to CBU):
- All transferable entitlements must originate on the primary system and may not run concurrently on the primary system and the CBU system
- Subsequent to the initial workload deployment, production partitions may be moved to the CBU system for workload balancing, etc.
- Only whole processor units are transferable (i.e., no micro-partitions)
- The total number of processor entitlements running production across both servers cannot exceed the original total licensed entitlements
- One entitlement (AIX/IBM i and PowerHA) on the primary is not transferable, and one core must be permanently licensed on the CBU

E880 configuration requirements:
- This CBU offering is available to clients with an existing E880, or purchasing a new E880, that meets specific configuration requirements
- The E880 primary system must have a minimum of 75% of the processors activated and 50% of installed DIMM memory active
- The CBU system must have a minimum of 25% DIMM memory active, with no-charge ECOD days to match the primary (minus 25%)
- The CBU will have 8 processor activations and one year of ECOD days for each PowerHA processor licensed on the primary system (minus 8)
- IBM will provide the client three hundred and sixty-five (365) ECOD processor days for each active processor core licensed with PowerHA, minus the eight processor cores that the client is required to purchase on the Power E880 CBU system; these ECOD processor days can only be used on the Power E880 CBU system
- ECOD activations on the CBU must be renewed annually
- One or two nodes on the primary = 1 no-charge node on the CBU; three or four nodes on the primary = 2 no-charge nodes on the CBU
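
A quick check of the no-charge active standby memory formula quoted above, 365 x (N - 8) x 32 GB with N the number of active mobile cores on the production server; the core count used below is an assumed value for illustration only.

```python
# No-charge active standby memory on the CBU, per the formula on this slide:
#   365 x (N - 8) x 32 GB, where N is the number of active mobile cores on the
#   production server (the 365 factor corresponds to one year of ECOD memory days).

def cbu_standby_memory(active_mobile_cores):
    """Evaluate the slide's no-charge memory formula for a given core count."""
    return 365 * (active_mobile_cores - 8) * 32

# Assumed example: a production server with 40 active mobile cores
print(cbu_standby_memory(40))   # 365 * 32 * 32 = 373,760
```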

17 Backup

18 Host Pairing & Site Level DR
- Host pairing: hosts are paired one-to-one across sites; host capacity should be sufficient to host all VMs
- LPM and Remote Restart can coexist: K-sys acts as an observer for these operations and aligns with them as part of its daily scans of the configuration
- Disk group: K-sys creates a single disk group for the entire site and manages it
- DR move support: site-level DR support; all hosts are moved to the backup site as part of the move
[Diagram: Host Pair 1 = Host 11/Host 21, Host Pair 2 = Host 12/Host 22, Host Pair 3 = Host 13/Host 23, Host Pair 4 = Host 14/Host 24, paired between Site 1 and Site 2 and sharing Disk Group 1.]

19 GDR: Cost savings on DR site
Cross-site Enterprise Pool deployments:
- Optional Enterprise Pool exploitation (cross-site)
- Easy-to-use interface to acquire and release resources centrally
- Quick resource allocation (no DLPAR-related delays involved)
[Fig 1: Two hosts in an Enterprise Pool, paired across Site 1 (home, active) and Site 2 (backup), each managed through its HMC (HMC_1_1, HMC_2_1) under the K-sys.]
Recovery-site Enterprise Pool capacity management: when Site 1 fails, the administrator does the following (a scripted sketch follows):
1. Go to the K-sys and run a command to reduce the resources allocated to Host_2_2 (low priority)
2. Return the resources used by Host_2_2 to the Enterprise Pool (EP)
3. Using a command, allocate those resources to Host_2_1 on Site 2
4. Use ksysmgr to start the recovery on Site 2
[Fig 2: Enterprise Pool within a site: low-priority LPARs on Host 2_2 release capacity so that Host 2_1, paired with Site 1, can restart the production VMs.]
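
As a rough illustration of how those four steps could be driven from a script, the sketch below strings them together; every helper function here is a hypothetical placeholder for the corresponding HMC/Enterprise Pool or ksysmgr action, since the exact commands depend on the environment.

```python
# Hypothetical orchestration of the recovery-site Enterprise Pool steps above.
# Each helper is a placeholder for the real HMC/Enterprise Pool or ksysmgr action.

def reduce_host_resources(host):      # step 1: shrink the low-priority host
    print(f"reducing CPU/memory allocated to {host}")

def return_to_pool(host):             # step 2: give its capacity back to the Enterprise Pool
    print(f"returning {host} resources to the Enterprise Pool")

def allocate_from_pool(host):         # step 3: assign that capacity to the recovery host
    print(f"allocating Enterprise Pool resources to {host}")

def start_recovery(site):             # step 4: ksysmgr-driven restart of the VMs
    print(f"starting GDR recovery on {site}")

if __name__ == "__main__":
    reduce_host_resources("Host_2_2")
    return_to_pool("Host_2_2")
    allocate_from_pool("Host_2_1")
    start_recovery("Site 2")
```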

20 GDR Validations & Notifications
- Daily proactive environment validations, with text alerts to the administrator about issues/events found
- Extensible framework: the administrator can plug in scripts to
  - Perform custom checks as part of verification
  - Handle events as they occur
(a sketch of such a check script follows)
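
As a rough illustration of the kind of plug-in check this framework is meant to host, the sketch below verifies that every mirrored disk group listed in a status file reports a synchronized state, and exits non-zero so the caller can raise an alert. The file path, line format, and exit-code convention are all assumptions for illustration; the real plug-in contract is defined by the GDR/KSYS documentation.

```python
#!/usr/bin/env python3
# Hypothetical daily-verification plug-in: flag any mirrored disk group that is
# not reported as "synchronized". Path, file format, and exit codes are
# illustrative assumptions, not the actual GDR interface.

import sys

STATUS_FILE = "/var/ksys/mirror_status.txt"   # assumed: one "name state" pair per line

def unsynchronized_groups(path):
    bad = []
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            name, _, state = line.strip().partition(" ")
            if state.lower() != "synchronized":
                bad.append(name)
    return bad

if __name__ == "__main__":
    failed = unsynchronized_groups(STATUS_FILE)
    if failed:
        print("Mirror check FAILED for: " + ", ".join(failed))
        sys.exit(1)   # non-zero exit lets the verification framework flag the event
    print("All mirrors synchronized")
```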

21 GDR for Power Service Offerings
- Site-specific network configuration: the IBM team will review, plan, and deploy VM-based scripts to achieve site-specific network configuration
- Flexible capacity management across sites: Services will develop scripts to automate CPU and memory capacity management across sites
- On-demand DR testing automation: Services will review the environment and develop scripts and processes to enable easy DR testing (clone the secondary copy and then start the VMs) without impacting production

22 DR Move Times
- KSYS does as much parallel restart as possible
- Up to 32 VM restarts are initiated per VIOS/host
- The number of HMCs and VIOS resources plays a role in restart times
- Early measurements; more work to do

Measured iterations (times in minutes and seconds):
- Iteration 1 (planned): Site 1 hosts = P8, Site 2 hosts = P7+, 2 HMCs/site, 2 hosts/site, 140 disks; 100 VMs (Host 1: 60, Host 2: 40). Shutdown 9 m 32 s, replication reversal 3 m 15 s, restart VMs 31 m 20 s, total 44 m 7 s
- Iteration 2 (unplanned, same configuration): no shutdown phase, replication reversal 4 m, restart VMs 21 m
- Iteration 3: Site 2 hosts = P7, 1 HMC/site, 1100 disks; 120 VMs (Host 1: 50, Host 2: 70). Shutdown 22 m 8 s, replication reversal 3 m 58 s, restart VMs 35 m 29 s, total 61 m 31 s
- Iteration 4: Shutdown 3 m 16 s, replication reversal 1 m 26 s, restart VMs 29 m 8 s, total 34 m
- Iteration 5: Site 1 and Site 2 hosts = P7/P7+, 2 HMCs/site, 5 hosts/site, 395 disks; 300 VMs (Hosts 1-4: 55 each, Host 5: 70). Shutdown 20 m, restart VMs 55 m 5 s, total 79 m

23 SAP HANA Disaster Recovery
- System test: SAP HANA VMs + a NetWeaver VM taken through back-and-forth DR failovers
- 96-hour continuous failover testing: many failovers and failbacks
- SAP HANA and its workload are checked for recovery and functionality (not performance)
[Diagram: home-site Host 1_2 and backup-site Host 2_2 paired under the Power GDR KSYS; the NetWeaver and HANA application VMs (VM1, VM2) run on VIOS 12 / VIOS 22, with storage replication between the sites.]

24 Priority based VM Restarts
Tier VMs per your environment's needs and start them in prioritized order (a sketch of the ordering follows):
- Priority 1 (HIGH): VM_11, VM_12, VM_13
- Priority 2 (MEDIUM): VM_21, VM_22, VM_23
- Priority 3 (LOW): VM_31, VM_32, VM_33
[Diagram: the Power GDR KSYS restarts the tiers in order from the home site to the backup site.]
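
To make the restart ordering concrete, here is a minimal sketch (not GDR code) that groups VMs by tier and restarts them tier by tier, highest priority first; the VM names mirror the slide, and restart_vm is an illustrative placeholder.

```python
# Illustrative only: restart VMs tier by tier, highest priority first.
PRIORITY_TIERS = {
    1: ["VM_11", "VM_12", "VM_13"],   # HIGH
    2: ["VM_21", "VM_22", "VM_23"],   # MEDIUM
    3: ["VM_31", "VM_32", "VM_33"],   # LOW
}

def restart_vm(name):
    # Placeholder for whatever actually restarts the VM on the backup site.
    print(f"restarting {name} on the backup site")

def restart_by_priority(tiers):
    for tier in sorted(tiers):        # tier 1 (HIGH) first
        for vm in tiers[tier]:
            restart_vm(vm)            # within a tier, restarts could run in parallel

restart_by_priority(PRIORITY_TIERS)
```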

25 Host Group based GDR management
- Create groups of hosts and do DR failovers at the host-group level (e.g., a DR test of Client_1 one weekend)
- VM operations such as LPM are expected to occur within the group
[Diagram: Site 1 (active) and Site 2 (recovery/backup). Host Group 1 (Client_1_Hosts_Group): Host 11/Host 21 and Host 12/Host 22 pairs on Disk Group 1. Client_2_Hosts_Group: Host 13/Host 23, Host 14/Host 24, and Host 15/Host 25 pairs on Disk Group 2.]

26 DR Failover Rehearsal: Non-disruptive DR testing
Failover rehearsal: easy DR testing with no disruption to production or replication
- K-sys enables non-disruptive DR testing
- Storage on the recovery site is copied (S2C) and that copy is used to start the VMs on the backup system while the S1 -> S2 mirror continues
- Network isolation needs to be established by the administrator
- If site-specific VM network configuration is supported, test-related network configuration could also be supported
[Diagram: Site 1 (home, active) Host 11 with VIOS 1_11/1_12 and LPARs 1_11 .. 1_1m; Site 2 (backup) Host 21 with VIOS 2_11/2_12 hosting the rehearsal copies of the same LPARs; Disk Group 1 is mirrored from S1 to S2, with S2C as the rehearsal copy.]

27 Flex Capacity DR Support
Flexible capacity management for the backup site: DR failover with less (or more) CPU and memory capacity on the backup site, for example CPU=70% / Memory=90%, CPU=110% / Memory=120%, or CPU=MIN / Memory=MIN (use the minimum value of the profile on the backup site). A small worked example follows.
- Benefits/use cases:
  - Achieve a disaster recovery solution with fewer resources
  - Ideal for doing DR testing with fewer resources
  - Pair newer/more powerful systems with earlier systems
- Flex capacity support at the host, host group, site, or VM level
[Diagram: the home/active site fails over to the backup site via storage replication.]
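
A small worked example of what these flex percentages mean arithmetically; the home-site VM profile used here (8 processor units, 64 GB memory) is an assumed value for illustration.

```python
# Assumed VM profile on the home site: 8 processor units, 64 GB memory.
def flexed(value, percent):
    """Capacity to allocate on the backup site for a given flex percentage."""
    return value * percent / 100

home_cpu, home_mem_gb = 8, 64

print(flexed(home_cpu, 70), flexed(home_mem_gb, 90))    # CPU=70%, Memory=90%  -> 5.6, 57.6
print(flexed(home_cpu, 110), flexed(home_mem_gb, 120))  # CPU=110%, Memory=120% -> 8.8, 76.8
```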

28 Site Specific Network Configuration
Different networks on different sites: VM agent based network configuration
- Agent/scripts in the VM perform network configuration/re-configuration at boot time (a hedged sketch follows)
- 2016/17: services-based scripting
- Future: VM agent support for AIX and Linux
  - Boot-time site-specific network re-configuration
  - Boot device management
  - Application monitoring
  - VM health monitoring
[Diagram: LPAR 1 / VM1 on Host 11 at Site 1 (home, active) is restarted on Host 21 at Site 2 (backup), with boot-time configuration applied.]
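
As a rough sketch of what a boot-time re-configuration script might look like, the example below selects network settings based on the site the VM finds itself on. The site-marker file, the per-site addresses, and the apply step are all illustrative assumptions; a real script would drive the platform's own tools (for example mktcpip/chdev on AIX or ip/nmcli on Linux).

```python
# Hypothetical boot-time script: apply the network profile matching the current site.
SITE_MARKER = "/etc/gdr_site"                 # assumed marker identifying the site
NETWORK_PROFILES = {                          # assumed per-site addressing
    "site1": {"ip": "10.1.0.15/24", "gw": "10.1.0.1"},
    "site2": {"ip": "10.2.0.15/24", "gw": "10.2.0.1"},
}

def current_site():
    with open(SITE_MARKER) as fh:
        return fh.read().strip()

def apply_network(profile):
    # Placeholder: a real script would invoke the OS network tooling here.
    print(f"configuring {profile['ip']} via gateway {profile['gw']}")

if __name__ == "__main__":
    apply_network(NETWORK_PROFILES[current_site()])
```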

29 HA/DR evolution with Cloud
- For many workloads, the simplicity of restart HA/DR is preferred over the added value of cluster HA/DR
- Cluster HA/DR: critical workloads are protected using cluster HA/DR, typically 5% of the LPARs in the data center; VM restart HA is preferred for the rest of the LPARs/VMs
- VM restart HA/DR: restart HA/DR is a key requirement; the HMC and PowerVC provide restart HA support; some customers do restart DR manually (spreadsheets, etc.)
[Diagram: availability/coverage vs. technology/complexity: single server, then VM restart (non-critical workloads), then clusters (critical workloads), then fault tolerance.]

30 Cluster Disaster Recovery vs VM Restart Disaster Recovery
Cluster DR: PowerHA Enterprise Edition (similar to VMware SRM & GDPS)
- Faster recovery time
- Integrated solutions for workloads (e.g., HyperSwap, active-active sites)
vs. VM restart DR: GDR
- Simplified deployment model
- Reduced license costs
- Reduced administrative costs
- AIX and Linux guest VMs

