Published by Godwin Wells. Modified over 9 years ago.
2
© 2011 IBM Corporation
Architect’s 2013 Guide to Designing HA, BC, and DR - Best Practices
Industry Best Practices - IT HA DR BC
Provided by: John Sing, Executive IT Consultant, San Jose, California, singj@us.ibm.com
3
© 2013 IBM Corporation. Industry Best Practices – IT HA DR BC. September 2013
Contents
- Principles of architecting traditional IT HA, DR, BC
- Technology and location considerations
- Traditional Workloads vs. Internet Scale Workloads
- Best Practices Step by Step Methodology
4
Four Stages of Data Center Efficiency (prerequisites for HA/BC/DR)
http://public.dhe.ibm.com/common/ssi/ecm/en/rlw03007usen/RLW03007USEN.PDF
http://www-935.ibm.com/services/us/igs/smarterdatacenter.html
April 2012
5
Business Process is the Recoverable Unit
[Diagram: Infrastructure, Application, and Business layers, showing Application 1 to 3, an analytics report, management reports, http://xyz.xml, a decision point, MQSeries, WebSphere, SQL, DB2, and Business processes A through G]
1. An error occurs on a storage device that correspondingly corrupts a database
2. The error impacts the ability of two or more applications to share critical data
3. The loss of both applications affects two distinctly different business processes
IT Business Continuity must recover at the business process level
6
Still true: synergistic overlap of valid data protection techniques
- Protection of critical business data
- Operations continue after a disaster
- Costs are predictable and manageable
- Recovery is predictable and reliable
IT Data Protection:
1. High Availability: fault-tolerant, failure-resistant streamlined infrastructure with affordable cost foundation
2. Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications
3. Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery
7
Still true: Timeline of an IT Recovery
[Timeline diagram: Outage! ==> Assess ==> Execute hardware, operating system, and data integrity recovery ==> Application transaction integrity recovery ==> Done? ==> Production. Layers shown: Physical Facilities, Telecom Network, Operating System, Data, Applications, Applications Staff, Management Control]
- Recovery Point Objective (RPO): how much data must be recreated?
- Recovery Time Objective (RTO) of hardware data integrity
- Recovery Time Objective (RTO) of transaction integrity
Telecom bandwidth is still the major limiting factor for any fast recovery
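The recovery timeline distinguishes the RPO from two RTO figures (hardware/data integrity and transaction integrity). A minimal sketch of how these three intervals could be measured from event timestamps follows. The function name, argument names, and sample times are hypothetical and are not taken from the presentation.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_good_copy, outage, data_restored, apps_verified):
    """Return (rpo, rto_hw_data, rto_transaction) as timedeltas.

    rpo             : data written after the last good copy must be recreated
    rto_hw_data     : outage until hardware, OS, and data integrity are recovered
    rto_transaction : outage until application transaction integrity is recovered
    """
    rpo = outage - last_good_copy
    rto_hw_data = data_restored - outage
    rto_transaction = apps_verified - outage
    return rpo, rto_hw_data, rto_transaction

# Hypothetical event times, for illustration only.
outage = datetime(2013, 9, 1, 12, 0)
rpo, rto_hw, rto_tx = recovery_metrics(
    last_good_copy=outage - timedelta(minutes=15),
    outage=outage,
    data_restored=outage + timedelta(hours=4),
    apps_verified=outage + timedelta(hours=6),
)
```

The transaction-integrity RTO can never be shorter than the hardware/data RTO, because applications are recovered on top of restored data.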
8
Still true: value of Automation for real-time failover
[Timeline diagram: the same recovery timeline as on the previous slide (Outage!, Assess, RPO, RTO H/W, Trans. Recov., RTO trans. integrity, Production)]
Value of automation:
- Reliability
- Repeatability
- Scalability
- Frequent Testing
9
Still true: Replication Technology Drives RPO
[Chart: Recovery Point and Recovery Time axes, each scaled Secs, Mins, Hrs, Days, Wks. For example, technologies plotted: Synchronous replication / HA, Asynchronous replication, Periodic Replication, Tape Backup]
10
Still true: Recovery Automation Drives Recovery Time
Recovery Time includes:
- Fault detection
- Recovering data
- Bringing applications back online
- Network access
[Chart: Recovery Point and Recovery Time axes, each scaled Secs, Mins, Hrs, Days, Wks. For example, approaches plotted: End to end automated clustering, Storage automation, Manual Tape Restore]
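Since recovery time is made up of the four components listed on this slide, shortening any one of them through automation shortens the total. A minimal sketch of that arithmetic follows. All durations and the two scenario dictionaries are hypothetical assumptions for illustration, not measurements from the presentation.

```python
# Components of recovery time, as listed on the slide.
COMPONENTS = ("fault_detection", "recovering_data",
              "applications_online", "network_access")

def total_recovery_minutes(durations):
    """Sum the per-component durations (in minutes) into one recovery time."""
    missing = [c for c in COMPONENTS if c not in durations]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sum(durations[c] for c in COMPONENTS)

# Hypothetical figures, for illustration only.
manual_tape_restore = {"fault_detection": 30, "recovering_data": 720,
                       "applications_online": 120, "network_access": 60}
automated_clustering = {"fault_detection": 1, "recovering_data": 2,
                        "applications_online": 5, "network_access": 2}

manual_total = total_recovery_minutes(manual_tape_restore)
automated_total = total_recovery_minutes(automated_clustering)
```

Requiring every component to be present guards against quoting a recovery time that silently omits, for example, network access.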
11
Still true: “ideal world” construct for IT High Availability and Business Continuity
[Diagram: resilience program cycle with phases labelled Business Prioritization, Strategy Design, Integration into IT, Implement, and Manage. Other labels: risk assessment; business impact analysis; program assessment; Risks, Vulnerabilities and Threats; Impacts of Outage; RTO/RPO; Maturity Model; Measure ROI; Roadmap for Program; Program Design; Current Capability; program validation; Estimated Recovery Time; crisis team; business resumption; disaster recovery; high availability; Database and Software design; High Availability Servers; Storage, Data Replication; High Availability design]
Resilience Program Management: Awareness, Regular Validation, Change Management, Quarterly Management Briefings
1. People 2. Processes 3. Plans 4. Strategies 5. Networks 6. Platforms 7. Facilities
Business processes drive strategies and they are integral to the Continuity of Business Operations. A company cannot be resilient without having strategies for alternate workspace, staff members, call centers and communications channels.
Source: IBM STG, IBM Global Services
12
The 2013 Bottom line: (IT Business Continuity Planning Steps)
For today’s real world environment, i.e. how to streamline this “ideal” process?
[Diagram: the same resilience program cycle as on the previous slide]
Need faster way than even this simplified 2007 version:
1. Collect information for prioritization
2. Vulnerability, risk assessment, scope
3. Define BC targets based on scope
4. Solution option design and evaluation
5. Recommend solutions and products
6. Recommend strategy and roadmap
2013 key #1: need a basic Data Strategy
2013 key #2: Workload type
13
Streamlined BC Actions (2005 version)
1. Collect info for prioritization
   Input: Business processes, Key Perf. Indicators, IT inventory
   Output: Scope, Resource Business Impact Component effect on business processes
2. Vulnerability / Risk Assessment
   Input: List of vulnerabilities
   Output: Defined vulnerabilities
3. Define desired HA/BC targets based on scope
   Input: Existing BC capability, KPIs, targets, and success rate
   Output: Defined BC baseline targets, architecture, decision and success criteria
4. Solution design and evaluation
   Input: Technologies and solution options
   Output: Business process segments and solutions
5. Recommend solutions and products
   Input: Generic solutions that meet criteria
   Output: Recommended IBM Solutions and benefits
6. Recommend strategy and roadmap
   Input: Budget, major project milestones, resource availability, business process priority
   Output: Baseline Bus. Cont. strategy, roadmap, benefits, challenges, financial implications and justification
14
Scope definition of Business Continuity program
[Chart: Frequency of Occurrences Per Year (1,000 down to 1/100,000, frequent to infrequent) against Consequences (Single Occurrence Loss) in Dollars per Occurrence (lower to higher). Threats plotted: Virus, Worms, Disk Failure, Component Failure, Power Failure, Application Outage, Data Corruption, Network Problem, Building Fire, Natural Disaster, Terrorism/Civil Unrest. Threats are marked as availability-related or recovery-related]
This becomes the scope of the HA/BC program
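The chart plots each threat by frequency per year and by loss per occurrence. One common way to compare points on such a chart is to multiply the two into an annualized loss figure. That formula is a general risk-assessment convention and is not stated on the slide; the frequencies and dollar amounts below are hypothetical.

```python
def annualized_loss(frequency_per_year, loss_per_occurrence):
    """Expected yearly loss: occurrences per year times dollars per occurrence."""
    return frequency_per_year * loss_per_occurrence

# Hypothetical (frequency per year, dollars per occurrence) pairs.
threats = {
    "Disk Failure": (10.0, 5_000),
    "Power Failure": (1.0, 40_000),
    "Building Fire": (0.01, 2_000_000),
    "Natural Disaster": (0.001, 10_000_000),
}

# Rank threats from largest to smallest expected yearly loss.
ranked = sorted(threats,
                key=lambda name: annualized_loss(*threats[name]),
                reverse=True)
```

With these sample figures, a frequent low-cost event can outrank a rare catastrophic one, which is why both availability-related and recovery-related threats belong on the same chart.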
15
Define scope based on prioritized vulnerabilities
Set expectation for phased implementation
Example chart at left shows Vulnerability / Risk Assessment:
- Define what will be on the chart
- This defines the scope of the Business Continuity solution
Divide Scope into implementation phases:
- Do not try to solve all vulnerabilities at once
- Instead, focus on delivering tangible visible value in each project step
- Portray that scope expands as project progresses
- This matches expenditure with increasing probability over time
[Chart: Risk Assessment, Likelihood against Impact, with Total Scope expanding at 6 months, 12 months, and 18 months]
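The phasing idea on this slide (rank the vulnerabilities, then let the scope grow phase by phase) can be sketched as follows. The scoring rule (likelihood times impact on a 1 to 5 scale), the function name, and the sample scores are assumptions for illustration; the slide does not prescribe a scoring method.

```python
import math

def phased_scope(risks, phases=3):
    """Return the cumulative scope for each phase, highest risk first.

    risks maps a vulnerability name to (likelihood, impact) ordinal scores.
    """
    ranked = sorted(risks,
                    key=lambda name: risks[name][0] * risks[name][1],
                    reverse=True)
    per_phase = math.ceil(len(ranked) / phases)
    # Each phase keeps everything from earlier phases and adds the next slice.
    return [ranked[: per_phase * (i + 1)] for i in range(phases)]

# Hypothetical (likelihood, impact) scores on a 1-5 scale.
risks = {
    "Disk Failure": (5, 2),
    "Data Corruption": (3, 4),
    "Network Problem": (4, 2),
    "Power Failure": (2, 3),
    "Building Fire": (1, 5),
    "Virus": (5, 3),
}
scope_by_phase = phased_scope(risks)  # e.g. 6, 12, and 18 months
```

Returning cumulative lists mirrors the chart, where the total scope at 18 months contains everything delivered at 6 and 12 months.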
16
Step by Step: Typical three phase approach to implementing High Availability, Business Continuity Technologies
Balancing recovery time objective with cost / value
[Chart: Cost / Value against Recovery Time Objective (guidelines only): 15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days. Regions: Recovery from a disk image; Recovery from tape copy]
- BC Tier 7 – Add Server or Storage replication with end-to-end automated server recovery
- BC Tier 6 – Add real-time continuous data replication, server or storage
- BC Tier 5 – Add Application/database integration to Backup/Restore
- BC Tier 4 – Add Point in Time replication to Backup/Restore
- BC Tier 3 – VTL, Data De-Dup, Remote vault
- BC Tier 2 – Tape libraries + Automation
- BC Tier 1 – Restore from Tape
17
Step by Step Virtualization, High Availability, Business Continuity data strategy
Balancing recovery time objective with cost / value
[Chart: the same BC Tier 1 to 7 chart of Cost / Value against Recovery Time Objective as on the previous slide, with the groupings Backup/Restore, Rapid Data Recovery, and Continuous Availability, and the annotations Workload types, Storage Pools, and Cloud deployment if needed]
18
HA/BC pre-requisite: IT Virtualization and Consolidation
IT Virtualization, Consolidation enhances Data Protection
Funding given today’s cost crunch? Complexity of infrastructure to recover? Priorities? Resources?
Fact: accelerating IT Consolidation, Virtualization, will accelerate Data Protection
Strategic Approach: Data protection is intended side-benefit of IT Virtualization
- Fewer Components to Recover
- Invest percentage of Savings
- Invest in more robust Business Resiliency
- Standardize and optimize IT and Business Resiliency solution design
[Diagram: Cost-Effective Storage and IT Efficiency. End Users, Application Servers, High-End Workstations, Database. Protocols: SAN, CIFS, NFS, HTTP, FTP. Management: Central Administration, Monitoring, File Mgmt. Availability: Data Migration, Replication, Backup. Other labels: Load Balancing, Solution architecture, Data Protection]
19
For traditional IT - Virtualization is fundamental to addressing today’s IT diversity
[Diagram: Virtualization]
20
IT Virtualization is the means to achieve IT Business Continuity
I.e. consolidate Servers, Storage, into virtualized systems:
- Provides the change agent and political momentum to enable Business Continuity implementation
- Reduces management complexity using integrated virtualization and management software
- Provides workload optimization needed for affordable maximum performance and efficiency
- Makes it possible to identify what to replicate and to manage that replication
- Implements key tools such as virtual resource mobility within the ensemble
Is a perfect foundation to implement the necessary IT strategy, design, tools, procedures, and testing to create IT Business Continuity:
- Because it also provides the umbrella and political change-agent required to allow IT Business Continuity to be implemented as a by-product
21
For traditional IT - Consolidated virtualized systems become the Recoverable Units for IT Business Continuity
[Diagram: Business Processes, Virtualized IT infrastructure, Virtualization]
Virtualized systems become the resource pools that enable the recoverability
22
IT storage infrastructure: Before
[Diagram: End Users, Application Servers, High-End Workstations, Database, Servers and Storage. Labels: Underutilized, Segmented Storage, Copies of Data]
23
Transformation to Standardization, Virtualization (animated chart)
Before: End Users, Application Servers, High-End Workstations, Database, Servers and Storage. Labels: Underutilized, Segmented Storage, Copies of Data
After: Virtualized Storage behind a Virtualization layer
- SAN, NAS
- Management: Central Administration, Monitoring, File Mgmt
- Availability: Data Migration, Replication, Backup
Here are the benefits:
- Ability to move data between storage pools
- Tiered Storage
- Virtualized De-dup, tape
- High performance petabyte scale
24
Transformation to Standardization, Virtualization (non-animated chart)
[Diagram: End users, High end workstations, Application servers, and Database on Virtualized Storage behind a Virtualization layer]
- SAN, NAS
- Management: Central Administration, Monitoring, File Mgmt
- Availability: Data Migration, Replication, Backup
- Ability to move data between storage pools
- Tiered Storage
- Virtualized De-dup, tape
- High performance petabyte scale
25
Key strategy: using standardized virtualization, segment data into logical data storage pools by appropriate Data Protection characteristics
Know and categorize your data - provides foundation for affordable data protection. Enabled by virtualization.
[Diagram scale: from Mission Critical to Lower cost]
Continuous Availability (CA) – E2E automation enhances RDR
- RTO = near continuous, RPO = small as possible (Tier 7)
- Priority = uptime, with high value justification
Rapid Data Recovery (RDR) – enhance backup/restore
- For data that requires it
- RTO = minutes, to (approx. range): 2 to 6 hours
- BC Tiers 6, 4
- Balanced priorities = Uptime and cost/value
Backup/Restore (B/R) – assure efficient foundation
- Standardize base backup/restore foundation
- Provide universal 24 hour - 12 hour (approx) recovery capability
- Address requirements for archival, compliance, green energy
- Priority = cost
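Segmenting data into the three protection pools can be sketched as a simple classification by required RTO. The 6-hour Rapid Data Recovery ceiling follows the approximate range on this slide; the 15-minute Continuous Availability ceiling, the function names, and the sample data sets are assumptions for illustration only.

```python
from collections import defaultdict

CA_MAX_RTO_HOURS = 0.25   # assumption: "near continuous" treated as <= 15 minutes
RDR_MAX_RTO_HOURS = 6.0   # slide: RDR covers minutes up to roughly 2 to 6 hours

def protection_segment(rto_hours):
    """Map a required RTO (in hours) to one of the three protection segments."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours <= CA_MAX_RTO_HOURS:
        return "Continuous Availability"
    if rto_hours <= RDR_MAX_RTO_HOURS:
        return "Rapid Data Recovery"
    return "Backup/Restore"

def build_pools(datasets):
    """Group data set names into storage pools by protection segment."""
    pools = defaultdict(list)
    for name, rto_hours in datasets.items():
        pools[protection_segment(rto_hours)].append(name)
    return dict(pools)

# Hypothetical data sets and their required RTOs in hours.
pools = build_pools({"orders_db": 0.1, "crm": 4, "file_archive": 24})
```

In practice the thresholds would come out of the business impact analysis rather than fixed constants.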
26
High Availability, Business Continuity Step by Step virtualization journey
Balancing recovery time objective with cost / value
[Chart: the same BC Tier 1 to 7 chart of Cost / Value against Recovery Time Objective as on the earlier Step by Step slides, with the annotations Foundation and Storage pools]
27
Storage Pools: apply appropriate server, storage technology
[Diagram: storage pools labelled Real-time replication, Point in time, Removable media, and Petabyte Unstructured]
- Real Time replication (storage or server or software). Add automated failover to replicated storage
- Periodic PiT replication (file, application, or disk-to-disk periodic replication):
  - File System
  - Point in Time Disk
  - VTL to VTL with Dedup
  - Foundation backup/restore
  - Physical or electronic transport
- Petabyte unstructured: due to usage and large scale, typically uses application level intelligent redundancy failure toleration design
28
Step by step – architecting remote solution
29
Methodology, Traditional IT: HA / BC / DR in stages, from bottom up
- Foundation: standardized, automated tape backup (Tier 2, 1)
- Foundation: electronic vaulting, automation, tape lib (Tier 3)
- Add: Point-in-time Copy, disk to disk, Tiered Storage (Tier 4)
[Diagram labels: SAN; Disk; VTL/De-Dup; Backup/restore; VTL, de-dup, remote replication at tape level. Products shown: IBM FlashCopy, SnapShot; IBM XIV, SVC, DS, SONAS; IBM Tivoli Storage Productivity Center 5.1; IBM ProtecTier; IBM Virtual Tape Library; IBM Tivoli Storage Manager]
30
Methodology, Traditional IT: HA / BC / DR in stages, from bottom up
- Foundation: standardized, automated tape backup (Tier 2, 1)
- Foundation: electronic vaulting, automation, tape lib (Tier 3)
- Add: Point-in-time Copy, disk to disk for backup/restore (Tier 4)
- Application integration: automate applications, database for replication and automation (Tier 5)
- Data replication: consolidate and implement real time data availability (Tier 6)
- Dynamic end to end automated failover: end to end automated site failover of servers, storage, applications (Tier 7)
[Diagram labels: SAN; Disk; VTL/De-Dup; Server virtualization. Technologies shown: if storage, Metro Mirror, Global Mirror, Hitachi UR; XIV, SVC, DS, other storage; TPC 5.1; VMWare; PowerHA on p; Tivoli FlashCopy Manager]
31
IBM Disk Mirroring Technology naming
[Diagram: product classes Entry, Midrange, NAS, Enterprise, and SAN Virtualization, with products DS8000, DS6000, ESS, DS5000, DS4000, DCS3700, DS3000, V3700, V7000, N series, SVC, XIV, SONAS]
- FlashCopy: Point in time copy. SVC, V7000, DS3000, DS4000, DS5000, DS6000, DS8000, ESS, XIV, SONAS, N series
- Metro Mirror: Synchronous Mirroring. SVC, V7000, DS3500, DCS3700, DS4000, DS5000, DS6000, DS8000, ESS, XIV, N series
- Global Mirror: Asynchronous Mirroring. SVC, V7000, DCS3700, DS4000, DS5000, DS6000, DS8000, ESS, XIV, SONAS, N series
- Metro / Global Mirror: Three site synchronous and asynchronous mirroring. DS8000 (sync+async), N series (only async)
32
Today’s world: High Availability, Business Continuity is a Step by Step data strategy / workload journey
Balancing recovery time objective with cost / value
[Chart: the same BC Tier 1 to 7 chart of Cost / Value against Recovery Time Objective as on the earlier Step by Step slides, with the annotations Workload Types, Data Strategy, and Cloud deployment if needed]
33
Summary – IT High Availability / Business Continuity Best Practices 2012
[Diagram: Production site (Backup/Restore Tier 1, 2; Foundation: Storage, server virtualization and consolidation) and Recovery site (Backup/Restore Tier 1, 2 replicated foundation: SAN and server virtualization and consolidation). Annotations: Workload types, Data strategy]
- Understand my data
- Define scope of recovery
- Implement remote sites (Tier 1, 2)
Backup / Restore:
- Implement Tier 3 – Consolidate and standardize Backup/Restore methods. Implement tape VTL, data de-dup, Server / Storage Virtualization / Mgmt tools, basic automation
Rapid Data Recovery:
- Implement Tier 4 – Standardize use of disk to disk and Point in Time disk copy
- Implement Tier 5 – Standardize DB / Application Mirroring methods
- Implement Tier 6 – Standardize high volume data replication method
Continuous Availability:
- Implement BC Tier 7 – Standardize use of Continuous Availability automated Failover
34
Key IT High Availability, Business Continuity Requirements Questions (in proper order):
1. What applications or databases to recover?
2. What platform? (z, p, i, x and Windows, Linux, heterogeneous open, heterogeneous z+Open)
3. What is desired Recovery Time Objective (RTO)?
4. What is distance between the sites? (if there are 2 sites)
5. What is the connectivity, infrastructure, and bandwidth between sites?
6. What specific hardware equipment needs to be recovered?
7. What is the Level of Recovery?
   - Planned Outage
   - Unplanned Outage
   - Transaction Integrity
8. What is the Recovery Point Objective?
9. What is the amount of data to be recovered (in GB or TB)?
10. Who will design the solution?
11. Who will implement the solution?
12. Remaining solutions are valid choices to give to detailed DR evaluation team
[Diagram: BC Tiers 7 to 1]
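Asking the questions in order works as a process of elimination: each answer removes candidate solutions until only valid choices remain for the detailed evaluation team. The sketch below applies four of the questions (platform, RTO, distance, RPO) as successive filters. The `Candidate` fields and the three catalogue entries are hypothetical and do not describe any real product's limits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    platforms: frozenset      # platforms the solution supports
    best_rto_hours: float     # fastest recovery time it can achieve
    max_distance_km: float    # longest supported distance between sites
    best_rpo_minutes: float   # smallest data loss window it can achieve

def remaining_solutions(candidates, platform, rto_hours, distance_km, rpo_minutes):
    """Keep only the candidates that satisfy every stated requirement."""
    return [c.name for c in candidates
            if platform in c.platforms
            and c.best_rto_hours <= rto_hours
            and c.max_distance_km >= distance_km
            and c.best_rpo_minutes <= rpo_minutes]

# Hypothetical catalogue, for illustration only.
ANY_DISTANCE = float("inf")
catalogue = [
    Candidate("sync-replication", frozenset({"Linux", "Windows"}), 0.25, 100, 0),
    Candidate("async-replication", frozenset({"Linux", "Windows", "z"}), 1, ANY_DISTANCE, 5),
    Candidate("tape-restore", frozenset({"Linux", "Windows", "z"}), 48, ANY_DISTANCE, 1440),
]

long_distance = remaining_solutions(catalogue, "Linux", rto_hours=4,
                                    distance_km=800, rpo_minutes=15)
metro = remaining_solutions(catalogue, "Linux", rto_hours=4,
                            distance_km=50, rpo_minutes=15)
```

The remaining questions (connectivity, hardware, level of recovery, data volume, staffing) would add further filters in the same style.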
35
Summary
- Cloud deployment options
- Principles of architecting traditional IT HA, DR, BC
- Technology and location considerations
- Traditional Workloads vs. Internet Scale Workloads
- Best Practices Step by Step Methodology