Business Continuity Overview

Presentation on theme: "Business Continuity Overview"— Presentation transcript:

1 Business Continuity Overview
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved. Business Continuity Overview Module 4.1

2 Business Continuity Overview
After completing this module, you will be able to:
  Define and differentiate between Business Continuity and Disaster Recovery
  Differentiate between Disaster Recovery and Disaster Restart
  Define terminology such as Recovery Point Objective and Recovery Time Objective
  Describe (at a high level) Business Continuity Planning
  Identify Single Points of Failure and describe solutions to eliminate them
Information has become a critical asset for businesses. The survival of a business depends on uninterrupted availability of its data, so steps should be taken to ensure continuous availability of data in the event of a disaster. The objectives for this module are shown here. Please take a moment to review them.

3 What is Business Continuity?
Business Continuity is the preparation for, response to, and recovery from an application outage that adversely affects business operations.
Business Continuity solutions address systems unavailability, degraded application performance, or unacceptable recovery strategies.

4 Why Business Continuity
Know the downtime costs (per hour, day, two days...)
Lost Productivity: number of employees impacted (x hours out * hourly rate)
Lost Revenue: direct loss, compensatory payments, lost future revenue, billing losses, investment losses
Damaged Reputation: customers, suppliers, financial markets, banks, business partners
Financial Performance: revenue recognition, cash flow, lost discounts (A/P), payment guarantees, credit rating, stock price
Other Expenses: temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...
There are many factors that need to be considered when calculating the cost of downtime. A formula to calculate the cost of an outage should capture both the cost of lost employee productivity and the cost of lost income from missed sales:
Estimated average cost of 1 hour of downtime = (Employee costs per hour) * (Number of employees affected by outage) + (Average income per hour)
Employee costs per hour is the total salaries and benefits of all employees per week, divided by the average number of working hours per week. Average income per hour is the total income of an institution per week, divided by the average number of hours per week that the institution is open for business.
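The downtime cost formula above can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions, and the figures in the usage note are made up; the sketch also assumes that the employee cost per hour is supplied as a per-employee figure, so that multiplying it by the number of affected employees is meaningful.

```python
def per_hour(weekly_total: float, hours_per_week: float) -> float:
    """Convert a weekly total (salaries and benefits, or income) to an hourly rate."""
    return weekly_total / hours_per_week


def hourly_downtime_cost(employee_cost_per_hour: float,
                         employees_affected: int,
                         income_per_hour: float) -> float:
    """Estimated average cost of one hour of downtime, per the slide's formula:
    lost productivity (cost per hour * affected employees) plus lost income."""
    return employee_cost_per_hour * employees_affected + income_per_hour
```

For example, with a hypothetical per-employee cost of 50 per hour, 100 affected employees, and an average income of 20,000 per hour, `hourly_downtime_cost(50.0, 100, 20000.0)` evaluates to 25000.0.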

5 Information Availability
% Uptime    % Downtime   Downtime per Year   Downtime per Week
98%         2%           7.3 days            3 hrs 22 min
99%         1%           3.65 days           1 hr 41 min
99.8%       0.2%         17 hrs 31 min       20 min 10 sec
99.9%       0.1%         8 hrs 45 min        10 min 5 sec
99.99%      0.01%        52.5 min            1 min
99.999%     0.001%       5.25 min            6 sec
99.9999%    0.0001%      31.5 sec            0.6 sec
Information Availability ensures that applications and business units have access to information whenever it is needed. The primary components of information availability are:
  Protection from data loss
  Ensuring data access
  Appropriate data security
The online window for some critical applications has moved to % of time.
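The rows of the availability table follow directly from the downtime percentage. As a rough sketch (the function name and constants are assumptions, and a 365-day year is assumed), the allowed downtime for a given uptime percentage can be computed as:

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year
HOURS_PER_WEEK = 7 * 24


def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime allowed in a period at the given uptime percentage."""
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * period_hours


if __name__ == "__main__":
    for uptime in (98.0, 99.0, 99.9, 99.99, 99.999):
        per_year = downtime_hours(uptime, HOURS_PER_YEAR)
        per_week = downtime_hours(uptime, HOURS_PER_WEEK)
        print(f"{uptime}% uptime: {per_year:.3f} h/year, {per_week * 60:.2f} min/week")
```

At 99.9% uptime this gives about 8.76 hours per year, which matches the 8 hrs 45 min entry in the table.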

6 Importance of Business Continuity and Planning
Down time is expensive!
Millions of US Dollars per Hour in Lost Revenue
Retail                            1.1
Insurance                         1.2
Information technology            1.3
Financial institutions            1.5
Manufacturing                     1.6
Call location                     1.6
Telecommunications                2.0
Credit card sales authorization   2.6
Energy                            2.8
Point of sale                     3.6
Retail brokerage                  6.5
Source: Meta Group, 2005

7 Recovery Point Objective (RPO)
[Figure: timeline running from weeks to seconds on either side of the outage, labeled Recovery Point and Recovery Time. Technologies placed on the Recovery Point side: Tape Backup, Periodic Replication, Asynchronous Replication, Synchronous Replication.]
Recovery Point Objective (RPO) is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss a business can endure. Different business units within an organization may have varying RPOs.

8 Recovery Time Objective (RTO)
[Figure: timeline running from weeks to seconds on either side of the outage, labeled Recovery Point and Recovery Time. Technologies placed on the Recovery Time side: Manual Migration, Tape Restore, Global Cluster.]
Recovery Time includes:
  Fault detection
  Recovering data
  Bringing apps back online
Recovery Time Objective (RTO) is the period of time within which systems, applications, or functions must be recovered after an outage. It defines the amount of downtime that a business can endure and still survive.
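To make the two objectives concrete, the following sketch checks a single outage against an RPO and an RTO. The function names and the sample times are hypothetical; the three recovery-time components are the ones listed on the slide.

```python
from datetime import datetime, timedelta


def meets_rpo(last_recovery_point: datetime, outage_time: datetime,
              rpo: timedelta) -> bool:
    """Data written after the last recovery point is lost, so the gap between
    that point and the outage must not exceed the RPO."""
    return outage_time - last_recovery_point <= rpo


def meets_rto(fault_detection: timedelta, recovering_data: timedelta,
              apps_back_online: timedelta, rto: timedelta) -> bool:
    """Recovery time is the sum of fault detection, data recovery, and
    application restart; that sum must not exceed the RTO."""
    return fault_detection + recovering_data + apps_back_online <= rto
```

For instance, a nightly backup taken at 01:00 and an outage at 15:00 leaves a 14-hour exposure, which meets a 24-hour RPO but not a 1-hour RPO.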

9 Disaster Recovery versus Disaster Restart
Most business-critical applications have some level of data interdependencies.
Disaster recovery:
  Restoring a previous copy of data and applying logs to that copy to bring it to a known point of consistency
  Generally implies the use of backup technology
  Data is copied to tape and then shipped off-site
  Requires manual intervention during the restore and recovery processes
Disaster restart:
  Process of restarting mirrored consistent copies of data and applications
  Allows restart of all participating DBMS to a common point of consistency, utilizing automated application of recovery logs during DBMS initialization
  The restart time is comparable to the length of time required for the application to restart after a power failure
There is a fundamental difference between Disaster Recovery and Disaster Restart. Disaster recovery is the process of restoring a previous copy of the data and applying logs or other necessary processes to that copy to bring it to a known point of consistency. Disaster restart is the restarting of dependent-write-consistent copies of data and applications, utilizing the automated application of DBMS recovery logs during DBMS initialization to bring the data and application to a transactional point of consistency.
Disaster recovery generally implies the use of backup technology in which data is copied to tape and then shipped off-site. When a disaster is declared, the remote site copies are restored and logs are applied to bring the data to a point of consistency. Once all recoveries are completed, the data is validated to ensure it is correct.

10 Disruptors of Data Availability
Disaster (<1% of Occurrences)
  Natural or man made: flood, fire, earthquake, contaminated building
Unplanned Occurrences (13% of Occurrences)
  Failure: database corruption, component failure, human error
Planned Occurrences (87% of Occurrences)
  Competing workloads: backup, reporting, data warehouse extracts, application and data restore
Source: Gartner, Inc.
Elevated demand for increased application availability confirms the need to ensure business continuity practices are consistent with business needs. Interruptions are classified as either planned or unplanned. Failure to address these specific outage categories seriously compromises a company’s ability to meet business goals.
Planned downtime is expected and scheduled, but it is still downtime that causes data to be unavailable. Causes of planned downtime include:
  New hardware installation/integration/maintenance
  Software upgrades/patches
  Backups
  Application and data restore
  Data center disruptions from facility operations (renovations, construction, other)
  Refreshing a testing or development environment with production data
  Porting a testing/development environment over to the production environment

11 Causes of Downtime
[Figure: causes of downtime: Human Error, System Failure, Infrastructure Failure, Disaster.]
Today, the most critical component of an organization is information. Any disaster will affect the information availability that is critical to running normal business operations. In our definition of disaster, the organization’s primary systems, data, and applications are damaged or destroyed. Not all unplanned disruptions constitute a disaster.

12 Business Continuity vs. Disaster Recovery
Business Continuity has a broad focus on prevention:
  Predictive techniques to identify risks
  Procedures to maintain business functions
Disaster Recovery focuses on the activities that occur after an adverse event to return the entity to ‘normal’ functioning.
Business Continuity is a holistic approach to planning for, preparing for, and recovering from an adverse event. The focus is on prevention, identifying risks, and developing procedures to ensure the continuity of business function. Disaster recovery planning should be included as part of business continuity. BC objectives include:
  Facilitate uninterrupted business support despite the occurrence of problems.
  Create plans that identify risks and mitigate them wherever possible.
  Provide a road map to recover from any event.
Disaster Recovery is more about specific cures, to restore service and damaged assets after an adverse event. In our context, Disaster Recovery is the coordinated process of restoring systems, data, and infrastructure required to support key ongoing business operations.

13 Business Continuity Planning (BCP)
Includes the following activities:
  Identifying the mission or critical business functions
  Collecting data on current business processes
  Assessing, prioritizing, mitigating, and managing risk: Risk Analysis, Business Impact Analysis (BIA)
  Designing and developing contingency plans and a disaster recovery plan (DR Plan)
  Training, testing, and maintenance
Business Continuity Planning (BCP) is a risk management discipline. It involves the entire business, not just IT. BCP proactively identifies vulnerabilities and risks, planning in advance how to prepare for and respond to a business disruption. A business with strong BC practices in place is better able to continue running the business through the disruption and to return to “business as usual.” BCP actually reduces the risk and costs of an adverse event because the process often uncovers and mitigates potential problems.

14 Business Continuity Planning Lifecycle
The Business Continuity Planning process includes the following stages:
Objectives
  Determine business continuity requirements and objectives, including scope and budget
  Team selection (include all areas of the business and subject matter expertise, internal/external)
  Create the project plan
Perform analysis
  Collect information on data, business processes, infrastructure supports, dependencies, frequency of use
  Identify critical needs and assign recovery priorities
  Create a risk analysis (areas of exposure) and mitigation strategies wherever possible
  Create a Business Impact Analysis (BIA)
  Create a cost/benefit analysis: identify the cost (per hour/day, etc.) to the business when data is unavailable
  Evaluate options
Design and develop the BCP/strategies
  Evaluate options
  Define roles/responsibilities
  Develop contingency scenarios
  Develop emergency response procedures
  Detail recovery, resumption, and restore procedures
  Design data protection strategies and develop infrastructure
  Implement risk management/mitigation procedures
Train, test, and document, implement, maintain, and assess

15 Business Impact Analysis (BIA)
Columns: # | Business Area Affected | Impact (1-5) | Probability (1-5) | Single Loss Expectancy | # Event p/y | Loss p/y | Est cost of mitigation | High Risk SPOF Item
1 Entire Company 5 $279,056 .25 $69517 $5,800 No redundant UPS for Networking/phone equip
2 $279,066 0.2 $55768 $66,456 Cisco net backbone switch not redundant
3 $279,098 $55619 $10,000 Relocate net equip to a separate physical rack
4 IT-All $16,000 1.0 18000 $80,000 Primary dev platforms don’t have failover
0.5 $8000 $122,000 Computer room does not have sufficient UPS capacity to run on single unit
6 IT- Intranet/B2B $400 $1800 $5,000 No failover for development webserver
This is an example of a Business Impact Analysis (BIA). The dollar values are arbitrary and are used just for illustration. A BIA quantifies the impact that an outage will have on the business and the potential costs associated with the interruption. It helps businesses channel their resources based on the probability of failure and the associated costs.

16 Identifying Single Points of Failure
[Figure: User & Application Clients connected over IP to a Primary Node.]
Consider the components in the picture and identify the Single Points of Failure.

17 HBA Failures
Configure multiple HBAs, and use multi-pathing software
  Protects against HBA failure
  Can provide improved performance (vendor dependent)
[Figure: Host with multiple HBAs connected through a Switch to Storage array ports.]
Configuring multiple HBAs and using multi-pathing software provides path redundancy. Upon detection of a failed HBA, the software can re-drive the I/O through another available path.

18 Switch/Storage Array Port Failures
Configure multiple switches
Make the devices available via multiple storage array ports
[Figure: Host with multiple HBAs connected through multiple Switches to multiple Storage array ports.]
This configuration provides switch redundancy and also protects against storage array port failures.

19 Disk Failures
Use some level of RAID
[Figure: Host, Switch, and Storage with redundant HBAs and ports.]
As seen earlier, using some level of RAID, such as RAID-1 or RAID-5, ensures continuous operation in the event of disk failures.

20 Host Failures
Host clustering protects against production host failures
[Figure: two Hosts, each with redundant HBAs, connected through Switches to shared Storage.]
Planning and configuring clusters is a complex task. At a high level:
  A cluster is two or more hosts with access to the same set of storage (array) devices.
  The simplest configuration is a two-node (host) cluster.
  One of the nodes would be the production server while the other would be configured as a standby. This configuration is described as Active/Passive.
  Participating nodes exchange “heart-beats” or “keep-alives” to inform each other about their health.
  In the event of a primary node failure, cluster management software will shift the production workload to the standby server.
  Implementation of the cluster failover process is vendor specific.
  A more complex configuration would be to have both nodes run production workload on the same set of devices. Either the cluster software or the application/database should then provide a locking mechanism so that the nodes do not try to update the same areas on disk simultaneously. This would be an Active/Active configuration.
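The Active/Passive behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, its method names, and the timeout-based failure test are hypothetical, and real cluster software implements failover in vendor-specific ways.

```python
class ActivePassiveCluster:
    """Toy two-node cluster: the standby takes over when the active node's
    heartbeats stop arriving within the timeout."""

    def __init__(self, primary: str, standby: str, timeout_s: float):
        self.active = primary
        self.standby = standby
        self.timeout_s = timeout_s
        # Time (in seconds) at which each node last reported a heartbeat.
        self.last_heartbeat = {primary: 0.0, standby: 0.0}

    def heartbeat(self, node: str, now: float) -> None:
        """Record a keep-alive from a node."""
        self.last_heartbeat[node] = now

    def check(self, now: float) -> str:
        """Fail over if the active node is silent and the standby is healthy.
        Returns the name of the node that currently runs production."""
        active_silent = now - self.last_heartbeat[self.active] > self.timeout_s
        standby_healthy = now - self.last_heartbeat[self.standby] <= self.timeout_s
        if active_silent and standby_healthy:
            self.active, self.standby = self.standby, self.active
        return self.active
```

Time is passed in explicitly rather than read from a clock, which keeps the model deterministic and easy to test.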

21 Site/Storage Array Failures
Remote replication helps protect against either entire site or storage array failures
[Figure: Host, Switches, and Storage at the primary site, with a second Storage array at a remote site.]
What is not shown in the picture is host connectivity to the storage array in the remote site. Remote replication is explored fully in a later module in this section.

22 Resolving Single Points of Failure
[Figure: User & Application Clients on a Redundant Network (IP) connected to a Primary Node and a Failover Node running Clustering Software with Keep Alive; Redundant Paths through Switches to a Storage Array with Redundant Disks (RAID 1/RAID 5); a second Storage Array at a Redundant Site.]
This slide summarizes what we have seen in the previous few slides.

23 Business Continuity Technology Solutions
Local Replication
Remote Replication
Backup/Restore
Business continuity technology solutions include local replication, remote replication, and backup/restore. This module provides a very high-level overview of some of these solutions. They are covered in more detail in later modules.

24 Local Replication
Data from the production devices is copied over to a set of target (replica) devices within the same array
After some time, the replica devices will contain the same data as the production devices
Copying of data can subsequently be halted. From this point in time, the replica devices can be used independently of the production devices
The replicas can then be used for restore operations in the event of data corruption or other events
Alternatively, the data from the replica devices can be copied to tape. This off-loads the burden of backup from the production devices
Local replication technologies offer fast and convenient methods for ensuring data availability. The different technologies and the uses of replicas for BC/DR operations will be discussed in a later module in this section. Typically, local replication uses replica disk devices. This greatly speeds up the restore process, thus minimizing the RTO. Frequent point-in-time replicas also help in minimizing RPO.

25 Remote Replication
Data from the production devices is copied over to a set of target (replica) devices on a different array at some distance away
Target devices can be kept continuously synchronized with the production devices
In the event of a failure of the production devices, applications can continue to run from the target devices
Remote replication typically involves a pair of arrays separated by some distance. To achieve near-zero RPO and a very small RTO, production and target devices are kept synchronized at all times. Periodic local replicas of the target devices may also be taken, to protect against data corruption on the production devices. The various alternatives for remote replication are discussed later in this section.

26 Backup/Restore
Backup to tape has been the predominant method for ensuring data availability and business continuity
Low-cost, high-capacity disk drives are now being used for backup to disk. This considerably speeds up the backup and restore processes
Frequency of backup will be dictated by defined RPO/RTO requirements as well as the rate of change of data
Far from being antiquated, periodic backup is still a widely used method for preserving copies of data. In the event of data loss due to corruption or other events, data can be restored up to the last backup. Evolving technologies now permit faster backups to disk. Magnetic tape drive speeds and capacities are also continually being enhanced. The various backup paradigms and the role of backup in BC/DR planning are discussed in detail later in this section.

27 Module Summary
Key points covered in this module:
  Importance of Business Continuity
  Types of outages and their impact on businesses
  Business Continuity Planning and Disaster Recovery
  Definitions of RPO and RTO
  Difference between Disaster Recovery and Disaster Restart
  Identifying and eliminating Single Points of Failure
These are the key points covered in this module. Please take a moment to review them.

28 Check Your Knowledge
Which concerns do business continuity solutions address?
What is the difference between RPO and RTO?
What is the difference between Disaster Recovery and Disaster Restart?
What are some of the Single Points of Failure in a typical data center environment?
How can the loss of a storage array port be mitigated?

29 Apply Your Knowledge
After completing this topic, you will be able to:
  Describe EMC PowerPath
  Discuss the features and benefits of PowerPath in storage environments
  Explain how PowerPath achieves transparent recovery
At this point, let’s apply what has been learned in this lesson to some real-world examples. In this case, we look at how EMC PowerPath improves business continuity in storage environments.

30 What is EMC PowerPath
[Figure: Open Systems Host (SERVER) stack with Applications, DBMS, Management Utils, File System, and Logical Volume Manager above PowerPath, then SCSI Driver and SCSI Controller, connected through the Interconnect Topology to STORAGE.]
Host-based software
Resides between the application and the SCSI device driver
Provides intelligent I/O path management
Transparent to the application
Automatic detection and recovery from host-to-array path failures
PowerPath is host-based software that resides between the application and the disk device layers. Every I/O from the host to the array must pass through the PowerPath driver software. This allows PowerPath to work in conjunction with the array and connectivity environment to provide intelligent I/O path management. This includes path failover and dynamic load balancing, while remaining transparent to any application I/O requests as it automatically detects and recovers from host-to-array path failures.
PowerPath is supported on various hosts and operating systems such as Sun Solaris, IBM AIX, HP-UX, Microsoft Windows, Linux, and Novell. Storage arrays from EMC, Hitachi, HP, and IBM are supported. The level of OS and array models supported varies between PowerPath software versions.

31 PowerPath Features
PowerPath delivers:
  Multiple paths, for higher availability and performance
  Dynamic multipath load balancing
  Proactive path testing and automatic path recovery
  Automatic path failover
  Online path configuration and management
  High-availability cluster support
PowerPath maximizes application availability, optimizes performance, and automates online storage management while reducing complexity and cost, all from one powerful data path management solution. PowerPath supports the following features:
  Multiple path support: PowerPath supports multiple paths between a logical device and a host. Multiple paths enable the host to access a logical device even if a specific path is unavailable. Multiple paths also enable sharing of the I/O workload to a given logical device.
  Dynamic load balancing: PowerPath is designed to use all paths at all times. PowerPath distributes I/O requests to a logical device across all available paths, rather than requiring a single path to bear the entire I/O burden.
  Proactive path testing and automatic path recovery: PowerPath uses a path test to ascertain the viability of a path. After a path fails, PowerPath continues testing it periodically to determine if it is fixed. If the path passes the test, PowerPath restores it to service and resumes sending I/O to it.
  Automatic path failover: If a path fails, PowerPath redistributes I/O traffic from that path to functioning paths.
  Online configuration and management: PowerPath management interfaces include a command line interface and a GUI on Windows.
  High-availability cluster support: PowerPath is particularly beneficial in cluster environments, as it can prevent operational interruptions and costly downtime.

32 PowerPath Configuration
All volumes are accessible through all paths
Maximum 32 paths to a logical volume
Interconnect support for SAN, SCSI, and iSCSI
[Figure: Host (SERVER) with Application(s), PowerPath, SCSI Driver (SD), and Host Bus Adapter (HBA), connected through the Interconnect Topology to Storage (STORAGE).]
Without PowerPath, if a host needed access to 40 devices and there were four host bus adapters, you would most likely configure it to present 10 unique devices to each host bus adapter. With PowerPath, you configure it so that all 40 devices can be “seen” by all four host bus adapters. PowerPath supports up to 32 paths to a logical volume. The host can be connected to the array using a number of interconnect topologies such as SAN, SCSI, or iSCSI.

33 The PowerPath Filter Driver
Platform-independent base driver
Applications direct I/O to PowerPath
PowerPath directs I/O to the optimal path based on current workload and path availability
When a path fails, PowerPath chooses another path in the set
[Figure: Host (SERVER) with Application(s) above the PowerPath Filter Driver, four SCSI Drivers (SD), and four Host Bus Adapters (HBA), connected through the Interconnect Topology to Storage (STORAGE).]
The PowerPath filter driver is a platform-independent driver that resides between the application and the HBA driver. The driver identifies all paths that read and write to the same device and builds a routing table, called a volume path set, for the device. A volume path set is created for each shared device in the array. PowerPath can use any path in the set to service an I/O request. If a path fails, PowerPath can redirect an I/O request from that path to any other available path in the set. This redirection is transparent to the application, which does not receive an error.

34 Path Fault without PowerPath
In most environments, a host will have multiple paths to the storage system
Volumes are spread across all available paths
Each volume has a single path
Host adapter and cable connections are single points of failure
Workload is not balanced among all paths
[Figure: Host (SERVER) with Application(s), four SCSI Drivers (SD), and four Host Bus Adapters (HBA), connected through the Interconnect Topology to Storage (STORAGE); one path is marked as failed with a red dotted line.]
Without PowerPath, the loss of a channel (as indicated in the diagram by a red dotted line) means one or more applications may stop functioning. This can be caused by the loss of a Host Bus Adapter, storage array front-end connectivity, a switch port, or a failed cable. In a standard non-PowerPath environment, these are all single points of failure. In this case, all I/O that was heading down the path highlighted in red is now lost, resulting in an application failure and the potential for data loss or corruption.

35 Path Fault with PowerPath
If a host adapter, cable, or channel director/Storage Processor fails, the device driver returns a timeout to PowerPath
PowerPath responds by taking the path offline and re-driving I/O through an alternate path
Subsequent I/Os use the surviving path(s)
Application is unaware of the failure
[Figure: Host (SERVER) with Application(s), PowerPath, four SCSI Drivers (SD), and four Host Bus Adapters (HBA), connected through the Interconnect Topology to Storage (STORAGE); one path has failed.]
This example depicts how PowerPath failover works. When a failure occurs, PowerPath transparently redirects the I/O down the most suitable alternate path. The PowerPath filter driver looks at the volume path set for the device, considers current workload, load balancing, and device priority settings, and chooses the best path to send the I/O down. In the example, PowerPath has three remaining paths to redirect the failed I/O and to load balance.
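The failover behavior described above can be modeled in a few lines. This is an illustrative sketch only, not PowerPath's actual algorithm or API: the class name, the least-outstanding-I/O selection rule, and the `transport` callable (assumed to raise `TimeoutError` when a path fails) are all assumptions made for the example.

```python
class VolumePathSet:
    """Toy model of a per-device path set with load balancing and failover."""

    def __init__(self, paths):
        # In-flight I/O count per path, used as a simple workload measure.
        self.outstanding = {p: 0 for p in paths}
        self.alive = set(paths)

    def choose(self):
        """Pick the surviving path with the fewest in-flight I/Os."""
        if not self.alive:
            raise RuntimeError("no surviving path to the device")
        return min(sorted(self.alive), key=lambda p: self.outstanding[p])

    def send(self, io, transport):
        """Send one I/O; on a timeout, take the path offline and re-drive
        the same I/O down another path so the caller never sees the error."""
        while True:
            path = self.choose()
            self.outstanding[path] += 1
            try:
                return transport(path, io)
            except TimeoutError:
                self.alive.discard(path)
            finally:
                self.outstanding[path] -= 1
```

With four paths and one failed adapter, the first I/O routed to the failed path times out, that path is taken offline, and the I/O is re-driven on one of the three remaining paths; the caller only sees the successful result.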

36 Summary
Key points covered in this topic:
  PowerPath is server-based software that provides multiple paths between the host bus adapter and the storage subsystem
  Redundant paths eliminate host adapters, cable connections, and channel adapters as single points of failure and increase availability
  Improves performance by dynamically balancing the workload across all available paths
  Application transparent
  Enhances data availability and accessibility

