1
Disaster Recovery Chao-Hsien Chu, Ph.D.
College of Information Sciences and Technology The Pennsylvania State University University Park, PA 16802 Theory Practice Learning by Doing IST 515
3
Objectives
Describe the basic differences between BCP and DRP. Describe the steps involved in creating a disaster recovery plan and its tests. Identify and describe the various types of recovery strategies. Describe how to formulate a recovery strategy. Compare and contrast strategies for backup. Identify the advantages and disadvantages of mutual aid agreements. Compare and contrast the advantages and disadvantages of hot sites and cold sites. Compare and contrast the advantages and disadvantages of using service bureaus.
4
Readings Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide to the CISSP Exam, Auerbach, Chapter 9 (Required). Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J., and Thomas, R., Contingency Planning Guide for Information Technology Systems, NIST Special Publication , June 2002. Wikipedia, Disaster recovery.
5
Disasters
6
BCP Cycle
7
Areas Covered in BCP Contact points. Who to contact during office hours, outside office hours, and in an emergency; Roles and responsibilities. A well-defined organizational structure for the business continuity and recovery teams; Risk levels. A categorization of business risks and the level of risk the organization deems acceptable; Continuity and recovery service levels. How much time is acceptable for responding to threats, implementing continuity plans, and recovering from failure scenarios;
8
Areas Covered in BCP Business continuity reviews. How and when the organization reviews business continuity plans; Business continuity processes. Processes and procedures that inform staff how to react to and handle particular failure scenarios; Incident reporting and documentation. Methods of recording and documenting incidents and responses to them; Testing. Acceptance criteria and testing requirements for the business continuity plan; and Training. Training requirements for staff involved in business continuity and disaster recovery processes.
9
Step 1: Initiate the BCP Project
Obtain and confirm support from senior management. Identify key business and technical stakeholders. Form a business continuity working group. Define objectives and constraints. Establish strategic milestones and draw up a road map. Begin a draft version of business continuity policy.
10
Step 2: Identify Business Threats
Technology threats include natural disaster (such as flooding), fire, power failure, systems and network failure, systems and network flooding (when attackers try to overwhelm a network with traffic), virus attack, denial-of-service attack, theft, vandalism, and sabotage. Information threats come from hacking, theft, fraud, fabrication, alteration, misuse, natural disaster, fire, and the degradation of the ink on paper records. People threats include illness, recruitment shortfalls, resignation, compassionate leave, pregnancy, weather, and unavailability of transportation or office access.
11
Step 2: Identify Business Threats
Identify the community of business and technical stakeholders. Conduct threat identification workshops. Delineate and document business threats.
12
Step 3: Conduct a Risk Analysis
Conduct risk analysis workshops. Assess the likelihood and impact of threat occurrence. Categorize and prioritize threats according to risk level. Review outputs of risk analysis with management. Ascertain level of risk acceptable to the organization. Document outputs in business continuity policy.
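The likelihood-and-impact assessment can be pictured as a simple scoring matrix. The sketch below is illustrative only: the 1-5 scales, the score thresholds in `risk_level`, and the sample threat ratings are assumptions, not values from the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Risk score as likelihood multiplied by impact
        return self.likelihood * self.impact


def risk_level(score: int) -> str:
    # Hypothetical thresholds; an organization would set its own
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def prioritize(threats: list[Threat]) -> list[Threat]:
    # Highest score first; ties broken alphabetically for a stable report
    return sorted(threats, key=lambda t: (-t.score, t.name))


# Sample ratings are invented for illustration
threats = [
    Threat("power failure", likelihood=3, impact=4),
    Threat("flooding", likelihood=1, impact=5),
    Threat("virus attack", likelihood=4, impact=4),
]

for t in prioritize(threats):
    print(f"{t.name}: score={t.score}, level={risk_level(t.score)}")
```

The ranked list is the kind of output that could be reviewed with management when deciding which risk levels the organization accepts.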
13
Step 4: Establish the Business Continuity Team
Identify key business, technical, and customer services stakeholders. Form and empower the business continuity team. Clarify and agree on team objectives and working mode. Define roles and responsibilities; produce a work plan. Identify incident engagement and response processes. Update business continuity policy.
14
Roles of BC Team A business continuity manager is the first point of contact, manages the incident, initiates the business continuity plan, mobilizes the business continuity team, and presents key decisions to business owners when appropriate. The business owner makes key decisions about how the business handles incidents. The technical services manager manages disruptions to technical services, such as IT infrastructure and applications; initiates continuity arrangements; and interacts with third-party business continuity service providers. An estate manager manages disruptions relating to buildings, offices, and the surrounding environment; initiates continuity arrangements; and interacts with third-party business continuity service providers.
15
Roles of BC Team The business operations and customer services manager manages disruptions to business operations and customer services; keeps customers informed if there is a noticeable impact on customer service levels; initiates continuity arrangements; and interacts with third-party business continuity service providers. Business continuity (or resumption) teams are technical, estate, or customer services teams that execute the business continuity plans. A recovery manager guides the business's recovery to normal operations.
16
Step 5: Design the Business Continuity Plan
Identify critical and noncritical business services. Establish preferred business continuity service levels and profiles for continuity and recovery. Reaffirm key constraints (such as time and cost). For each threat, identify possible continuity strategies and evaluate them in terms of time, cost, and benefits. Identify and engage potential business continuity partners. Draft a set of continuity plans and work toward an agreed set of plans with senior management. Produce and execute an implementation plan.
17
Common Strategies Technology: Redundancy (of hardware and network, for example), maintenance and support agreements, and backup and restore capabilities are common defensive strategies. Information: Recover information by using data mirroring, backup and restore, auditing, and off-site or secondary data storage. People: To temporarily shore up people-related resources, use contract staff, rotas (workloads that a company can change in response to business demand or personnel shortfalls), call-out arrangements (having certain staff in standby mode to be called to work as necessary), rental offices and sites, manual procedures, and service-forwarding agreements (such as with specialist call centers).
18
Evaluating Criteria Costs for acquisition, deployment, testing, training, and associated management overhead; Level of protection; Business resumption response time; and Time to implement, including time for acquiring, deploying, and testing the business continuity strategy and for conducting relevant and necessary training.
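One common way to compare candidate strategies against criteria like these is a weighted scoring matrix. The sketch below is a hypothetical illustration: the weights, the 1-5 rating scale, and the ratings given to the two sample strategies are all assumptions made for the example.

```python
# Assumed weights for the four criteria listed above; they sum to 1.0
WEIGHTS = {
    "cost": 0.3,
    "protection": 0.3,
    "resumption_time": 0.25,
    "time_to_implement": 0.15,
}


def weighted_score(ratings: dict[str, int]) -> float:
    # Each rating is on an assumed 1-5 scale where 5 is most favorable
    # (for example, 5 for cost means cheapest)
    return sum(WEIGHTS[criterion] * ratings[criterion] for criterion in WEIGHTS)


# Hypothetical ratings for two candidate strategies
candidates = {
    "redundant hardware": {
        "cost": 2, "protection": 5, "resumption_time": 5, "time_to_implement": 3,
    },
    "backup and restore": {
        "cost": 4, "protection": 3, "resumption_time": 2, "time_to_implement": 4,
    },
}

# Best weighted score first
ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Changing the weights shifts the ranking, which makes the trade-off between cost and resumption time explicit when the options are presented to senior management.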
19
Step 6: Define Your Business Continuity Processes
Identify, define, and document business continuity processes. Review and verify business continuity processes with relevant stakeholders. Identify training requirements. Develop training exercises, role-playing scripts, and simulation case studies. Initiate training and awareness programs.
20
Business Continuity Processes
Handling specific failure events, such as fire and network failures; Backup and restoration of systems and business data; Virus management; Incident reporting; Problem escalation hierarchies; Customer and staff communication; contact procedures for third-party support providers.
21
Step 7: Test Your Business Continuity Plan
Define business continuity acceptance criteria. Formulate the business continuity test plan. Identify major testing milestones. Devise the testing schedule. Execute tests via simulation and rehearsal; document test results. Assess overall effectiveness of business continuity plan; pinpoint areas of weakness and improvement. Iterate tests until the plan meets acceptance criteria. Check, complete, and distribute business continuity policy.
22
Reasons for Testing BCP
Validate the plan’s effectiveness in meeting your stated business continuity service levels; Identify, at an early stage, any shortcomings in the plan; Assess whether your business continuity service levels are realistic and achievable given your budgetary and time constraints; and Give senior management and other parties (such as regulatory bodies) confidence in the plan.
23
Step 8: Review Your Business Continuity Plan
Develop a review schedule for different types of review. Arrange a business continuity review meeting or workshop. Update the business continuity document. Kick off another BCP cycle if necessary.
24
When to Review BCP Significant changes to the business—for example, the launch of new e-business operations; Changes in business priorities; Shifts in the legal or regulatory landscape; Significant world events (wars or terrorist attacks); Changes to the IT budget; Physical relocation of IT systems and operations; Outsourcing of IT systems and operations; Developments in IT infrastructure; and Significant changes in the labor market.
25
Common Pitfalls In BCP
26
Disaster Recovery Disaster recovery refers to the immediate and temporary restoration of critical computing and network operations after a natural or man-made disaster within defined timeframes. An organization should document how it will respond to a disaster and resume the critical business functions within a predetermined period of time; minimize the amount of loss; and repair (or replace) the primary facility to resume data processing support.
27
Disaster Recovery Planning
A comprehensive statement of consistent actions to be taken before, during, and after a disruptive event that causes a significant loss of information systems resources. The procedures for responding to an emergency, providing extended backup operations during the interruption, and managing recovery and salvage processes afterwards, should an organization experience a substantial loss of processing capability.
28
Disaster Recovery Planning
To provide the capability to implement critical processes at an alternative site and return to the primary site and normal processing within a time frame that minimizes the loss to the organization, by executing rapid recovery procedures.
30
Goals and Objectives of DRP
Protecting an organization from major computer services failure. Minimizing the risk to the organization from delays in providing services. Guaranteeing the reliability of standby systems through testing and simulation. Minimizing the decision-making required by personnel during a disaster.
32
Disaster Recovery Procedures
The recovery team. The salvage team. Normal operations resume. Other recovery issues: interfacing with external groups; employee relations; fraud and crime (vandalism and looting); financial disbursement; media relations.
33
Recovery Strategies
34
Recovery Strategies Recovery strategies consist of a set of predefined and management-approved actions implemented in response to an unacceptable business interruption. The focus is on recovery methods to meet the predetermined recovery timeframes established for the operation and functioning of the critical business functions. Developing the recovery strategies includes compiling the resource requirements and identifying the alternatives available during recovery. Predefined means we don’t have to make it up as we go along: we have a documented, tested plan in place. Management-approved means you will get the resources to implement the BCP.
35
Sample of Business Unit Priorities
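Columns: Business Unit; Recovery Window (Hrs); IT Platforms; Priority.
IT Security: recovery window 2; IT platforms Mainframe, LAN, WAN; priority 1.
Facilities: IT platforms LAN, WAN.
Legal: recovery window 36; priority 3.
Administrative: recovery window 18.
Accounting: recovery window 48.
Human Resources.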
36
Steps in Developing Recovery Strategies
Document all costs with each alternative. Obtain costs for any outside services. Develop written agreements. Evaluate risk reduction and resumption strategies based on a full loss of the facility. Identify risk reduction measures and revise resumption priorities and timeframes. Document recovery strategies and present them to management for comments and approval.
37
Recovery Strategies Strategies should address recovery of:
Business operations. Facilities and supplies. Users (workers and end-users). Technical (network, telecommunication, data center). Data (off-site backups of data and applications).
Business operations were enumerated in the BIA: what IT and other requirements are necessary to support them? Facilities and supplies: where do I sit at the DR site? Where is my conference room? Do I get a whiteboard? And by the way, I need a pencil. Users: can manual processes be used as part of DR? If so, how does the manual processing get integrated back into the electronic processing later? Do we need housing, transportation? Recovery of data centers and networks is an obvious necessity requiring careful planning. There are technical solutions available, though. (Bring money.) You mean we’ve got all these computers in the DR site and no data? Who forgot the DATA?? I’m only going to go into detail on the last two bullets. The first three are also quite detailed planning processes.
38
Business Recovery Strategies
Business recovery strategies focus on critical resources and the maximum tolerable downtime (MTD) for each business function. The business unit priorities are taken directly from the BIA. The length of the recovery window for each business unit dictates the priority for recovery. The strategies involve identifying the following: Critical business units and their associated business functions. Critical IT system requirements for each business function. Procedures for connectivity to IT infrastructures (e.g., mainframe, mini, LAN, WAN).
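The rule that the recovery window dictates recovery priority can be sketched as a simple sort. The example below uses only the four business units whose recovery windows are legible in the sample priorities slide; the function name `assign_priorities` is hypothetical, and how ties between equal windows should be broken is left as an assumption (here, input order).

```python
# Recovery windows in hours, taken from the legible entries of the
# sample business unit priorities
recovery_windows = {
    "IT Security": 2,
    "Legal": 36,
    "Administrative": 18,
    "Accounting": 48,
}


def assign_priorities(windows: dict[str, int]) -> dict[str, int]:
    # Shorter recovery window means higher priority (1 is recovered first).
    # sorted() is stable, so units with equal windows keep their input order.
    ordered = sorted(windows, key=lambda unit: windows[unit])
    return {unit: rank for rank, unit in enumerate(ordered, start=1)}


priorities = assign_priorities(recovery_windows)
for unit, rank in priorities.items():
    print(f"{rank}. {unit} ({recovery_windows[unit]} h)")
```

Under this rule IT Security comes first and Legal third, which agrees with the two priority values that survive in the sample.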
39
Business Recovery Strategies
The strategies involve identifying the following: Critical equipment and supply requirements for each business function. Essential office space requirements of each business unit. Key personnel for each business unit. Redirection of postal service mail, voice telecommunications, and data networks to the recovery site. Business unit interdependencies with other units. Off-site storage (procedures, media, documents). Vendor services.
40
Facility and Supply Recovery Strategies
Facility recovery involves identifying recovery procedures for the alternate facility, including space, security, fire protection, infrastructure, utility, supply, and environmental requirements. Determine minimum space for recovery of critical business units. Determine space needs for less critical resources. Determine security needs at recovery sites. Determine fire protection needs. Determine critical furnishings and office equipment. Determine infrastructure requirements. Determine utility and environmental needs. Determine what office/business supplies are needed.
41
User Recovery Strategies
The strategies involved with personnel requirements focus on manual procedures, vital records, and restoration procedures. A critical component is establishing methods to implement the process and maintain the records so that information can be easily and accurately updated to the electronic format when service is restored. The plan should specify the following: Manual procedures. Vital record storage (e.g., medical, personnel). Employee notification procedures. Employee transportation arrangements. Employee accommodations.
42
Technical Recovery Strategies
Technical recovery strategies define alternate recovery strategies for the data center and network infrastructure components. Methods: Subscription services. Mutual aid agreements. Redundant data centers. Service bureaus. The data center is obvious. Network connectivity (loss of) could be defined as a disaster all by itself. Telecommunications are vital to most businesses. You don’t want to get to the DR site and realize you forgot the need for a PBX or a FAX machine.
43
Subscription Services
Subscription services provide an alternate facility or “site” for recovery. They are characterized as hot, warm, cold, mirror, and mobile sites. Hot Site. A fully configured site with the complete customer-required hardware and software provided by the service. Warm Site. Similar to a hot site, but the expensive equipment (e.g., a mainframe) is not available on-site. The site is ready within hours after the needed equipment arrives.
44
Subscription Services
Cold Site. Does not include any technical equipment or resources, except environmental support such as air conditioning, power, telecommunication links, raised floors, etc. Mirror Site. Also referred to as full redundancy; a computer service facility equipped with utilities, communication lines, and appropriate hardware that is fully operational and processes each transaction along with the primary site. Mobile Site. A trailer that can be set up and linked by a trailer sleeve to create a space to suit the subscriber’s recovery needs.
45
Reciprocal or Mutual Aid Agreements
This strategy establishes reciprocal or mutual aid agreements with other companies, under which each company provides facilities to the other in the event of a disaster. Reciprocal agreements require the companies to have similar hardware and software computing environments. Typically, reciprocal agreements are dismissed in practice because few information system facilities have the extra capacity needed to run both their own and another organization’s workload for any extended period of time.
46
Technical Recovery Strategies
Redundant Processing Centers: Expensive. Maybe not enough spare capacity for critical operations.
Service Bureaus: Many clients share facilities. Almost as expensive as a hot site. Must negotiate agreements with other clients.
Think of load-balanced redundant sites, for instance. Operations are going OK, but are both sites running at less than 50% capacity? Can site A handle the load if site B goes down?
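The spare-capacity question for load-balanced redundant sites can be checked with a few lines of arithmetic. This is a minimal sketch under assumed numbers: the function name, the capacity units, and the sample loads are all hypothetical.

```python
def survives_single_site_loss(capacities: dict[str, float], loads: dict[str, float]) -> bool:
    # True if, for every site, the remaining sites have enough total
    # capacity to carry the full combined load when that site is lost
    total_load = sum(loads.values())
    for failed in capacities:
        remaining = sum(cap for site, cap in capacities.items() if site != failed)
        if remaining < total_load:
            return False
    return True


# Hypothetical units (for example, transactions per second)
capacities = {"A": 1000.0, "B": 1000.0}

# Both sites under 50% utilization: either one can absorb the other's load
print(survives_single_site_loss(capacities, {"A": 450.0, "B": 400.0}))

# Both sites at 70%: losing one leaves 1000 capacity for 1400 load
print(survives_single_site_loss(capacities, {"A": 700.0, "B": 700.0}))
```

With two equal sites the check reduces to the 50% rule of thumb; with more sites or unequal capacities the same function still applies.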
47
Data Recovery Strategies
The objectives are to back up critical software and data, store the backups at an off-site location, and retrieve the backups quickly during a recovery operation. Backups of data and applications. Off-site vs. on-site storage of media. How fast can data be recovered? How much data can you lose? Security of off-site backup media. Types of backups (full, incremental, differential, etc.).
Full backups are what you think they are: everything. Incremental backups contain the files changed since the *previous* backup, which might be a full backup or an incremental; recovery is potentially long because the full backup and every later incremental must be restored. Differential backups contain all files changed since the previous full backup, so recovery is quicker. Continuous backups work like a journaling file system, or like Oracle’s Data Guard hot spare environment: geographically separated systems are kept up to date in real time.
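The difference in recovery effort between incremental and differential schedules shows up when listing which backups a restore needs. The sketch below is an illustration under stated assumptions: the `(day, kind)` representation and the `restore_chain` function are hypothetical, the input must contain at least one full backup, and real backup tools may treat mixed schedules differently.

```python
def restore_chain(backups: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return the backups needed to restore to the most recent one.

    `backups` is a chronological list of (day, kind) pairs, where kind is
    "full", "incremental", or "differential".
    """
    # Start from the most recent full backup
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    later = backups[last_full + 1:]

    # The latest differential covers every change since the full backup,
    # so anything taken between the two can be skipped
    diff_positions = [i for i, (_, kind) in enumerate(later) if kind == "differential"]
    if diff_positions:
        chain.append(later[diff_positions[-1]])
        later = later[diff_positions[-1] + 1:]

    # Every incremental taken after that point must be replayed in order
    chain.extend(b for b in later if b[1] == "incremental")
    return chain


incremental_week = [(0, "full"), (1, "incremental"), (2, "incremental"), (3, "incremental")]
differential_week = [(0, "full"), (1, "differential"), (2, "differential"), (3, "differential")]

print(len(restore_chain(incremental_week)))   # full plus every incremental
print(len(restore_chain(differential_week)))  # full plus the latest differential
```

The incremental schedule needs all four backups, while the differential schedule needs only two, at the cost of each differential growing larger as the week goes on.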
48
Recovery Management This is sometimes referred to as Crisis Management. Essentially, it is the overall coordination of the organization’s response to a crisis. The goal is to deal with the issues in an effective and timely manner and avoid or minimize damage to the organization’s profitability, reputation, and ability to operate. The flow of accurate information is a key ingredient to effective crisis management. The effective management of information can serve as the first line of defense against a crisis and can also be the most effective mechanism in the process of restoring both the business functions and public confidence.
49
Testing the Disaster Recovery Plan
To verify the accuracy of the recovery procedures and identify deficiencies. To prepare and train the personnel to execute their emergency duties. To verify the processing capability of the alternate backup site.
50
Testing DRP
Creating the Test Document: Testing schedule and timing. The duration of the test. The specific test steps. Who will be the participants in the test. The task assignments of the test personnel. The resources and services required (supplies, hardware, software, documentation, and so forth).
51
Five DRP Test Types
Checklist. Structured walk-through. Simulation. Parallel. Full-interruption.