Lesson 20. Fault Tolerance and Disaster Recovery.

Presentation on theme: "Lesson 20. Fault Tolerance and Disaster Recovery."— Presentation transcript:

1 Lesson 20. Fault Tolerance and Disaster Recovery

2 Objectives At the end of this presentation, you will be able to:

3 Identify the purpose and characteristics of fault tolerance. Explain how redundancy is used in servers and networks to eliminate single points of failure. Identify several techniques used in servers and network systems to increase fault tolerance. Define: Fault tolerance, redundancy, RAID, mirror server, and cluster.

4 Plan for disaster recovery. Develop a disaster recovery plan. Implement a disaster recovery plan. Document and regularly test the disaster recovery plan. Explain standard backup procedures and backup media storage practices. Identify types of backups and restoration schemes. Confirm and use off-site storage of backups.

5 Network+ Domain covered: 3.11, 3.12

6 Fault Tolerance The ability of a network or a computer to go on working in spite of one or more component failures. Achieved by eliminating “single points of failure.” Achieved primarily through redundancy.

7 Redundancy in the Server Eliminates the most common “single points of failure.” Uses multiple components in parallel so that if one component fails another takes over.
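The benefit of running components in parallel can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that every component has the same availability and that failures are independent; the function name is hypothetical and the lesson itself gives no such figures.

```python
def parallel_availability(single: float, count: int) -> float:
    """Probability that at least one of `count` identical components works.

    Assumes independent failures, which real hardware only approximates
    (a shared power feed, for example, can take out several parts at once).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if count < 1:
        raise ValueError("need at least one component")
    # The group fails only if every component fails at the same time.
    return 1.0 - (1.0 - single) ** count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "power supplies:", parallel_availability(0.99, n))
```

Under these assumptions, each added spare multiplies the chance of a total failure by the failure probability of one more component, which is why a single spare already removes most of the risk.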

8 Hardware Failure (share of failures by component): Disk Drives 50%; Power Supply 28%; Fan 8%; CPU 5%; Memory 4%; Controller 4%; Motherboard 1%. Source: Intel

9 Courtesy Intel Corp.

10 Redundant Array of Inexpensive Disks (RAID), also expanded as Redundant Array of Independent Disks. RAID is a way of coaxing two or more inexpensive, slow, unreliable drives to perform in concert so that they act like a more expensive, faster, reliable drive.

11 A disk system with RAID capability: Protects its data and provides on-line, immediate access to its data, despite a single disk failure. Provides for the on-line reconstruction of the contents of a failed disk to a replacement disk. Source: RAID Advisory Board (RAB)

12 Various RAID implementations exist. They are identified as Levels. The basic implementations are called level 0 through level 6. A higher level is not necessarily better than a lower level.

13 RAID can be implemented in: Software (slower, less expensive). Hardware (faster, more expensive).

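14 RAID Level 1 - data is written to two separate drives. [Figure: blocks 0 to 3 written identically to both drives]

15 Provides access to data despite a disk failure. [Figure: blocks 0 to 3 read from the surviving drive]

16 Provides for reconstruction of the contents of the failed disk. [Figure: blocks 0 to 3 copied from the surviving drive to the replacement drive]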
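The three RAID Level 1 behaviours above (write to both drives, keep serving data after one drive fails, rebuild onto a replacement) can be sketched with a toy in-memory model. This is an illustration only: the class and method names are hypothetical, and a real RAID controller or software RAID driver works on disk sectors rather than Python lists.

```python
class MirroredPair:
    """Toy RAID 1 model: every block is written to two drives."""

    def __init__(self, num_blocks):
        self.drives = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block, data):
        # Write the same data to every drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Serve the read from the first drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, i):
        # Simulate a drive failure: its contents are lost.
        self.failed[i] = True
        self.drives[i] = [None] * len(self.drives[i])

    def replace_and_rebuild(self, i):
        # Copy every block from the surviving drive to the replacement.
        if self.failed[1 - i]:
            raise IOError("no surviving drive to rebuild from")
        self.drives[i] = list(self.drives[1 - i])
        self.failed[i] = False


if __name__ == "__main__":
    pair = MirroredPair(4)
    for block in range(4):
        pair.write(block, f"data{block}")
    pair.fail(0)
    print(pair.read(2))          # still readable from drive 1
    pair.replace_and_rebuild(0)
    print(pair.drives[0] == pair.drives[1])
```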

17 Server Chassis Five Hard Drives

18 Redundant Hard Drives in a Server Courtesy Intel Corp.

19 Redundant Power Supplies Spare Power Supply

20 Redundant Power Supplies Courtesy Intel Corp.

21 Caution Hot-Swap Fans

22 Hot-Swap Fan Courtesy Intel Corp.

23 CPU Socket

24 Dual Processor Slots Courtesy Intel Corp.

25 Redundant NICs Active NIC Spare NIC

26 Backup Power Standby Power Supply (SPS) Uninterruptible Power Supply (UPS)

27 Standby Power Supply (SPS) An “off-line” device that functions only when normal power fails. A sensor detects AC power failure and switches over to standby power. Standby power is provided by a battery and a power inverter.

28 Standby Power Supply (Normal) [Diagram labels: Charger, Battery Pack]

29 Standby Power Supply (AC Power Fails) [Diagram labels: Charger, Battery Pack, Inverter]

30 Uninterruptible Power Supply (UPS) An “on-line” device that constantly provides power. In the event of an AC power failure there is no switchover to standby power, because the UPS is constantly “on-line.” It “conditions” the AC input, isolating the computer equipment from all variations in AC power.

31 The UPS conditions the AC line against: Power outages – Total loss of AC power. Surges – Temporary voltage rises. Sags – Temporary voltage drops. Noise – High frequency voltage spikes, both up and down.

32 Uninterruptible Power Supply (Normal) [Diagram labels: Charger, Battery Pack, Inverter]

33 Uninterruptible Power Supply (AC Power Fails) [Diagram labels: Charger, Battery Pack, Inverter]

34 Increase Fault Tolerance: RAID; multiple power supplies; multiple fans; multiple CPUs; redundant PCI cards; backup power sources.

35 Disaster Recovery

36 Types of Disasters: Fires; floods; wind and water damage; accidents; power outages; civil unrest; malicious attacks.

37 Disaster Recovery: The ability to return to an acceptable level of operation after a disaster. Requires a well-thought-out disaster recovery plan, a comprehensive implementation of the plan, and frequent testing and updating of the plan.

38 7 Steps to Disaster Tolerance: (1) Initiate the project. (2) Form a project team. (3) Complete a needs analysis. (4) Develop a plan that encompasses both protection and recovery. (5) Implement the plan. (6) Test the plan. (7) Constantly update the plan.

39 What’s in the Protection Plan? Procedures and policies describing how the facility, its functions, and data are to be protected. List of new protective equipment, software, and services needed along with a budget, procurement schedule and installation schedule. A step-by-step procedure and timetable for upgrading the data center from its present state to a protected state.

40 What’s in the Recovery Plan? Procedures and policies describing how and under what conditions the recovery plan should be activated. Basic protective and recovery information on each major piece of equipment. Names and telephone numbers of key corporate officials and the emergency management team members. Address of off-site backup facilities, with name and number of contact person. Location of backup tapes and disks.

41 What’s in the Recovery Plan? (continued) Names and phone numbers of key hardware, software and services vendors. Model numbers, serial numbers, as well as warranty and service agreement information on major pieces of equipment. Insurance policy numbers and information. Documentation of the equipment, software, configuration and wiring infrastructure of the data center.

42 24 x 7 x 365: 24 hours per day, 7 days per week, 365 days per year.

43 Backing up the Main System: Hot Site Backup; Warm Site Backup; Cold Site Backup.

44 Hot Site Backup A duplicate and running complement of computer hardware and software ready to take over immediately should the main system become unavailable for any reason. Data on the main system is backed up to the duplicate system in real time. If the main system fails the duplicate can take over operation without any downtime.

45 Warm Site Backup A duplicate complement of computer hardware and software ready to take over in a reasonable length of time should the main system become unavailable. Data is not backed up to the duplicate system in real time, but can be restored from backup tapes or other media.

46 Cold Site Backup An off-site location that can be used in case the main site is inoperable. Ready to go, but with no equipment installed. Least expensive to maintain. The recovery time is quite long compared to hot-site or even warm-site backup.

47 Implement the Plan Buying and installing the equipment, software, and services necessary to bring the data center up to a protected state. Training in the discipline of new policies and procedures. The plan may be so extensive and so expensive that it must be phased in over time. Every day that the plan is delayed, the company is at risk.

48 Test the Plan The only way to ensure the plan works. Simulate a disaster. Test the plan regularly and thoroughly.

49 Constantly Update the Plan to reflect: New equipment; new software; new people; new tasks.

50 Safeguard the Disaster Recovery Plan Make duplicate copies. Make certain that duplicate copies are updated when the master copy is updated. Make sure key people know where the document can be found.

51 Need for Frequent and Regular Backup The most effective way to prevent data loss. Protects all but the most recent data. Protects against hardware failure, equipment theft, hackers, viruses and vandals. Storing backups in a different location protects against fire, flood and other natural disasters.

52 Backup Considerations What data should be backed up? How often should the data be backed up? What type of backup media should be used? What type of backup scheme should be used?

53 What data should be backed up? Backups require time and media. Backups must not exceed the capacity of the backup device. Ask yourself: “Can I afford to lose this?” It is better to err on the side of caution.

54 What data need not be routinely backed up? Operating systems; application software; historical data that does not change.

55 How often should the data be backed up? Trade-off of risk versus benefit. Ask yourself: “How much data can I afford to lose?” Daily backups are the most common. But circumstances may dictate anything from continuous real-time backups to weekly backups.

56 What type of backup media should be used? Magnetic disks; optical disks; magnetic tape; Internet backup.

57 Types of Backup: Full; Incremental; Differential.

58 Full Backup The backup of all files on the drive. Takes the longest time to record because every file is copied. Takes the shortest time to restore because everything is on a single tape.

59 Full Backup Monday Tuesday Wednesday Thursday Friday Saturday Sunday

60 Restoration From a Full Backup Requires Only One Tape: Wednesday

61 The Full Backup is: A straightforward method of ensuring good backups and quick, easy restorations. The starting point for the Incremental and the Differential Backups.

62 Incremental Backup Records only those files that have changed since the last Incremental or Full Backup. Takes the shortest time to record. Generally takes the longest time to restore. Generally requires several tapes to restore.

63 Incremental Backup Monday Tuesday Wednesday Thursday Friday Saturday Sunday (Full Backup) Incremental Backups

64 Restoration From an Incremental Backup May Require Several Tapes: Sunday (Full Backup), then the Monday, Tuesday and Wednesday Incremental Backups.

65 Differential Backup Records only those files that have changed since the last Full backup. Takes less time to record than a Full backup. Takes less time to restore than an Incremental backup. The restore process requires only two tapes.

66 Differential Backup Monday Tuesday Wednesday Thursday Friday Saturday Sunday (Full Backup) Differential Backups
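The only difference between the incremental rule (slide 62) and the differential rule (slide 65) is the reference point used to decide whether a file has changed. The sketch below makes that explicit. It assumes, for illustration, that changes are detected by comparing modification times held in a dictionary; the function and parameter names are hypothetical, and real backup software may instead rely on a per-file archive flag.

```python
def files_to_copy(mtimes, kind, last_full, last_backup):
    """Return the file names a backup of the given kind would record.

    mtimes      -- mapping of file name to last-modified time (any number)
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(mtimes)            # every file, changed or not
    if kind == "incremental":
        cutoff = last_backup             # changed since the last backup
    elif kind == "differential":
        cutoff = last_full               # changed since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in mtimes.items() if t > cutoff)


if __name__ == "__main__":
    # Times are in days: full backup at 0, Monday's backup at 1.
    mtimes = {"a.doc": -5, "b.xls": 0.5, "c.txt": 1.5}
    for kind in ("full", "incremental", "differential"):
        print(kind, files_to_copy(mtimes, kind, last_full=0, last_backup=1))
```

In this example the file changed on Monday is skipped by Tuesday's incremental backup but is recorded again by Tuesday's differential backup, which is why differential backups grow through the week.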

67 Restoration From a Differential Backup Requires Only Two Tapes: Sunday (Full Backup) and Wednesday (Differential Backup).
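The restore examples on slides 60, 64 and 67 can be summarised in one small function. This is a sketch that assumes the weekly schedule shown in the slides (a full backup on Sunday, then one backup per day); the names `DAYS` and `tapes_needed` are hypothetical.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def tapes_needed(scheme, last_backup_day):
    """List the tapes to load, in order, to restore up to last_backup_day."""
    idx = DAYS.index(last_backup_day)
    if scheme == "full":
        # Every tape is a complete copy, so the latest one is enough.
        return [DAYS[idx]]
    if scheme == "incremental":
        # Sunday's full backup plus every incremental since then.
        return DAYS[: idx + 1]
    if scheme == "differential":
        # Sunday's full backup plus only the latest differential.
        return [DAYS[0]] if idx == 0 else [DAYS[0], DAYS[idx]]
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("full", "incremental", "differential"):
        print(scheme, tapes_needed(scheme, "Wednesday"))
```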

68 Grandfather - Father - Son (GFS) Tape Rotation Scheme Son – The daily backup tapes. Father – The full backup tape for the week. Grandfather – The full backup tape for the month.

69 Reuse Tapes: Son – after one week. Father – after five weeks. Grandfather – save indefinitely at an off-site location.
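A GFS rotation can be sketched as a function that decides which tape set a given day's backup belongs to. The slides do not say on which day the weekly and monthly full backups run, so this example assumes, for illustration only, that the weekly full backup is taken on Friday and that the last Friday of the month is the monthly one. All names are hypothetical.

```python
import datetime

# Reuse periods from the rotation scheme; None means "never reuse".
REUSE_AFTER = {
    "son": datetime.timedelta(weeks=1),
    "father": datetime.timedelta(weeks=5),
    "grandfather": None,
}


def gfs_tape(day: datetime.date) -> str:
    """Classify a backup date as a son, father or grandfather tape."""
    if day.weekday() != 4:               # 4 = Friday (assumed full-backup day)
        return "son"                     # ordinary daily backup
    next_friday = day + datetime.timedelta(days=7)
    if next_friday.month != day.month:   # last Friday of the month
        return "grandfather"
    return "father"


if __name__ == "__main__":
    for d in (datetime.date(2024, 5, 29),
              datetime.date(2024, 5, 24),
              datetime.date(2024, 5, 31)):
        kind = gfs_tape(d)
        print(d, kind, REUSE_AFTER[kind])
```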

70 Verify that the Backup Works Can you restore from the backup tape? Can you still restore from the backup tape if the original tape drive is destroyed?
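One way to answer "can you restore from the backup?" is to restore a file to a scratch location and compare it with the original. The sketch below does this with SHA-256 checksums from Python's standard `hashlib` module. The function names and the idea of comparing against a still-available original are assumptions for illustration; backup products usually ship their own verify feature.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """True if the restored file is byte-for-byte identical to the original."""
    return file_digest(original_path) == file_digest(restored_path)
```

Running the same check on a second machine with a different tape drive addresses the second question on the slide, since it shows the tape does not depend on the original drive.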

71 Problems with Tapes Problem 1: Tape drive heads are dirty. Solution: Clean the tape heads. Problem 2: The tapes become worn with time and use. Solution: Replace the worn tape with a new tape.

72 Backup Software A utility designed to make routine backups as effortless and as effective as possible. Suppliers of backup software: the NOS vendor; third parties.

73 Identify the purpose and characteristics of fault tolerance. Explain how redundancy is used in servers and networks to eliminate single points of failure. Identify several techniques used in servers and network systems to increase fault tolerance. Define: Fault tolerance, redundancy, RAID, mirror server, and cluster.

74 Plan for disaster recovery. Develop a disaster recovery plan. Implement a disaster recovery plan. Document and regularly test the disaster recovery plan. Explain standard backup procedures and backup media storage practices. Identify types of backups and restoration schemes. Confirm and use off-site storage of backups.

