Lesson 20. Fault Tolerance and Disaster Recovery.


Objectives At the end of this presentation, you will be able to:

Identify the purpose and characteristics of fault tolerance. Explain how redundancy is used in servers and networks to eliminate single points of failure. Identify several techniques used in servers and network systems to increase fault tolerance. Define: Fault tolerance, redundancy, RAID, mirror server, and cluster.

Plan for disaster recovery. Develop a disaster recovery plan. Implement a disaster recovery plan. Document and regularly test the disaster recovery plan. Explain standard backup procedures and backup media storage practices. Identify types of backups and restoration schemes. Confirm and use off-site storage of backups.

Network+ Domain covered:

Fault Tolerance The ability of a network or a computer to go on working in spite of one or more component failures. Achieved by eliminating “single points of failure.” Achieved primarily through redundancy.

Redundancy in the Server Eliminates the most common “single points of failure.” Uses multiple components in parallel so that if one component fails another takes over.
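The benefit of running components in parallel can be estimated with a short calculation. The sketch below is illustrative only: it assumes that component failures are independent and that a single working component is enough to keep the system running, neither of which the slides state. The function name `parallel_availability` is made up for this example.

```python
def parallel_availability(component_availability: float, count: int) -> float:
    """Availability of `count` redundant components in parallel.

    Assumes independent failures and that one working component suffices.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    # The group is down only when every component is down at the same time.
    return 1.0 - (1.0 - component_availability) ** count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(parallel_availability(0.9, n), 6))
```

Under these assumptions, a second component that is available 90% of the time raises the combined availability to about 99%.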

Hardware Failure (Source: Intel)
Disk Drives: 50%
Power Supply: 28%
Fan: 8%
CPU: 5%
Memory: 4%
Controller: 4%
Motherboard: 1%

Courtesy Intel Corp.

Redundant Array of Inexpensive Drives (RAID) RAID is a way of coaxing two or more inexpensive, slow, unreliable drives to perform in concert so that they act like a more expensive, faster, reliable drive.

A disk system with RAID capability: Protects its data and provides on-line, immediate access to its data, despite a single disk failure. Provides for the on-line reconstruction of the contents of a failed disk to a replacement disk. (Source: RAID Advisory Board, RAB)

Various RAID implementations exist. They are identified as Levels. The basic implementations are called level 0 through level 6. A higher level is not necessarily better than a lower level.

RAID can be implemented in: software (slower, less expensive) or hardware (faster, more expensive).

RAID Level 1 - data is written to two separate drives

Provides access to data despite a disk failure

Provides for Reconstruction of the contents of the failed disk
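The RAID Level 1 behavior described above (write to two drives, keep serving data after one fails, rebuild onto a replacement) can be modeled in a few lines. This is a toy in-memory sketch, not how a real RAID controller or driver is written; the class `MirroredVolume` and its methods are hypothetical names chosen for illustration.

```python
class MirroredVolume:
    """Toy model of RAID 1: every block is written to two drives."""

    def __init__(self, block_count: int):
        # Each "drive" is just a list of blocks held in memory.
        self.drives = [[b""] * block_count, [b""] * block_count]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        written = 0
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                drive[block] = data
                written += 1
        if written == 0:
            raise IOError("both drives have failed")

    def read(self, block: int) -> bytes:
        # Any surviving drive can serve the read.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True
        self.drives[index] = None  # the contents of the failed drive are gone

    def replace(self, index: int) -> None:
        """Install a new drive and rebuild it from the surviving mirror."""
        survivor = 1 - index
        if self.failed[survivor]:
            raise IOError("no surviving drive to rebuild from")
        self.drives[index] = list(self.drives[survivor])
        self.failed[index] = False
```

In this model, reads keep working after `fail(0)`, and `replace(0)` copies every block from the surviving drive, which mirrors the two captions above.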

[Figure: Redundant hard drives in a server. Labels: server chassis, five hard drives. Courtesy Intel Corp.]

[Figure: Redundant power supplies. Label: spare power supply. Courtesy Intel Corp.]

[Figure: Hot-swap fan. Label: "Caution Hot-Swap Fans". Courtesy Intel Corp.]

[Figure: Dual processor slots. Label: CPU socket. Courtesy Intel Corp.]

[Figure: Redundant NICs. Labels: active NIC, spare NIC.]

Backup Power: Standby Power Supply (SPS), Uninterruptible Power Supply (UPS).

Standby Power Supply (SPS) An “off-line” device that functions only when normal power fails. A sensor detects AC power failure and switches over to standby power. Standby power is provided by a battery and a power inverter.

[Figure: Standby Power Supply (Normal). Labels: charger, battery pack.]

[Figure: Standby Power Supply (AC Power Fails). Labels: charger, battery pack, inverter.]

Uninterruptible Power Supply (UPS) An “on-line” device that constantly provides power. In the event of an AC power failure there is no switchover to standby power, because the UPS is constantly “on-line.” It “conditions” the AC input, isolating the computer equipment from all variations in AC power.

The UPS conditions the AC line against: Power outages – Total loss of AC power. Surges – Temporary voltage rises. Sags – Temporary voltage drops. Noise – High frequency voltage spikes, both up and down.

[Figure: Uninterruptible Power Supply (Normal). Labels: charger, battery pack, inverter.]

[Figure: Uninterruptible Power Supply (AC Power Fails). Labels: charger, battery pack, inverter.]

Increase Fault Tolerance: RAID, multiple power supplies, multiple fans, multiple CPUs, redundant PCI cards, backup power sources.

Disaster Recovery

Types of Disasters: fires, floods, wind and water damage, accidents, power outages, civil unrest, malicious attacks.

Disaster Recovery The ability to return to an acceptable level of operation after a disaster. Requires a well-thought-out disaster recovery plan, a comprehensive implementation of the plan, and frequent testing and updating of the plan.

7 Steps to Disaster Tolerance: (1) Initiate the project. (2) Form a project team. (3) Complete a needs analysis. (4) Develop a plan that encompasses both protection and recovery. (5) Implement the plan. (6) Test the plan. (7) Constantly update the plan.

What’s in the Protection Plan? Procedures and policies describing how the facility, its functions, and data are to be protected. List of new protective equipment, software, and services needed along with a budget, procurement schedule and installation schedule. A step-by-step procedure and timetable for upgrading the data center from its present state to a protected state.

What’s in the Recovery Plan? Procedures and policies describing how and under what conditions the recovery plan should be activated. Basic protective and recovery information on each major piece of equipment. Names and telephone numbers of key corporate officials and the emergency management team members. Address of off-site backup facilities, with name and number of contact person. Location of backup tapes and disks.

What’s in the Recovery Plan? (continued) Names and phone numbers of key hardware, software and services vendors. Model numbers, serial numbers, as well as warranty and service agreement information on major pieces of equipment. Insurance policy numbers and information. Documentation of the equipment, software, configuration and wiring infrastructure of the data center.

24 X 7 X 365: 24 hours per day, 7 days per week, 365 days per year.
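A round-the-clock target leaves very little room for downtime, which a quick calculation makes concrete. The sketch below is an illustration, not part of the original slides: the availability percentages are example inputs, and the function name `allowed_downtime_hours` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system may be down and still meet the target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

For example, a 99.9% target allows under nine hours of downtime in a whole year.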

Backing up the Main System: Hot Site Backup, Warm Site Backup, Cold Site Backup.

Hot Site Backup A duplicate and running complement of computer hardware and software ready to take over immediately should the main system become unavailable for any reason. Data on the main system is backed up to the duplicate system in real time. If the main system fails the duplicate can take over operation without any downtime.

Warm Site Backup A duplicate complement of computer hardware and software ready to take over in a reasonable length of time should the main system become unavailable. Data is not backed up to the duplicate system in real time, but could be restored from backup tapes or other media.

Cold Site Backup An off-site location that can be used in case the main site is inoperable. Ready to go, but with no equipment installed. Least expensive to maintain. The recovery time is quite long compared to hot-site or even warm-site backup.
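The choice between hot, warm, and cold sites is a trade-off between recovery time and cost. One way to reason about it is to pick the cheapest option that still recovers within the downtime the business can tolerate. The sketch below assumes each option has an estimated recovery time and yearly cost; the `SiteOption` type, the function name, and all figures in `EXAMPLE_OPTIONS` are placeholders invented for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteOption:
    name: str
    recovery_hours: float  # estimated time to resume operation
    annual_cost: float     # estimated yearly cost to maintain


def cheapest_site_meeting(options, max_downtime_hours):
    """Return the lowest-cost option that recovers fast enough, or None."""
    acceptable = [o for o in options if o.recovery_hours <= max_downtime_hours]
    return min(acceptable, key=lambda o: o.annual_cost, default=None)


# Placeholder figures for illustration only; real values vary widely.
EXAMPLE_OPTIONS = [
    SiteOption("hot", recovery_hours=0, annual_cost=300_000),
    SiteOption("warm", recovery_hours=24, annual_cost=100_000),
    SiteOption("cold", recovery_hours=168, annual_cost=20_000),
]
```

With these placeholder numbers, a business that can tolerate two days of downtime would land on the warm site, while one that can tolerate none is left with the hot site only.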

Implement the Plan Buying and installing the equipment, software, and services necessary to bring the data center up to a protected state. Training in the discipline of new policies and procedures. The plan may be so extensive and so expensive that it must be phased in over time. Every day that the plan is delayed, the company is at risk.

Test the Plan The only way to ensure the plan works. Simulate a disaster. Test the plan regularly and thoroughly.

Constantly Update the Plan: new equipment, new software, new people, new tasks.

Safeguard the Disaster Recovery Plan Make duplicate copies. Make certain that duplicate copies are updated when the master copy is updated. Make sure key people know where the document can be found.

Need for Frequent and Regular Backup The most effective way to prevent data loss. Protects all but the most recent data. Protects against hardware failure, equipment theft, hackers, viruses and vandals. Storing backups in a different location protects against fire, flood and other natural disasters.

Backup Considerations What data should be backed up? How often should the data be backed up? What type of backup media should be used? What type of backup scheme should be used?

What data should be backed up? Backups require time and media. Backups must not exceed the capacity of the backup device. Ask yourself: “Can I afford to lose this?” It is better to err on the side of caution.

What data need not be routinely backed up? Operating systems. Application software. Historical data that does not change.

How often should the data be backed up? Trade-off of risk versus benefit. Ask yourself: “How much data can I afford to lose?” Daily backups are the most common. But circumstances may dictate anything from continuous real-time backups to weekly backups.

What type of backup media should be used? Magnetic disks, optical disks, magnetic tape, Internet backup.

Types of Backup: Full, Incremental, Differential.

Full Backup The backup of all files on the drive. Takes the longest time to record because every file is copied. Takes the shortest time to restore because everything is on a single tape.

[Figure: Full Backup schedule. Days shown: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.]

Restoration from a Full Backup Requires Only One Tape. [Figure: tape shown: Wednesday.]

The Full Backup is: A straightforward method of ensuring good backups and quick, easy restorations. The starting point for the Incremental and the Differential Backups.

Incremental Backup Records only those files that have changed since the last Incremental or Full Backup. Takes the shortest time to record. Generally takes the longest time to restore. Generally requires several tapes to restore.

[Figure: Incremental Backup schedule. Sunday (Full Backup); Monday, Tuesday, Wednesday, Thursday, Friday, Saturday (Incremental Backups).]

Restoration from an Incremental Backup May Require Several Tapes. [Figure: tapes shown: Sunday (Full Backup); Monday, Tuesday, Wednesday (Incremental Backups).]

Differential Backup Records only those files that have changed since the last Full backup. Takes less time to record than a Full backup. Takes less time to restore than an Incremental backup. The restore process requires only two tapes: the last Full backup and the most recent Differential backup.
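The three backup types differ only in which reference point decides whether a file is copied. The sketch below illustrates that rule with plain timestamps. It is a simplification under stated assumptions: real backup software may track changes differently (for example with a per-file archive flag), and the function `files_to_back_up` and its parameters are hypothetical names.

```python
def files_to_back_up(mtimes, kind, last_full, last_backup):
    """Select files for a backup run.

    mtimes      -- dict mapping file name to last-modified time
    kind        -- "full", "incremental", or "differential"
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent full or incremental backup
    """
    if kind == "full":
        return sorted(mtimes)          # every file, changed or not
    if kind == "incremental":
        cutoff = last_backup           # changed since the last full or incremental
    elif kind == "differential":
        cutoff = last_full             # changed since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)
```

Because the differential cutoff never moves until the next full backup, each differential grows through the week, while each incremental stays small.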

[Figure: Differential Backup schedule. Sunday (Full Backup); Monday, Tuesday, Wednesday, Thursday, Friday, Saturday (Differential Backups).]

Restoration from a Differential Backup Requires Only Two Tapes. [Figure: tapes shown: Sunday (Full Backup); Wednesday (Differential Backup).]
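The tape counts for each restoration scheme follow from walking the backup history backwards until a full backup is reached. The sketch below is one way to express that, assuming a week uses a single scheme (incremental or differential, not a mix); the function name `tapes_for_restore` and the history format are invented for this example.

```python
def tapes_for_restore(history):
    """Return the tapes needed to restore, oldest first.

    history -- list of (label, kind) pairs, oldest first, ending at the
               point to restore to; kind is "full", "incremental", or
               "differential".
    """
    needed = []
    have_differential = False
    for label, kind in reversed(history):
        if kind == "full":
            needed.append(label)       # the chain always starts at a full backup
            return needed[::-1]
        if have_differential:
            continue                   # a differential covers everything since the full
        if kind == "incremental":
            needed.append(label)       # every incremental since the full is required
        elif kind == "differential":
            needed.append(label)       # only the newest differential is required
            have_differential = True
        else:
            raise ValueError(f"unknown backup kind: {kind}")
    raise ValueError("history contains no full backup")
```

For a Wednesday restore after a Sunday full backup, an incremental week needs the Sunday, Monday, Tuesday, and Wednesday tapes, while a differential week needs only Sunday and Wednesday.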

Grandfather - Father - Son (GFS) Tape Rotation Scheme Son – The daily backup tapes. Father – The full backup tape for the week. Grandfather – The full backup tape for the month.

Reuse Tapes: Son – Reuse after one week. Father – Reuse after five weeks. Grandfather – Save indefinitely at an off-site location.
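A rotation scheme like this can be driven by the calendar. The sketch below labels each date with its GFS role. It rests on assumptions the slides do not state: the weekly full backup runs on Sunday, and the last Sunday of the month is the one kept as the grandfather. The function name `gfs_role` is hypothetical.

```python
from datetime import date, timedelta


def gfs_role(day: date) -> str:
    """Classify a backup date as "son", "father", or "grandfather".

    Assumes the weekly full backup runs on Sunday and that the last
    Sunday of each month is kept as the monthly (grandfather) tape.
    """
    if day.weekday() != 6:             # Monday is 0, Sunday is 6
        return "son"                   # daily backup tape
    if (day + timedelta(days=7)).month != day.month:
        return "grandfather"           # last Sunday of the month
    return "father"                    # weekly full backup tape


if __name__ == "__main__":
    start = date(2024, 6, 17)
    for offset in range(14):
        current = start + timedelta(days=offset)
        print(current.isoformat(), gfs_role(current))
```

A scheduler could use the returned role to decide which tape set to load and when that tape becomes eligible for reuse.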

Verify that the Backup Works Can you restore from the backup tape? Can you still restore from the backup tape if the original tape drive is destroyed?
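One way to answer the first question is to restore the backup to a scratch location and compare it with the original files. The sketch below compares SHA-256 checksums using only the Python standard library. It assumes the backup has already been restored to a directory by whatever backup tool is in use; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_original(original_dir, restored_dir):
    """True if every file under original_dir was restored with identical content."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    for source in original_dir.rglob("*"):
        if not source.is_file():
            continue
        restored = restored_dir / source.relative_to(original_dir)
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            return False
    return True
```

Running the trial restore on a different machine and tape drive also addresses the second question, since it shows the tape is readable without the original hardware.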

Problems with Tapes
Problem 1: Tape drive heads are dirty. Solution: Clean the tape heads.
Problem 2: The tapes become worn with time and use. Solution: Replace the worn tape with a new tape.

Backup Software A utility designed to make routine backups as effortless and as effective as possible. Suppliers of backup software: the NOS vendor, third parties.
