1
Seminar on Enterprise Software
Hardware Redundancy
MTAT Seminar on Enterprise Software
Olgun Cakabey (B06337), Othmar Mwambe (B06324)
December 2010
2
Agenda
What is Redundancy?
Introduction to Hardware Redundancy
Hardware Components
Disk Storage
RAID (Redundant Array of Independent Disks)
RAID Configurations
Hardware Redundancy Techniques
Conclusions
References
Demo
3
What is Redundancy?
In engineering, redundancy is the duplication of critical components of a system with the intention of increasing the reliability of the system, usually in the form of a backup or fail-safe.
4
Concept of Redundancy
Hardware redundancy is the addition of extra hardware, usually for the purpose of either detecting or tolerating faults.
Software redundancy is the addition of extra software, beyond what is needed to perform a given function, to detect and possibly tolerate faults.
Information redundancy is the addition of extra information beyond that required to implement a given function; for example, error detection codes.
Time redundancy uses additional time to perform the functions of a system such that fault detection, and often fault tolerance, can be achieved. It is mainly used to tolerate transient faults.
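Time redundancy can be illustrated with a small sketch: the same computation is run twice and the result is accepted only when both runs agree. The names `run_with_time_redundancy` and `FlakyAdder` are hypothetical and not from the slides; `FlakyAdder` only simulates a transient fault on its first call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(compute: Callable[[], T], max_rounds: int = 3) -> T:
    """Run `compute` twice per round; accept the result only if both runs agree."""
    for _ in range(max_rounds):
        first = compute()
        second = compute()
        if first == second:
            return first
    raise RuntimeError("results never agreed; the fault may not be transient")


class FlakyAdder:
    """Hypothetical module that returns a wrong value on its first call only."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return 999 if self.calls == 1 else 2 + 3


adder = FlakyAdder()
result = run_with_time_redundancy(adder)  # first round disagrees, second agrees
```

This only catches faults that change the result between runs. A permanent fault that yields the same wrong value twice would pass the comparison, which is why the technique is associated with transient faults.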
5
Introduction to Hardware Redundancy
Hardware redundancy concentrates not only on recovery from failures but also on protection against them. It always demands a trade-off between cost and achievable dependability. Costs: additional components, area, power, shielding, ...
Figure: Computer Without Redundancy
6
Hardware Components
Several parts of a computer system are of particular interest when discussing hardware redundancy: CPU, memory, backplane and system bus, I/O and network cards, power supplies, and cables and connections. Some systems have a layer between the CPUs and the operating system, which is sometimes called a hypervisor.
7
Disk Storage
8
Types of Storage Disks
Disks are one of the most important parts of a computer system, as they store the data, application programs, and operating systems.
Data disks
Operating system disks, e.g. a bootable CD
9
RAID (Redundant Array of Independent Disks)
RAID is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. There are different RAID levels, but the redundant ones all follow the same idea: the data of one I/O request (read or write) coming from the computer system is sent to the RAID group and distributed there across multiple disks, enriched with redundant information to provide protection against disk failures.
10
RAID Configurations
If a disk drive fails, a redundant RAID group is able to reconstruct the lost information. Two parameters describe a stripe: the number of disks (also called the stripe width) and the number of bytes written to a disk as one chunk.
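The two stripe parameters can be made concrete with a minimal sketch that maps a logical byte offset to a disk and an offset on that disk. It assumes plain round-robin chunk placement without parity (as in Raid0); the function name `locate_byte` is hypothetical, and real controllers may lay data out differently.

```python
from typing import Tuple


def locate_byte(logical_byte: int, stripe_width: int, chunk_size: int) -> Tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes chunks are placed round-robin across `stripe_width` disks.
    """
    if logical_byte < 0 or stripe_width < 1 or chunk_size < 1:
        raise ValueError("offset must be >= 0; width and chunk size must be >= 1")
    chunk_index = logical_byte // chunk_size      # which chunk holds the byte
    disk = chunk_index % stripe_width             # chunks rotate over the disks
    row = chunk_index // stripe_width             # full stripes before this chunk
    offset_on_disk = row * chunk_size + logical_byte % chunk_size
    return disk, offset_on_disk


# Example: 4 disks, 64 KiB chunks (both values are illustrative)
first = locate_byte(0, stripe_width=4, chunk_size=64 * 1024)
second_chunk = locate_byte(64 * 1024, stripe_width=4, chunk_size=64 * 1024)
```

A larger chunk size keeps small requests on a single disk, while a smaller one spreads a single request over more disks.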
11
How the reconstruction of data works
Parity checking is a rudimentary method of detecting simple, single-bit errors in a memory system.
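In parity-based RAID levels the same idea is applied per block: the parity block is the bitwise XOR of the data blocks, and because the position of a failed disk is known, its block can be recomputed from the survivors. The sketch below illustrates this with in-memory byte strings; `xor_blocks` is a hypothetical helper, not part of any RAID tool.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"RAID", b"demo", b"1234"]          # blocks on three data disks
parity = xor_blocks(data)                   # block on the parity disk

# The disk holding data[1] fails: XOR the surviving blocks with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Rebuilding needs every surviving block of the stripe, which is why reconstruction cost grows with the number of disks in the group.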
12
Raid0: block-level striping without parity or mirroring
Provides improved performance and additional storage, but no redundancy or fault tolerance. It combines several disks into one stripe with the goal that the I/O load is evenly distributed between the disks.
13
Raid0
14
Raid1: mirroring without parity or striping
This is the first, and simplest, level of redundancy: data is written identically to multiple disks (a "mirrored set"). Because no parity has to be computed, this keeps processing overhead low and provides good performance. Mirroring can decrease write performance slightly, as twice the amount of data needs to be transferred.
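Mirroring can be sketched with an in-memory model: every write goes to all working copies, and a read is served by the first copy that is still available. The class `MirroredSet` and its methods are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional


class MirroredSet:
    """Toy model of a mirrored set; each disk is a dict of block number -> data."""

    def __init__(self, copies: int = 2) -> None:
        self.disks: List[Optional[Dict[int, bytes]]] = [dict() for _ in range(copies)]

    def write(self, block_no: int, data: bytes) -> None:
        # The same data is transferred once per working disk.
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def fail(self, index: int) -> None:
        # Simulate a disk outage.
        self.disks[index] = None

    def read(self, block_no: int) -> bytes:
        # Any surviving copy can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise OSError("all mirrors have failed")


mirror = MirroredSet()
mirror.write(7, b"payload")
mirror.fail(0)
survivor_data = mirror.read(7)  # still readable from the second disk
```

The loop in `write` shows where the extra transfer cost of mirroring comes from; the model ignores resynchronising a replaced disk.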
15
Raid1
16
Raid3: byte-level striping with dedicated parity
Each single I/O request is distributed over all data disks. The performance of Raid3 is very good for large, single requests, as all disks are used equally. Disadvantage: to reconstruct a failed drive, all the data on the remaining disks needs to be read, which makes reconstruction much slower than with Raid1.
17
Raid3
18
Raid5: block-level striping with distributed parity
On small writes, Raid5 is inefficient: each time a block is written, the old data block and the old parity block first need to be read so that the new parity can be computed. Disadvantage: like Raid3, Raid5 has slow redundancy recovery times, since all the data needs to be read in order to reconstruct the lost data.
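The small-write cost can be seen in a short sketch, assuming XOR parity: the new parity is derived from the old data block, the new data block and the old parity block, so two reads precede the two writes. The function names are hypothetical.

```python
from typing import List


def full_parity(blocks: List[bytes]) -> bytes:
    """Parity recomputed from every data block of a stripe (reference result)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value
    return bytes(parity)


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write shortcut: new parity = old parity XOR old data XOR new data."""
    if not (len(old_data) == len(new_data) == len(old_parity)):
        raise ValueError("blocks must have the same length")
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = full_parity(stripe)

new_block = b"\xaa\xbb"                       # overwrite the middle block
shortcut = updated_parity(stripe[1], new_block, parity)
stripe[1] = new_block
```

The shortcut avoids reading the other data blocks of the stripe, but it still turns one logical write into two reads and two writes.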
19
Raid5
20
Raid6/Double Parity Raid
It provides fault tolerance for up to two drive failures. This makes larger RAID groups more practical, especially for high-availability systems.
21
Raid6/Double Parity Raid
22
Raid10 and Raid01: Combining Stripes and Mirrors
Sometimes it is useful to combine multiple RAID groups with different RAID levels. Disk outages in the Raid10 configuration leave the affected mirror intact, though without redundancy until the failed disk is replaced.
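The difference between the two combinations can be sketched as a survivability check. The sketch assumes the common naming in which Raid10 is a stripe across mirrored sets and Raid01 is a mirror of two stripes; the function names and the four-disk layout are illustrative only.

```python
from typing import Iterable, List


def raid10_survives(mirrors: List[List[int]], failed: Iterable[int]) -> bool:
    """Stripe of mirrors: every mirrored set needs at least one working disk."""
    down = set(failed)
    return all(any(disk not in down for disk in mirror) for mirror in mirrors)


def raid01_survives(stripes: List[List[int]], failed: Iterable[int]) -> bool:
    """Mirror of stripes: at least one stripe needs all of its disks working."""
    down = set(failed)
    return any(all(disk not in down for disk in stripe) for stripe in stripes)


groups = [[0, 1], [2, 3]]   # four disks; read as mirrors (Raid10) or stripes (Raid01)

one_failure_raid10 = raid10_survives(groups, failed=[0])
two_failures_raid10 = raid10_survives(groups, failed=[0, 2])
two_failures_raid01 = raid01_survives(groups, failed=[0, 2])
```

With the same four disks, losing disks 0 and 2 leaves each Raid10 mirror with one working copy, whereas in Raid01 both stripes are broken.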
23
Raid10 and Raid01
24
Comparison
25
Hardware Redundancy Techniques
Passive techniques
Active techniques
Hybrid techniques
26
Passive Techniques
Also known as static techniques.
Implement fault masking
The fault does not show up, since it is transparently removed
No action from the system is required
No reconfiguration: inherently fault tolerant
Examples: voting, error-correcting codes, N-modular redundancy (NMR), flux summing, special logic, TMR with duplex
27
Fault Masking
Fault masking "hides" faults that occur. It does not require detecting faults, but it does require containment of faults (the effect of every fault should be local).
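Majority voting is a common way to mask faults. The sketch below is a minimal software model of a voter for N-modular redundancy (three modules gives TMR); `majority_vote` is a hypothetical name, and a hardware voter would operate on signals rather than Python values.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by more than half of the modules."""
    if not outputs:
        raise ValueError("need at least one module output")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many modules disagree")
    return value


# Three redundant modules; the third one is faulty.
masked = majority_vote([42, 42, 17])
```

The faulty output never reaches the caller and no module is identified or replaced, which matches the passive idea of masking without detection or reconfiguration.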
28
Active Techniques
Also known as dynamic techniques.
Actions are required for a correct result:
• detection, localization, containment, recovery
• no fault masking
Does not attempt to prevent faults from producing errors within the system
After fault detection, the system is reconfigured to avoid a failure, i.e. the faulty hardware is removed from the system
29
Active Techniques (continued)
Most common in applications that can tolerate temporary erroneous results, e.g. satellite systems, where it is preferable to accept temporary failures rather than pay for a high degree of redundancy
Examples: stand-by sparing, duplication with comparison, pair-and-a-spare, watchdog timer
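Stand-by sparing can be sketched as follows: one module is active, its output is checked, and on a detected fault the system reconfigures itself by switching to the next spare. The class `StandbySparing` and the acceptance check are assumptions made for illustration; how faults are detected is application-specific.

```python
from typing import Callable, List


class StandbySparing:
    """Toy model: one active module, the remaining modules are spares."""

    def __init__(self, modules: List[Callable[[int], int]],
                 is_valid: Callable[[int, int], bool]) -> None:
        self.modules = list(modules)
        self.is_valid = is_valid      # fault detector (acceptance check)
        self.active = 0               # index of the module currently in use

    def compute(self, x: int) -> int:
        while self.active < len(self.modules):
            result = self.modules[self.active](x)
            if self.is_valid(x, result):
                return result
            # Fault detected: take the module out of service, switch to a spare.
            self.active += 1
        raise RuntimeError("no healthy module left")


def faulty_square(x: int) -> int:
    return -1                         # simulated broken module


def healthy_square(x: int) -> int:
    return x * x


system = StandbySparing([faulty_square, healthy_square],
                        is_valid=lambda x, result: result >= 0)
value = system.compute(3)             # the spare takes over
```

Unlike voting, the fault here is noticed and the faulty module stays out of service for later calls; the wrong result exists briefly inside the system before the switch.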
30
Hybrid Techniques
A combination of passive and active techniques: fault masking + reconfiguration
Use fault masking to prevent erroneous results (prevent temporary errors) and provide spares to replace faulty hardware (high reliability)
31
Hybrid Techniques (continued)
Expensive, but better suited to achieving higher reliability and more fault tolerance
Types: self-purging redundancy, N-modular redundancy with spares, triple-duplex architecture
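N-modular redundancy with spares can be sketched by combining a voter with reconfiguration: the vote masks the fault, and any module that disagreed with the majority is swapped for a spare. The class `NmrWithSpares` is a hypothetical software model, not a description of a specific architecture.

```python
from collections import Counter
from typing import Callable, List

Module = Callable[[int], int]


class NmrWithSpares:
    """Toy model: vote over the active modules, replace those that disagree."""

    def __init__(self, active: List[Module], spares: List[Module]) -> None:
        self.active = list(active)
        self.spares = list(spares)

    def compute(self, x: int) -> int:
        outputs = [module(x) for module in self.active]
        winner, votes = Counter(outputs).most_common(1)[0]
        if votes * 2 <= len(outputs):
            raise RuntimeError("no majority among the active modules")
        # Reconfiguration: swap each disagreeing module for a spare, if any.
        for i, output in enumerate(outputs):
            if output != winner and self.spares:
                self.active[i] = self.spares.pop(0)
        return winner                  # the fault was masked by the vote


def good(x: int) -> int:
    return x + 1


def bad(x: int) -> int:
    return 0                           # simulated faulty module


tmr = NmrWithSpares(active=[good, bad, good], spares=[good])
answer = tmr.compute(5)                # correct result, faulty module replaced
```

After the call, the system is back to three healthy modules and can mask another fault, which is the reliability gain that the extra hardware pays for.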
32
Conclusions
Redundancy is never for free!
The choice is application-dependent:
• critical computation (momentary erroneous results are not acceptable): passive or hybrid
• long-life, high-availability (system should be restored quickly): active
• very critical applications (highest reliability): hybrid
34
THANK YOU ANY QUESTIONS?