Published by Johnathan Joseph, modified over 9 years ago
1
3. Hardware Redundancy
Reliable System Design 2010
by: Amir M. Rahmani
2
matlab1.ir
Forms of Redundancy
– Hardware redundancy: add extra hardware for detecting or tolerating faults
– Software redundancy: add extra software for detecting and possibly tolerating faults
– Information redundancy: add extra information, e.g. error-detecting or error-correcting codes
– Time redundancy: use extra time to perform tasks for fault tolerance
3
Types of Hardware Redundancy
Fault tolerance requires redundancy.
1- Static (passive) redundancy
– uses fault masking to hide the occurrence of a fault
– does not require reconfiguration
– Examples: TMR, voting
2- Dynamic (active) redundancy
– uses comparison for detection and/or diagnosis
– requires reconfiguration to remove faulty hardware from the system
– Example: stand-by system
3- Hybrid redundancy
– a combination of static and dynamic redundancy
4
1- Static Redundancy
A class of redundancy techniques that can tolerate faults without reconfiguration (failover).
Static redundancy can be divided into two major subclasses:
– Masking redundancy
– Active redundancy
5
Masking Redundancy
– Uses majority voting to mask faults
– Requires 2f + 1 modules to tolerate f faulty modules
N-Modular Redundant system (NMR)
– N independent modules replicate the same function (parallelism)
– their results are voted on
– requirement: N ≥ 3
– N = 3 gives TMR (Triple Modular Redundancy)
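A minimal sketch of an NMR majority voter, assuming each module's output can be compared for equality. The names `nmr_vote` and `modules_needed` are hypothetical. With N = 2f + 1 modules, up to f wrong outputs are outvoted by the f + 1 correct ones. When no value reaches a strict majority, the voter reports that the faults could not be masked.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def modules_needed(f: int) -> int:
    """Modules required to mask f faulty modules by majority voting."""
    return 2 * f + 1


def nmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of the N modules, else None."""
    if len(outputs) < 3:
        raise ValueError("NMR needs N >= 3 modules")
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of the N modules must agree.
    return value if count > len(outputs) // 2 else None


# Hypothetical example: N = 5 modules (f = 2), two of them return wrong values.
print(modules_needed(2))          # 5
print(nmr_vote([7, 7, 3, 7, 9]))  # 7
print(nmr_vote([7, 3, 9]))        # None: no majority, faults not masked
```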
6
Triple Modular Redundancy (TMR)
Example: majority voting.
1-bit majority voter: three 2-input AND gates whose outputs are ORed, i.e. V = AB + BC + AC
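The AND-OR voter can be modelled directly with bitwise operators. In the sketch below, `tmr_vote` is a hypothetical name. Because Python applies `&` and `|` to every bit position at once, the same expression acts as a single 1-bit voter or as a bank of per-bit voters over whole words.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    # V = AB + BC + AC: three AND gates whose outputs are ORed.
    # Applied bitwise, this is one independent 1-bit voter per bit position.
    return (a & b) | (b & c) | (a & c)


# Truth table of the 1-bit voter: output is 1 when at least two inputs are 1.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", tmr_vote(a, b, c))

# One module's word is corrupted (two bits flipped); the voter masks it.
good = 0b1011_0110
print(bin(tmr_vote(good, good ^ 0b0100_0001, good)))  # 0b10110110
```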
7
Triple Modular Redundancy (TMR)
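The slide gives no reliability figures. The standard textbook model is sketched below under three assumptions: the three modules fail independently, each has reliability R, and the voter is perfect. Under that model a TMR system survives as long as at least two modules work, which gives the usual result. TMR beats a single module only while R > 1/2. For exponentially distributed failures with rate λ, that holds only during the first ln 2 / λ of operation.

```latex
\begin{aligned}
R_{\mathrm{TMR}} &= R^{3} + \binom{3}{2} R^{2}(1-R) = 3R^{2} - 2R^{3} \\
R_{\mathrm{TMR}} - R &= -R\,(2R-1)(R-1) > 0 \iff \tfrac{1}{2} < R < 1 \\
R(t) = e^{-\lambda t} &\;\Rightarrow\; R_{\mathrm{TMR}}(t) > R(t) \iff 0 < t < \frac{\ln 2}{\lambda}
\end{aligned}
```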
8
Masking Redundancy: TMR with triple voting
9
Masking Redundancy: Multi-stage TMR
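A sketch of multi-stage TMR with triple voting. Each stage is triplicated and followed by three voters, and voter i feeds module i of the next stage. As a result, neither a single faulty module nor a single faulty voter is a single point of failure. The stage functions (`inc`, `dbl`, `broken`) and the `stuck_voter` fault-injection parameter are hypothetical, chosen only to show faults in different stages being masked.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Stage = Callable[[int], int]


def vote(a: int, b: int, c: int) -> int:
    # Bitwise majority AB + BC + AC.
    return (a & b) | (b & c) | (a & c)


def multistage_tmr(x: int, stages: Sequence[Sequence[Stage]],
                   stuck_voter: Optional[Tuple[int, int]] = None) -> List[int]:
    """Pipeline in which every stage is triplicated and followed by three voters.

    stages[k][i] is module i of stage k. Voter i of stage k sees all three
    module outputs and feeds module i of stage k + 1. stuck_voter=(k, i) is a
    hypothetical fault injection that forces voter i of stage k to output 0.
    """
    lines = [x, x, x]
    for k, modules in enumerate(stages):
        outs = [modules[i](lines[i]) for i in range(3)]
        lines = [vote(*outs) for _ in range(3)]  # triple voting
        if stuck_voter is not None and stuck_voter[0] == k:
            lines[stuck_voter[1]] = 0
    return lines


def inc(v: int) -> int:
    return v + 1


def dbl(v: int) -> int:
    return 2 * v


def broken(v: int) -> int:
    return 0xFF  # hypothetical faulty module


# One faulty module in each stage, in different positions: both are masked.
print(multistage_tmr(5, [[broken, inc, inc], [dbl, dbl, broken]]))   # [12, 12, 12]
# A stuck voter in stage 0 is masked by the voters of stage 1.
print(multistage_tmr(5, [[inc] * 3, [dbl] * 3], stuck_voter=(0, 1)))  # [12, 12, 12]
```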
10
N-Modular Redundant system (NMR)
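NMR reliability can be estimated under the same textbook assumptions: independent failures, identical module reliability R, and a perfect voter. The system works as long as at most f = (N − 1)/2 modules have failed. The function name `nmr_reliability` is hypothetical. For N = 3 it reduces to 3R² − 2R³. Adding modules helps only when R > 1/2; below that, more modules make things worse.

```python
from math import comb


def nmr_reliability(r: float, n: int) -> float:
    """Probability that at least a majority of n independent modules work.

    Assumes identical module reliability r, independent failures and a
    perfect voter. With n = 2f + 1, up to f module failures are masked.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("use an odd N >= 3")
    f = (n - 1) // 2
    # Sum over i = 0..f failed modules.
    return sum(comb(n, i) * r ** (n - i) * (1 - r) ** i for i in range(f + 1))


for n in (3, 5, 7):
    print(n, round(nmr_reliability(0.9, n), 6))
```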
11
Active Redundancy
– Two or more units are active and produce replicated results simultaneously
– Relies on fail-stop units
– Fail-stop property: a unit produces correct results or no results at all
– Requires f + 1 modules to tolerate f faulty modules
12
Fail-stop Nodes
– Nodes 1 and 2 each send their results independently to nodes 3 and 4
– All nodes are fail-stop: they send correct results or no results at all
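The fail-stop scenario can be sketched as follows. `node_output` and `receive` are hypothetical names, and squaring stands in for an arbitrary replicated computation. Any result that arrives is correct, so a receiver needs no vote: it simply takes the first result it gets. That is why f + 1 replicas are enough to tolerate f crashed ones.

```python
from typing import List, Optional


def node_output(x: int, crashed: bool) -> Optional[int]:
    """A fail-stop node: returns the correct result, or nothing (None) if it failed."""
    return None if crashed else x * x  # hypothetical replicated computation


def receive(results: List[Optional[int]]) -> Optional[int]:
    """Receiver (node 3 or 4): every delivered result is correct, so take the first."""
    for r in results:
        if r is not None:
            return r
    return None  # all f + 1 senders failed


# Nodes 1 and 2 compute the same function; node 1 has crashed (f = 1, f + 1 = 2 nodes).
results = [node_output(6, crashed=True), node_output(6, crashed=False)]
for receiver in (3, 4):
    print(receiver, receive(results))  # both receivers get 36
```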
13
2- Dynamic Redundancy
– Relies on error detection and reconfiguration
– Requires f + 1 modules to tolerate f faulty modules
– May require recovery of the system or application state
– May incur outage time
14
Example: Duplicate and Compare
– can only detect, but NOT diagnose, i.e. fault detection without fault tolerance
– may order a shutdown
– the comparator is a single point of failure
– simple implementation: a 2-input XOR for a single-bit compare
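The comparator can be sketched with one XOR per bit (`compare` is a hypothetical name). A non-zero result shows that the two copies disagree. It cannot say which copy is faulty, which is exactly the "detect but not diagnose" limitation listed above.

```python
def compare(a: int, b: int) -> int:
    """Duplicate-and-compare checker built from 2-input XORs, one per bit.

    Returns the mask of bit positions where the two copies disagree;
    any non-zero value signals an error.
    """
    return a ^ b


primary, duplicate = 0b1100_1010, 0b1100_1110
mismatch = compare(primary, duplicate)
if mismatch:
    # Detection only: we know the copies disagree, not which one is faulty,
    # so the checker can only raise an error or order a shutdown.
    print("error detected, differing bits:", bin(mismatch))  # 0b100
```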
15
Example: Stand-by System
– only one module drives the outputs
– the other modules are either idle (hot spares) or shut down (cold spares)
– on error detection (e.g. via communication checksums or memory parity bits), switch to a new module (hot or cold spare)
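Below is a minimal sketch of stand-by switchover, assuming each module returns a data word plus an even-parity bit and treating a parity mismatch as the detected error. `StandbySystem`, `good` and `faulty` are hypothetical. Spares are modelled as immediately available, as with hot spares. A cold spare would first need to be powered up.

```python
from typing import Callable, List, Tuple

Module = Callable[[int], Tuple[int, int]]  # returns (data, parity_bit)


def even_parity(word: int) -> int:
    """Check bit that makes the total number of 1s (data + parity) even."""
    return bin(word).count("1") % 2


class StandbySystem:
    """Only the active module drives the output; the others wait as spares."""

    def __init__(self, modules: List[Module]):
        self.modules = list(modules)
        self.active = 0  # index of the module currently driving the outputs

    def run(self, x: int) -> int:
        while self.active < len(self.modules):
            data, parity = self.modules[self.active](x)
            if even_parity(data) == parity:
                return data
            self.active += 1  # error detected: switch to the next spare
        raise RuntimeError("no spare modules left")


def good(x: int) -> Tuple[int, int]:
    y = x + 1
    return y, even_parity(y)


def faulty(x: int) -> Tuple[int, int]:
    y = x + 1
    return y ^ 1, even_parity(y)  # bit flips after the parity bit was generated


system = StandbySystem([faulty, good])
print(system.run(41), "from module", system.active)  # 42 from module 1
```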
16
Types of Stand-by Systems
– Hot stand-by
– Warm stand-by
– Cold stand-by
17
Hot Stand-by
Characteristics:
– the spare is updated simultaneously with the primary module
Advantages:
+ very short or no outage time
+ does not require recovery of the application
Drawbacks:
- high failure rate (fault rate)
- high power consumption
18
Warm Stand-by
Characteristics:
– the spare is up and running
– it needs to recover the application status on takeover
Advantages:
+ does not require simultaneous updating of spare and primary module
Drawbacks:
- requires recovery of the application state
- high failure rate (fault rate)
- high power consumption
19
Cold Stand-by
Characteristics:
– the spare is powered down
Advantages:
+ low failure rate (fault rate)
+ low power consumption (e.g. for satellite applications)
Drawbacks:
- very long outage time
- needs to boot the kernel/operating system and recover the application status
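The three stand-by types differ mainly in how much work stands between error detection and takeover, which is why outage time grows from hot to cold. The sketch below encodes only the steps named on these slides. The enum and step strings are hypothetical labels, and the number of steps is a rough ordering, not a timing model.

```python
from enum import Enum
from typing import List


class Standby(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


def switchover_steps(kind: Standby) -> List[str]:
    """Steps needed before the spare can take over, as described on the slides."""
    if kind is Standby.HOT:
        # Spare is updated together with the primary: no recovery needed.
        return ["switch outputs to spare"]
    if kind is Standby.WARM:
        # Spare is running but must recover the application state.
        return ["recover application state", "switch outputs to spare"]
    # Cold spare is powered down: power up, boot, then recover state.
    return ["power up spare", "boot kernel/operating system",
            "recover application state", "switch outputs to spare"]


for kind in Standby:
    print(kind.value, len(switchover_steps(kind)), switchover_steps(kind))
```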
20
3- Hybrid Redundancy
N-Modular Redundancy with spares
– N active + S spare modules (off-line)
– voting and comparison
– replaces an erroneous module from the spare pool
21
N-Modular Redundancy with spares
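A minimal sketch of NMR with spares, combining the static part (majority voting masks the fault) with the dynamic part (each active output is compared with the voted result, and a disagreeing module is swapped for a spare). `HybridNMR`, `square` and `stuck` are hypothetical, and the sketch assumes a perfect voter and comparator.

```python
from collections import Counter
from typing import Callable, List, Optional

Module = Callable[[int], int]


class HybridNMR:
    """N active modules plus a pool of S off-line spares."""

    def __init__(self, active: List[Module], spares: List[Module]):
        self.active = list(active)
        self.spares = list(spares)

    def step(self, x: int) -> Optional[int]:
        outs = [m(x) for m in self.active]
        value, count = Counter(outs).most_common(1)[0]
        if count <= len(outs) // 2:
            return None  # no majority: the fault cannot be masked
        for i, out in enumerate(outs):
            # Compare each active output with the voted result.
            if out != value and self.spares:
                self.active[i] = self.spares.pop(0)  # reconfigure from spare pool
        return value


def square(x: int) -> int:
    return x * x


def stuck(x: int) -> int:
    return 0  # hypothetical stuck-at-0 module


nmr = HybridNMR([square, stuck, square], spares=[square])
print(nmr.step(4))              # 16: fault masked by the vote
print(nmr.active[1] is square)  # True: faulty module replaced
print(len(nmr.spares))          # 0: spare pool used up
```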
22
Coding checks / Exception checks
Coding checks
– Error-detection codes are formed by adding check bits to a data word.
– A cyclic redundancy code (CRC) check was used in the disk store of ESS.
– A parity bit was used in the RAM.
Exception checks
– Hardware constraints: exceptions usually result from the hardware being unable to provide the service the software requires.
– Examples: improper address alignment, unequipped memory locations, unused op-code, stack overflow
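Both coding checks amount to adding check bits to a data word and recomputing them on read. In the sketch below, CRC-32 from Python's `zlib.crc32` stands in for "a CRC". The slide does not say which polynomial ESS used. `parity_bit` and `flip` are hypothetical helpers. A single parity bit catches any odd number of flipped bits but misses an even number, while the CRC still catches the double error.

```python
import zlib


def parity_bit(data: bytes) -> int:
    """Even-parity check bit over all bits of the data word."""
    return sum(bin(b).count("1") for b in data) % 2


def flip(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated fault)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


word = b"\x12\x34\x56\x78"
stored_parity, stored_crc = parity_bit(word), zlib.crc32(word)

one_error = flip(word, 3)
two_errors = flip(flip(word, 3), 17)

print(parity_bit(one_error) != stored_parity)   # True: parity catches 1 flipped bit
print(parity_bit(two_errors) != stored_parity)  # False: parity misses 2 flipped bits
print(zlib.crc32(two_errors) != stored_crc)     # True: CRC-32 catches this double error
```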
23
Watchdog Timers
So far, we have seen how to detect when something is wrong, but how do we detect when a module is not doing anything at all?
– A watchdog timer monitors a module and triggers a recovery if the module shows no activity within a given amount of time.
– E.g., put a watchdog timer on a microprocessor bus.
Who watches the watchdog?
– If we assume a single-fault scenario, this usually isn't a problem.
– But what if the watchdog has a hard fault that causes it to never time out and trigger a recovery?
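A minimal watchdog sketch driven by a simulated clock of integer ticks, so the behaviour is deterministic. The `Watchdog` class, its `kick`/`expired` methods and the tick values are hypothetical. The monitored module must "kick" the watchdog at least once per timeout period. If it goes silent, the watchdog reports a timeout and a recovery is triggered.

```python
class Watchdog:
    """Watchdog timer on a simulated clock (integer ticks).

    The monitored module must call kick() at least once every `timeout`
    ticks; otherwise expired() returns True so recovery can be triggered.
    """

    def __init__(self, timeout: int, now: int = 0):
        self.timeout = timeout
        self.last_kick = now

    def kick(self, now: int) -> None:
        self.last_kick = now  # module showed activity: restart the countdown

    def expired(self, now: int) -> bool:
        return now - self.last_kick >= self.timeout


wd = Watchdog(timeout=5)
recoveries = []
for t in range(1, 16):
    if t <= 6:             # module is alive and kicks every tick until t = 6,
        wd.kick(t)
    if wd.expired(t):      # then hangs and does nothing at all
        recoveries.append(t)  # trigger recovery ...
        wd.kick(t)            # ... and restart the countdown
print(recoveries)  # [11]
```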