2. Introduction to Redundancy Techniques

Redundancy implies the use of hardware, software, information, or time beyond what is needed for normal system operation. It has a strong impact on a system in the areas of performance, size, weight, power consumption, and reliability.

2.1 Hardware Redundancy

Passive
- Based on the concept of fault masking: hides the occurrence of faults and prevents them from resulting in errors (developed around the concept of majority voting).
- Does not provide fault detection; faults are simply masked.

Active, or Dynamic
- Attempts to achieve fault tolerance by means of fault detection, fault location, reconfiguration, and recovery. The property of fault masking is not obtained: there is no attempt to prevent faults from producing errors within the system.
- More suitable for applications where temporary, erroneous results are acceptable, as long as the system reconfigures and regains its operational status in a satisfactory length of time.

Hybrid
- Combines the attractive features of both the active and the passive approaches.

[Figure: Basic concept of Triple Modular Replication (TMR). Labels: Module 1, Module 2, Module 3; Voter; Output.]

[Figure: The use of triplicated voters in a TMR configuration. Labels: Proc 1, Proc 2, Proc 3; Voter (triplicated); Mem 1, Mem 2, Mem 3.]
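
The basic TMR arrangement in the first figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names majority_vote, tmr, good, and faulty are hypothetical and do not come from the slides, and the modules are modelled as plain functions rather than hardware.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def tmr(modules, x):
    """Feed the same input to every module and vote on their outputs."""
    return majority_vote([module(x) for module in modules])


def good(x):
    # Hypothetical fault-free module.
    return 2 * x


def faulty(x):
    # Hypothetical faulty module: corrupts the result.
    return 2 * x + 1


# One faulty module out of three is masked by the voter.
print(tmr([good, good, faulty], 5))
```

Note that the voter only masks the fault: nothing in this sketch reports which module disagreed, which matches the passive approach described above.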

Voting at Several Levels within N-Modular Redundancy (NMR) Systems

Approach 1: Read 3 independent temperature sensors and vote on the 3 sensor values. Next, calculate the amount of heating/cooling by means of 3 separate modules, and then vote on the calculations to determine a result.

versus

Approach 2: 3 independent sensors sample the temperature, each channel performs the calculations, and a single vote is then taken on the final result.

Difference between the two approaches: fault containment. Voting at the sensors masks and contains the effects of a sensor fault before it reaches the calculation modules.
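
A small sketch may make the containment argument concrete. Everything here is assumed for illustration: the function names, the setpoint-based calculation, and the fault pattern (one faulty sensor plus one faulty calculation module in a different channel) are hypothetical and not taken from the slides.

```python
from collections import Counter


def vote(values):
    """Strict majority vote; returns None when no majority exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def heating_demand(temp, setpoint=20):
    # Hypothetical calculation: degrees of heating needed to reach the setpoint.
    return setpoint - temp


def stuck_module(temp, setpoint=20):
    # Hypothetical faulty calculation module: output stuck at zero.
    return 0


def vote_at_each_level(readings, calculators):
    """Approach 1: vote on the sensor values, then vote on the calculations."""
    temp = vote(readings)
    if temp is None:
        return None
    return vote([calc(temp) for calc in calculators])


def vote_once_at_end(readings, calculators):
    """Approach 2: each channel computes from its own sensor; one final vote."""
    return vote([calc(r) for calc, r in zip(calculators, readings)])


readings = [18, 18, 25]                                      # sensor 3 is faulty
calculators = [heating_demand, stuck_module, heating_demand]  # module 2 is faulty

print(vote_at_each_level(readings, calculators))
print(vote_once_at_end(readings, calculators))
```

In this scenario the first approach still returns the correct demand, because the sensor fault is voted out before the calculation stage. The second approach sees three different channel results and finds no majority.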

[Figure: Example of SW voting. Labels: Proc 1, Proc 2, Proc 3; Task A, Task B, Task A; Voter Task.]

HW voting versus SW voting? Factors to consider:
1. The availability of a processor to perform the voting
2. The speed at which voting must be performed
3. The criticality of space, power, and weight limitations
4. The number of different voters that must be provided
5. The flexibility required of the voter with respect to future changes in the system

In practical applications of voting, the 3 results in a TMR system may not completely agree, even in a fault-free environment. For example, A/D converters in sensors may produce quantities that disagree in the least-significant bits. This disagreement can propagate into larger discrepancies after computation, which can significantly affect the voting process.
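
The effect can be reproduced with made-up numbers. In the sketch below, the raw A/D counts, the gain of 40, and the function names are all assumptions chosen for illustration. Three healthy sensors differ by one least-significant bit, and an exact-match majority vote finds no agreement.

```python
from collections import Counter


def exact_majority(values):
    """Strict majority vote on exact equality; None when no majority exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None


def scale(count, gain=40):
    # Hypothetical downstream computation that amplifies the raw count.
    return count * gain


# Hypothetical raw A/D counts from three fault-free sensors.
raw = [511, 512, 513]
scaled = [scale(c) for c in raw]

print(exact_majority(raw))         # no two readings are equal
print(max(raw) - min(raw))         # spread before computation
print(max(scaled) - min(scaled))   # spread after computation
```

With these numbers the spread grows from 2 counts to 80 after the computation, even though no module is faulty.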

Solution: Mid-Value Select Technique

A TMR system selects, from the three values, the one that lies in the middle of the others.

[Figure: Mid-value selection. Labels: corrupted signal, uncorrupted signals, selected signals.]
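
Mid-value selection amounts to taking the median of the three values. The sketch below applies it sample by sample to three signals, one of which is corrupted. The function name mid_value_select and the sample data are hypothetical.

```python
def mid_value_select(a, b, c):
    """Return the value that lies between the other two (the median of three)."""
    return sorted((a, b, c))[1]


# Hypothetical sampled signals: two uncorrupted channels that differ slightly,
# and one channel that is corrupted at the second and third samples.
signal_1 = [1.0, 2.0, 3.0, 4.0]
signal_2 = [1.1, 2.1, 3.1, 4.1]
signal_3 = [0.9, 9.9, -5.0, 3.9]

selected = [mid_value_select(*samples)
            for samples in zip(signal_1, signal_2, signal_3)]
print(selected)
```

The selected value is always one of the actual inputs, and small disagreements between healthy channels do not prevent a result, unlike exact-match voting.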