Fault Tolerance: Basic Mechanisms mMIC-SFT September 2003 Anders P. Ravn Aalborg University.

Fault Tolerance: a means to isolate component faults and mask them. It prevents system failures and may increase system dependability.

Dependability means: fault prevention, fault tolerance, error removal, failure forecasting. BW p. 106, ...

Fault Tolerance

FT levels: full tolerance, graceful degradation, fail safe. BW p. 107

FT basis: redundancy, either in time (try, retry, ...) or in space (try, ...). BW p. 109
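Redundancy in time can be illustrated with a simple retry loop. The following is a minimal Python sketch, not taken from the slides: the names `retry` and `FlakySensor` are hypothetical, and it assumes that a transient fault shows up as a raised exception.

```python
def retry(operation, attempts=3):
    """Time redundancy: repeat the same operation until it succeeds."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # assumption: a fault surfaces as an exception
            last_error = exc
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


class FlakySensor:
    """Hypothetical component whose first two reads hit a transient fault."""

    def __init__(self):
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls < 3:
            raise IOError("transient fault")
        return 42


sensor = FlakySensor()
value = retry(sensor.read)  # succeeds on the third try
```

Retrying only helps against transient faults; a permanent fault exhausts the attempts and is reported to the caller.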

N-version programming: versions V1, V2 and V3 run under a driver (comparator). At comparison points each version passes a comparison vector (its vote) to the driver, which returns comparison status indicators. BW p. 109
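A driver with majority voting might look like the sketch below. It is an illustration under assumptions: the three versions are toy functions (one deliberately faulty), votes are compared for exact equality, and the status indicators are reduced to one boolean per version.

```python
from collections import Counter


def version_1(x):
    return x * x


def version_2(x):
    return x ** 2


def version_3(x):
    return x * x + 1  # deliberately faulty version


def driver(versions, x):
    """Collect one vote per version and return the majority value."""
    votes = [version(x) for version in versions]
    value, count = Counter(votes).most_common(1)[0]
    if count <= len(votes) // 2:
        raise RuntimeError("no majority among versions")
    # Status indicator per version: did it agree with the majority?
    status = [vote == value for vote in votes]
    return value, status


result, status = driver([version_1, version_2, version_3], 4)
```

Here `result` is the value two versions agree on, and `status` marks the third version as disagreeing. Exact comparison is a simplification; real-valued results usually need inexact voting.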

Fault classification (scope of N-VP)
Origin: physical (internal/external), logical (design/interaction)
Kind: omission, value, timing, byzantine
Property: duration (permanent, transient), consistency (determinate, nondeterminate), autonomy (spontaneous, event-dependent)
Scope marks: + (+) ++ (+) + / (+) + / +

Dynamic Redundancy (BW p. 114)
1. Error detection
2. Damage confinement and assessment
3. Error recovery
4. Fault treatment and continued service

Error Detection: f: State × Input → State × Output. Errors are detected by the environment (exception) or by the application. Assertion: precondition(input), postcondition(input, output), invariant(state, state'). Timing: WCET(f, input), Deadline(f, input). BW p. 115
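Application-level detection with a precondition, a postcondition and a timing check can be sketched as follows. This is a hedged example: `checked_sqrt`, its tolerance and the `deadline_s` parameter are assumptions chosen for illustration, and the elapsed time is measured after the fact rather than enforced by a scheduler.

```python
import time


def checked_sqrt(x, deadline_s=0.1):
    """Square root guarded by precondition, postcondition and deadline checks."""
    # Precondition on the input.
    if x < 0:
        raise ValueError("precondition violated: x must be non-negative")

    start = time.monotonic()
    root = x ** 0.5
    elapsed = time.monotonic() - start

    # Postcondition relating input and output.
    if abs(root * root - x) > 1e-9 * max(1.0, x):
        raise ArithmeticError("postcondition violated: root*root != x")

    # Timing check: the computation must finish within its deadline.
    if elapsed > deadline_s:
        raise TimeoutError("deadline missed")
    return root


root = checked_sqrt(9.0)
```

Explicit checks are used instead of `assert` statements because Python drops assertions when run with `-O`.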

Damage Confinement: static structure, dynamic structure. BW p. 117

Error Recovery. Forward: repair the state, if you can! Backward: define recovery points, checkpoint the state at each recovery point, roll back, retry. Risk: the domino effect. BW p. 118
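Backward recovery for a single process can be sketched as checkpoint, roll back and retry. The example below is illustrative only: the `Recoverable` class and the account scenario are hypothetical, and it sidesteps the domino effect because no other process depends on the rolled-back state.

```python
import copy


class Recoverable:
    """Holds a state plus one saved recovery point."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = None

    def checkpoint(self):
        # Save a deep copy so later updates cannot alter the recovery point.
        self._checkpoint = copy.deepcopy(self.state)

    def roll_back(self):
        self.state = copy.deepcopy(self._checkpoint)


def withdraw(account, amount):
    account.state["balance"] -= amount
    if account.state["balance"] < 0:  # error detection after the update
        raise ValueError("error detected: negative balance")


account = Recoverable({"balance": 100})
account.checkpoint()          # define the recovery point
try:
    withdraw(account, 150)    # leaves the state erroneous
except ValueError:
    account.roll_back()       # restore the checkpointed state
    withdraw(account, 50)     # retry with a different request
```

After the rollback the erroneous update is discarded, so the retry starts from the checkpointed balance.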

Recovery blocks (BW p. 120)
ENSURE acceptance_test
BY { module_1 }
ELSE BY { module_2 }
...
ELSE BY { module_m }
ELSE ERROR
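The ENSURE ... BY ... ELSE BY scheme can be approximated in Python as below. This is a sketch under assumptions: `ensure`, `primary_sort` and `alternate_sort` are hypothetical names, each module works on a deep copy of the state (standing in for the recovery point), and a module that raises is treated like one that fails the acceptance test.

```python
import copy


def ensure(acceptance_test, modules, state):
    """Try each module in turn on a fresh copy of the state."""
    for module in modules:
        candidate = copy.deepcopy(state)  # restore the recovery point
        try:
            result = module(candidate)
        except Exception:
            continue                      # treat a crash as a failed alternative
        if acceptance_test(result):
            return result
    raise RuntimeError("ELSE ERROR: all alternatives failed")


def primary_sort(items):
    items.sort()
    return items[:-1]                     # deliberately faulty: drops an element


def alternate_sort(items):
    return sorted(items)


data = [3, 1, 2]


def acceptable(result):
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and len(result) == len(data)


result = ensure(acceptable, [primary_sort, alternate_sort], data)
```

The primary module fails the acceptance test, so the alternate runs on an untouched copy of the input. The acceptance test here checks order and length only, which is weaker than full correctness, as acceptance tests usually are.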

The ideal FT-component: a normal mode and an exception handler. Requests arrive and responses return through its interface; interface exceptions and failure exceptions are both signalled to its caller and received from the components it uses. BW p. 126
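The split between interface exceptions and failure exceptions can be sketched with two layered components. All names here (`InterfaceException`, `FailureException`, `Divider`, `Averager`) are hypothetical, and the recovery action, returning a configured default, is only one assumed possibility.

```python
class InterfaceException(Exception):
    """The request itself was invalid."""


class FailureException(Exception):
    """The component could not deliver its service."""


def is_number(value):
    return isinstance(value, (int, float))


class Divider:
    """Lower-level component."""

    def request(self, a, b):
        if not (is_number(a) and is_number(b)):
            raise InterfaceException("operands must be numbers")
        if b == 0:
            raise FailureException("division cannot be performed")
        return a / b


class Averager:
    """Component with a normal mode and an exception handler."""

    def __init__(self, divider, default=None):
        self._divider = divider
        self._default = default

    def request(self, values):
        if not isinstance(values, list) or not all(is_number(v) for v in values):
            raise InterfaceException("values must be a list of numbers")
        try:
            # Normal mode: delegate to the lower-level component.
            return self._divider.request(sum(values), len(values))
        except FailureException as exc:
            # Exception handler: recover if possible, else signal failure upward.
            if self._default is not None:
                return self._default
            raise FailureException("average unavailable") from exc


mean = Averager(Divider()).request([1, 2, 3])
fallback = Averager(Divider(), default=0.0).request([])
```

An invalid request is rejected at the interface, while a lower-level failure either gets masked by the handler or is passed up as a failure exception.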