1 Fault-Tolerant Computing Systems #1: Introduction
Pattara Leelaprute
Computer Engineering Department, Kasetsart University

2 Dependability
- Dependability: the trustworthiness of a computer system
  - Reliance on the system can be justified by the service it delivers
- Why is dependability necessary for computers?
  - Life-critical tasks (risk of loss of human life)
    - Patient monitoring
    - Missile guidance control
    - Air traffic control systems (e.g., as depicted in the film Die Hard 2)
  - Tasks that critically depend on computers (risk of financial loss)
    - Banking systems
    - Stock markets
    - Online shopping
- Exercise: form groups and give as many examples as you can (10 min).

3 Reliability and Availability
- Attributes important for dependability
  - Reliability, availability, safety, security
- Attributes important for fault tolerance
  - Reliability: deals with continuity of service
  - Availability: deals with readiness for use
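
To make the two attributes concrete, the sketch below computes them under a common textbook assumption that the slide does not state: a constant failure rate λ (exponentially distributed time to failure), which gives R(t) = e^(-λt), and a steady-state availability of MTTF / (MTTF + MTTR). The function names and example numbers are illustrative only.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Probability of continuous correct service over [0, t].

    Assumes a constant failure rate (exponential time to failure),
    so R(t) = exp(-failure_rate * t).
    """
    return math.exp(-failure_rate * t)


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time the system is ready for use.

    A = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure
    and MTTR is the mean time to repair.
    """
    return mttf / (mttf + mttr)


# Illustrative numbers: failure rate 1e-4 per hour, 10-hour average repair.
lam = 1e-4
print(f"{reliability(lam, 1000):.4f}")                  # → 0.9048
print(f"{steady_state_availability(1 / lam, 10):.4f}")  # → 0.9990
```

The example shows why the two attributes are distinct: over 1000 hours the chance of uninterrupted service is only about 90%, yet because repairs are quick the system is ready for use about 99.9% of the time.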

4 Fault Avoidance & Fault Tolerance
- Fault avoidance
  - An approach that prevents faults from occurring or from being introduced into the system (direct approach)
- Fault tolerance
  - An approach that provides service despite the presence of faults in the system
- Fault = abnormality of a component of the system

5 Fault Avoidance
- Eliminate as many faults as possible before the system is put into use
- Has no redundancy
- Focuses on design, testing, and validation methodologies
- All components must work correctly, without failing, at all times
- Manual maintenance is needed to repair the system when a failure takes place
- Guaranteeing that no component ever fails is IMPOSSIBLE in practice
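
The point that every component must work at all times can be quantified. Assuming independent component failures (an assumption not stated on the slide), a system with no redundancy delivers correct service only if all n components do, so its reliability is the product R_sys = R_1 × R_2 × ... × R_n. The helper name below is hypothetical.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a non-redundant system: every component must work.

    Assumes component failures are independent, so the probabilities multiply.
    """
    return math.prod(component_reliabilities)


# 1000 components, each 99.9% reliable over the same mission time.
print(f"{series_reliability([0.999] * 1000):.2f}")  # → 0.37
```

Even with very reliable parts, a large non-redundant system fails more often than not over the mission, which is why fault avoidance alone is not enough.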

6 Failure, Fault, Error
- Fault
  - Abnormality of a component of the system
  - Cause of an error and, ultimately, of a failure
- Error
  - Abnormal state of a component or of the system
  - Manifestation of a fault in the system
  - Cause of a failure
- Failure
  - The system cannot provide the desired service (its behavior deviates from the required specification)
[Figure: illustration of fault, error, and failure; labels "100%", "Not 100%", "OK", "NG", "0", "1"]
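
The chain from fault to error to failure can be illustrated with a small, hypothetical circuit z = (a AND b) OR c in which input line a has a stuck-at-0 fault. The circuit and the stuck-at fault model are assumptions chosen for illustration, not taken from the slide. Comparing the faulty circuit with a fault-free copy for every input shows that a fault does not always produce an error, and an error does not always produce a failure.

```python
from itertools import product


def circuit(a: int, b: int, c: int, a_stuck_at=None):
    """Hypothetical circuit z = (a AND b) OR c.

    If a_stuck_at is given, input line a is forced to that value (the fault).
    Returns (internal node value, output value).
    """
    if a_stuck_at is not None:
        a = a_stuck_at          # fault: line a is stuck regardless of its input
    internal = a & b            # internal component state
    z = internal | c            # service delivered at the output
    return internal, z


def classify(a: int, b: int, c: int) -> str:
    """Compare the faulty circuit (a stuck-at-0) with a fault-free reference."""
    good_internal, good_z = circuit(a, b, c)
    bad_internal, bad_z = circuit(a, b, c, a_stuck_at=0)
    if bad_z != good_z:
        return "failure"                     # output deviates from the specification
    if bad_internal != good_internal:
        return "error (masked, no failure)"  # abnormal state that does not reach the output
    return "fault only (no error)"           # fault present but not activated


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, classify(a, b, c))
```

Only the input (1, 1, 0) turns the fault into a failure; with c = 1 the OR gate masks the error, and whenever a = 0 or b = 0 the fault causes no error at all.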

7 Types of Faults (by Duration)
- Transient fault (temporary)
  - Fault of limited duration (exists only for a short time)
  - Caused by a temporary malfunction of the system
  - Hard to detect
- Intermittent fault (occurs at intervals)
  - A transient fault that recurs repeatedly, each time for a short duration
- Permanent fault
  - Exists until the faulty component is repaired
  - Most fault-tolerance techniques assume that components fail permanently
  - Should be detected
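
The three duration classes can be modelled as functions of time. The sketch below uses hypothetical activation patterns (a single glitch, a glitch that recurs every four steps, and a fault that stays from step 3 onward). In this example, re-executing a task one step later works around the transient and intermittent faults but not the permanent one, which has to be detected and repaired.

```python
def fault_active(kind: str, t: int) -> bool:
    """Return True if a hypothetical fault of the given kind is active at time step t."""
    if kind == "transient":
        return t == 3            # one short glitch, then gone for good
    if kind == "intermittent":
        return t % 4 == 3        # the same short glitch keeps coming back
    if kind == "permanent":
        return t >= 3            # stays until the component is repaired (never, here)
    raise ValueError(f"unknown fault kind: {kind}")


def retry_succeeds(kind: str, start: int, attempts: int) -> bool:
    """Run a task at consecutive time steps; succeed on the first fault-free step."""
    return any(not fault_active(kind, start + i) for i in range(attempts))


for kind in ("transient", "intermittent", "permanent"):
    timeline = "".join("X" if fault_active(kind, t) else "." for t in range(12))
    print(f"{kind:<12} {timeline}  retry from t=3 (2 attempts): {retry_succeeds(kind, 3, 2)}")
```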

8 Types of Faults (by Phase)
Phase in which faults are introduced:
- Design fault
  - Introduced during system design
  - Introduced during modification of the system
- Operational fault
  - Appears during the system's lifetime, due to physical causes

9 Fault Tolerance and Redundancy
- Fault-tolerant system
  - A system that can mask (hide) the effect of a fault by using redundancy
- Redundancy (duplication; more than is strictly necessary)
  - Some kind of redundancy is needed for a fault-tolerant system
  - Defined as the parts of the system that are not needed for its correct functioning (they are unnecessary while the system operates normally)
  - Space redundancy: hardware, software
  - Time redundancy: extra time for performing tasks for fault tolerance
- Goal: avoid system failure even if faults are present
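
One widely used way to mask a fault with redundancy is majority voting over redundant results. The sketch below is an illustration, not a method given on the slide: it applies the same voter to both kinds of redundancy listed above, space redundancy (three copies of a module run side by side, as in triple modular redundancy, TMR) and time redundancy (one module run three times in succession). The modules `good`, `faulty`, and `make_transient_module` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(outputs: Sequence[int]) -> int:
    """Return the value produced by a strict majority of the redundant results."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: the fault cannot be masked")
    return value


def tmr(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Space redundancy: several copies of a module compute the same input."""
    return majority_vote([m(x) for m in modules])


def repeat_and_vote(module: Callable[[int], int], x: int, runs: int = 3) -> int:
    """Time redundancy: one module is executed several times, one after another."""
    return majority_vote([module(x) for _ in range(runs)])


def good(x: int) -> int:
    return 2 * x


def faulty(x: int) -> int:
    return 2 * x + 1  # hypothetical permanent fault: always off by one


def make_transient_module() -> Callable[[int], int]:
    """Hypothetical module whose transient fault corrupts only its first run."""
    calls = 0

    def module(x: int) -> int:
        nonlocal calls
        calls += 1
        return 2 * x + (1 if calls == 1 else 0)

    return module


print(tmr([good, faulty, good], 21))                 # → 42 (faulty copy outvoted)
print(repeat_and_vote(make_transient_module(), 21))  # → 42 (43, 42, 42 voted to 42)
```

Time redundancy alone cannot mask the permanent fault: `repeat_and_vote(faulty, 21)` returns the wrong value 43 on every run, so permanent faults call for space redundancy.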

10 Digital Circuit Review
[Figure: digital circuit with inputs x0 to x8 and outputs z1 and z2]