Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University


Building Dependable Distributed Systems, Copyright Wenbing Zhao

Outline
- Basic terminology
- Dependability concepts
  - Attributes
  - Fault, error, and failure
  - Approaches to achieving dependability

Terminology
- A system is an entity that interacts with other entities, i.e., other systems, including hardware, software, humans, and the physical world with its natural phenomena
- These other systems are the environment of the given system
- The system boundary is the common frontier between the system and its environment
- A system may consist of one or more components, such as nodes or processes
[Figure: a system, its environment, and the system boundary between them]

Terminology
- State: determines the status of the system
  - A system may be recovered to where it was before a failure if its state was captured and survives the failure
- Service delivered by a system: work done that benefits its users
- User/Client: another system that interacts with the given system and receives its service
- Function of a system: what the system is intended to do
- (Functional) Specification: description of the system function
- Correct service: when the delivered service implements the system function
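The point about captured state can be sketched in a few lines. This is only an illustration under stated assumptions: the class `CounterService`, its methods, and the JSON checkpoint file are hypothetical names chosen for the example and do not come from the slides.

```python
import json
import os
import tempfile


class CounterService:
    """A tiny stateful service (hypothetical, for illustration only)."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.value = 0

    def increment(self):
        self.value += 1

    def checkpoint(self):
        # Write to a temporary file, then rename it over the old checkpoint,
        # so a crash during the write never leaves a half-written file.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.checkpoint_path)

    def recover(self):
        # Restore the last captured state, if one survived the failure.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                self.value = json.load(f)["value"]


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "counter.json")
        service = CounterService(path)
        service.increment()
        service.increment()
        service.checkpoint()   # state captured here
        service.increment()    # this update is never captured
        # Simulate a crash: the in-memory object is discarded and a
        # fresh instance recovers from the surviving checkpoint.
        restarted = CounterService(path)
        restarted.recover()
        return restarted.value
```

Here `demo()` recovers the value held at the last checkpoint; the update made after the checkpoint is lost, which is why the capture frequency matters in practice.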

Dependability and its Attributes
- Dependability refers to the ability of a distributed system to provide correct services to its users despite various threats to the system such as undetected software defects, hardware failures, and malicious attacks
- A dependable system has the following attributes
  - Availability: a measure of the readiness of the system
  - Reliability: a measure of the system's capability of providing correct services continuously for a period of time
  - Integrity: the capability of the system to protect its state from being compromised due to various threats
  - Maintainability: the capability of the system to evolve after it is deployed
  - Safety: when the system fails, it does not cause catastrophic consequences

Quantitative Dependability Measures
- Availability: a measure of the readiness of the system
  - It is the probability of being operational at a given instant of time
- A 0.999999 availability means that the system is not operational for at most one hour in a million hours
- A system with high availability may in fact fail. However, failure frequency and recovery time should be small enough to achieve the desired availability
- Soft real-time systems such as telephone switching and airline reservation require high availability
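The interplay between failure frequency and recovery time can be made concrete with the common steady-state formula, availability = MTTF / (MTTF + MTTR). The slides do not state this formula; it is brought in here as an assumption, and the function names below are illustrative.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability from mean time to failure (MTTF)
    and mean time to repair (MTTR). Assumes the standard
    MTTF / (MTTF + MTTR) model, which the slides do not spell out."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours(avail, period_hours):
    """Expected time the system is not operational over a period."""
    return (1.0 - avail) * period_hours
```

For instance, a system that runs 999,999 hours between failures on average and takes one hour to recover has `availability(999_999, 1)` of about 0.999999, and `expected_downtime_hours(0.999999, 1_000_000)` is about one hour, matching the one-hour-in-a-million reading above.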


Quantitative Dependability Measures
- Reliability: a measure of continuous delivery of correct service
  - It is the probability of surviving (potentially despite failures) over an interval of time
  - May also be evaluated as time to failure
- For example, the reliability requirement might be stated as a required reliability R for a 10-hour mission. In other words, the probability of failure during the mission may be at most 1 - R
- Hard real-time systems such as flight control and process control, in which a failure could mean loss of life, demand high reliability
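A mission reliability requirement can be turned into a bound on the failure rate if one assumes a constant failure rate, so that R(t) = exp(-lambda * t). That exponential model is an assumption added for this sketch, not something the slides state, and the target value used in the note below is an arbitrary example.

```python
import math


def reliability(failure_rate_per_hour, mission_hours):
    """Probability of surviving the whole mission, assuming a
    constant failure rate (exponential model; an assumption)."""
    return math.exp(-failure_rate_per_hour * mission_hours)


def max_failure_rate(target_reliability, mission_hours):
    """Largest constant failure rate that still meets the target
    reliability over the mission, under the same assumption."""
    return -math.log(target_reliability) / mission_hours
```

With an example target of 0.999 for a 10-hour mission, `max_failure_rate(0.999, 10)` gives the highest tolerable failure rate per hour, and feeding that rate back into `reliability` returns the target.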

Fault, Error, and Failure
- The adjudged or hypothesized cause of an error is called a fault
- An error is a manifestation of a fault in a system, in which the logical state of an element differs from its intended value
- A service failure occurs if the error propagates to the service interface and causes the service delivered by the system to deviate from correct service
- The failure of a component causes a permanent or transient fault in the system that contains the component
- Service failure of a system causes a permanent or transient external fault for the other system(s) that receive service from the given system
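The chain from fault to error to failure can be traced in a toy example. Everything here is hypothetical: the `Ledger` class and its deliberately planted bug exist only to show where each of the three terms applies.

```python
class Ledger:
    """Keeps a running total of signed amounts (illustrative only)."""

    def __init__(self):
        self.entries = []
        self.cached_total = 0

    def add(self, amount):
        self.entries.append(amount)
        # FAULT (planted on purpose): negative amounts are added with
        # the wrong sign. The fault stays dormant until a negative
        # amount actually arrives.
        self.cached_total += abs(amount)

    def has_error(self):
        # ERROR: the internal state differs from its intended value.
        return self.cached_total != sum(self.entries)

    def total(self, check=False):
        # FAILURE happens only if the erroneous state reaches the
        # service interface. With check=True the error is detected
        # and corrected before the value is returned.
        if check and self.has_error():
            self.cached_total = sum(self.entries)
        return self.cached_total
```

After `add(10)` the fault is present but no error exists. After `add(-3)` the state is erroneous; a plain `total()` then delivers an incorrect service, while `total(check=True)` masks the error before it reaches the caller.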

Fault
- Faults can arise during all stages in a computer system's evolution (specification, design, development, manufacturing, assembly, and installation) and throughout its operational life
- Most faults that occur before full system deployment are discovered through testing and eliminated
- Faults that are not removed can reduce a system's dependability when it is in the field
- A fault can be classified by its duration, nature of output, and correlation to other faults (and many other criteria)

Fault Types - Based on Duration
- Permanent faults are caused by irreversible device/software failures within a component due to damage, fatigue, improper manufacturing, or bad design and implementation
  - Permanent software faults are also called Bohrbugs
  - Easier to detect
- Transient/intermittent faults are triggered by environmental disturbances or incorrect design
  - Transient software faults are also referred to as Heisenbugs
  - Studies show that Heisenbugs make up the majority of software faults
  - Harder to detect
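One practical consequence of the duration-based split is that re-executing an operation may get past a transient fault but not a permanent one. The slides do not prescribe retrying; the sketch below is an added illustration, and `call_with_retry`, `FlakyOperation`, and `always_fails` are hypothetical names.

```python
def call_with_retry(operation, attempts=3):
    """Run operation up to `attempts` times and return its first
    successful result; re-raise the last exception if all fail."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    raise last_error


class FlakyOperation:
    """Stands in for a transient fault: fails a fixed number of
    times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining = failures_before_success

    def __call__(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise RuntimeError("transient fault")
        return "ok"


def always_fails():
    """Stands in for a permanent fault: fails on every execution."""
    raise ValueError("permanent fault")
```

`call_with_retry(FlakyOperation(2))` succeeds on the third attempt, whereas `call_with_retry(always_fails)` exhausts its attempts and raises, since repetition alone cannot mask a fault that shows up on every run.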

Fault Types - Based on Nature of Output
- Malicious fault: a fault that causes a unit to behave arbitrarily or maliciously. Also referred to as a Byzantine fault
  - A sensor sending conflicting outputs to different processors
  - A compromised software system that attempts to cause service failure
- Non-malicious faults: the opposite of malicious faults
  - Faults that are not caused with malicious intention
  - Faults that exhibit themselves consistently to all observers, e.g., fail-stop
- A fail-stop system simply stops executing once it fails
- Malicious faults are much harder to detect than non-malicious faults

Fault Types - Based on Correlation
- Component faults may be independent of one another or correlated
- A fault is said to be independent if it does not directly or indirectly cause another fault
- Faults are said to be correlated if they are related. Faults could be correlated due to physical or electrical coupling of components
- Correlated faults are more difficult to detect than independent faults

Approaches to Achieving Dependability
- Fault Avoidance: how to prevent, by construction, the fault occurrence or introduction
- Fault Removal: how to minimize, by verification, the presence of faults
- Fault Tolerance: how to provide, by redundancy, a service complying with the specification in spite of faults
- Fault Forecasting: how to estimate, by evaluation, the presence, the creation, and the consequence of faults
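Fault tolerance by redundancy is often illustrated with majority voting over replicated results. The slides name redundancy but not a specific mechanism, so the voter below is just one possible sketch; `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(replies):
    """Return the value reported by a strict majority of replicas.

    Raises RuntimeError when no value has a strict majority, which
    means the faults exceeded what this redundancy level can mask.
    """
    if not replies:
        raise ValueError("no replies to vote on")
    value, count = Counter(replies).most_common(1)[0]
    if count * 2 > len(replies):
        return value
    raise RuntimeError("no majority among replicas")
```

With three replicas, `majority_vote([42, 42, 7])` masks the single faulty reply. If all three replies disagree, the voter raises instead of guessing, which turns an undetected wrong answer into a detected failure.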