The ANSA project: Failures and Dependability in ANSA

System structure
– Component based: the behaviour of a component can be observed by other components
– Independent components: each component makes its own observations and reasons about events itself
– No global observer
– No global ordering of events
– No global time

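Expectations – I
An event with value $v_0$ is expected in the time interval $[t_0, t_1]$.
[Figure: value V plotted against time T; the expectation is the single value $v_0$ held between $t_0$ and $t_1$]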

Expectations – II
An event with a value between $v_0$ and $v_1$ is expected in the time interval $[t_0, t_1]$.
[Figure: value V plotted against time T; the expectation is the region $[v_0, v_1] \times [t_0, t_1]$]

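Expectations – III
An event with a value between $v_0$ and $v_1$ is expected in the time interval $[t_0, t_1]$, and the admissible event value is time dependent. In general, an expectation is a subset of value × time:
$E \subseteq V \times T$
[Figure: value V plotted against time T; the admissible value range changes over $[t_0, t_1]$]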
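
Reading an expectation as a set $E \subseteq V \times T$, the three cases above can be sketched as membership tests on (value, time) pairs. This is a minimal sketch that assumes real-valued values and times and closed intervals. The helper names `point_value`, `value_range` and `time_dependent` are hypothetical and do not come from ANSA.

```python
from typing import Callable

# An expectation E ⊆ V × T, represented (by assumption) as a membership
# predicate over (value, time) pairs.
Expectation = Callable[[float, float], bool]

def point_value(v0: float, t0: float, t1: float) -> Expectation:
    """Expectations - I: exactly the value v0, at some time in [t0, t1]."""
    return lambda v, t: v == v0 and t0 <= t <= t1

def value_range(v0: float, v1: float, t0: float, t1: float) -> Expectation:
    """Expectations - II: any value in [v0, v1], at some time in [t0, t1]."""
    return lambda v, t: v0 <= v <= v1 and t0 <= t <= t1

def time_dependent(lo: Callable[[float], float],
                   hi: Callable[[float], float],
                   t0: float, t1: float) -> Expectation:
    """Expectations - III: the admissible value range [lo(t), hi(t)]
    depends on the time t at which the event occurs."""
    return lambda v, t: t0 <= t <= t1 and lo(t) <= v <= hi(t)
```

For example, `value_range(1.0, 2.0, 0.0, 10.0)(1.5, 3.0)` is `True`. Using predicates rather than explicit sets keeps continuous value and time ranges representable.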

Occurrences
An event occurs at most once in the ANSA model, so an occurrence is either empty or a single point in value × time:
$O \subseteq V \times T, \quad |O| \in \{0, 1\}$
[Figure: value V plotted against time T over $[t_0, t_1]$, with occurrences labelled $O_0$ and $O_1$]
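
The constraint $|O| \in \{0, 1\}$ can be enforced with a record that accepts at most one (value, time) point. This is a minimal sketch. The class name `Occurrence` and its methods are illustrative assumptions and are not part of ANSA.

```python
from typing import Optional, Tuple

class Occurrence:
    """O ⊆ V × T with |O| ∈ {0, 1}: an event occurs at most once."""

    def __init__(self) -> None:
        # None stands for the empty occurrence (the event has not occurred).
        self._point: Optional[Tuple[float, float]] = None

    def record(self, value: float, time: float) -> None:
        # A second occurrence of the same event is rejected.
        if self._point is not None:
            raise ValueError("event has already occurred")
        self._point = (value, time)

    @property
    def point(self) -> Optional[Tuple[float, float]]:
        return self._point

    def __len__(self) -> int:
        # |O| is 0 before the event occurs and 1 afterwards.
        return 0 if self._point is None else 1
```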

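Correctness
Correct occurrence of an event (it occurs within its expectation): $O \cap E \neq \emptyset$
Correct non-occurrence of an event (nothing was expected and nothing occurred): $O \cup E = \emptyset$
Formal definition of correctness: $(O \cap E \neq \emptyset) \lor (O \cup E = \emptyset)$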
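
Because $|O| \le 1$, the correctness formula reduces to a check on at most one point. The minimal sketch below assumes two encodings: `None` for the empty occurrence ($O = \emptyset$), and `None` for the empty expectation ($E = \emptyset$), with a non-empty expectation given as a membership predicate.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]                    # (value, time) in V × T
Expectation = Callable[[float, float], bool]   # membership test for E

def is_correct(o: Optional[Point], e: Optional[Expectation]) -> bool:
    """(O ∩ E ≠ ∅) ∨ (O ∪ E = ∅).

    o is None when the event did not occur (O = ∅);
    e is None when no event is expected (E = ∅).
    """
    # O ∩ E ≠ ∅: the single occurrence point lies inside the expectation.
    correct_occurrence = o is not None and e is not None and e(*o)
    # O ∪ E = ∅: neither an expectation nor an occurrence.
    correct_non_occurrence = o is None and e is None
    return correct_occurrence or correct_non_occurrence
```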

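Failures
Negation of a correct event: $\lnot\big((O \cap E \neq \emptyset) \lor (O \cup E = \emptyset)\big)$
Simplified: $(O \cap E = \emptyset) \land (O \cup E \neq \emptyset)$
Unexpected occurrence: $O \neq \emptyset \land E = \emptyset$
Omission failure: $E \neq \emptyset \land O = \emptyset$
Incorrect occurrence: $O \neq \emptyset \land E \neq \emptyset \land (O \cap E = \emptyset)$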
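
The three failure cases partition the simplified failure condition. When $O \cup E \neq \emptyset$, either only $O$ is non-empty, only $E$ is non-empty, or both are non-empty and disjoint. The classifier below sketches this case split. It uses the same assumed encodings as a plain correctness check: `None` for an empty occurrence or an empty expectation, and a predicate for a non-empty expectation. The `Outcome` names are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]                    # (value, time) in V × T
Expectation = Callable[[float, float], bool]   # membership test for E

class Outcome(Enum):
    CORRECT = "correct"
    UNEXPECTED_OCCURRENCE = "unexpected occurrence"  # O ≠ ∅ ∧ E = ∅
    OMISSION = "omission failure"                    # E ≠ ∅ ∧ O = ∅
    INCORRECT_OCCURRENCE = "incorrect occurrence"    # O ≠ ∅ ∧ E ≠ ∅ ∧ O ∩ E = ∅

def classify(o: Optional[Point], e: Optional[Expectation]) -> Outcome:
    if o is None and e is None:
        return Outcome.CORRECT                 # correct non-occurrence
    if o is not None and e is None:
        return Outcome.UNEXPECTED_OCCURRENCE
    if o is None:
        return Outcome.OMISSION                # e is non-empty here
    # Both non-empty: correct iff the single point of O lies in E.
    return Outcome.CORRECT if e(*o) else Outcome.INCORRECT_OCCURRENCE
```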

Consistency between multiple events
– Events constrain the expectation of future events
– Local events: observed by the local mechanisms of a component
– Distributed events: a distributed consensus problem; the components must collaborate
– Enforce consistency instead of detecting distributed deviations
– Express global properties as a set of local ones
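
Expressing a global property as local ones can be illustrated with ordered, gap-free delivery. Assume each sender numbers its messages 0, 1, 2, … in send order. Each receiver then enforces locally that only the next expected sequence number is accepted. The global property "every receiver delivers each sender's messages in send order, without gaps" is the conjunction of these local checks, and no global observer is needed. The example and the `Receiver` class are illustrative assumptions and are not taken from the ANSA material.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

class Receiver:
    """Local consistency enforcement: each accepted message constrains the
    expectation for the next one (sequence number n is followed by n + 1)."""

    def __init__(self) -> None:
        self.next_seq: DefaultDict[str, int] = defaultdict(int)
        self.delivered: List[Tuple[str, int]] = []

    def receive(self, sender: str, seq: int) -> bool:
        if seq != self.next_seq[sender]:
            return False          # deviation rejected locally
        self.next_seq[sender] += 1
        self.delivered.append((sender, seq))
        return True
```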

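Computability of next expectation
Research questions:
– Does a function $f(O)$ exist that computes the next expectation from an occurrence?
– How many such functions are needed for a simple protocol?
[Figure: two value (V) / time (T) diagrams: occurrence $O_0$ inside the expectation $[v_0, v_1] \times [t_0, t_1]$, and the next expectation $[v_2, v_3] \times [t_2, t_3]$; labels $T_O$]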
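
One possible $f(O)$ can be sketched for a simple request/reply exchange. The protocol is a hypothetical echo protocol: a request with value $v$ occurring at time $t$ makes the next expectation a reply carrying the same value between $t + D_{min}$ and $t + D_{max}$. The protocol, the constants `D_MIN`/`D_MAX` and the `Window` type are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    """A rectangular expectation [v_lo, v_hi] × [t_lo, t_hi] ⊆ V × T."""
    v_lo: float
    v_hi: float
    t_lo: float
    t_hi: float

    def contains(self, v: float, t: float) -> bool:
        return self.v_lo <= v <= self.v_hi and self.t_lo <= t <= self.t_hi

# Hypothetical echo protocol: the reply must carry the request value and
# arrive between D_MIN and D_MAX time units after the request.
D_MIN, D_MAX = 1.0, 5.0

def f(v: float, t: float) -> Window:
    """Next expectation computed from the occurrence O = {(v, t)}."""
    return Window(v, v, t + D_MIN, t + D_MAX)
```

For instance, `f(7.0, 10.0)` expects the value 7.0 in the time window [11.0, 15.0]. A protocol with several message types would need one such function per transition, which is one way to read the second research question.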

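Computability of next expectation
Research question: does a function $g(O)$ exist that computes the next expectation in case of a failure?
[Figure: two value (V) / time (T) diagrams: occurrence $O_0$ with the expectation $[v_0, v_1] \times [t_0, t_1]$, and the next expectation $[v_2, v_3] \times [t_2, t_3]$; labels $T_O$]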
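
A failure-case counterpart $g$ can be sketched for an omission in a hypothetical request/reply protocol. The assumed behaviour is that the client retransmits at the missed deadline and waits twice as long as before, and that after `MAX_RETRIES` attempts nothing more is expected (`None`), so the problem must be escalated. The retry policy and all names are assumptions, not part of ANSA.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Window:
    """A rectangular expectation [v_lo, v_hi] × [t_lo, t_hi] ⊆ V × T."""
    v_lo: float
    v_hi: float
    t_lo: float
    t_hi: float

MAX_RETRIES = 3   # hypothetical retry budget

def g(failed: Window, attempt: int) -> Optional[Window]:
    """Next expectation after an omission failure of `failed`.

    Assumes a retransmission at the missed deadline and a waiting time
    that doubles on each attempt; returns None once the budget is spent.
    """
    if attempt >= MAX_RETRIES:
        return None
    width = failed.t_hi - failed.t_lo
    start = failed.t_hi                     # retransmit at the missed deadline
    return Window(failed.v_lo, failed.v_hi, start, start + 2 * width)
```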

Dependability Principles – I
– Separation: more (distributed) components reduce dependability
– Diversity: designers must be prepared for diversity, and mechanisms must allow for it
– Scaling: mechanisms must be exchangeable to suit different scenarios

Dependability Principles – II
– Federation: heterogeneous authorities and dependability contracts
– Transparency: hide dependability mechanisms from the programmer
– Concurrency: conflicting, inconsistent changes to data
– Configuration: add and update parts of the system; adapt failure detectors
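
The Scaling and Configuration principles both point to detection mechanisms that can be swapped and re-tuned. The minimal sketch below defines an abstract detector interface and one implementation, a heartbeat-timeout detector. That implementation is a common technique chosen for illustration and is not prescribed by the slides. The class and method names are hypothetical.

```python
from abc import ABC, abstractmethod

class FailureDetector(ABC):
    """Interface that lets detection mechanisms be exchanged per scenario."""

    @abstractmethod
    def heartbeat(self, now: float) -> None: ...

    @abstractmethod
    def suspects(self, now: float) -> bool: ...

class TimeoutDetector(FailureDetector):
    """Suspects a component when no heartbeat arrived within `timeout`."""

    def __init__(self, timeout: float, start: float = 0.0) -> None:
        self.timeout = timeout
        self.last_seen = start

    def heartbeat(self, now: float) -> None:
        self.last_seen = now

    def suspects(self, now: float) -> bool:
        return now - self.last_seen > self.timeout

    def adapt(self, timeout: float) -> None:
        # Configuration: re-tune the detector while the system is running.
        self.timeout = timeout
```

Code that depends only on `FailureDetector` can be handed a different detector without changes, which is the exchangeability the Scaling principle asks for.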

Management Model – I
1. Fault confinement: limit the propagation of a fault to other parts of the system
2. Fault detection: compare the observed time and value with the expectation
3. Fault diagnosis: needed if fault detection cannot identify the faulty component
4. Reconfiguration: isolate the faulty component or replace it with a spare
5. Recovery: remove the effects of the fault
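
The five steps can be sketched as one handler that a manager runs for each observation. This is a minimal sketch under several assumptions. Confinement comes from the structure, because each role keeps its own state. Diagnosis is trivial because every observation is attributed to a role. Recovery restores a known-good checkpoint. `Manager`, `detect` and the window tuple are hypothetical names.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float, float, float]   # (v_lo, v_hi, t_lo, t_hi)

def detect(v: float, t: float, w: Window) -> bool:
    """Fault detection: True if the observation deviates from the expectation."""
    v_lo, v_hi, t_lo, t_hi = w
    return not (v_lo <= v <= v_hi and t_lo <= t <= t_hi)

class Manager:
    def __init__(self, active: Dict[str, str], spares: List[str],
                 checkpoints: Dict[str, int]) -> None:
        self.active = active            # role -> component currently serving it
        self.spares = spares            # pool of standby components
        self.checkpoints = checkpoints  # role -> last known-good state
        self.isolated: List[str] = []
        # 1. Fault confinement (assumed structural): state is kept per role,
        #    so a fault cannot spread through shared data.
        self.state: Dict[str, int] = dict(checkpoints)

    def handle(self, role: str, v: float, t: float, w: Window) -> List[str]:
        steps: List[str] = []
        if not detect(v, t, w):                     # 2. fault detection
            return steps
        faulty = self.active[role]                  # 3. diagnosis: role is known
        steps.append(f"detected fault in {faulty}")
        if not self.spares:
            raise RuntimeError(f"no spare available for {role}")
        self.isolated.append(faulty)                # 4. reconfiguration: isolate
        self.active[role] = self.spares.pop(0)      #    and replace with a spare
        steps.append(f"{role} now served by {self.active[role]}")
        self.state[role] = self.checkpoints[role]   # 5. recovery: drop damaged state
        steps.append(f"{role} state restored to {self.state[role]}")
        return steps
```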

Management Model – II
– Restart: after all damaged state has been removed
– Repair: restore the faulty component to an undamaged state
– Reintegration: reconfigure the system to reintroduce the repaired component
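
The ordering constraints among restart, repair and reintegration can be captured by a small bookkeeping class. The sketch assumes two rules. A restart is refused while a damaged component is still in service. Only a repaired component may be reintegrated. The `System` class and its method names are illustrative assumptions.

```python
from typing import Set

class System:
    """Post-recovery phases for a set of named components (illustrative)."""

    def __init__(self, names: Set[str]) -> None:
        self.in_service: Set[str] = set(names)
        self.damaged: Set[str] = set()   # components holding damaged state
        self.running = True

    def fail(self, name: str) -> None:
        self.damaged.add(name)
        self.running = False

    def isolate(self, name: str) -> None:
        # Reconfiguration: take the faulty component out of service.
        self.in_service.discard(name)

    def restart(self) -> bool:
        # Restart only once no damaged state remains in service.
        if self.damaged & self.in_service:
            return False
        self.running = True
        return True

    def repair(self, name: str) -> None:
        # Repair: restore the faulty component to an undamaged state.
        self.damaged.discard(name)

    def reintegrate(self, name: str) -> None:
        # Reintegration: reconfigure the system to take the component back.
        if name in self.damaged:
            raise ValueError(f"{name} has not been repaired")
        self.in_service.add(name)
```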

Open questions
– Is our list of principles complete?
  – Separation, Diversity, Scaling, Federation, Transparency, Concurrency, Configuration
– Is our D²R³ strategy complete?
  – Fault confinement, Fault detection, Fault diagnosis, Reconfiguration, Recovery, Restart, Repair, Reintegration
– Is our CFEF diagram correct?
  – Do we detect faults, errors or failures?

CFEF diagram question ??