The ANSA project Failures and Dependability in ANSA
System structure
– Component based: component behaviour can be observed by other components
– Independent components: each makes its own observations and reasons about events
– No global observer
– No global ordering of events
– No global time
Expectations – I
– An event with value v0 is expected in the time interval between t0 and t1
[Figure: value V against time T; the expectation is the value v0 over the interval t0 to t1]
Expectations – II
– An event with a value between v0 and v1 is expected in the time interval between t0 and t1
[Figure: value V against time T; the expectation is the region between v0 and v1 over the interval t0 to t1]
Expectations – III
– An event with a value between v0 and v1 is expected in the time interval between t0 and t1
– The event value is time dependent
– E ⊆ V × T
[Figure: value V against time T; the expected value varies with time over the interval t0 to t1]
Occurrences
– An event can occur at most once in the ANSA model
– O ⊆ V × T, |O| ∈ {0, 1}
[Figure: value V against time T; occurrences O0 and O1 marked relative to the expectation over t0 to t1]
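The expectation and occurrence sets on these slides can be modelled directly. The following is a minimal Python sketch, not ANSA code: the names `Occurrence`, `Expectation`, `contains`, `exact`, `value_range` and `time_dependent` are hypothetical, and treating both the time window and the value range as closed intervals is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Occurrence:
    """A single observed event: one point in V x T."""
    value: float
    time: float


@dataclass(frozen=True)
class Expectation:
    """A region E of V x T: a time window plus value bounds per instant."""
    t0: float
    t1: float
    # Maps a time t to the (lowest, highest) acceptable value at t.
    bounds: Callable[[float], Tuple[float, float]]

    def contains(self, o: Occurrence) -> bool:
        # Assumption: the time window and the value range are both closed.
        if not (self.t0 <= o.time <= self.t1):
            return False
        low, high = self.bounds(o.time)
        return low <= o.value <= high


def exact(v0: float, t0: float, t1: float) -> Expectation:
    """Expectations I: exactly the value v0 within [t0, t1]."""
    return Expectation(t0, t1, lambda t: (v0, v0))


def value_range(v0: float, v1: float, t0: float, t1: float) -> Expectation:
    """Expectations II: any value in [v0, v1] within [t0, t1]."""
    return Expectation(t0, t1, lambda t: (v0, v1))


def time_dependent(bounds, t0: float, t1: float) -> Expectation:
    """Expectations III: the acceptable values vary with time."""
    return Expectation(t0, t1, bounds)
```

For example, `time_dependent(lambda t: (t, t + 1.0), 0.0, 10.0)` describes a band whose acceptable values rise with time. Representing E as a predicate rather than an enumerated set keeps the sketch usable for continuous values and times.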
Correctness
– Correct occurrence of an event: O ∩ E ≠ ∅
– Correct non-occurrence of an event: O ∪ E = ∅
– Formal definition of correctness: (O ∩ E ≠ ∅) ∨ (O ∪ E = ∅)
Failures
– Negation of a correct event: ¬((O ∩ E ≠ ∅) ∨ (O ∪ E = ∅))
– Simplified: (O ∩ E = ∅) ∧ (O ∪ E ≠ ∅)
– Unexpected occurrence: O ≠ ∅ ∧ E = ∅
– Omission failure: E ≠ ∅ ∧ O = ∅
– Incorrect occurrence: O ≠ ∅ ∧ E ≠ ∅ ∧ (O ∩ E = ∅)
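The correctness and failure cases can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the ANSA model. It reads a correct occurrence as O ∩ E ≠ ∅ and a correct non-occurrence as O ∪ E = ∅, and it represents E as a finite set of (value, time) points, which only fits discretised values and times. The names `Verdict`, `classify` and `is_failure` are hypothetical.

```python
from enum import Enum
from typing import FrozenSet, Tuple

Point = Tuple[float, float]  # (value, time)


class Verdict(Enum):
    CORRECT_OCCURRENCE = "correct occurrence"
    CORRECT_NON_OCCURRENCE = "correct non-occurrence"
    UNEXPECTED_OCCURRENCE = "unexpected occurrence"
    OMISSION = "omission failure"
    INCORRECT_OCCURRENCE = "incorrect occurrence"


def classify(O: FrozenSet[Point], E: FrozenSet[Point]) -> Verdict:
    """Classify an occurrence set O against an expectation set E."""
    if len(O) > 1:
        raise ValueError("an event occurs at most once, so |O| must be 0 or 1")
    if O & E:              # the occurrence lies inside the expectation
        return Verdict.CORRECT_OCCURRENCE
    if not (O | E):        # nothing occurred and nothing was expected
        return Verdict.CORRECT_NON_OCCURRENCE
    if not E:              # something occurred, nothing was expected
        return Verdict.UNEXPECTED_OCCURRENCE
    if not O:              # something was expected, nothing occurred
        return Verdict.OMISSION
    return Verdict.INCORRECT_OCCURRENCE  # occurred, but outside E


def is_failure(O: FrozenSet[Point], E: FrozenSet[Point]) -> bool:
    """Simplified failure condition: O and E are disjoint and not both empty."""
    return not (O & E) and bool(O | E)
```

Under this reading the three failure cases partition the simplified failure condition, so `is_failure` is true exactly when `classify` returns one of the last three verdicts.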
Consistency between multiple events
– Events constrain the expectation of future events
– Local events: observation by local mechanisms of a component
– Distributed events: distributed consensus problem, collaboration of components required
– Consistency enforcement instead of distributed deviation detection
– Express global properties as a set of local ones
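Computability of next expectation
Research questions:
– Does a function f(O) exist to compute the next expectation?
– How many such functions are needed for a simple protocol?
[Figure: two plots of value V against time T; the first shows occurrence O0 at time T_O inside the expectation v0 to v1 over t0 to t1, the second shows the next expectation v2 to v3 over t2 to t3]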
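Computability of next expectation
Research question:
– Does a function g(O) exist to compute the next expectation in case of a failure?
[Figure: two plots of value V against time T; the first shows occurrence O0 at time T_O relative to the expectation v0 to v1 over t0 to t1, the second shows the next expectation v2 to v3 over t2 to t3]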
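To make the two research questions concrete, here is a sketch for a toy protocol. Everything in it is an assumption made for illustration. Each message carries a sequence number as its value, the next message is due between `MIN_DELAY` and `MAX_DELAY` after the previous one, and after a failure the same value is expected again in a later window. The slides write g(O); the sketch also passes the failed expectation to `g`, because an omission leaves O empty. It shows what such functions could look like, and it does not answer whether they exist for real protocols.

```python
from typing import NamedTuple, Optional


class Occurrence(NamedTuple):
    value: int      # hypothetical: a sequence number
    time: float     # T_O, the time at which the event occurred


class Expectation(NamedTuple):
    v_low: int
    v_high: int
    t_start: float
    t_end: float


# Hypothetical protocol timing, chosen only for this example.
MIN_DELAY = 1.0
MAX_DELAY = 5.0


def f(o: Occurrence) -> Expectation:
    """Next expectation after a correct occurrence o."""
    nxt = o.value + 1
    return Expectation(nxt, nxt, o.time + MIN_DELAY, o.time + MAX_DELAY)


def g(o: Optional[Occurrence], failed: Expectation) -> Expectation:
    """Next expectation after a failure of `failed`.

    `o` is None for an omission. This toy policy ignores `o` and expects
    the same values again in a window that starts after the failed one.
    """
    return Expectation(
        failed.v_low,
        failed.v_high,
        failed.t_end + MIN_DELAY,
        failed.t_end + MAX_DELAY,
    )
```

For instance, `f(Occurrence(7, 10.0))` yields an expectation for value 8 in the window 11.0 to 15.0, and an omission of that expectation moves the window to 16.0 to 20.0 via `g`.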
Dependability Principles – I
– Separation: more (distributed) components reduce dependability
– Diversity: designers need to be prepared for diversity, and mechanisms need to allow for it
– Scaling: mechanisms must be exchangeable to suit different scenarios
Dependability Principles – II
– Federation: heterogeneous authorities and dependability contracts
– Transparency: hide dependability mechanisms from the programmer
– Concurrency: conflicting, inconsistent changes to data
– Configuration: add and update parts of the system; adapt failure detectors
Management Model – I
1. Fault confinement: limit propagation to other parts of the system
2. Fault detection: compare the time/value observation with the expectation
3. Fault diagnosis: needed if fault detection cannot identify the faulty component
4. Reconfiguration: isolate the faulty component or replace it with a spare
5. Recovery: remove the effect of the fault
Management Model – II
– Restart: after all damaged state has been removed
– Repair: restores the faulty component to an undamaged state
– Reintegration: reconfiguration of the system to reintroduce the repaired component
Open questions
– Is our list of principles complete?
  – Separation, Diversity, Scaling, Federation, Transparency, Concurrency, Configuration
– Is our D2R3 strategy complete?
  – Fault confinement, Fault detection, Fault diagnosis, Reconfiguration, Recovery, Restart, Repair, Reintegration
– Is our CFEF diagram correct?
  – Do we detect faults, errors or failures?
CFEF diagram question ??