Self-stabilization in NEST
Mikhail Nesterenko (based on a presentation by Anish Arora, Ohio State University)
Goals
  - Scalable dependability via new notions of stabilization, e.g., weak, protective, and bounded stabilization
  - Stabilization at all levels of the NEST system stack
      - e.g., at the application level, via component frameworks and automated synthesis
      - e.g., at the middleware level, via stabilizing monitoring
Co-conspirators
  - Mohamed Gouda, UTexas, Austin
  - Ted Herman, UIowa
  - Sandeep Kulkarni, Michigan State
  - Mikhail Nesterenko, Kent State
Stabilization Notions: Original Concept
  - Legitimate states: states from where safety and liveness are satisfied
  - Illegitimate states: states reached possibly due to faults
  - Closure: the set of legitimate states is closed under system execution
  - Convergence: starting from any system state, every system computation eventually reaches a legitimate state
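The two conditions can be checked mechanically on a small finite system. The sketch below is illustrative only: the dictionary encoding of transitions, the function names, and the toy states are assumptions, not part of the original presentation, and no fairness assumption is modeled.

```python
def all_states(trans, legit):
    """Collect every state mentioned as a source, a successor, or a legitimate state."""
    states = set(trans) | set(legit)
    for succs in trans.values():
        states |= set(succs)
    return states


def is_closed(trans, legit):
    """Closure: every transition from a legitimate state stays legitimate."""
    return all(t in legit for s in legit for t in trans.get(s, ()))


def converges(trans, legit):
    """Convergence: from every state, every computation reaches a legitimate state.

    Computed as a least fixpoint: a state is 'good' if it is legitimate, or if it
    has at least one successor and all of its successors are already good.
    An illegitimate cycle or deadlock therefore makes the check fail.
    """
    states = all_states(trans, legit)
    good = set(legit)
    changed = True
    while changed:
        changed = False
        for s in states - good:
            succs = set(trans.get(s, ()))
            if succs and succs <= good:
                good.add(s)
                changed = True
    return good == states


# Hypothetical toy system: state 0 is legitimate, states 1..3 are illegitimate.
chain = {0: {0}, 1: {0}, 2: {1}, 3: {2}}
looping = {0: {0}, 1: {0}, 2: {1, 3}, 3: {2}}  # adds an illegitimate cycle 2 <-> 3
```

Under these assumptions, `chain` is stabilizing (closed and convergent), while `looping` is closed but not convergent, because a computation may cycle between states 2 and 3 forever.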
Weak Stabilization
  - Closure
  - Weak Convergence: starting from any system state, some system computation eventually reaches a legitimate state
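Weak convergence only requires that a path to a legitimate state exists, so on a finite system it reduces to backward reachability from the legitimate states. A minimal sketch, assuming a hypothetical dictionary encoding of transitions (state to set of successors):

```python
def weakly_converges(trans, legit):
    """Weak convergence: from every state, SOME computation reaches a legitimate state."""
    states = set(trans) | set(legit)
    for succs in trans.values():
        states |= set(succs)

    # Build the predecessor relation so we can search backward from legit states.
    preds = {s: set() for s in states}
    for s, succs in trans.items():
        for t in succs:
            preds[t].add(s)

    reached = set(legit)
    frontier = list(legit)
    while frontier:
        t = frontier.pop()
        for s in preds[t]:
            if s not in reached:
                reached.add(s)
                frontier.append(s)
    return reached == states


# Hypothetical toy system: the cycle 2 <-> 3 is illegitimate, but state 2 can escape to 1.
looping = {0: {0}, 1: {0}, 2: {1, 3}, 3: {2}}
# State 4 can only loop on itself, so no computation from it reaches state 0.
trapped = {0: {0}, 4: {4}}
```

Here `looping` is weakly convergent with respect to `{0}` even though a computation may stay in the cycle forever, while `trapped` is not.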
Protective Stabilization
  - Closure
  - Convergence (strong or weak)
  - Protection: no transition is unsafe
Bounded Stabilization
  - Closure
  - Bounded Convergence:
      - the set of fault-span states is closed under system execution
      - starting from any fault-span state, every system computation reaches a legitimate state in bounded time
  [Figure labels: fault-span states; convergence time is bounded]
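For a finite system, the bound can be computed as the longest path through illegitimate states inside the fault span. The sketch below measures time in transition steps, which is an assumption (the slide does not fix a time unit); the encoding and names are likewise hypothetical.

```python
def worst_case_steps(trans, legit, fault_span):
    """Return the maximum number of steps any computation from a fault-span state
    needs to reach a legitimate state, or None if no such bound exists
    (fault span not closed, illegitimate cycle, or illegitimate deadlock)."""
    # The fault span must be closed under execution.
    for s in fault_span:
        if not set(trans.get(s, ())) <= set(fault_span):
            return None

    memo = {}
    on_stack = set()

    def steps(s):
        if s in legit:
            return 0
        if s in memo:
            return memo[s]
        if s in on_stack:          # revisiting a state on the current path: a cycle
            return None
        succs = trans.get(s, ())
        if not succs:              # illegitimate state with no way out
            memo[s] = None
            return None
        on_stack.add(s)
        worst = 0
        for t in succs:
            d = steps(t)
            if d is None:
                worst = None
                break
            worst = max(worst, d + 1)
        on_stack.discard(s)
        memo[s] = worst
        return worst

    bound = 0
    for s in fault_span:
        d = steps(s)
        if d is None:
            return None
        bound = max(bound, d)
    return bound


# Hypothetical toy system: state 9 lies outside the fault span and never recovers.
system = {0: {0}, 1: {0}, 2: {0, 1}, 3: {2}, 9: {9}}
```

With legitimate states `{0}` and fault span `{0, 1, 2, 3}`, the worst case is the path 3, 2, 1, 0, so the bound is 3 steps; including state 9 in the fault span removes any bound.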
Stabilization in NEST System Stack
  [Diagram labels: AP, Timed AP, APC; stabilizing application; component framework; synthesis; nonstabilizing application; stabilization synthesis framework; implementing stabilizing apps; stabilizing system/app monitoring]
Project: Stabilizing Monitoring Service
  - Model: apps/daemons/nodes periodically send a refresh to the service
      - the period is chosen within some interval [LF..HF]
  - The service ensures, in a stabilizing manner, that:
      - apps/daemons/nodes are up
      - the monitoring service of a node is up
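The refresh model can be sketched as a table of last-refresh times checked against the longest permitted period. Everything below is an illustrative assumption rather than the project's implementation: the class name, the `max_period` parameter (standing in for the upper end of the allowed period interval), and the clamping rule used to recover from a corrupted timestamp.

```python
class RefreshMonitor:
    """Tracks the last refresh time of each registered client (hypothetical sketch)."""

    def __init__(self, max_period):
        # Longest refresh period a client is allowed to use (assumed parameter).
        self.max_period = max_period
        self.last_refresh = {}

    def refresh(self, name, now):
        """Record a refresh from a client; this also registers new clients."""
        self.last_refresh[name] = now

    def suspects(self, now):
        """Return the clients whose refresh is overdue at time `now`."""
        overdue = []
        for name, stamp in self.last_refresh.items():
            if stamp > now:
                # A timestamp in the future can only result from state corruption.
                # Clamping it to `now` means a dead client is still detected
                # within one max_period, instead of being masked indefinitely.
                self.last_refresh[name] = now
                stamp = now
            if now - stamp > self.max_period:
                overdue.append(name)
        return sorted(overdue)
```

A short usage example: after `m = RefreshMonitor(max_period=5.0)`, `m.refresh("a", 0.0)` and `m.refresh("b", 3.0)`, the call `m.suspects(6.0)` reports only `"a"`. The clamping step is one simple way to make the table self-correcting from an arbitrary state; what to do with a suspect (restart, notify dependents) is left out.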
Layered Architecture
  - Layer 0: Hardware watchdog
      - implements a hardware self-rebooting mechanism
  - Layer 1: Basic monitoring
      - ensures that registered apps/daemons are up
  - Layer 2: Remote and advanced monitoring
      - ensures other nodes and distributed process groups are up
      - generation of suspicions for dependent apps/daemons
      - adaptation of refresh periods & registered apps/daemons
Project: Implementing Stabilizing Applications
  - Input: a (weakly-)stabilizing protocol consisting of processes communicating via messages, in Abstract Protocol (AP) notation
  - Output: a weakly-stabilizing implementation using UNIX processes and UDP communication
Approach
  AP (input) -> Timed AP: preserves all safety and liveness properties
  Timed AP -> APC (output): preserves some properties, including weak stabilization

  AP                       Timed AP                 APC
  Abstract timeouts        Real timeouts            Real timeouts
  Zero message delay       Non-zero message delay   Non-zero message delay
  Action/fault atomicity   Action/fault atomicity   Event/weak fault atomicity
  Action fairness          Action fairness          Weak action fairness
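To give a feel for the target setting (real timeouts, non-zero message delay, UDP), here is a minimal sketch of a process loop with one receive action and one timeout action. It does not reproduce the AP-to-APC transformer; the function name, the message contents, and the round-based loop are assumptions made for illustration.

```python
import socket


def run_process(sock, peer, rounds, timeout_s=0.2):
    """Run a fixed number of rounds of a two-action process over UDP.

    Each round executes exactly one action:
      - receive action: a datagram arrived, so log it and answer with b"ack"
      - timeout action: nothing arrived within timeout_s, so send b"probe"
    Returns the list of actions executed, in order.
    """
    sock.settimeout(timeout_s)          # a real timeout replaces AP's abstract one
    log = []
    for _ in range(rounds):
        try:
            data, _addr = sock.recvfrom(1024)
            log.append(("rcv", data))
            sock.sendto(b"ack", peer)
        except socket.timeout:
            log.append(("timeout", None))
            sock.sendto(b"probe", peer)
    return log
```

Because UDP may lose or delay datagrams, the timeout action is what lets the process make progress from any state; that is the role real timeouts play in the rightmost two columns above.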
Project: Stabilization Synthesis Framework
  - Nonstabilizing APC -> Stabilizing APC: dependability component framework
  - Nonstabilizing AP -> Stabilizing AP: synthesis procedure
Approach
  - Exponential-time synthesis procedure, with an adequate polynomial-time heuristic
      - sufficient for synthesis of Byzantine agreement
  - Dependability component framework
      - enables reuse of application-independent aspects of stabilization
      - application-dependent parameters are used to instantiate the framework, e.g., network type, communication patterns
Sample Component Frameworks
  - Reactive link-predicate stabilization component
      - retransmission based
      - use of ACK/NACKs
  - Proactive link-predicate stabilization component
      - forward error correction based
      - sending parity packets in advance
  - Group-of-nodes state-predicate stabilization component
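The proactive idea of sending parity in advance can be illustrated with the simplest forward error correction scheme, a single XOR parity packet over a group of equal-length packets. This is a generic sketch, not the project's component; the function names and the one-loss-per-group limit are assumptions.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of a group of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("all packets must have the same length")
    parity = bytearray(size)
    for p in packets:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single lost packet (marked None) from the others and the parity.

    XOR-ing the parity with every packet that did arrive cancels them out and
    leaves exactly the missing packet. More than one loss cannot be repaired.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("exactly one packet must be missing")
    present = [p for p in received if p is not None]
    rebuilt = xor_parity(present + [parity])
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired
```

The sender transmits the data packets followed by `xor_parity(packets)`; a receiver that loses one data packet repairs it locally with `recover_missing`, without waiting for a retransmission. The reactive component makes the opposite trade: no extra traffic up front, but a round trip of ACK/NACK and resend after a loss.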
Deliverables and Milestones
  - Stabilizing Monitoring Framework:
      - Aug’02: Implementation of basic node monitoring
      - Aug’03: Implementation of advanced node/group monitoring
      - Apr’04: Demo of monitoring service use by NEST application
  - Implementing Stabilizing Applications:
      - Aug’02: AP-to-APC transformer implementation
      - Apr’03: Demo of stabilizing transformer-based NEST application
      - Aug’04: Transformer for stabilization of sequential processes
  - Stabilizing Synthesis Framework:
      - Aug’02: Demo of tool for synthesis of stabilizing AP protocols
      - Apr’03: BNF & semantics of APC dependability component composition language
      - Aug’03: Application-independent code for reactive & proactive component frameworks
      - Apr’04: Demo of stabilizing framework-based NEST application