Self-stabilization in NEST Mikhail Nesterenko (based on presentation by Anish Arora, Ohio State University)



Goals
- Scalable dependability via new notions of stabilization
  e.g., weak, protective, bounded stabilization
- Stabilization at all levels of the NEST system stack
  e.g., at application level, via component frameworks and automated synthesis
  e.g., at middleware level, via stabilizing monitoring

Co-conspirators
- Mohamed Gouda, UTexas, Austin
- Ted Herman, UIowa
- Sandeep Kulkarni, Michigan State
- Mikhail Nesterenko, Kent State

Stabilization Notions: Original Concept
- Legitimate states: states from where safety and liveness are satisfied
- Illegitimate states: states reached possibly due to faults
- Closure: The set of legitimate states is closed under system execution
- Convergence: Starting from any system state, every system computation eventually reaches a legitimate state
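The two conditions above can be checked mechanically on a small finite-state model. The following is a minimal sketch, not part of the original slides: it assumes the system is given as an explicit successor map with no fairness assumption, and the names `is_closed` and `converges` are hypothetical.

```python
from typing import Dict, Hashable, Set

Transitions = Dict[Hashable, Set[Hashable]]


def all_states(transitions: Transitions, legitimate: Set[Hashable]) -> Set[Hashable]:
    """Collect every state mentioned as a source, a target, or a legitimate state."""
    states = set(transitions) | set(legitimate)
    for succs in transitions.values():
        states |= succs
    return states


def is_closed(transitions: Transitions, legitimate: Set[Hashable]) -> bool:
    """Closure: no transition leaves the set of legitimate states."""
    return all(t in legitimate
               for s in legitimate
               for t in transitions.get(s, set()))


def converges(transitions: Transitions, legitimate: Set[Hashable]) -> bool:
    """Convergence: every maximal computation from every state reaches a legitimate state.

    Least fixpoint: a state is 'inevitable' if it is legitimate, or if it has
    at least one successor and all of its successors are already inevitable.
    """
    states = all_states(transitions, legitimate)
    inevitable = set(legitimate)
    changed = True
    while changed:
        changed = False
        for s in states - inevitable:
            succs = transitions.get(s, set())
            if succs and succs <= inevitable:
                inevitable.add(s)
                changed = True
    return inevitable == states


# A countdown that always ends in state 0 (the only legitimate state).
countdown = {0: {0}, 1: {0}, 2: {1}, 3: {2}}
# The same system with an extra transition 2 -> 3, creating an illegitimate cycle.
with_cycle = {0: {0}, 1: {0}, 2: {1, 3}, 3: {2}}
```

Here `countdown` satisfies both closure and convergence, while `with_cycle` is closed but not convergent, because the computation 2, 3, 2, 3, ... never reaches state 0.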

Weak Stabilization
- Closure
- Weak Convergence: Starting from any system state, some system computation eventually reaches a legitimate state
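On a finite-state model, weak convergence amounts to backward reachability from the legitimate states. A minimal sketch under that assumption follows; the function name `weakly_converges` and the example systems are illustrative, not from the slides.

```python
from typing import Dict, Hashable, Set

Transitions = Dict[Hashable, Set[Hashable]]


def weakly_converges(transitions: Transitions, legitimate: Set[Hashable]) -> bool:
    """Weak convergence: from every state, SOME computation reaches a legitimate state."""
    states = set(transitions) | set(legitimate)
    for succs in transitions.values():
        states |= succs

    # Grow the set of states that have at least one path into the legitimate set.
    can_reach = set(legitimate)
    changed = True
    while changed:
        changed = False
        for s in states - can_reach:
            if transitions.get(s, set()) & can_reach:
                can_reach.add(s)
                changed = True
    return can_reach == states


# States 2 and 3 may loop forever (2 -> 3 -> 2 ...), but 2 can also step to 1.
may_loop = {0: {0}, 1: {0}, 2: {1, 3}, 3: {2}}
# State 1 can only stay at 1, so no computation from it reaches state 0.
stuck = {0: {0}, 1: {1}}
```

The system `may_loop` is weakly convergent to `{0}` even though the computation that alternates between 2 and 3 never reaches it; `stuck` is not weakly convergent.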

Protective Stabilization
- Closure
- Convergence (strong or weak)
- Protection: No transition is unsafe ( )
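The protection condition can be illustrated on a finite-state model. The slides do not say how unsafe transitions are specified; this sketch assumes they are given as an explicit set of (source, target) pairs, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Transitions = Dict[Hashable, Set[Hashable]]
Edge = Tuple[Hashable, Hashable]


def unsafe_transitions(transitions: Transitions, unsafe: Set[Edge]) -> Set[Edge]:
    """Return the transitions of the system that are marked unsafe."""
    return {(s, t)
            for s, succs in transitions.items()
            for t in succs
            if (s, t) in unsafe}


def is_protective(transitions: Transitions, unsafe: Set[Edge]) -> bool:
    """Protection: the system contains no unsafe transition, even during recovery."""
    return not unsafe_transitions(transitions, unsafe)


# Assumed example: jumping straight from state 3 to state 0 is declared unsafe.
unsafe = {(3, 0)}
careful = {0: {0}, 1: {0}, 2: {1}, 3: {2}}   # recovers step by step
hasty = {0: {0}, 1: {0}, 2: {1}, 3: {0}}     # recovers via the unsafe shortcut
```

Both example systems converge to state 0, but only `careful` does so without taking an unsafe transition.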

Bounded Stabilization
- Closure
- Bounded Convergence:
  - The set of fault-span states is closed under system execution
  - Starting from any fault-span state, every system computation reaches a legitimate state in bounded time
- Fault-span states, convergence time is bounded
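A convergence bound can be computed on a finite-state model. This is a minimal sketch that assumes time is measured in transition steps and that the legitimate states are contained in the fault span; neither assumption comes from the slides, and `convergence_bound` is a hypothetical name.

```python
from typing import Dict, Hashable, Optional, Set

Transitions = Dict[Hashable, Set[Hashable]]


def convergence_bound(transitions: Transitions,
                      legitimate: Set[Hashable],
                      fault_span: Set[Hashable]) -> Optional[int]:
    """Worst-case number of steps from any fault-span state to a legitimate state.

    Returns None if the fault span is not closed or some computation that
    starts in the fault span never reaches a legitimate state.
    """
    # The fault span must be closed under system execution.
    for s in fault_span:
        if not transitions.get(s, set()) <= fault_span:
            return None

    # rank[s] = length of the longest computation from s before reaching a legitimate state.
    rank = {s: 0 for s in legitimate}
    changed = True
    while changed:
        changed = False
        for s in fault_span - set(rank):
            succs = transitions.get(s, set())
            if succs and all(t in rank for t in succs):
                rank[s] = 1 + max(rank[t] for t in succs)
                changed = True

    if not fault_span <= set(rank):
        return None
    return max(rank[s] for s in fault_span)


# State 9 lies outside the assumed fault span and never recovers.
system = {0: {0}, 1: {0}, 2: {0, 1}, 3: {2}, 9: {9}}
```

With legitimate states `{0}` and fault span `{0, 1, 2, 3}`, the worst case is the computation 3, 2, 1, 0, so the bound is 3 steps. Enlarging the fault span to include state 9 makes the function return `None`, which matches the idea that the bound holds only for faults that keep the system inside the fault span.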

Stabilization in NEST System Stack
[Diagram: the three projects placed on the NEST system stack.
- Stabilization synthesis framework: nonstabilizing application to stabilizing application, via component framework and synthesis
- Implementing stabilizing apps: AP, Timed AP, APC
- Stabilizing system/app monitoring]

Project: Stabilizing Monitoring Service
Model:
- apps/daemons/nodes periodically send a refresh to the service
- the period is chosen within some interval [LF..HF]
Service ensures in a stabilizing manner:
- apps/daemons/nodes are up
- the monitoring service of a node is up
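The refresh model above can be sketched as a small table of last-refresh times. This is an illustration under stated assumptions, not the project's implementation: the class `Monitor`, its methods, and the clamping rule are hypothetical, and time is passed in explicitly so the example stays deterministic. Clients are free to pick any period in [LF..HF]; the monitor only needs HF to decide that a refresh is overdue.

```python
from typing import Dict, List


class Monitor:
    """Tracks the last refresh of each registered client (hypothetical sketch)."""

    def __init__(self, hf: float) -> None:
        self.hf = hf                              # longest allowed refresh period
        self.last_refresh: Dict[str, float] = {}  # client name -> time of last refresh

    def refresh(self, client: str, now: float) -> None:
        """Record a refresh message received from a client."""
        self.last_refresh[client] = now

    def sanitize(self, now: float) -> None:
        """Repair corrupted entries: a refresh cannot have happened in the future.

        Clamping to 'now' means a corrupted timestamp delays detection of a
        dead client by at most HF, whatever value the table started with.
        """
        for client, stamp in self.last_refresh.items():
            if stamp > now:
                self.last_refresh[client] = now

    def down_clients(self, now: float) -> List[str]:
        """Return the clients whose refresh is overdue by more than HF."""
        self.sanitize(now)
        return sorted(c for c, stamp in self.last_refresh.items()
                      if now - stamp > self.hf)


monitor = Monitor(hf=5.0)
monitor.refresh("app", now=0.0)
early = monitor.down_clients(now=3.0)        # refresh still fresh
late = monitor.down_clients(now=6.0)         # refresh overdue

# Simulate a transient fault that corrupts the table with a far-future timestamp.
monitor.last_refresh["daemon"] = 1e9
after_fault = monitor.down_clients(now=10.0)  # entry clamped to 10.0
recovered = monitor.down_clients(now=16.0)    # silent daemon now detected
```

The last two calls show the stabilizing flavor of the design: the corrupted entry for `daemon` is not trusted forever, and a client that stays silent is reported at most HF after the corruption is repaired.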

Layered Architecture
- Layer 0: Hardware watchdog
  - implements a hardware self-rebooting mechanism
- Layer 1: Basic monitoring
  - ensures that registered apps/daemons are up
- Layer 2: Remote and advanced monitoring
  - ensures other nodes and distributed process groups are up
  - generation of suspicions for dependent apps/daemons
  - adaptation of refresh periods and registered apps/daemons

Project: Implementing Stabilizing Applications
- Input: a (weakly-) stabilizing protocol consisting of processes communicating via messages, in Abstract Protocol (AP) notation
- Output: a weakly-stabilizing implementation using UNIX processes and UDP communication

Approach
AP -> Timed AP -> APC
- AP -> Timed AP: preserves all safety and liveness properties
- Timed AP -> APC: preserves some properties, including weak stabilization

AP (input):
- Abstract timeouts
- Zero message delay
- Action/fault atomicity
- Action fairness

Timed AP:
- Real timeouts
- Non-zero message delay
- Action/fault atomicity
- Action fairness

APC (output):
- Real timeouts
- Non-zero message delay
- Event/weak fault atomicity
- Weak action fairness
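To give a feel for what an implementation with real timeouts and UDP communication might look like, here is a minimal sketch. It is not the project's AP-to-APC transformer: the request/reply protocol, the `step` function, and the message names `rqst` and `rply` are assumptions made for illustration, and both endpoints run in one Python process over the loopback interface instead of as separate UNIX processes.

```python
import socket


def make_endpoint(timeout: float = 0.05) -> socket.socket:
    """Create a UDP socket on loopback whose receive call gives up after 'timeout' seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    sock.settimeout(timeout)      # the real timeout that stands in for an abstract one
    return sock


def step(sock, on_receive, on_timeout) -> None:
    """Execute exactly one action: the receive action if a datagram arrived, else the timeout action."""
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        on_timeout()
    else:
        on_receive(data, addr)


def run_demo(max_rounds: int = 10) -> dict:
    p, q = make_endpoint(), make_endpoint()
    q_addr = q.getsockname()
    state = {"done": False, "requests": 0}

    # Process p:  timeout -> send rqst to q ;  rcv rply from q -> done := true
    def p_timeout():
        p.sendto(b"rqst", q_addr)
        state["requests"] += 1

    def p_receive(data, addr):
        if data == b"rply":
            state["done"] = True

    # Process q:  rcv rqst from p -> send rply to p
    def q_receive(data, addr):
        if data == b"rqst":
            q.sendto(b"rply", addr)

    try:
        for _ in range(max_rounds):
            if state["done"]:
                break
            step(p, p_receive, p_timeout)
            step(q, q_receive, lambda: None)
    finally:
        p.close()
        q.close()
    return state


result = run_demo()
```

Each call to `step` runs one whole action, which mirrors the action atomicity listed for AP and Timed AP. In a real deployment the two processes would run concurrently and messages could be delayed or lost, which is where the weaker atomicity and fairness of the APC column would come into play.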

Project: Stabilization Synthesis Framework
- Nonstabilizing APC -> Stabilizing APC: dependability component framework
- Nonstabilizing AP -> Stabilizing AP: synthesis procedure

Approach
- Exponential-time synthesis procedure, with adequate polynomial-time heuristic
  - sufficient for synthesis of Byzantine agreement
- Dependability component framework enables reuse of application-independent aspects of stabilization
  - application-dependent parameters are used to instantiate this framework, e.g., network type, communication patterns

Sample Component Frameworks
- Reactive link-predicate stabilization component
  - Retransmission based
  - Use of ACK/NACKs
- Proactive link-predicate stabilization component
  - Forward error correction based
  - Sending parity packets in advance
- Group-of-nodes state-predicate stabilization component
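The proactive approach, sending parity packets in advance, can be illustrated with the simplest forward error correction scheme: one XOR parity packet per group of equal-length packets, which lets the receiver rebuild any single lost packet without asking for a retransmission. The slides do not name a specific code, so the choice of XOR parity and the function names below are assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Parity packet sent in advance: the XOR of all data packets in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the single missing packet (marked None) from the others and the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    restored = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = restored
    return repaired


packets = [b"NEST", b"node", b"data"]
parity = make_parity(packets)
received = [b"NEST", None, b"data"]   # the second packet was lost in transit
repaired = recover(received, parity)
```

The trade-off against the reactive component is visible here: the parity packet costs bandwidth even when nothing is lost, but recovery needs no ACK/NACK round trip. If two packets of a group are lost, this scheme cannot repair them and a retransmission would still be needed.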

Deliverables and Milestones
Stabilizing Monitoring Framework:
- Aug’02: Implementation of basic node monitoring
- Aug’03: Implementation of advanced node/group monitoring
- Apr’04: Demo of monitoring service use by NEST application
Implementing Stabilizing Applications:
- Aug’02: AP-to-APC transformer implementation
- Apr’03: Demo of stabilizing transformer-based NEST application
- Aug’04: Transformer for stabilization of sequential processes
Stabilizing Synthesis Framework:
- Aug’02: Demo of tool for synthesis of stabilizing AP protocols
- Apr’03: BNF and semantics of APC dependability component composition language
- Aug’03: Application-independent code for reactive and proactive component frameworks
- Apr’04: Demo of stabilizing framework-based NEST application