Presentation transcript:

1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Washington, D.C. December 10, 2002

2 MIT Participants
Leader
– Nancy Lynch
Postdoctoral associates
– Idit Keidar, Dilsun Kirli
Graduate students
– Roger Khazan, Carl Livadas, Ziv Bar-Joseph, Rui Fan, Seth Gilbert, Sayan Mitra
Collaborators
– Alex Shvartsman and students, Frits Vaandrager, Roberto Segala

3 I. Project Overview
Design and analyze timing-based distributed algorithms that implement global services with strong guarantees:
– Reliable communication
– Strongly coherent data services
– Consensus
– …
Many of the algorithms work in a dynamic environment, tolerating joins, leaves, and failures.
Prove correctness.
Analyze performance conditionally, under various assumptions about timing and failures.
Develop underlying semantic modeling framework, based on interacting state machines (IOA, TIOA, HIOA, …).

4 Conditional performance analysis
Give conditional claims about system performance under particular assumptions about the behavior of the environment and of the network substrate, e.g.:
– Stabilization of underlying network.
– Limited rate of change.
– Bounds on message delay.
– Limited amount of failure (number, density).
– Limited input arrivals (number, density).
Assumptions ⇒ guarantees.
Get probabilistic statements as corollaries.
Composable.

5 Algorithms studied
Scalable group communication [Khazan, Keidar]
Early-delivery dynamic atomic broadcast [Bar-Joseph, Keidar, Lynch]
Scalable reliable multicast [Livadas, Keidar, Lynch]
Reconfigurable atomic memory [Lynch, Shvartsman]
In progress:
– Rambo extensions [Gilbert, Musial, Lynch, Shvartsman]
– Concurrency control using metadata [Fan]
– Consensus [De Prisco, Lynch, Shvartsman]
– Mobile: Clock synchronization, tracking
– Peer-to-peer: Fault-tolerant location services, data services

6 This Talk
I. Introduction
II. Completed work:
– I. Scalable group communication
– II. Early-delivery dynamic atomic broadcast
– III. Scalable reliable multicast
– IV. Reconfigurable atomic memory
V. Modeling framework
VI. Plans for next two years

7 II. Completed work

8 Scalable Group Communication [Keidar, Khazan 00, 02], [Khazan 02], [Keidar, Khazan, Lynch, Shvartsman 02], [Taraschanskiy 00]

9 Group Communication Services
Cope with changing participants using abstract groups of client processes with changing membership sets.
Processes communicate with group members indirectly, by sending messages to the group as a whole.
GC services support management of groups:
– Maintain membership information. Form new views in response to changes.
– Manage communication. Communication respects views. Provide guarantees about ordering, reliability of message delivery.
Virtual synchrony.
Systems: Isis, Transis, Totem, Ensemble, …

10 Group Communication Services
High-level programming abstraction.
Hides complexity of coping with changes.
Applications:
– Managing replicated data
– Distributed multiplayer interactive games
– Multi-media conferencing, collaborative work
Disadvantages:
– Can be costly, especially when forming new views.
– May have problems scaling to large networks.

11 Scalable GC Algorithm
Specification, including virtual synchrony.
New algorithm:
– Uses a scalable membership service, implemented on a small set of membership servers [Keidar, Sussman, Marzullo, Dolev].
– Multicast implemented on all the nodes.
– View change uses only one round for state exchange, in parallel with membership service’s agreement on views.
– Participants can join during view formation.
[Diagram labels: GCS, Net, Memb]

12 Models and analysis
Safety proofs, using new incremental proof methods.
Liveness proofs.
Performance analysis:
– Analyze time from when network stabilizes until the GCS announces the final view.
– Analyze message latency.
– Conditional analysis, based on input, failure, and timing assumptions.
– Compositional analysis, based on performance of Membership service and Net.
Modeled and analyzed data-management application running on top of the new GCS.
Distributed implementation.
[Diagram labels: S, S’, A, A’]

13 Early-Delivery Dynamic Atomic Broadcast [Bar-Joseph, Keidar, Lynch 02]

14 Dynamic Atomic Broadcast
Atomic broadcast, in a setting where processes may join, leave, or fail.
Participants receive consistent sequences of messages.
Safety: Sending and receiving orders are consistent with a single global message ordering S. No gaps.
Liveness: Eventual join-ack, leave-ack. Eventual delivery, including the first message the process itself sends.
Strong latency guarantees: Fast delivery, even with joins, leaves.
Application: Distributed multiplayer interactive games.
[Diagram labels: DAB; join, leave, mcast(m), join-ack, leave-ack, rcv(m)]
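
The safety condition on this slide can be illustrated with a small checker. This is only a sketch under assumptions that are not in the slides: messages are unique strings, the global ordering S is given as a list, and "no gaps" is read as "each process receives a contiguous segment of S". The function name `consistent_with_global_order` is hypothetical.

```python
from typing import Dict, List, Sequence


def consistent_with_global_order(global_order: Sequence[str],
                                 delivered: Dict[str, List[str]]) -> bool:
    """Return True if every process's delivered sequence is a contiguous,
    gap-free segment of the global order (an assumed reading of the slide)."""
    position = {m: i for i, m in enumerate(global_order)}
    for msgs in delivered.values():
        if not msgs:
            continue  # a process that received nothing violates nothing
        if any(m not in position for m in msgs):
            return False  # delivered a message that is not in S
        start = position[msgs[0]]
        # The segment of S starting at the first delivered message.
        expected = list(global_order[start:start + len(msgs)])
        if msgs != expected:
            return False  # out of order, duplicated, or a gap
    return True


S = ["m1", "m2", "m3", "m4"]
ok = consistent_with_global_order(S, {"p": ["m1", "m2", "m3"],
                                      "q": ["m2", "m3", "m4"]})
gap = consistent_with_global_order(S, {"p": ["m1", "m3"]})
swapped = consistent_with_global_order(S, {"p": ["m2", "m1"]})
```

Here process q joined late and starts at m2, which the check permits; skipping m2 or swapping two messages is rejected.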

15–45 [Slides 15 through 45: no text in the transcript]

46 General Models and Proof Methods
I/O automaton models [Lynch, Tuttle 87]
– Nondeterministic, infinite-state machines
– Input/output/internal actions, traces
– Modularity: Composition, levels of abstraction
Mathematical, language-independent.
Used to model distributed algorithms, communication protocols.
Validation, code generation, upper and lower bounds.
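
As a rough illustration of the ingredients named on this slide (action signature, nondeterministic steps, composition, traces), here is a minimal sketch. It is not the IOA language or any existing tool; the class `IOA`, the functions `compose` and `external_trace`, and the sender/channel example are all hypothetical names chosen for illustration, and states are kept finite for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, List

State = Hashable
Action = str


@dataclass(frozen=True)
class IOA:
    inputs: FrozenSet[Action]
    outputs: FrozenSet[Action]
    internals: FrozenSet[Action]
    start: State
    # step(state, action) lists the possible next states (nondeterminism);
    # an empty list means the action is not enabled in that state.
    step: Callable[[State, Action], List[State]]

    def actions(self) -> FrozenSet[Action]:
        return self.inputs | self.outputs | self.internals


def compose(a: IOA, b: IOA) -> IOA:
    """Parallel composition: shared actions are taken by both components."""
    assert not (a.outputs & b.outputs), "outputs must be disjoint"
    assert not (a.internals & b.actions()), "internal actions must be private"
    assert not (b.internals & a.actions()), "internal actions must be private"
    outputs = a.outputs | b.outputs
    inputs = (a.inputs | b.inputs) - outputs

    def step(state, action):
        sa, sb = state
        next_a = a.step(sa, action) if action in a.actions() else [sa]
        next_b = b.step(sb, action) if action in b.actions() else [sb]
        return [(x, y) for x in next_a for y in next_b]

    return IOA(inputs, outputs, a.internals | b.internals,
               (a.start, b.start), step)


def external_trace(actions: List[Action], automaton: IOA) -> List[Action]:
    """A trace keeps only the external (input and output) actions."""
    external = automaton.inputs | automaton.outputs
    return [x for x in actions if x in external]


def sender_step(state, action):
    # State: number of messages still to send.
    return [state - 1] if action == "send" and state > 0 else []


def channel_step(state, action):
    # State: number of messages in transit; "send" is an always-enabled input.
    if action == "send":
        return [state + 1]
    if action == "recv" and state > 0:
        return [state - 1]
    return []


sender = IOA(frozenset(), frozenset({"send"}), frozenset(), 2, sender_step)
channel = IOA(frozenset({"send"}), frozenset({"recv"}), frozenset(), 0,
              channel_step)
system = compose(sender, channel)
```

In the composed system, `send` is an output of the sender matched with an input of the channel, so it becomes an output of the composition and both components move together on it.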

47 Timing, Hybrid Considerations
Timing: TIOAs [Lynch, Vaandrager]
– Timeout-based algorithms
– Local clocks, clock synchronization
– Performance analysis
Hybrid: HIOAs [Lynch, Segala, Vaandrager, Weinberg 96]
– Real world + computer components
– Continuous flows of data

48 Other Embellishments
Probabilities: PIOA, PTIOA [Segala 95]
– Probabilistic and nondeterministic behavior
– Randomized distributed algorithms
– Systems with probabilistic assumptions
Dynamic systems: DIOA [Attie, Lynch 99]
– Run-time process creation and destruction, mobility
– Agent systems

49 Hybrid I/O Automata [Lynch, Segala, Vaandrager, HSCC 01]
New, simpler version of the HIOA model of [LSVW96].
Supports decomposing hybrid system descriptions:
– External behavior: Discrete actions and continuous flows
– Composition: Synchronizes external actions and flows, respects external behavior
– Abstraction: Implementation and simulation relation notions, respect external behavior
Separate mechanisms:
– External actions for discrete communication
– External variables for continuous flow

50 Example: Delay Buffer Del(d)
Accepts discrete and continuous input, produces isomorphic output, with delay d.
Compose in sequence, in cycle.
The composition of Del(d1) and Del(d2) in sequence implements Del(d1 + d2).
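
The sequential-composition claim for Del(d) can be checked on a toy model. This sketch makes assumptions beyond the slide: discrete input is a list of timestamped events, continuous input is a function of time, and exact rational time (`Fraction`) is used so that equality tests are meaningful. The names `delay_events` and `delay_signal` are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

Event = Tuple[Fraction, str]          # (time, value) of a discrete action
Signal = Callable[[Fraction], Fraction]  # value of a continuous flow at time t


def delay_events(d: Fraction, events: List[Event]) -> List[Event]:
    """Discrete part of Del(d): each event reappears d time units later."""
    return [(t + d, v) for t, v in events]


def delay_signal(d: Fraction, signal: Signal) -> Signal:
    """Continuous part of Del(d): the output at time t is the input at t - d."""
    return lambda t: signal(t - d)


d1, d2 = Fraction(1, 3), Fraction(1, 2)
events = [(Fraction(0), "a"), (Fraction(5, 4), "b")]

# Del(d1) followed by Del(d2), compared with a single Del(d1 + d2).
in_sequence = delay_events(d2, delay_events(d1, events))
single = delay_events(d1 + d2, events)


def ramp(t: Fraction) -> Fraction:
    return 2 * t


seq_signal = delay_signal(d2, delay_signal(d1, ramp))
single_signal = delay_signal(d1 + d2, ramp)
```

On these inputs the two constructions produce identical outputs, which is the behavior the slide's "implements" statement describes; the sketch says nothing about the cyclic composition.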

51 Example: Vehicle and Controller
Keep vehicle speed in [v1, v2].
Sensor senses velocity, reports to Controller every time d.
Controller suggests acceleration.
Vehicle follows suggested acceleration, with uncertainty ε.
Compose: Discrete, continuous interactions.
Prove invariant: velocity in [v1, v2].
Use auxiliary invariants, including timing.
[Diagram labels: Vehicle, Sensor, Actuator, Controller, report(v), vel-out, suggest(a), acc-in]
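
The slide does not give the controller's policy, so the following simulation is only a hedged illustration of the invariant, not the model from the talk. Assumed here: the sensor reports the exact velocity every d time units; the controller (hypothetically) suggests the acceleration that would reach the midpoint of [v1, v2] by the next report; the actual acceleration differs from the suggestion by at most ε, redrawn at random on each sub-step; and ε·d ≤ (v2 − v1)/2. Under that last condition the velocity bound is linear between reports, so it stays inside the interval.

```python
import random
from typing import Tuple


def simulate(v1: float, v2: float, d: float, eps: float, v0: float,
             periods: int, substeps: int = 10, seed: int = 0
             ) -> Tuple[float, float]:
    """Simulate the closed loop and return the (min, max) velocity observed."""
    # Assumed side condition under which this hypothetical policy is safe.
    assert eps * d <= (v2 - v1) / 2
    assert v1 <= v0 <= v2
    rng = random.Random(seed)
    mid = (v1 + v2) / 2
    dt = d / substeps
    v = v0
    lowest, highest = v, v
    for _ in range(periods):
        # Sensor report, then controller suggestion (hypothetical policy):
        # aim to reach the midpoint exactly one reporting period from now.
        suggested = (mid - v) / d
        for _ in range(substeps):
            # Vehicle follows the suggestion with uncertainty at most eps.
            actual = suggested + rng.uniform(-eps, eps)
            v += actual * dt
            lowest, highest = min(lowest, v), max(highest, v)
    return lowest, highest


low, high = simulate(v1=20.0, v2=30.0, d=1.0, eps=2.0, v0=30.0, periods=200)
```

A simulation run like this only samples behaviors; the slide's point is that the invariant is proved for all executions, using auxiliary invariants that include timing.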

52 HIOA definition
U, X, Y: Input, output, internal (state) variables
Θ: Initial states
I, O, H: Input, output, internal actions
D: Discrete transitions
T: Trajectories
– Mappings from time intervals to valuations of variables
– Closure properties
Input-enabling for actions, flows
Execution: τ0, a1, τ1, a2, τ2, …
Trace: Restrict to external variables and actions

53 Composition and Abstraction
Abstraction:
– A implements B if A and B are comparable and traces(A) ⊆ traces(B).
– Simulation relation: Start, step, trajectory conditions
– Theorem: Simulation relation implies implementation
Composition:
– Synchronize external actions and variables
– Theorems: Projection, pasting, substitutivity
Receptiveness:
– Doesn’t cooperate in producing Zeno behavior
– Theorem: Closed under composition (with technical assumption).
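
The relationship between a simulation relation and trace inclusion can be illustrated on small, untimed, finite transition systems. This sketch drops the trajectory condition and all continuous behavior, assumes every action is external, and checks trace inclusion only up to a bounded length; the function names and the request/acknowledge example are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A finite labelled transition system: state -> list of (action, next state).
LTS = Dict[str, List[Tuple[str, str]]]


def traces(lts: LTS, start: str, max_len: int) -> Set[Tuple[str, ...]]:
    """All action sequences of executions with at most max_len steps."""
    result: Set[Tuple[str, ...]] = set()
    frontier = {(start, ())}
    for _ in range(max_len + 1):
        result |= {tr for _, tr in frontier}
        frontier = {(target, tr + (action,))
                    for state, tr in frontier
                    for action, target in lts.get(state, [])}
    return result


def is_simulation(a: LTS, b: LTS, start_a: str, start_b: str,
                  relation: Set[Tuple[str, str]]) -> bool:
    """Check the start and step conditions of a simulation from a to b."""
    if (start_a, start_b) not in relation:
        return False  # start condition fails
    for s, u in relation:
        for action, s_next in a.get(s, []):
            # Step condition: b must match the step and stay in the relation.
            if not any(act == action and (s_next, u_next) in relation
                       for act, u_next in b.get(u, [])):
                return False
    return True


# Implementation: strict alternation of req and ack.
impl: LTS = {"a0": [("req", "a1")], "a1": [("ack", "a0")]}
# Specification: also allows repeated requests before an ack.
spec: LTS = {"b0": [("req", "b1")], "b1": [("ack", "b0"), ("req", "b1")]}

R = {("a0", "b0"), ("a1", "b1")}
sim_holds = is_simulation(impl, spec, "a0", "b0", R)
included = traces(impl, "a0", 4) <= traces(spec, "b0", 4)
converse = traces(spec, "b0", 4) <= traces(impl, "a0", 4)
```

Here the relation R satisfies both conditions and the bounded trace inclusion holds in the same direction, while the converse inclusion fails because the specification allows `req, req`.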

54 VI. Plans for the next two years

55 Plans: Distributed Algorithms
Scalable Reliable Multicast
– Analyze SRM in the presence of leaves and node failures.
– Finish analysis of CESRM.
– Study LMS reliable multicast protocol [Papadopoulos, Varghese 98], compare with SRM, CESRM.
Reconfigurable Atomic Memory
– Optimizations: concurrent gc, remove reads, …
– Experiments, demo applications
Paxos consensus
Mobile: clock synchronization, tracking, layers
Peer-to-peer: location service with provable fault-tolerance guarantees, under steady-state assumptions. Building strongly coherent data service over location service.

56 Plans: Semantic framework
Timed models:
– Composition theorems for timing properties
– Specially structured TIOAs for conditional performance analysis
Hybrid models:
– Integrate control theory methods.
Probabilistic models:
– Compositional analysis methods
– Combine hybrid and probabilistic models