1 Communication and Data Management in Dynamic Distributed Systems Nancy Lynch MIT June 20, 2002 …



2 NSF-ITR Project
Design and analyze building blocks for computing in highly dynamic distributed settings:
– Global service specifications
– Distributed algorithms that implement them
Dynamic systems:
– Internet, mobile computing
– Joins, leaves, failures
– Contrast: Traditional theory of distributed systems deals mostly with static systems, with fixed sets of processes.
[Figure: Service, Net]

3 NSF-ITR Project
We present everything rigorously, using mathematical interacting state machine models (I/O automata).
– Formal service specifications
– Formal algorithm descriptions
– Formal models for applications
– Prove correctness, using invariants and simulation relations
– Analyze performance, fault-tolerance
Develop supporting theory.
Apply the theory to software systems.
[Figure: Net]

4 Current Subprojects
Scalable group communication [Khazan, Keidar]
Incremental proofs [Keidar, Khazan, Lynch, Shvartsman]
Dynamic Atomic Broadcast [Bar-Joseph, Keidar, Lynch]
Reconfigurable Atomic Memory [Lynch, Shvartsman]
Communication protocols [Livadas, Lynch, Keidar, Bakr]
Peer-to-peer computing [Lynch, Malkhi, Ratajczak, Stoica]
Costs of fault-tolerant consensus [Keidar, Rajsbaum]
Foundations [Lynch, Segala, Vaandrager, Kirli]
Applications:
– Helicopter [Mitra, Wang, Feron]
– Video streaming [Livadas, Nguyen, Zakhor]
– Unmanned flight control [Ha, Kochocki, Tanzman]
– Agent programming [Kawabe]

5 People
Project leaders: Nancy Lynch, Alex Shvartsman
Postdocs: Idit Keidar, Dilsun Kirli
PhD students: Roger Khazan, Carl Livadas, Ziv Bar-Joseph, Rui Fan, Sayan Mitra, Seth Gilbert
MEng students: Omar Bakr, Matt Bachmann, Vida Ha
Other collaborators: Dahlia Malkhi, David Ratajczak, Ion Stoica, Sergio Rajsbaum, Roberto Segala, Frits Vaandrager, Yong Wang, Eric Feron, Thinh Nguyen, Avideh Zakhor, Joe Kochocki, Alan Tanzman, Yoshinobu Kawabe, …

6 This talk:
1. Scalable Group Communication
2. Dynamic Atomic Broadcast
3. Reconfigurable Atomic Memory

7 1. Scalable Group Communication
[Keidar, Khazan 00, 02] [Khazan 02] [K, K, Lynch, Shvartsman 02]
[Figure: GCS]

8 Group Communication Services
Cope with changing participants using abstract groups of client processes with changing membership sets.
Processes communicate with group members indirectly, by sending messages to the group as a whole.
GC services support management of groups:
– Maintain membership information. Form new views in response to changes.
– Manage communication. Communication respects views. Provide guarantees about ordering, reliability of message delivery. Virtual synchrony.
Systems: Isis, Transis, Totem, Ensemble, …
[Figure: GCS]

9 Group Communication Services
Advantages:
– High-level programming abstraction
– Hides complexity of coping with changes
Disadvantages:
– Can be costly, especially when forming new views.
– May have problems scaling to large networks.
Applications:
– Managing replicated data
– Distributed multiplayer interactive games
– Multi-media conferencing, collaborative work

10 New GC Service for WANs [Khazan]
New specification, including virtual synchrony.
New algorithm:
– Uses separate scalable membership service, implemented on a small set of membership servers [Keidar, Sussman, Marzullo, Dolev].
– Multicast implemented on all the nodes.
– View change uses only one round for state exchange, in parallel with membership service’s agreement on views.
– Participants can join during view formation.
[Figure: GCS, Memb, Net]

11 New GC Service for WANs
Distributed implementation [Tarashchanskiy]
Safety proofs, using new incremental proof methods [Keidar, Khazan, Lynch, Shvartsman 00].
Liveness proofs
Performance analysis:
– Analyze time from when network stabilizes until GCS announces new views.
– Analyze message latency.
– Conditional analysis, based on input, failure, and timing assumptions.
– Compositional analysis, based on performance of Membership Service and Net.
Also modeled and analyzed data-management application running on top of the new GCS.
[Figure: S, S’, A, A’]

12 2. Early-Delivery Dynamic Atomic Broadcast
[Bar-Joseph, Keidar, Lynch 02]
[Figure: DAB]

13 Dynamic Atomic Broadcast
Atomic broadcast with latency guarantees, in a dynamic setting where processes may join, leave, or fail.
We define the DAB problem, and present and analyze a new distributed algorithm to solve it.
In the absence of failures: Constant latency, even when participants join and leave.
With failures: Latency linear in the number of failures.
Uses a new distributed consensus service, in which participants do not know who the other participants are.
We define the CUP problem, and present and analyze a new algorithm to solve it.
Algorithm improves upon previously-suggested algorithms using group communication.

14 The DAB Problem
Problem: Guarantee participants receive consistent sequences of messages. Fast delivery, even with joins, leaves.
Safety: Sending, receiving orders are consistent with a single global message ordering S. No gaps.
Liveness: Eventual join-ack, leave-ack. Eventual delivery, including the first message the process itself sends.
Application: Distributed multiplayer interactive games.
[Figure: DAB interface with actions join, join-ack, leave, leave-ack, mcast(m), rcv(m)]

15 Implementing DAB
Processes:
– Timing-dependent, have approximately-synchronized clocks.
Net:
– Dynamic network, pairwise FIFO delivery
– Low latency
– Does not guarantee a single total order, nor that all processes see the same messages from a failing process.
[Figure: DAB over Net, with actions join, net-join]

16 Implementing DAB
Key difficulties:
– Network doesn’t guarantee a single total order.
– Different processes may receive different final messages from a failed process.
So, processes coordinate message delivery:
– Divide time into slots using local clock, assign each message to a slot.
– Deliver messages in order of (slot, sender id).
– Determine members of each slot, deliver only from members.
Processes must agree on slot membership:
– Joining (leaving) process selects join-slot (leave-slot), informs other processes.
– Failed process triggers consensus.
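
The slot-based delivery rule on this slide can be illustrated with a small Python sketch. It is not the DAB algorithm itself: the `Message` type, the 100 ms slot width, the integer millisecond clock readings, and the `members_by_slot` map (standing in for the slot membership the processes have already agreed on) are all assumptions made for illustration.

```python
from dataclasses import dataclass

SLOT_LENGTH_MS = 100  # hypothetical slot width; the slides do not give one


@dataclass(frozen=True)
class Message:
    sender: int        # sender id
    send_time_ms: int  # sender's local clock reading, in milliseconds
    payload: str


def slot_of(msg: Message) -> int:
    """Assign a message to a time slot using the sender's local clock."""
    return msg.send_time_ms // SLOT_LENGTH_MS


def delivery_order(messages, members_by_slot):
    """Return payloads in (slot, sender id) order.

    Only messages whose sender is a member of the message's slot are kept.
    The sort is stable, so several messages from one sender in one slot
    keep their input order.
    """
    eligible = [
        m for m in messages
        if m.sender in members_by_slot.get(slot_of(m), set())
    ]
    eligible.sort(key=lambda m: (slot_of(m), m.sender))
    return [m.payload for m in eligible]


msgs = [
    Message(sender=2, send_time_ms=130, payload="c"),
    Message(sender=1, send_time_ms=150, payload="b"),
    Message(sender=3, send_time_ms=20, payload="a"),
    Message(sender=4, send_time_ms=160, payload="x"),  # not a member of slot 1
]
members = {0: {3}, 1: {1, 2}}
ordered = delivery_order(msgs, members)
```

Because every process sorts by the same key and filters by the same agreed membership, all processes that hold the same messages compute the same order. The hard part, agreeing on membership after a failure, is what the consensus service addresses.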

17 Using Consensus for DAB
When process j fails, a consensus service is used to agree on j’s failure slot.
Requires a new kind of consensus service, which:
– Does not assume participants are known a priori; lets each participant say who it thinks the other participants are.
– Allows processes to abstain.
– Example: i joins around when consensus starts. j1 thinks i is participating, j2 thinks not. i cannot participate as usual, because j2 ignores it, but cannot be silent, because j1 waits for it. So i abstains.
We define new Consensus with Unknown Participants (CUP) service.
Use separate CUP(j) service to decide on failure slot for j.

18 The DAB Algorithm Using CUP
[Figure: DAB i1, DAB i2, CUP(j), Net, fail]

19 The CUP Problem
Guarantees agreement, validity, termination.
Assumes submitted worlds are “close”:
– Process that initiates is in other processes’ worlds
– Process in anyone’s world initiates, abstains, leaves, or fails.
[Figure: CUP interface with actions init(v,W), decide(v), abstain, leave, leave-detect(j), fail-detect(j)]

20 The CUP Algorithm
We give a new early-stopping consensus algorithm.
– Similar to previous algorithms, e.g., [Dolev, Reischuk, Strong 90].
– But tolerates: uncertainty about participants, processes leaving.
Terminates in two rounds when failures stop (even if leaves continue).
Latency linear in number of actual failures.
[Figure: CUP over Net]

21 The DAB Algorithm Using CUP
[Figure: DAB i1, DAB i2, CUP(j1), Net]

22 Discussion: DAB
Modular: DAB algorithm, CUP, Network
Modularity needed for keeping the complexity under control.
Initial presentation was intertwined, not modular.
Correctness of CUP (agreement, validity, termination) used to prove correctness of DAB (atomic broadcast safety and liveness guarantees).
Latency bounds for CUP used to prove latency bounds for DAB.

23 3. Reconfigurable Atomic Memory for Basic Objects
[Lynch, Shvartsman 02]
[Figure: RAMBO]

24 RAMBO
Defined new service: Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).
Developed new, efficient, modular distributed algorithm to implement RAMBO.
Highly survivable; tolerates joins, leaves, failures.
Tolerates short-term changes by using quorums.
Tolerates long-term changes by reconfiguring.
– Reconfigures on-the-fly; no heavyweight view change.
– Maintains atomicity across configuration changes.
Can be used in mobile or peer-to-peer settings.
Applications: Battle data for teams of soldiers, game data for players in multiplayer game.

25 Static Quorum-Based Atomic Read/Write Memory Implementation [Attiya, Bar-Noy, Dolev]
Read, Write use two phases:
– Phase 1: Read (value, tag) from a read-quorum
– Phase 2: Write (value, tag) to a write-quorum
Write determines largest tag in phase 1, picks a larger one, writes new (value, tag) in phase 2.
Read determines latest (value, tag) in phase 1, propagates it in phase 2, then returns the value.
– Could return unconfirmed value after phase 1.
Highly concurrent.
Quorum intersection property implies atomicity.
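
The two-phase pattern on this slide can be sketched as a single-threaded Python simulation. This is a toy under stated assumptions, not the published algorithm: replicas live in one process, every quorum is a randomly chosen majority (so any two quorums intersect), tags are `(sequence number, writer id)` pairs, and the names `Replica` and `QuorumMemory` are invented for the example.

```python
import random


class Replica:
    def __init__(self):
        self.tag = (0, 0)   # (sequence number, writer id); assumed tag format
        self.value = None


class QuorumMemory:
    """Toy two-phase read/write over majority quorums (sequential only)."""

    def __init__(self, n, seed=0):
        self.replicas = [Replica() for _ in range(n)]
        self.rng = random.Random(seed)

    def _majority(self):
        # Any two majorities of the same replica set share a member.
        k = len(self.replicas) // 2 + 1
        return self.rng.sample(self.replicas, k)

    def _query(self):
        # Phase 1: read (value, tag) from a quorum, keep the largest tag.
        latest = max(self._majority(), key=lambda r: r.tag)
        return latest.tag, latest.value

    def _propagate(self, tag, value):
        # Phase 2: write (value, tag) to a quorum; replicas keep newer tags.
        for r in self._majority():
            if tag > r.tag:
                r.tag, r.value = tag, value

    def write(self, writer_id, value):
        (seq, _), _ = self._query()
        self._propagate((seq + 1, writer_id), value)

    def read(self):
        tag, value = self._query()
        self._propagate(tag, value)  # confirm the value before returning it
        return value


mem = QuorumMemory(n=5)
mem.write(writer_id=1, value="a")
first = mem.read()
mem.write(writer_id=2, value="b")
second = mem.read()
```

In this sequential setting the intersection of majorities is what makes each read see the preceding write, whichever replicas the random sampling picks. Concurrency, message loss, and failures are outside the sketch.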

26 How to make this dynamic?
Quorum members may join, leave, fail; need to reconfigure.
Idea: Any member of current quorum configuration can propose a new configuration.
Questions:
– How to agree on new configuration?
– How to install it?
– How to preserve atomicity of data during reconfiguration?
– How to avoid stopping Reads/Writes in progress?

27 Our RAMBO Algorithm
Uses a separate reconfiguration service, Recon.
[Figure: read, write components over Net, with Recon service; actions recon, new-config]

28 Recon Using Consensus
Recon service uses (static) consensus services to determine new configurations 1, 2, 3, …
Consensus is a fairly heavyweight mechanism, but:
– Only used for reconfigurations, which are presumably infrequent.
– Does not delay Read/Write operations (unlike GCS approaches).
[Figure: Recon built from Consensus over Net; actions recon, recon-ack]

29 Consensus Implementation
Use a variant of Paxos algorithm [Lamport].
Agreement, validity guaranteed absolutely.
Termination guaranteed when underlying system stabilizes.
Leader chosen using failure detectors; conducts two-phase algorithm with retries.
[Figure: Consensus interface with actions init(v), decide(v)]

30 Read/Write Algorithm using Recon
Read/write processes run two-phase static quorum-based algorithm, using current configuration.
Use gossiping and fixed point tests rather than highly structured communication.
When Recon provides new configuration, R/W uses both.
Do not abort R/W in progress, but do extra work to access additional processes needed for new quorums.
[Figure: read, write components over Net, with Recon; action new-config]
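
One way to picture "R/W uses both" is as a completion test over every configuration currently in use. The Python sketch below is an illustration only, with assumed names: a configuration is modelled as a list of quorums (sets of process ids), and `phase_complete` is a hypothetical helper rather than a function from the RAMBO papers. It also ignores the distinction between read-quorums and write-quorums.

```python
def phase_complete(responders, active_configs):
    """Return True once `responders` contains some quorum of every
    configuration in `active_configs`.

    responders:     set of process ids heard from so far in this phase
    active_configs: list of configurations, each a list of quorums (sets)
    """
    return all(
        any(quorum <= responders for quorum in config)
        for config in active_configs
    )


old_config = [{1, 2}, {2, 3}, {1, 3}]
new_config = [{3, 4}, {4, 5}, {3, 5}]

done_old_only = phase_complete({1, 2}, [old_config])
done_both_early = phase_complete({1, 2}, [old_config, new_config])
done_both_later = phase_complete({1, 2, 4, 5}, [old_config, new_config])
```

An operation that had already gathered `{1, 2}` is not restarted when the new configuration arrives. It keeps its responses and waits for the extra processes needed to cover a quorum of the new configuration as well.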

31 Removing Old Configurations
Read/Write algorithm removes old configurations by garbage-collecting them in the background.
Two-phase garbage-collection procedure:
– Phase 1: Inform write-quorum of old configuration about the new configuration. Collect latest value from read-quorum of old configuration.
– Phase 2: Inform write-quorum of new configuration about latest value.
Garbage-collection concurrent with Reads/Writes.
Implemented using gossiping and fixed points.
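
The two garbage-collection phases can be sketched sequentially in Python. This is a simplification under assumed names: `Node`, `Config`, and `garbage_collect` are hypothetical, tags are plain integers, and each configuration is given one fixed read-quorum and one fixed write-quorum. The slides describe gossiping and fixed-point tests running concurrently with reads and writes, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: int = 0
    value: object = None
    known_configs: set = field(default_factory=set)


@dataclass(frozen=True)
class Config:
    cid: str
    read_quorum: frozenset
    write_quorum: frozenset


def garbage_collect(nodes, old, new):
    """Retire `old` in favour of `new`; return the (tag, value) carried over."""
    # Phase 1a: tell a write-quorum of the old configuration about the new one.
    for name in old.write_quorum:
        nodes[name].known_configs.add(new.cid)
    # Phase 1b: collect the latest (tag, value) from a read-quorum of the old one.
    latest = max((nodes[name] for name in old.read_quorum), key=lambda nd: nd.tag)
    tag, value = latest.tag, latest.value
    # Phase 2: tell a write-quorum of the new configuration about that value.
    for name in new.write_quorum:
        node = nodes[name]
        if tag > node.tag:
            node.tag, node.value = tag, value
    return tag, value


nodes = {name: Node() for name in "abcde"}
nodes["a"].tag, nodes["a"].value = 3, "x"
nodes["b"].tag, nodes["b"].value = 2, "w"
old = Config("c1", frozenset({"a", "b"}), frozenset({"a", "b"}))
new = Config("c2", frozenset({"d", "e"}), frozenset({"d", "e"}))
carried = garbage_collect(nodes, old, new)
```

After the call, the newest value known to the old configuration is held by a write-quorum of the new one, so later operations that consult only the new configuration still see it.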

32 Discussion: RAMBO
Highly modular: R/W algorithm, Recon service, Consensus, Leader election, Network
Modularity needed for keeping the complexity under control.
Correctness proofs:
– Atomicity of Reads and Writes
Latency bounds:
– For reading, writing, garbage-collection.
– Under various assumptions about timing, joins, failures, and rate of reconfiguration.
LAN implementations begun.

33 Foundations: Hybrid, Timed, Probabilistic Models

34 Hybrid I/O Automata (HIOA) [Lynch, Segala, Vaandrager 01, 02]
Mathematical model for hybrid (continuous/discrete) system components.
Discrete actions, continuous trajectories
Supports composition, levels of abstraction.
Case studies:
– Automated transportation systems
– Quanser helicopter system [Mitra, Wang, Feron, Lynch]
[Figure: P, C, A, S]

35 Timed I/O Automata, Probabilistic, …
Timed I/O Automata [Lynch, Segala, Vaandrager, Kirli]:
– For modeling and analyzing timing-based systems, e.g., most of the building blocks of our AFOSR project.
– Support composition, abstraction.
– Collecting ideas from many research papers.
Probabilistic I/O automata [Lynch, Segala, Vaandrager]:
– For modeling systems with random behavior.
– Composition, abstraction aspects still need development.
– Need to be combined with timed/hybrid models.

36 Conclusions
Three main building blocks (services and algorithms) for dynamic systems:
– Scalable Group Communication
– Dynamic Atomic Broadcast
– Reconfigurable Atomic Memory
Other auxiliary building blocks, e.g., Group Membership, Consensus with Unknown Participants, Reconfiguration service.
Much remains to be done, to produce a “complete” set of useful building blocks for dynamic systems, and a good theory for this area.


39 [Figure: GCS, Memb, Net]

40 [Figure: ABD, phases 1 and 2]

41 [Figure: Consensus interface with actions init(v), decide(v)]

