1 Reliable Group Communication: a Mathematical Approach Nancy Lynch Theory of Distributed Systems MIT LCS Kansai chapter, IEEE July 7, 2000 GC …

Presentation transcript:

1 Reliable Group Communication: a Mathematical Approach Nancy Lynch Theory of Distributed Systems MIT LCS Kansai chapter, IEEE July 7, 2000 GC …

2 Dynamic Distributed Systems Modern distributed systems are dynamic. Set of clients participating in an application changes, because of: –Network, processor failure, recovery –Changing client requirements To cope with changes: –Use abstract groups of client processes with changing membership sets. –Processes communicate with group members by sending messages to the group as a whole.

3 Group Communication Services Support management of groups Maintain membership info Manage communication Make guarantees about ordering, reliability of message delivery, e.g.: –Best-effort: IP Multicast –Strong consistency guarantees: Isis, Transis, Ensemble Hide complexity of coping with changes

4 This Talk Describe –Group communication systems –A mathematical approach to designing, modeling, analyzing GC systems. –Our accomplishments and ideas for future work. Collaborators: Idit Keidar, Alan Fekete, Alex Shvartsman, Roger Khazan, Roberto De Prisco, Jason Hickey, Robbert van Renesse, Carl Livadas, Ziv Bar-Joseph, Kyle Ingols, Igor Tarashchanskiy

5 Talk Outline I. Background: Group Communication II. Our Approach III. Projects and Results 1. View Synchrony 2. Ensemble 3. Dynamic Views 4. Scalable Group Communication IV. Future Work V. Conclusions

6 I. Background: Group Communication

7 The Setting Dynamic distributed system, changing set of participating clients. Applications: –Replicated databases, file systems –Distributed interactive games –Multi-media conferencing, collaborative work –…

8 Groups Abstract, named groups of client processes, changing membership. Client processes send messages to the group (multicast). Early 80s: Group idea used in replicated data management system designs Late 80s: Separate group communication services.

9 Group Communication Service Communication middleware Manages group membership, current views View = membership set + identifier Manages multicast communication among group members –Multicasts respect views –Guarantees within each view: Reliability constraints Ordering constraints, e.g., FIFO from each sender, causal, common total order Global service

10 Group Communication Service [Figure: Client A and Client B connected to the GCS; interface actions: mcast, receive, new-view]

11 Isis [Birman, Joseph 87] Primary component group membership Several reliable multicast services, different ordering guarantees, e.g.: –Atomic Broadcast: Common total order, no gaps –Causal Broadcast. When partition is repaired, primary processes send state information to rejoining processes. Virtually Synchronous message delivery

12 Example: Interactive Game Alice, Bob, Carol, Dan in view {A,B,C,D} Primary component membership –{A}{B,C,D} split; only {B,C,D} may continue. Atomic Broadcast –A fires, B moves away; need consistent order

13 Interactive Game Causal Broadcast –C sees A enter a room; locks door. Virtual Synchrony –{A}{BCD} split; B sees A shoot; so do C, D.

14 Applications Replicated data management –State machine replication [Lamport 78], [Schneider 90] –Atomic Broadcast provides support –Same sequence of actions performed everywhere. –Example: Interactive game state machine. Stock market. Air-traffic control.

15 Transis [Amir, Dolev, Kramer, Malkhi 92] Partitionable group membership When components merge, processes exchange state information. Virtual synchrony reduces amount of data exchanged. Applications –Highly available servers –Collaborative computing, e.g. shared whiteboard –Video, audio conferences –Distributed jam sessions –Replicated data management [Keidar, Dolev 96]

16 Other Systems Totem [Amir, Melliar-Smith, Moser, et al., 95] –Transitional views, useful with virtual synchrony Horus [Birman, van Renesse, Maffeis 96] Ensemble [Birman, Hayden 97] –Layered architecture –Composable building blocks Phoenix, Consul, RMP, Newtop, RELACS,… Partitionable

17 Service Specifications Precise specifications needed for GC services –Help application programmers write programs that use the services correctly, effectively –Help system maintainers make changes correctly –Safety, performance, fault-tolerance But difficult: –Many different services; different guarantees about membership, reliability, ordering –Complicated –Specs based on implementations might not be optimal for application programmers.

18 Early Work on GC Service Specs [Ricciardi 92] [Jahanian, Fakhouri, Rajkumar 93] [Moser, Amir, Melliar-Smith, Agarwal 94] [Babaoglu et al. 95, 98] [Friedman, van Renesse 95] [Hiltunen, Schlichting 95] [Dolev, Malkhi, Strong 96] [Cristian 96] [Neiger 96] Impossibility results [Chandra, Hadzilacos, et al. 96] But still difficult…

19 II. Our Approach

20 Approach Model everything: –Applications: requirements, algorithms –Service specs: work backwards, see what the applications need –Implementations of the services. State, prove correctness theorems: –For applications, implementations. –Methods: Composition, invariants, simulation relations. Analyze performance, fault-tolerance. Layered proofs, analyses.

21 Math Foundation: I/O Automata Nondeterministic state machines Not necessarily finite-state Input/output/internal actions (signature) Transitions, executions, traces System modularity: –Composition, respecting traces –Levels of abstraction, respecting traces Language-independent, math model
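
The vocabulary on this slide (signature, transitions, executions, traces) can be made concrete with a small sketch. The `LossyChannel` automaton below is a hypothetical example, not one from the talk, and the helper names `run` and `trace` are likewise assumptions; the point is only to show how internal actions appear in an execution but are hidden from its trace.

```python
class LossyChannel:
    """Illustrative I/O automaton: a FIFO channel that may drop its oldest message."""

    inputs = {"send"}       # controlled by the environment
    outputs = {"recv"}      # controlled by the automaton, externally visible
    internals = {"drop"}    # controlled by the automaton, hidden from traces

    def __init__(self):
        self.buffer = []    # state: messages in transit, oldest first

    def enabled(self, name, arg=None):
        if name == "send":
            return True                          # input actions are always enabled
        if name == "recv":
            return bool(self.buffer) and self.buffer[0] == arg
        if name == "drop":
            return bool(self.buffer)
        return False

    def step(self, name, arg=None):
        if not self.enabled(name, arg):
            raise ValueError(f"{name}({arg}) is not enabled")
        if name == "send":
            self.buffer.append(arg)
        else:                                    # recv and drop both remove the head
            self.buffer.pop(0)


def run(automaton, actions):
    """Apply the actions in order; return them as the execution's action sequence."""
    for name, arg in actions:
        automaton.step(name, arg)
    return list(actions)


def trace(automaton, execution):
    """Keep only the external (input and output) actions of an execution."""
    external = automaton.inputs | automaton.outputs
    return [(name, arg) for name, arg in execution if name in external]


channel = LossyChannel()
execution = run(channel, [("send", 1), ("send", 2), ("drop", None), ("recv", 2)])
print(trace(channel, execution))  # [('send', 1), ('send', 2), ('recv', 2)]
```

Because the choice between `drop` and `recv` is left open, the automaton is nondeterministic in the sense of the slide: the model constrains which steps are allowed, not which one happens.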

22 Typical Examples Modeled Distributed algorithms Communication protocols Distributed data management systems

23 Modeling Style Describe interfaces, behavior Program-like behavior descriptions: –Precondition/effect style –Pseudocode or IOA language Abstract models for algorithms, services Model several levels of abstraction, –High-level, global service specs … –Detailed distributed algorithms

24 Modeling Style Very nondeterministic: –Constrain only what must be constrained. –Simpler –Allows alternative implementations

25 Describing Timing Features TIOAs [Lynch, Vaandrager 93] –For describing: Timeout-based algorithms. Clocks, clock synchronization Performance properties

26 Describing Failures Basic or timed I/O automata, with fail, recover input actions. Included in traces, can use them in specs.

27 Describing Other Features Probabilistic behavior: PIOAs [Segala 95] –For describing: Systems with combination of probabilistic + nondeterministic behavior Randomized distributed algorithms Probabilistic assumptions on environment Dynamic systems: DIOAs [Attie, Lynch 99] –For describing: Run-time process creation and destruction Mobility Agent systems [NTT collaboration]

28 Using I/O Automata (General) Specify systems precisely Validate designs: –Simulation –State, prove correctness theorems –Analyze performance Generate validated code Study theoretical upper and lower bounds

29 Using I/O Automata for Group Communication Systems Use for global services + distributed algorithms Define safety properties separately from performance/fault-tolerance properties. –Safety: Basic I/O automata; trace properties –Performance/fault-tolerance: Timed I/O automata with failure actions; timed trace properties

30 III. Projects and Results

31 Projects 1. View Synchrony 2. Ensemble 3. Dynamic Views 4. Scalable Group Communication

32 1. View Synchrony (VS) [Fekete, Lynch, Shvartsman 97, 00] Goals: Develop prototypes: –Specifications for typical GC services –Descriptions for typical GC algorithms –Correctness proofs –Performance analyses Design simple math foundation for the area. Try out, evaluate our approach.

33 View Synchrony What we did: Talked with system developers (Isis, Transis) Defined I/O automaton models for: –VS, prototype partitionable GC service –TO, non-view-oriented totally ordered bcast service –VStoTO, application algorithm based on [Amir, Dolev, Keidar, Melliar-Smith, Moser] Proved correctness Analyzed performance/ fault-tolerance.

34 VStoTO Architecture [Figure: VStoTO sits between the TO interface (bcast, brcv) and the VS service (gpsnd, gprcv, newview)]

35 TO Broadcast Specification
Delivers messages to everyone, in the same order.
Safety: TO-Machine
Signature:
  input: bcast(a,p)
  output: brcv(a,p,q)
  internal: to-order(a,p)
State:
  queue, sequence of (a,p), initially empty
  for each p:
    pending[p], sequence of a, initially empty
    next[p], positive integer, initially 1

36 TO-Machine
Transitions:
  bcast(a,p)
    Effect: append a to pending[p]
  to-order(a,p)
    Precondition: a is head of pending[p]
    Effect: remove head of pending[p]
            append (a,p) to queue
  brcv(a,p,q)
    Precondition: queue[next[q]] = (a,p)
    Effect: next[q] := next[q] + 1
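
The precondition/effect description above translates almost line for line into executable form. The following is a minimal sketch under a few assumptions: the class name `TOMachine`, the `*_enabled` helper methods, and the use of `ValueError` for a failed precondition are illustrative choices, and the slide's 1-based `queue[next[q]]` is mapped onto Python's 0-based lists.

```python
from collections import defaultdict


class TOMachine:
    """Sketch of the TO-Machine: one global queue plus per-process pending lists."""

    def __init__(self):
        self.queue = []                      # sequence of (a, p), initially empty
        self.pending = defaultdict(list)     # pending[p]: sequence of a, initially empty
        self.next = defaultdict(lambda: 1)   # next[p]: positive integer, initially 1

    def bcast(self, a, p):
        # Input action, always enabled. Effect: append a to pending[p].
        self.pending[p].append(a)

    def to_order_enabled(self, a, p):
        # Precondition: a is head of pending[p].
        return bool(self.pending[p]) and self.pending[p][0] == a

    def to_order(self, a, p):
        # Internal action: move the head of pending[p] to the end of the queue.
        if not self.to_order_enabled(a, p):
            raise ValueError("to-order precondition does not hold")
        self.pending[p].pop(0)
        self.queue.append((a, p))

    def brcv_enabled(self, a, p, q):
        # Precondition: queue[next[q]] = (a, p), with 1-based indexing as on the slide.
        i = self.next[q]
        return i <= len(self.queue) and self.queue[i - 1] == (a, p)

    def brcv(self, a, p, q):
        # Output action: deliver the next queued message to q.
        if not self.brcv_enabled(a, p, q):
            raise ValueError("brcv precondition does not hold")
        self.next[q] += 1


# Two senders broadcast concurrently; the machine picks one order for everyone.
m = TOMachine()
m.bcast("x", "p1")
m.bcast("y", "p2")
m.to_order("y", "p2")
m.to_order("x", "p1")
for q in ("p1", "p2"):
    m.brcv("y", "p2", q)
    m.brcv("x", "p1", q)
```

The order in which `to_order` steps fire is not fixed by the specification; any interleaving is allowed, but once chosen, every receiver sees the same queue with no gaps.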

37 Performance/Fault-Tolerance TO-Property(b,d,C): If C stabilizes, then soon thereafter (time b), any message sent or received anywhere in C is received everywhere in C, within bounded time (time d). [Figure: timeline showing stabilize, interval b, send, interval d, receive]

38 VS Specification Partitionable view-oriented service Safety: VS-Machine –Views presented in consistent order, possible gaps –Messages respect views –Messages in consistent order –Causality –Prefix property –Safe indication Doesn’t guarantee Virtual Synchrony. Like TO-Machine, but per view.

39 Performance/Fault-Tolerance VS-Property(b,d,C): If C stabilizes, then soon thereafter (time b), views known within C become consistent, and messages sent in the final view v are delivered everywhere in C, within bounded time (time d). [Figure: timeline showing stabilize, interval b, newview(v), mcast(v), interval d, receive(v)]

40 VStoTO Algorithm TO must deliver messages in order, no gaps. VS delivers messages in order per view. Problems arise from view changes: –Processes moving between views could have different prefixes. –Processes could skip views. Algorithm: –Real work done in majority views only –Processes in majority views totally order messages, and deliver to clients messages that VS has said are safe. –At start of new view, processes exchange state, to reconcile progress made in different majority views.

41 Correctness (Safety) Proof Show composition of VS-Machine and VStoTO machines implements TO-Machine. Trace inclusion. Use simulation relation proof: –Relate start states, steps of composition to those of TO-Machine –Invariants, e.g.: Once a message is ordered everywhere in some majority view, its order is determined forever. Checked using PVS theorem-prover, TAME [Archer]

42 Conditional Performance Analysis Assume VS satisfies VS-Property(b,d,C): –If C stabilizes, then within time b, views known within C become consistent, and messages sent in the final view are delivered everywhere in C, within time d. And VStoTO satisfies: –Simple timing and fault-tolerance assumptions. Then TO satisfies TO-Property(b+d,d,C): –If C stabilizes, then within time b+d, any message sent or delivered anywhere in C is delivered everywhere in C, within time d.
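
The conditional claim on this slide can be restated as a single implication. The notation is a paraphrase rather than a formula from the talk, and the symbol A_VStoTO is a hypothetical name standing for the "simple timing and fault-tolerance assumptions" on VStoTO.

```latex
\[
\text{VS-Property}(b,\, d,\, C) \;\wedge\; A_{\mathrm{VStoTO}}
\;\Longrightarrow\;
\text{TO-Property}(b + d,\; d,\; C)
\]
```

As a worked instance with values chosen purely for illustration, take b = 2 s and d = 0.5 s. The theorem then gives a TO stabilization bound of b + d = 2.5 s after C stabilizes, and a delivery bound of d = 0.5 s from then on. The delivery bound is inherited unchanged from VS; only the stabilization interval grows.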

43 Conclusions: VS Models for VS, TO, VStoTO Proofs, performance/f-t analyses Tractable, understandable, modular [PODC 97], [TOCS 00] Follow-on work: –Algorithm for VS [Fekete, Lesley] –Load balancing using VS [Khazan] –Models for other Transis algorithms [Chockler] But: VS is only a prototype; lacks some key features, like Virtual Synchrony Next: Try a real system!

44 2. Ensemble [Hickey, Lynch, van Renesse 99] Goals: Try, evaluate our approach on a real system Develop techniques for modeling, verifying, analyzing more features, of GC systems, including Virtual Synchrony Improve on prior methods for system validation

45 Ensemble Ensemble system [Birman, Hayden 97] –Virtual Synchrony –Layered design, building blocks –Coded in ML [Hayden] Prior verification work for Ensemble and predecessors: –Proving local properties using Nuprl [Hickey] –[Ricciardi], [Friedman]

46 Ensemble What we did: –Worked with developers –Followed VS example –Developed global specs for key layers: Virtual Synchrony Total Order with Virtual Synchrony –Modeled Ensemble algorithm spanning between layers –Attempted proof; found logical error in state exchange algorithm (repaired) –Developed models, proofs for repaired system

47 Conclusions: Ensemble Models for two layers, algorithm Tractable, easily understandable by developers Error, proofs Low-level models similar to actual ML code (4 to 1) [TACAS 99] Follow-on: –Same error found in Horus. –Incremental models, proofs [Hickey] Next: Use our approach to design new services.

48 3. Dynamic Views [De Prisco, Fekete, Lynch, Shvartsman 98] Goals: Define GC services that cope with both: –Long-term changes: Permanent failures, new joins Changes in the “universe” of processes –Transient changes Use these to design consistent total order and consistent replicated data algorithms that tolerate both long-term and transient changes.

49 Dynamic Views Many applications with strong consistency requirements make progress only in primary views: –Consistent replicated data management –Totally ordered broadcast Can use static notion of allowable primaries, e.g., majorities of universe, quorums –All intersect. –Only one exists at a time. –Information can flow from each to the next. But: Static notion not good for long-term changes
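
The "all intersect" property of a static majority rule can be checked by brute force on a small universe. This is a sketch under the assumption that an allowable primary is any strict majority of a fixed universe of five processes; the function names are illustrative.

```python
from itertools import combinations


def majorities(universe):
    """All subsets containing more than half of the universe."""
    members = sorted(universe)
    need = len(members) // 2 + 1
    return [
        frozenset(c)
        for k in range(need, len(members) + 1)
        for c in combinations(members, k)
    ]


def all_intersect(sets):
    """True if every pair of sets shares at least one member."""
    return all(a & b for a, b in combinations(sets, 2))


universe = set("ABCDE")
primaries = majorities(universe)
print(len(primaries), all_intersect(primaries))   # 16 True

# For contrast, two-member groups are not safe as primaries: {A,B} and {C,D} are disjoint.
pairs = [frozenset(c) for c in combinations(sorted(universe), 2)]
print(all_intersect(pairs))                       # False
```

Pairwise intersection is what lets information flow from each primary to the next: at least one process in any new primary was also in the old one.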

50 Dynamic Views For long-term changes, want dynamic notion of allowable primaries. E.g., each primary might contain majority of previous. But: Some might not intersect. Makes it hard to maintain consistency. [Figure: process groups F, ABC, DE]

51 Dynamic Views Key problem: –Processes may have different opinions about which is the previous primary –Could be disjoint. [Yeger-Lotem, Keidar, Dolev 97] algorithm –Keeps track of all possible previous primaries. –Ensures intersection with all of them.
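
A small scenario makes the key problem concrete. Everything here is an illustrative assumption rather than the algorithm from [Yeger-Lotem, Keidar, Dolev 97]: the views `v0` and `v1`, the story that C left before learning `v1` was installed, and in particular the choice to enforce "intersection with all possible previous primaries" by requiring a majority of each one.

```python
def is_majority_of(candidate, previous):
    """True if candidate contains more than half of the members of previous."""
    return 2 * len(candidate & previous) > len(previous)


def naive_rule(candidate, believed_previous):
    """Accept if the candidate holds a majority of the one primary it believes is latest."""
    return is_majority_of(candidate, believed_previous)


def track_all_rule(candidate, possible_previous):
    """Accept only if the candidate holds a majority of every possible previous primary."""
    return all(is_majority_of(candidate, prev) for prev in possible_previous)


v0 = frozenset("ABCDE")   # initial primary
v1 = frozenset("ABC")     # installed by A and B; C detached before learning of it

left = frozenset("AB")    # A and B know v1 was installed
right = frozenset("CDE")  # C, D, E believe v0 is still the latest primary

# Under the naive rule both sides form a primary, and the two are disjoint.
naive_left = naive_rule(left, v1)
naive_right = naive_rule(right, v0)
disjoint = not (left & right)

# If C remembers that v1 was attempted, the right side must check against it too.
checked_left = track_all_rule(left, [v1])
checked_right = track_all_rule(right, [v0, v1])

print(naive_left, naive_right, disjoint)   # True True True
print(checked_left, checked_right)         # True False
```

Tracking every possible previous primary costs some availability, since {C,D,E} is now blocked, but it rules out two concurrent primaries with no common member.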

52 Dynamic Views What we did: Defined Dynamic View Service, DVS, based on [YKD] Designed to tolerate long-term failures Membership: –Views delivered in consistent order, possible gaps. –Ensures new primary intersects all possible previous primaries. Communication: –Similar to VS –Messages delivered within views, –Prefix property, safe notifications.

53 Dynamic Views What we did, cont’d –Modeled, proved implementing algorithm –Modeled, proved TO-Broadcast application –Distributed implementation [Ingols 00]

54 Handling Transient Failures: Dynamic Configurations Configuration = Set of processes plus structure, e.g., set of quorums, leader,… Application: Highly available consistent replicated data management: –Paxos [Lamport], uses leader, quorums –[Attiya, Bar-Noy, Dolev], uses read quorums and write quorums Quorums allow flexibility, availability in the face of transient failures.

55 Dynamic Configurations [De Prisco, Fekete, Lynch, Shvartsman 99, 00] Combine ideas/benefits of –Dynamic views, for long-term failures, and –Static configurations, for transient failures Idea: –Allow configuration to change (reconfiguration). –Each configuration satisfies intersection properties with respect to previous configuration Example: –Config = (membership set, read quorums, write quorums) –Membership set of new configuration contains read quorum and write quorum of previous configuration
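
The example rule on this slide is easy to express as a check. The sketch below assumes a configuration is a triple of membership set, read quorums, and write quorums, as in the slide's example; the `Config` class, the `may_follow` function, and the concrete quorum sets are hypothetical and chosen only so that every read quorum intersects every write quorum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    members: frozenset
    read_quorums: tuple    # tuple of frozensets
    write_quorums: tuple   # tuple of frozensets


def fs(names):
    """Shorthand: build a frozenset of single-letter process names."""
    return frozenset(names)


def may_follow(new, prev):
    """New membership must contain some read quorum and some write quorum of prev."""
    has_read = any(q <= new.members for q in prev.read_quorums)
    has_write = any(q <= new.members for q in prev.write_quorums)
    return has_read and has_write


prev = Config(fs("ABCD"), (fs("AB"), fs("CD")), (fs("AC"), fs("BD")))

# D leaves and E joins: {A,B} and {A,C} both survive, so the change is allowed.
good = Config(fs("ABCE"), (fs("AB"), fs("CE")), (fs("AC"), fs("BE")))

# C and D both leave: no write quorum of prev fits inside {A,B,E}.
bad = Config(fs("ABE"), (fs("AB"),), (fs("AE"), fs("BE")))

print(may_follow(good, prev), may_follow(bad, prev))   # True False
```

Containing a read quorum lets the new configuration learn the latest value written in the old one; containing a write quorum prevents the old configuration from completing writes the new one cannot see.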

56 Dynamic Configurations What we did: Defined dynamic configuration service DCS, guaranteeing intersection properties w.r.t. all possible previous configurations. Designed implementing algorithm, extending [YKD] Developed application: Replicated data –Dynamic version of [Paxos] –Dynamic version of [Attiya, Bar-Noy, Dolev] –Tolerates: transient failures, using quorums; longer-term failures, using reconfiguration

57 Conclusions: Dynamic Views New DVS, DC services for long-term changes in set of processes Applications, implementations Decomposed complex algorithms into tractable pieces: –Service specification, implementation, application –Static algorithm vs. reconfiguration Couldn’t have done it without the formal framework. [PODC 98], [DISC 99]

58 4. Scalable Group Communication [Keidar, Khazan 99], [K,K, L, Shvartsman 00] Goal: Make GC work in wide area networks What we did: Defined desired properties for GC services Defined spec for scalable group membership service [Keidar, Sussman, Marzullo, Dolev 00], implemented on small set of membership servers

59 Scalable Group Communication What we did, cont’d: Developed new, scalable GC algorithms: –Use scalable GM service –Multicast implemented on clients –Efficient: Algorithm for virtual synchrony uses only one round for state exchange, in parallel with membership service’s agreement on views. –Processes can join during reconfiguration. Distributed implementation [Tarashchanskiy]

60 Scalable GC What we did, cont’d: Developed new incremental modeling, proof methods [Keidar, Khazan, Lynch, Shvartsman 00] –Proof Extension Theorem Developed models, proofs (safety and liveness), using the new methods. [Figure labels: S, A, A’, S’]

61 Conclusions: Scalable GC Specs, new algorithms, proofs New incremental proof methods Couldn’t have done it without the formal framework. [ICDCS 99], [ICSE 00]

62 IV. Future Work

63 Future Work Model, analyze GC services, applications Design new GC services Catalog Compare, evaluate GC services Math foundations Theory → Practice

64 Practical GC Systems: Current Status [Birman 99] Commercial successes: –Stock exchange (Zurich, New York) –Air-traffic control (France) Problems: –Performance, for strong guarantees like Virtual Synchrony –Not integrated with object-oriented programming technologies. Trends: –Flexible services –Weaker guarantees; better performance –Integration with OO technologies, allowing programmers to make tradeoffs.

65 1. Model, Analyze GC Services Analyze performance of our new services: Dynamic views, Scalable GC –Implementations –Applications: Replicated data, games, … –Compare predicted, observed performance. Other existing services

66 2. Design New Services Total Order + QoS [Bar-Joseph, Keidar, Anker, L.] Specs for: –Bandwidth reservation service –TO Multicast service with QoS (latency, bandwidth) Algorithms implementing TO-QoS using reservation service: Algorithm 1: Allows gaps, simple, small added latency Algorithm 2: No gaps, more complex, more latency Basic services: Consensus, resource allocation, leader election, spanning trees, overlay networks

67 3. Catalog of GC Services Service specs Property specs [Chockler, Keidar, Vitenberg 99] Implementing algorithms Prototype applications Lower bounds, impossibility results

68 4. Compare, Evaluate GC Services Study tradeoffs between strength of ordering and reliability guarantees vs. performance Compare GC services with other reliable multicast algorithms: –Scalable Reliable Multicast [Floyd, Jacobson, et al. 95]: Unreliable GC (IP Multicast) + retransmission protocol –Bimodal Multicast [Birman, Hayden, et al. 99]

69 5. Math Foundations Models: –Timing models For timing assumptions, guarantees, QoS For conditional performance analysis –Failure models, probabilistic models, process creation models… –Combined models Proof methods: –Incremental modeling, proof –Conditional performance analysis

70 Conditional Performance Analysis Idea: –Make conditional claims about system behavior, under various assumptions about behavior of environment, network. –Include timing, performance, failures. Benefits: –Formal performance predictions –Says when system makes specific guarantees Normal case + failure cases Parameters, sensitivity analysis –Composable –Get probabilistic claims as corollaries

71 CP Analysis: Typical Hypotheses Stabilization of underlying network. Limited rate of change. Bounds on message delay. Limited amount of failure (number, density). Limit input arrivals (number, density). Method allows focus on tractable cases.

72 Example: Reliable Multicast [Livadas, Keidar, Lynch] Specs for IP Mcast, Reliable Mcast services Automaton model for Scalable Reliable Mcast (SRM) protocol [Floyd, Jacobson, et al. 95] Example: –Assume bounds on IP-level message loss, processor failures –Prove bounds on: Time from client send until all non-failed clients receive. Amount of traffic generated.

73 SRM Architecture [Figure: SRM and IPMcast components]

74 6. Theory → Practice IOA language, tool support for GC services, algorithms Incremental development methods for algorithms, service specs, proofs, analyses Methods for integrating group communication services with object-oriented programming technologies

75 V. Conclusions

76 Summary GC services help in programming dynamic distributed systems, though scalability, integration problems remain. Our contributions: –Modeling style: Automata + performance properties –Techniques: Conditional performance analysis, incremental modeling/proof –Models, proofs for key services –Discovered errors –New services: Dynamic views, scalable GC Mathematical framework makes it possible to design more complex systems correctly.

77 Future Work 1. Model, analyze GC services, applications 2. Design new services 3. Catalog 4. Compare, evaluate services 5. Math foundations 6. Theory → Practice

78 Thank you!