1 Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group Paradigms for Building Distributed Systems: Performance Measurements and Conditional Analysis
2 Outline Motivation: application domain Paradigms for building distributed applications Typical performance measurements and studies Conditional performance study Examples –Group membership –QoS-preserving totally ordered multicast –Dynamic voting
3 Modern Distributed Applications (in WANs) Highly available servers –Video-on-demand Collaborative computing –Shared white-board, shared editor, etc. –Military command and control –On-line strategy games
4 Important Issues in Building Distributed Applications Consistency of view –Same picture of game, same shared file Fault tolerance, high availability Performance –Conflicts with consistency? Scalability –Topology - WAN, long unpredictable delays –Number of participants
5 Generic Primitives - Middleware, “Building Blocks” E.g., total order, group communication Abstract away difficulties, e.g., –Total order - a basis for replication –Mask failures Important issues: –Well specified semantics - complete –Performance
6 Typical Performance Measurements Measure –“Average” latency –Throughput Run on idle machines, idle network,... –to get meaningful, consistent results –to get meaningful comparison among different algorithms
7 Other Interesting Questions When should we expect the system to behave as measured? How much does it degrade at other times? How fast does it converge to good behavior after a bad period? Complement the answers we get from measurements
8 Typical Performance Study “Expected” latency, throughput –Bundle up all cases? Assume some distribution (e.g. exponential) –Q: How sensitive is the analysis to this assumption? Q: How does this compose? [Figure labels: TO, NET, APP]
9 Conditional Analysis: Supplement to Measurements Guaranteed behavior under certain conditions on the environment –Compare with measurements at ideal times –Understand interesting issues from measurements Conditions are parameters –Understand how performance degrades Study how fast performance converges to good behavior after a bad period Wait before using probability –Composable! –Allows studying sensitivity to probability
10 Example 1: A Scalable Group Membership Algorithm for WANs Idit Keidar, Jeremy Sussman Keith Marzullo, Danny Dolev ICDCS 2000
11 Membership in WAN: the Challenge Message latency is large and unpredictable → Time-out failure detection is inaccurate → We use a notification service (NS) for WANs → Number of communication rounds matters → Algorithms may change views frequently → View changes require communication for state transfer, which is costly in WAN
12 Algorithm Novel Concepts Designed for WANs from the ground up Avoids delivery of “obsolete” views –Views that are known to be changing –Not always terminating (but NS is) –How could measurements / analysis capture this benefit? Runs in a single round “typically” (in-sync) –Three rounds in worst case (out-of-sync)
13 Measurements: End-to-end Latency: Scalable! Member scalability: 4 servers (constant) Server and member scalability: 4-14 servers
14 Interesting Questions (Future) How typical is the “typical case”? –Depends on NS Understanding costs over NS costs –Measurements show: when NS takes more time at some process, membership algorithm works in “pipeline” to save time [Figure: timeline; labels: time, end, NS, memb msg]
15 The QoS Challenge Some distributed applications require QoS –Guaranteed available bandwidth –Bounded delay, bounded jitter Membership algorithm terminates in one round under certain circumstances –Can we leverage that to guarantee QoS under certain assumptions? Can other primitives guarantee QoS?
16 “The requirements of resilience and scalability dictate that total consistency of view is not possible unless mechanisms requiring unacceptable delays are employed” Jon Crowcroft, Internetworking Multimedia, 1999
17 QoS Preserving Totally Ordered Multicast Ziv Bar-Joseph, Idit Keidar, Tal Anker, Nancy Lynch 2000
18 QoS Preserving Totally Ordered Multicast - Motivation Total order - building block for replication Applications: –On-line strategy games, shared text editing, etc. –Need predictable delays but also consistency –Fault tolerance Not always too costly!
19 The Model (VBR) Allows for some bursty traffic Slot size , per application –Tunable Message loss handled by FEC –Analysis due to [ Bartal, Byers, Luby, Raz ] Processes can fail, recover Clocks synchronized within
20 Algorithm Overview: Fault Free Case Deliver messages in each slot according to process identifier order and reported number of messages per slot Example: –Slot size is 100 milliseconds –a sends 5 in the slot –b sends 2 in the slot The order inside the slot is: a a a a a b b Send dummy in empty slots E.g., deliver: (dummy-from-a) b b
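The fault-free ordering rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `order_slot` is hypothetical, process identifiers are assumed to be strings compared lexicographically, and every process is assumed to already know how many messages each sender reported for the slot.

```python
from typing import Dict, List


def order_slot(counts: Dict[str, int]) -> List[str]:
    """Return the delivery order for one slot.

    counts maps each process identifier to the number of messages it
    reported sending in this slot. A process that sent nothing is
    represented by one dummy message, as in the slide's example.
    """
    order: List[str] = []
    for pid in sorted(counts):  # process identifier order (assumed lexicographic)
        n = counts[pid]
        if n == 0:
            order.append(f"dummy-from-{pid}")  # empty slot: one dummy
        else:
            order.extend([pid] * n)  # n messages from this sender, in a row
    return order


print(order_slot({"a": 5, "b": 2}))
print(order_slot({"a": 0, "b": 2}))
```

The first call reproduces the slide's order a a a a a b b; the second reproduces (dummy-from-a) b b. Because the order depends only on identifiers and reported counts, every process that sees the same counts computes the same order without extra communication.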
21 Algorithm QoS Guarantees: Fault Free Case Maximum latency: + + Average rate: increased by at most 1/ –At most 1 dummy per slot –Only if sending rate drops below Max burst: same as reserved by application –No dummy messages in full slots [Figure labels: latency, rate]
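The rate overhead can be made concrete with a small calculation. The symbols on this slide were lost in extraction, so the sketch below assumes that the quantity after "1/" is the slot size, which is what "at most 1 dummy per slot" suggests. The function name `max_dummy_rate` is hypothetical.

```python
def max_dummy_rate(slot_seconds: float) -> float:
    """Upper bound on extra messages per second that one sender adds.

    Assumption: a sender emits at most one dummy per slot, so the
    overhead is at most 1 / (slot size) messages per second.
    """
    if slot_seconds <= 0:
        raise ValueError("slot size must be positive")
    return 1.0 / slot_seconds


# With the 100 ms slot from the earlier example:
print(max_dummy_rate(0.1))  # 10.0
```

Under this assumption, a larger slot lowers the worst-case dummy overhead, but each message may wait longer for its slot to close before it can be ordered.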
22 Lower Bound on Maximum Latency with Process Faults Reduce to Consensus (well-known) Consensus lower bound: –f +1 rounds for tolerating f stopping failures Lower bound on latency: (f +1) –linear!
23 Process Failures and Joins: Summary of Results Total order with gaps –Gaps correspond to faulty processes –Latency increases to: +2 + - constant! even when processes join or fail Reliable total order (work-in-progress) –Reason about QoS guarantees under certain assumptions on failure patterns (“clean” rounds)
24 Conclusions Totally ordered multicast and QoS can co-exist in certain network models –Important to understand model, failure patterns,... Next step: implementation –Applications: shared text editor, on-line game, –See if analyzed cases are the “right” ones A framework for analyzing QoS guarantees –Other examples will follow, e.g., other QoS parameters, other primitives
25 Availability Study of Dynamic Voting Algorithms Kyle Ingols and Idit Keidar 2000
26 Dynamic Voting - Defines Quorums Adaptively Each “primary” is a majority of the previous one but not necessarily of the entire universe of processes Example: {1, 2, 3, 4, 5, 6, 7, 8, 9} → {1, 2, 3, 4, 5} → {2, 3, 4} → {3, 4, 6, 7, 10, 11} Availability studied by stochastic analysis, simulations, empirical measurements,...
27 Previous Studies Ignored… The change from one “primary” to the next cannot be atomic in a distributed system What happens if a failure occurs while the change is not complete? –Some suggested algorithms were wrong –Correct algorithms differ in handling this: how fast they recover; how many processes need to reconnect to recover; whether attempts to change the primary can be pipelined
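The quorum rule in the example can be checked mechanically. The sketch below is illustrative only: `is_majority_of` and `valid_primary_chain` are hypothetical helper names, and "majority" is assumed to mean that the new primary contains strictly more than half of the previous primary's members.

```python
from typing import List, Set


def is_majority_of(candidate: Set[int], previous: Set[int]) -> bool:
    """True if candidate contains more than half of previous's members."""
    return 2 * len(candidate & previous) > len(previous)


def valid_primary_chain(chain: List[Set[int]]) -> bool:
    """True if every primary contains a majority of the one before it."""
    return all(is_majority_of(nxt, prev) for prev, nxt in zip(chain, chain[1:]))


universe = set(range(1, 10))  # {1, ..., 9}
chain = [universe, {1, 2, 3, 4, 5}, {2, 3, 4}, {3, 4, 6, 7, 10, 11}]

print(valid_primary_chain(chain))                       # True
print(is_majority_of({3, 4, 6, 7, 10, 11}, universe))   # False
```

The last primary holds only 4 of the original 9 processes, so it would not be a quorum under static majority voting, yet it is legal under the dynamic rule. The sketch checks the quorum rule only and treats each change of primary as atomic.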
28 Our Study Simulations Multiple frequent connectivity changes –Then, stable period - see if primary exists Observations: –Algorithms differ greatly in availability –especially in their degradation Conclusion: analysis of any kind may fail to consider important cases...