Idit Keidar, MIT Lab for Computer Science, Theory of Distributed Systems Group. Paradigms for Building Distributed Systems: Performance Measurements and Conditional Analysis

Presentation transcript:

1 Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group Paradigms for Building Distributed Systems: Performance Measurements and Conditional Analysis

2 Outline Motivation: application domain Paradigms for building distributed applications Typical performance measurements and studies Conditional performance study Examples –Group membership –QoS-preserving totally ordered multicast –Dynamic voting

3 Modern Distributed Applications (in WANs) Highly available servers –Video-on-demand Collaborative computing –Shared white-board, shared editor, etc. –Military command and control –On-line strategy games

4 Important Issues in Building Distributed Applications Consistency of view –Same picture of game, same shared file Fault tolerance, high availability Performance –Conflicts with consistency? Scalability –Topology - WAN, long unpredictable delays –Number of participants

5 Generic Primitives - Middleware, “Building Blocks” E.g., total order, group communication Abstract away difficulties, e.g., –Total order - a basis for replication –Mask failures Important issues: –Well specified semantics - complete –Performance

6 Typical Performance Measurements Measure –“Average” latency –Throughput Run on idle machines, idle network,... –to get meaningful, consistent results –to get meaningful comparison among different algorithms

7 Other Interesting Questions When should we expect the system to behave as measured? How much does it degrade at other times? How fast does it converge to good behavior after a bad period? Complement the answers we get from measurements

8 Typical Performance Study “Expected” latency, throughput –Bundle up all cases? Assume some distribution (e.g. exponential) –Q: How sensitive is the analysis to this assumption? Q: How does this compose? (Figure labels: TO, NET, APP)

9 Conditional Analysis: Supplement to Measurements Guaranteed behavior under certain conditions on the environment –Compare with measurements at ideal times –Understand interesting issues from measurements Conditions are parameters –Understand how performance degrades Study how fast performance converges to good behavior after a bad period Wait before using probability –Composable! –Allows studying sensitivity to probability

10 Example 1: A Scalable Group Membership Algorithm for WANs Idit Keidar, Jeremy Sussman, Keith Marzullo, Danny Dolev, ICDCS 2000

11 Membership in WAN: the Challenge Message latency is large and unpredictable → Time-out failure detection is inaccurate → We use a notification service (NS) for WANs → Number of communication rounds matters → Algorithms may change views frequently → View changes require communication for state transfer, which is costly in WAN

12 Algorithm Novel Concepts Designed for WANs from the ground up Avoids delivery of “obsolete” views –Views that are known to be changing –Not always terminating (but NS is) –How could measurements / analysis capture this benefit? Runs in a single round “typically” (in-sync) –Three rounds in worst case (out-of-sync)

13 Measurements: End-to-end Latency: Scalable! Member scalability: 4 servers (constant) Server and member scalability: 4-14 servers

14 Interesting Questions (Future) How typical is the “typical case”? –Depends on NS Understanding costs over NS costs –Measurements show: when NS takes more time at some process, membership algorithm works in “pipeline” to save time (Figure: timeline with labels NS, memb, msg, end)

15 The QoS Challenge Some distributed applications require QoS –Guaranteed available bandwidth –Bounded delay, bounded jitter Membership algorithm terminates in one round under certain circumstances –Can we leverage that to guarantee QoS under certain assumptions? Can other primitives guarantee QoS?

16 “The requirements of resilience and scalability dictate that total consistency of view is not possible unless mechanisms requiring unacceptable delays are employed” Jon Crowcroft, Internetworking Multimedia, 1999

17 QoS Preserving Totally Ordered Multicast Ziv Bar-Joseph, Idit Keidar, Tal Anker, Nancy Lynch 2000

18 QoS Preserving Totally Ordered Multicast - Motivation Total order - building block for replication Applications: –On-line strategy games, shared text editing, etc. –Need predictable delays but also consistency –Fault tolerance Not always too costly!

19 The Model (VBR) Allows for some bursty traffic Slot size (symbol lost in extraction), per application –Tunable Message loss handled by FEC –Analysis due to [Bartal, Byers, Luby, Raz] Processes can fail, recover Clocks synchronized within a bound (symbol lost in extraction)

20 Algorithm Overview: Fault Free Case Deliver messages in each slot according to process identifier order and reported number of messages per slot Example: –Slot size is 100 milliseconds –a sends 5 in the slot –b sends 2 in the slot The order inside the slot is: a a a a a b b Send dummy in empty slots E.g., deliver: (dummy-from-a) b b
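The fault-free ordering rule on this slide can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide, not the authors' implementation: the function name `slot_delivery_order`, the use of a dictionary of reported counts, and the dummy label format are all assumptions made for the example.

```python
from typing import Dict, List


def slot_delivery_order(reported: Dict[str, int]) -> List[str]:
    """Return the delivery order for one slot (hypothetical helper).

    `reported` maps each process identifier to the number of messages
    that process reported sending in the slot. Processes are visited in
    identifier order; a process that sent nothing contributes one dummy.
    """
    order: List[str] = []
    for pid in sorted(reported):
        count = reported[pid]
        if count == 0:
            # Empty slot for this process: a single dummy stands in.
            order.append(f"(dummy-from-{pid})")
        else:
            order.extend([pid] * count)
    return order


# The two examples from the slide.
print(slot_delivery_order({"a": 5, "b": 2}))
print(slot_delivery_order({"a": 0, "b": 2}))
```

Because every process applies the same deterministic rule to the same reported counts, all processes arrive at the same order within a slot without exchanging extra ordering messages.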

21 Algorithm QoS Guarantees: Fault Free Case Maximum latency: a sum of three terms (symbols lost in extraction) Average rate: increased by at most 1/(slot size) –At most 1 dummy per slot –Only if sending rate drops below (threshold lost in extraction) Max burst: same as reserved by application –No dummy messages in full slots (Figure panels: latency, rate)
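As a rough worked example of the rate bound, assuming the lost symbol in "1/" is the slot size and that the bound applies per sending process: with at most one dummy per slot, the extra rate is at most one message per slot duration. The function name `max_dummy_rate` is hypothetical.

```python
def max_dummy_rate(slot_ms: int) -> float:
    """Upper bound on dummy messages per second for one sender.

    Assumes at most one dummy per slot, so the bound is one message
    per slot duration (slot size given in milliseconds).
    """
    if slot_ms <= 0:
        raise ValueError("slot size must be positive")
    return 1000 / slot_ms


# With the 100 ms slot used in the earlier example:
print(max_dummy_rate(100))  # 10.0 dummy messages per second at most
```

A larger slot lowers this overhead, which is one side of the trade-off behind making the slot size tunable per application.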

22 Lower Bound on Maximum Latency with Process Faults Reduce to Consensus (well-known) Consensus lower bound: –f+1 rounds for tolerating f stopping failures → Lower bound on latency: (f+1) times a term whose symbol was lost in extraction –linear!

23 Process Failures and Joins: Summary of Results Total order with gaps –Gaps correspond to faulty processes –Latency increases to a three-term sum with the middle term doubled (symbols lost in extraction) - constant! even when processes join or fail Reliable total order (work-in-progress) –Reason about QoS guarantees under certain assumptions on failure patterns (“clean” rounds)

24 Conclusions Totally ordered multicast and QoS can co-exist in certain network models –Important to understand model, failure patterns,... Next step: implementation –Applications: shared text editor, on-line game, –See if analyzed cases are the “right” ones A framework for analyzing QoS guarantees –Other examples will follow, e.g., other QoS parameters, other primitives

25 Availability Study of Dynamic Voting Algorithms Kyle Ingols and Idit Keidar 2000

26 Dynamic Voting - Defines Quorums Adaptively Each “primary” is a majority of the previous one but not of all the universe of processes Example: {1, 2, 3, 4, 5, 6, 7, 8, 9} {1, 2, 3, 4, 5} {2, 3, 4} {3, 4, 6, 7, 10, 11} Availability studied by stochastic analysis, simulations, empirical measurements,...
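The quorum rule in this example can be checked mechanically. The sketch below assumes that "a majority of the previous one" means the new primary contains more than half of the members of the previous primary; the helper names `is_majority_of` and `valid_primary_chain` are hypothetical and do not come from any of the algorithms studied.

```python
from typing import List, Set


def is_majority_of(candidate: Set[int], previous: Set[int]) -> bool:
    """True if `candidate` contains more than half of `previous`."""
    return 2 * len(candidate & previous) > len(previous)


def valid_primary_chain(primaries: List[Set[int]]) -> bool:
    """True if every primary holds a majority of its predecessor."""
    return all(
        is_majority_of(current, prev)
        for prev, current in zip(primaries, primaries[1:])
    )


# The sequence of primaries from the slide.
chain = [
    set(range(1, 10)),       # {1, ..., 9}
    {1, 2, 3, 4, 5},
    {2, 3, 4},
    {3, 4, 6, 7, 10, 11},
]

print(valid_primary_chain(chain))           # each step is a majority of the last
print(is_majority_of(chain[-1], chain[0]))  # but not of the original nine
```

The last primary shares only four members with the original nine-process set, so a static majority rule would reject it, while the step-by-step rule accepts it.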

27 Previous Studies Ignored…. The change from one “primary” to the next cannot be atomic in a distributed system What happens if a failure occurs while the change is not complete? –Some suggested algorithms were wrong –Correct algorithms differ in handling this How fast they recover How many processes need to reconnect to recover Can attempts to change primary be pipelined?

28 Our Study Simulations Multiple frequent connectivity changes –Then, stable period - see if primary exists Observations: –Algorithms differ greatly in availability –especially in their degradation Conclusion: analysis of any kind may fail to consider important cases...