CSE 486/586 Distributed Systems Gossiping

Presentation transcript:

CSE 486/586 Distributed Systems Gossiping Steve Ko Computer Sciences and Engineering University at Buffalo

Revisiting Multicast: a node has a piece of information to be communicated to everyone in a distributed group of “nodes” (processes at Internet-based hosts).

Fault-Tolerance and Scalability: the multicast sender relies on a multicast protocol to reach possibly 1000’s of nodes. Nodes may crash and packets may be dropped.

B-Multicast: the simplest implementation, using UDP/TCP packets. Problems?

R-Multicast: stronger guarantees, but overhead is quadratic in N (UDP/TCP packets).

Any Other? E.g., tree-based multicast (e.g., IP multicast, SRM, RMTP, TRAM, TMTP), using UDP/TCP packets. Requires tree setup and maintenance. Problems?

Another Approach: periodically, the multicast sender transmits gossip messages (UDP) to b random targets.

Another Approach: other nodes do the same after receiving the multicast.

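“Gossip” (or “Epidemic”) Multicast: the protocol proceeds in rounds (by local clock). In each round, every infected node sends a gossip message (UDP) to b random targets. Nodes that have not yet received the multicast are uninfected.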
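The round structure described above can be sketched as a small simulation. This is a minimal sketch, not the course's implementation: the function name `simulate_push_gossip`, the use of synchronized rounds, and the choice to pick targets with replacement are all assumptions made for illustration.

```python
import random

def simulate_push_gossip(n, b, seed=0, max_rounds=1000):
    """Return the infected-node count after each round, starting with one sender."""
    rng = random.Random(seed)
    infected = {0}                      # node 0 plays the multicast sender
    history = [len(infected)]
    for _round in range(max_rounds):
        if len(infected) == n:
            break
        newly_infected = set()
        for node in infected:
            for _ in range(b):
                # Pick a uniformly random target other than `node` itself
                # (assumption: targets are drawn with replacement).
                target = rng.randrange(n - 1)
                if target >= node:
                    target += 1
                newly_infected.add(target)
        infected |= newly_infected
        history.append(len(infected))
    return history

if __name__ == "__main__":
    # Prints how many nodes hold the multicast after each round.
    print(simulate_push_gossip(n=1000, b=2))
```

Running it for a few values of n shows the infected count growing quickly at first and then slowing as the last few uninfected nodes are reached.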

CSE 486/586 Administrivia: PA2-B is due in ~2 weeks. PA1 grades are posted. PA2-A grading is in progress.

Properties: lightweight, quick spread, highly fault-tolerant.
Analysis comes from an old mathematical branch, epidemiology [Bailey 75].
Parameters c, b: c determines the number of rounds (c*log(n)); b is the number of nodes to contact per round. Both can be small numbers independent of n, e.g., c=2, b=2.
Within c*log(n) rounds [low latency], all but a small fraction of nodes receive the multicast [reliability], and each node has transmitted no more than c*b*log(n) gossip messages [lightweight].
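To make the bounds concrete, the arithmetic can be worked for a sample group size. This is a sketch under assumptions: the slide does not state the base of the logarithm, so the natural log is assumed here, and `gossip_bounds` is a hypothetical helper name. The n^2 figure is only an order-of-magnitude stand-in for R-multicast's quadratic overhead.

```python
import math

def gossip_bounds(n, c=2, b=2):
    """Return (rounds, messages per node, total messages) from the c, b bounds."""
    log_n = math.log(n)          # assumption: natural logarithm
    rounds = c * log_n           # latency bound: c*log(n) rounds
    per_node = c * b * log_n     # each node sends at most c*b*log(n) messages
    total = n * per_node         # upper bound on gossip messages overall
    return rounds, per_node, total

if __name__ == "__main__":
    n = 1000
    rounds, per_node, total = gossip_bounds(n)
    print(f"rounds <= {rounds:.1f}, messages per node <= {per_node:.1f}")
    print(f"total gossip messages <= {total:.0f} vs. n^2 = {n**2}")
```

For n = 1000 with c = 2 and b = 2, this gives roughly 13.8 rounds and 27.6 messages per node, far below a quadratic message count.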

Fault-Tolerance
Packet loss: with 50% packet loss, analyze with b replaced with b/2. To achieve the same reliability as with 0% packet loss, it takes twice as many rounds.
Node failure: with 50% of nodes failed, analyze with n replaced with n/2 and b replaced with b/2. Same as above.
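The effect of packet loss can be explored empirically. The sketch below is an illustration only: it assumes synchronized rounds, independent per-message loss, and hypothetical helper names (`rounds_to_reach_all`, `average_rounds`). It measures rounds until every node is reached, which is a different quantity from the slide's reliability bound, so the ratio it reports will not be exactly two.

```python
import random

def rounds_to_reach_all(n, b, loss=0.0, seed=0, max_rounds=10_000):
    """Count gossip rounds until all n nodes are infected, dropping messages with probability `loss`."""
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n and rounds < max_rounds:
        newly_infected = set()
        for node in infected:
            for _ in range(b):
                # Random target other than the sender itself.
                target = rng.randrange(n - 1)
                if target >= node:
                    target += 1
                if rng.random() >= loss:   # the message survives
                    newly_infected.add(target)
        infected |= newly_infected
        rounds += 1
    return rounds

def average_rounds(n, b, loss, trials=20):
    """Average the round count over several seeds to smooth out randomness."""
    return sum(rounds_to_reach_all(n, b, loss, seed=s) for s in range(trials)) / trials

if __name__ == "__main__":
    print("no loss :", average_rounds(200, 2, 0.0))
    print("50% loss:", average_rounds(200, 2, 0.5))
```

Halving b with no loss (for example, `average_rounds(200, 1, 0.0)`) gives a comparable slowdown, which is the intuition behind analyzing 50% loss with b replaced by b/2.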

Fault-Tolerance: with failures, is it possible that the epidemic might die out quickly? Possible, but improbable: once a few nodes are infected, with high probability the epidemic will not die out. So the analysis in the previous slides actually describes behavior with high probability [Galey and Dani 98].
The same applies to rumors, infectious diseases, and Internet worms.
Some implementations: Amazon Web Services EC2/S3 (rumored); Usenet NNTP (Network News Transfer Protocol).

Using Gossip for Failure Detection: Gossip-style Heartbeating. Using gossip to spread heartbeats gives better accuracy. All-to-all heartbeating: each process sends out heartbeats to every other process. Con: a slow process/link causes false positives.

Gossip-Style Failure Detection
Protocol: processes periodically gossip their membership list. On receipt, the local membership list is updated.
Figure: four processes (1 to 4) and three membership lists. Current time: 70 at process 2 (asynchronous clocks).

Address  Heartbeat Counter  Time (local)
1        10118              64
2        10110
3        10090              58
4        10111              65

Address  Heartbeat Counter  Time (local)
1        10120              66
2        10103              62
3        10098              63
4        10111              65

Address  Heartbeat Counter  Time (local)
1        10120              70
2        10110              64
3        10098
4        10111              65
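The list update on receipt can be sketched as a merge. The slide does not spell out the update rule, so this sketch assumes a common one: keep the higher heartbeat counter for each address and stamp any changed entry with the receiver's own local time. The function name `merge_membership` and the sample addresses and values are hypothetical and are not taken from the figure.

```python
def merge_membership(local, received, now):
    """Merge received heartbeat counters into the local membership list.

    local:    dict mapping address -> (heartbeat counter, local time of last increase)
    received: dict mapping address -> heartbeat counter from a gossip message
    now:      the receiver's local clock reading
    """
    for addr, heartbeat in received.items():
        known = local.get(addr)
        # Assumed rule: accept only strictly newer heartbeats, and record
        # the receiver's local time, since clocks are asynchronous.
        if known is None or heartbeat > known[0]:
            local[addr] = (heartbeat, now)
    return local

if __name__ == "__main__":
    local = {"A": (10, 50), "B": (7, 48)}
    received = {"A": 9, "B": 8, "C": 3}
    print(merge_membership(local, received, now=70))
```

In this example the entry for A is left alone because the received counter is older, B is refreshed, and C is added as a new member.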

Gossip-Style Failure Detection: if the heartbeat has not increased for more than Tfail seconds (according to local time), the member is considered failed. But don’t delete it right away: wait another Tcleanup seconds, then delete the member from the list.
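The two timeouts can be sketched as a periodic sweep over the membership list. This is a minimal sketch under assumptions: the function name `sweep`, the dictionary layout, and the sample values are hypothetical, and a real implementation would also need to run the sweep on a timer.

```python
def sweep(members, now, t_fail, t_cleanup):
    """Classify members by how long ago their heartbeat last increased.

    members: dict mapping address -> (heartbeat counter, local time of last increase)
    Returns the set of addresses currently considered failed; entries older
    than t_fail + t_cleanup are deleted from `members`.
    """
    failed = set()
    for addr, (_heartbeat, last_increase) in list(members.items()):
        age = now - last_increase
        if age > t_fail + t_cleanup:
            del members[addr]        # cleanup period is over: forget the member
        elif age > t_fail:
            failed.add(addr)         # considered failed, but still remembered
    return failed

if __name__ == "__main__":
    members = {"A": (10, 95), "B": (8, 70), "C": (3, 40)}
    print(sweep(members, now=100, t_fail=24, t_cleanup=24), members)
```

With these sample values, A is alive, B is marked failed but kept, and C is removed from the list.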

Gossip-Style Failure Detection: what if an entry pointing to a failed process is deleted right after Tfail seconds? Fix: remember it for another Tfail. Ignore gossips for failed members, and don’t include failed members in gossip messages.
Figure: four processes (1 to 4) and four membership lists. Current time: 75 at process 2.

Address  Heartbeat Counter  Time (local)
1        10120              66
2        10110              64
3        10098              75
4        10111              65

Address  Heartbeat Counter  Time (local)
1        10120              66
2        10110              64
4        10111              65

Address  Heartbeat Counter  Time (local)
1        10120              66
2        10110              64
3        10098              50
4        10111              65

Address  Heartbeat Counter  Time (local)
1        10120              66
2        10103              62
3        10098              55
4        10111              65
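The problem and the fix can be illustrated with two small helpers. This is a sketch under assumptions: `merge` and `payload` are hypothetical names, the merge rule (keep the higher heartbeat, stamp it with the receiver's local time) is assumed rather than taken from the slide, and the values are made up.

```python
def merge(local, received, now, failed=frozenset()):
    """Merge received heartbeat counters, ignoring members already marked failed."""
    for addr, heartbeat in received.items():
        if addr in failed:
            continue                  # ignore gossip about failed members
        known = local.get(addr)
        if known is None or heartbeat > known[0]:
            local[addr] = (heartbeat, now)
    return local

def payload(local, failed):
    """Build the gossip message, leaving out members marked failed."""
    return {addr: hb for addr, (hb, _time) in local.items() if addr not in failed}

if __name__ == "__main__":
    stale_gossip = {"C": 5}           # another process still lists failed member C

    # Deleting C right away: the stale gossip re-adds it with a fresh timestamp.
    naive = {"A": (10, 74)}
    print(merge(naive, stale_gossip, now=75))

    # Remembering C as failed: the stale gossip is ignored and C is not re-gossiped.
    careful = {"A": (10, 74), "C": (5, 50)}
    print(merge(careful, stale_gossip, now=75, failed={"C"}))
    print(payload(careful, failed={"C"}))
```

The first case shows why immediate deletion is risky: the failed member reappears looking freshly alive and can keep circulating among the remaining processes.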

Summary
Eager replication vs. lazy replication: lazy replication propagates updates in the background.
Gossiping: one strategy for lazy replication, with a high level of fault-tolerance and quick spread.
Another use case for gossiping: failure detection.

Acknowledgements These slides contain material developed and copyrighted by Indranil Gupta (UIUC).