Gossip Algorithms and Implementing a Cluster/Grid Information Service
MsSys Course
Amar Lior and Barak Amnon
2 Agenda
A short introduction to gossip algorithms
Cluster/Grid information service requirements
– How good is old information?
The distributed bulletin board model
Implementation
3 A Problem
In an n-node system, assume that every pair of nodes can communicate directly. Node i wishes to send a message (rumor, color) to all other nodes.
Possible deterministic solutions
– Broadcast (only in a broadcast medium)
– Define a static tree between the nodes and send the message along the edges of this tree
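A minimal sketch of the static-tree solution follows. It assumes a binary tree in heap layout rooted at node 0, which is a hypothetical choice because the slide does not fix the tree shape. The message moves one tree level per round. The sketch also shows the weakness that gossip addresses: a single failed internal node silences its whole subtree.

```python
def tree_broadcast(n, failed=frozenset()):
    """Deterministic dissemination from node 0 along a static binary tree.

    Nodes 0..n-1 use a heap layout (a hypothetical choice of tree): node k
    forwards the message to its children 2k+1 and 2k+2, one tree level per
    round. A failed node never receives the message, so its whole subtree
    stays uninformed.

    Returns (set of informed nodes, number of rounds used).
    """
    informed = {0}
    frontier = [0]
    rounds = 0
    while frontier:
        next_frontier = []
        for k in frontier:
            for child in (2 * k + 1, 2 * k + 2):
                if child < n and child not in failed:
                    informed.add(child)
                    next_frontier.append(child)
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return informed, rounds


informed, rounds = tree_broadcast(15)
print(len(informed), "nodes informed in", rounds, "rounds")    # prints: 15 nodes informed in 3 rounds
informed, _ = tree_broadcast(15, failed={1})
print(len(informed), "nodes informed when node 1 has failed")  # prints: 8 nodes informed when node 1 has failed
```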
4 A Gossip-Style Solution
Starting with the round in which a rumor is generated, each node that holds the rumor selects another node independently and uniformly at random and sends the rumor to this node.
The distribution of the rumor terminates after a fixed number of O(ln n) rounds.
At this point all players are informed with high probability.
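The synchronous push rule above can be sketched as follows. The constant in the O(ln n) round budget (4 here) is an assumption chosen for illustration. Nodes informed during a round start forwarding in the next round.

```python
import math
import random


def push_gossip(n, rounds, source=0, rng=None):
    """Uniform push gossip, synchronous rounds.

    In every round, each node that already holds the rumor picks one other
    node independently and uniformly at random and sends it the rumor.
    Nodes informed during a round start forwarding in the next round.
    Returns the set of informed nodes after `rounds` rounds.
    """
    rng = rng or random.Random()
    informed = {source}
    for _ in range(rounds):
        newly_informed = set()
        for node in informed:
            # Draw from the n-1 other nodes: map 0..n-2 onto ids != node.
            target = rng.randrange(n - 1)
            if target >= node:
                target += 1
            newly_informed.add(target)
        informed |= newly_informed
    return informed


n = 1000
rounds = math.ceil(4 * math.log(n))   # O(ln n) rounds; the constant 4 is an assumption
informed = push_gossip(n, rounds, rng=random.Random(42))
print(f"{len(informed)} of {n} nodes informed after {rounds} rounds")
```

The informed set can at most double per round, which is why about log2 n rounds are needed before the rumor reaches most nodes.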
5-9 Uniform Gossip Example (figure: the rumor spreading over time steps t = 1 to 5)
10 Gossip benefits
Robustness to node failures
– Messages continue to propagate thanks to the random selection of destinations
– The failure of F nodes results in only O(F) uninformed players
Simplicity
– All nodes run the same algorithm
Scalability
– The number of messages each node sends (and possibly receives) each round is fixed
11 Gossip taxonomy
Other names
– Epidemic algorithms (Demers et al.)
– Randomized communication (Karp et al.)
Propagation can be done by
– Push: the node sends the information to the selected node
– Pull: the other way around
– Push&Pull: both ways
We distinguish between 2 conceptual layers
– A basic gossip algorithm
» by which nodes choose other nodes for communication
– A gossip-based protocol
» built on top of a gossip algorithm
» determines the content of the messages that are sent
» determines the way received messages cause nodes to update their internal state
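The three propagation styles can be compared in one small sketch. The partner choice is the gossip-algorithm layer. What is exchanged, here a single rumor, is a deliberately trivial protocol layer. The helper names are hypothetical. Every caller draws exactly one partner per round, so for the same seed the three modes see identical partner choices.

```python
import random


def gossip_round(informed, n, mode, rng):
    """One synchronous round of uniform gossip; returns the new informed set.

    Gossip algorithm: every node picks one partner uniformly at random.
    Protocol (a single rumor):
      push:      an informed caller sends the rumor to its partner
      pull:      a caller learns the rumor if its partner is informed
      push-pull: both exchanges happen on every call
    """
    updated = set(informed)
    for caller in range(n):
        partner = rng.randrange(n - 1)
        if partner >= caller:           # skip self
            partner += 1
        if mode in ("push", "push-pull") and caller in informed:
            updated.add(partner)
        if mode in ("pull", "push-pull") and partner in informed:
            updated.add(caller)
    return updated


def rounds_to_inform_all(n, mode, seed=0):
    """Rounds until every node holds the rumor started at node 0."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        informed = gossip_round(informed, n, mode, rng)
        rounds += 1
    return rounds


for mode in ("push", "pull", "push-pull"):
    print(mode, rounds_to_inform_all(1024, mode, seed=1))
```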
12 Rumor spreading bounds
From a single node to all
Time complexity:
Message complexity (Karp et al.): lower bound on the number of messages:
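For orientation, here are two widely cited results, stated from the literature and not necessarily in the form used on this slide. Push-only uniform gossip informs all n nodes in log2 n + ln n + O(1) rounds with high probability (Frieze and Grimmett; Pittel). Karp et al. show that address-oblivious algorithms finishing in O(log n) rounds need Ω(n log log n) transmissions. The snippet only evaluates the leading terms, with constants dropped, so the numbers are orders of magnitude rather than predictions. For comparison it also prints n log2 n, the order of messages a plain push scheme sends over its roughly log n rounds.

```python
import math


def push_rounds_leading_terms(n):
    """log2(n) + ln(n): leading terms of the push-gossip completion time
    (the O(1) term is dropped)."""
    return math.log2(n) + math.log(n)


def message_bound_order(n):
    """n * log(log(n)): order of the Karp et al. lower bound, without its
    constant (the log base only changes that constant)."""
    return n * math.log(math.log(n))


for n in (10**3, 10**6, 10**9):
    print(f"n={n:>10}: ~{push_rounds_leading_terms(n):5.1f} rounds, "
          f"n log log n ~ {message_bound_order(n):.3g}, n log2 n = {n * math.log2(n):.3g}")
```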
13 Spatial Gossip (Kempe et al.)
New information is most interesting to nodes that are nearby
Combines the benefits of
– Uniform gossip
– Deterministic flooding
The gossip algorithm chooses the nodes according to:
New information is spread to nodes at distance d, with high probability, in:
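A minimal sketch of a distance-biased partner choice follows. It assumes nodes on a line and an inverse-polynomial rule in the spirit of spatial gossip: node v is picked with probability proportional to |u - v|^(-rho). The exact distribution from the slide is not reproduced here, and rho = 1.5 is a hypothetical value.

```python
import random


def spatial_partner(u, n, rho=1.5, rng=random):
    """Pick a gossip partner for node u among nodes at positions 0..n-1 on a line.

    Node v != u is chosen with probability proportional to |u - v| ** (-rho):
    an inverse-polynomial rule (assumed form; rho = 1.5 is a hypothetical
    value). Nearby nodes are favoured, yet every node keeps a non-zero
    chance of being chosen.
    """
    candidates = [v for v in range(n) if v != u]
    weights = [abs(u - v) ** (-rho) for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(7)
picks = [spatial_partner(50, 101, rng=rng) for _ in range(10_000)]
near = sum(abs(p - 50) <= 2 for p in picks) / len(picks)
print(f"{near:.0%} of partners lie within distance 2 of node 50")
```

With a uniform choice only 4 of the 100 candidates (4%) lie that close. With rho = 1.5 the expected share is a little under 60%, yet distant nodes are still reached occasionally, which is what lets information travel far quickly.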
14 Aggregating values
Gossip can also be used to aggregate a value over all nodes: average, maximum, minimum, ...
In this case the question is how fast the local value at each node converges to the desired value
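One well-known way to gossip an average is the push-sum protocol of Kempe, Dobra and Gehrke. It is shown here as an example technique, not as the method of these slides. Each node holds a pair (s, w), starting at (x_i, 1). Every round it keeps half of the pair and pushes the other half to a random node. Both totals are conserved, so each local ratio s/w converges to the global average.

```python
import random


def push_sum(values, rounds, rng=None):
    """Gossip averaging with the push-sum protocol (Kempe, Dobra, Gehrke).

    Node i starts with the pair (s_i, w_i) = (x_i, 1). In every round each node
    keeps half of its pair and pushes the other half to a node chosen uniformly
    at random (possibly itself). The totals of s and w never change, so every
    local estimate s_i / w_i converges to the global average.
    """
    rng = rng or random.Random()
    n = len(values)
    s = [float(x) for x in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            target = rng.randrange(n)
            half_s, half_w = s[i] / 2, w[i] / 2
            new_s[i] += half_s           # half stays local
            new_w[i] += half_w
            new_s[target] += half_s      # half is pushed to the random target
            new_w[target] += half_w
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]


rng = random.Random(3)
values = [rng.uniform(0, 100) for _ in range(200)]
estimates = push_sum(values, rounds=40, rng=rng)
true_mean = sum(values) / len(values)
print(f"worst local error after 40 rounds: {max(abs(e - true_mean) for e in estimates):.2e}")
```

For the maximum or minimum, a node can simply keep the largest (smallest) value it has seen and gossip it like a rumor.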
15 Cluster/Grid Information services
Basic properties of a Grid environment
– Information sources are distributed
– Individual sources are subject to failure
– The total number of information providers is large
– Both the types of information sources and the ways they are used vary
We cannot, in general, provide users with accurate information: any information delivered to a user is “old”
– How useful is old information? (Mitzenmacher)
– How can we build an information service with guaranteed age properties?
16 Distributed Bulletin board
The system
– Consists of N nodes (or clusters)
– Is distributed
– Its nodes are subject to failure
Each node maintains a data structure that holds an entry for selected (or all) nodes in the system
We refer to this data structure as “the vector”
Each vector entry holds
– The state of the resources (static and dynamic) of the corresponding node
– The age of the information (tuned to the local clock)
The vector is a distributed bulletin board that serves information requests locally
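A minimal sketch of a vector entry and a purely local lookup follows. It assumes ages are counted in time units of the local clock. The class names, the dictionary keys for resource state and the `max_age` filter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One entry of the vector: resource state of a node and its age."""
    node: str
    static: dict = field(default_factory=dict)   # e.g. number of CPUs, speed (hypothetical keys)
    dynamic: dict = field(default_factory=dict)  # e.g. load, free memory (hypothetical keys)
    age: int = 0                                 # local time units since the data was current


class Vector:
    """A node's local copy of the bulletin board; requests are served locally."""

    def __init__(self):
        self.entries = {}

    def tick(self):
        """Advance the local clock by one time unit: every entry gets one unit older."""
        for entry in self.entries.values():
            entry.age += 1

    def lookup(self, node, max_age=None):
        """Answer a request from local data only: return the entry for `node`,
        or None if it is unknown or older than `max_age`."""
        entry = self.entries.get(node)
        if entry is None or (max_age is not None and entry.age > max_age):
            return None
        return entry


v = Vector()
v.entries["B"] = Entry("B", dynamic={"load": 0.7}, age=3)
v.tick()
print(v.lookup("B").age, v.lookup("B", max_age=3))  # prints: 4 None
```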
17 Algorithm 1 – Information dissemination
Each time unit
– Update the local information
– Find all vector entries that are up to age t
– Choose a random node
– Send these entries to that node
Upon receiving a message
– Compute the age of each received entry
– Update the entries for which the newly received information is fresher
Figure (example, entries shown as node:age): vector A:1 B:12 C:2 D:4 E:11; sent entries A:1 C:2 D:4; other vectors A:4 B:12 C:2 D:4 E:11 and B:1 C:3 E:3
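The steps above can be sketched for all nodes at once. The sketch assumes a synchronous model, vectors stored as dicts mapping a node id to an age (resource data omitted), and messages that arrive within the same time unit. The function names are hypothetical. With t = 4, the example vector A:1 B:12 C:2 D:4 E:11 yields exactly the window A:1 C:2 D:4.

```python
import random


def select_window(vector, t):
    """Entries whose information is at most t time units old."""
    return {node: age for node, age in vector.items() if age <= t}


def algorithm1_step(vectors, t, rng):
    """One synchronous time unit of Algorithm 1 for every node.

    vectors[i] maps a node id to the age of node i's information about it
    (the resource data itself is left out). Messages are assumed to arrive
    within the same time unit, so a received age is used as sent.
    """
    n = len(vectors)
    # Local clock: everything gets one unit older, then the node refreshes its own entry.
    for i, vec in enumerate(vectors):
        for node in vec:
            vec[node] += 1
        vec[i] = 0
    # Each node sends its window to one other node chosen uniformly at random.
    messages = []
    for i, vec in enumerate(vectors):
        target = rng.randrange(n - 1)
        if target >= i:
            target += 1
        messages.append((target, select_window(vec, t)))
    # The receiver keeps whichever information is fresher.
    for target, window in messages:
        vec = vectors[target]
        for node, age in window.items():
            if node not in vec or age < vec[node]:
                vec[node] = age


print(select_window({"A": 1, "B": 12, "C": 2, "D": 4, "E": 11}, 4))  # prints: {'A': 1, 'C': 2, 'D': 4}

rng = random.Random(0)
vectors = [{} for _ in range(32)]
for _ in range(50):
    algorithm1_step(vectors, t=8, rng=rng)
print(sum(len(v) for v in vectors) / len(vectors), "known entries per node on average")
```

All windows are collected before any merge, so every node sends what it knew at the start of the time unit.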
18-22 Algorithm 1 : t=2 (figure: dissemination animated over time steps 1 to 5)
23 Bounds and Approximations
We want to know “how old” the information in the vector is
First we find E(X_t) (for the asynchronous case)
– The expected number of nodes that have information about node i that is up to t time units old
Synchronous case
24 Bounds and Approximations
An approximation for the expected age of the vector
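A Monte Carlo sketch, for the synchronous case only, can estimate E(X_t) and the mean vector age empirically as a sanity check for analytical approximations. The network size, window t and number of steps are arbitrary assumptions. E(X_t) is estimated by averaging X_t over all nodes j, which behave alike in this symmetric model.

```python
import random


def simulate_algorithm1(n, t, steps, seed=0):
    """Synchronous Algorithm 1; returns ages[i][j], the age of node i's
    information about node j (None while i knows nothing about j)."""
    rng = random.Random(seed)
    ages = [[None] * n for _ in range(n)]
    for _ in range(steps):
        for i, row in enumerate(ages):          # local clocks advance
            for j, a in enumerate(row):
                if a is not None:
                    row[j] = a + 1
            row[i] = 0                          # own entry is always fresh
        messages = []
        for i, row in enumerate(ages):          # send the window to a random other node
            target = rng.randrange(n - 1)
            if target >= i:
                target += 1
            messages.append((target, [(j, a) for j, a in enumerate(row) if a is not None and a <= t]))
        for target, window in messages:         # keep the fresher information
            row = ages[target]
            for j, a in window:
                if row[j] is None or a < row[j]:
                    row[j] = a
    return ages


def count_up_to_age(ages, j, t):
    """X_t for node j: number of nodes whose information about j is at most t units old."""
    return sum(1 for row in ages if row[j] is not None and row[j] <= t)


def mean_vector_age(ages):
    """Average age over all known entries of all vectors."""
    known = [a for row in ages for a in row if a is not None]
    return sum(known) / len(known)


n, t = 64, 10
ages = simulate_algorithm1(n, t, steps=100, seed=1)
x_t = sum(count_up_to_age(ages, j, t) for j in range(n)) / n
print(f"empirical E(X_t) for t={t}: {x_t:.1f} of {n}; mean vector age: {mean_vector_age(ages):.2f}")
```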
25 Real results
26 Approximating the age distribution
A_k is a random variable describing the number of nodes whose information (about a given node) is up to age k
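The distribution of A_k can be estimated by simulation, treating every node j as one sample because all nodes behave alike in this model. This is a sketch under assumptions: a synchronous run, no message loss, and arbitrary parameter values. The helper names are hypothetical.

```python
import random
from collections import Counter


def simulate_algorithm1(n, t, steps, seed=0):
    """Synchronous Algorithm 1; returns ages[i][j], the age of node i's
    information about node j (None while i knows nothing about j)."""
    rng = random.Random(seed)
    ages = [[None] * n for _ in range(n)]
    for _ in range(steps):
        for i, row in enumerate(ages):          # local clocks advance
            for j, a in enumerate(row):
                if a is not None:
                    row[j] = a + 1
            row[i] = 0                          # own entry is always fresh
        messages = []
        for i, row in enumerate(ages):          # send the window to a random other node
            target = rng.randrange(n - 1)
            if target >= i:
                target += 1
            messages.append((target, [(j, a) for j, a in enumerate(row) if a is not None and a <= t]))
        for target, window in messages:         # keep the fresher information
            row = ages[target]
            for j, a in window:
                if row[j] is None or a < row[j]:
                    row[j] = a
    return ages


def age_distribution(ages, k):
    """Empirical distribution of A_k: for every node j, count the nodes whose
    information about j is at most k units old; returns Counter(count -> number of j)."""
    n = len(ages)
    return Counter(
        sum(1 for row in ages if row[j] is not None and row[j] <= k) for j in range(n)
    )


ages = simulate_algorithm1(64, t=10, steps=100, seed=2)
for k in (2, 5, 10):
    dist = age_distribution(ages, k)
    mean = sum(count * freq for count, freq in dist.items()) / len(ages)
    print(f"k={k:2}: mean A_k = {mean:5.1f}, values seen {min(dist)}..{max(dist)}")
```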
27 Age distribution
28 Handling inactive nodes
The presence of inactive nodes causes problems
– The age quality of the information deteriorates
– The number of ARP broadcasts increases linearly with the number of inactive nodes
Using a fixed-size window improves the age quality, but the number of ARP broadcasts stays the same
29 Algorithm 2
Algorithm 2 solves the two issues above
It works basically the same as Algorithm 1, with the following difference when sending a message
– Calculate l, the number of active nodes (from the local vector)
– Generate a random number k between 0 and l
– If k = 0, choose the target among all nodes
– Else choose the target only among the active nodes
Using Algorithm 2, the expected number of messages to inactive nodes is at most 1
– Summed over all nodes, in each round
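A sketch of the target choice follows. It assumes a node counts as active when its entry is at most `max_active_age` units old, a hypothetical criterion that the slide does not specify. It also assumes all active nodes share the same view. Under those assumptions the bound follows directly. Each active sender reaches an inactive node only when k = 0, which has probability 1/(l+1). At most l+1 active nodes send, so the expected total per round is at most 1.

```python
import random


def choose_target(me, ages, nodes, max_active_age, rng):
    """Algorithm 2 target selection (sketch).

    `ages` maps node id -> age of the local information about that node.
    A node is treated as active when that age is at most `max_active_age`
    (an assumed criterion). With l active nodes, draw k uniformly from
    0..l; if k == 0 the target is drawn from all nodes, otherwise only
    from the active ones.
    """
    others = [v for v in nodes if v != me]
    active = [v for v in others if ages.get(v, float("inf")) <= max_active_age]
    k = rng.randint(0, len(active))          # randint includes both end points
    pool = others if k == 0 else active
    return rng.choice(pool)


rng = random.Random(11)
nodes = list(range(100))
ages = {v: (0 if v < 60 else 1000) for v in nodes}   # nodes 60..99 look inactive
rounds, hits = 500, 0
for _ in range(rounds):
    for me in range(60):                              # only active nodes send
        if choose_target(me, ages, nodes, max_active_age=20, rng=rng) >= 60:
            hits += 1
print(f"messages to inactive nodes per round: {hits / rounds:.2f}")
```

With Algorithm 1's uniform choice, the same 60 senders would address about 60 × 40/99 ≈ 24 inactive nodes per round.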
30 Algorithm 2 – Age performance
31-34 Algorithm 2 – minimizing messages to inactive nodes (figure: animation over time steps 1 to 4)
35 Supporting urgent information
In the previous algorithms, information from all nodes is propagated constantly
In some cases we wish to send an important message urgently to all nodes
– such as the detection of a newly dead node
– In this case the source node gives the message a high priority, 2*log(n)
When a node assembles the window it is about to send, it first takes the entries with the highest priority and only then the youngest entries
The priority of an entry is decremented every time unit
As a result, urgent messages are disseminated in O(log(n)) steps, while regular information is disseminated a bit more slowly
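A minimal sketch of priority-aware window assembly follows. It assumes a fixed window size and entries stored as dicts with "age" and "priority" fields. It also assumes a base-2 logarithm for the initial priority 2*log(n), since the slide does not state the base. All names are hypothetical.

```python
import math


def urgent_priority(n):
    """Initial priority of an urgent entry: 2*log(n), here with log base 2 (assumed)."""
    return math.ceil(2 * math.log2(n))


def tick(entries):
    """One time unit: every entry ages by one and its priority is decremented (not below 0)."""
    for e in entries.values():
        e["age"] += 1
        e["priority"] = max(0, e["priority"] - 1)


def assemble_window(entries, size):
    """Take entries by decreasing priority; among equal priorities, youngest first."""
    ranked = sorted(entries, key=lambda node: (-entries[node]["priority"], entries[node]["age"]))
    return ranked[:size]


entries = {
    "A": {"age": 1, "priority": 0},
    "B": {"age": 9, "priority": urgent_priority(64)},   # old but urgent, e.g. "node B is dead"
    "C": {"age": 2, "priority": 0},
    "D": {"age": 4, "priority": 0},
}
print(assemble_window(entries, 2))   # prints: ['B', 'A']
for _ in range(12):
    tick(entries)
print(assemble_window(entries, 2))   # prints: ['A', 'C']  (B's priority has decayed to 0)
```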
36 Information service clients
MOSIX
– Load balancing
» Fresh information is used by the load-balancing algorithm when considering process migration
– mmon, the MOSIX monitoring tool
» Presents the vector of a specific node
» mmon -h xil-10
MPICH
– Improved assignment of processes to nodes
» No assignment to “dead” nodes
» Assignment to the least loaded ones
Nagios
– Collecting information about clusters over time (history)
– Periodically retrieving a vector from a machine and keeping it
Decision algorithms at the cluster level
– Leader election (queue fault tolerance)
– Node reservation
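To illustrate how a client might use the vector, here is a sketch of MPICH-style placement: skip hosts whose entry is too old to trust, and take the least loaded of the rest. The `load` field, the `dead_after` threshold and the function name are hypothetical. This is not MPICH's actual interface.

```python
def pick_hosts(vector, nprocs, dead_after):
    """Choose up to nprocs hosts for new processes from a local vector.

    `vector` maps host -> {"age": ..., "load": ...} (hypothetical fields).
    Hosts whose information is older than `dead_after` are treated as dead
    and skipped; the rest are taken least loaded first.
    """
    alive = sorted((info["load"], host) for host, info in vector.items() if info["age"] <= dead_after)
    return [host for _, host in alive[:nprocs]]


vector = {
    "n1": {"age": 2, "load": 0.9},
    "n2": {"age": 1, "load": 0.1},
    "n3": {"age": 50, "load": 0.0},   # stale entry: probably a dead node
    "n4": {"age": 3, "load": 0.4},
}
print(pick_hosts(vector, 2, dead_after=10))   # prints: ['n2', 'n4']
```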
37 Conclusions
We constructed a distributed bulletin board
– Age properties are guaranteed
– The administrator can configure it to the desired properties
– No two nodes have the same view of the system
– Information requests are served locally
– The noise level (messages to inactive nodes) is constant
– Urgent messages are propagated quickly
38 Future Work
Investigating other gossip models
– Push and Pull-Push
Using only a partial view of the system