2007/1/15http:// Lightweight Probabilistic Broadcast M2 Tatsuya Shirai M1 Dai Saito
Broadcast in Large-Scale Environments
End users send messages to all other users more and more frequently.
–P2P BBS
–Stock markets
These applications need software-level broadcast.
Participating processes change more dynamically than processes on servers do:
–machine crashes
–logins to and logouts from applications
Deterministic Broadcast
Each process forwards messages along predefined routes.
This approach provides consistent message delivery ordering.
–Messages from each process arrive in the order in which it sent them.
Reliability is expressed as “best effort”.
Deterministic Broadcast cont.
Poor scalability
–Single point of failure
–Cost of maintaining routing information
Low reliability on unstable networks
–Perturbation of a few processes lowers the performance of healthy processes.
[Figure: x-axis: rate of perturbed processes]
Probabilistic Broadcast
Each process forwards messages to randomly selected processes, without using predefined routing information.
An appropriate degree of redundancy enhances reliability.
Reliability is relatively high and stable in large-scale and unstable environments.
Pbcast [Birman et al. 1999]
This approach combines deterministic and probabilistic broadcast.
–While network load is low, deterministic broadcast achieves high reliability at low cost.
–While network load is high, probabilistic broadcast ensures a certain level of reliability, especially for healthy processes.
Deterministic Broadcast
The first protocol is deterministic broadcast.
It uses IP multicast or, if that is not available, randomly composed spanning trees.
–But composing spanning trees requires full membership information.
So this approach is limited to a few hundred processes, as mentioned in the paper.
Anti-Entropy Protocol
The second is an anti-entropy protocol based on gossip.
–In each round, each member randomly chooses some other members and sends them a digest summarizing its message history.
–A process that receives a digest checks which messages it lacks and requests them from the process that sent the digest.
[Figure: two processes, each holding a message history and membership info, exchange digests. One finds that it lacks messages 5 and 8, the other that it lacks 3 and 9; each requests and receives the missing messages.]
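The digest exchange in the figure can be sketched as follows. This is a minimal illustration, not the pbcast implementation: the function names are hypothetical, a digest is assumed to be just the set of message IDs a process holds, and both directions of the exchange are collapsed into one call.

```python
def missing_from(digest, history):
    """Return the message IDs listed in a peer's digest that we do not hold."""
    return sorted(set(digest) - set(history))


def anti_entropy_exchange(a, b):
    """One symmetric digest exchange between two message histories.

    `a` and `b` map message ID -> payload (an assumed representation).
    Each side works out what it lacks from the other's digest, requests
    those messages, and stores the replies.
    """
    digest_a, digest_b = set(a), set(b)   # assumed: a digest is the set of held IDs
    a_lacks = missing_from(digest_b, a)
    b_lacks = missing_from(digest_a, b)
    a.update({mid: b[mid] for mid in a_lacks})   # b answers a's request
    b.update({mid: a[mid] for mid in b_lacks})   # a answers b's request
    return a_lacks, b_lacks


if __name__ == "__main__":
    p = {1: "m1", 3: "m3", 9: "m9"}
    q = {1: "m1", 5: "m5", 8: "m8"}
    print(anti_entropy_exchange(p, q))   # what p lacked, then what q lacked
```

After the call both histories hold the same messages, which is the state the figure ends in.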
Anti-Entropy Protocol cont.
The network load of this protocol is determined by message size and fanout (the number of processes to which a process sends in one round).
Message size is limited by the lifetime of a message on each process.
–A process keeps sending a message for a fixed number of rounds after first receiving it.
–After that, the message is discarded.
Flow Control
Flow control applies while the network load is high.
–The rate of pbcast messages should be limited, normally to one every 100 ms.
–A retransmission should be delayed by some rounds if many other processes request it.
[Figure: processes exchange digests (3, 9 and 5, 8) and retransmit the requested messages 3, 9 and 5, 8.]
Evaluations
Parameters:
–Message loss rate
–Fanout, the number of processes
Reliability:
–(infected processes - failed processes) > (all processes) / 2, for applications based on quorum replication algorithms
Throughput:
–The number of messages a process receives per second.
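As a small worked example, the reliability condition can be written as a predicate. The function and parameter names are assumptions made for illustration; only the inequality comes from the slide.

```python
def majority_reached(infected, failed, total):
    """Reliability predicate from the slide:
    (infected - failed) > total / 2.

    infected: number of processes that received the message
    failed:   number of those processes that have failed
    total:    number of all processes
    """
    return (infected - failed) > total / 2


if __name__ == "__main__":
    # 50 processes: 30 infected, 4 of them failed -> 26 > 25
    print(majority_reached(30, 4, 50))
```

Note the strict inequality: exactly half of the processes is not enough for a quorum.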
Effects of Fanout
Predicate I shows pbcast.
–Message loss rate is
–Deterministic broadcast reaches 10% of the processes.
–50 processes participate.
The probability of failure decreases as fanout increases, up to a fanout of 8.
[Figure: x-axis: fanout (0~10)]
Scalability
Predicate I shows pbcast.
–Message loss rate is
–Deterministic broadcast reaches 10% of the processes.
The probability of failure decreases as the scale grows.
–Though broadcasting to all processes takes more rounds.
[Figure: x-axis: processes (0~60)]
Time for broadcast to all processes
At 1024 processes, messages are received in 12 rounds on average and in fewer than 20 rounds.
–Fanout is 1
–Det. broadcast is not used.
This result shows that the mean is O(log N).
[Figure: axes: rounds (0~20), processes]
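The logarithmic growth can be explored with a small simulation. This is a simplified sketch under stated assumptions, not the experiment from the paper: it models push-only gossip with no message loss, in which every infected process forwards to `fanout` uniformly chosen other processes per round. The function name and the seed parameter are hypothetical.

```python
import random


def rounds_to_reach_all(n, fanout=1, seed=0):
    """Count gossip rounds until all n processes hold the message.

    Assumed model: process 0 starts with the message; in each round every
    infected process pushes it to `fanout` distinct other processes chosen
    uniformly at random. Requires fanout <= n - 1 when n > 1.
    """
    rng = random.Random(seed)       # seeded so that runs are repeatable
    infected = {0}
    rounds = 0
    while len(infected) < n:
        targets = set()
        for p in infected:
            # Sample from the n - 1 other processes by skipping index p.
            for k in rng.sample(range(n - 1), fanout):
                targets.add(k if k < p else k + 1)
        infected |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, rounds_to_reach_all(n, fanout=1, seed=1))
```

With fanout 1 the infected set can at most double per round, so at least log2(N) rounds are needed; the simulation shows how close random gossip stays to that bound.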
Throughput
150 messages are sent per second.
–When message loss happens frequently, fanout is limited to a small value.
The throughput of perturbed processes decreases, but healthy processes keep full throughput.
[Figure: x-axis: rate of perturbed processes; curves: deterministic, pbcast]
Throughput cont.
Throughput at 200 msg/sec.
–25% of the processes are perturbed 25% of the time.
–Det. broadcast is not used.
Frequent packet loss lowers throughput.
In this case, average throughput decreases to 60% at 96 processes at high bandwidth.
[Figure: x-axis: loss rate (0~0.2); curves for 32~96 processes]
Conclusion of pbcast
A gossip-based protocol achieves scalability and reliability in general network environments.
However, the cost imposed on each process is not considered.
The next topic is memory management for pbcast.
Membership Management
Assumption
–Each process knows all members
Memory consumption in large-scale environments
Communication required to ensure the consistency of the membership
–Scalability problems in large-scale environments
Membership Management of lpbcast
Membership management + gossip
–Each process knows a subset of all members
–Messages are sent together with member information
–The size of the membership management buffer is limited
→Fixed memory consumption
Memory Management
The memory requirement of a process should not change (at large scale)
–Buffer for membership management
–Buffer for outgoing messages
→Scalability
pbcast from the viewpoint of “memory consumption”
lpbcast algorithm
Assumptions
–Each process has a unique ID
–Each message has a unique ID (including the process ID)
–joining/leaving (= subscribing/unsubscribing)
Buffers
–Events : event notifications
–EventIDs : event IDs
–Subs : subscription information
–unSubs : unsubscription information
–View : targets of gossip messages
Size limitation for all buffers
–Especially for Events and Subs
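The buffers listed above map naturally onto a small data structure. The sketch below is an illustration only: the class name, field names, and the `(process ID, sequence number)` shape of an event ID are assumptions that merely satisfy the stated requirement that a message ID includes the process ID.

```python
from dataclasses import dataclass, field


@dataclass
class LpbcastBuffers:
    """Per-process lpbcast state (hypothetical layout)."""
    pid: str                                        # unique process ID
    events: dict = field(default_factory=dict)      # Events: event ID -> notification
    event_ids: set = field(default_factory=set)     # EventIDs: IDs already delivered
    subs: set = field(default_factory=set)          # Subs: subscriptions to forward
    unsubs: set = field(default_factory=set)        # unSubs: unsubscriptions to forward
    view: set = field(default_factory=set)          # View: gossip targets
    seq: int = 0                                    # local counter for event IDs

    def next_event_id(self):
        """Return a fresh ID that embeds the process ID, so it is globally unique."""
        self.seq += 1
        return (self.pid, self.seq)


if __name__ == "__main__":
    b = LpbcastBuffers("p1")
    print(b.next_event_id(), b.next_event_id())
```

Size limits are left out here; they would be enforced wherever entries are added to a buffer.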
sending
lpbcast(e)
–Add e to Events
Periodic gossip
–Send the buffers to a subset of View (every 50 ms)
[Figure: event e is added to Events; a gossip message Mes carrying Events, EventIDs, Subs and unSubs is sent to processes chosen from View.]
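The two sending steps can be sketched as below. This is a hedged illustration rather than the authors' code: the buffers are assumed to be a plain dictionary keyed by the buffer names from the slides, the function names and the `fanout` and `rng` parameters are hypothetical, and the 50 ms timer that would call `gossip_round` is omitted.

```python
import random


def lpbcast(buffers, event_id, payload):
    """Broadcast step: add event e to the Events buffer."""
    buffers["Events"][event_id] = payload


def gossip_round(buffers, fanout, rng):
    """Periodic step: pick gossip targets from View and build the message Mes.

    Returns (targets, mes). Actually sending mes is left to the caller.
    """
    view = sorted(buffers["View"])
    targets = rng.sample(view, min(fanout, len(view)))
    mes = {   # snapshot, so later buffer changes do not alter a sent message
        "Events": dict(buffers["Events"]),
        "EventIDs": set(buffers["EventIDs"]),
        "Subs": set(buffers["Subs"]),
        "unSubs": set(buffers["unSubs"]),
    }
    return targets, mes


if __name__ == "__main__":
    buffers = {"Events": {}, "EventIDs": set(), "Subs": set(),
               "unSubs": set(), "View": {"p2", "p3", "p4"}}
    lpbcast(buffers, ("p1", 1), "hello")
    print(gossip_round(buffers, fanout=2, rng=random.Random(0)))
```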
receiving
When receiving gossip…
–Membership management
・add Mes.unSubs to unSubs
・remove Mes.unSubs from View and Subs
・add Mes.Subs to View and Subs
・If View is too large, move randomly chosen items from View to Subs
[Figure: Mes.unSubs and Mes.Subs are applied to View and Subs.]
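The membership steps above can be sketched as one function. This is an illustration under assumptions: the buffers are a plain dictionary keyed by the slide's buffer names, `max_view` and `max_subs` are hypothetical size limits, and skipping the receiver's own ID when adding subscriptions is an assumption not stated on the slide.

```python
import random


def handle_membership(state, mes, self_id, max_view, max_subs, rng):
    """Apply the membership part of a received gossip message `mes`."""
    # 1. Unsubscriptions: remember them and drop the processes locally.
    for p in mes["unSubs"]:
        state["unSubs"].add(p)
        state["View"].discard(p)
        state["Subs"].discard(p)
    # 2. Subscriptions: add to View and Subs (assumed: never add ourselves).
    for p in mes["Subs"]:
        if p != self_id:
            state["View"].add(p)
            state["Subs"].add(p)
    # 3. View too large: move randomly chosen entries from View to Subs.
    while len(state["View"]) > max_view:
        moved = rng.choice(sorted(state["View"]))
        state["View"].remove(moved)
        state["Subs"].add(moved)
    # 4. Subs too large: purge randomly chosen entries.
    while len(state["Subs"]) > max_subs:
        state["Subs"].remove(rng.choice(sorted(state["Subs"])))


if __name__ == "__main__":
    state = {"View": {"a", "b"}, "Subs": set(), "unSubs": set()}
    mes = {"unSubs": {"a"}, "Subs": {"c", "d", "me"}}
    handle_membership(state, mes, "me", max_view=2, max_subs=5,
                      rng=random.Random(0))
    print(state)
```

Moving surplus View entries into Subs keeps them circulating in later gossip instead of simply forgetting them.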
receiving
When receiving gossip…
–Event transmission
・Events received for the first time are forwarded to other processes in View
・If Events is too large, randomly chosen items are removed
–Retrieving events
・When Mes.EventIDs contains the ID of an undelivered event, a request to retrieve that event is sent
[Figure: an unknown event e is added to Events; an unknown event ID in EventIDs triggers a retrieval request.]
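The event steps above can be sketched in the same style. Again this is a minimal illustration with assumed names: `deliver` is a hypothetical application callback, `max_events` a hypothetical size limit, and the function returns the IDs to request instead of sending the requests itself.

```python
import random


def handle_events(state, mes, max_events, rng, deliver):
    """Apply the event part of a received gossip message `mes`.

    Returns the sorted list of event IDs that still have to be requested.
    """
    # 1. Events seen for the first time: deliver them and keep them in
    #    Events so that the next gossip round forwards them.
    for eid, payload in mes["Events"].items():
        if eid not in state["EventIDs"]:
            state["Events"][eid] = payload
            state["EventIDs"].add(eid)
            deliver(eid, payload)
    # 2. IDs we have heard of but never delivered: ask for the events.
    to_request = sorted(eid for eid in mes["EventIDs"]
                        if eid not in state["EventIDs"])
    # 3. Events too large: purge randomly chosen entries.
    while len(state["Events"]) > max_events:
        del state["Events"][rng.choice(sorted(state["Events"]))]
    return to_request


if __name__ == "__main__":
    state = {"Events": {}, "EventIDs": {("p1", 1)}}
    mes = {"Events": {("p1", 1): "old", ("p2", 1): "new"},
           "EventIDs": {("p1", 1), ("p2", 1), ("p3", 7)}}
    print(handle_events(state, mes, 10, random.Random(0), print))
```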
subscribing
A subscribing process must know at least one process that is already a member.
It sends gossip with itself appended to Subs.
On timeout, it retransmits.
[Figure: View]
unsubscribing
The process sends gossip with itself appended to unSubs.
–The process is gradually removed from individual views
–A timeout is set on unSubs entries
–Assumption : a removed process will not recover soon
[Figure: unSubs]
features of lpbcast
Throughput is as high as that of pbcast
Memory consumption can be estimated
The membership algorithm and the dissemination of events are dealt with at the same level.
Each view is independent and uniformly chosen
–True P2P model
→suitable for WAN
–Need to recognize “locality”
Optimization
Age-based
–Optimization of the Events buffer
–Now : the Events buffer is purged randomly
→better to remove well-disseminated messages
–Age = # of hops
[Figure: P1 broadcasts m1 and P2 broadcasts m2; P2 gossips m2 to P1, which delivers m2, so that its buffer grows from [m1] to [m1,m2].]
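Age-based purging can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the Events buffer is reduced to a mapping from event ID to age in hops, and a high age is taken to mean that the event is already well disseminated. The function name is hypothetical.

```python
def purge_oldest(ages, max_events):
    """Keep the max_events youngest events and drop the rest.

    ages: event ID -> age in hops (assumed representation).
    Returns a new dict; ties keep the entries that were inserted first.
    """
    youngest_first = sorted(ages.items(), key=lambda item: item[1])
    return dict(youngest_first[:max_events])


if __name__ == "__main__":
    print(purge_oldest({"m1": 5, "m2": 1, "m3": 3}, max_events=2))
```

Compared with random purging, this removes a freshly broadcast event only when every other buffered event is at least as young.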
Optimization
Frequency-based
–Optimization of the Subs buffer
–Now : the Subs buffer is purged randomly
→better to remove well-known processes
–well-known = included in many Subs buffers
[Figure: P1, P2 and P3 exchange Subs(P1, P2) and Subs(P2); the buffers shown are [P2] and [P1,P2].]
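Frequency-based purging can be sketched in a similar way. This is a simplified illustration with assumed names: how often a process has appeared in received Subs buffers is used as a stand-in for how well known it is, and the best-known entries are dropped first.

```python
from collections import Counter


def record_subs(freq, incoming_subs):
    """Count one more sighting for every process in a received Subs buffer."""
    freq.update(incoming_subs)


def purge_well_known(subs, freq, max_subs):
    """Keep the max_subs least frequently seen processes.

    Ties are broken by process ID so that the result is deterministic.
    """
    least_known_first = sorted(subs, key=lambda p: (freq[p], p))
    return set(least_known_first[:max_subs])


if __name__ == "__main__":
    freq = Counter()
    record_subs(freq, ["P1", "P2"])
    record_subs(freq, ["P2"])
    print(purge_well_known({"P1", "P2", "P3"}, freq, max_subs=2))
```

In the usage example P2 has been seen twice, so it is the entry that gets purged.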
Experiment : # of rounds
Simulation
–Prob. of message loss : 0.05
–Prob. of process crash : 0.01
The # of rounds needed to reach 99% of all processes grows logarithmically with the number of processes.
–Fanout = 3
Experiment : Reliability
–Sun Ultra 10 (Solaris 2.6, 256 MB memory)
–100 Mbps Ethernet
–40 msg/round, len(Events)=60
Reliability : the probability that any given process delivers any given event notification
Experiment : Optimization Effect
Age-based optimization
–Delivery ratio = (# of delivered messages)/(# of broadcast messages)
–30 msg/round, len(Events)=30, Fanout=4, 60 processes
[Figure: curves: Optimized, Random]
Conclusion
Scalability + Reliability
Bimodal Multicast
–A gossip-based protocol achieves scalability and reliability.
Lightweight Probabilistic Broadcast
–Pays attention to the cost imposed on each process
–Memory management for pbcast
–Lightweight in large-scale environments