Published by Pauline Anderson. Modified over 9 years ago.
1
Adaptive and Robust Broadcast Algorithm
Takeshi Sekiya, Chikayama-Taura Lab.
2007/4/13
2
Broadcast
“Broadcast” means transmitting a message that will be received by every node on the network.
In particular, application-level multicast:
  MPI broadcast
  File transfer
  Content delivery, etc.
3
Objective
Designing a broadcast algorithm with:
1. Low latency when sending small messages
2. High throughput when sending large messages
3. Robustness with low redundancy
4
Agenda
Background
Broadcast Algorithms and Problem Settings
  Application Layer Multicast
  MPI Bcast
  Gossip-Based Broadcast
Our Approach
Related Works
Conclusion
5
Application Layer Multicast
For data stream applications
  e.g., Yahoo BB Broadcast, Peercast
Constructs an overlay network
Many algorithms have been proposed
  Tree (NICE etc.)
  Mesh (Chord etc.)
6
Pipeline Transfer
For large messages or data streaming
Split a large message into small parts
P2 receives one part of the message from P1 while sending the previous part to P3 in parallel
[Figure: transfer along P1, P2, P3 without and with pipelining]
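The benefit of pipelining can be illustrated with a small cost model. This is a minimal sketch under stated assumptions, not the model used in the talk: links have unit bandwidth, every hop is store-and-forward, and `per_chunk_overhead` is a hypothetical fixed cost per part.

```python
def chain_time(num_procs: int, msg_size: float, chunks: int = 1,
               per_chunk_overhead: float = 0.0) -> float:
    """Time until the last process in a chain P1 -> P2 -> ... has the whole message.

    Assumptions (not from the slides): unit bandwidth, so sending one part
    costs overhead + part size; each process forwards a part only after
    it has received it completely.
    """
    hops = num_procs - 1
    step = per_chunk_overhead + msg_size / chunks
    # The first part needs `hops` steps to reach the end of the chain;
    # the remaining (chunks - 1) parts arrive one step apart behind it.
    return (hops + chunks - 1) * step


print(chain_time(3, 100.0, chunks=1))   # no pipelining: 200.0
print(chain_time(3, 100.0, chunks=10))  # 10 parts:      110.0
```

With a non-zero `per_chunk_overhead`, splitting into too many parts stops paying off, which is why the part size is a tuning parameter in practice.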
7
MPI Broadcast (MPICH etc.)
For high performance computing
Two algorithms are popular
  Binomial Tree [Van de Geijn et al. 1994]
Features
  Low latency (log N steps)
  Low robustness
[Figure: binomial tree]
8
Pilot Study of Binomial Tree
Process 12 is:
  A) idle
  B) under high CPU load
  C) under high I/O + CPU load
Environment
  CPU: Pentium M
  Memory: 1 GB
  OS: Linux kernel 2.6
  NIC: Gigabit Ethernet
[Figure: binomial tree over processes 0 to 31]
9
Experimental Result
In case (C), a long time is spent not only at process 12 but also at processes 13, 14, and 15.
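One way to read this result is that processes 13, 14, and 15 sit below process 12 in the broadcast tree, so a slow process 12 delays all of them. The sketch below assumes the rank numbering commonly used by MPI implementations for a binomial tree rooted at 0 (the parent of a rank is obtained by clearing its lowest set bit); the slides do not state which numbering was used, and the function names are illustrative.

```python
def binomial_children(rank: int, size: int) -> list[int]:
    """Children of `rank` in a binomial tree rooted at rank 0 (assumed numbering)."""
    mask = 1
    # Find the lowest set bit of rank; the root (rank 0) runs up to size.
    while mask < size and not (rank & mask):
        mask <<= 1
    children = []
    mask >>= 1
    while mask > 0:
        if rank + mask < size:
            children.append(rank + mask)
        mask >>= 1
    return children


def subtree(rank: int, size: int) -> list[int]:
    """All descendants of `rank`: the processes that wait on it."""
    found = []
    for child in binomial_children(rank, size):
        found.append(child)
        found.extend(subtree(child, size))
    return sorted(found)


def hops_from_root(rank: int) -> int:
    """Tree depth of `rank`: one hop per set bit in its binary form."""
    return bin(rank).count("1")


print(subtree(12, 32))                            # [13, 14, 15]
print(max(hops_from_root(r) for r in range(32)))  # 5, i.e. log2(32)
```

Under this numbering a single slow interior process stalls its whole subtree, which is consistent with the low robustness noted for the binomial tree.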
10
Gossip-Based Broadcast ([Eugster et al. 2003] etc.)
For large-scale distributed systems
Each process sends the message to randomly selected processes
Features
  High scalability
  High robustness
  Low efficiency (high redundancy)
11
Redundancy of Gossip-Based Broadcast
Each process sends to k processes
To ensure sufficient reliability, k ≥ 3 is needed
Number of messages (n processes)
  Binomial tree: n-1
  Gossip: 2kn
If the message is large, the network load is about 2k times higher than with the binomial tree
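The redundancy can be made concrete with a small simulation. This is a minimal sketch of the simplest gossip variant, assumed here for illustration: a process forwards the message to k distinct random peers once, when it first receives it. It therefore sends k messages per informed process; the 2kn figure above belongs to the variant analysed in the talk, not to this sketch. The function name and the `seed` parameter are hypothetical.

```python
import random


def gossip_broadcast(n: int, k: int, seed: int = 0) -> tuple[int, int]:
    """Return (processes reached, messages sent) for one gossip broadcast.

    Assumed variant: process 0 is the source; every process forwards to
    k distinct random peers exactly once, on first receipt. Requires k < n.
    """
    rng = random.Random(seed)
    informed = {0}
    frontier = [0]
    messages = 0
    while frontier:
        next_frontier = []
        for p in frontier:
            peers = [q for q in range(n) if q != p]
            for q in rng.sample(peers, k):
                messages += 1
                if q not in informed:      # duplicates are the redundancy
                    informed.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return len(informed), messages


reached, sent = gossip_broadcast(n=32, k=3)
print(f"reached {reached} of 32 with {sent} messages (a tree needs 31)")
```

Running it with different `seed` and `k` values shows the tradeoff: small k may leave some processes unreached, while larger k reaches everyone at the cost of many duplicate deliveries.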
12
Tradeoffs
Robustness vs. low redundancy
  Gossip-based vs. spanning-tree flooding
High throughput vs. low latency
  Single chain vs. flat tree
13
Objective (again)
Designing a broadcast algorithm with:
1. Low latency when sending small messages
2. High throughput when sending large messages
3. Robustness with low redundancy
14
Problem Settings
Messages are pushed onto a queue at random
  Frequent arrivals: parts of a split large message
  Rare arrivals: small messages
Any node except the root may fail (adjacent nodes can detect the failure)
The algorithm must
  send as many queued messages as possible in a fixed time
  reduce the time until one message has been received by all nodes
15
Agenda
Background
Broadcast Algorithms and Problem Settings
Our Approach
  Graph Configuration
  Algorithm
Related Works
Conclusion
16
Basic Idea
Adapt dynamically to the message arrival density
  High: chain
  Low: random graph flooding
17
Graph Configuration
First, configure a “Chain” that follows the layer 2 network topology
Topology Estimation [Shirai et al. 2006]
18
Redundant Edges
Node n connects to node n+2 mod N
The graph is a Harary graph
※ Here, a Harary graph means that removing any subset of t-1 nodes will not disconnect the graph
[Figure: nodes n0 to n11, N = 12]
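The connectivity claim can be checked by brute force on the N = 12 example. This is a minimal sketch under two assumptions that the slide does not state: edges are undirected, and the chain is closed into a ring so that every node also links to n+1 mod N. The function names are illustrative.

```python
from itertools import combinations


def build_graph(num_nodes: int, skips=(1, 2)) -> dict[int, set[int]]:
    """Undirected graph where node v links to v+s mod N for each s in skips."""
    adj = {v: set() for v in range(num_nodes)}
    for v in range(num_nodes):
        for s in skips:
            w = (v + s) % num_nodes
            adj[v].add(w)
            adj[w].add(v)
    return adj


def connected_without(adj: dict[int, set[int]], removed) -> bool:
    """True if the surviving nodes still form one connected component."""
    alive = set(adj) - set(removed)
    if not alive:
        return True
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == alive


def tolerates(adj: dict[int, set[int]], failures: int) -> bool:
    """True if no set of `failures` removed nodes disconnects the graph."""
    return all(connected_without(adj, dead)
               for dead in combinations(adj, failures))


ring_only = build_graph(12, skips=(1,))
with_skips = build_graph(12, skips=(1, 2))
print(tolerates(ring_only, 2))    # False: two failures can split a plain ring
print(tolerates(with_skips, 3))   # True under the assumptions above
print(tolerates(with_skips, 4))   # False
```

The comparison with the plain ring shows what the n+2 edges buy: the extra edge per node lets traffic step over a failed neighbour.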
19
Random Edges
Each node makes k edges randomly
The larger k is:
  the higher the robustness
  the lower the efficiency
  the lower the latency
[Figure: nodes n0 to n11 with random edges]
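The effect of k on latency and efficiency can be explored with a short experiment. This is a minimal sketch, assuming undirected edges, a ring-shaped chain with the n+2 edges, and uniformly random extra edges; hop count stands in for latency and edge count for redundancy. All names and the `seed` parameter are hypothetical.

```python
import random
from collections import deque


def build_overlay(num_nodes: int, k: int, seed: int = 0) -> dict[int, set[int]]:
    """Ring (n+1), redundant (n+2) and k random edges per node, undirected."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(num_nodes)}

    def link(a: int, b: int) -> None:
        adj[a].add(b)
        adj[b].add(a)

    for v in range(num_nodes):
        link(v, (v + 1) % num_nodes)
        link(v, (v + 2) % num_nodes)
        others = [w for w in range(num_nodes) if w != v]
        for w in rng.sample(others, k):
            link(v, w)
    return adj


def flood_hops(adj: dict[int, set[int]], root: int = 0) -> int:
    """Hops until flooding from `root` reaches the farthest node (BFS depth)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())


def edge_count(adj: dict[int, set[int]]) -> int:
    """Number of undirected edges; each carries a copy of a flooded message."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


for k in (0, 1, 2, 4):
    g = build_overlay(12, k)
    print(k, flood_hops(g), edge_count(g))
```

Adding edges can only shorten paths, so the hop count never increases with k, while the edge count (and with it the number of duplicate deliveries during flooding) grows.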
20
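Algorithm
received(m) {
    // n is this node's index; all node indices are taken mod N
    if (n+1 is dead) {
        send(m, n+2 mod N);     // skip the dead successor
        connect(n+3 mod N);     // reconnect past the dead node
    } else {
        send(m, n+1 mod N);
    }
    if (n-1 is dead)
        connect(n-2 mod N);
    for (i = 0; i < r; i++) {
        if (new message arrived)
            break;
        else
            send(m, random);    // random: a randomly chosen node
    }
}
If no new message has arrived, the node sends the old message to other nodes.
If the next node is dead, the node sends to the node after the next.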
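The forwarding rule can be expressed as a small pure function that lists the destinations a node would send to. This is a minimal sketch, not the author's implementation: `forward_targets`, the `alive` list, and the `new_message_pending` callback are hypothetical stand-ins for failure detection and the message queue, and the `connect` repair steps are left out.

```python
import random
from typing import Callable


def forward_targets(n: int, num_nodes: int, alive: list[bool], r: int,
                    new_message_pending: Callable[[], bool],
                    rng: random.Random) -> list[int]:
    """Destinations node n sends message m to after receiving it."""
    # Chain step: forward to the successor, or past it if it is dead.
    successor = (n + 1) % num_nodes
    if not alive[successor]:
        successor = (n + 2) % num_nodes
    targets = [successor]

    # Flooding step: up to r extra random sends, abandoned as soon as a
    # newer message is waiting (high arrival density => chain only).
    candidates = [q for q in range(num_nodes) if q != n and alive[q]]
    for _ in range(r):
        if new_message_pending():
            break
        targets.append(rng.choice(candidates))
    return targets


rng = random.Random(0)
alive = [True] * 12
print(forward_targets(3, 12, alive, 4, lambda: True, rng))   # busy queue: [4]
print(forward_targets(3, 12, alive, 4, lambda: False, rng))  # idle: 4 plus 4 random nodes
```

The same function covers both regimes: with a busy queue the node behaves as a link in a pipelined chain, and with an idle queue it spends the spare time on redundant random sends.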
21
Algorithm Behavior
High message density: chain
Low message density: flooding
22
Figure with LogP Model [Culler et al. 1993]
Chain: Throughput = 1 / max(g)
Flooding: Latency = O(log N)
[Figure: LogP timelines for P1 to P7 under chain and flooding, showing the gap g and the latency L]
23
Fault tolerance
If process n+1 is dead
  send the message to n+2
  connect to n+3
If process n-1 is dead
  connect to n-2
The algorithm tolerates one process fault at a time
24
Features of Algorithm
Adapts to message size dynamically
  Random graph flooding (small messages)
  Single chain pipelining (large messages)
Robustness
  Redundancy depending on randomness
  Fault tolerance by redundant edges
25
Agenda
Background
Broadcast Algorithms and Problem Settings
Our Approach
Related Works
  STAR-MPI
Conclusion
26
STAR-MPI [Faraj et al.]
Changes the collective MPI algorithm dynamically
Selects the best algorithm at run time
MS (Measure_Select) stage
  Tries each algorithm and chooses the best one
MA (Monitor_Adapt) stage
  Checks the efficiency of the chosen algorithm
27
STAR-MPI
Targets programs that run for a large number of iterations
[Figure: timeline. MS stage: Algorithm1, Algorithm2, and Algorithm3 are tried and the best one (Algorithm2) is chosen. MA stage: Algorithm2 keeps running while its efficiency is checked.]
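The two stages can be sketched generically as follows. This is not STAR-MPI's actual interface: the function names, the `measure` callback (assumed to return the running time of one iteration with the named algorithm), and the `trials` and `tolerance` parameters are hypothetical.

```python
from typing import Callable, Iterable


def measure_select(algorithms: Iterable[str],
                   measure: Callable[[str], float],
                   trials: int = 3) -> tuple[str, dict[str, float]]:
    """MS stage: time every candidate a few times and pick the fastest."""
    average = {
        name: sum(measure(name) for _ in range(trials)) / trials
        for name in algorithms
    }
    best = min(average, key=average.get)
    return best, average


def still_efficient(current: str, baseline: float,
                    measure: Callable[[str], float],
                    tolerance: float = 1.5) -> bool:
    """MA stage: keep the current algorithm while it stays near its baseline."""
    return measure(current) <= tolerance * baseline


# Usage with made-up timings in place of real measurements.
timings = {"Algorithm1": 3.0, "Algorithm2": 1.0, "Algorithm3": 2.0}
best, average = measure_select(timings, timings.__getitem__)
print(best)                                                            # Algorithm2
print(still_efficient(best, average[best], timings.__getitem__))       # True
```

When the check in the monitoring stage fails, a driver loop would return to the measuring stage; this only pays off when the program runs enough iterations to amortise the trial runs.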
28
Agenda
Background
Broadcast Algorithms and Problem Settings
Our Approach
Related Works
Conclusion
29
Conclusion
Proposed a robust broadcast algorithm that adapts to message size
Future work
  Implementation and evaluation
  Deciding the best (or a better) “k” through evaluation
30
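Publications
1. Takeshi Sekiya, Kenjiro Taura, Takashi Chikayama. Design and Correctness Proof of a Protocol Supporting Adaptive Parallel Computation. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2006), pp. 169-174, Kochi, July 2006.
2. Takeshi Sekiya, Kenjiro Taura, Takashi Chikayama. Design and Correctness Proof of a Protocol Supporting Adaptive Parallel Computation. Symposium on Advanced Computing Systems and Infrastructures (SACSIS2007), May 2007. (To be presented)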