Adaptive and Robust Broadcast Algorithm Takeshi Sekiya Chikayama-Taura Lab. 2007/4/13.

Broadcast
- “Broadcast” means transmitting a message that will be received by every node on the network
- Here, especially application-level multicast
  - MPI broadcast
  - File transfer
  - Content delivery, etc.

Objective
Designing a broadcast algorithm with
1. Low latency when sending small messages
2. High throughput when sending large messages
3. Robustness with low redundancy

Agenda
- Background
- Broadcast Algorithms and Problem Settings
  - Application Layer Multicast
  - MPI Bcast
  - Gossip-Based Broadcast
- Our Approach
- Related Works
- Conclusion

Application Layer Multicast
- For data stream applications
  - e.g. Yahoo BB Broadcast, Peercast
- Constructs an overlay network
- Many algorithms have been proposed
  - Tree (NICE etc.)
  - Mesh (Chord etc.)

Pipeline Transfer
- For large messages or data streaming
- Split a large message into small parts
- P2 receives one part of the message from P1 while sending the previous part to P3 in parallel
(Figure: transfer along P1, P2, P3, shown with pipelining)
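The benefit of pipelining can be sketched with a small step-counting simulation. This is an illustrative model only, assuming every node forwards at most one part per step and all links are equally fast; the function names are hypothetical and do not come from the slides.

```python
def pipeline_steps(num_chunks, num_nodes):
    """Closed form: steps until the last node of a chain holds every chunk."""
    hops = num_nodes - 1
    if num_chunks <= 0 or hops <= 0:
        return 0
    return num_chunks + hops - 1


def store_and_forward_steps(num_chunks, num_nodes):
    """Steps if each node waits for the whole message before forwarding it."""
    return num_chunks * (num_nodes - 1)


def simulate_chain(num_chunks, num_nodes):
    """Step-by-step simulation of the pipelined chain (node 0 is the source)."""
    received = [num_chunks] + [0] * (num_nodes - 1)
    steps = 0
    while received[-1] < num_chunks:
        # Walk from the tail so that a chunk advances only one hop per step.
        for i in range(num_nodes - 1, 0, -1):
            if received[i - 1] > received[i]:
                received[i] += 1
        steps += 1
    return steps
```

With 4 parts and 3 nodes (P1, P2, P3), the pipelined chain finishes in 5 steps under this model, against 8 steps when each node forwards only after receiving the whole message.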

MPI Broadcast (MPICH etc.)
- For high performance computing
- Two algorithms are popular
  - Binomial Tree
  - [Van de Geijn et al. 1994]
- Features
  - Low latency (log N steps)
  - Low robustness
(Figure: Binomial Tree)
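A minimal sketch of how a binomial broadcast tree can be laid out, assuming the common rank-based construction rooted at process 0 (the slides do not state the exact layout used). The function names are hypothetical.

```python
def binomial_children(rank, n):
    """Children of `rank` in a binomial tree rooted at 0 over n processes.

    A process forwards to rank + 1, rank + 2, rank + 4, ... for every power
    of two below its own lowest set bit (the root uses all powers below n).
    """
    children = []
    step = 1
    while step < n and (rank & step) == 0:
        if rank + step < n:
            children.append(rank + step)
        step <<= 1
    return children


def subtree(rank, n):
    """All processes that receive the message, directly or not, via `rank`."""
    nodes = []
    for child in binomial_children(rank, n):
        nodes.append(child)
        nodes.extend(subtree(child, n))
    return sorted(nodes)
```

Under this layout with 16 processes, `subtree(12, 16)` is `[13, 14, 15]`: a slow or failed process 12 holds up every process below it, which is one way to read the "low robustness" of the tree.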

Pilot Study of Binomial Tree
- Process 12 is
  (a) idle
  (b) under high CPU load
  (c) under high IO + CPU load
- Environment
  - CPU: Pentium M
  - Memory: 1GB
  - OS: Linux Kernel 2.6
  - NIC: Gigabit Ethernet

Experimental Result
- In case (c), a long time is spent not only by process 12 but also by processes 13, 14, and 15

Gossip-Based Broadcast ([Eugster et al. 2003] etc.)
- For large-scale distributed systems
- Each process sends the message to randomly selected processes
- Features
  - High scalability
  - High robustness
  - Low efficiency (high redundancy)
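A simplified push-gossip round structure can be sketched as follows. This is an assumption-laden toy, not the protocol of [Eugster et al. 2003]: every newly informed process forwards exactly once to k peers drawn uniformly at random, and no process fails. All names are hypothetical.

```python
import random


def gossip_broadcast(n, k, rng):
    """Simulate simplified push gossip from process 0.

    Returns (number of informed processes, number of messages sent).
    Requires k <= n - 1 so that k distinct peers can be sampled.
    """
    informed = {0}
    frontier = [0]
    messages = 0
    while frontier:
        next_frontier = []
        for p in frontier:
            peers = rng.sample([q for q in range(n) if q != p], k)
            for q in peers:
                messages += 1
                if q not in informed:
                    # Only first-time receivers forward in the next round.
                    informed.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return len(informed), messages


reached, sent = gossip_broadcast(100, 3, random.Random(1))
```

In this toy model many messages land on processes that are already informed, which is the redundancy the slide refers to, and full coverage is likely but not guaranteed.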

Redundancy of Gossip-Based Broadcast
- Each process sends to k processes
- To ensure sufficient reliability, k ≥ 3 is needed
- Number of messages (n processes)
  - Binomial tree: n-1
  - Gossip: 2kn
- If the message size is large, the network load becomes about 2k times higher
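The message counts quoted on this slide can be compared directly. The sketch below simply evaluates the two formulas as given (n-1 and 2kn); the function names are hypothetical.

```python
def binomial_messages(n):
    """Messages sent by a binomial-tree broadcast over n processes."""
    return n - 1


def gossip_messages(n, k):
    """Messages sent by gossip, using the 2kn count given on the slide."""
    return 2 * k * n


def redundancy_ratio(n, k):
    """How many times more messages gossip sends than the binomial tree."""
    return gossip_messages(n, k) / binomial_messages(n)
```

For n = 16 and k = 3 this gives 96 messages against 15, a ratio of 6.4; as n grows the ratio approaches 2k.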

Tradeoffs
- Robustness vs. low redundancy
  - Gossip-based vs. spanning-tree flooding
- High throughput vs. low latency
  - Single chain vs. flat tree

Objective (again)
Designing a broadcast algorithm with
1. Low latency when sending small messages
2. High throughput when sending large messages
3. Robustness with low redundancy

Problem Settings
- Messages are pushed into a queue at random times
  - Frequent arrivals: parts of a split large message
  - Rare arrivals: small messages
- Any node except the root may fail (adjacent nodes can detect the failure)
- The algorithm must
  - send as many queued messages as possible in a fixed time
  - reduce the time until a message has been received by all nodes

Agenda
- Background
- Broadcast Algorithms and Problem Settings
- Our Approach
  - Graph Configuration
  - Algorithm
- Related Works
- Conclusion

Basic Idea
- Adapt dynamically to the message arrival density
  - High -> Chain
  - Low -> Random graph flooding

Graph Configuration
- First, configure the “Chain” according to the layer 2 network topology
- Topology estimation [Shirai et al. 2006]

Redundant Edges
- Node n connects to node (n+2) mod N
- The graph is a Harary graph
  ※ In a Harary graph, the removal of any subset of t-1 nodes will not disconnect the graph
(Figure: nodes n0 to n11, N = 12)
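The effect of the redundant edges can be checked on a small example. The sketch below assumes the chain is closed into a ring (node n linked to n+1 and n+2, both mod N) and treats edges as undirected; these are assumptions for illustration, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def ring_with_skip_edges(n):
    """Adjacency sets where node i is linked to i+1 and i+2 (mod n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in (1, 2):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_without(adj, removed):
    """True if the nodes not in `removed` still form one connected component."""
    alive = [v for v in adj if v not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)


def survives_any(adj, f):
    """True if the graph stays connected after removing any f nodes."""
    return all(connected_without(adj, set(c)) for c in combinations(adj, f))
```

For N = 12 the undirected graph stays connected after any single node is removed, because the (n+2) edge bypasses it. Connectivity of the graph is only a necessary condition; the forwarding rule itself decides which failures the broadcast actually survives.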

Random Edges
- Each node makes k edges to randomly chosen nodes
- The larger k is,
  - the higher the robustness
  - the lower the efficiency
  - the lower the latency
(Figure: nodes n0 to n11 with random edges)

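Algorithm
    received(m) {
        // forward along the chain, skipping a dead successor
        if (n+1 is dead) {
            send(m, (n+2) mod N);
            connect((n+3) mod N);
        } else {
            send(m, (n+1) mod N);
        }
        // repair the link to a dead predecessor
        if (n-1 is dead)
            connect((n-2) mod N);
        // while idle, flood the message to up to r random nodes
        for (i = 0; i < r; i++) {
            if (new message arrived)
                break;
            else
                send(m, random);
        }
    }
- If no new message has come, the node sends the old message to other nodes
- If the next node is dead, the node sends to the node after next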
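The handler above can be exercised as a runnable sketch. Everything about the environment is an assumption made for illustration: liveness checks, sending, connecting, and the "new message arrived" test are passed in as callables, and "random" is taken to mean a peer chosen uniformly among all other nodes. None of these names come from the slides.

```python
import random


def received(m, n, N, r, is_alive, send, connect, new_message_arrived, rng=random):
    """Handle message m at node n in a ring of N nodes (hypothetical harness)."""
    succ = (n + 1) % N
    if not is_alive(succ):
        # Skip the dead successor and prepare a spare link one node further.
        send(m, (n + 2) % N)
        connect((n + 3) % N)
    else:
        send(m, succ)

    if not is_alive((n - 1) % N):
        # Bypass a dead predecessor.
        connect((n - 2) % N)

    # While no newer message is waiting, spend idle time on random flooding.
    for _ in range(r):
        if new_message_arrived():
            break
        peer = rng.choice([p for p in range(N) if p != n])
        send(m, peer)


# Usage: node 3 of 12, successor 4 is dead, queue is idle, r = 2.
sent, links = [], []
received("m0", n=3, N=12, r=2,
         is_alive=lambda p: p != 4,
         send=lambda msg, p: sent.append(p),
         connect=links.append,
         new_message_arrived=lambda: False,
         rng=random.Random(0))
```

In the usage example the first send goes to node 5, a link to node 6 is requested, and two further copies go to random peers. When `new_message_arrived` returns true straight away, only the chain send happens, which is the high-density chain behavior.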

Algorithm Behavior
- Low message density: flooding
- High message density: chain

Figure with LogP Model [Culler et al. 1993]
- Chain: throughput = 1 / max(g)
- Flooding: latency = O(log N)
(Figure: P1 to P7 as a chain and as a flooding graph, annotated with the LogP gap g and latency L)

Fault tolerance
- If process n+1 is dead
  - send the message to n+2
  - connect to n+3
- If process n-1 is dead
  - connect to n-2
- The algorithm tolerates one process fault at a time

Features of Algorithm
- Adapts to message size dynamically
  - Random graph flooding (small messages)
  - Single chain pipelining (large messages)
- Robustness
  - Redundancy depending on randomness
  - Fault tolerance by redundant edges

Agenda
- Background
- Broadcast Algorithms and Problem Settings
- Our Approach
- Related Works
  - STAR-MPI
- Conclusion

STAR-MPI [Faraj et al.]
- Changes the collective MPI algorithm dynamically
- Selects the best algorithm at run time
- MS (Measure_Select) stage
  - Tries each algorithm and chooses the best one
- MA (Monitor_Adapt) stage
  - Checks the efficiency of the chosen algorithm
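The two-stage idea can be sketched generically. This is not STAR-MPI's code or API: the function names, the injected `measure` callable, the `trials` count, and the `tolerance` threshold are all hypothetical choices made for illustration.

```python
def measure_select(algorithms, measure, trials=3):
    """Try every candidate and return the one with the lowest mean cost."""
    mean_cost = {
        name: sum(measure(name) for _ in range(trials)) / trials
        for name in algorithms
    }
    return min(mean_cost, key=mean_cost.get)


def monitor_adapt(current, measure, baseline, tolerance=1.5):
    """Return True while `current` stays within tolerance x baseline cost."""
    return measure(current) <= tolerance * baseline


# Usage with fixed, made-up costs standing in for measured run times.
costs = {"algorithm1": 3.0, "algorithm2": 1.0, "algorithm3": 2.0}
best = measure_select(list(costs), lambda name: costs[name])
still_good = monitor_adapt(best, lambda name: costs[name], baseline=1.0)
```

A caller would rerun `measure_select` whenever `monitor_adapt` returns false, which only pays off for programs that repeat the same collective many times.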

STAR-MPI
- Targets programs that run for a large number of iterations
(Figure: timeline. MS stage: Algorithm1, Algorithm2, and Algorithm3 are tried and the best one, Algorithm2, is chosen. MA stage: Algorithm2 runs while its efficiency is checked.)

Agenda
- Background
- Broadcast Algorithms and Problem Settings
- Our Approach
- Related Works
- Conclusion

- Proposed a robust algorithm that adapts to message size
- Future work
  - Implementation and evaluation
  - Deciding the best (or a better) “k” through evaluation

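Publications
1. Takeshi Sekiya, Kenjiro Taura, Takashi Chikayama. Design and Proof of Correctness of a Protocol Supporting Adaptive Parallel Computation. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2006), pp , Kochi, July 2006.
2. Takeshi Sekiya, Kenjiro Taura, Takashi Chikayama. Design and Proof of Correctness of a Protocol Supporting Adaptive Parallel Computation. Symposium on Advanced Computing Systems and Infrastructures (SACSIS2007). May (to appear)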