A Stable Broadcast Algorithm
Kei Takahashi, Hideo Saito, Takeshi Shibata, Kenjiro Taura (The University of Tokyo, Japan)
CCGrid 2008 - Lyon, France



Broadcasting Large Messages
- Distributes the same large data to many nodes (e.g., content delivery)
- Widely used in parallel processing

Problem of Broadcast
- In a broadcast, the source can usually deliver much less data to each destination than in a single transfer from the source

Problem of Slow Nodes
- Pipelined transfers improve performance
- Even in a pipelined transfer, nodes with small bandwidth (slow nodes) may degrade the receiving bandwidth of all other nodes

Contributions
1. Propose the idea of a Stable Broadcast. In a stable broadcast:
  - Slow nodes never degrade the receiving bandwidth of other nodes
  - All nodes receive the maximum possible amount of data

Contributions (cont.)
2. Propose a stable broadcast algorithm for tree topologies
  - Proved to be stable in a theoretical model
  - Improves performance in general graph networks
3. In a real-machine experiment, our algorithm achieved 2.5 times the aggregate bandwidth of the previous algorithm (FPFR)

Agenda
- Introduction
- Problem Settings
- Related Work
- Proposed Algorithm
- Evaluation
- Conclusion

Problem Settings
1. Target: large message broadcast
2. Only computational nodes handle messages

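Problem Settings (cont.)
3. Only bandwidth matters for large messages
  (Transfer time) = (Latency) + (Message Size) / (Bandwidth)
  Example: with 50 msec latency, 1 Gbps bandwidth, and a 1 GB message, the (Message Size) / (Bandwidth) term accounts for about 99% of the transfer time
4. Bandwidth is limited only by link capacities
  - Assume that nodes and switches have enough processing throughput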
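The transfer-time formula on this slide can be checked with a few lines of Python. This is only a sketch: the function name `transfer_time` is hypothetical, and it assumes 1 GB = 8e9 bits and 1 Gbps = 1e9 bits per second.

```python
def transfer_time(latency_s, size_bits, bandwidth_bps):
    """Return (total transfer time in seconds, share of the bandwidth term)."""
    bandwidth_term = size_bits / bandwidth_bps  # (Message Size) / (Bandwidth)
    total = latency_s + bandwidth_term          # (Latency) + bandwidth term
    return total, bandwidth_term / total


# Values from the slide: 50 msec latency, 1 Gbps link, 1 GB message.
# Assumption: 1 GB = 8e9 bits, 1 Gbps = 1e9 bits per second.
total, share = transfer_time(0.050, 8e9, 1e9)
print(f"total = {total:.2f} s, bandwidth share = {share:.1%}")
```

Under these assumptions the latency contributes well under 1% of the total, which is why the problem setting ignores it for large messages.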

Problem Settings (cont.)
5. Bandwidth-annotated topologies are given in advance
  - Bandwidths and topologies can be rapidly inferred:
    - Shirai et al. A Fast Topology Inference - A building block for network-aware parallel computing. (HPDC 2007)
    - Naganuma et al. Improving Efficiency of Network Bandwidth Estimation Using Topology Information (SACSIS 2008, Tsukuba, Japan)

Evaluation of Broadcast
- Previous algorithms evaluated broadcasts by completion time
- However, completion time cannot capture the effect of slowly receiving nodes
  - It is desirable that each node receives as much data as possible
- Aggregate bandwidth is a more reasonable evaluation criterion in many cases

Definition of Stable Broadcast
- All nodes receive the maximum possible bandwidth
  - The receiving bandwidth of each node is not reduced by adding other nodes to the broadcast
[Figure: single transfer vs. broadcast]
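The definition above can be expressed as a simple check: a broadcast is stable if every destination receives at least the bandwidth it would get in a single transfer from the source. The sketch below is illustrative only; the function names and the bandwidth values are hypothetical and are not taken from the slides.

```python
def aggregate_bandwidth(recv_bw):
    """Sum of the receiving bandwidths of all destinations."""
    return sum(recv_bw.values())


def is_stable(single_bw, broadcast_bw, tol=1e-9):
    """True if no destination receives less than in a single transfer."""
    return all(broadcast_bw[d] >= single_bw[d] - tol for d in single_bw)


# Hypothetical single-transfer bandwidths: D0 is a slow node.
single = {"D0": 10, "D1": 100, "D2": 100}

# A single pipeline limited by the slow node: everyone receives 10.
pipeline = {d: 10 for d in single}

print(is_stable(single, pipeline), aggregate_bandwidth(pipeline))
print(is_stable(single, single), aggregate_bandwidth(single))
```

In this made-up case the pipeline is not stable, because D1 and D2 receive less than they could alone, and its aggregate bandwidth is far below the sum of the single-transfer bandwidths.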

Properties of Stable Broadcast
- Maximizes aggregate bandwidth
- Minimizes completion time


Single-Tree Algorithms
- Flat tree: the outgoing link from the source becomes a bottleneck
- Random pipeline: links that are used many times become bottlenecks
- Depth-first pipeline: each link is used only once, but fast nodes suffer from slow nodes
- Dijkstra: fast nodes do not suffer from slow nodes, but some links are used many times
[Figure panels: Flat Tree, Random Pipeline, Dijkstra, Depth-First (FPFR)]

FPFR Algorithm [†]
- FPFR (Fast Parallel File Replication) improves the aggregate bandwidth over algorithms that use only one tree
- Idea:
  (1) Construct multiple spanning trees
  (2) Use these trees in parallel
[†] Izmailov et al. Fast Parallel File Replication in Data Grid. (GGF-10, March 2004.)

Tree Constructions in FPFR
- Iteratively construct spanning trees:
  - Create a spanning tree (Tn) by tracing every destination
  - Set the throughput (Vn) to the bottleneck bandwidth in Tn
  - Subtract Vn from the remaining bandwidth of each link used by Tn
[Figure: First Tree (T1) with throughput V1 and its bottleneck; Second Tree (T2) with throughput V2]
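The three steps on this slide can be sketched in Python as follows. This is a simplified illustration, not the FPFR implementation: it assumes every node can relay data, that each tree edge corresponds to exactly one directed link, and that a plain graph search stands in for "tracing every destination". The function names and the example capacities are hypothetical.

```python
def spanning_tree(cap, source, nodes):
    """Search over links with remaining capacity; return {child: parent} or None."""
    parent, seen, stack = {}, {source}, [source]
    while stack:
        u = stack.pop()
        for (a, b), c in cap.items():
            if a == u and c > 0 and b not in seen:
                seen.add(b)
                parent[b] = u
                stack.append(b)
    # Only a tree that reaches every node counts as a spanning tree.
    return parent if seen == set(nodes) else None


def fpfr_like_trees(capacity, source):
    """Repeatedly build a spanning tree, take its bottleneck, subtract it."""
    cap = dict(capacity)
    nodes = {n for link in cap for n in link}
    trees = []
    while True:
        parent = spanning_tree(cap, source, nodes)
        if not parent:
            break
        links = [(p, c) for c, p in parent.items()]
        v = min(cap[link] for link in links)  # throughput Vn = bottleneck of Tn
        for link in links:
            cap[link] -= v                    # subtract Vn from used links
        trees.append((links, v))
    return trees


# Hypothetical directed link capacities.
capacity = {("S", "A"): 10, ("A", "B"): 6, ("S", "B"): 4, ("B", "A"): 4}
for links, v in fpfr_like_trees(capacity, "S"):
    print(v, links)
```

Each iteration uses up at least one link completely, so the loop terminates once no spanning tree with positive remaining capacity exists.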

Data Transfer with FPFR
- Each tree sends a different fraction of the data in parallel
  - The proportion of data sent through each tree may be optimized by linear programming (Balanced Multicasting [†])
- T1 (throughput V1) sends the former part; T2 (throughput V2) sends the latter part
[†] den Burger et al. Balanced Multicasting: High-throughput Communication for Grid Applications (SC 2005)
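One simple way to divide a message among trees is to give each tree a byte range proportional to its throughput. The sketch below shows that plain proportional split; it is not the linear-programming optimization of Balanced Multicasting, and the function name and values are hypothetical.

```python
def split_by_throughput(size_bytes, throughputs):
    """Return one (start, end) byte range per tree, proportional to throughput."""
    total = sum(throughputs)
    ranges = []
    start = 0
    for i, v in enumerate(throughputs):
        if i == len(throughputs) - 1:
            end = size_bytes  # last tree takes the remainder
        else:
            end = start + int(size_bytes * v / total)
        ranges.append((start, end))
        start = end
    return ranges


# Hypothetical example: two trees with throughputs V1 = 4 and V2 = 6.
print(split_by_throughput(1000, [4, 6]))
```

With this split, the tree with throughput 4 sends the former part of the message and the tree with throughput 6 sends the latter part, so both finish at about the same time.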

Problems of FPFR
- In FPFR, slow nodes degrade the receiving bandwidth of other nodes
- For tree topologies, FPFR outputs only one depth-first pipeline, which cannot utilize the full potential of the network


Our Algorithm
- Modifies the FPFR algorithm
  - Creates both spanning trees and partial trees
- Stable for tree topologies whose links have the same bandwidth in both directions

Tree Constructions
- Iteratively construct trees:
  - Create a tree Tn by tracing every destination
  - Set the throughput Vn to the bottleneck in Tn
  - Subtract Vn from the remaining capacities
[Figure: three trees over nodes S, A, B, C. T1: First Tree (Spanning), throughput V1; T2: Second Tree (Partial Tree), throughput V2; T3: Third Tree (Partial Tree), throughput V3]
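The effect of allowing partial trees can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' implementation: every node can relay, each tree edge is one directed link, a plain graph search over links with remaining capacity plays the role of "tracing every destination", and a node's receiving bandwidth is taken as the sum of the throughputs of the trees that contain it. All names and numbers are hypothetical.

```python
def reachable_tree(cap, source):
    """Return {child: parent} for all nodes reachable over non-exhausted links."""
    parent, seen, stack = {}, {source}, [source]
    while stack:
        u = stack.pop()
        for (a, b), c in cap.items():
            if a == u and c > 0 and b not in seen:
                seen.add(b)
                parent[b] = u
                stack.append(b)
    return parent


def build_trees(capacity, source):
    """Build trees until no destination is reachable; partial trees are kept."""
    cap = dict(capacity)
    trees = []
    while True:
        parent = reachable_tree(cap, source)
        if not parent:
            break
        links = [(p, c) for c, p in parent.items()]
        v = min(cap[link] for link in links)  # throughput Vn = bottleneck of Tn
        for link in links:
            cap[link] -= v                    # subtract Vn from remaining capacity
        trees.append((set(parent), v))
    return trees


def receiving_bandwidth(trees):
    """Sum the throughputs of all trees that reach each node."""
    recv = {}
    for members, v in trees:
        for node in members:
            recv[node] = recv.get(node, 0) + v
    return recv


# Hypothetical pipeline S -> A -> B in which B sits behind a slow link.
capacity = {("S", "A"): 10, ("A", "B"): 4}
print(receiving_bandwidth(build_trees(capacity, "S")))
```

In this made-up case a spanning-tree-only scheme would stop after the first tree and leave A at the bottleneck rate of 4, whereas the extra partial tree that reaches only A lets A receive 10 while B still receives 4.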

Data Transfer
- Send data in proportion to the tree throughput Vn
- Example:
  - Stage 1: use T1, T2, and T3
  - Stage 2: use T1 and T2 to send the data previously sent by T3
  - Stage 3: use T1 to send the data previously sent by T2
[Figure: nodes S, A, B, C in three stages, using T1 (V1), T2 (V2), and T3 (V3); then T1 and T2; then T1 only]
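The three-stage example follows a regular pattern that can be written down as a small schedule generator. The generalization to an arbitrary number of trees is an assumption made here for illustration (the slide only shows three trees), and the function name `stage_plan` is hypothetical.

```python
def stage_plan(num_trees):
    """Return (stage, active tree indices, index of the tree whose data is re-sent)."""
    plan = []
    for stage in range(1, num_trees + 1):
        # Stage k uses trees T1 .. T(n - k + 1).
        active = list(range(1, num_trees - stage + 2))
        # From stage 2 on, the active trees re-send the data of the tree dropped last.
        resend = None if stage == 1 else num_trees - stage + 2
        plan.append((stage, active, resend))
    return plan


for stage, active, resend in stage_plan(3):
    trees = ", ".join(f"T{i}" for i in active)
    note = "" if resend is None else f" to send data previously sent by T{resend}"
    print(f"Stage {stage}: use {trees}{note}")
```

For three trees this reproduces the stages listed on the slide: nodes covered only by the larger trees catch up in later stages on the parts that the partial trees delivered earlier to the faster nodes.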

Properties of Our Algorithm
1. Stable for tree topologies (whose links have the same capacity in both directions)
  - Every node receives the maximum bandwidth
2. For any topology, achieves greater aggregate bandwidth than the baseline algorithm (FPFR)
  - Fully utilizes link capacity by using partial trees
3. Has a small calculation cost to create a broadcast plan


(1) Simulations
- Simulated 5 broadcast algorithms using a real topology
- Compared the aggregate bandwidth of each method
  - Many bandwidth distributions
  - Broadcast to 10, 50, and 100 nodes
  - 10 kinds of conditions (src, dest)
[Figure: topology with clusters of 110 nodes, 81 nodes, 36 nodes, and 4 nodes]

Compared Algorithms
- Flat Tree
- Random
- Dijkstra
- Depth-First (FPFR)
- Ours

Result of Simulations
- Mixed two kinds of links (100 and 1000)
- Vertical axis: speedup from FlatTree
- With 100 nodes: 40 times more than Random, 3 times more than Depth-First (FPFR)

Result of Simulations (cont.)
- Tested 8 bandwidth distributions
  - Uniform distribution ( )
  - Uniform distribution ( )
  - Mixed 100 and 1000 links
  - Uniform distribution ( ) between switches
  - For each distribution, tested two conditions: link bandwidths the same in both directions, and different in the two directions
- Our method achieved the largest bandwidth in 7/8 cases
  - Large improvement, especially with large bandwidth variance
  - With a uniform distribution ( ) and different link bandwidths in the two directions, Dijkstra achieved 2% more aggregate bandwidth

(2) Real Machine Experiment
- Performed broadcasts in 4 clusters
  - Number of destinations: 10, 47, and 105 nodes
  - Bandwidth of each link: (10M - 1Gbps)
- Compared the aggregate bandwidth of 4 algorithms
  1. Our algorithm
  2. Depth-first (FPFR)
  3. Dijkstra
  4. Random (best among 100 trials)

Theoretical Maximum Aggregate Bandwidth
- We also calculated the theoretical maximum aggregate bandwidth
  - The total receiving bandwidth when the source makes a separate direct transfer to each destination
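For a tree topology, this bound can be computed by taking the bottleneck along the unique path from the source to each destination and summing the results. The sketch below assumes the topology is given as a child-to-parent map with per-link capacities; the function names, the switch name `sw`, and all capacities are hypothetical.

```python
def path_bottleneck(parent, capacity, node, source):
    """Smallest link capacity on the path from source down to node."""
    bw = float("inf")
    while node != source:
        up = parent[node]
        bw = min(bw, capacity[(up, node)])
        node = up
    return bw


def theoretical_max(parent, capacity, source, destinations):
    """Sum of single-transfer bandwidths from the source to each destination."""
    return sum(path_bottleneck(parent, capacity, d, source) for d in destinations)


# Hypothetical topology: source S and destinations D0, D1 behind one switch.
parent = {"sw": "S", "D0": "sw", "D1": "sw"}
capacity = {("S", "sw"): 1000, ("sw", "D0"): 1000, ("sw", "D1"): 100}
print(theoretical_max(parent, capacity, "S", ["D0", "D1"]))
```

Each destination is evaluated as if it were the only receiver, so the sum is an upper bound on what a broadcast to all destinations can achieve.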

Evaluation of Aggregate Bandwidth
- For the 105-node broadcast, 2.5 times more bandwidth than the baseline algorithm DepthFirst (FPFR)
- However, our algorithm reached only 50-70% of the theoretical maximum aggregate bandwidth
  - Computational nodes cannot fully utilize the up/down network

Evaluation of Stability
- Compared the aggregate bandwidth of 9 nodes before and after adding one slow node
  - Unlike DepthFirst (FPFR), existing nodes do not suffer from adding a slow node in our algorithm
  - Achieved 1.6 times the bandwidth of Dijkstra


Conclusion
- Introduced the notion of Stable Broadcast
  - Slow nodes never degrade the receiving bandwidth of fast nodes
- Proposed a stable broadcast algorithm for tree topologies
  - Theoretically proved
  - 2.5 times the aggregate bandwidth in real machine experiments
  - Confirmed speedup in simulations with many different conditions

Future Work
- An algorithm that maximizes aggregate bandwidth in general graph topologies
- An algorithm that changes the relay schedule by detecting bandwidth fluctuations


All the graphs

Broadcast with BitTorrent [†]
- BitTorrent gradually improves the transfer schedule by adaptively choosing the parent node
- Since the relaying structure created by BitTorrent has many branches, some links may become bottlenecks
[Figure: transfer tree snapshot with a bottleneck link]
[†] Wei et al. Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent. (In GRID '05)

Simulation
- Uniform distribution ( ) between switches
- Vertical axis: speedup from FlatTree
- For the 100-node broadcast: 36 times more than FlatTree, 1.2 times more than DepthFirst (FPFR)

Topology-unaware Pipeline
- Trace all the destinations from the source
  - Links used by many transfers become bottlenecks

Depth-first Pipeline [†]
- Construct a depth-first pipeline by using topology information
  - Avoids link sharing by using each link only once
  - Minimizes the completion time in a tree topology
- Slow nodes degrade the performance of other nodes
[†] Shirai et al. A Fast Topology Inference - A building block for network-aware parallel computing. (HPDC 2007)
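A depth-first pipeline can be obtained by walking the tree topology depth-first from the source and chaining the computational nodes in the order they are visited. The sketch below assumes the topology is an adjacency list that includes switches; the function name, node names, and topology are hypothetical.

```python
def depth_first_pipeline(adj, source, hosts):
    """Return the relay order source -> host -> host ... from a depth-first walk."""
    order = [source]
    seen = {source}

    def visit(u):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v in hosts:  # only computational nodes handle messages
                    order.append(v)
                visit(v)

    visit(source)
    return order


# Hypothetical tree topology with two switches (sw1, sw2) and three hosts.
adj = {
    "S": ["sw1"],
    "sw1": ["S", "A", "sw2"],
    "sw2": ["sw1", "B", "C"],
    "A": ["sw1"],
    "B": ["sw2"],
    "C": ["sw2"],
}
print(depth_first_pipeline(adj, "S", {"A", "B", "C"}))
```

Because consecutive hosts in a depth-first order are neighbours in the walk, no link carries the stream more than once in the same direction in this example, which is the property the slide refers to as avoiding link sharing.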

Dijkstra Algorithm [†]
- Construct a relaying structure in a greedy manner
  - Add nodes one by one, each time choosing the node reachable with the maximum bandwidth
  - Effects of slow nodes are small
- Some links may be used by many transfers and may become bottlenecks
[†] Wang et al. A novel data grid coherence protocol using pipeline-based aggressive copy method. (GPC, pages 484–495, 2007)
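The greedy rule on this slide resembles a widest-path variant of Dijkstra's algorithm, in which the next node added is the one reachable with the largest bottleneck bandwidth. The sketch below shows that variant under simplifying assumptions: every node can relay, each directed link is a direct hop between relaying nodes, and link sharing between transfers is ignored. It is not claimed to match the cited protocol; names and capacities are hypothetical.

```python
import heapq


def widest_path_tree(capacity, source):
    """Return ({child: parent}, {node: best bottleneck bandwidth from source})."""
    adj = {}
    for (a, b), c in capacity.items():
        adj.setdefault(a, []).append((b, c))

    best = {source: float("inf")}
    parent = {}
    done = set()
    heap = [(-best[source], source)]  # max-heap via negated bandwidth
    while heap:
        neg_bw, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry for a node that was already added
        done.add(u)
        for v, c in adj.get(u, []):
            bw = min(-neg_bw, c)  # bottleneck of the path through u
            if v not in done and bw > best.get(v, 0):
                best[v] = bw
                parent[v] = u
                heapq.heappush(heap, (-bw, v))
    return parent, best


# Hypothetical capacities: B is better reached through A than directly.
capacity = {("S", "A"): 1000, ("S", "B"): 10, ("A", "B"): 100}
parent, best = widest_path_tree(capacity, "S")
print(parent, best["B"])
```

In this made-up example the fast node A is attached first and B is then relayed through A, so the slow direct link from S to B does not hold A back.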