1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura, Takashi Chikayama University of Tokyo.

2 Background: New environments for parallel and distributed computation (clusters, clusters of clusters, Grid) offer scalability and good cost performance. Setting up computation in such environments is complex, however: programs and data must be installed.

3 Setting up computation in DS: Setup often involves copying large programs/data to many nodes. Manually copying large files is troublesome because: faults occur easily; firewalls block (some) connections; transfers must be scheduled carefully for good performance.

4 Contribution: NetSync, a file replicator optimized for copying large data to many nodes in parallel (application-level). Features: automatic load balancing (for scalability); self-stabilizing construction of the transfer route (for fault tolerance); adaptive optimization of the transfer route; no reliance on physical topology information.

5 Outline: What are efficient/inefficient transfer routes? Demo. Algorithm (base algorithm, adaptive optimization). Implementation. Experiments. Related work. Summary and future work.

6 Inefficient Transfer Routes: Many inter-subnet/cluster transfer connections; many branches. [Figure legend: node, subnet/cluster, data transfer line]

7 What's Wrong with Branches? Branches share the hardware capability of the node itself (CPU power, disk performance, NIC ability) and increase the likelihood of a bottleneck. [Figure: a node (CPU, NIC, disk) with one child sends at 100 Mbps x1, no bottleneck; with three children it sends at 33 Mbps x3, bottleneck]
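The numbers in the figure on this slide follow from simple division. A minimal Python sketch, assuming the parent's NIC bandwidth is split evenly among its children (the function name is hypothetical and not part of NetSync):

```python
def per_child_mbps(nic_mbps: float, children: int) -> float:
    """Rate each child receives if the parent's NIC is shared evenly.

    Assumption: the NIC is the only bottleneck and sharing is perfectly fair.
    """
    if children < 1:
        raise ValueError("children must be >= 1")
    return nic_mbps / children


# One child gets the full 100 Mbps; three children get about 33 Mbps each.
for n in (1, 3):
    print(n, "children:", round(per_child_mbps(100, n)), "Mbps each")
```

Because data is relayed onward by each child, a slow link near the root limits every node below it, which is why the slides favor routes with few branches.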

8 Efficient Transfer Route: Minimum inter-subnet/cluster transfer connections; no or minimum branches. [Figure legend: node, subnet/cluster, data transfer line]

9 Demo: Playback of our experiment using logs. [Figure: nodes A00 to A07 and B00 to B07; legend: CXX = node, arrow = data flow (parent-child)]

10 System Overview [Figure labels: User; Order(A.dat, 1GB); A.dat (1GB)]

11 Algorithm: Simple base algorithm (fault tolerance, scalability, self-stabilization). Add-on adaptive optimization heuristics (well adapted to today's typical networks). Very easy configuration: needs information about only (some) neighbors; needs no physical topology; needs no performance measurement. Pseudo-code is described in our paper.

12 Base Algorithm (1): Each node seeks a node to be its parent. Data is transferred in a pipeline through all nodes. A fault causes the node to seek a new parent again. [Figure: per-node progress percentages between 0% and 100% along the pipeline]

13 Base Algorithm (2)
Pseudo code (simplified):
  while (not has complete data)
    parent doesn't exist -> seek candidate
      if found candidate then
        ask candidate if it can be the parent
          OK -> start to get data from the parent
          NG -> seek candidate again
      end
    parent timed out -> seek candidate again
  end

14 Base Algorithm (2)
Child side (has no parent yet):
  send ASK to a candidate to be its parent
  recv OK -> start getting data
  recv NG -> seek a candidate again
Parent side (received an ASK message):
  recv ASK from a node ->
    if my offset > node's offset and # of children < LIMIT_CHILDREN then
      send OK and start putting data
    else
      send NG
    end
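The ASK/OK/NG exchange can be sketched in Python as an in-memory simulation. This is an illustration under stated assumptions, not the NetSync implementation (which is written in Java): the `Node` class, `handle_ask`, `seek_parent`, and the value of `LIMIT_CHILDREN` are hypothetical, and networking, timeouts, and the actual data transfer are left out.

```python
LIMIT_CHILDREN = 1  # assumed value; the slides only name the constant


class Node:
    def __init__(self, name, offset=0):
        self.name = name
        self.offset = offset      # how many bytes of the file this node holds
        self.children = []
        self.parent = None

    def handle_ask(self, asker):
        """Parent side: accept only if we are ahead and have a free child slot."""
        if self.offset > asker.offset and len(self.children) < LIMIT_CHILDREN:
            self.children.append(asker)
            return "OK"
        return "NG"


def seek_parent(node, candidates):
    """Child side: ask candidates in turn until one answers OK."""
    for cand in candidates:
        if cand is node:
            continue
        if cand.handle_ask(node) == "OK":
            node.parent = cand    # would start getting data here
            return cand
    return None                   # all NG: the caller seeks again later


source = Node("source", offset=100)
a, b = Node("a"), Node("b")
print(seek_parent(a, [source, b]).name)   # source accepts a
print(seek_parent(b, [source, a]))        # source is full, a holds no data yet
a.offset = 50                             # a has received some data
print(seek_parent(b, [source, a]).name)   # now a can serve b
```

The offset comparison prevents a node from choosing a parent that has no more data than it does, which also rules out cycles in the transfer route.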

15 Adaptive Optimization: Two heuristics, NearParent and Tree2List.

16 NearParent Heuristic: Reduce "long" connections. Each node changes its parent to a closer node. [Figure: self switches from its current parent to a closer candidate]
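A minimal Python sketch of the NearParent rule, assuming closeness is given by some distance function (smaller means closer). The names `near_parent`, `distance`, and the latency table are hypothetical; the check that the new parent actually accepts the node (offset and child limit) is omitted here.

```python
def near_parent(node, current_parent, candidates, distance):
    """Return the closest node among the current parent and the candidates."""
    best = current_parent
    for cand in candidates:
        if cand != node and distance(node, cand) < distance(node, best):
            best = cand
    return best


# Illustrative latencies in milliseconds (made-up values, not measurements).
latency_ms = {
    ("B01", "A00"): 20.0,   # current parent, in another cluster
    ("B01", "A03"): 21.0,
    ("B01", "B00"): 0.3,    # candidate in the same cluster
}


def distance(a, b):
    return latency_ms[(a, b)]


print(near_parent("B01", "A00", ["A03", "B00"], distance))
```

In this example B01 drops its long connection to A00 in favor of B00, which removes one inter-cluster connection.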

17 Tree2List Heuristic: Reduce branches. If the current parent is not closer than one of the node's siblings X, the node changes its parent to X. A node that has more than one child suggests that its children change their parent to one of their siblings. [Figure: self moves from parent to sibling X]
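The effect of Tree2List can be sketched in Python on a parent map. This is a sequential toy simulation under assumptions: `tree2list_step` and `run_tree2list` are hypothetical names, nodes act one at a time, and the data-offset check from the base algorithm is ignored.

```python
def tree2list_step(parent_of, node, distance):
    """Move `node` under a sibling X if its parent is not closer than X."""
    p = parent_of[node]
    if p is None:                      # the root has no parent to change
        return False
    siblings = [n for n, q in parent_of.items() if q == p and n != node]
    for x in siblings:
        if distance(node, x) <= distance(node, p):
            parent_of[node] = x
            return True
    return False


def run_tree2list(parent_of, distance):
    """Apply the step to every node until nothing changes."""
    changed = True
    while changed:
        changed = False
        for node in list(parent_of):
            if tree2list_step(parent_of, node, distance):
                changed = True
    return parent_of


# A star inside one cluster: r has three children, all equally close.
tree = {"r": None, "a": "r", "b": "r", "c": "r"}
print(run_tree2list(tree, lambda u, v: 1.0))
```

Each move pushes a node one level deeper and never places it under its own descendant, so the loop terminates; here the star turns into the list r, c, b, a.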

18 How to measure closeness? Features: throughput; latency; prefix of IP address. [Figure: nodes A, B, C]
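Of the three features, the IP address prefix is the only one that needs no measurement. A minimal Python sketch of one way to compare prefixes, using the standard `ipaddress` module; the function name and the sample addresses are hypothetical, and the slides do not say how NetSync combines this with throughput and latency.

```python
import ipaddress


def common_prefix_bits(addr_a: str, addr_b: str) -> int:
    """Number of leading bits two IPv4 addresses share (0 to 32)."""
    diff = int(ipaddress.IPv4Address(addr_a)) ^ int(ipaddress.IPv4Address(addr_b))
    return 32 - diff.bit_length()


# A longer shared prefix is treated as "closer" in this sketch.
print(common_prefix_bits("192.168.1.10", "192.168.1.20"))  # same /24
print(common_prefix_bits("192.168.1.10", "192.168.2.10"))  # neighboring /24
```

A longer shared prefix suggests, but does not guarantee, that two nodes sit in the same subnet.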

19 Property of Heuristics (1): Assuming there is no firewall: 1. Minimum inter-cluster/subnet connections. 2. All nodes are connected to each other as a list. [Figure legend: subnet/cluster]
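Both properties can be checked on a concrete route. A minimal Python sketch, assuming the route is stored as a child-to-parent map and each node is labeled with its subnet or cluster (`inter_group_edges` and the sample data are hypothetical):

```python
def inter_group_edges(parent_of, group_of):
    """Count parent-child connections that cross a subnet/cluster boundary."""
    return sum(
        1
        for child, parent in parent_of.items()
        if parent is not None and group_of[child] != group_of[parent]
    )


group_of = {n: n[0] for n in ("A0", "A1", "A2", "B0", "B1", "C0")}

# A list that finishes one group before entering the next: 2 crossings.
good = {"A0": None, "A1": "A0", "A2": "A1", "B0": "A2", "B1": "B0", "C0": "B1"}
# A route that bounces between groups: every connection crosses a boundary.
bad = {"A0": None, "B0": "A0", "A1": "B0", "B1": "A1", "C0": "A0", "A2": "C0"}

print(inter_group_edges(good, group_of), inter_group_edges(bad, group_of))
```

With three groups, two crossings is the fewest any connected route can have, since each group other than the source's must be entered at least once.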

20 Property of Heuristics (2): If a firewall blocks some connections: 1. Minimum inter-cluster/subnet connections. 2. N – 1 branches for N subnets (assuming no firewalls inside a subnet). [Figure legend: subnet/cluster, firewall]

21 Property of Heuristics (2): If there is no firewall: the distribution tree becomes an MST (minimum spanning tree); inter-group connections are minimized at any scale; all nodes are connected to each other as a list. [Figure legend: subnet, cluster]

22 Property of Heuristics (3): If multiple levels of groups exist (subnets, clusters), the heuristics optimize all levels simultaneously: minimum inter-subnet edges; minimum inter-cluster edges. [Figure legend: firewall, subnet, cluster]

23 Property of Heuristics (3): If multiple levels of groups exist (subnets, clusters), the heuristics optimize all levels simultaneously: minimum inter-subnet edges; minimum inter-cluster edges. [Figure legend: subnet, cluster]

24 Implementation: A file replicator for large data and many nodes, written in Java. Latency detection capability: about 1 ms. Usage: install and run NetSync on all nodes; send the file information to several nodes; wait for the replication to finish. Very simple usage!

25 Experiments: Measured the performance of our heuristics by distributing a file to many nodes and comparing completion times. Environments: a single cluster; multiple clusters.

26 Experiment in a single cluster (1): Distributed 500 MB from one node to the other 16 nodes in the cluster; only the NIC (100 Mbps) can be a bottleneck. Compared two settings: Random Tree (base algorithm only, with the limit on # of children varied from 1 to 5); Tree2List (NearParent has no effect).
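For orientation, a rough lower bound on completion time can be computed from the figures on this slide. The Python sketch below is a back-of-the-envelope model, not a measured result: it assumes 1 MB = 10^6 bytes, no protocol or disk overhead, an even split of the NIC among children, and negligible pipeline start-up delay. The function name is hypothetical.

```python
def ideal_seconds(size_mb: float, nic_mbps: float, children: int = 1) -> float:
    """Time to push `size_mb` megabytes to each child over a shared NIC."""
    megabits = size_mb * 8
    return megabits / (nic_mbps / children)


# 500 MB over a 100 Mbps NIC, for child limits 1 to 5.
for limit in range(1, 6):
    print(limit, "children:", ideal_seconds(500, 100, limit), "s")
```

Under these assumptions a pure list needs at least 40 seconds, and a node that serves five children at once needs at least 200 seconds.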

27 Experiment in a single cluster (2): Fewer children give better performance. Tree2List is very close to optimal. A limit of 1 is not scalable (using our base algorithm).

28 Experiment in multiple clusters (1): Distributed 300 MB to over 150 nodes in seven clusters. Compared heuristics on, heuristics off, and a fixed, manually optimized tree. [Figure: links between clusters labeled 1G and 100M]

29 Experiment in multiple clusters (2): Our heuristics perform close to the ideal fixed tree.

30 Related Work: Application-level multicast (Overcast [Jannotti], ALMI [Pendarakis], etc.) aims to optimize bandwidth and latency. Content distribution networks (CDNs) have their roots in HTTP accelerators and HTTP proxies and aim to optimize latency and load balancing. Our approach maximizes throughput, even at the cost of latency.

31 Summary and Future Work: We designed a simple algorithm for copying large data to many nodes in parallel, with fault tolerance, scalability, self-organization, and adaptive optimization. Evaluations show that our implementation is effective in a real environment. Future work: integration with content search, or with storage systems for distributed computing.