Large Scale File Distribution Troy Raeder & Tanya Peters.



The Problem
- Distribute a large file to some number of machines: useful for deploying new programs and distributing data.
- chirp_distribute was implemented last year and distributes files using a spanning tree.
- We want to improve on the existing method to transfer files more efficiently.
- Choke points exist: multiple machines may all transfer files through a single router or switch.
- We also want to minimize failures, including permissions errors.
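The spanning-tree approach mentioned above can be sketched as follows. This is a minimal illustration, not the actual chirp_distribute code: each host that already holds a complete copy transfers to one host that does not, so the number of copies can roughly double each round. `transfer` is a hypothetical stand-in for a real file copy.

```python
def transfer(src, dst):
    # Hypothetical placeholder for an actual chirp file transfer.
    print(f"{src} -> {dst}")

def spanning_tree_distribute(source, targets):
    have = [source]          # hosts that hold a complete copy
    need = list(targets)     # hosts still waiting for the file
    rounds = 0
    while need:
        # Pair each current holder with one waiting host.
        pairs = list(zip(have, need))
        for src, dst in pairs:
            transfer(src, dst)
        done = [dst for _, dst in pairs]
        have.extend(done)
        need = need[len(done):]
        rounds += 1
    return rounds

# 1 source plus 7 targets finish in 3 rounds (1 -> 2 -> 4 -> 8 copies).
print(spanning_tree_distribute("host0", [f"host{i}" for i in range(1, 8)]))
```

The doubling is what makes the tree efficient in principle; the problem described above is that the pairing ignores where hosts sit in the network.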

The Solution
- Take advantage of network topology: transfer across routers and switches as early as possible, then let machines in the same cluster transfer to each other.
- Using traceroute, we build a graph that represents the network. This is done as needed and saved to a file that is loaded at run time.
- Access control lists: if we know a source machine doesn't have permission to transfer to some target, don't even try.
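The traceroute-based graph construction might look like the sketch below. This is an assumption about the data layout, not the actual implementation: hop lists are taken as already parsed from traceroute output, edges connect consecutive hops, and hosts sharing a last-hop router/switch are grouped into one cluster.

```python
from collections import defaultdict

def build_topology(routes):
    """routes maps hostname -> list of router hops from the source to that host."""
    edges = set()
    clusters = defaultdict(list)
    for host, hops in routes.items():
        path = hops + [host]
        # Consecutive hops form the edges of the network graph.
        for a, b in zip(path, path[1:]):
            edges.add((a, b))
        # The last router/switch before the host identifies its cluster.
        clusters[hops[-1]].append(host)
    return edges, dict(clusters)

routes = {
    "node1": ["gw", "switchA"],
    "node2": ["gw", "switchA"],
    "node3": ["gw", "switchB"],
}
edges, clusters = build_topology(routes)
print(clusters["switchA"])   # ['node1', 'node2']
```

Saving `edges` and `clusters` to a file, as the slide describes, avoids re-running traceroute on every invocation.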

Network Topology

Picking a Target
1. Check whether every cluster in the graph contains a copy of the file. If some cluster does not, copy to it.
2. Next, if some node within your own cluster doesn't have the file, transfer to it.
3. Otherwise, pick some other node that doesn't have the file.
If a node is unable to transfer to nodes that don't yet have the file, it is removed from the list of possible sources.
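The three-step priority above can be written directly as a selection function. The data structures here are assumptions for illustration: `clusters` maps a cluster name to its set of nodes, and `has_file` is the set of nodes that already hold a copy.

```python
def pick_target(source_cluster, clusters, has_file):
    # 1. Prefer a cluster that has no copy of the file at all.
    for name, nodes in clusters.items():
        if not nodes & has_file:
            return next(iter(nodes))
    # 2. Otherwise, a node in our own cluster that still lacks the file.
    local = clusters[source_cluster] - has_file
    if local:
        return next(iter(local))
    # 3. Otherwise, any remaining node without the file.
    remaining = set().union(*clusters.values()) - has_file
    return next(iter(remaining)) if remaining else None

clusters = {"A": {"a1", "a2"}, "B": {"b1"}}
print(pick_target("A", clusters, {"a1"}))          # cluster B is empty -> b1
print(pick_target("A", clusters, {"a1", "b1"}))    # local node lacks file -> a2
```

Step 1 is what pushes copies across routers early; steps 2 and 3 then keep traffic inside a cluster whenever possible.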

Initial Results
- The current version of the algorithm doesn't always do better.
- As expected, for smaller files and/or smaller numbers of hosts, the overhead costs us.
- For larger files and/or larger numbers of hosts, things like timeouts can wash out the relative gains.

What's Next...
- Pick the source and target more intelligently: if an initial attempt to copy from some cluster A to cluster B fails, don't try transferring between those two clusters again unless no other possibilities exist.
- Try to manage straggler transfers: dynamically set the timeout for transferring a single copy to some multiple of the maximum or average transfer time seen so far.
- The end result, hopefully, is a significant improvement over the existing algorithm.
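The dynamic timeout idea could be as simple as the sketch below. The multiplier and floor values are illustrative assumptions, not tuned numbers from the talk.

```python
def dynamic_timeout(observed_times, multiplier=3.0, floor=10.0):
    """Timeout for the next transfer: a multiple of the slowest one seen so far."""
    if not observed_times:
        return floor                     # no history yet: fall back to a default
    return max(floor, multiplier * max(observed_times))

# After seeing transfers of 2.0s, 4.5s, and 3.1s, allow 3x the slowest.
print(dynamic_timeout([2.0, 4.5, 3.1]))   # 13.5
```

Keying the timeout to observed transfer times lets the distributor kill stragglers quickly on a fast network without causing spurious failures on a slow one.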