Outline Problem DiskRouter Overview Details Real life DiskRouters

Slides:



Advertisements
Similar presentations
NAGIOS AND CACTI NETWORK MANAGEMENT AND MONITORING SYSTEMS.
Advertisements

Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki
Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.
Esma Yildirim Department of Computer Engineering Fatih University Istanbul, Turkey DATACLOUD 2013.
Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.
RDMA ENABLED WEB SERVER Rajat Sharma. Objective  To implement a Web Server serving HTTP client requests through RDMA replacing the traditional TCP/IP.
The Difficulties of Distributed Data Douglas Thain Condor Project University of Wisconsin
Grid IO APIs William Gropp Mathematics and Computer Science Division.
© 2009 IBM Corporation Statements of IBM future plans and directions are provided for information purposes only. Plans and direction are subject to change.
Lecture 1, 1Spring 2003, COM1337/3501Computer Communication Networks Rajmohan Rajaraman COM1337/3501 Textbook: Computer Networks: A Systems Approach, L.
Path selection Packet scheduling and multipath Sebastian Siikavirta and Antti aalto.
GridFTP Guy Warner, NeSC Training.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
How to Resolve Bottlenecks and Optimize your Virtual Environment Chris Chesley, Sr. Systems Engineer
CS An Overlay Routing Scheme For Moving Large Files Su Zhang Kai Xu.
Experiences in Design and Implementation of a High Performance Transport Protocol Yunhong Gu, Xinwei Hong, and Robert L. Grossman National Center for Data.
Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago.
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.
Virtualization in the NCAR Mass Storage System Gene Harano National Center for Atmospheric Research Scientific Computing Division High Performance Systems.
Outline Overview Video Format Conversion Connection with An authentication Streaming media Transferring media.
Building a Parallel File System Simulator E Molina-Estolano, C Maltzahn, etc. UCSC Lab, UC Santa Cruz. Published in Journal of Physics, 2009.
Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.
Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
Example: Sorting on Distributed Computing Environment Apr 20,
STORK: Making Data Placement a First Class Citizen in the Grid Tevfik Kosar and Miron Livny University of Wisconsin-Madison March 25 th, 2004 Tokyo, Japan.
Latest news on JXTA and JuxMem-C/DIET Mathieu Jan GDS meeting, Rennes, 11 march 2005.
George Kola Computer Sciences Department University of Wisconsin-Madison DiskRouter: A Mechanism for High.
1 On Dynamic Parallelism Adjustment Mechanism for Data Transfer Protocol GridFTP Takeshi Itou, Hiroyuki Ohsaki Graduate School of Information Sci. & Tech.
LEGS: A WSRF Service to Estimate Latency between Arbitrary Hosts on the Internet R.Vijayprasanth 1, R. Kavithaa 2,3 and Raj Kettimuthu 2,3 1 Coimbatore.
Tevfik Kosar Computer Sciences Department University of Wisconsin-Madison Managing and Scheduling Data.
GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid
STORK: Making Data Placement a First Class Citizen in the Grid Tevfik Kosar University of Wisconsin-Madison May 25 th, 2004 CERN.
GridFTP Richard Hopkins
Ultimate Integration Joseph Lappa Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop.
Problem-solving on large-scale clusters: theory and applications Lecture 4: GFS & Course Wrap-up.
BNL Service Challenge 3 Status Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.
A Fully Automated Fault- tolerant System for Distributed Video Processing and Off­site Replication George Kola, Tevfik Kosar and Miron Livny University.
Module 9 Planning and Implementing Monitoring and Maintenance.
Data Transfer Service Challenge Infrastructure Ian Bird GDB 12 th January 2005.
Bulk Data Transfer Activities We regard data transfers as “first class citizens,” just like computational jobs. We have transferred ~3 TB of DPOSS data.
Globus Data Storage Interface (DSI) - Enabling Easy Access to Grid Datasets Raj Kettimuthu, ANL and U. Chicago DIALOGUE Workshop August 2, 2005.
GridFTP Guy Warner, NeSC Training Team.
1 GridFTP and SRB Guy Warner Training, Outreach and Education Team, Edinburgh e-Science.
George Kola Computer Sciences Department University of Wisconsin-Madison Data Pipelines: Real Life Fully.
Reliable and Efficient Grid Data Placement using Stork and DiskRouter Tevfik Kosar University of Wisconsin-Madison April 15 th, 2004.
Run-time Adaptation of Grid Data Placement Jobs George Kola, Tevfik Kosar and Miron Livny Condor Project, University of Wisconsin.
BDTS and Its Evaluation on IGTMD link C. Chen, S. Soudan, M. Pasin, B. Chen, D. Divakaran, P. Primet CC-IN2P3, LIP ENS-Lyon
High Performance Storage System (HPSS) Jason Hick Mass Storage Group HEPiX October 26-30, 2009.
Scott Koranda, UWM & NCSA 20 November 2016www.griphyn.org Lightweight Replication of Heavyweight Data Scott Koranda University of Wisconsin-Milwaukee &
Simulation Production System
WP18, High-speed data recording Krzysztof Wrona, European XFEL
The demonstration of Lustre in EAST data system
Diskpool and cloud storage benchmarks used in IT-DSS
Windows Azure Migrating SQL Server Workloads
Network Requirements Javier Orellana
Enabling High Speed Data Transfer in High Energy Physics
CS703 - Advanced Operating Systems
Grid Canada Testbed using HEP applications
Part Three: Data Management
File Transfer Issues with TCP Acceleration with FileCatalyst
Brian L. Tierney, Dan Gunter
Outline What users want ? Data pipeline overview
STORK: A Scheduler for Data Placement Activities in Grid
Beyond FTP & hard drives: Accelerating LAN file transfers
An Empirical Evaluation of Wide-Area Internet Bottlenecks
Summer 2002 at SLAC Ajay Tirumala.
Single path routing in most of the servers
Implementation Plan system integration required for each iteration
L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher
Presentation transcript:

DiskRouter: A Mechanism for High Performance Large Scale Data Transfers

Outline Problem DiskRouter Overview Details Real life DiskRouters Experiments

Problem SDSC to NCSA Bottleneck Bandwidth : 12.5 MBPS Latency 67 ms Transfer Rate got by applications for a 1GB file Scp : 0.66MBPS GridFTP(1 stream) : 0.85 MBPS GridFTP(10 streams) : 3.52 MBPS

DiskRouter Overview Mechanism to efficiently move large amounts of data (order of terabytes) Uses disk as a buffer to aid in large scale data transfers Application-level overlay network used for routing Ability to use higher level knowledge for data movement

A is transferring a large amount of data to B A Simple Case A B A is transferring a large amount of data to B

C is an intermediate node between A and B A Simple Case A B C DiskRouter C is an intermediate node between A and B

A Simple Case with DiskRouter B C With DiskRouter DiskRouter Without DiskRouter Improves performance when bandwidth fluctuation between A and C is independent of the bandwidth fluctuation between C and B

Data Mover/Distributed Cach e Destination DiskRouter Cloud Source Source writes to the closest DiskRouter and Destination receives it up from its closest DiskRouter

Outline Problem DiskRouter Overview Details Real life DiskRouters Experiments

Routing Between DiskRouters DiskRouter C DiskRouter A DiskRouter B C need not be in the path between A and B

Network Monitoring Uses ‘Pathrate’ for estimating network capacity Performs actual transfers for measurement Logging the data rate seen by different components Generate network interface stats on the machines involved in the transfers

Implementation Details Uses multiple sockets and explicitly sets TCP buffer sizes Overlaps disk I/O and socket I/O

Client Side Client library provided Applications can call library functions for network I/O Functions provided for common case file transfer (overlaps network I/O and disk I/O) Third party transfer support

Outline Problem DiskRouter Overview Details Real life DiskRouters Experiments

Real Life DiskRouters UW Milwaukee SDSC NCSA INFN Italy StarLight 90 Mbps 3.3 ms INFN Italy 411 Mbps 8 ms 518 Mbps 67 ms 94 Mbps 2.7 ms 30 Mbps 126.6 ms 90 Mbps 5.5 ms 514 Mbps 0.85 ms StarLight UW Madison MCS ANL

Outline Overview Details Real Life DiskRouters Experiments

Testing Multiroute UW Milwaukee StarLight UW Madison 90 Mbps 3.3 ms

Multiroute Improves Performance Total Data into Starlight Data From Milwaukee Megabits/second Data From Madison

SRB to Unitree Transfer Using Stork Data movement from SDSC to NCSA via Starlight (3 TB of data had to be moved) Integrated into Stork Found significant performance gain

Link between SDSC and NCSA 518 Mbps 67 ms 94 Mbps 2.7 ms StarLight

Starlight DiskRouter Stats Data Inflow Data Outflow Memory Used Disk Used

End-to-End Data Rate Seen by Stork(MBPS) vs. Time GridFTP vs DiskRouter End-to-End Data Rate Seen by Stork(MBPS) vs. Time DiskRouter GridFTP Megabytes/second

A Glimpse of Performance Transfer of 1 GB file from SDSC (SanDiego) to NCSA (Urbana-Champaign) Tool Transfer Rate Scp 0.66 MBPS GridFTP(1 stream) 0.85 MBPS GridFTP(10 streams) 3.52 MBPS DiskRouter 10.77 MBPS

Work In Progress Computation on data streams in the DiskRouter Ability to perform computation in the nodes attached locally to the DiskRouter Working together with Stork to add intelligence to data movement

Questions Thanks for listening