Modeling and Optimizing Large-Scale Wide-Area Data Transfers


Modeling and Optimizing Large-Scale Wide-Area Data Transfers Raj Kettimuthu, Gayane Vardoyan, Gagan Agrawal, and P. Sadayappan

Exploding data volumes

Astronomy: MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB. Climate: 36 TB in 2004, 2,300 TB in 2012. Genomics: a 10^5 increase in data volumes in 6 years, to 100,000 TB.

Data movement

Datasets must frequently be transported over the WAN for analysis, visualization, and archival. Data movement bandwidths are not increasing at the same rate as dataset sizes, which is a major constraint for data-driven sciences. (Data movement bandwidth here includes disk speeds, NIC rates, and WAN rates.) File transfer is the dominant data transfer mode, and GridFTP is widely used by scientific communities: thousands of servers are deployed worldwide and move more than 1 PB per day. It is important not only to understand the characteristics of these transfers but also to be able to control and optimize them.

GridFTP

GridFTP is a high-performance, secure data transfer protocol optimized for high-bandwidth wide-area networks. It is based on the FTP protocol and defines extensions for high-performance operation and security. The Globus implementation of GridFTP is widely used, and Globus GridFTP servers support usage statistics collection: the transfer type, size in bytes, start time, transfer duration, etc. are collected for each transfer.

GridFTP usage log

To understand trends associated with WAN transfers, we looked at the GridFTP usage logs for a 24-hour period for the top 10 sites that transferred the most data. The transfer patterns show large variability: sometimes there are no transfers, and sometimes there are many concurrent transfers. This points to the fact that the overall load can vary substantially over time. While there is substantial variation over the 24-hour period, there is more stability over shorter periods. The accompanying figure shows the variance over the entire 24-hour period, the four disjoint 6-hour periods, the 24 disjoint 1-hour periods, and each disjoint 15-minute period. Variance drops significantly for shorter durations. Thus, one can measure the performance of data transfers at a certain time and get a good indicator of the load for the immediate future.
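The shrinking-variance observation can be sketched numerically. The values below are made-up hourly loads, not the paper's data, and `windowed_variances` is a hypothetical helper: it splits the day into disjoint windows and compares each window's variance with the full-day variance.

```python
import statistics

def windowed_variances(samples, window):
    """Split an ordered series of per-interval transfer loads into
    disjoint windows of `window` samples; return each window's variance."""
    return [statistics.pvariance(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

# Hypothetical hourly load samples over 24 hours (e.g., GB moved per hour):
# stable within each 6-hour stretch, very different across stretches.
hourly = [5, 7, 6, 5, 6, 7,  50, 48, 52, 49, 51, 50,
          4, 6, 5, 5, 6, 4,  30, 32, 31, 29, 30, 31]

full_day = statistics.pvariance(hourly)   # one 24-hour window
six_hour = windowed_variances(hourly, 6)  # four disjoint 6-hour windows
# The variance within each short window is far smaller than over the full
# day, matching the stability observed at short timescales.
```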

Parallelism vs concurrency in GridFTP

[Diagram: data transfer nodes at Site A and Site B, each backed by a parallel file system and running GridFTP server processes. With parallelism = 3, a single GridFTP server process at each site carries three TCP connections for one transfer. With concurrency = 3, three GridFTP server processes run at each site, each carrying its own TCP connection.]

Parallelism vs concurrency

Concurrency turns out to be a more powerful control knob than parallelism for increasing throughput. For example, the average throughput for cc = 8 and p = 1 is >2.2 Gbps, whereas the average throughput for p = 8 and cc = 1 is <600 Mbps. As far as NIC and WAN connections are concerned, parallelism and concurrency work in the same way; the multiple processes used when concurrency is increased appear to yield better I/O performance.

Problem formulation

Objective: control bandwidth allocation for transfer(s) from a source to the destination(s). Most large transfers are between supercomputers, which have the ability to both store and process large amounts of data. When a site is heavily loaded, most of its bandwidth is consumed by a small number of sites. Goal: develop a simple model for GridFTP throughput. Source concurrency (SC): the total number of ongoing transfers between endpoint A and all of its major transfer endpoints. Destination concurrency (DC): the total number of ongoing transfers between endpoint A and endpoint B. External load (EL): all other activities on the endpoints, including transfers to other sites.

Modeling throughput

A linear model between input variables X1, X2, ..., Xk and a target variable Y is Y' = a1*X1 + a2*X2 + ... + ak*Xk + b, where Y' is the prediction of the observed value of Y for the corresponding values of Xi. We built models that consider only source and destination concurrency, with a separate model for each destination:

DT = a1*DC + a2*SC + b1
DT = a3*(DC/SC) + b2

Data to train and validate the models came from load variation experiments: we start with a baseline case in which all destinations have the same concurrency, increase the destination concurrency for one destination by 1, run the baseline case again, increase the destination concurrency for the next destination by 1, run the baseline again, and so on. We used three-fifths of the data for training and two-fifths for validation. After a model is built using the training data, it will not fit the training data perfectly; the resulting error rate is the training error. The model is then validated using the validation data; the resulting error rate is the validation error. Errors were >15% for most cases.

Some of the nonlinear dependencies between the terms (such as throughput saturation) can be captured through a model of the form Y' = X1^a1 * X2^a2 * ... * Xk^ak * 2^b. The log model used is

DT = SC^a4 * DC^a5 * 2^b3, i.e., log(DT) = a4*log(SC) + a5*log(DC) + b3

Modeling throughput

The table shows the training and validation errors for the log model and the best of the non-log models. The log-based model is clearly better: the training and validation errors went down in every single case, by up to 27%. Still, the relative error rate is around 15% in many cases; a model based on just SC and DC is too simplistic. We therefore incorporate external load: network, disk, and CPU activities outside the transfers. Two questions follow: how to measure the external load, and how to include it in the model(s)?

External load

Transfers are stable over a short duration but vary widely over the entire day. We have multiple training data points with the same SC and DC, taken on different days and at different times; throughput differences for the same SC and DC are attributed to differences in external load. Three functions for external load (EL) were considered:

EL1 = T − AT, where T is the throughput of transfer t and AT is the average throughput of all transfers with the same SC and DC as t
EL2 = T − MT, where MT is the maximum throughput among transfers with the same SC and DC as t
EL3 = T / MT
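The three EL definitions are direct arithmetic on the set of same-(SC, DC) throughputs; a minimal sketch (the function name is ours, not the paper's):

```python
def external_loads(t, peers):
    """Compute the three external-load measures for a transfer with
    throughput `t`, given `peers`: the throughputs of all transfers
    observed with the same (SC, DC) as this transfer."""
    at = sum(peers) / len(peers)  # AT: average throughput for this (SC, DC)
    mt = max(peers)               # MT: maximum throughput for this (SC, DC)
    return {
        "EL1": t - at,  # deviation from the average
        "EL2": t - mt,  # deviation from the best observed (always <= 0)
        "EL3": t / mt,  # fraction of the best observed
    }
```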

Models with external load

Linear: DT = a6*DC + a7*SC + a8*EL + b4
Log: DT = SC^a9 * DC^a10 * AEL * 2^b5, where AEL = EL^a11 if EL > 0, and |EL|^(−a11) otherwise.

Relative error rates for all destinations are less than 10% with the log + EL3 model and less than or equal to 15% for all the models.
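The piecewise AEL term and the log + EL model can be written out directly. This is a sketch of the formula as stated on the slide, with coefficient values supplied by the caller; note the slide does not say how EL = 0 is handled, so this sketch assumes EL is nonzero.

```python
def ael(el, a11):
    """Adjusted external-load term from the log model:
    EL^a11 when EL > 0, |EL|^(-a11) otherwise.
    Assumes el != 0 (the EL = 0 case is unspecified on the slide)."""
    return el ** a11 if el > 0 else abs(el) ** (-a11)

def predict_log_el(sc, dc, el, a9, a10, a11, b5):
    """Log + external-load model: DT = SC^a9 * DC^a10 * AEL * 2^b5."""
    return (sc ** a9) * (dc ** a10) * ael(el, a11) * (2 ** b5)
```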

Calculating external load in practice

In the model DT = a6*DC + a7*SC + a8*EL + b4, SC and DC are known or controlled, but unlike them, the external load EL is unknown. Multiple data points with the same SC and DC were used to train the models, but in practice there may not be any recent transfers with the same SC and DC. If there are some recent transfers, the external load is unlikely to change substantially over a few minutes, so we consider three estimates: the most recent transfer's load as the current load, the average load of transfers in the past 30 minutes, and the average load in the past 30 minutes with error correction.
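The first two estimates can be sketched in a few lines. This is an illustrative helper of our own (name and fallback behavior are assumptions, not from the paper): it averages the EL of transfers that finished inside the window, and falls back to the most recent transfer's EL when the window is empty.

```python
def estimate_external_load(history, now, window=30 * 60):
    """Estimate the current external load from recent transfers.
    `history` is a list of (finish_time_seconds, el) pairs, oldest first.
    Returns the average EL over the past `window` seconds; if no transfer
    falls in the window, falls back to the most recent transfer's EL."""
    recent = [el for ts, el in history if now - ts <= window]
    if recent:
        return sum(recent) / len(recent)
    return history[-1][1] if history else 0.0
```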

Recent transfers load with error correction

Previous Transfer method: compute EL for DT = a6*DC + a7*SC + a8*EL + b4 from the most recent known transfer. Recent Transfers method: compute EL from the transfers in the past 30 minutes. Recent Transfers with Error Correction: use DT = a6*DC + a7*SC + a8*EL + b4 + e, with the correction term e estimated from historic transfers.

Applying models to control bandwidth

Experimental setup: DTNs at 5 XSEDE sites (source: TACC; destinations: PSC, NCAR, NICS, Indiana, SDSC). Goal: control bandwidth allocation to destinations when the source is saturated. The models express throughput in terms of SC, DC, and EL; given a target throughput, we determine the DC needed to achieve it. Often more than one destination transfers data, so SC is also unknown. We limit DC to 20 to narrow the search space, but even then there is a large number of possible DC combinations (20^n). Heuristics limit the search space to (SCmax − ND + 1).
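The slide does not spell out the heuristic, so the sketch below shows one plausible reading as an assumption: if all ND destinations keep at least one transfer and total concurrency is capped at SCmax, a single destination's concurrency can take only SCmax − ND + 1 distinct values, versus the 20 values of the brute-force cap.

```python
from itertools import product

def brute_force_allocations(n_dest, dc_cap=20):
    """Naive search space: every per-destination concurrency in 1..dc_cap,
    i.e., dc_cap ** n_dest candidate tuples."""
    return product(range(1, dc_cap + 1), repeat=n_dest)

def heuristic_dc_values(sc_max, n_dest):
    """Assumed reading of the slide's heuristic: with each of the ND
    destinations holding at least one transfer and total concurrency
    capped at SCmax, one destination's concurrency ranges over
    1 .. SCmax - ND + 1, i.e., SCmax - ND + 1 values."""
    return range(1, sc_max - n_dest + 2)
```

For SCmax = 20 and ND = 5 destinations this leaves 16 candidate values per destination instead of 20 per dimension of an exhaustive 20^n sweep.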

Experiments

Ratio experiments: allocate the available bandwidth at the source to the destinations using a predefined ratio, so that each destination achieves a specific fraction of the bandwidth. Four ratio combinations were tested; they were picked based on the maximum throughputs that can be independently achieved by these destinations in various tests. Factoring experiments: increase a destination's throughput by a given factor when the source is saturated, reflecting a bandwidth increase motivated by certain priorities. Four models/methods (the log EL1/EL3 models and the RT/RTEC methods) were used, and they were effective in predicting the throughputs: 83.6% of the errors are below 15%, and 65.5% of them are below 10%.

Results – Ratio experiments

Ratios are 4:5:6:8:9 for Kraken, Mason, Blacklight, Gordon, and Yellowstone. With the log + EL1 model and the RTEC method, the concurrencies picked by the algorithm were {1,3,3,1,1}. With the log + EL3 model and the RT method, the concurrencies picked were {1,4,3,1,1}.

Results – Factoring experiments

Increasing Yellowstone's baseline throughput by 1.5x: the concurrency picked by the algorithm for Yellowstone was 3. Increasing Gordon's baseline throughput by 2x: the concurrency picked by the algorithm for Gordon was 5.

Related work

Several models exist for predicting behavior and finding the optimal number of parallel TCP streams, but they target uncongested networks or rely on simulations. Several studies developed models to find the optimal number of streams and TCP buffer size for GridFTP; buffer-size tuning is not needed with TCP autotuning. The major difference in our work is the attempt to model GridFTP throughput based on end-to-end behavior: end-system load, the destinations' capabilities, and concurrent transfers. There are many studies on bandwidth allocation at the router; our focus is application-level control.

Summary

We sought to understand the performance of WAN transfers and to control bandwidth allocation at the FTP level for transfers between major supercomputing centers. Concurrency is a more powerful control knob than parallelism. We developed models to help control bandwidth allocation: log models that combine total source concurrency, destination concurrency, and a measure of external load are effective, and methods that utilize both recent and historical experimental data are better at estimating external load.

Questions