Modeling and Adaptive Scheduling of Large-Scale Wide-Area Data Transfers
Raj Kettimuthu
Advisors: Gagan Agrawal, P. Sadayappan
Exploding data volumes
- Astronomy: MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB (approaching 100,000 TB)
- Climate: 36 TB in 2004, 3,300 TB in 2014
- Genomics: 10^5 increase in data volumes in 6 years
Data movement
[Diagram: data transfer nodes in front of storage at each site, moving data over the wide-area network]
Current work
Understand the characteristics of wide-area transfers, control and optimize them, and schedule them efficiently.
- Model – predict and control throughput
  – Characterize transfers, identify key features
  – Data-driven modeling using experimental data
- Adaptive scheduling
  – Algorithm to minimize slowdown
  – Experimental evaluation using real transfer logs
GridFTP
- High-performance, secure data transfer protocol optimized for high-bandwidth wide-area networks
- Parallel TCP streams; PKI security for authentication, integrity, and encryption; checkpointing for transfer restarts
- Based on the FTP protocol – defines extensions for high-performance operation and security
- The Globus implementation of GridFTP is widely used
- Globus GridFTP servers support usage statistics collection
  – Transfer type, size in bytes, start time, transfer duration, etc. are collected for each transfer
GridFTP usage log
Parallelism vs. concurrency in GridFTP
[Diagram: a GridFTP client opens control channels (port 2811) to GridFTP daemons on the data transfer nodes at Site A and Site B, each backed by a parallel file system. Concurrency = 2: two GridFTP server processes per side, each with its own control channel. Parallelism = 3: three TCP connections per server process.]
Parallelism vs concurrency
Model throughput and control bandwidth allocation
- Objective: control bandwidth allocation for transfer(s) from a source to its destination(s)
- Most large transfers occur between supercomputer sites
  – They can both store and process large amounts of data
- When a site is heavily loaded, most of its bandwidth is consumed by a small number of sites
- Goal: develop a simple model for GridFTP
  – Source concurrency (SC): total number of ongoing transfers between endpoint A and all of its major transfer endpoints
  – Destination concurrency (DC): total number of ongoing transfers between endpoint A and endpoint B
  – External load (EL): all other activity on the endpoints, including transfers to other sites
Modeling throughput
- Model destination throughput (DT) using source and destination concurrency
- Data to train and validate the models comes from load-variation experiments
- Linear models, of the general form Y' = a1*X1 + a2*X2 + … + ak*Xk + b:
  DT = a1*DC + a2*SC + b1
  DT = a3*(DC/SC) + b2
  Errors > 15% for most cases
- Log model (fitting sketch below):
  log(DT) = a4*log(SC) + a5*log(DC) + b3
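A minimal sketch of how such models could be fit from (SC, DC, DT) observations with ordinary least squares; the data values here are hypothetical, not from the experiments.

```python
import numpy as np

# Hypothetical training observations:
# source concurrency, destination concurrency, destination throughput (Gbps).
SC = np.array([1, 2, 2, 4, 4, 8, 8, 16])
DC = np.array([1, 1, 2, 2, 4, 4, 8, 8])
DT = np.array([0.9, 0.7, 1.5, 1.2, 2.3, 1.9, 3.1, 2.6])

# Linear model: DT = a1*DC + a2*SC + b1
X_lin = np.column_stack([DC, SC, np.ones(len(DC))])
(a1, a2, b1), *_ = np.linalg.lstsq(X_lin, DT, rcond=None)

# Log model: log(DT) = a4*log(SC) + a5*log(DC) + b3
X_log = np.column_stack([np.log(SC), np.log(DC), np.ones(len(SC))])
(a4, a5, b3), *_ = np.linalg.lstsq(X_log, np.log(DT), rcond=None)

def predict_log(sc, dc):
    """Predicted destination throughput from the fitted log model."""
    return np.exp(a4 * np.log(sc) + a5 * np.log(dc) + b3)

print(predict_log(4, 2))
```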
Modeling throughput
- The log model is better than the linear models, but errors are still high
- A model based on just SC and DC is too simplistic
- Incorporate external load
  – External load: network, disk, and CPU activity outside the transfers
  – How to measure the external load?
  – How to include external load in the model(s)?
External load
- Multiple training data points with the same SC, DC, collected on different days and at different times
- EL accounts for the throughput differences observed at the same SC, DC
- Three different functions for external load (EL):
  – EL1 = T − AT, where T is the throughput of transfer t and AT is the average throughput of all transfers with the same SC, DC as t
  – EL2 = T − MT, where MT is the maximum throughput with the same SC, DC as t
  – EL3 = T / MT
- AEL{a11} = EL^a11 if EL > 0, |EL|^(−a11) otherwise (see the sketch below)
- Linear model: DT = a6*DC + a7*SC + a8*EL + b4
- Log model: DT = SC^a9 * DC^a10 * AEL{a11} * 2^b5
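A small sketch of these definitions in Python; the record layout (dicts with 'sc', 'dc', 'throughput') is a placeholder, and a11 would come from model fitting.

```python
from statistics import mean

def external_loads(t, history):
    """EL1, EL2, EL3 for transfer t, given past transfers with the same SC, DC.

    Assumes at least one historical transfer matches t's SC and DC.
    """
    same = [h["throughput"] for h in history
            if h["sc"] == t["sc"] and h["dc"] == t["dc"]]
    at, mt = mean(same), max(same)
    T = t["throughput"]
    return T - at, T - mt, T / mt  # EL1, EL2, EL3

def ael(el, a11):
    """AEL{a11} = EL^a11 if EL > 0, |EL|^(-a11) otherwise.

    The slide's definition does not cover EL == 0.
    """
    return el ** a11 if el > 0 else abs(el) ** (-a11)
```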
Models with external load
DT = a6*DC + a7*SC + a8*EL + b4
(DT is predicted; DC and SC are controllable; EL is uncontrollable)
- Unlike SC and DC, external load cannot be controlled
- Training the models requires multiple data points with the same SC, DC
- In practice, some recent transfers may be available, but covering all combinations of SC, DC is unlikely
Calculating external load in practice
Since DT, DC, and SC are known for completed transfers, EL can be computed from the model:
DT = a6*DC + a7*SC + a8*EL + b4
- Previous Transfer method (PT): compute EL from the most recent completed transfer
- Recent Transfers method (RT): compute EL from the transfers in the past 30 minutes (sketch below)
- Recent Transfers with Error Correction method (RTEC): add an error term e, estimated from historic transfers:
  DT = a6*DC + a7*SC + a8*EL + b4 + e
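A sketch of the Recent Transfers idea under the linear model, assuming fitted coefficients a6, a7, a8, b4 and a hypothetical transfer-record layout; averaging EL over the window is my assumption of how the recent observations would be combined.

```python
def el_from_transfer(rec, a6, a7, a8, b4):
    """Invert DT = a6*DC + a7*SC + a8*EL + b4 for EL, given one observed transfer."""
    return (rec["dt"] - a6 * rec["dc"] - a7 * rec["sc"] - b4) / a8

def el_recent(transfers, now, a6, a7, a8, b4, window=30 * 60):
    """Recent Transfers method: EL averaged over transfers in the past 30 minutes.

    The Previous Transfer method would instead use only the latest record.
    """
    recent = [r for r in transfers if now - r["end_time"] <= window]
    els = [el_from_transfer(r, a6, a7, a8, b4) for r in recent]
    return sum(els) / len(els) if els else 0.0
```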
Applying models to control bandwidth
- Find DC, SC values that achieve a target throughput
- Limit DC to 20 to narrow the search space
  – Even then, there is a large number of possible DC combinations (20^n for n destinations)
- SCmax (the maximum source concurrency allowed) is the number of possible values for SC
  – Heuristics limit the search space to SCmax * #destinations (search sketch below)
- Prediction: in DT = a6*DC + a7*SC + a8*EL + b4, DC and SC are given, EL is known (computed with PT, RT, or RTEC), and DT is predicted
- Control: in the same equation, DT is given (the target), EL is known, and DC, SC are computed
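A sketch of the control-side search under the linear model: enumerate SC up to SCmax and, per destination, pick the DC (capped at 20) whose predicted throughput is closest to the target. Choosing DC independently per destination is one way to get the SCmax * #destinations search space described above; it is my reading, not necessarily the paper's exact heuristic, and the coefficients and EL estimates are assumed given.

```python
def predict_dt(dc, sc, el, a6, a7, a8, b4):
    """Linear throughput model from the slides."""
    return a6 * dc + a7 * sc + a8 * el + b4

def choose_concurrency(targets, els, coeffs, sc_max=20, dc_max=20):
    """Pick (SC, {dest: DC}) minimizing total deviation from target throughputs.

    targets: {dest: target Gbps}; els: {dest: estimated external load};
    coeffs: (a6, a7, a8, b4), shared across destinations here for simplicity.
    """
    best = None
    for sc in range(1, sc_max + 1):
        plan, err = {}, 0.0
        for dest, tgt in targets.items():
            # Best DC for this destination at this SC, chosen independently.
            dc = min(range(1, dc_max + 1),
                     key=lambda d: abs(predict_dt(d, sc, els[dest], *coeffs) - tgt))
            plan[dest] = dc
            err += abs(predict_dt(dc, sc, els[dest], *coeffs) - tgt)
        if best is None or err < best[0]:
            best = (err, sc, plan)
    return best[1], best[2]
```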
Experimental setup
[Map of transfer endpoints: TACC (Stampede), NCAR (Yellowstone), SDSC (Gordon), Indiana (Mason), NICS (Kraken), PSC (Blacklight)]
Experiments
- Ratio experiments – allocate the available bandwidth at the source to the destinations in a predefined ratio: each destination's target is its ratio term divided by the sum of the terms, times the available bandwidth (sketch below)
  – Available bandwidth at Stampede is 9 Gbps; ratio 2:1:2:3:3 for Kraken, Mason, Blacklight, Gordon, Yellowstone
  – Target allocations: Kraken = 2 Gbps, Mason = 1 Gbps, Blacklight = 2 Gbps, Gordon = 3 Gbps, Yellowstone = 3 Gbps
  – A variant fixes Kraken at 3 Gbps, with shares X1, X2, X3, X4 Gbps for Mason, Blacklight, Gordon, Yellowstone
- Factoring experiments – increase a destination's throughput by a given factor when the source is saturated
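A minimal sketch of the proportional split; the destination names and the 9 Gbps figure are from the slide, and the function simply normalizes the ratio terms.

```python
def ratio_allocation(total_gbps, ratios):
    """Give each destination (ratio term / sum of terms) of the available bandwidth."""
    total = sum(ratios.values())
    return {dest: r * total_gbps / total for dest, r in ratios.items()}

print(ratio_allocation(9, {"Kraken": 2, "Mason": 1, "Blacklight": 2,
                           "Gordon": 3, "Yellowstone": 3}))
```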
Results – Ratio experiments
- Ratios 4:5:6:8:9 for Kraken, Mason, Blacklight, Gordon, and Yellowstone; concurrencies picked by the algorithm: {1,3,3,1,1}. Model: log with EL1. Method: RTEC.
- Ratios 4:5:6:8:9 for Kraken, Mason, Blacklight, Gordon, and Yellowstone; concurrencies picked by the algorithm: {1,4,3,1,1}. Model: log with EL3. Method: RT.
Results – Factoring experiments
- Increasing Gordon's baseline throughput by 2x; concurrency picked by the algorithm for Gordon: 5.
- Increasing Yellowstone's baseline throughput by 1.5x; concurrency picked by the algorithm for Yellowstone: 3.
Adaptive scheduling of data transfers
[Diagram: data transfer nodes and storage, as in the earlier data-movement figure]
Bursty transfers – an opportunity for adaptive scheduling
- Goals: optimize throughput, improve response times
- Challenge – adaptive concurrency
  – Under low load, increase concurrency to unsaturated destinations to maximize utilization
  – New requests either queue or cause the concurrency of ongoing transfers to be adjusted
- Is data transfer scheduling analogous to parallel job scheduling?
  – Data transfers ≅ compute jobs, wide-area bandwidth ≅ compute resources, transfer concurrency ≅ job parallelism
  – But CPU, storage, and network differ between source and destination, and the wide-area network is shared
- Scheduling wide-area data transfers is challenging
  – Heterogeneous resources, shared network, dynamic nature of the load
  – Scheduling decisions cannot be based on resource availability at a single site
Metrics
- Turnaround time – time a job spends in the system: completion time − arrival time
- Job slowdown – factor by which a job is slowed relative to an unloaded system: turnaround time / processing time
- Bounded slowdown in parallel job scheduling (sketch below)
- Bounded slowdown for wide-area transfers
- Job priority for wide-area transfers
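The first two metrics, plus the standard bounded-slowdown form from the parallel job scheduling literature, as a small sketch. The interactivity threshold tau and its 10-second default are the conventional choice, not taken from the slides, and the slide's exact wide-area variant is not in the transcript.

```python
def turnaround(arrival, completion):
    """Time the job spends in the system."""
    return completion - arrival

def slowdown(arrival, completion, processing):
    """Factor slowed relative to an unloaded system."""
    return turnaround(arrival, completion) / processing

def bounded_slowdown(arrival, completion, processing, tau=10.0):
    # Standard form: floor processing time at tau seconds so very short
    # jobs do not inflate the metric, and never report below 1.
    return max(turnaround(arrival, completion) / max(processing, tau), 1.0)
```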
Scheduling algorithm
- Maximize resource utilization and reduce slowdown
  – Adaptively queue requests and adjust concurrency based on load
- Preemption/restart
  – The only state required is missing-block information; no migration
  – Still has overhead (authentication, checkpoint restart); a p-factor limits preemption
- Four key decision points (see the sketch below):
  – Upon task arrival – schedule or queue?
  – If scheduled, at what concurrency value?
  – When to preempt (and schedule a waiting job)?
  – When to change the concurrency of a running job?
- Use both models and recently observed behavior
  – Models predict throughput and determine the concurrency value
  – 5-second averages of observed throughput determine saturation
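A skeletal scheduler showing where the four decision points sit. The concrete predicates (the saturation threshold, the placeholder priority and concurrency choices, the p-factor value) are stand-ins for the model-driven logic, not the paper's actual rules.

```python
from collections import deque
from dataclasses import dataclass

P_FACTOR = 2.0      # preemption threshold (assumed value)
SATURATION = 0.95   # fraction of capacity treated as saturated (assumed)

@dataclass
class Transfer:
    name: str
    dest: str
    size_gb: float
    cc: int = 0                 # current concurrency
    throughput: float = 0.0     # 5-second average, fed in by a monitor

class Scheduler:
    """Skeleton of the four decision points; model calls are placeholders."""

    def __init__(self, dest_capacity):
        self.dest_capacity = dest_capacity  # {dest: Gbps}
        self.queue = deque()
        self.running = []

    def saturated(self, dest):
        used = sum(t.throughput for t in self.running if t.dest == dest)
        return used >= SATURATION * self.dest_capacity[dest]

    def model_concurrency(self, t):
        return 2  # placeholder for the throughput-model-driven choice

    def priority(self, t):
        return 1.0 / t.size_gb  # placeholder: favor small transfers

    def on_arrival(self, t):
        # Decision 1: schedule or queue; Decision 2: pick concurrency.
        if self.saturated(t.dest):
            self.queue.append(t)
        else:
            t.cc = self.model_concurrency(t)
            self.running.append(t)

    def periodic(self):
        # Decision 3: preempt when a queued job's priority sufficiently
        # exceeds a running job's (the p-factor limits preemption churn).
        if self.queue and self.running:
            victim = min(self.running, key=self.priority)
            if self.priority(self.queue[0]) > P_FACTOR * self.priority(victim):
                self.running.remove(victim)
                self.queue.append(victim)  # restart later from missing blocks
                self.on_arrival(self.queue.popleft())
        # Decision 4: raise concurrency where the destination is unsaturated.
        for t in self.running:
            if not self.saturated(t.dest):
                t.cc += 1
```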
Illustrative example
- Average turnaround time with adaptive scheduling: 10.92
- Average turnaround time for the baseline: 12.04
Workload traces
- Traces from actual executions – anonymized GridFTP usage statistics
- Busiest day in a one-month period; busiest server log on that day
- Log length limited because the experiments run in a production environment
- Three 15-minute logs: 25%, 45%, and 60% load traces
  – "load" = total bytes transferred / maximum that could be transferred
- Destinations are anonymized in the logs
  – Assigned by weighted random split based on destination capacities (sketch below)
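A sketch of the weighted random destination assignment; the capacity values are hypothetical, and random.choices is the standard-library way to draw with weights.

```python
import random

# Hypothetical destination capacities in Gbps, used as weights.
capacities = {"dest_a": 10, "dest_b": 5, "dest_c": 1}

def assign_destinations(transfers, capacities, seed=42):
    """Map each anonymized transfer to a destination, weighted by capacity."""
    rng = random.Random(seed)
    dests = list(capacities)
    weights = [capacities[d] for d in dests]
    return {t: rng.choices(dests, weights=weights)[0] for t in transfers}

print(assign_destinations(["t1", "t2", "t3"], capacities))
```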
Experimental results – turnaround time (60% load)
Experimental results – worst case (60% load)
Experimental results – improved baseline (60% load)
Related work
- Several models for predicting behavior and finding the optimal number of parallel TCP streams
  – Uncongested networks, simulations
- Many studies on bandwidth allocation at the router
  – Our focus is application-level control
- Adaptive replica selection, algorithms to utilize multiple paths
  – Require the ability to control the network path
  – Overlay networks
- Workflow schedulers handle dependencies between computation and data movement
- Adaptive file transfer scheduling with preemption in production environments has not been studied
Summary of current work
- Models for wide-area data transfer throughput in terms of a few key parameters
- Log models that combine total source concurrency, destination concurrency, and a measure of external load are effective
- Methods that use both recent and historical experimental data are better at estimating external load
- An adaptive scheduling algorithm to improve the overall user experience
- Evaluated with real traces on a production system
- Significant improvements over the current state of the art
Proposed work
- File transfers have different time constraints – from near-real-time to highly flexible
- Objective: account for time requirements to improve the overall user experience
- Consider two job types – batch and interactive
  – First, exploit the relaxed deadlines of batch jobs
  – Next, exploit knowledge about future arrival times
  – Finally, maximize the utility value of jobs, where each job has a utility function
Batch jobs
- When its deadline is close, a batch job gets the highest priority (sketch below)
  – Scheduled with a concurrency of 2; no preemption
- Otherwise, batch jobs get the lowest priority
- Interactive jobs are measured by turnaround time and slowdown; batch jobs by deadline satisfaction rate
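A toy version of this two-level rule. The "deadline is close" test (estimated finish time plus a slack margin) is an assumption, since the slide does not specify it.

```python
def batch_priority(now, deadline, remaining_gb, est_gbps, slack=1.2):
    """Highest priority if the job would miss its deadline without starting
    soon (assumed test); lowest priority otherwise."""
    est_finish = now + slack * (remaining_gb * 8 / est_gbps)  # GB -> Gb -> s
    return float("inf") if est_finish >= deadline else 0.0
```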
Knowledge about future jobs
Setup: T1 – 1 GB to d2, T2 – 1 GB to d1, T3 – 0.5 GB to d2; source capacity 1 GB/s; destination d1 – 1 GB/s; destination d2 – 0.5 GB/s.
- Schedule A (no knowledge of future jobs): average slowdown = (1.5 + 1 + 2)/3 = 1.5
- Schedule B (with knowledge of future jobs): average slowdown = (1 + 2 + 1)/3 = 1.33
[Two timeline plots of throughput (GB/s) vs. time (s), one per schedule]
Utility-based scheduling
- Both interactive and batch jobs have deadlines
- Each has an associated utility function – the impact of missing the deadline
  – Decay can be linear, exponential, step, or a combination (sketch below)
- Each transfer request R is defined by the tuple R = (d, A, S, D, U):
  – d = destination
  – A = arrival time of R
  – S = size of the file to be transferred
  – D = deadline of R
  – U = utility function of R
- Objective: maximize the aggregate utility value of jobs
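A sketch of the named decay shapes as utility functions of completion time t for a job with deadline D; the maximum value and decay rates are placeholders, not values from the slides.

```python
import math

def linear_utility(t, deadline, u_max=1.0, rate=0.1):
    """Full utility until the deadline, then linear decay to zero."""
    late = max(0.0, t - deadline)
    return max(0.0, u_max - rate * late)

def exponential_utility(t, deadline, u_max=1.0, rate=0.5):
    """Exponential decay in lateness past the deadline."""
    late = max(0.0, t - deadline)
    return u_max * math.exp(-rate * late)

def step_utility(t, deadline, u_max=1.0):
    """All-or-nothing: the job is worthless after its deadline."""
    return u_max if t <= deadline else 0.0
```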
Utility-based scheduling
- The inverse of the instantaneous utility value is used as the job's priority
- Instantaneous utility value: [formula not recoverable from the transcript; see the sketch below]
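Since the slide's formula is not recoverable, here is one plausible reading, labeled as an assumption: the instantaneous utility of a request is its utility function (one of the decay shapes above) evaluated at the earliest completion time achievable if the transfer ran now, and priority is its inverse, so jobs about to lose their value become urgent.

```python
from dataclasses import dataclass

@dataclass
class Request:
    dest: str       # d
    arrival: float  # A
    size_gb: float  # S
    deadline: float # D (U, the utility function, is passed in separately)

def instantaneous_utility(req, now, est_gbps, utility):
    """Assumed definition: utility at the earliest achievable completion time."""
    est_completion = now + req.size_gb * 8 / est_gbps  # GB -> Gb -> seconds
    return utility(est_completion, req.deadline)

def priority(req, now, est_gbps, utility, eps=1e-9):
    # Inverse of instantaneous utility: near-zero remaining utility
    # yields very high priority; eps avoids division by zero.
    return 1.0 / (instantaneous_utility(req, now, est_gbps, utility) + eps)
```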
Questions