
1 Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration
Leonid Oliker, Hongzhang Shan, Future Technologies Group, Lawrence Berkeley National Laboratory
Warren Smith, Rupak Biswas, NASA Advanced Supercomputing Division, NASA Ames Research Center

2 Motivation
Geographically distributed resources are difficult to schedule and manage efficiently:
– Autonomy (each site runs its own local scheduler)
– Heterogeneity
– Lack of perfect global information
– Conflicting requirements between users and system administrators

3 Current Status
Grid initiatives:
– Global Grid Forum, NASA Information Power Grid, TeraGrid, Particle Physics Data Grid, E-Grid, LHC Challenge Grid
Grid scheduling services:
– Enabling multi-site applications: multi-disciplinary applications, remote visualization, co-scheduling, distributed data mining, parameter studies
– Job migration: improve time-to-solution, avoid dependency on a single resource provider, optimize application mapping to the target architecture
– But what are the tradeoffs of data migration?

4 Our Contributions
– Interaction between the grid scheduler and local schedulers
– Architectures: distributed, centralized, and ideal
– Real workloads
– Performance metrics
– Job migration overhead
– Superscheduler scalability
– Fault tolerance
– Multi-resource requirements

5 Distributed Architecture
[Diagram: each local environment contains a compute server (PE … PE), a local queue, and a local scheduler; the grid environment adds a grid scheduler, grid queue, and middleware, with jobs and scheduling information exchanged over a communication infrastructure.]

6 Interaction between Grid and Local Schedulers
When a job arrives, the local scheduler compares its Approximate Wait Time (AWT) against a threshold: if the AWT is below the threshold, the job stays in the local queue; otherwise it is considered for migration by the grid scheduler, which matches the Job Requirements (JR) against each machine's reported AWT and Current Resource Utilization (CRU).
Three migration strategies:
– Sender-Initiated (S-I)
– Receiver-Initiated (R-I)
– Symmetrically-Initiated (Sy-I)
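The local/grid hand-off on this slide can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the threshold value and function names are assumptions.

```python
# Sketch of the migration decision: a job stays local while its Approximate
# Wait Time (AWT) is below a threshold; otherwise it is handed to the grid
# scheduler for possible migration (S-I / R-I / Sy-I).

AWT_THRESHOLD = 300.0  # seconds; illustrative value, not taken from the slides

def place_job(approx_wait_time: float) -> str:
    """Return where the job should be queued based on the AWT test."""
    if approx_wait_time < AWT_THRESHOLD:
        return "local-queue"   # run on the home machine
    return "grid-queue"        # considered for migration by the grid scheduler
```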

7 Sender-Initiated (S-I)
The host sends the requirements of job i to its partners; the host and each partner respond with their Approximate Response Time (ART) and Current Resource Utilization (CRU).
ART = Approximate Wait Time + Estimated Run Time
Select the machine with the smallest ART, breaking ties by the lower CRU, and return the results to the host.
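The S-I selection rule above is a two-level minimum. A minimal sketch, assuming a simple tuple layout for the machine responses (the names and data shapes are illustrative, not from the paper):

```python
# Sender-Initiated selection: pick the machine with the smallest Approximate
# Response Time (ART = approximate wait time + estimated run time), breaking
# ties by the lower Current Resource Utilization (CRU).

def select_machine(candidates):
    """candidates: list of (machine_id, awt, est_runtime, cru) tuples."""
    # Tuple keys compare lexicographically: ART first, CRU as the tie-breaker.
    return min(candidates, key=lambda c: (c[1] + c[2], c[3]))[0]

responses = [
    ("host",     120.0, 60.0, 0.90),  # ART = 180
    ("partner1",  90.0, 60.0, 0.70),  # ART = 150
    ("partner2",  80.0, 70.0, 0.50),  # ART = 150, lower CRU wins the tie
]
```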

8 Receiver-Initiated (R-I)
A machine with free resources broadcasts a free signal; querying begins only after a free signal is received. The host then sends the requirements of job i, and the host and partners respond with their ART and CRU as in S-I.

9 Symmetrically-Initiated (Sy-I)
– First, work in R-I mode
– Change to S-I mode if no machine volunteers within a given time period
– Switch back to R-I mode after the job is scheduled
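The Sy-I policy is a small two-state machine. A rough sketch under assumed names (the wait period and method interface are illustrative, not the paper's code):

```python
# Symmetrically-Initiated mode switching: start in Receiver-Initiated (R-I)
# mode, fall back to Sender-Initiated (S-I) when no machine volunteers within
# the time period, and return to R-I once the pending job is scheduled.

class SymmetricScheduler:
    def __init__(self, wait_period: float = 60.0):
        self.mode = "R-I"
        self.wait_period = wait_period  # the "time period" on the slide

    def tick(self, waited: float, volunteers: int) -> None:
        """Advance the mode machine for one pending job."""
        if self.mode == "R-I" and volunteers == 0 and waited >= self.wait_period:
            self.mode = "S-I"   # no volunteers in time: actively query partners

    def job_scheduled(self) -> None:
        self.mode = "R-I"       # job placed: go back to passive mode
```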

10 Centralized Architecture
Jobs are submitted through web portals or a super shell to a single grid queue, which the grid scheduler manages through middleware.
– Advantage: global view
– Disadvantages: single point of failure, scalability

11 Performance Metrics
The metrics used in the following slides: normalized average wait time (NAWT), normalized average response time (NART), fraction of jobs migrated (FOJM), fraction of data volume migrated (FDVM), and data migration overhead as a fraction of response time (DMOH).

12 Resource Configuration and Site Assignment

Server  Nodes  CPUs/Node  CPU Speed  Site (3 sites)  Site (6 sites)  Site (12 sites)
S1      184    16         375 MHz    0               0               0
S2      305    4          332 MHz    1               1               1
S3      144    8          375 MHz    2               3               2
S4      256    4          600 MHz    1               0               3
S5      32     2          250 MHz    2               2               4
S6      128    4          400 MHz    2               5               5
S7      64     2          250 MHz    2               5               6
S8      144    8          375 MHz    1               2               7
S9      256    4          600 MHz    0               4               8
S10     32     2          250 MHz    0               1               9
S11     128    4          400 MHz    0               3               10
S12     64     2          250 MHz    1               4               11

Each local site network has a peak bandwidth of 800 Mb/s (gigabit Ethernet LAN); the external network has 40 Mb/s available point-to-point (high-performance WAN).
Assumptions: all data transfers share the network equally (network contention is modeled); performance is linearly related to CPU speed; users have pre-compiled code for each of the heterogeneous platforms.
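The "transfers share the network equally" assumption can be made concrete with a short worked example using the bandwidths above. This is a hedged sketch of the contention model, not the simulator's actual code:

```python
# With 800 Mb/s inside a site and 40 Mb/s between sites, k concurrent
# transfers on a link each see bandwidth / k (equal sharing).

LAN_MBPS = 800.0   # within a site (gigabit Ethernet LAN, per the slide)
WAN_MBPS = 40.0    # point-to-point between sites (high-performance WAN)

def transfer_seconds(size_mb: float, concurrent: int, cross_site: bool) -> float:
    """Time to move size_mb megabytes when `concurrent` transfers share the link."""
    link = WAN_MBPS if cross_site else LAN_MBPS
    effective = link / concurrent      # equal sharing, as the slide assumes
    return size_mb * 8.0 / effective   # MB -> megabits, then divide by the rate

# A 300 MB input file (typical of the workloads on the next slide) alone on
# the WAN takes 300 * 8 / 40 = 60 s; two such transfers take 120 s each.
```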

13 Job Workloads

Workload  Time Period       # of Jobs  Avg. Input Size (MB)
W1        03/2002-05/2002   59,623     312.7
W2        03/2002-05/2002   22,941     300.8
W3        03/2002-05/2002   16,295     305.0
W4        03/2002-05/2002    8,291     237.3
W5        03/2002-05/2002   10,543      28.9
W6        03/2002-05/2002    7,591     236.1
W7        03/2002-05/2002    7,251      86.5
W8        09/2002-11/2002   27,063     293.0
W9        09/2002-11/2002   12,666     328.3
W10       09/2002-11/2002    5,236      29.3
W11       09/2002-11/2002   11,804     226.5
W12       09/2002-11/2002    6,911      53.7

The systems are located at Lawrence Berkeley National Laboratory, NASA Ames Research Center, Lawrence Livermore National Laboratory, and the San Diego Supercomputer Center.
Data volume information was not available, so volume is assumed to be correlated with the volume of work: B is the number of Kbytes per unit of work (CPUs * runtime). Our best estimate is B = 1 KB for each CPU-second of application execution.
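The B = 1 KB per CPU-second estimate above translates directly into a one-line formula; the function name is an assumption for illustration:

```python
# Data-volume estimate from the slide: input size is assumed proportional
# to work, at B = 1 KB per CPU-second of execution (work = CPUs * runtime).

B_KB_PER_CPU_SECOND = 1.0

def estimated_input_kb(cpus: int, runtime_seconds: float) -> float:
    """Estimated data volume for a job, derived from its work."""
    return B_KB_PER_CPU_SECOND * cpus * runtime_seconds

# e.g. a 64-CPU job running for one hour:
# estimated_input_kb(64, 3600) -> 230400.0 KB (225 MB)
```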

14 Scheduling Policy (12 sites, workload B)
– Large potential gain from the grid superscheduler: average wait time reduced 25X compared with the local scheme
– Sender-Initiated performance is comparable to Centralized
– Inverse relationship between the migration metrics (FOJM, FDVM) and the timing metrics (NAWT, NART)
– Only a very small fraction of response time is spent moving data (DMOH)

15 Data Migration Sensitivity (Sender-Initiated, 12 sites)
– NAWT for 100B is almost 8X that of B, and NART is 50% higher
– DMOH increases to 28% and 44% for 10B and 100B, respectively
– As B increases, the fraction of data volume migrated (FDVM) decreases due to the increasing overhead
– FOJM is inconsistent because it measures the number of jobs, not the data volume

16 Site Number Sensitivity (Sender-Initiated)
0.1B shows no site sensitivity; 10B has a noticeable effect as the number of sites decreases from 12 to 3:
– Decrease in time (NAWT, NART) due to the increase in network bandwidth
– Increase in the fraction of data volume migrated (FDVM)
– 40% increase in the fraction of response time spent moving data (DMOH)

17 Communication-Oblivious Scheduling (Sender-Initiated, 10B)
If the data migration cost is not considered in the scheduling algorithm:
– NART increases 14X and 40X for 12 sites and 3 sites, respectively
– NAWT increases 28X and 43X for 12 sites and 3 sites, respectively
– DMOH is over 96% (only 3% for the B setting)
– 16% of all jobs are blocked from executing while waiting for data, compared with practically 0% for communication-aware scheduling

18 Increased Workload Sensitivity (Sender-Initiated, 12 sites, workload B)
With grid scheduling handling 40% more jobs than the non-grid local scheme:
– No increase in time (NAWT, NART)
– Weighted utilization increased from 66% to 93%
However, there is a fine line: when the number of jobs increases by 45%, NAWT grows 3.5X and NART grows 2.4X.

19 Conclusions
Studied the impact of data migration by simulating:
– Compute servers
– Grouping of servers into sites
– Inter-server networks
Results showed large benefits from grid scheduling:
– S-I reduced average turnaround time by 60% compared with the local approach, even in the presence of input/output data migration
– The algorithm can execute 40% more jobs in a grid environment while delivering the same turnaround times as the non-grid scenario
– For large data files, it is critical to consider migration overhead: 43X increase in NART with communication-oblivious scheduling

20 Future Work
– Superscheduler scalability: resource discovery, fault tolerance
– Multi-resource requirements
– Architectural heterogeneity
– Practical deployment issues

