A Characterization of Approaches to Parallel Job Scheduling

Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, P. Sadayappan
Supported in part by Sandia National Laboratory

Problem Addressed
Scheduling of parallel jobs in a heterogeneous grid environment.
Each site has a homogeneous cluster of processors, but processors at different sites have different speeds.
Much of the research on scheduling in heterogeneous systems has focussed on independent sequential jobs.
Research on parallel job scheduling has concentrated primarily on the homogeneous context.
The algorithms used for scheduling sequential tasks on heterogeneous systems are too computationally complex to extend to parallel jobs.
We extend the techniques used for parallel job scheduling in a homogeneous context to the heterogeneous context.

Simulation Environment
Heterogeneous sites, with a homogeneous cluster of processors at each site.
5000 job subset of either the 430 node Cornell Theory Center (CTC) trace or the 128 node IBM SP2 system at the San Diego Supercomputer Center (SDSC).
NAS Parallel Benchmarks 2.0 used to model the heterogeneous runtimes.
Systems: IBM SP (P2SC 160 MHz), Cray T3E 900, IBM SP (WN/66), SGI Origin 2000
LU Class B (256 Nodes): 24.2, 35.6, 94.893, 20.328*
MG Class B: 1.1*, 1.8, 2.2724, 1.3147
(8 Nodes): 17.2*, 25.3, 34.3, 35.5
IS Class B: 17.7, 16.3*, 22.6, 23.3
* Denotes best runtime for the job

Metrics
We use the following metrics for evaluating the proposed schemes:
Average Slowdown
Average Turnaround Time
Utilization
Effective Utilization

Backfilling
A later arriving job is allowed to leapfrog previously queued jobs.
Aggressive vs. Conservative
[Figures: schedule diagrams, processors vs. time]
Conservative: Every job is given a reservation when it enters the system, and a job is allowed to backfill only if it does not violate any of the previous reservations.
EASY: Only the job at the head of the queue is given a reservation, and a job is allowed to backfill if it does not violate this reservation.

Conservative vs. Aggressive
Jobs are processed in arrival order by the meta-scheduler.
Greedy assigns each job to the site with the lowest instantaneous load.
Greedy-MR (Multiple Requests) submits each job to all sites. When the job starts at one site, the other instances are removed.
We have shown this mechanism to be effective in a homogeneous context (HPDC ’02). However, only a slight improvement is seen in a heterogeneous context.
In a heterogeneous context, the site where the job starts the earliest may not be the best site.
To obtain the completion time of a job at a particular site, conservative backfilling has to be employed at the local site.
Conservative performs better than aggressive in all cases, quite the opposite of the homogeneous context.
Improved backfilling is caused by holes created by the dynamic removal of replicated jobs at each site, and by an increased number of jobs to attempt to backfill at each site.

Restricted Multi-Site Reservations
When network bandwidth is limited, jobs can be submitted to a smaller number of sites. By making fewer reservations, the multi-reservation scheduler can still realize a substantial fraction of the benefits achievable from a scheduler that schedules each job at all sites.
Use a more accurate approach to select the sites to which the job is submitted: instead of using the instantaneous load, we query each site to determine the earliest completion time.
When using completion time as the criterion, submitting to fewer sites can be almost as effective as submitting to all sites.

Efficacy Based Queues
Explicitly take efficacy into account to improve the effective utilization.
Use efficacy as the priority order for the jobs in the reserved and idle queues.
Starvation free.
Effective utilization increases and turnaround time decreases, in spite of the decrease in raw utilization.

Conclusions and Future Work
Improvements in turnaround time and effective utilization for parallel job scheduling in a heterogeneous grid environment have been demonstrated through simulation.
Next Steps:
Incorporate these changes into the Silver/Maui Scheduler.
Deploy and evaluate the scheduler first on our research clusters, and then at the Ohio Supercomputer Center.
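
The difference between the Greedy criterion (lowest instantaneous load) and the completion-time criterion described above can be illustrated with a small sketch. This is not the authors' simulator: the `Site` record, its fields, and the numbers are hypothetical, and the poster does not say how load or completion time are actually computed. The sketch only assumes that each site can report an earliest guaranteed start time (as conservative backfilling allows) and that the job's runtime differs per site.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are illustrative assumptions, not taken from the poster.
    name: str
    load: float            # instantaneous load reported by the site
    earliest_start: float  # earliest start the local scheduler can guarantee
    runtime: float         # this job's runtime on this site's processors

    @property
    def completion(self) -> float:
        """Earliest completion time of the job at this site."""
        return self.earliest_start + self.runtime


def greedy_site(sites):
    """Greedy: pick the site with the lowest instantaneous load."""
    return min(sites, key=lambda s: s.load)


def earliest_completion_site(sites):
    """Completion-time criterion: pick the site where the job finishes first."""
    return min(sites, key=lambda s: s.completion)


def restricted_sites(sites, k):
    """Restricted multi-site submission: keep only the k sites with the
    earliest completion times instead of submitting to every site."""
    return sorted(sites, key=lambda s: s.completion)[:k]


# Hypothetical example: site A is lightly loaded but slow for this job.
sites = [
    Site("A", load=0.2, earliest_start=10.0, runtime=100.0),  # completes at 110
    Site("B", load=0.5, earliest_start=30.0, runtime=40.0),   # completes at 70
    Site("C", load=0.9, earliest_start=60.0, runtime=35.0),   # completes at 95
]

print(greedy_site(sites).name)                       # A
print(earliest_completion_site(sites).name)          # B
print([s.name for s in restricted_sites(sites, 2)])  # ['B', 'C']
```

In this made-up case the least-loaded site is also the slowest for the job, so the load-based choice and the completion-time choice disagree, which is the situation the heterogeneous setting introduces.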

