Published by Myrtle Hardy. Modified over 9 years ago.
1
Meta Scheduling
Sathish Vadhiyar
Sources/Credits/Taken from: Papers listed in “References” slide
2
Evaluation of Job Scheduling Strategies for Grid Computing: Hamscher et al. (2000)
Scheduling structures
- Centralized schedulers
  - Single-site scheduling: a job does not span across sites
  - Multi-site scheduling: the opposite, a job may span several sites
- Hierarchical structures: a central scheduler (metascheduler) for global scheduling and local scheduling on individual sites
- Decentralized scheduling: distributed schedulers interact, exchange information and submit jobs to remote systems
  - Direct communication: a local scheduler directly contacts remote schedulers and transfers some of its jobs
  - Communication via central job pool: jobs that cannot be executed immediately are pushed to a central pool, and other local schedulers pull jobs out of the pool
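The central job pool variant of decentralized scheduling can be sketched as below. This is only an illustrative sketch, not code from Hamscher et al.: the class names `JobPool` and `LocalScheduler`, the job dictionary fields `id` and `procs`, and the first-fit pull rule are all assumptions made for the example.

```python
from collections import deque


class JobPool:
    """Central pool for jobs that no site could start immediately (hypothetical)."""

    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        self._jobs.append(job)

    def pull(self, free_procs):
        """Remove and return the oldest pooled job that fits, or None."""
        for job in self._jobs:
            if job["procs"] <= free_procs:
                self._jobs.remove(job)  # safe: we return right after mutating
                return job
        return None


class LocalScheduler:
    """Local scheduler that pushes to, and pulls from, the shared pool."""

    def __init__(self, name, total_procs, pool):
        self.name = name
        self.free_procs = total_procs
        self.pool = pool
        self.running = []

    def _start(self, job):
        self.free_procs -= job["procs"]
        self.running.append(job["id"])

    def submit(self, job):
        # Start locally if possible, otherwise hand the job to the pool.
        if job["procs"] <= self.free_procs:
            self._start(job)
        else:
            self.pool.push(job)

    def pull_from_pool(self):
        # Called by a site that has idle processors.
        job = self.pool.pull(self.free_procs)
        if job is not None:
            self._start(job)
        return job


pool = JobPool()
site_a = LocalScheduler("A", total_procs=4, pool=pool)
site_b = LocalScheduler("B", total_procs=16, pool=pool)
site_a.submit({"id": 1, "procs": 8})   # too wide for A, goes to the pool
pulled = site_b.pull_from_pool()       # B has room and takes it
```

In the direct communication variant, `submit` would instead contact a remote scheduler itself rather than going through a shared pool.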
3
Various Scheduling Architectures
5
Multiple simultaneous requests: Subramani et al. (2002)
Job Scheduling Representation
6
Metascheduler across MPPs
Types
- Centralized
  - A meta scheduler and local dispatchers
  - Jobs submitted to meta scheduler
- Hierarchical
  - Combination of central and local schedulers
  - Jobs submitted to meta scheduler
  - Meta scheduler sends each job to the site with the earliest expected start time
  - Local schedulers can follow their own policies
- Distributed
  - Each site has a metascheduler and a local scheduler
  - Jobs submitted to local metascheduler
  - Jobs can be transferred to the sites with the lowest load
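The hierarchical rule (send the job to the site with the earliest expected start time) can be sketched as follows. The slides do not say how the start time is estimated, so the `Site` class, its `queued_work` field, and the backlog-divided-by-processors estimate are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    procs: int
    queued_work: float  # backlog in processor-seconds (hypothetical load measure)

    def expected_start(self):
        # Crude assumed estimate: backlog spread evenly over all processors.
        return self.queued_work / self.procs


def hierarchical_dispatch(sites, job_work):
    """Send the job to the site with the earliest expected start time."""
    site = min(sites, key=lambda s: s.expected_start())
    site.queued_work += job_work  # the local scheduler now owns the job
    return site.name


sites = [Site("A", procs=10, queued_work=100.0),
         Site("B", procs=20, queued_work=100.0)]
first = hierarchical_dispatch(sites, job_work=100.0)
```

A distributed variant would apply the same `min` selection at each site's own metascheduler, using load rather than expected start time as the key.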
7
Evaluation of schemes
Centralized
  1. Global knowledge of all resources, hence optimized schedules
  2. Can act as a bottleneck for a large number of resources and jobs
  3. May take time to transfer jobs from meta scheduler to local dispatchers: needs strategic position of meta scheduler
Hierarchical
  1. Medium level overhead
  2. Sub-optimal schedules
  3. Still needs strategic position of central scheduler
Distributed
  1. No bottleneck: workload evenly distributed
  2. Needs all-to-all connections between MPPs
8
Experiments
- Experiments to evaluate slowdowns in the 3 schemes
- Based on an actual trace from a supercomputer centre: 5000 job set
- 4 sites were simulated: 2 with the same load as the trace, the other 2 with run time multiplied by 1.7
- FCFS with EASY backfilling was used
- slowdown = (wait_time + run_time) / run_time
- 2 more schemes
  - Independent: local schedulers act independently, i.e. sites are not connected
  - United: resources of all processors are combined to form a single site
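The slowdown metric defined above can be computed as in the short sketch below. The function names and the `(wait_time, run_time)` tuple layout are assumptions for the example; the numbers are made up and are not from the trace.

```python
def slowdown(wait_time, run_time):
    """slowdown = (wait_time + run_time) / run_time, as defined on the slide."""
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return (wait_time + run_time) / run_time


def mean_slowdown(jobs):
    """Average slowdown over an iterable of (wait_time, run_time) pairs."""
    values = [slowdown(wait, run) for wait, run in jobs]
    return sum(values) / len(values)


# A job that waits 30 s and runs 10 s has slowdown 4.0;
# a job that starts immediately has slowdown 1.0.
example = mean_slowdown([(30, 10), (0, 5)])
```

Because run time is in the denominator, short jobs are penalized much more heavily than long jobs for the same wait.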
9
Results
10
Observations
1. Centralized and hierarchical performed slightly better than united
   a. Compared to hierarchical, scheduling decisions have to be made for all jobs and all resources in united: overhead and hence wait time is high
   b. Comparing united and centralized
      i. 4 categories of jobs corresponding to the 4 combinations of 2 parameters: execution time (short, long) and number of resources requested (narrow, wide)
      ii. Usually a larger number of long narrow jobs than short wide jobs
      iii. Why are centralized and hierarchical better than united?
2. Distributed performed poorly
   a. Decisions based on summary information may not yield good results
   b. Backfilling dynamics are complex
11
Newly Proposed Models
- K-distributed model
  - Distributed scheme where the local metascheduler distributes jobs to the k least loaded sites
  - When a job starts on a site, notification is sent to the local metascheduler, which in turn asks the other k-1 schedulers to dequeue
- K-Dual queue model
  - 2 queues are maintained at each site: one for local jobs and the other for remote jobs
  - Remote jobs are executed only when they don’t affect the start times of the local jobs
  - Local jobs are given priority during backfilling
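A minimal sketch of the K-distributed bookkeeping follows. It only models the enqueue-at-k-sites and dequeue-on-start steps, not backfilling or the dual queues; the class names, the numeric `load` field, and the method names are hypothetical and are not taken from Subramani et al.

```python
class Site:
    """A site with a scalar load and a queue of job ids (hypothetical model)."""

    def __init__(self, name, load):
        self.name = name
        self.load = load
        self.queue = []

    def enqueue(self, job_id):
        self.queue.append(job_id)

    def dequeue(self, job_id):
        if job_id in self.queue:
            self.queue.remove(job_id)


class KDistributedMetascheduler:
    def __init__(self, sites, k):
        self.sites = sites
        self.k = k
        self.placements = {}  # job id -> sites currently holding a copy

    def submit(self, job_id):
        # Queue the job at the k least loaded sites.
        targets = sorted(self.sites, key=lambda s: s.load)[:self.k]
        for site in targets:
            site.enqueue(job_id)
        self.placements[job_id] = targets
        return [site.name for site in targets]

    def notify_started(self, job_id, started_site):
        # The job began at one site; ask the other k-1 sites to dequeue it.
        for site in self.placements.pop(job_id):
            if site is not started_site:
                site.dequeue(job_id)


sites = [Site("A", 0.9), Site("B", 0.2), Site("C", 0.5), Site("D", 0.1)]
meta = KDistributedMetascheduler(sites, k=2)
chosen = meta.submit(job_id=7)          # queued at the two least loaded sites
site_d = sites[3]
site_d.dequeue(7)                       # D starts the job, so it leaves D's queue
meta.notify_started(7, started_site=site_d)
```

In the K-Dual queue model, each `Site` would hold two such queues, with the remote one served only when doing so does not delay any local job.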
12
Results: Benefits of new schemes
(Figure annotations: 45% improvement, 15% improvement)
13
Results: Usefulness of K-Dual scheme
Grouping jobs submitted at lightly loaded sites and heavily loaded sites
14
References
- T. L. Casavant, J. G. Kuhl. "A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems." IEEE Transactions on Software Engineering, Volume 14, Issue 2 (February 1988), pages 141-154.
- Volker Hamscher, Uwe Schwiegelshohn, Achim Streit, Ramin Yahyapour. "Evaluation of Job-Scheduling Strategies for Grid Computing." Lecture Notes in Computer Science, Proceedings of the First IEEE/ACM International Workshop on Grid Computing, 2000, pages 191-202. ISBN 3-540-41403-7.
- Vijay Subramani, Rajkumar Kettimuthu, Srividya Srinivasan, P. Sadayappan. "Distributed Job Scheduling on Computational Grids using Multiple Simultaneous Requests." Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC 2002), July 2002.
15
Taxonomy of scheduling for distributed heterogeneous systems – Casavant and Kuhl (1988)
16
Taxonomy
- Local vs. global
  - Local: scheduling processes to time slices on a single processor
  - Global: deciding which processor a job should go to
- Approximate vs. heuristic
  - Approximate: stop when you find a “good” solution. Uses the same formal computational model. The ability to succeed depends on:
    - Availability of a function to evaluate a solution
    - The time required to evaluate a solution
    - The ability to judge according to some metric value
    - Mechanism to intelligently prune the solution space
  - Heuristics
    - Work on assumptions about the impact of “important” parameters
    - Cannot always quantify the assumptions or the amount of impact
17
Also… Flat characteristics
- Adaptive vs. non-adaptive
- Load balancing
- Bidding: e.g. Condor
- Probabilistic: random searches
- One time assignment vs. dynamic reassignment