Scheduling Parametric Jobs on the Grid Jonathan Giddy
Parametric computation Scientifically: –Study the behaviour of output variables against a range of different input scenarios Computationally: –Execute an application multiple times, each time with a different combination of input parameters
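The computational view above can be sketched as a parameter sweep: one job per combination of input values. This is only an illustration; the function name `parameter_sweep` and the parameter names `pressure` and `angle` are hypothetical and do not come from the slides.

```python
from itertools import product


def parameter_sweep(parameters):
    """Yield one dict per job, covering every combination of parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical input ranges, for illustration only.
jobs = list(parameter_sweep({"pressure": [1.0, 2.0, 3.0], "angle": [0, 45]}))

for job in jobs:
    print(job)  # each dict is the input for one independent run of the application
```

Each job is independent of the others, which is what makes this kind of workload a natural fit for loosely coupled resources.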
Why use the Grid? Parametric computations –Require high performance computational resources –Require large numbers of computational resources –Generate large amounts of concurrency –Generate uncoupled computations –Tolerate high latencies
Nimrod/G Cost Deadline
Minimise Cost [Figure: jobs allocated over time to Nodes 1 to 4, with nodes ordered by increasing price; annotations: Budget, Cost]
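One possible reading of the cost-minimising scheme is a greedy fill: give jobs to the cheapest nodes first, and bring in more expensive nodes only when the cheaper ones cannot finish everything by the deadline. The sketch below assumes a fixed per-node job rate and a fixed cost per job; the `Node` class, its field names, and the example figures are hypothetical and are not taken from Nimrod/G.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    jobs_per_hour: float  # assumed constant for a given node
    cost_per_job: float   # assumed constant over time


def minimise_cost(nodes, n_jobs, deadline_hours):
    """Greedy sketch: fill the cheapest nodes first, up to what each can
    complete before the deadline. Returns {node name: job count}, or None
    if the deadline cannot be met even with every node in use."""
    allocation = {}
    remaining = n_jobs
    for node in sorted(nodes, key=lambda n: n.cost_per_job):
        if remaining == 0:
            break
        capacity = math.floor(node.jobs_per_hour * deadline_hours)
        take = min(capacity, remaining)
        if take > 0:
            allocation[node.name] = take
            remaining -= take
    return allocation if remaining == 0 else None


# Hypothetical nodes, listed in order of increasing price.
nodes = [Node("node1", 2, 1.0), Node("node2", 3, 2.0), Node("node3", 6, 4.0)]
plan = minimise_cost(nodes, n_jobs=60, deadline_hours=10)
print(plan)
```

Tightening the deadline in this sketch pushes work onto the pricier nodes, which is the trade-off between cost and time that the slide illustrates.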
Minimise Time [Figure: jobs allocated over time to Nodes 1 to 4, with nodes ordered by increasing price; annotations: Budget, Cost, Budget / Job]
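A minimal sketch of the time-minimising scheme, under the assumption that "Budget / Job" acts as a price ceiling: every node whose cost per job is at or below the budget per job is used, and the completion time follows from their combined rate. This is one interpretation of the slide, not a description of the actual scheduler; the tuple layout and the example numbers are hypothetical.

```python
def minimise_time(nodes, n_jobs, budget):
    """nodes: list of (name, jobs_per_hour, cost_per_job) tuples.

    Returns (names of nodes used, estimated hours to finish).
    The estimate is None when no node is affordable."""
    budget_per_job = budget / n_jobs
    affordable = [node for node in nodes if node[2] <= budget_per_job]
    total_rate = sum(rate for _, rate, _ in affordable)
    if total_rate == 0:
        return [], None
    return [name for name, _, _ in affordable], n_jobs / total_rate


# Hypothetical nodes, listed in order of increasing price.
nodes = [("node1", 2, 1.0), ("node2", 3, 2.0), ("node3", 6, 4.0)]
used, hours = minimise_time(nodes, n_jobs=50, budget=100)
print(used, hours)
```

A larger budget raises the ceiling, admits faster but pricier nodes, and shortens the estimated completion time.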
Globus 1.1 GRAM API
    int globus_gram_client_job_check(
        char *resource_manager_contact,
        const char *description,
        const float conf_percentage,
        globus_gram_client_time_t *estimate,
        globus_gram_client_time_t *interval);
Note: This is not yet implemented. This function returns an estimate of the time it would take for a job of the description provided to reach an ACTIVE state.
Historical profiling –Examine characteristics of all jobs in queue against historical profiles in order to determine expected start time of a job –Returns start time and error estimate Reference: Warren Smith, Ian T. Foster, Valerie E. Taylor: Predicting Application Run Times Using Historical Information. Job Scheduling Strategies for Parallel Processing Workshop (JSSPP) 1998.
Information Overload Too many variables: –Number of CPUs –CPU speed –Processor architecture –Operating system –Real memory –Disk speed –Bandwidth –Latency –Other users
Extrapolation of completion rate [Figure: resources A, B and C; completion rates of 2 jobs/hr, 3 jobs/hr and 6 jobs/hr; time axis marked at 1 hr and 2 hr]
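The extrapolation idea can be sketched as follows: measure how many jobs each resource has completed so far, turn that into a rate, and project when the remaining jobs will finish. The pairing of A, B and C with 2, 3 and 6 jobs/hr is an assumption read off the slide's labels, and the function names are hypothetical.

```python
def observed_rate(jobs_completed, elapsed_hours):
    """Completion rate (jobs/hr) measured on one resource so far."""
    return jobs_completed / elapsed_hours


def hours_to_finish(remaining_jobs, rates):
    """Extrapolate: time for all resources together to clear the remaining jobs."""
    return remaining_jobs / sum(rates.values())


# Assumed pairing of the resources and rates shown on the slide.
rates = {"A": 2.0, "B": 3.0, "C": 6.0}
estimate = hours_to_finish(22, rates)
print(estimate)
```

Comparing this estimate with the time left before the deadline tells the scheduler whether to add resources or release them.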
[Figure: average number of processors over time, for a 20 hour deadline, a 15 hour deadline and a 10 hour deadline]
Assumptions Compute time >> Network time All jobs are the same length on any particular resource Price of a resource is constant over time Not much wriggle room during the end-game –Both scheduling schemes push up against the limit that they’re not minimising –Heuristic nature of completion time
What we really want… Guaranteed completion time –globus_gram_client_job_check() with teeth –Requires scheduler to internally reserve space for job in advance Advance reservation –As above, but with external interface
And this too… A real grid economy –Incentive for providers to provide resources –Incentive for consumers to describe requirements accurately –Incentive for consumers to use resources judiciously –Price mechanism: budget as a timely global information parameter; universally understood; enables trade-offs in making QoS decisions
A final point Optimising is really hard in a wide area network –Requires centralised decision maker –Information is missing –Information is not contemporaneous –Information is out-of-date
Scalable information …is slow to change Budget and deadline are (relatively) constant and can be propagated far and wide in a timely manner Slow information comes from specifying requirements in the real world Satisfying (instead of optimising) a requirement is relatively simple –A resource can, so it does –A resource can’t, so it doesn’t
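Satisfying rather than optimising can be sketched as a purely local yes/no test: a resource needs only the slow-changing budget and deadline plus its own rate and price, and no global view of other resources. The function name and parameters below are hypothetical, and the constant rate and price are the same simplifying assumptions used earlier in the talk.

```python
def can_satisfy(jobs_per_hour, cost_per_job, n_jobs, deadline_hours, budget):
    """Local decision: can this resource finish n_jobs by the deadline
    without exceeding the budget? No information about other resources is needed."""
    finishes_in_time = jobs_per_hour * deadline_hours >= n_jobs
    within_budget = cost_per_job * n_jobs <= budget
    return finishes_in_time and within_budget


# Hypothetical requirement: 50 jobs, 10 hour deadline, budget of 100 units.
print(can_satisfy(6, 1.0, 50, 10, 100))  # fast enough and cheap enough
print(can_satisfy(2, 1.0, 50, 10, 100))  # too slow for the deadline
```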