1 Job Scheduling for Grid Computing on Metacomputers Keqin Li Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05)
2 Outline
- Introduction
- The Scheduling Model
- A Communication Cost Model
- Scheduling Algorithms
- Worst-Case Performance Analysis
- Experimental Data
3 Introduction 1 A metacomputer is a network of computational resources linked by software in such a way that they can be used as easily as a single computer. A metacomputer is able to support distributed supercomputing applications by combining multiple high-speed high-capacity resources on a computational grid into a single, virtual distributed supercomputer.
4 Introduction 2 The most significant result of the paper is that, with any initial order of jobs and any processor allocation algorithm, the list scheduling algorithm achieves a worst-case performance bound expressed in terms of the following quantities:
- p is the maximum size of an individual machine
- P is the total size of a metacomputer
- s is the minimum job size, with s ≥ p
- α is the ratio of the communication bandwidth within a parallel machine to the communication bandwidth of the network
- β is the fraction of the communication time in the jobs
5 The Scheduling Model
6 A metacomputer is specified as M = (P_1, P_2, ..., P_m), where P_j, 1 ≤ j ≤ m, is the name as well as the size (i.e., the number of processors) of a parallel machine. Let P = P_1 + P_2 + … + P_m denote the total number of processors. The m machines are connected by a LAN, MAN, WAN, or the Internet. A job J is specified as (s, t), where s is the size of J (i.e., the number of processors required to execute J) and t is J’s execution time. The cost of J is the product st. Given a metacomputer M and a list of jobs L = (J_1, J_2, ..., J_n), where J_i = (s_i, t_i), 1 ≤ i ≤ n, we are interested in scheduling the n jobs on M.
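The model above can be captured in a small data sketch. This is illustrative Python; the names `Job` and `Metacomputer` are mine, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class Job:
    size: int    # s: number of processors required
    time: float  # t: execution time

    @property
    def cost(self) -> float:
        # The cost of a job is the product s * t.
        return self.size * self.time

@dataclass
class Metacomputer:
    machines: list  # machine sizes P_1, ..., P_m

    @property
    def total_processors(self) -> int:
        # P = P_1 + P_2 + ... + P_m
        return sum(self.machines)

M = Metacomputer(machines=[8, 4, 2])
J = Job(size=5, time=10.0)
print(M.total_processors)  # 14
print(J.cost)              # 50.0
```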
7 A schedule of a job J_i = (s_i, t_i) is a tuple ψ_i = (τ_i, (P_{j_1}, s_{i,1}), (P_{j_2}, s_{i,2}), ..., (P_{j_{r_i}}, s_{i,r_i})), where:
- τ_i is the starting time of J_i
- J_i is divided into r_i subjobs J_{i,1}, J_{i,2}, ..., J_{i,r_i}, of sizes s_{i,1}, s_{i,2}, ..., s_{i,r_i}, respectively, with s_i = s_{i,1} + s_{i,2} + … + s_{i,r_i}
- the subjob J_{i,k} is executed on P_{j_k} using s_{i,k} processors, for all 1 ≤ k ≤ r_i
8 A Communication Cost Model
9 The s_i processors allocated to J_i communicate with each other during the execution of J_i. Communication between two processors residing on different machines connected by a LAN, MAN, WAN, or the Internet takes significantly longer than communication between two processors on the same machine. The communication cost model takes both inter-machine and intra-machine communication into consideration. The execution time t_i is divided into two components, t_i = t_{i,comp} + t_{i,comm}. Each processor on P_{j_k} needs to communicate with the s_{i,k} processors on P_{j_k} and with the s_i − s_{i,k} processors on the machines P_{j_{k'}} with k' ≠ k. This determines t*_{i,k}, the effective execution time of the subjob J_{i,k} on P_{j_k}.
11 The execution time of job J_i is t*_i = max(t*_{i,1}, t*_{i,2}, …, t*_{i,r_i}); we call t*_i the effective execution time of job J_i. The extra communication time among processors on different machines discourages dividing a job into small subjobs.
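A concrete form of t*_{i,k} consistent with this description — assumed here for illustration, not quoted from the paper — charges intra-machine communication at full bandwidth and stretches the inter-machine share by the factor α:

```python
# ASSUMED formula: the fraction s_ik/s_i of the communication stays
# intra-machine; the remaining fraction crosses the network and is
# slowed by the bandwidth ratio alpha.

def subjob_time(t_comp, t_comm, s_i, s_ik, alpha):
    # Effective time of one subjob of size s_ik of a job of size s_i.
    return t_comp + t_comm * (s_ik + alpha * (s_i - s_ik)) / s_i

def effective_time(t_comp, t_comm, s_i, split, alpha):
    # t*_i = max over all subjobs: the slowest subjob determines the job.
    return max(subjob_time(t_comp, t_comm, s_i, s_ik, alpha) for s_ik in split)

# An unsplit job (one machine holds all s_i processors) pays no penalty:
print(effective_time(6.0, 4.0, 8, [8], alpha=10))     # 10.0
# Splitting the 8 processors as 4 + 4 stretches half the communication:
print(effective_time(6.0, 4.0, 8, [4, 4], alpha=10))  # 28.0
```

The second call illustrates why the model discourages splitting: half of the communication now crosses the slow network, inflating t*_i from 10.0 to 28.0.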
12 Our job scheduling problem for grid computing on metacomputers can be formally defined as follows: given a metacomputer M = (P_1, P_2, ..., P_m) and a list of jobs L = (J_1, J_2, ..., J_n), where J_i = (s_i, t_i), 1 ≤ i ≤ n, find a schedule ψ of L, ψ = (ψ_1, ψ_2, ..., ψ_n), with ψ_i = (τ_i, (P_{j_1}, s_{i,1}), (P_{j_2}, s_{i,2}), ..., (P_{j_{r_i}}, s_{i,r_i})), where J_i is executed during the time interval [τ_i, τ_i + t*_i] using s_{i,k} processors on P_{j_k} for all 1 ≤ k ≤ r_i, such that the total execution time of L on M, T(ψ) = max_{1 ≤ i ≤ n} (τ_i + t*_i), is minimized.
13 When α = 1, that is, extra communication time over a LAN, MAN, WAN, or the Internet is not a concern, the above scheduling problem is equivalent to the problem of scheduling independent parallel tasks in multiprocessors, which is NP-hard even when all tasks are sequential.
14 Scheduling Algorithms
15 In the list scheduling (LS) algorithm, jobs are taken from L in order, and each job is started as soon as enough processors become available. There is a choice of the initial order of the jobs in L. Four ordering strategies are considered:
- Largest Job First (LJF) – jobs are arranged such that s_1 ≥ s_2 ≥ … ≥ s_n
- Longest Time First (LTF) – jobs are arranged such that t_1 ≥ t_2 ≥ … ≥ t_n
- Largest Cost First (LCF) – jobs are arranged such that s_1 t_1 ≥ s_2 t_2 ≥ … ≥ s_n t_n
- Unordered (U) – jobs are arranged in arbitrary order
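The three non-trivial orderings are just sort keys over (s, t) pairs. A minimal sketch:

```python
# Jobs are given as (size, time) pairs; "U" (unordered) keeps the list as-is.

def order(jobs, strategy):
    if strategy == "U":
        return list(jobs)
    keys = {
        "LJF": lambda j: j[0],         # largest job (size s) first
        "LTF": lambda j: j[1],         # longest time t first
        "LCF": lambda j: j[0] * j[1],  # largest cost s*t first
    }
    return sorted(jobs, key=keys[strategy], reverse=True)

jobs = [(2, 9.0), (5, 3.0), (4, 4.0)]
print(order(jobs, "LJF"))  # [(5, 3.0), (4, 4.0), (2, 9.0)]
print(order(jobs, "LCF"))  # [(2, 9.0), (4, 4.0), (5, 3.0)]
```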
16 The number of available processors P'_j on machine P_j is dynamically maintained. The total number of available processors is P' = P'_1 + P'_2 + … + P'_m.
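A minimal event-driven sketch of LS consistent with the verbal description, assuming for brevity that t*_i = t_i (the communication penalty is ignored) and that every job fits in the metacomputer (s_i ≤ P). The allocation policy is a parameter, anticipating the Naive/LMF/SMF/MEET variants:

```python
import heapq

def list_schedule(machine_sizes, jobs, allocate):
    free = list(machine_sizes)  # P'_j: available processors per machine
    events = []                 # (finish_time, allocation) min-heap
    now, makespan = 0.0, 0.0
    for s, t in jobs:
        # Advance time until s processors are available in total.
        while sum(free) < s:
            now, done = heapq.heappop(events)
            for j, k in done:
                free[j] += k    # release processors at job completion
        alloc = allocate(free, s)  # [(machine_index, count), ...]
        for j, k in alloc:
            free[j] -= k
        heapq.heappush(events, (now + t, alloc))
        makespan = max(makespan, now + t)
    return makespan

def naive_allocate(free, s):
    # Grab processors machine by machine until s are assigned.
    alloc, need = [], s
    for j, f in enumerate(free):
        if need == 0:
            break
        k = min(f, need)
        if k > 0:
            alloc.append((j, k))
            need -= k
    return alloc

# Two 4-processor machines; the 8-processor job must wait for both:
print(list_schedule([4, 4], [(4, 3.0), (4, 3.0), (8, 2.0)], naive_allocate))  # 5.0
```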
19 Each job scheduling algorithm needs to use a processor allocation algorithm to find resources in a metacomputer. Several processor allocation algorithms have been proposed, including Naive, LMF (largest machine first), SMF (smallest machine first), and MEET (minimum effective execution time).
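Two of these policies can be sketched as follows; the exact tie-breaking rules of the paper are not reproduced, and the function names are mine:

```python
# LMF prefers machines with the most free processors, which tends to keep
# a job on few machines (less inter-machine communication). SMF fills the
# smallest pools first, preserving large blocks for later, larger jobs.

def _take(free, s, machine_order):
    # Assign s processors following the given machine preference order.
    alloc, need = [], s
    for j in machine_order:
        if need == 0:
            break
        k = min(free[j], need)
        if k > 0:
            alloc.append((j, k))
            need -= k
    return alloc

def lmf_allocate(free, s):
    # Largest Machine First: biggest pools of free processors first.
    return _take(free, s, sorted(range(len(free)), key=lambda j: free[j], reverse=True))

def smf_allocate(free, s):
    # Smallest Machine First: smallest pools of free processors first.
    return _take(free, s, sorted(range(len(free)), key=lambda j: free[j]))

print(lmf_allocate([2, 6, 4], 5))  # [(1, 5)] -> job fits on one machine
print(smf_allocate([2, 6, 4], 5))  # [(0, 2), (2, 3)] -> small pools first
```

Under the communication model, LMF's preference for few machines is attractive for an individual job, while SMF can be better globally; MEET, per its name, would pick the split minimizing the job's effective execution time.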
20 Worst-Case Performance Analysis
21 Let A(L) be the length of a schedule produced by algorithm A for a list L of jobs, and let OPT(L) be the length of an optimal schedule of L. We say that algorithm A achieves worst-case performance bound B if A(L)/OPT(L) ≤ B for all L.
22 Let t*_{i,LS} be the effective execution time of a job J_i in an LS schedule. Assume that all n jobs are executed during the time interval [0, LS(L)]. Let J_i be a job which finishes at time LS(L). Clearly, before J_i is scheduled at time LS(L) − t*_{i,LS}, there are no s_i processors available; otherwise, J_i would have been scheduled earlier. That is, during the time interval [0, LS(L) − t*_{i,LS}], the number of busy processors is at least P − s_i + 1. During the time interval [LS(L) − t*_{i,LS}, LS(L)], the number of busy processors is at least s_i. Define the effective cost of L in an LS schedule as C*(L) = s_1 t*_{1,LS} + s_2 t*_{2,LS} + … + s_n t*_{n,LS}, the total processor-time consumed by the schedule. Then we have C*(L) ≥ (P − s_i + 1)(LS(L) − t*_{i,LS}) + s_i t*_{i,LS}.
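Writing the busy-processor argument out (with the effective cost taken as the total processor-time of the LS schedule, a natural reading of the definition), the two intervals give a lower bound on C*(L) that rearranges into an upper bound on LS(L):

```latex
% Effective cost: total processor-time consumed by the LS schedule
C^*(L) = \sum_{i=1}^{n} s_i\, t^*_{i,\mathrm{LS}}

% At least P - s_i + 1 processors are busy on [0, LS(L) - t^*_{i,LS}],
% and at least s_i are busy on the final interval of length t^*_{i,LS}:
C^*(L) \ge (P - s_i + 1)\bigl(LS(L) - t^*_{i,\mathrm{LS}}\bigr) + s_i\, t^*_{i,\mathrm{LS}}

% Rearranging isolates the schedule length:
LS(L) \le \frac{C^*(L)}{P - s_i + 1}
          + \left(1 - \frac{s_i}{P - s_i + 1}\right) t^*_{i,\mathrm{LS}}
```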
23 No matter which processor allocation algorithm is used, t*_{i,LS} is bounded from above in terms of t_i, α, and β. The effective execution time of J_i in an optimal schedule is likewise bounded from below. Combining the two bounds yields an inequality involving a factor φ_i. It is clear that φ_i is an increasing function of s_i, so it is minimized when s_i = s; hence φ_i can be replaced by its value at s_i = s.
24 Substituting these bounds into the inequality for C*(L) and simplifying, the right-hand side of the resulting inequality is minimized at a particular intermediate value of t*_{i,LS}, which yields an upper bound on the performance ratio LS(L)/OPT(L).
25 The right-hand side of the above inequality is a decreasing function of s_i, which is maximized when s_i = s.
26 Theorem. If P_j ≤ p for all 1 ≤ j ≤ m, and s_i ≥ s for all 1 ≤ i ≤ n, where p ≤ s, then algorithm LS achieves a worst-case performance bound determined by p, s, P, α, and β. This performance bound is independent of the initial order of L and of the processor allocation algorithm.
27 Corollary. If a metacomputer contains only sequential machines, i.e., p = 1, communication heterogeneity vanishes, and the worst-case performance bound in the theorem simplifies accordingly.