Slide 1 of 21: Low-Cost Task Scheduling for Distributed-Memory Machines
Andrei Radulescu and Arjan J.C. van Gemund
Presented by Bahadır Kaan Özütam

Slide 2 of 21: Outline
- Introduction
- List Scheduling
- Preliminaries
- General Framework for LSSP
- Complexity Analysis
- Case Study
- Extensions for LSDP
- Conclusion

Slide 3 of 21: Introduction
Task scheduling heuristics can be classified along several axes:
- Shared-memory vs. distributed-memory machines
- Bounded vs. unbounded number of processors
- Multistep vs. single-step methods
- Duplicating vs. non-duplicating methods
- Static vs. dynamic priorities

Slide 4 of 21: List Scheduling
Two families of list scheduling algorithms: LSSP and LSDP.
- LSSP (List Scheduling with Static Priorities): tasks are scheduled in the order of their previously computed priorities, each on the task's "best" processor. The best processor is:
  - the processor enabling the earliest start time, if performance is the main concern;
  - the processor becoming idle the earliest, if scheduling speed is the main concern.
- LSDP (List Scheduling with Dynamic Priorities): priorities are computed for task-processor pairs, which is more complex.

Slide 5 of 21: List Scheduling
Reducing LSSP time complexity from O(V log V + (E + V) P) to O(V log P + E), where V is the number of tasks, E the number of dependencies, and P the number of processors, rests on two ideas:
1. Considering only two candidate processors per task
2. Maintaining a partially sorted task priority queue that keeps only a limited number of tasks sorted

Slide 6 of 21: Preliminaries
The parallel program model:
- A directed acyclic graph (DAG) G = (V, E)
- Computation cost Tw(t) of a task
- Communication cost Tc(t, t') of a dependency
- Communication-to-computation ratio (CCR)
- Task graph width W
[Figure: a sample task graph with task nodes V connected by dependency edges E.]
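The graph model can be made concrete with a small sketch. The dictionaries Tw and Tc and the ccr helper below are illustrative stand-ins, not code or notation from the paper:

```python
# Hypothetical encoding of a task graph G = (V, E): computation costs
# Tw(t) per task and communication costs Tc(t, t') per edge.
Tw = {"t0": 2, "t1": 2, "t2": 3}          # Tw(t): computation cost
Tc = {("t0", "t1"): 4, ("t0", "t2"): 1}   # Tc(t, t'): communication cost

def ccr(Tw, Tc):
    """Communication-to-computation ratio of the whole graph."""
    return sum(Tc.values()) / sum(Tw.values())

print(ccr(Tw, Tc))  # (4 + 1) / (2 + 2 + 3) = 5/7
```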

Slide 7 of 21: Preliminaries
- Entry and exit tasks
- The bottom level Tb(t) of a task
- A task is ready when all its parents have been scheduled
- Start time Ts(t) and finish time Tf(t)
- Partial schedule
- Processor ready time: Tr(p) = max { Tf(t) : t in V, Pr(t) = p }, where Pr(t) is the processor t is scheduled on
- Processor becoming idle the earliest, pr: Tr(pr) = min { Tr(p) : p in P }
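As a sketch of the bottom-level definition, here taken to include edge communication costs (one common variant; others count only computation), on a tiny hypothetical graph:

```python
# Bottom level with communication costs included (one common variant):
# Tb(t) = Tw(t) + max over successors t' of (Tc(t, t') + Tb(t')),
# with Tb(t) = Tw(t) for exit tasks. The graph data is illustrative.
Tw = {"a": 2, "b": 3, "c": 2}
succ = {"a": [("b", 4), ("c", 1)], "b": [], "c": []}  # lists of (child, Tc)

def bottom_level(t):
    tails = [tc + bottom_level(c) for c, tc in succ[t]]
    return Tw[t] + (max(tails) if tails else 0)

print(bottom_level("a"))  # 2 + max(4 + 3, 1 + 2) = 9
```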

Slide 8 of 21: Preliminaries
- Last message arrival time: Tm(t) = max { Tf(t') + Tc(t', t) : (t', t) in E }
- Enabling processor pe(t): the processor from which the last message arrives
- Effective message arrival time: Te(t, p) = max { Tf(t') + Tc(t', t) : (t', t) in E, Pr(t') != p }
- Start time of a ready task, once scheduled on p: Ts(t, p) = max { Te(t, p), Tr(p) }
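These definitions translate directly into code. All names here (parents, finish, proc, ready) are illustrative stand-ins for the paper's notation, and the data is a made-up two-parent example:

```python
# Sketch of Te(t, p) and Ts(t, p) from the definitions above.
parents = {"t3": ["t1", "t2"]}
finish = {"t1": 5, "t2": 4}               # Tf(t') of each scheduled parent
proc = {"t1": "p0", "t2": "p1"}           # Pr(t'): processor each parent ran on
Tc = {("t1", "t3"): 2, ("t2", "t3"): 3}   # communication costs
ready = {"p0": 6, "p1": 5}                # Tr(p): processor ready times

def Te(t, p):
    # Effective message arrival: parents already on p send no message.
    times = [finish[q] + Tc[(q, t)] for q in parents[t] if proc[q] != p]
    return max(times, default=0)

def Ts(t, p):
    # Ts(t, p) = max(Te(t, p), Tr(p))
    return max(Te(t, p), ready[p])

print(Ts("t3", "p0"))  # max(Te = 4 + 3 = 7, Tr = 6) = 7
```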

Slide 9 of 21: General Framework for LSSP
The general LSSP algorithm has three phases:
- Task priority computation: O(E + V)
- Task selection: O(V log W)
- Processor selection: O((E + V) P)

Slide 10 of 21: General Framework for LSSP
Processor selection considers only two candidates:
1. The enabling processor pe(t)
2. The processor becoming idle first, pr
with Ts(t, p) = max { Te(t, p), Tr(p) }

Slide 11 of 21: General Framework for LSSP
- Lemma 1: for all p != pe(t), Te(t, p) = Tm(t)
- Theorem 1: for a ready task t, one of the processors p in { pe(t), pr } satisfies Ts(t, p) = min { Ts(t, px) : px in P }
- Processor selection drops from O((E + V) P) to O(V log P + E):
  - O(E + V) to traverse the task graph
  - O(V log P) to keep the processors sorted
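Theorem 1 is what makes the two-candidate selection safe: only pe(t) and pr need to be compared. A minimal sketch, with an illustrative min-heap of processor ready times and a hypothetical start_time callback:

```python
import heapq

# Two-candidate processor selection per Theorem 1: try only the
# enabling processor pe and the earliest-idle processor pr.
def select_processor(t, pe, idle_heap, start_time):
    """idle_heap holds (Tr(p), p) pairs; start_time(t, p) returns Ts(t, p)."""
    _, pr = idle_heap[0]                       # earliest-idle processor, O(1)
    return min({pe, pr}, key=lambda p: start_time(t, p))

idle_heap = [(3, "p1"), (5, "p0")]             # (Tr(p), p)
heapq.heapify(idle_heap)
ts = {"p0": 5, "p1": 7}                        # hypothetical Ts(t, p) values
print(select_processor("t", "p0", idle_heap, lambda t, p: ts[p]))  # p0
```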

Slide 12 of 21: General Framework for LSSP
Task selection:
- The O(V log W) cost can be reduced by sorting only some of the tasks.
- The task priority queue is split into:
  1. A sorted list of size H
  2. A FIFO list with O(1) operations
- The cost decreases to O(V log H); H must be tuned, and H = P is optimal, giving O(V log P).
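A minimal sketch of the two-part ready queue: a size-H sorted part (a heap on negated priority, so the highest priority pops first) plus a FIFO overflow. This simplified version refills the sorted part from the FIFO on each pop and only approximates the slide's scheme; the class and its names are illustrative:

```python
import heapq
from collections import deque

class PartialPriorityQueue:
    """Sorted part of size H plus O(1) FIFO overflow (sketch)."""
    def __init__(self, H):
        self.H = H
        self.sorted = []      # heap of (-priority, task), at most H entries
        self.fifo = deque()   # unsorted overflow

    def push(self, task, prio):
        if len(self.sorted) < self.H:
            heapq.heappush(self.sorted, (-prio, task))
        else:
            self.fifo.append((task, prio))

    def pop(self):
        _, task = heapq.heappop(self.sorted)
        if self.fifo:         # refill the sorted part from the FIFO
            t, p = self.fifo.popleft()
            heapq.heappush(self.sorted, (-p, t))
        return task

q = PartialPriorityQueue(H=2)
for task, prio in [("a", 5), ("b", 9), ("c", 1)]:
    q.push(task, prio)
print(q.pop(), q.pop(), q.pop())  # b a c
```

Because FIFO tasks are not re-sorted until they enter the sorted part, the order is only approximately by priority; keeping H = P bounds the per-operation cost at O(log P).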

Slide 13 of 21: Complexity Analysis
- Computing task priorities: O(E + V)
- Task selection: O(V log W); O(V log H) with the partially sorted priority queue, i.e. O(V log P) for a queue of size P
- Processor selection: O(E + V) + O(V log P)
- Total complexity:
  - O(V (log W + log P) + E) with a fully sorted queue
  - O(V log P + E) with a partially sorted queue

Slide 14 of 21: Case Study
- MCP (Modified Critical Path): the task with the highest bottom level has the highest priority
- FCP (Fast Critical Path)
- Setup: 3 processors, a partially sorted priority queue of size 2, and tasks t0 through t7
[Figure: the example task graph; each node is labeled task/Tw(t), e.g. t0/2 and t4/3, and each edge with its communication cost.]

Slide 15 of 21: Case Study
[Figure: the same example task graph as on slide 14.]

Slide 16 of 21: Extensions for LSDP
Extending the approach to dynamic priorities:
- ETF: the ready task that starts the earliest
- ERT: the ready task that finishes the earliest
- DLS: the task-processor pair with the highest dynamic level
General formula: φ(t, p) = φ(t) + max { Te(t, p), Tr(p) }, where
- φ_ETF(t) = 0
- φ_ERT(t) = Tw(t)
- φ_DLS(t) = -Tb(t)
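The unified formula can be sketched directly; the task-processor pair minimizing φ is selected. The data values and the Te/Tr callbacks below are illustrative:

```python
# phi(t, p) = phi0(t) + max(Te(t, p), Tr(p)); smallest phi wins.
Tw = {"t": 3}    # computation cost (illustrative)
Tb = {"t": 10}   # bottom level (illustrative)

def phi(t, p, phi0, Te, Tr):
    return phi0(t) + max(Te(t, p), Tr(p))

phi0_etf = lambda t: 0         # ETF: minimize the start time
phi0_ert = lambda t: Tw[t]     # ERT: minimize the finish time
phi0_dls = lambda t: -Tb[t]    # DLS: negated so the highest dynamic level wins

Te = lambda t, p: 4            # hypothetical effective message arrival time
Tr = lambda p: 6               # hypothetical processor ready time
print(phi("t", "p", phi0_ert, Te, Tr))  # 3 + max(4, 6) = 9
```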

Slide 17 of 21: Extensions for LSDP
- EP case: on each processor the tasks are kept sorted, and the processors themselves are kept sorted.
- Non-EP case: the processor becoming idle first is used; if that processor is the enabling processor, this reduces to the EP case.

Slide 18 of 21: Extensions for LSDP
- 3 tries: 1 for the EP case, 1 for the non-EP case
- Task priority queues maintained: P for the EP case, 2 for the non-EP case
- Each task is added to 3 queues: 1 for the EP case, 2 for the non-EP case
- Processor queues: 1 for the EP case, 1 for the non-EP case

Slide 19 of 21: Complexity
- Originally O(W (E + V) P); now O(V (log W + log P) + E)
- This can be further reduced using a partially sorted priority queue; a queue size of P is required to maintain comparable performance, giving O(V log P + E)

Slide 20 of 21: Conclusion
- LSSP can be performed at a significantly lower cost:
  - Processor selection considers only two processors: the enabling processor and the processor becoming idle first
  - Task selection sorts only a limited number of tasks
- Using the extension of this method, LSDP complexity can also be reduced
- For large program and processor dimensions, this yields a superior cost-performance trade-off

Slide 21 of 21: Thank You. Questions?

