Low-Cost Task Scheduling for Distributed-Memory Machines
Andrei Radulescu and Arjan J.C. Van Gemund
Presented by Bahadır Kaan Özütam
Outline
- Introduction
- List Scheduling
- Preliminaries
- General Framework for LSSP
- Complexity Analysis
- Case Study
- Extensions for LSDP
- Conclusion
Introduction
- Task Scheduling
  - Scheduling heuristics
  - Shared-memory vs. distributed-memory
  - Bounded vs. unbounded number of processors
  - Multistep vs. single-step methods
  - Duplicating vs. nonduplicating methods
  - Static vs. dynamic priorities
List Scheduling
- LSDP and LSSP algorithms
- LSSP (List Scheduling with Static Priorities)
  - Tasks are scheduled in the order of their previously computed priorities, each on its "best" processor.
  - The best processor is:
    - the processor enabling the earliest start time, if performance is the main concern;
    - the processor becoming idle the earliest, if scheduling speed is the main concern.
- LSDP (List Scheduling with Dynamic Priorities)
  - Priorities are computed for task-processor pairs
  - More complex
List Scheduling
- Reducing LSSP time complexity
  - From $O(V \log V + (E + V) P)$ to $O(V \log P + E)$
    - $V$ = expected number of tasks
    - $E$ = expected number of dependencies
    - $P$ = number of processors
  1. Considering only two processors
  2. Maintaining a partially sorted task priority queue with a limited number of tasks
Preliminaries
- Parallel programs
  - Directed acyclic graph (DAG) $G = (V, E)$
  - Computation cost $T_w(t)$
  - Communication cost $T_c(t, t')$
  - Communication-to-computation ratio (CCR)
  - Task graph width ($W$)
[Figure: example task graph with nodes labeled V and edges labeled E]
Preliminaries
- Entry and exit tasks
- The bottom level $T_b(t)$ of a task
- Ready task: all of its parents have been scheduled
- Start time $T_s(t)$
- Finish time $T_f(t)$
- Partial schedule
- Processor ready time: $T_r(p) = \max_{t \in V,\ p_t(t) = p} T_f(t)$
- Processor becoming idle the earliest ($p_r$): $T_r(p_r) = \min_{p \in P} T_r(p)$
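The two processor quantities above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `schedule` dictionary, the processor numbering, and the convention that an empty processor has ready time 0 are not from the slides.

```python
# Hypothetical partial schedule: task -> (processor, finish time T_f).
schedule = {"t0": (0, 2.0), "t1": (0, 4.0), "t2": (1, 5.0)}
processors = [0, 1, 2]


def ready_time(p, schedule):
    """T_r(p): latest finish time among the tasks placed on p.

    Assumption: a processor with no tasks is ready at time 0.
    """
    return max((tf for proc, tf in schedule.values() if proc == p), default=0.0)


def earliest_idle(processors, schedule):
    """p_r: the processor with the smallest ready time."""
    return min(processors, key=lambda p: ready_time(p, schedule))


print(ready_time(0, schedule))              # 4.0
print(earliest_idle(processors, schedule))  # 2
```

Recomputing `ready_time` by scanning the schedule is linear in the number of scheduled tasks; a real scheduler would more likely keep one running value per processor.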
Preliminaries
- The last message arrival time: $T_m(t) = \max_{(t', t) \in E} \{ T_f(t') + T_c(t', t) \}$
- The enabling processor $p_e(t)$: the processor from which the last message arrives
- Effective message arrival time: $T_e(t, p) = \max_{(t', t) \in E,\ p_t(t') \neq p} \{ T_f(t') + T_c(t', t) \}$
- The start time of a ready task, once scheduled: $T_s(t, p) = \max \{ T_e(t, p), T_r(p) \}$
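A small worked example may help keep the four definitions apart. The sketch below assumes a ready task with two parents; the data layout (`parents`, `ready`) and all numbers are made up for illustration.

```python
# Hypothetical parents of a ready task t:
# parent -> (processor p_t, finish time T_f, communication cost T_c to t).
parents = {"a": (0, 4.0, 3.0), "b": (1, 5.0, 1.0)}
# Hypothetical processor ready times T_r(p).
ready = {0: 4.0, 1: 5.0, 2: 0.0}


def last_message_arrival(parents):
    """T_m(t): latest arrival time over all incoming messages."""
    return max((tf + tc for _, tf, tc in parents.values()), default=0.0)


def enabling_processor(parents):
    """p_e(t): processor of the parent whose message arrives last."""
    if not parents:
        return None
    proc, _, _ = max(parents.values(), key=lambda v: v[1] + v[2])
    return proc


def effective_arrival(parents, p):
    """T_e(t, p): like T_m(t), but messages from parents on p are ignored."""
    return max((tf + tc for proc, tf, tc in parents.values() if proc != p),
               default=0.0)


def start_time(parents, ready, p):
    """T_s(t, p) = max{T_e(t, p), T_r(p)}."""
    return max(effective_arrival(parents, p), ready[p])


print(last_message_arrival(parents))                 # 7.0
print(enabling_processor(parents))                   # 0
print([start_time(parents, ready, p) for p in ready])  # [6.0, 7.0, 7.0]
```

Placing the task on the enabling processor (0) removes the 3.0 communication delay from parent `a`, so the start time drops from 7.0 to 6.0.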
General Framework for LSSP
- General LSSP algorithm
  - Task priority computation: $O(E + V)$
  - Task selection: $O(V \log W)$
  - Processor selection: $O((E + V) P)$
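The $O(E + V)$ bound for priority computation can be illustrated with bottom levels. The sketch below assumes the common definition of the bottom level (length of the longest path from a task to an exit task, counting computation and communication costs), which the slides do not spell out; the function name and the dictionary-based graph are illustrative choices.

```python
from collections import deque


def bottom_levels(succ, t_w, t_c):
    """Compute T_b(t) for every task in one reverse topological pass.

    succ: task -> list of child tasks (every task must appear as a key)
    t_w:  task -> computation cost
    t_c:  (parent, child) -> communication cost
    Each task and each edge is visited once, hence O(E + V).
    """
    remaining = {t: len(children) for t, children in succ.items()}
    pred = {t: [] for t in succ}
    for t, children in succ.items():
        for c in children:
            pred[c].append(t)

    t_b = {}
    queue = deque(t for t, n in remaining.items() if n == 0)  # exit tasks
    while queue:
        t = queue.popleft()
        t_b[t] = t_w[t] + max((t_c[(t, c)] + t_b[c] for c in succ[t]),
                              default=0.0)
        for parent in pred[t]:
            remaining[parent] -= 1
            if remaining[parent] == 0:  # all children of parent are done
                queue.append(parent)
    return t_b


# Hypothetical diamond-shaped task graph.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
t_w = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
t_c = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "d"): 2.0, ("c", "d"): 1.0}
levels = bottom_levels(succ, t_w, t_c)
print(levels)
```

For this graph the exit task `d` gets 2.0, `b` gets 7.0, `c` gets 4.0, and the entry task `a` gets 10.0.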
General Framework for LSSP
- Processor Selection
  - Select between two candidate processors:
    1. The enabling processor
    2. The processor becoming idle first
  - $T_s(t, p) = \max \{ T_e(t, p), T_r(p) \}$
General Framework for LSSP
- Lemma 1. $\forall p \neq p_e(t): T_e(t, p) = T_m(t)$
- Theorem 1. If $t$ is a ready task, one of the processors $p \in \{ p_e(t), p_r \}$ satisfies $T_s(t, p) = \min_{p_x \in P} T_s(t, p_x)$
- $O((E + V) P) \rightarrow O(V \log P + E)$
  - $O(E + V)$ to traverse the task graph
  - $O(V \log P)$ to maintain the processors sorted
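Theorem 1 can be checked empirically on random instances. The sketch below compares the two-candidate choice against an exhaustive search over all processors; the instance generator, function names, and value ranges are assumptions made for this illustration, not part of the original method.

```python
import random


def start_time(parents, ready, p):
    """T_s(t, p) = max{T_e(t, p), T_r(p)}; parents are (proc, T_f, T_c)."""
    t_e = max((tf + tc for proc, tf, tc in parents if proc != p), default=0.0)
    return max(t_e, ready[p])


def select_two_candidates(parents, ready):
    """Pick the better of p_r (idle first) and p_e (enabling processor)."""
    candidates = [min(ready, key=ready.get)]  # p_r
    if parents:
        p_e = max(parents, key=lambda v: v[1] + v[2])[0]
        candidates.append(p_e)
    return min(candidates, key=lambda p: start_time(parents, ready, p))


def select_brute_force(parents, ready):
    """Reference: examine every processor."""
    return min(ready, key=lambda p: start_time(parents, ready, p))


def check(trials, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n_proc = rng.randint(1, 6)
        ready = {p: rng.uniform(0, 10) for p in range(n_proc)}
        parents = [(rng.randrange(n_proc), rng.uniform(0, 10), rng.uniform(0, 5))
                   for _ in range(rng.randint(0, 4))]
        # Keep the instance consistent: a processor cannot be ready
        # before the tasks already placed on it have finished.
        for proc, tf, _ in parents:
            ready[proc] = max(ready[proc], tf)
        fast = select_two_candidates(parents, ready)
        slow = select_brute_force(parents, ready)
        if start_time(parents, ready, fast) != start_time(parents, ready, slow):
            return False
    return True


print(check(1000))
```

The two selections may name different processors when start times tie, so the check compares the achieved start times rather than the processor indices.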
General Framework for LSSP
- Task Selection
  - $O(V \log W)$ can be reduced by sorting only some of the tasks.
  - Task priority queue:
    1. A sorted list of size $H$
    2. A FIFO list ($O(1)$)
  - Complexity decreases to $O(V \log H)$
    - $H$ needs to be adjusted
    - $H = P$ is optimal ($O(V \log P)$)
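One way to picture the two-part task queue is sketched below. The slide names only the two components, so the policy used here is an assumption: a new task goes into the sorted part while it has room and into the FIFO otherwise, and each removal refills the sorted part from the head of the FIFO. The class name is hypothetical, and a binary heap stands in for the sorted list.

```python
import heapq
from collections import deque


class PartiallySortedQueue:
    """Sorted part of at most h tasks plus an unsorted FIFO overflow."""

    def __init__(self, h):
        self.h = h
        self._heap = []       # at most h entries, O(log h) per operation
        self._fifo = deque()  # overflow, O(1) append and popleft

    def push(self, priority, task):
        entry = (-priority, task)  # negate: heapq is a min-heap
        if len(self._heap) < self.h:
            heapq.heappush(self._heap, entry)
        else:
            self._fifo.append(entry)

    def pop(self):
        """Remove the highest-priority task of the sorted part."""
        _, task = heapq.heappop(self._heap)
        if self._fifo:  # refill the sorted part from the FIFO head
            heapq.heappush(self._heap, self._fifo.popleft())
        return task

    def __len__(self):
        return len(self._heap) + len(self._fifo)


q = PartiallySortedQueue(h=2)
for priority, task in [(5, "a"), (1, "b"), (9, "c"), (7, "d")]:
    q.push(priority, task)
order = [q.pop() for _ in range(len(q))]
print(order)  # ['a', 'c', 'd', 'b']
```

A fully sorted queue would yield c, d, a, b; the partially sorted one trades some ordering accuracy for operations that cost $O(\log H)$ instead of $O(\log W)$.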
Complexity Analysis
- Computing task priorities: $O(E + V)$
- Task selection:
  - $O(V \log W)$ for a fully sorted priority queue
  - $O(V \log H)$ for a partially sorted priority queue
  - $O(V \log P)$ for a queue of size $P$
- Processor selection: $O(E + V) + O(V \log P)$
- Total complexity:
  - $O(V (\log W + \log P) + E)$ fully sorted
  - $O(V \log P + E)$ partially sorted
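The partially sorted total follows by adding the three phases. The short derivation below assumes $H = P$ and $P \geq 2$, so that the linear $V$ terms are absorbed by $V \log P$.

```latex
\begin{align*}
T_{\text{total}}
  &= \underbrace{O(E + V)}_{\text{priorities}}
   + \underbrace{O(V \log P)}_{\text{task selection}}
   + \underbrace{O(E + V) + O(V \log P)}_{\text{processor selection}} \\
  &= O(V \log P + E)
\end{align*}
```

With a fully sorted task queue the task-selection term becomes $O(V \log W)$, which gives $O(V (\log W + \log P) + E)$ in the same way.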
Case Study
- MCP (Modified Critical Path)
  - The task having the highest bottom level has the highest priority
- FCP (Fast Critical Path)
- 3 processors
- Partially sorted priority queue of size 2
- 7 tasks
[Figure: example task graph; node labels as extracted: t0/2, t1/2, t2/2, t3/2, t6/2, t5/3, t4/3, t7/2]
Case Study
[Figure: example task graph; node labels as extracted: t0/2, t1/2, t2/2, t3/2, t6/2, t5/3, t4/3, t7/2]
Extensions for LSDP
- Extend the approach to dynamic priorities
  - ETF: the ready task that starts the earliest
  - ERT: the ready task that finishes the earliest
  - DLS: the task-processor pair having the highest dynamic level
  - General formula: $\mathrm{priority}(t, p) = \mathrm{offset}(t) + \max \{ T_e(t, p), T_r(p) \}$
    - ETF: $\mathrm{offset}(t) = 0$
    - ERT: $\mathrm{offset}(t) = T_w(t)$
    - DLS: $\mathrm{offset}(t) = -T_b(t)$
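The general formula can be written as one function with a per-algorithm task term. In the sketch below the name `offset`, the function signature, and the sample numbers are assumptions for illustration. It also assumes that the pair with the smallest value is selected, which matches "starts the earliest" for ETF and "finishes the earliest" for ERT, and corresponds to the highest dynamic level for DLS if that level is the bottom level minus the start time.

```python
def pair_priority(policy, t_w, t_b, t_e, t_r):
    """Value of the general formula for one task-processor pair.

    t_w: computation cost T_w(t)      t_b: bottom level T_b(t)
    t_e: effective arrival T_e(t, p)  t_r: processor ready time T_r(p)
    Assumption: the pair with the smallest value is chosen.
    """
    offsets = {
        "ETF": 0.0,   # start time only
        "ERT": t_w,   # start time + computation cost = finish time
        "DLS": -t_b,  # start time - bottom level
    }
    return offsets[policy] + max(t_e, t_r)


# Hypothetical pair: T_w = 3, T_b = 10, T_e = 6, T_r = 4.
for policy in ("ETF", "ERT", "DLS"):
    print(policy, pair_priority(policy, t_w=3.0, t_b=10.0, t_e=6.0, t_r=4.0))
# ETF 6.0
# ERT 9.0
# DLS -4.0
```

Because only the task term differs between the three algorithms, the same task and processor queues can serve all of them.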
Extensions for LSDP
- EP (enabling processor) case
  - On each processor, the tasks are sorted
  - The processors are sorted
- Non-EP case
  - Uses the processor becoming idle first
  - If this processor is the EP, it falls back to the EP case
Extensions for LSDP
- 3 tries: 1 for EP case, 1 for non-EP case
- Task priority queues maintained: P for EP case, 2 for non-EP case
- Each task is added to 3 queues: 1 for EP case, 2 for non-EP case
- Processor queues: 1 for EP case, 1 for non-EP case
Complexity
- Originally $O(W (E + V) P)$
- Now $O(V (\log W + \log P) + E)$
- Can be further reduced using a partially sorted priority queue. A size of $P$ is required to maintain comparable performance: $O(V \log P + E)$
Conclusion
- LSSP can be performed at a significantly lower cost:
  - Processor selection considers only two processors: the enabling processor or the processor becoming idle first
  - Task selection sorts only a limited number of tasks
- Using an extension of this method, LSDP complexity can also be reduced
- For large program and processor dimensions, the approach offers a superior cost-performance trade-off.
Thank You
Questions?