K-Depth Look-ahead Task Scheduling in Network of Heterogeneous Processors Namyoon Woo and Heon Y. Yeom School of Computer Science and Engineering Seoul National University, Korea {nywoo, yeom}@dcslab.snu.ac.kr
Introduction (1): Problem Definition
  Input
    Task precedence graph (directed, weighted, acyclic graph)
    Processor-network graph
  Objective
    Minimize the overall task execution time.
    Satisfy the precedence order of tasks.
    The schedule is computed before run time.
  The problem is NP-complete.
  List scheduling heuristic
    It is known as a cost-effective heuristic.
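The two objectives above can be made concrete with a small check. The following is a minimal sketch, not the authors' formulation: the task names, edge weights, and schedule are hypothetical, and it assumes a simplified communication model in which an edge costs nothing when both tasks share a processor and costs its full weight otherwise (the processor-network graph is ignored).

```python
from typing import Dict, Tuple

# Hypothetical task precedence graph: edges[(u, v)] = communication weight u -> v.
edges: Dict[Tuple[str, str], float] = {
    ("T0", "T1"): 2.0,
    ("T0", "T2"): 1.0,
    ("T1", "T3"): 3.0,
    ("T2", "T3"): 1.0,
}

# Hypothetical schedule: task -> (processor, start time, finish time).
schedule: Dict[str, Tuple[str, float, float]] = {
    "T0": ("P0", 0.0, 2.0),
    "T1": ("P0", 2.0, 5.0),
    "T2": ("P1", 3.0, 4.0),
    "T3": ("P0", 5.0, 7.0),
}


def respects_precedence(edges, schedule) -> bool:
    """True if every task starts after each predecessor has finished,
    plus the edge weight when the two tasks run on different processors
    (simplifying assumption: no topology-dependent cost)."""
    for (u, v), weight in edges.items():
        proc_u, _, finish_u = schedule[u]
        proc_v, start_v, _ = schedule[v]
        delay = 0.0 if proc_u == proc_v else weight
        if start_v < finish_u + delay:
            return False
    return True


def schedule_length(schedule) -> float:
    """Overall execution time: the latest finish time of any task."""
    return max(finish for _, _, finish in schedule.values())


print(respects_precedence(edges, schedule), schedule_length(schedule))
```

A scheduler's job is to find, among all schedules for which `respects_precedence` holds, one with a small `schedule_length`.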
Introduction (2): List Scheduling
  [Figure: a task graph with tasks T0 to T4 is scheduled onto processors P0 to P3 in three steps, (1) to (3), drawn along a time axis.]
Related Works (1)
  “Earliest Start Time” (EST): homogeneous processing
  “Heterogeneous Earliest Finish Time” (HEFT) [topcuoglu99HCS]: heterogeneous processing
  [Figure: placement of task Ti and neighboring tasks Tx, Ty, Tz on processors P0 to P2, one panel per heuristic.]
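The difference between the two selection criteria can be sketched as follows. This is only an illustration of "earliest start" versus "earliest finish" processor selection; it omits HEFT's task ranking and insertion policy, and all names and numbers are hypothetical.

```python
from typing import Dict


def start_times(ready_time: Dict[str, float], proc_free: Dict[str, float]) -> Dict[str, float]:
    """Earliest start on each processor: when the task's inputs have arrived
    there AND the processor is idle."""
    return {p: max(ready_time[p], proc_free[p]) for p in proc_free}


def pick_by_est(ready_time, proc_free) -> str:
    """EST-style choice: the processor on which the task can start first."""
    starts = start_times(ready_time, proc_free)
    return min(starts, key=starts.get)


def pick_by_eft(ready_time, proc_free, exec_time) -> str:
    """EFT-style choice: the processor on which the task finishes first,
    which matters once execution times differ per processor."""
    starts = start_times(ready_time, proc_free)
    return min(starts, key=lambda p: starts[p] + exec_time[p])


# Hypothetical numbers for one task on three heterogeneous processors.
ready = {"P0": 0.0, "P1": 2.0, "P2": 2.0}  # data arrival time per processor
free = {"P0": 4.0, "P1": 0.0, "P2": 3.0}   # time each processor becomes idle
cost = {"P0": 3.0, "P1": 6.0, "P2": 2.0}   # execution time per processor

print(pick_by_est(ready, free))        # P1: starts at 2.0 but finishes at 8.0
print(pick_by_eft(ready, free, cost))  # P2: starts at 3.0 and finishes at 5.0
```

On homogeneous processors the two criteria agree, since every execution time is the same; they diverge only when `cost` varies across processors.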
Related Works (2)
  “Bubble Scheduling and Allocation” (BSA) [kwok2000CC]
  [Figure: three snapshots of tasks Tx, Ti, Ty, Tz on processors P0 to P3, each with a pivot processor marked.]
Motivation (1)
  Heterogeneous network links
  “Successor’s Expected Start Time” (SEST)
  [Figure: task graph with T0, Tx, Ti, Ty, Tz and edges e1 to e5, mapped onto processors P0 to P2, with one placement marked "?".]
Motivation (2)
  Clustering k successive tasks.
  [Figure: task graph with T0 and T2 to T6 on processors P0 to P2; the tasks within K-depth are shown clustered as T’.]
k-Depth Look-ahead Heuristic
  i          ID of task
  x          ID of processor
  k          Predefined depth
  w’(i,x)    Heterogeneous execution time of task i on processor x
  h’x        Average network cost of processor x
  c(i)       Average weight of out-edges from task i
  SUCC(Ti)   Set of task i’s successor tasks
  NB(Px)     Set of neighbor processors of processor x
k-DLA Scheduling Heuristic
  List the tasks in the pre-defined order
  while the list is not empty do
    Select the first task Ti and remove it from the list.
    For all Px, calculate est(i,x) + ebl(i,x,k).
    Select the Px that gives the minimum value of the sum.
    Schedule Ti on Px.
  end while
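The loop above can be sketched in Python. The slides do not give the formulas for est and ebl, so this sketch takes both as caller-supplied functions; their signatures, the `toy_est` and `toy_ebl` placeholders, and the task and processor names are assumptions made only so the loop can run.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

Assignment = Dict[str, str]  # task ID -> processor ID


def k_dla_schedule(
    task_order: Iterable[str],
    processors: List[str],
    k: int,
    est: Callable[[str, str, Assignment], float],  # assumed signature
    ebl: Callable[[str, str, int], float],         # assumed signature
) -> Assignment:
    """List-scheduling loop: take tasks in the given order and place each
    on the processor that minimizes est(i, x) + ebl(i, x, k)."""
    pending = deque(task_order)
    assignment: Assignment = {}
    while pending:
        task = pending.popleft()  # first task in the list
        best = min(
            processors,
            key=lambda p: est(task, p, assignment) + ebl(task, p, k),
        )
        assignment[task] = best
    return assignment


# Placeholder cost functions (NOT the definitions used by k-DLA).
def toy_est(task: str, proc: str, assignment: Assignment) -> float:
    # Pretend each already-placed task occupies its processor for 1 time unit.
    return float(sum(1 for p in assignment.values() if p == proc))


def toy_ebl(task: str, proc: str, k: int) -> float:
    # Pretend P1 has a fixed look-ahead penalty, e.g. costlier links.
    return 0.5 if proc == "P1" else 0.0


result = k_dla_schedule(["T0", "T1", "T2"], ["P0", "P1"], k=1,
                        est=toy_est, ebl=toy_ebl)
print(result)  # {'T0': 'P0', 'T1': 'P1', 'T2': 'P0'}
```

Keeping est and ebl as parameters means the depth k only influences the look-ahead term, which matches the structure of the sum in the pseudocode.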
Experimental Environment
  Directed acyclic graphs
    Random graphs
      Number of tasks (t): 50~900
      Number of edges: from 2t to 5t
    Real applications
      Stencil / LU-Decomposition / Laplace Transform
      Number of tasks: over 2000
  Processor network architecture
    16 nodes: Ring / Mesh / Fully Connected Network
  Variables
    Heterogeneous Factor (HF): 5, 10, 20, 40
    Communication to Computation Ratio (CCR): 0.1, 1, 10.0
Metrics for the Performance Comparison
  Metrics
    Normalized Schedule Length (NSL)
      NSL = schedule length / the weight of tasks on the critical path
      NSL shows how close the scheduling result is to the optimum.
    Running Time
      The cost of the scheduling heuristic itself
    Used Processors
      The tendency or locality of task-processor mapping
  Heuristics
    BSA, HEFT, k-DLA (k = 1, 5, infinite)
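As a worked example of the NSL definition, here is a minimal sketch. The function name and the numbers are hypothetical, and which per-task weight is used on a heterogeneous system is left to the caller, since the slides do not specify it.

```python
from typing import Iterable


def normalized_schedule_length(schedule_length: float,
                               critical_path_weights: Iterable[float]) -> float:
    """NSL = schedule length / total weight of the tasks on the critical path."""
    critical_path_weight = sum(critical_path_weights)
    if critical_path_weight <= 0:
        raise ValueError("critical path weight must be positive")
    return schedule_length / critical_path_weight


# Hypothetical schedule of length 42 whose critical path holds three tasks.
nsl = normalized_schedule_length(42.0, [10.0, 8.0, 3.0])
print(nsl)  # 2.0
```

The critical-path weight bounds the schedule length from below, so a smaller NSL means the schedule is closer to that bound.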
Results (1): Number of Tasks (CCR=1.0, HF=20)
  [Figure: panels Ring, Mesh, Clique; legend BSA, HEFT, 1-DLA, 5-DLA, ∞-DLA.]
Results (2): CCR (n=500, HF=20)
  [Figure: panels Ring, Mesh, Clique; legend BSA, HEFT, 1-DLA, 5-DLA, ∞-DLA.]
Results (3): Scheduling Time (CCR=1.0, HF=20)
  [Figure: panels Ring, Mesh, Clique; legend BSA, HEFT, 1-DLA, 5-DLA, ∞-DLA.]
Results (4): Number of scheduled processors
  [Figure: panels Ring, Mesh, Clique; legend BSA, HEFT, 1-DLA, 5-DLA, ∞-DLA.]
Results (5): Conventional graph (CCR=1.0, HF=20)
  [Figure: panels LU, Stencil, Laplace.]
Analysis and Conclusions
  Suitable heuristic by system characteristic:
                                Low      High
    CCR                         1-DLA    ∞-DLA (except in clique)
    HF, Network Connectivity    ∞-DLA    1-DLA or HEFT
  The DLA heuristic with a large k is suitable for heterogeneous computing systems where the network resource is expensive.
  We can adjust the value of k according to the characteristics of a given computing system.