Published by Derek Clarke. Modified over 6 years ago.
1
New Workflow Scheduling Techniques
Presentation: Anirban Mandal
VGrADS Sep 2005
2
Outline
  Drawbacks of Workflow Scheduler v.0
  Middle-Out Scheduling
  Scheduling onto systems with batch queues
  Scheduling onto Abstract Resource Classes
Premise: Automating good application-level scheduling using performance models by taking advantage of vgES features
3
Top-Down Scheduling
For each heuristic
  Until all components mapped
    Map available components to resources
  Select mapping with minimum makespan

Map available components to resources:
  While all available components not mapped
    For each (component, resource) pair
      ECT(c,r) = rank(c,r) + EAT(r)
    End For each
    Run min-min, max-min and sufferage
    Store mapping
  End while
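The inner mapping step can be sketched in Python as follows. This is a minimal illustration and not the VGrADS implementation: it shows only the min-min heuristic (max-min and sufferage are omitted), it treats rank(c,r) as a single precomputed number, and the names `min_min`, `ready`, `rank` and `eat` are hypothetical.

```python
from typing import Dict, List, Tuple


def min_min(ready: List[str],
            resources: List[str],
            rank: Dict[Tuple[str, str], float],
            eat: Dict[str, float]) -> Dict[str, Tuple[str, float]]:
    """Map every ready component with the min-min heuristic.

    rank[(c, r)] is the modelled cost of component c on resource r
    (assumed precomputed) and eat[r] is the estimated availability
    time of r, which is updated in place as components are mapped.
    Returns {component: (resource, estimated completion time)}.
    """
    mapping: Dict[str, Tuple[str, float]] = {}
    unmapped = list(ready)
    while unmapped:
        # ECT(c, r) = rank(c, r) + EAT(r); keep the best resource per component.
        best = {}
        for c in unmapped:
            r = min(resources, key=lambda res: rank[(c, res)] + eat[res])
            best[c] = (r, rank[(c, r)] + eat[r])
        # min-min: commit the component whose best ECT is the smallest.
        c = min(unmapped, key=lambda comp: best[comp][1])
        r, ect = best[c]
        mapping[c] = (r, ect)
        eat[r] = ect  # the resource is busy until this component finishes
        unmapped.remove(c)
    return mapping
```

With two components and two idle resources, the cheaper component is committed first, which pushes the other one onto the second resource once the first is busy.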
4
Drawbacks of Workflow Scheduler v.0
The Top-Down Workflow Scheduler suffers from:
  Myopia
    Top-down traversal implies no look-ahead
    Decisions taken higher up in the workflow can lead to a poor mapping of critical steps
  Assumption of instant resource availability
    Many systems have batch queue front ends
    A job has to wait before it starts
  Scaling problems
    Scheduling onto individual nodes poses scaling problems in large resource environments (issue raised at the site visit)
5
Addressing the Drawbacks
We address the drawbacks as follows:
  Myopia
    Middle-Out Scheduling: schedule the critical step first and propagate the mapping up and down
  Assumption of instant resource availability
    Incorporating batch queue wait times into scheduling decisions (Joint work: Rice+UCSB)
  Scaling problems
    Using a two-level scheduling strategy: explicit resource pruning using vgDL or other means, and then scheduling (Joint work: Rice+UCSD+Hawaii)
    Scheduling onto abstract resource classes / clusters
    Ryan’s talk
6
Middle-Out Scheduling
[Figure labels: Key step, Top-Down, Middle-Out]
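The middle-out idea (schedule the key step first, then propagate the mapping up and down) can be illustrated by the order in which workflow levels are visited. The sketch below is an assumption-laden illustration: the slides do not say whether the downward or the upward pass runs first, so the order chosen here (down, then up) is arbitrary, and `middle_out_order` is a hypothetical helper name.

```python
from typing import List


def middle_out_order(num_levels: int, key_level: int) -> List[int]:
    """Return workflow levels in the order a middle-out pass visits them.

    Levels are numbered 0 (top) to num_levels - 1 (bottom). The key
    level comes first, then the levels below it (moving down), then
    the levels above it (moving up). The down-before-up order is an
    assumption made for this sketch.
    """
    if not 0 <= key_level < num_levels:
        raise ValueError("key_level must be a valid level index")
    below = list(range(key_level + 1, num_levels))
    above = list(range(key_level - 1, -1, -1))
    return [key_level] + below + above
```

For a five-level workflow whose key step sits at level 2, the visit order is 2, 3, 4, 1, 0, whereas a top-down pass would visit 0, 1, 2, 3, 4.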
7
Middle-Out Scheduling: Results
Compared makespans for middle-out vs. top-down scheduling
  Resource set: 5 clusters (2 Opteron clusters and 3 Itanium clusters)
  6 resource-topology scenarios: combinations of Opteron clusters close, normal and far with Fast and Slow Itaniums, i.e. {(Opteron close, Fast Itanium), ..}
  Application: actual EMAN DAG with 3 different communication-to-computation ratios (CCR): 0.1, 1 and 10
    Used known performance model values for computational components
    Varied file sizes to obtain the desired CCR for each pair of synchronization points
8
Middle-Out Scheduling: Results
CCR: 0.1 (computation is 10 times the communication)
  Fast Itaniums cause the top-down scheduler to “get stuck” at the Itanium clusters
  Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  In the slow Itanium case, the top-down scheduler “got lucky”
  The gain from middle-out scheduling is small
9
Middle-Out Scheduling: Results
CCR: 1 (equal communication and computation)
  Fast Itaniums cause the top-down scheduler to “get stuck” at the Itanium clusters
  Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  In the slow Itanium case, the top-down scheduler “got lucky”
10
Middle-Out Scheduling: Results
CCR: 10 (communication is 10 times the computation)
  Fast Itaniums cause the top-down scheduler to “get stuck” at the Itanium clusters
  Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  In the slow Itanium case, the top-down scheduler “got lucky”
11
Middle-Out Scheduling: Results
With increasing communication, the middle-out scheduler performs better when the top-down scheduler gets stuck
12
Outline
  Drawbacks of Workflow Scheduler v.0
  Middle-Out Scheduling
  Scheduling onto systems with batch queues
  Scheduling onto Abstract Resource Classes
13
Scheduling onto Batch-Queue Systems
Incorporated point-value predictions for batch queue wait times
Slight modification to the top-down scheduler:
  At every scheduling step, include the estimated time the job has to wait in the queue in the estimated completion time for the job [ECT(c,r) in the algorithm]
  Keep track of the queue wait time for each cluster and the number of nodes that the wait time corresponds to
  With each mapping, update the estimated availability time [EAT in the algorithm] with the queue wait time, as required
Joint work with Dan Nurmi and Rich Wolski
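One way to read the modification above is that a node's EAT starts at its cluster's predicted queue wait instead of zero, so that ECT(c,r) = rank(c,r) + EAT(r) includes the wait automatically. The Python sketch below follows that reading. It is an assumption, not the scheduler's actual code, and `QueuePrediction`, `initial_eat`, `ect`, `map_component` and the node naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class QueuePrediction:
    wait: float  # predicted queue wait time for the cluster
    nodes: int   # number of nodes this wait time corresponds to


def initial_eat(predictions: Dict[str, QueuePrediction]) -> Dict[str, float]:
    """Seed EAT for each covered node with its cluster's queue wait."""
    eat: Dict[str, float] = {}
    for cluster, p in predictions.items():
        for i in range(p.nodes):
            eat[f"{cluster}/n{i}"] = p.wait  # hypothetical node naming
    return eat


def ect(c: str, r: str,
        rank: Dict[Tuple[str, str], float],
        eat: Dict[str, float]) -> float:
    """ECT(c, r) = rank(c, r) + EAT(r); EAT(r) already holds the wait."""
    return rank[(c, r)] + eat[r]


def map_component(c: str,
                  rank: Dict[Tuple[str, str], float],
                  eat: Dict[str, float]) -> Tuple[str, float]:
    """Map c to the node with the smallest ECT and update that node's EAT."""
    r = min(eat, key=lambda node: ect(c, node, rank, eat))
    done = ect(c, r, rank, eat)
    eat[r] = done
    return r, done
```

With a wait of 20 on one node of the first cluster and 10 on two nodes of the second, a component costing 5 on the first cluster and 8 on the second would be mapped to the second cluster (18 versus 25), even though the first cluster runs it faster.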
14
Scheduling onto Batch-Queue Systems: Example
[Figure: Input DAG; resources R0, R1, R2, R3 in Cluster 0 and Cluster 1; axis label T]
Queue Wait Time [Cluster 0] = 20, # nodes for this wait time = 1
Queue Wait Time [Cluster 1] = 10, # nodes for this wait time = 2
16
Outline
  Drawbacks of Workflow Scheduler v.0
  Middle-Out Scheduling
  Scheduling onto systems with batch queues
  Scheduling onto Abstract Resource Classes
    Addressing the scaling problem
    Modify scheduler to schedule onto clusters instead of individual nodes
17
Scheduling onto Clusters
Input:
  Workflow DAG with restricted structure: nodes at the same level do the same computation
  Set of available clusters (numNodes, arch, CPU speed etc.) and inter-cluster network connectivity
  Per-node performance models for each cluster
Output:
  Mapping: for each level, the number of instances mapped to each cluster
Objective:
  Minimize makespan at each step
18
Scheduling onto Clusters: Modeling
Abstract modeling of the mapping problem for a DAG level
Given:
  N instances
  M clusters
  r1..rM nodes/cluster
  t1..tM: rank value per node per cluster (incorporates both computation and communication)
Aim: find a partition (n1, n2, ..., nM) of N, with n1 + n2 + ... + nM = N, such that the overall time is minimized
Analytical solution: no ‘obvious’ solution because of the discrete nature of the problem
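One possible way to write the problem down, as a sketch: assume that cluster j runs its n_j instances in rounds of at most r_j instances at a time, and that each round takes t_j. The round-based cost model and the symbol T_j are assumptions made here, not something stated on the slide.

```latex
T_j(n_j) = \left\lceil \frac{n_j}{r_j} \right\rceil t_j ,
\qquad
\min_{(n_1,\dots,n_M)} \; \max_{1 \le j \le M} T_j(n_j)
\quad \text{subject to} \quad
\sum_{j=1}^{M} n_j = N , \;\; n_j \in \{0, 1, 2, \dots\}
```

Under this model, the ceiling is what makes the problem discrete: adding one instance to a cluster costs nothing until it spills into a new round, at which point the cluster's time jumps by a full t_j.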
19
Scheduling onto Clusters
Iterative approach to solve the problem (addresses the scaling issue)
For each instance, i from 1 to N
  For each cluster, j from 1 to M
    Tentatively map i onto j
    Record makespan for each j by taking care of round(j)
  End For each
  Find cluster, p with minimum makespan increase
  Map i to p
  Update round(p), numMapped(p)
Complexity: O(#instances * #clusters)
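The iterative approach can be sketched in Python as below. This is a minimal illustration under stated assumptions: a cluster with `nodes` nodes is assumed to process its instances in rounds of length `t`, and each instance goes to the cluster whose finish time after accepting it is smallest, which also yields the smallest makespan increase. The names `finish_time` and `schedule_level` are hypothetical.

```python
import math
from typing import List


def finish_time(n: int, nodes: int, t: float) -> float:
    """Time for n instances on one cluster: ceil(n / nodes) rounds of length t."""
    return math.ceil(n / nodes) * t


def schedule_level(n_instances: int, nodes: List[int], t: List[float]) -> List[int]:
    """Greedily split the instances of one DAG level across clusters.

    nodes[j] is the node count of cluster j and t[j] its per-node rank
    value. Returns counts, where counts[j] is the number of instances
    mapped to cluster j. Runs in O(#instances * #clusters).
    """
    counts = [0] * len(nodes)
    for _ in range(n_instances):
        # Tentatively add the instance to every cluster and keep the one
        # that would finish earliest (ties go to the lowest index).
        p = min(range(len(nodes)),
                key=lambda j: finish_time(counts[j] + 1, nodes[j], t[j]))
        counts[p] += 1
    return counts
```

For 5 instances on a 2-node cluster with rank value 10 and a 1-node cluster with rank value 4, the sketch places 2 instances on the first cluster and 3 on the second, for a level makespan of 12.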
20
Discussions…