VGrADS Tools Activities

Chuck Koelbel, VGrADS Workshop, February 23, 2006

Tools: Where We Are
Achieved:
- Initial workflow scheduling methods ("Anirban scheduler") [Rice, UH, UCSD, ISI], supported by performance prediction and NWS
- Initial fault tolerance implementations: FT-MPI [UTK]; optimal checkpoint scheduling [UCSB]
- Platform-independent application launch and optimization: LLVM, run-time reoptimization experiments [Rice]
Working On It:
- Virtual Grid scheduling methods
- Building workflow DAGs

Ongoing Tools Thrusts
- Scheduling methods [Rice, UCSD, UCSB, UTK, ISI]: most of the rest of this talk; all based on pre-scheduling (a.k.a. off-line scheduling) of workflows (a.k.a. dataflow, a.k.a. DAGs) using performance prediction
- Performance prediction: queue delay model [UCSB]
- Other: launching and reoptimization [Rice]; DAG construction [Rice]

Scheduling Methods
- Two-level (choose a VG, then map onto it): Richard Huang (UCSD), Anirban Mandal & Ryan Zhang (Rice)
- Batch queue (include estimated queue delay in the cost model): Anirban Mandal (Rice), Dan Nurmi (UCSB)
- Cluster (assign blocks of tasks to clusters): Anirban Mandal (Rice)
- Provisioning (minimize reservation time + execution time): Gurmeet Singh (ISI)
- Robust (schedule to reduce sensitivity to variability): Zhiao Shi (UTK)

Scheduling Comparison (objective function; costs; important assumptions):
- Two-level [Rice]: minimize makespan; costs: DAG nodes & edges, processor type; assumes dedicated resources available for the duration of the application.
- Batch queue: minimize makespan + queue delays; costs: DAG nodes & edges, processor type, queue delay; assumes resources controlled by queues, but no relevant allocation limits.
- Cluster: minimize makespan; costs: DAG nodes & edges, processor type; assumes dedicated resources available for the duration of the application, and homogeneous DAG nodes on each level.
- Provisioning: optimize reservation time and schedule quality; costs: reservation and schedule costs; assumes resources controlled by a provisioning mechanism.
- Robust: maximize robustness subject to makespan; costs: DAG edges(?); assumes shared resources resulting in runtime performance variability.

Results [plots of the objective f for each scheduler]:
- Huang: 2-level scheduler, Montage DAG
- Mandal: cluster scheduler, EMAN DAG
- Shi: robust scheduler, ??? DAG

Tools Research Going Forward
Interface between vgES and schedulers:
- What capabilities can schedulers expect from vgES?
- How can schedulers exploit this capability?
- How can schedulers work around this capability?
Some interesting operating points:
- vgES provisions a VG / application takes what's given
- vgES returns shared VG nodes / application adapts to performance variance
- vgES returns queued VG resources / application manages queues
- vgES provisions a VG and monitors for additional resources / application starts immediately, adapts to changes

Tools Research Going Forward
Generating vgDL requests for 2-level methods: balance request complexity vs. the difficulty of scheduling onto the VG. For example:
  VG1 = ClusterOf (node) [1:N] [Rank=Cluster.nodes] {node = [CPU=Opteron]}
  VG2 = ClusterOf (node) [1:N] [Rank=Cluster.nodes*node.clock] {node = [CPU=Opteron]}
  VG3 = ClusterOf (node) [1:N] [Rank=PerfModel(Cluster.nodes,Cluster.bw,node.clock,node.mem)] {node = [CPU=Opteron]}
Automatic vgDL generation from DAGs: template-driven? heuristic-driven? (A sketch follows below.)
Extended vgDL capabilities:
- Global constraints (e.g. total number of nodes)
- Temporal constraints (e.g. available within 60 min)
- Probabilistic constraints (e.g. 95% likely to succeed)
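The slides do not show how automatic vgDL generation would work; as a minimal illustration, here is a hypothetical Python sketch of a template-driven generator that sizes a ClusterOf request to a DAG's widest level. The function name, defaults, and template string are assumptions for illustration, not VGrADS code:

    def vgdl_for_dag(max_width, cpu="Opteron", rank="Cluster.nodes"):
        # Size the request [1:N] to the widest DAG level; the Rank expression
        # can be swapped for the richer VG2/VG3 forms shown above.
        return ("VG = ClusterOf (node) [1:%d] [Rank=%s] {node = [CPU=%s]}"
                % (max_width, rank, cpu))

    # A DAG whose widest level has 128 tasks:
    print(vgdl_for_dag(128, rank="Cluster.nodes*node.clock"))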

Tools Research Going Forward
New scheduling criteria: deadline scheduling, economic scheduling, real-time scheduling.
New scheduling situations:
- Rescheduling: adapting to new resources; adapting to resource failures
- Incremental scheduling: managing dynamic applications; "horizon scheduling" for limited-time predictions
- Hybrid static / dynamic scheduling: contingency scheduling; static planning for dynamic optimizations

Backup Slides Beyond This Point

Two-Level Scheduling (Huang)
Target application: workflows represented by a DAG.
Performance metrics: application turn-around time (resource selection + scheduling time + application makespan).
Major assumptions of the scheduler: resources are dedicated and available for the duration of the application.
Scheduling algorithms (so far): Greedy; Modified Critical Path.

Experimental Setup
- Use a synthetic resource generator to generate 1000 clusters (33,667 hosts).
- Execute one "simple" (greedy) and one "complex" (Modified Critical Path) scheduling heuristic.
- Tests on the Montage DAG.
- Each scheduling heuristic (simple, complex) is run against three resource pools: the full resource universe, the top x percent fastest hosts, and an appropriate virtual grid.

Initial Results [plots: original CCR; CCR = 0.1]
- Two-phase scheduling is necessary to avoid excessive scheduling time.
- Appropriate virtual grids are necessary for better performance.
- Using the more complex heuristic did not improve performance if you have the appropriate resource abstractions!

Batch Queue Scheduling (Mandal)
- Make batch-queue predictions on-the-fly from the "live" systems (new NWS functionality).
- Parameterize the performance models using the 95% upper bound on the median prediction as a prediction of delay (one way to compute such a bound is sketched below).
- The performance models can take into account the amount of time needed to start a computation.
- Run a top-down (heuristic) scheduler to choose a resource set; the scheduler is smart enough to understand that the start-up delay can be amortized.
Joint work with Dan Nurmi and Rich Wolski.
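The NWS prediction code itself is not shown in these slides. A standard nonparametric way to obtain an upper confidence bound on a median is the binomial order-statistic method; the following minimal Python sketch assumes that approach, with illustrative sample data:

    import math

    def median_upper_bound(samples, confidence=0.95):
        # Order-statistic bound: the probability that the population median
        # lies at or below the sorted sample xs[k] is
        # sum_{i=0..k} C(n,i) / 2^n.  Return the smallest xs[k] whose
        # cumulative probability reaches the requested confidence.
        xs = sorted(samples)
        n = len(xs)
        cum = 0.0
        for k in range(n):
            cum += math.comb(n, k) * 0.5 ** n
            if cum >= confidence:
                return xs[k]
        return xs[-1]  # too few samples for the requested confidence

    # Illustrative historical queue delays (seconds) for one cluster:
    delays = [30, 45, 50, 60, 75, 90, 120, 150, 180, 240, 300, 600]
    print(median_upper_bound(delays))  # conservative delay prediction: 240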

Top-Down Scheduling
  Until all components are mapped:
    Map available components to resources:
      For each heuristic (min-min, max-min, sufferage):
        While all available components are not mapped:
          For each (component, resource) pair:
            ECT(c,r) = rank(c,r) + EAT(r)
          Run the heuristic and store the mapping
    Select the mapping with minimum makespan
(A runnable sketch of the min-min case follows below.)
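As a concrete illustration of the inner loop, here is a minimal Python sketch of the min-min case with ECT(c,r) = rank(c,r) + EAT(r). The data structures and names are illustrative, and DAG readiness constraints are omitted for brevity (max-min and sufferage differ only in how the next component is picked):

    def min_min(components, resources, rank):
        # rank[c][r]: predicted run time of component c on resource r
        eat = {r: 0.0 for r in resources}   # Estimated Availability Time
        mapping = {}
        unmapped = set(components)
        while unmapped:
            # For every unmapped component, find its best (ECT, resource) pair.
            best = {c: min((rank[c][r] + eat[r], r) for r in resources)
                    for c in unmapped}
            # min-min: commit the component whose best ECT is smallest.
            c = min(unmapped, key=lambda u: best[u][0])
            ect, r = best[c]
            mapping[c] = r
            eat[r] = ect
            unmapped.remove(c)
        return mapping

    tasks = ["t0", "t1", "t2"]
    hosts = ["r0", "r1"]
    rank = {"t0": {"r0": 3, "r1": 5},
            "t1": {"r0": 2, "r1": 2},
            "t2": {"r0": 4, "r1": 1}}
    print(min_min(tasks, hosts, rank))  # {'t2': 'r1', 't1': 'r0', 't0': 'r0'}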

Scheduling onto Batch-Queue Systems
Details (a modification of the Top-Down scheduler):
- At every scheduling step, include the estimated time the job has to wait in the queue in the estimated completion time for the job [ECT(c,r) in the algorithm].
- Keep track of the queue wait time for each cluster and the number of nodes that correspond to that queue wait time.
- With each mapping, update the estimated availability time [EAT in the algorithm] with the queue wait time, as required.
(A small sketch of the amortized ECT follows below.)
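A minimal sketch of how the queue wait might enter ECT, assuming the wait is paid only by the first job that opens a cluster (the amortization the slide describes); all names are illustrative and the per-node-count bookkeeping is omitted:

    def ect_with_queue(c, r, rank, eat, queue_wait, opened):
        # The first job routed to cluster r pays the predicted queue wait;
        # once nodes are held, later jobs see only EAT (the wait amortizes).
        start = eat[r] + (0.0 if opened[r] else queue_wait[r])
        return start + rank[c][r]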

Scheduling onto Batch-Queue Systems: Example
[Figure: an input DAG scheduled over time T onto resources R0, R1 of Cluster 0 and R2, R3 of Cluster 1. Queue wait time for Cluster 0 = 20, with 1 node at this wait time; queue wait time for Cluster 1 = 10, with 2 nodes at this wait time.]

Discussions
Experiments to evaluate EMAN scheduling with batch queues:
- Control experiment: schedule with and without queue-wait estimates, run the application with the two schedules on TeraGrid, and compare turnaround times.
- Accuracy of the results: how close are the estimates to the actual waits?
Other future issues:
- Predictive/opportunistic approach: submit to queues even before the data arrives, with the hope that the data arrives by the time the job moves to the front of the queue.
- Point-valued predictions of probabilistic systems are problematic; we need to schedule based on ranges or distributions.
- Probabilistic deadline scheduling.

Cluster Scheduling (Mandal)
Motivation: scheduler scaling problem for "large" Grids.
Idea: schedule directly onto clusters.
Input:
- A workflow DAG with restricted structure: nodes at the same level do the same computation.
- A set of available clusters (number of nodes, architecture, CPU speed, etc.) and inter-cluster network connectivity (latency, bandwidth).
- Per-node performance models for each cluster.
Output: a mapping giving, for each level, the number of instances mapped to each cluster.
Objective: minimize makespan.

Scheduling onto Clusters: Modeling
Abstract modeling of the mapping problem for one DAG level.
Given: N instances; M clusters with r_1..r_M nodes per cluster; t_1..t_M, the rank value per node on each cluster (incorporating both computation and communication).
Aim: find a partition (n_1, n_2, ..., n_M) of N, with n_1 + n_2 + ... + n_M = N, such that the overall time is minimized.
Analytical solution: no "obvious" closed form, because of the discrete nature of the problem.

Scheduling onto Clusters: Iterative Solution
Big picture: iterative assignment of tasks to clusters (a DP-style approach), O(#instances * #clusters):
  For each instance i from 1 to N:
    For each cluster j from 1 to M:
      Tentatively map i onto j
      Record the makespan for each j, taking care of round(j)
    Find the cluster p with minimum makespan increase
    Map i to p; update round(p), numMapped(p)
(A runnable sketch follows below.)
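A runnable Python sketch of this iterative mapping for one DAG level, using a simplified round model (each cluster executes its instances in rounds of r_j identical tasks of duration t_j); r and t follow the notation of the modeling slide, and the example numbers are illustrative:

    def map_level(N, r, t):
        # r[j]: nodes on cluster j; t[j]: per-node rank value on cluster j
        M = len(r)
        n = [0] * M                      # instances mapped to each cluster
        for _ in range(N):
            def finish(j):
                # Finish time on cluster j if it received one more instance:
                # ceil((n[j] + 1) / r[j]) rounds of duration t[j].
                rounds = -(-(n[j] + 1) // r[j])   # ceiling division
                return rounds * t[j]
            p = min(range(M), key=finish)         # least makespan increase
            n[p] += 1
        return n

    # 10 instances, two clusters: 4 nodes at 5s/task vs. 2 nodes at 3s/task
    print(map_level(10, r=[4, 2], t=[5.0, 3.0]))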

Scheduling onto Clusters: Evaluation
Application: representative DAGs from Montage and EMAN with varying widths; known performance models.
Simulation platform:
- Resource model: synthetic cluster generator (Kee et al., SC'04).
- Network model: BRITE to generate the network topology; latency/bandwidth drawn from a truncated normal distribution.
Experiment:
- Varying number of clusters (nodes): 250 to 1000 clusters (8.5K to 36K nodes).
- Ran three scheduling approaches: Heuristic (based on the min-min/max-min/sufferage heuristics), Greedy (based on a simple greedy heuristic), and Simple (the cluster-level scheduler).
- Compared turnaround time (makespan + scheduling time).

Scheduling onto Clusters: Results, Montage Application
[Plots: 103-node Montage DAG; 717-node Montage DAG]
The cluster-level scheduler (Simple) offers scalability to "large" Grids and improved turnaround time, with no significant degradation of application makespan quality.

Scheduling onto Clusters: Results, EMAN Application
[Plots: 171-node EMAN DAG; 666-node EMAN DAG]
The same conclusions hold: the cluster-level scheduler (Simple) scales to "large" Grids and improves turnaround time without significant degradation of makespan quality.

Robust Task Scheduling (Shi)
Task scheduling: assigning the tasks of a meta-task (a workflow-type application) to a set of resources while achieving certain goals, e.g. minimizing the schedule length. The problem is NP-complete, so finding an optimal solution is either impossible or impractical; approaches include heuristics (list scheduling, duplication, clustering) and optimization methods (genetic algorithms, simulated annealing, etc.).
Previously we focused on list scheduling algorithms for the case where processors have different capabilities.

Non-deterministic Environment
The actual resource environment is inherently non-deterministic due to resource sharing. Previously we used expected values for task execution times and network speed, but the optimal solution for the task scheduling problem with expected values of the resource characteristics is NOT optimal for the corresponding problem with non-deterministic values. We focus on variable execution time in this work.

Possible Solutions
Static scheduling:
- Overestimate the execution time to avoid exceeding the allotted use of a machine, at the expense of machine utilization.
- Compute schedules for various scenarios and at run time adopt the one that fits the current status.
- Find schedules that are more robust to variable execution time.
Dynamic scheduling:
- At each scheduling point (when a task is ready to be executed), gather current resource information and compute a new schedule for the unscheduled tasks.

Robustness
Let M0(s) be the makespan of a schedule s obtained with the expected execution times, and M(s) the makespan of s under the real execution times. The schedule delay is the gap between the two, M(s) - M0(s); each realization of the actual execution times gives a different schedule delay. Robustness is measured in terms of this delay: the less a schedule's makespan degrades across realizations, the more robust it is.

Slack
The slack of a task node is defined as follows:
  slack(ni) = makespan - [b_level(ni) + t_level(ni)]
Slack is closely related to robustness: a large slack means a task node can tolerate a large increase in execution time without increasing the makespan. (A runnable sketch follows below.)
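A self-contained Python sketch of slack on a DAG of expected execution times, where t_level(ni) is the longest path from an entry node into ni (excluding ni) and b_level(ni) the longest path from ni to an exit node (including ni). Communication costs and processor assignments are omitted, and the weights are illustrative:

    def topo_sort(tasks, succ):
        # Depth-first topological order of the task DAG.
        seen, order = set(), []
        def visit(n):
            if n in seen:
                return
            seen.add(n)
            for m in succ.get(n, []):
                visit(m)
            order.append(n)
        for n in tasks:
            visit(n)
        return list(reversed(order))

    def slack(tasks, succ, w):
        order = topo_sort(tasks, succ)
        t_level = {n: 0.0 for n in tasks}      # longest path into n (excl. n)
        b_level = {n: w[n] for n in tasks}     # longest path out of n (incl. n)
        for n in order:                        # forward pass
            for m in succ.get(n, []):
                t_level[m] = max(t_level[m], t_level[n] + w[n])
        for n in reversed(order):              # backward pass
            for m in succ.get(n, []):
                b_level[n] = max(b_level[n], w[n] + b_level[m])
        makespan = max(t_level[n] + b_level[n] for n in tasks)
        return {n: makespan - (t_level[n] + b_level[n]) for n in tasks}

    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    w = {"a": 2.0, "b": 4.0, "c": 1.0, "d": 3.0}
    print(slack(["a", "b", "c", "d"], succ, w))
    # {'a': 0.0, 'b': 0.0, 'c': 3.0, 'd': 0.0} -- only c is off the critical path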

Robustness and Slack
A disjunctive graph is used to calculate the expected makespan and the real makespan. [Figure: disjunctive graph of tasks 1-10 scheduled on processors P1 (tasks 1, 3, 8), P2 (tasks 2, 5, 7), and P3 (tasks 4, 6, 9, 10).]

Task Execution Time Modeling
- Least Time to Compute (LTC) matrix {ltc_ij}: the time to compute task i on processor j, generated from a single number by applying a gamma distribution twice, over the two dimensions (machine, task); different gamma parameters represent different degrees of machine or task heterogeneity.
- Uncertainty level {ul_ij}: expected actual time to compute / least time to compute.
- Actual computation time: act_ij = ltc_ij * ul_ij.
(A sketch of this generation follows below.)
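A Python sketch of the doubly-gamma LTC generation following the slide's description; the shape/scale parameters, matrix sizes, and the form of the uncertainty-level distribution are illustrative assumptions, not the values used in the study:

    import numpy as np

    rng = np.random.default_rng(0)
    n_tasks, n_procs = 8, 4
    mean_ltc = 100.0                       # the "single number" seed

    # One gamma draw per task and one per machine; the shape parameter
    # controls task/machine heterogeneity (smaller shape = more spread).
    task_factor = rng.gamma(shape=5.0, scale=1.0 / 5.0, size=n_tasks)
    mach_factor = rng.gamma(shape=20.0, scale=1.0 / 20.0, size=n_procs)
    ltc = mean_ltc * np.outer(task_factor, mach_factor)   # ltc[i, j]

    # Uncertainty level >= 1: expected actual time / least time to compute.
    ul = 1.0 + rng.gamma(shape=2.0, scale=0.1, size=(n_tasks, n_procs))
    act = ltc * ul                          # actual computation times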

Genetic Algorithm
1. [Start] Generate an initial population of n chromosomes (suitable solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete:
   - [Selection] Select two parent chromosomes from the population according to their fitness.
   - [Crossover] With a crossover probability, cross over the parents to form new offspring (children); if no crossover is performed, the offspring is an exact copy of a parent.
   - [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
   - [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop and return the best solution in the current population.
6. [Loop] Go to step 2.
(A skeleton implementation follows below.)
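A generic Python skeleton matching steps 1-6, where a chromosome is a list of genes; the selection scheme (truncation to the fitter half, sampled with replacement) and all parameters are illustrative placeholders, and init/fitness/crossover/mutate are problem-specific callbacks:

    import random

    def ga(init, fitness, crossover, mutate, pop_size=50, generations=100,
           p_cross=0.8, p_mut=0.02):
        pop = [init() for _ in range(pop_size)]                     # [Start]
        for _ in range(generations):
            ranked = sorted(pop, key=fitness, reverse=True)         # [Fitness]
            new_pop = []
            while len(new_pop) < pop_size:                          # [New population]
                a, b = random.choices(ranked[:pop_size // 2], k=2)  # [Selection]
                child = crossover(a, b) if random.random() < p_cross else list(a)
                child = [mutate(g) if random.random() < p_mut else g
                         for g in child]                            # [Mutation]
                new_pop.append(child)                               # [Accepting]
            pop = new_pop                                           # [Replace]
        return max(pop, key=fitness)                                # [Test]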

Single-Objective Optimization
[Plots: makespan optimization; robustness optimization]

Multi-Objective Optimization
Goal: minimize makespan and maximize robustness at the same time. These objectives conflict: there cannot be a single optimum solution that simultaneously optimizes both. The solution is to seek a balance between the two objectives.

Multi-Objective Optimization: Classical Methods
- Weighted sum: scalarizes the multiple objectives into a single objective.
- ε-constraint: optimize one of the objectives, subject to constraints imposed on the other objectives.
(The textbook forms of both are sketched below.)
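In standard textbook notation (not taken from the slides), for objectives f_1, ..., f_K the two methods are:

    % weighted sum: scalarize all objectives with nonnegative weights
    \min_x \; F(x) = \sum_{k=1}^{K} w_k f_k(x), \qquad w_k \ge 0

    % epsilon-constraint: optimize one objective, bound the others
    \min_x \; f_1(x) \quad \text{subject to} \quad f_k(x) \le \epsilon_k, \quad k = 2, \dots, K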

Weighted Sum
The objective function is a weighted combination of makespan and aws, the average weighted slack: a weighted average of slack(ni) taken over all task nodes, where the weight of ni depends on the processor pj on which ni is scheduled. [Objective-function equations omitted in the transcript.]

ε-Constraint
Objective: maximize aws (average weighted slack) subject to ms < ε * ms0.
Solutions are classified as feasible (ms < ε * ms0) or infeasible (ms ≥ ε * ms0), and the fitness function is built so that feasible solutions are preferred. [Fitness equation omitted in the transcript; a sketch of one common choice follows below.]
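A minimal sketch of a fitness function consistent with this classification, pluggable into the GA skeleton above. makespan_of and aws_of are assumed evaluators, and ranking infeasible solutions by negative makespan is one common penalty choice, not necessarily the authors':

    def fitness(schedule, eps, ms0):
        ms = makespan_of(schedule)              # assumed evaluator
        if ms < eps * ms0:
            return aws_of(schedule)             # feasible: rank by aws (>= 0)
        # Infeasible: always worse than any feasible solution (fitness < 0),
        # and less-violating schedules rank higher.
        return -ms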

Summary
- Studied robust scheduling in a non-deterministic environment using a GA, and provided a measurement of robustness.
- A robust schedule can be generated by optimizing the average weighted slack (AWS) of a task graph.
- Makespan and robustness are two conflicting objectives, so multi-objective optimization methods are employed.
- The weighted sum method is easy to use and intuitive, but setting up an appropriate weight vector depends on the scaling of each objective function; normalization of the objectives is usually required.
- The ε-constraint method lets the user optimize one objective while imposing constraints on the other objectives.