1 Sucha Smanchat, PhD
2
- Concept of scheduling
- Scheduling models
- Workflow scheduling problem
- Computational complexity overview
- Workflow scheduling techniques
  - In Grid computing environment
  - In Cloud computing environment
3
- A branch of Operational Research
- Scheduling theory started in the 1950s
4 Scheduling is the decision-making process of allocating tasks to (limited) resources
- Task or job: the activity to be carried out
- Resource: what is required to carry out a task or job
- “A schedule is a job sequence determined for every machine of the processing system.”
- Normally one task per resource at a given time
- Resources can be homogeneous or heterogeneous
5
- Time taken to complete all tasks
- Cost to complete all tasks
- Resource utilization
- Reliability
- Security
- Energy consumption (carbon footprint)
- Etc.
6
- Hard objectives are usually imposed as constraints
  - A hard time objective is a deadline
  - A hard cost objective is a budget
  - A hard security objective is a security constraint
- Soft objectives are usually optimized without specifying a threshold
  - A soft time objective is minimizing the overall makespan
  - A soft cost objective is minimizing the execution cost
  - A soft energy objective is minimizing energy consumption
  - A soft utilization objective is maximizing resource utilization
7
- Scheduling objectives are usually in conflict with each other, e.g.
  - A faster or more reliable resource usually costs more
  - Increasing resource utilization may increase time
  - Increasing reliability may reduce security and consume more energy
- Optimize between two or more objectives
  - The Pareto-optimal set, or “Pareto front”, is the set of solutions in which no objective can be improved without worsening another, given two or more objectives
- Trade-off between multiple objectives
- Hard vs. soft objectives
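As an illustration of the Pareto front, the following minimal sketch reduces a list of candidate schedules to the non-dominated ones and then applies a hard deadline before optimizing the soft cost objective. Each schedule is summarized as a (makespan, cost) pair. The names `dominates` and `pareto_front`, the sample values, and the deadline of 12 are assumptions made for this example. Both objectives are assumed to be minimized.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (makespan, cost), both to be minimized


def dominates(q: Point, p: Point) -> bool:
    """True if q is at least as good as p in both objectives and differs from p."""
    return q[0] <= p[0] and q[1] <= p[1] and q != p


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate schedules.
candidates = [(10, 50), (12, 40), (15, 40), (20, 20), (11, 60)]
front = pareto_front(candidates)

# Hard objective (deadline) applied as a constraint, soft objective (cost) minimized.
deadline = 12
feasible = [p for p in front if p[0] <= deadline]
cheapest_in_time = min(feasible, key=lambda p: p[1])
```

Here (15, 40) is dropped because (12, 40) is faster at the same cost, and (11, 60) is dropped because (10, 50) is both faster and cheaper.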
8
- Single machine: the simplest model
- Parallel machines
  - Homogeneous: a job can be processed by any resource
  - Homogeneous with different capabilities (e.g. speed): a job can be processed by any resource, but the processing time differs between resources
  - Heterogeneous: a job can be processed only by certain resources
9
- Flow Shops
  - With m machines, every job has to be processed on each one of the m machines in the same order (e.g. an assembly line)
- Flexible Flow Shops
  - Every job has to go through a number of stages
  - Each stage can be handled by one of multiple machines
10
- Job Shops
  - Each job has its own predetermined route
  - Like Flow Shops, except that each job follows its own route and may be processed by the same machine more than once
- Flexible Job Shops
  - Each job has its own predetermined route
  - Similar to Job Shops, but the machines are grouped into work centres, so a job can be processed by any machine in the centre
- Open Shops
  - Each job has to be processed on each one of the m machines
  - No restrictions on job routing (e.g. left to the scheduler’s decision)
11
- A workflow W is composed of a set of tasks T connected according to a set of precedence dependencies E: W = (T, E)
- A precedence dependency e in E, e = (t_i, t_j) where t_i ≠ t_j, specifies that t_i must finish before t_j can start
- Given a set of resources R, the workflow scheduling problem is to find a mapping of the tasks in T to the resources in R such that the scheduling objective(s) are optimized
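The definition W = (T, E) maps directly onto a directed acyclic graph. The sketch below is one possible encoding, not a prescribed one: tasks are strings, dependencies are (t_i, t_j) pairs, and the helper names `topological_order` and `respects_precedence` are hypothetical. The first function lists tasks in an order that respects E (Kahn's algorithm). The second checks that given start and finish times honour every dependency.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (t_i, t_j): t_i must finish before t_j can start


def topological_order(tasks: Set[str], edges: Set[Edge]) -> List[str]:
    """Kahn's algorithm; raises ValueError if the dependencies contain a cycle."""
    children: Dict[str, List[str]] = {t: [] for t in tasks}
    indegree: Dict[str, int] = {t: 0 for t in tasks}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(children[task]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("precedence dependencies contain a cycle")
    return order


def respects_precedence(edges: Set[Edge],
                        start: Dict[str, float],
                        finish: Dict[str, float]) -> bool:
    """True if every t_i finishes no later than its dependent t_j starts."""
    return all(finish[ti] <= start[tj] for ti, tj in edges)


# A small diamond-shaped workflow (hypothetical).
T = {"t1", "t2", "t3", "t4"}
E = {("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")}
order = topological_order(T, E)
```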
12
- Computational complexity explains why some problems are easier to solve than others
- Time complexity, or running time, expresses the total number of elementary operations as a function of the size of the problem instance
- Input size is bounded by
  - The number of jobs
  - The number of resources
13
- An algorithm is said to be polynomial, or a polynomial-time algorithm, if its running time is bounded by a polynomial in the input size (P complexity)
  - E.g. O(n^2): the number of operations grows as the function Cn^2
- “Polynomial algorithms are sometimes called efficient. The class of all polynomially solvable problems is called class P”
- A problem is NP-hard if it is at least as hard as every problem in NP; no polynomial-time algorithm is known for any NP-hard problem
  - It is widely believed (unless P = NP) that such problems cannot be solved in polynomial time
- Many scheduling problems are NP-hard
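14
- NP: Non-deterministic Polynomial-time
- The set of all decision problems for which a proposed solution can be verified in polynomial time by a deterministic Turing machine (equivalently, problems that can be solved in polynomial time by a non-deterministic Turing machine)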
16
- The NP-Complete class contains the hardest problems in class NP
- An NP-Complete problem is also NP-Hard
- No polynomial-time algorithm is known for solving them
- For large instances it is practically impossible to obtain an optimal solution in a reasonable time
- If you encounter such a problem, do not insist on the optimal answer: for a large instance you probably won’t find one in your lifetime
- Use alternative methods
  - Approximation algorithms
  - Heuristic algorithms
17
- NP-Hard: Non-deterministic Polynomial-time hard
- NP-Hard problems can be of many types
  - Decision problems
  - Search problems
  - Optimization problems
- The class of problems that are at least as hard as the hardest problems in class NP (the NP-Complete problems)
- Every NP-Complete problem is NP-Hard, but an NP-Hard problem is not necessarily NP-Complete
18
- Sequencing and scheduling
- Database problems
- Network design, e.g. spanning tree
- Mathematical programming
- Games and puzzles
- Automata and language theory
- Program optimization, e.g. code generation
- Algebra and number theory
19
- Exact: finds optimal solutions
- Approximate: guaranteed to be within a fixed percentage of the optimum in polynomial time; performance is verified analytically
- Heuristic: no guarantee; performance is verified by computational experiment
- Construction: start with no schedule and add one job at a time
- Improvement: start with a schedule and try to find a better one
20
- “Experience-based techniques for problem solving, learning, and discovery”
- Produce good-enough solutions, which may not be optimal
- But are fast to compute and can generate a solution when exhaustive search is not practical
- Can be used with other methods to improve efficiency
- Examples
  - Rule of thumb
  - Trial and error
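To make the trade-off between exact and heuristic methods concrete, the sketch below compares an exhaustive search with a simple rule of thumb on a toy problem: minimizing makespan on identical parallel machines. The rule of thumb chosen here is "longest job first, onto the least loaded machine", a construction heuristic picked only for illustration. The function names and job durations are assumptions of this example. The exhaustive search tries every one of the m^n assignments, so it is usable only for very small instances.

```python
from itertools import product
from typing import List, Sequence


def makespan(assignment: Sequence[int], durations: List[int], machines: int) -> int:
    """Finish time of the busiest machine for a job-to-machine assignment."""
    loads = [0] * machines
    for job, machine in enumerate(assignment):
        loads[machine] += durations[job]
    return max(loads)


def exact_makespan(durations: List[int], machines: int) -> int:
    """Exhaustive search over all machines**jobs assignments (exponential time)."""
    return min(
        makespan(a, durations, machines)
        for a in product(range(machines), repeat=len(durations))
    )


def greedy_makespan(durations: List[int], machines: int) -> int:
    """Construction heuristic: longest job first, onto the least loaded machine."""
    loads = [0] * machines
    for d in sorted(durations, reverse=True):
        loads[loads.index(min(loads))] += d
    return max(loads)


jobs = [3, 3, 2, 2, 2]  # hypothetical job durations
optimal = exact_makespan(jobs, 2)     # 6: {3, 3} on one machine, {2, 2, 2} on the other
heuristic = greedy_makespan(jobs, 2)  # 7: good enough, but not optimal
```

On this instance the heuristic is one time unit worse than the optimum, but its running time grows only as O(n log n + nm) instead of exponentially.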
21
- A metaheuristic is a high-level heuristic designed to find or generate a heuristic that should give a good solution
- Metaheuristic techniques usually give better results than heuristic techniques, but are slower, so they are not appropriate for time-sensitive applications
- Examples
  - Genetic Algorithm (GA)
  - Particle Swarm Optimization (PSO)
23
- Grid workflow scheduling
  - Time-sensitive
  - Focus on fast execution (resource sharing model)
- Cloud workflow scheduling
  - Time-sensitive
  - Focus on cost (business-driven model), but still has to be fast enough (multiple objectives of cost and time)
- Because both are time-sensitive, metaheuristic techniques are usually not acceptable
24
- A popular research field during the time of Grid computing
- Because Grid computing is based on resource sharing, the most important objective is to finish a workflow as fast as possible so that other users can use the Grid resources
- The Grid environment may change at any time (resources may not be subject to central control), so the scheduling process must be fast (time-sensitive)
25
- Each task has an execution time on each resource
  - EET: Estimated Execution Time
- Each resource may have a queue of waiting tasks
  - EWT: Estimated Wait Time
  - Hardware queues vs. queues maintained by the scheduler
- Data may be transferred between resources according to task dependencies
  - ETT: Estimated Transfer Time
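A scheduler typically combines these three estimates into an estimated completion time for each candidate resource. How they combine depends on the system. The sketch below assumes, purely for illustration, that input data can be transferred while the task waits in the queue, so the task starts after the longer of the two delays. The function name and all numbers are hypothetical.

```python
from typing import Dict


def estimated_completion_time(eet: float, ewt: float, ett: float) -> float:
    """Combine the estimates for one task on one resource.

    Assumption of this sketch: the data transfer (ETT) overlaps with the
    queue wait (EWT), so execution starts after the longer of the two.
    """
    return max(ewt, ett) + eet


# Hypothetical estimates for one task on two candidate resources.
estimates: Dict[str, Dict[str, float]] = {
    "r1": {"eet": 10.0, "ewt": 4.0, "ett": 6.0},
    "r2": {"eet": 7.0, "ewt": 12.0, "ett": 2.0},
}
completion = {r: estimated_completion_time(**e) for r, e in estimates.items()}
best_resource = min(completion, key=completion.get)
```

Although r2 executes the task faster, its longer queue makes r1 the earlier finisher in this example.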
26
- HEFT: Heterogeneous Earliest Finish Time
- Another popular algorithm with decent performance
- Calculates task ranks recursively, backward from the last task through the longest path to the first task
  - The last task has the lowest rank
  - The first task has the highest rank
27
- Ranking function: rank(t_i) = w_i + max_{t_j in succ(t_i)} (c_{i,j} + rank(t_j)); for the last task, rank = w_i
  - w_i = average computation time of the task
  - c_{i,j} = average communication time between the task and each child task
- Iteratively assign the task with the highest rank to the resource that can finish it at the earliest time (fastest)
- Many HEFT extensions exist
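The ranking step can be sketched as follows: a task's rank is its average computation time plus the largest value of (average communication time + child's rank) over its children, and a task without children has a rank equal to its average computation time. This is a minimal sketch of the ranking phase only, not of the resource assignment phase. The function name, the dictionary layout, and the numbers are assumptions of this example.

```python
from typing import Dict, List, Tuple


def upward_ranks(
    avg_comp: Dict[str, float],
    avg_comm: Dict[Tuple[str, str], float],
    children: Dict[str, List[str]],
) -> Dict[str, float]:
    """Rank every task recursively, backward from the exit task."""
    ranks: Dict[str, float] = {}

    def rank(task: str) -> float:
        if task not in ranks:  # memoize so each task is ranked once
            ranks[task] = avg_comp[task] + max(
                (avg_comm[(task, c)] + rank(c) for c in children.get(task, [])),
                default=0.0,  # exit task: no children
            )
        return ranks[task]

    for task in avg_comp:
        rank(task)
    return ranks


# Hypothetical diamond-shaped workflow: t1 -> {t2, t3} -> t4.
w = {"t1": 2, "t2": 3, "t3": 4, "t4": 1}
c = {("t1", "t2"): 1, ("t1", "t3"): 2, ("t2", "t4"): 1, ("t3", "t4"): 1}
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}

ranks = upward_ranks(w, c, succ)
priority = sorted(ranks, key=ranks.get, reverse=True)  # highest rank first
```

Sorting by decreasing rank always places a task before its children, so the resulting list can be processed front to back when assigning resources.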
28 H. Topcuoglu, S. Hariri and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, pp , 2002.
30
- Three popular batch algorithms: Min-Min, Max-Min, and Sufferage
- Task Prioritising Phase
  - Create a list of tasks that are ready to execute according to precedence dependencies
  - For each task, find the resource that gives it the Minimum Completion Time (MCT), i.e. the resource that can finish the task earliest
31
- Resource Selection Phase
  - Min-Min: iteratively schedule first the task-resource pair with the smallest MCT
  - Max-Min: iteratively schedule first the task-resource pair with the largest MCT
  - Sufferage: iteratively schedule first the task that would suffer most if it were not given its best resource (sufferage = second-smallest completion time minus MCT)
  - XSufferage: same as Sufferage, but also takes into account the data transfer time between tasks
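The two phases can be sketched for a batch of ready, independent tasks as below. The sketch assumes that every task has an estimated execution time on every resource, that all resources are initially idle, and that completion time is simply the resource's ready time plus the execution time (wait and transfer estimates are left out). The function name `batch_schedule`, the `policy` strings, and the numbers are hypothetical. XSufferage is not covered.

```python
from typing import Dict, List, Tuple


def batch_schedule(
    eet: Dict[str, Dict[str, float]], policy: str = "min-min"
) -> List[Tuple[str, str, float]]:
    """Schedule ready tasks; returns (task, resource, completion time) in scheduling order."""
    resources = sorted({r for times in eet.values() for r in times})
    ready_at = {r: 0.0 for r in resources}  # when each resource becomes free
    unscheduled = set(eet)
    schedule: List[Tuple[str, str, float]] = []
    while unscheduled:
        # Prioritising phase: MCT, best resource, and sufferage of each task.
        best: Dict[str, Tuple[float, str, float]] = {}
        for task in sorted(unscheduled):
            times = sorted((ready_at[r] + eet[task][r], r) for r in resources)
            mct, resource = times[0]
            second = times[1][0] if len(times) > 1 else mct
            best[task] = (mct, resource, second - mct)
        # Selection phase: which task goes first depends on the policy.
        if policy == "min-min":
            task = min(best, key=lambda t: best[t][0])
        elif policy == "max-min":
            task = max(best, key=lambda t: best[t][0])
        elif policy == "sufferage":
            task = max(best, key=lambda t: best[t][2])
        else:
            raise ValueError(f"unknown policy: {policy}")
        mct, resource, _ = best[task]
        ready_at[resource] = mct
        schedule.append((task, resource, mct))
        unscheduled.remove(task)
    return schedule


# Hypothetical estimated execution times of three ready tasks on two resources.
eet = {
    "a": {"r1": 2, "r2": 4},
    "b": {"r1": 3, "r2": 10},
    "c": {"r1": 8, "r2": 9},
}
results = {p: batch_schedule(eet, p) for p in ("min-min", "max-min", "sufferage")}
makespans = {p: max(done for _, _, done in s) for p, s in results.items()}
```

With these numbers Min-Min and Sufferage both reach a makespan of 9 and Max-Min reaches 10. The ranking of the three policies depends on the workload, so this single example says nothing general about which is better.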
33 CPOP (proposed together with HEFT) QoS guided Min-Min Min-Min Max-Min Selective Algorithm Balanced Minimum Completion Time Hybrid HEFT SDC Besom Cluster and Duplication Based Scheduling TDS and TANH
34
- Superseded Grid workflow scheduling after Cloud computing became popular
- Because Cloud computing is economy-driven, the most important objective is to lower the cost of the cloud resources used for execution
- But the execution still needs to be fast enough, so the problem is multi-objective
  - Cost vs. time: faster servers cost more
- Other objectives are receiving more attention, e.g. energy consumption and security constraints
35
- The Cloud environment does not change much because of Service Level Agreements (SLA)
- Still, scheduling is time-sensitive: there is no point in the scheduling taking longer than the actual execution
- Mostly assumes IaaS resources, i.e. virtual machines
- Each task has an execution time on each virtual machine type
  - Estimated Execution Time (EET)
  - Easier to parameterize than Grid resources
  - See EC2 Compute Unit or Elastic Compute Unit (ECU)
36
- Each virtual machine may have a queue of waiting tasks
  - Estimated Wait Time
  - The queues should mostly be maintained by the scheduler to avoid complicating the virtual machines
- Data may be transferred between virtual machines according to task dependencies; however,
  - Data transfer within the same region (data center) is usually assumed to be zero
37
- Two cost-based scheduling approaches
  - Backtracking
  - (Partial) Critical Path
- Workflow partitioning into sequential branches can be applied to reduce complexity
- The deadline and/or budget may be distributed
  - To individual tasks as sub-deadlines or sub-budgets
  - To each branch after partitioning
38
- Minimize cost while meeting the deadline (or vice versa)
- Allocate the ready tasks to the cheapest resources, then calculate the execution time
- If the deadline is violated, the last allocated task is reallocated (backtracked) to a faster (more expensive) resource
- Multiple rounds of backtracking may be required
39
[Table: tasks against resources P1, P2, and P3, with the cost per time unit of P1, P2, and P3; values not recoverable]
- Find a schedule with minimum cost within a deadline of 45 time units
- Find a schedule with minimum time within a budget of 120
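The backtracking idea can be sketched for the minimum-cost, deadline-constrained case as follows. This is a simplified interpretation, not a published algorithm: tasks are assumed to run one after another, a more expensive resource is assumed to be faster, and when the last allocated task is already on the fastest resource the sketch steps back to the task allocated before it. The execution times and prices below are hypothetical stand-ins, since the values of the original table are not available. Only the deadline of 45 time units comes from the exercise.

```python
from typing import Dict, List, Optional, Tuple


def backtracking_schedule(
    tasks: List[str],
    exec_time: Dict[str, Dict[str, float]],
    price: Dict[str, float],
    deadline: float,
) -> Optional[Tuple[Dict[str, str], float]]:
    """Return (assignment, cost), or None if even the fastest resources miss the deadline."""
    resources = sorted(price, key=price.get)  # cheapest first (assumed slowest first)
    level: List[int] = []  # level[k] indexes the resource of the k-th task

    def elapsed() -> float:
        # Total time of the tasks allocated so far (zip stops at len(level)).
        return sum(exec_time[t][resources[l]] for t, l in zip(tasks, level))

    for _ in tasks:
        level.append(0)  # allocate the next task to the cheapest resource
        i = len(level) - 1  # backtracking starts at the last allocated task
        while elapsed() > deadline:
            while i >= 0 and level[i] == len(resources) - 1:
                i -= 1  # already on the fastest resource: step back one task
            if i < 0:
                return None  # nothing left to speed up
            level[i] += 1  # move to the next faster, more expensive resource

    assignment = {t: resources[l] for t, l in zip(tasks, level)}
    cost = sum(exec_time[t][r] * price[r] for t, r in assignment.items())
    return assignment, cost


# Hypothetical execution times (time units) and prices (cost per time unit).
tasks = ["A", "B", "C"]
exec_time = {
    "A": {"P1": 20, "P2": 15, "P3": 10},
    "B": {"P1": 20, "P2": 12, "P3": 8},
    "C": {"P1": 20, "P2": 10, "P3": 5},
}
price = {"P1": 1, "P2": 2, "P3": 4}

result = backtracking_schedule(tasks, exec_time, price, deadline=45)
```

With these numbers, A and B stay on the cheapest resource P1 and only C is moved, first to P2 (total time 50) and then to P3 (total time 45), for a cost of 60. The greedy choice of which task to speed up gives no guarantee of minimum cost in general.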
40
- Algorithms using this approach first find the critical path of the workflow
- The critical path is the longest path from the entry task to the exit task of a workflow
- Tasks outside the critical path are less likely to affect the scheduling objectives
- Thus, ensuring that the critical path meets the scheduling objectives helps ensure that the whole execution meets them
41
- Once the critical path is determined, the tasks in the critical path are usually assigned to
  - the cheapest (slower) resources that can still meet the workflow deadline or the sub-deadline of each task, or
  - the fastest (more expensive) resources that can complete the workflow within its budget or the sub-budget of each task
- The process then finds the new critical path among the remaining tasks and repeats
- Algorithms in this approach select resources for tasks in different ways depending on their focus
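The critical path that these algorithms start from is the longest path through the workflow graph, which can be found with a memoized depth-first traversal. The sketch below assumes a single positive duration per task and ignores data transfer times between tasks. Real algorithms would use estimates that depend on the chosen resource. The function name and the numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def critical_path(
    duration: Dict[str, float], children: Dict[str, List[str]]
) -> Tuple[float, List[str]]:
    """Return (length, tasks) of the longest entry-to-exit path in a DAG."""
    memo: Dict[str, Tuple[float, List[str]]] = {}

    def longest_from(task: str) -> Tuple[float, List[str]]:
        if task not in memo:
            best_len, best_path = 0.0, []
            for child in children.get(task, []):
                length, path = longest_from(child)
                if length > best_len:
                    best_len, best_path = length, path
            memo[task] = (duration[task] + best_len, [task] + best_path)
        return memo[task]

    has_parent = {c for cs in children.values() for c in cs}
    entries = [t for t in duration if t not in has_parent]
    return max((longest_from(t) for t in entries), key=lambda lp: lp[0])


# Hypothetical diamond-shaped workflow: t1 -> {t2, t3} -> t4.
duration = {"t1": 2, "t2": 3, "t3": 4, "t4": 1}
children = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}

length, path = critical_path(duration, children)
```

Here the path through t3 is longer than the one through t2, so t1, t3, and t4 would be assigned first, and t2 would be handled in the next round.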
43 The first critical path t2-t6-t9 is allocated to a virtual machine instance of type s2 as it is the cheapest resource that can finish the three tasks within their latest finish times (LFT)
44
- IC-PCPD2 (the variation of IC-PCP, proposed in the same paper, that distributes the deadline to each individual task)
- Partitioned Balanced Time Scheduling (PBTS)
- Hybrid Cloud Optimized Cost scheduling (HCOC)
- Dynamic Critical Path for Cloud (DCP-C)
45
- Workflow scheduling in Hybrid Cloud / Intercloud
- Virtual machine allocation/placement
  - On which physical host should each virtual machine reside?
  - Host utilization
  - Energy consumption
- MapReduce scheduling
  - Another unique scheduling problem