Static Process Scheduling
CSc 8320, Chapter 5.2
Yunmei Lu
Outline
- Definition and goal
- Models: the precedence process model and the communication process model
- Future work
- References
What is Static Process Scheduling (SPS)?
Scheduling a set of partially ordered tasks on a non-preemptive multiprocessor system of identical processors so as to minimize the overall finishing time (makespan) [1].
Implications
- The mapping of processes to processors is determined before execution.
- Process behavior, execution times, precedence relationships, and communication patterns must be known before execution (a minimal data sketch follows).
- Non-preemptive: once started, a process stays on its processor until it completes.
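As an illustration only (not from the source), the static knowledge the scheduler needs per task could be recorded roughly as follows; the class and field names are assumptions of this sketch:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TaskInfo:
    """Static knowledge required per task before execution starts."""
    exec_time: int                                            # known execution time
    msg_units: Dict[str, int] = field(default_factory=dict)   # successor task -> message units

# Hypothetical program: task "A" precedes "B" and sends it 3 message units.
program = {"A": TaskInfo(exec_time=4, msg_units={"B": 3}),
           "B": TaskInfo(exec_time=2)}
```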
Goal
- Minimize the overall finishing time (makespan) on a non-preemptive multiprocessor system of identical processors.
- Find a scheduling algorithm that best balances and overlaps computation and communication.
Other characteristics
- Minimizing the makespan is NP-complete, so approximate or heuristic algorithms are needed.
- In the classical formulation, inter-processor communication is considered negligible; in a distributed system it is not.
Models
- Precedence Process Model (PPM)
- Communication Process Model (CPM)
Precedence Process Model (PPM)
- The program is represented by a directed acyclic graph (DAG) (Figure a on the following slide); precedence constraints among tasks are explicitly specified.
- The hardware is characterized by a communication system model giving unit communication delays between processors (Figure b on the following slide).
- The communication cost between two tasks = the unit communication cost in the communication system graph multiplied by the number of message units on the DAG edge.
Example of a DAG
- In Figure a, each node denotes a task with a known execution time.
- An edge represents a precedence relationship between two tasks; the arrow indicates the order of execution.
- Edge labels show the number of message units to be transferred. [Chow and Johnson 1997]
Precedence process and communication system models
- Figure b is an example of a communication system model with three processors (p1, p2, p3). Unit communication costs are non-negligible for inter-processor communication and negligible (zero weight on the internal edge) for intra-processor communication.
- Example: the communication cost between A (on p1) and E (on p3) is 4 * 2 = 8 (see the sketch below). [Chow and Johnson 1997]
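A minimal sketch of that cost rule; only the A-E edge (4 message units) and the p1-p3 unit cost (2) come from the slides, while the data structures and function name are assumptions of this sketch:

```python
# Hypothetical representation of the example above.
dag_messages = {("A", "E"): 4}          # message units on DAG edges
unit_cost = {("p1", "p3"): 2}           # unit communication cost between processors
placement = {"A": "p1", "E": "p3"}      # task -> processor assignment

def comm_cost(src, dst):
    """Communication cost = unit cost between processors * message units."""
    p_src, p_dst = placement[src], placement[dst]
    if p_src == p_dst:                  # intra-processor communication is free
        return 0
    units = unit_cost.get((p_src, p_dst)) or unit_cost.get((p_dst, p_src), 0)
    return units * dag_messages[(src, dst)]

print(comm_cost("A", "E"))              # -> 8, matching the slide's 4 * 2
```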
Precedence Process Model: algorithms
- List Scheduling (LS): a simple greedy heuristic; no processor remains idle if there is a task it could process. Communication is ignored (a sketch of LS follows the figure below).
- Extended List Scheduling (ELS): the schedule produced by LS, evaluated with communication delays taken into account.
- Earliest Task First scheduling (ETF): the earliest schedulable task (with communication delay considered) is scheduled first. [Chow and Johnson 1997]
Figure: schedules produced by the algorithms [Chow and Johnson 1997]. The critical path is the longest execution path in the DAG; dashed lines represent waiting for communication.
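Purely as an illustration of the LS idea (not the textbook's algorithm or data), here is a minimal greedy list scheduler that ignores communication; the function name, the predecessor-list input format, and the tiny example DAG are all assumptions of this sketch:

```python
import heapq

def list_schedule(exec_time, preds, num_procs):
    """Greedy list scheduling, ignoring communication (LS):
    whenever a processor is idle and a task is ready, start the task.
    preds must map every task to its list of predecessors."""
    indeg = {t: len(ps) for t, ps in preds.items()}
    succs = {t: [] for t in preds}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    ready = [t for t, d in indeg.items() if d == 0]    # tasks with all predecessors done
    proc_free = [0] * num_procs                        # time each processor becomes free
    finish = {}                                        # task -> finish time
    running = []                                       # heap of (finish_time, task)
    time = 0
    while ready or running:
        # start as many ready tasks as there are idle processors
        while ready:
            i = min(range(num_procs), key=lambda k: proc_free[k])
            if proc_free[i] > time:
                break                                  # no processor is idle right now
            task = ready.pop(0)
            proc_free[i] = time + exec_time[task]
            heapq.heappush(running, (proc_free[i], task))
        # advance time to the next task completion and release its successors
        if running:
            time, done = heapq.heappop(running)
            finish[done] = time
            for s in succs[done]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
    return max(finish.values()), finish                # makespan, per-task finish times

# Hypothetical three-task program: A precedes B and C.
preds = {"A": [], "B": ["A"], "C": ["A"]}
exec_time = {"A": 2, "B": 3, "C": 1}
print(list_schedule(exec_time, preds, num_procs=2))    # makespan 5: A ends at 2, C at 3, B at 5
```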
Communication Process Model (CPM)
- The program is modeled by an undirected graph G: nodes represent processes, and the weight on an edge is the amount of communication (messages) between the two connected processes.
- There are no precedence constraints among processes.
- Processors are not identical (they differ in speed and hardware).
- Scheduling goal: maximize resource utilization and minimize inter-process communication. [Chow and Johnson 1997]
Communication Process Model
The problem is to find an optimal assignment of m processes to P processors with respect to a target cost function (the Module Allocation problem), where:
- P: the set of processors
- e_j(P_i): computation cost of executing process p_j on processor P_i
- c_{i,j}(p_i, p_j): communication overhead between processes p_i and p_j (incurred only when they are placed on different processors)
- A uniform communication speed between processors is assumed.
A sketch of evaluating this cost for a given assignment follows. [Chow and Johnson 1997]
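A minimal sketch of evaluating the target function for a given assignment, under the assumption (consistent with the later cut-set discussion) that communication cost is paid only between processes placed on different processors; all names and numbers are illustrative:

```python
def allocation_cost(assign, exec_cost, comm_cost):
    """Total cost of a module allocation: computation costs plus
    communication between processes on different processors."""
    total = sum(exec_cost[p][assign[p]] for p in assign)
    for (pi, pj), c in comm_cost.items():
        if assign[pi] != assign[pj]:
            total += c
    return total

# Illustrative data: exec_cost[process][processor], comm_cost[(p_i, p_j)].
exec_cost = {"p1": {"A": 5, "B": 10}, "p2": {"A": 2, "B": 3}}
comm_cost = {("p1", "p2"): 4}
print(allocation_cost({"p1": "A", "p2": "A"}, exec_cost, comm_cost))  # 5 + 2 = 7
print(allocation_cost({"p1": "A", "p2": "B"}, exec_cost, comm_cost))  # 5 + 3 + 4 = 12
```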
Communication Process Model
Stone's two-processor model finds the assignment with minimum total execution and communication cost.
Table (a), the computation cost table, lists for each process its cost on A and its cost on B (a cost of infinity means the process cannot run on that processor); table (b) shows the inter-process communication costs. [Chow and Johnson 1997]
How do we map processes to processors?
- Partition the graph by drawing a line cutting through some edges.
- The result is two disjoint subgraphs, one per processor.
- The set of removed edges is the cut set; the cost of the cut set is the sum of the weights of its edges, which is the total inter-process communication cost between the processors.
- The cut-set cost is 0 if all processes are assigned to the same node, but such an assignment makes no sense, so computation constraints are added (e.g., no more than k processes per processor, or an even distribution of load). [Chow and Johnson 1997]
Figure: Stone's graph construction and a minimum-cost cut [Chow and Johnson 1997]. The weight assigned to the edge between node A and process i is the cost of executing process i on B (and symmetrically for node B), so a minimum-cost cut separating A from B gives the minimum-cost assignment (see the sketch below).
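A minimal sketch of this min-cut formulation using networkx; it is not the textbook's code, and the node names, the use of symmetric directed arcs, and the example numbers are assumptions of this sketch:

```python
import networkx as nx

def stone_two_processor(exec_cost, comm_cost):
    """Stone's two-processor assignment via a minimum cut.
    exec_cost[p] = (cost on A, cost on B); comm_cost[(p, q)] = IPC weight.
    Edge A-p carries the cost of running p on B (cutting it sends p to B),
    and edge B-p carries the cost of running p on A."""
    g = nx.DiGraph()
    def link(u, v, w):                      # symmetric arcs emulate an undirected edge
        g.add_edge(u, v, capacity=w)
        g.add_edge(v, u, capacity=w)
    for p, (cost_a, cost_b) in exec_cost.items():
        link("A", p, cost_b)
        link("B", p, cost_a)
    for (p, q), w in comm_cost.items():
        link(p, q, w)
    cut_value, (side_a, _) = nx.minimum_cut(g, "A", "B")
    assignment = {p: ("A" if p in side_a else "B") for p in exec_cost}
    return cut_value, assignment

# Illustrative data: two processes with an IPC weight of 4 between them.
exec_cost = {"p1": (5, 10), "p2": (2, 3)}
comm_cost = {("p1", "p2"): 4}
print(stone_two_processor(exec_cost, comm_cost))   # -> (7, {'p1': 'A', 'p2': 'A'})
```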
Extension of Stone's two-processor model
- To generalize the problem beyond two processors, Stone uses a repetitive approach based on the two-processor algorithm to solve the n-processor problem.
- Treat (n-1) processors as one super-processor and solve a two-processor problem.
- The processors inside the super-processor are then further broken down, based on the results of the previous step. [Chow and Johnson 1997]
Problems?
- The approach is too complex.
- The objectives of minimizing computation cost and minimizing communication cost often conflict.
- Therefore, other heuristic solutions are used.
Some heuristic solutions
- Separate the optimization of computation and communication into two independent phases.
- Merge processes with high inter-process interaction into clusters of processes.
- The processes in each cluster are then assigned to the processor that minimizes the computation cost. [Chow and Johnson 1997]
Problem and solution
- Problem: merging processes eliminates inter-processor communication, but it may impose a higher computation burden on a single processor and thus reduce concurrency.
- Solution: merge only processes whose communication cost exceeds a threshold C, and constrain the size of each cluster, e.g., the total execution cost of the processes in a single cluster may not exceed another threshold X (see the clustering sketch below). [Chow and Johnson 1997]
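A minimal sketch of the threshold-based merging step, using a simple greedy pass over the heaviest communication edges; the per-process execution costs, the thresholds, and the merge order are illustrative assumptions, not from the source:

```python
def cluster(exec_cost, comm_cost, C, X):
    """Greedily merge processes whose pairwise communication exceeds C,
    but never let a cluster's total execution cost exceed X."""
    clusters = {p: {p} for p in exec_cost}          # process -> its cluster (shared set)
    for (p, q), w in sorted(comm_cost.items(), key=lambda e: -e[1]):
        if w <= C or clusters[p] is clusters[q]:
            continue
        merged = clusters[p] | clusters[q]
        if sum(exec_cost[r] for r in merged) <= X:  # respect the size constraint
            for r in merged:
                clusters[r] = merged
    # return the distinct clusters
    return [sorted(c) for c in {id(c): c for c in clusters.values()}.values()]

# Illustrative data: per-process execution cost and inter-process communication.
exec_cost = {1: 5, 2: 4, 3: 6, 4: 3, 5: 2, 6: 7}
comm_cost = {(2, 4): 12, (1, 6): 11, (3, 5): 10, (1, 2): 5}
print(cluster(exec_cost, comm_cost, C=9, X=15))     # -> [[1, 6], [2, 4], [3, 5]]
```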
Clustering example
- With C = 9, we get three clusters: (2,4), (1,6), and (3,5).
- Clusters (2,4) and (1,6) must be mapped to processors A and B.
- Cluster (3,5) can be assigned to either A or B, depending on whether the goal is to minimize computation cost or communication cost: assigning (3,5) to A gives a lower communication cost but a higher computation cost.
- If we assign (3,5) to A, the total cost = 41 (computation cost = 17 on A and 14 on B; communication cost = 6 + 4 = 10). [Chow and Johnson 1997]
Summary of static process scheduling
- Non-preemptive: once a process is assigned to a processor, it remains there until its execution has completed.
- Requires prior knowledge of execution times and communication behavior.
- Scheduling decisions are centralized and non-adaptive, which is often neither effective nor realistic.
- Finding the optimal solution is NP-hard, so heuristic algorithms are normally used.
Future work
- With advances in processor and networking hardware, parallel processing can be carried out on a wide spectrum of platforms; this diversity makes the scheduling problem even more complex and challenging.
- Scheduling algorithms for efficient parallel processing should consider the following aspects:
Cont…
- Performance: the scheduling algorithm should produce high-quality solutions.
- Time complexity: important as long as solution quality is not compromised; a fast algorithm is needed to find good solutions efficiently.
- Scalability: the algorithm must consistently give good performance even for large inputs; given more processors, it should produce solutions of comparable quality in a shorter time.
- Applicability: the algorithm must be usable in practical environments, so it should make realistic assumptions about the program and multiprocessor models, such as arbitrary computation and communication weights.
Cont…
The goals above conflict with one another and thus pose a number of challenges to researchers. Ideas being explored to address them include:
- Genetic algorithms
- Randomization approaches
- Parallelization techniques
- Extending DAG scheduling to heterogeneous computing platforms
References
1. Randy Chow, Theodore Johnson. Distributed Operating Systems & Algorithms. Addison-Wesley, 1997.
2. Yu-Kwong Kwok, Ishfaq Ahmad. Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors. ACM Computing Surveys, December 1999.
3. Sachi Gupta, Gaurav Agarwal, Vikas Kumar. Task Scheduling in Multiprocessor System Using Genetic Algorithm. ICMLC.
4. Hongze Qiu, Wanli Zhou, Hailong Wang. A Genetic Algorithm-based Approach to Flexible Job-shop Scheduling Problem. ICNC.
5. Xueyan Tang, Samuel T. Chanson. Optimizing Static Job Scheduling in a Network of Heterogeneous Computers. ICPP, IEEE.
Thank you!