Static Process Scheduling Section 5.2 CSc 8320 Alex De Ruiter
The Book: Static Process Scheduling What is it (classical definition)? The scheduling of a set of partially ordered tasks on a non-preemptive multiprocessor system of identical processors to minimize the overall finishing time (a.k.a. makespan) [1]. Implications: The mapping of processes to processors is determined before execution begins. Once started, a process stays on its processor until it completes (no preemption). Process behavior, execution times, precedence relationships, and communication patterns must be known before execution.
The Book: Static Process Scheduling Scheduling to optimize makespan has been shown to be NP-complete, so research is directed toward approximate/heuristic methods. How does static scheduling for distributed systems differ from the classical definition? The classical definition treats interprocessor communication as negligible. That is definitely not the case in a distributed system.
The Book: Static Process Scheduling Goal? A scheduling algorithm that best balances and overlaps computation and communication. Two models are proposed by the book. Precedence Process Model: generally appropriate for user applications where process precedence is explicitly specified by the user. Communication Process Model: generally appropriate for system applications where the scheduling goal is to maximize resource utilization and minimize interprocess communication.
The Book: Precedence Process Model General goal: minimize overall makespan. The processes are represented by a directed acyclic graph (DAG). Critical path: the longest execution path in the precedence process DAG. A possible scheduling strategy is to map all critical path processes onto the same processor.
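The critical path described on this slide can be sketched in a few lines. This is only an illustration: the task names, execution times, and edges are made up, and communication costs on the edges are ignored.

```python
from functools import lru_cache

# Hypothetical precedence DAG (not taken from the book):
# task -> execution time, and task -> list of successor tasks.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 6, "E": 6, "F": 4, "G": 4}
succ = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E", "F"],
        "D": ["G"], "E": ["G"], "F": ["G"], "G": []}

def critical_path(exec_time, succ):
    """Return (length, path) of the longest execution path in the DAG."""
    @lru_cache(maxsize=None)
    def longest_from(task):
        # Longest path starting at `task`: its own time plus the best successor path.
        best_len, best_path = 0, ()
        for nxt in succ[task]:
            length, path = longest_from(nxt)
            if length > best_len:
                best_len, best_path = length, path
        return exec_time[task] + best_len, (task,) + best_path

    return max((longest_from(t) for t in exec_time), key=lambda r: r[0])

length, path = critical_path(exec_time, succ)
print(length, path)
```

The length of this path is a lower bound on the makespan regardless of the number of processors, which is why keeping its tasks together on one processor is a plausible starting strategy.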
The Book: Precedence Process Model Three forms:
List Scheduling (LS): no processor remains idle while there is a task available that it could process.
Extended List Scheduling (ELS): first use LS to allocate tasks without concern for communication delays, then add the communication delays to the resulting schedule. Delays are not anticipated when the allocation is made.
Earliest Task First (ETF): the earliest schedulable task is scheduled first.
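A minimal sketch of plain list scheduling on identical processors follows. It ignores communication delays entirely, and it assumes one particular priority rule (longest remaining path to the end of the DAG), which the slide does not specify. The DAG and the helper names `levels` and `list_schedule` are hypothetical.

```python
import heapq

# Hypothetical precedence DAG: task -> execution time, task -> successors.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 6, "E": 6, "F": 4, "G": 4}
succ = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E", "F"],
        "D": ["G"], "E": ["G"], "F": ["G"], "G": []}

def levels(exec_time, succ):
    """Assumed priority: length of the longest path from each task to the end."""
    level = {}
    def visit(t):
        if t not in level:
            level[t] = exec_time[t] + max((visit(s) for s in succ[t]), default=0)
        return level[t]
    for t in exec_time:
        visit(t)
    return level

def list_schedule(exec_time, succ, priority, num_procs):
    """Return {task: (processor, start, finish)}; communication is ignored."""
    preds_left = {t: 0 for t in exec_time}
    for t in succ:
        for s in succ[t]:
            preds_left[s] += 1
    ready = [t for t in exec_time if preds_left[t] == 0]
    idle = list(range(num_procs))
    running = []  # heap of (finish_time, task, processor)
    schedule, now = {}, 0
    while ready or running:
        # Never leave a processor idle while a ready task exists.
        ready.sort(key=lambda t: -priority[t])
        while ready and idle:
            task, proc = ready.pop(0), idle.pop(0)
            schedule[task] = (proc, now, now + exec_time[task])
            heapq.heappush(running, (now + exec_time[task], task, proc))
        # Advance to the next completion time and collect everything finishing then.
        now, done, proc = heapq.heappop(running)
        finished = [(done, proc)]
        while running and running[0][0] == now:
            _, d, p = heapq.heappop(running)
            finished.append((d, p))
        for done, proc in finished:
            idle.append(proc)
            for s in succ[done]:
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready.append(s)
        idle.sort()
    return schedule

schedule = list_schedule(exec_time, succ, levels(exec_time, succ), num_procs=2)
makespan = max(finish for _, _, finish in schedule.values())
print(makespan)
```

Extended list scheduling would take a schedule like this one and then push start times back to account for message delays between tasks placed on different processors.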
The Book: Precedence Process Model Each node represents a task/execution time combination. Each edge represents a precedence relationship. Each edge also notes the message unit weight. [1]
The Book: Precedence Process Model [1]
The Book: Communication Process Model Why a Communication Process Model? Processes don't always have an explicit completion time. Processes don't always have precedence constraints. Scheduling goal is to maximize resource utilization, minimize interprocess communication, and minimize total execution time.
The Book: Communication Process Model Module Allocation Problem: used to define “cost” in the Communication Process Model.
G: an undirected graph with nodes V (processes) and edges E (communication between processes)
P: the set of processors
e_j(p_i): execution cost of process j on processor p_i
c_{i,j}(p_i, p_j): communication cost between processes i and j when they are assigned to processors p_i and p_j
Also NP-complete.
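To make the cost definition concrete, here is a small sketch that evaluates the total cost of one allocation as the sum of execution costs plus the communication costs of process pairs placed on different processors. It assumes communication between processes on the same processor is free and that the communication cost does not depend on which pair of processors is involved. All numbers and the function name `allocation_cost` are hypothetical.

```python
# Hypothetical execution costs: process -> {processor: cost}.
exec_cost = {
    1: {"A": 5, "B": 10},
    2: {"A": 2, "B": 4},
    3: {"A": 4, "B": 4},
}
# Hypothetical communication costs on the undirected edges of G.
comm_cost = {(1, 2): 6, (2, 3): 4, (1, 3): 1}

def allocation_cost(exec_cost, comm_cost, assignment):
    """Total cost of `assignment` (process -> processor).

    Assumption: communication is free when both processes share a processor.
    """
    execution = sum(exec_cost[p][assignment[p]] for p in exec_cost)
    communication = sum(w for (i, j), w in comm_cost.items()
                        if assignment[i] != assignment[j])
    return execution + communication

split = allocation_cost(exec_cost, comm_cost, {1: "A", 2: "A", 3: "B"})
together = allocation_cost(exec_cost, comm_cost, {1: "A", 2: "A", 3: "A"})
print(split, together)
```

In this toy instance the single-processor allocation is cheaper, which shows how a pure cost view can push everything onto one processor at the expense of concurrency.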
The Book: Communication Process Model Approaches: Minimize communication cost by selecting the “cut set” with the least weight. The weight of the cut set represents the total cost of interprocessor communication. Optimizing for communication cost alone potentially reduces concurrency: its logical conclusion would be to schedule all processes on one processor. This leads to Maximum Flow / Minimum Cut, which yields an optimal assignment for the two-processor case.
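The two-processor case can be sketched as a minimum cut computation. The construction assumed here is the usual one: add one node per processor, connect each process to processor A with the cost of running it on B and to processor B with the cost of running it on A, then find the minimum cut separating A from B. The max-flow routine below is a plain breadth-first augmenting-path search (Edmonds-Karp), chosen for brevity. All costs are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical costs: process -> {processor: execution cost}, and edge weights.
exec_cost = {1: {"A": 5, "B": 10}, 2: {"A": 2, "B": 8},
             3: {"A": 9, "B": 3}, 4: {"A": 10, "B": 4}}
comm_cost = {(1, 2): 6, (2, 3): 2, (3, 4): 5, (1, 3): 1}

def min_cut_assignment(exec_cost, comm_cost, procs=("A", "B")):
    """Return (total cost, {process: processor}) for two processors."""
    a, b = procs
    cap = defaultdict(lambda: defaultdict(int))

    def add_edge(u, v, w):
        cap[u][v] += w  # undirected edge: capacity in both directions
        cap[v][u] += w

    for p, cost in exec_cost.items():
        add_edge(a, p, cost[b])  # cutting a-p places p on b, paying cost[b]
        add_edge(p, b, cost[a])  # cutting p-b places p on a, paying cost[a]
    for (i, j), w in comm_cost.items():
        add_edge(i, j, w)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {a: None}
        queue = deque([a])
        while queue and b not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if b not in parent:
            break  # no path left: `parent` holds the a-side of a minimum cut
        bottleneck, v = float("inf"), b
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = b
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    assignment = {p: (a if p in parent else b) for p in exec_cost}
    return flow, assignment

total, assignment = min_cut_assignment(exec_cost, comm_cost)
print(total, assignment)
```

The value of the maximum flow equals the weight of the minimum cut, so `total` is the combined execution and communication cost of the returned assignment.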
The Book: Communication Process Model Each node represents a process. Each edge represents a weighted communication cost [1].
The Book: Communication Process Model [1]
The Book: Communication Process Model Heuristic approach for more than two processors: Define a super group S containing all proposed processes. Define a communication cost threshold: if the communication cost between two processes exceeds the threshold, both processes are assigned to the same processor. Using Cost(G,P), iteratively combine processes from S into subgroups, optimizing for computation and communication cost as each subgroup is produced. Proceed until all processes are removed from S.
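The threshold step of this heuristic can be sketched with a union-find structure: any two processes whose communication cost exceeds the threshold end up in the same group. This covers only the grouping step, not the later optimization of computation cost per group. The process numbers, costs, threshold, and the function name `group_by_threshold` are hypothetical.

```python
# Hypothetical processes and pairwise communication costs.
processes = [1, 2, 3, 4, 5]
comm_cost = {(1, 2): 12, (2, 3): 3, (3, 4): 9, (4, 5): 11, (1, 5): 2}

def group_by_threshold(processes, comm_cost, threshold):
    """Merge processes whose communication cost exceeds `threshold`."""
    parent = {p: p for p in processes}

    def find(p):
        # Follow parent links to the group representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (i, j), w in comm_cost.items():
        if w > threshold:
            parent[find(i)] = find(j)  # i and j must share a processor

    groups = {}
    for p in processes:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())

groups = group_by_threshold(processes, comm_cost, threshold=8)
print(groups)
```

Each resulting group would then be treated as a unit when assigning work to processors, subject to whatever load limit per processor the scheduler enforces.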
The Book: Wrap-up Static process scheduling is imprecise because the problem's complexity grows with the number of processors and processes. Because the system need not maintain the static allocation (subsequent load balancing can adjust it), best-effort approximations made before process initiation become less significant to overall system performance.
Today Real-time grid computing scheduling schemes: Earliest Deadline First (EDF): the highest priority goes to the process with the earliest deadline [3]. Least Laxity First (LLF): processes are scheduled in non-decreasing order of slack time, where slack time is the difference between a process's deadline and its remaining computation time, so the processes closest to missing their deadlines go first [3].
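The two orderings can be compared on a small example. The sketch below uses the slack definition given on this slide (deadline minus remaining computation time, with deadlines measured from the current moment). The `Job` class and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    deadline: float   # time until the deadline, measured from now
    remaining: float  # remaining computation time

def edf_order(jobs):
    """Earliest Deadline First: smallest deadline runs first."""
    return sorted(jobs, key=lambda j: j.deadline)

def slack(job):
    """Slack (laxity) as defined above: deadline minus remaining work."""
    return job.deadline - job.remaining

def llf_order(jobs):
    """Least Laxity First: smallest slack runs first."""
    return sorted(jobs, key=slack)

jobs = [Job("J1", deadline=10, remaining=2),   # slack 8
        Job("J2", deadline=12, remaining=9),   # slack 3
        Job("J3", deadline=8, remaining=1)]    # slack 7

edf_names = [j.name for j in edf_order(jobs)]
llf_names = [j.name for j in llf_order(jobs)]
print(edf_names, llf_names)
```

Here J2 has the latest deadline but the least slack, so EDF runs it last while LLF runs it first.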
Today Random Brokering: Specific process arrival times and durations are unknown, but in general they conform to some statistical distribution. Resource assignment is guided by known properties of the arrival time and duration distributions (e.g., durations may follow a power law and arrivals may follow a Poisson distribution [2][3]).
References
1) Randy Chow, Theodore Johnson, “Distributed Operating Systems & Algorithms”, Addison Wesley, pp
2) Vandy Berten, Joel Goossens, Emmanuel Jeannot, “On the Distribution of Sequential Jobs in Random Brokering for Heterogeneous Computational Grids”, IEEE Transactions on Parallel and Distributed Systems, Vol. 17, No. 2, February 2006, Page
3) “Poisson Distribution”, Nikolaos D. Doulamis, Anastasios D. Doulamis, Emmanouel A. Varvarigos, Theodora A. Varvarigou, “Fair Scheduling Algorithms in Grids”, IEEE Transactions on Parallel and Distributed Systems, Vol. 18, No. 11, November 2007.