Published by Cornelius Day. Modified over 9 years ago.
1
1 Scheduling
CEG 4131 Computer Architecture III
Miodrag Bolic
Slides developed by Dr. Hesham El-Rewini
Copyright Hesham El-Rewini
2
2 Outline
– Scheduling models
– Scheduling without considering communication
– Including communication in scheduling
– Heuristic algorithms
3
3 Scheduling Parallel Tasks
[Figure: flow from a program to a schedule. Labels: Sequential Program, Dependence Analyzer, Ideal Parallelism, Partitioner, Grains of Sequential Code, Implicit Approach, Explicit Approach, Parallel Program, Tasks, Program Tasks, Scheduler, Parallel/Distributed System, Schedule (axes: Processors, Time).]
4
4 Program Tasks
Task notation: (T, <, D, A)
– T: set of tasks
– <: partial order on T
– D: communication data
– A: amount of computation
5
5 Task Graph
[Figure: task graph with tasks A through I. Each task is labeled with its amount of computation; each edge is a data dependency labeled with its communication data.]
6
6 Machine
– m heterogeneous processors
– Connected via an arbitrary interconnection network (network graph)
– Associated with each processor P_i is its speed S_i
– Associated with each edge (i, j) is the transfer rate R_ij
7
7 Task Schedule
– Gantt chart
– Mapping (f) of tasks to a processing element and a starting time
– Formally: f(v) = (i, t) means task v is scheduled to be processed by processor i starting at time t
8
8 Gantt Chart
9
9 Gantt Chart with Communication
10
10 Execution and Communication Times
– If task t_i is executed on processor p_j, execution time = A_i / S_j
– The communication delay between t_i and t_j, when executed on adjacent processing elements p_k and p_l, is D_ij / R_kl
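The two formulas above can be sketched in Python as follows. This is a minimal illustration rather than anything from the slides; the function names and the sample numbers are assumptions chosen for the example.

```python
def execution_time(amount: float, speed: float) -> float:
    """Execution time A_i / S_j of a task with computation amount A_i
    on a processor with speed S_j."""
    return amount / speed


def communication_delay(data: float, rate: float) -> float:
    """Delay D_ij / R_kl for sending D_ij units of data over a link
    with transfer rate R_kl."""
    return data / rate


# Hypothetical numbers, not taken from the slides.
print(execution_time(20, 2))       # 20 units of work on a speed-2 processor
print(communication_delay(5, 10))  # 5 units of data over a rate-10 link
```

On a heterogeneous machine the same task therefore has a different execution time on each processor, which is why the schedule must record the processor as well as the start time.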
11
11 Complexity
– Computationally intractable in general
– A small number of polynomial-time optimal algorithms in restricted cases
– A large number of heuristics in more general cases
– Quality of the schedule vs. quality of the scheduler
12
12 Scheduling Task Graphs without Considering Communication
Polynomial-time optimal algorithms exist in the following cases:
1. The task graph is an in-forest (each node has at most one immediate successor) or an out-forest (each node has at most one immediate predecessor)
2. The task graph is an interval order
13
13 In-Forest vs. Out-Forest Structure
[Figure: two example task graphs, labeled In-Forest and Out-Forest.]
14
14 Assumptions
– A task graph consisting of n tasks
– A distributed system made up of m processors
– The execution time of each task is one unit of time
– Communication between any pair of tasks is zero
– The goal is to find an optimal schedule, which minimizes the completion time
15
15 List Scheduling
– All considered algorithms belong to the list scheduling class.
– Each task is assigned a priority, and a list of tasks is constructed in decreasing priority order.
– A task becomes ready for execution when all of its immediate predecessors in the task graph have been executed, or if it has no predecessors.
16
16 Scheduling In-Forest/Out-Forest Task Graphs
1. The level of each node in the task graph is calculated as given above and used as each node’s priority
2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority
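The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the slides' own code. It assumes unit execution times and zero communication, and it assumes that the level of a node is the number of nodes on the longest path from that node to a terminal node, which is consistent with Example 2, where the last task has priority 1. The function names and the small in-forest are hypothetical.

```python
def node_levels(tasks, edges):
    """Level of each task: number of nodes on the longest path from the
    task to a terminal task (assumed definition, see the lead-in)."""
    succ = {t: [] for t in tasks}
    for u, v in edges:
        succ[u].append(v)
    memo = {}

    def level(t):
        if t not in memo:
            memo[t] = 1 + max((level(s) for s in succ[t]), default=0)
        return memo[t]

    return {t: level(t) for t in tasks}


def list_schedule(tasks, edges, m, priority):
    """Unit-time list scheduling on m processors.
    Returns one list of tasks per time step."""
    preds = {t: set() for t in tasks}
    for u, v in edges:
        preds[v].add(u)
    done, steps = set(), []
    while len(done) < len(tasks):
        # A task is ready once all of its immediate predecessors have run.
        ready = [t for t in tasks if t not in done and preds[t] <= done]
        if not ready:
            raise ValueError("task graph contains a cycle")
        # Highest priority first; the sort is stable, so ties keep input order.
        ready.sort(key=lambda t: -priority[t])
        steps.append(ready[:m])
        done.update(ready[:m])
    return steps


# Hypothetical in-forest: every task has at most one immediate successor.
tasks = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "D"), ("B", "D"), ("C", "E"), ("D", "F"), ("E", "F")]
priority = node_levels(tasks, edges)
for t, step in enumerate(list_schedule(tasks, edges, 2, priority), start=1):
    print(t, step)
```

With two processors this hypothetical graph needs four time steps; with three processors the same code produces a three-step schedule.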
17
17 Example 1: Simple List Scheduling
[Figure: scheduling example.]
18
18 Example 2: Simple List Scheduling
[Figure: task graph with tasks A through M.]

Priority Assignment
Task      A  B  C  D  E  F  G  H  I  J  K  L  M
Priority  5  5  5  4  4  4  4  3  3  3  2  2  1

Scheduling
t   P1  P2  P3  P4
1   A   B   C   E
2   D   F   G   H
3   I   J   L
4   K
5   M
19
19 Example 3: Simple List Scheduling
[Figure: task graph with tasks A through M; panels: Priority Assignment, Scheduling.]
20
20 Interval Orders
– A task graph is an interval order when its nodes can be mapped into intervals on the real line, and two elements are related iff the corresponding intervals do not overlap.
– For any pair of nodes u and v in an interval order, either the successors of u are also successors of v, or the successors of v are also successors of u.
21
21 Scheduling Interval Ordered Tasks
1. The number of successors of each node is used as each node’s priority
2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority
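The nested-successor property and the successor-count priority can be sketched in Python as follows. This is an illustrative sketch, not the slides' own code. It assumes that "successors" means all successors of a node (immediate and transitive); the function names and the two small graphs are hypothetical.

```python
def successor_sets(tasks, edges):
    """All successors (immediate and transitive) of every task."""
    succ = {t: set() for t in tasks}
    for u, v in edges:
        succ[u].add(v)
    closed = {}

    def visit(t):
        if t not in closed:
            out = set()
            for s in succ[t]:
                out.add(s)
                out |= visit(s)
            closed[t] = out
        return closed[t]

    for t in tasks:
        visit(t)
    return closed


def is_interval_order(closed):
    """True if the successor sets are nested: for any u and v, one
    successor set contains the other."""
    sets = sorted(closed.values(), key=len)
    # Sorted by size, the sets are nested iff each is a subset of the next.
    return all(a <= b for a, b in zip(sets, sets[1:]))


def successor_priorities(closed):
    """Priority of a task = number of its successors."""
    return {t: len(s) for t, s in closed.items()}


# Hypothetical interval order: successors of B are also successors of A.
tasks = ["A", "B", "C", "D"]
closed = successor_sets(tasks, [("A", "C"), ("A", "D"), ("B", "D")])
print(is_interval_order(closed), successor_priorities(closed))

# Hypothetical counterexample: A -> C and B -> D have unrelated successor sets.
other = successor_sets(tasks, [("A", "C"), ("B", "D")])
print(is_interval_order(other))
```

The resulting priorities can be fed to any list scheduler that repeatedly assigns the highest-priority ready task to a free processor.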
22
22 Example 1: Scheduling Interval Ordered tasks
23
23 Example 2: Scheduling Interval Ordered Tasks
[Figure: task graph with tasks A through J.]

Priority Assignment
Task      A  B  C  D  E  F  G  H  I  J
Priority  8  6  5  5  4  1  3  0  0  0

Scheduling
t   P1  P2  P3
1   A   B
2   C   D   E
3   G   F
4   H   I   J
24
24 Example 3: Scheduling Interval Ordered Tasks
[Figure: task graph with tasks A through L; panels: Priority Assignment, Scheduling.]
25
25 Communication Models
Completion Time
– Execution time
– Communication time
Completion Time as 2 Components
Completion Time from the Gantt Chart
26
26 Completion Time as 2 Components
Completion Time = Execution Time + Total Communication Delay
Total Communication Delay = Number of communication messages * delay per message
Execution time = maximum finishing time of any task
Number of communication messages
– Model A
– Model B
27
27 Completion Time from the Gantt Chart (Model C)
Completion Time = Schedule Length
– This model assumes the existence of an I/O processor with every processor in the system.
– Communication delay between two tasks allocated to the same processor is negligible.
– Communication delay is counted only between two tasks assigned to different processors.
28
28 Example
[Figure: task graph with tasks A, B, C, D, E, each labeled 1.]
Assume a system with 2 processors.
29
29 Models A and B
Assume tasks A, B, and D are assigned to P1 and tasks C and E are assigned to P2.
[Figure: task graph with tasks A, B, C, D, E, each labeled 1; P1 runs A, B, D and P2 runs C, E.]
Model A
– Number of messages = 2
– Completion time = 3 + 2 = 5
Model B
– Number of messages = 1
– Completion time = 3 + 1 = 4
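The message counts on this slide can be reproduced with the short Python sketch below. The slides do not spell out how Models A and B count messages, so the definitions used here are assumptions inferred from the example: Model A counts every dependency edge that crosses processors, and Model B counts one message per sending task and destination processor. The edge set (A feeding B, C, D, and E) is also an assumption that happens to match the counts on the slide; the function names are hypothetical.

```python
def message_counts(proc_of, edges):
    """Return (model_a, model_b) message counts for a task assignment."""
    crossing = [(u, v) for u, v in edges if proc_of[u] != proc_of[v]]
    # Assumed Model A: one message per edge between different processors.
    model_a = len(crossing)
    # Assumed Model B: one message per (sending task, destination processor).
    model_b = len({(u, proc_of[v]) for u, v in crossing})
    return model_a, model_b


def completion_time(execution_time, messages, delay_per_message=1):
    """Completion time = execution time + messages * delay per message."""
    return execution_time + messages * delay_per_message


# Assignment from the slide; the edges are an assumption (see the lead-in).
proc_of = {"A": "P1", "B": "P1", "D": "P1", "C": "P2", "E": "P2"}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]

model_a, model_b = message_counts(proc_of, edges)
print(model_a, completion_time(3, model_a))
print(model_b, completion_time(3, model_b))
```

The execution time of 3 is taken from the slide: P1 runs three unit-time tasks.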
30
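30 Model C
[Figure: Gantt chart for P1 and P2 over time 0 to 4, showing tasks A, B, C, D, E and a communication delay slot; task graph with tasks A, B, C, D, E, each labeled 1.]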
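Under Model C the completion time is read off the Gantt chart as the schedule length. The Python sketch below builds such a schedule. It is a hedged illustration, not the slides' own procedure, and it assumes unit execution times, a delay of 1 between tasks on different processors, no delay on the same processor, and the same hypothetical edge set as before (A feeding B, C, D, and E). The function name is an assumption.

```python
def gantt_schedule(assignment, edges, exec_time=1, delay=1):
    """assignment maps each processor to its ordered list of tasks.
    Returns (schedule_length, finish_time_per_task)."""
    proc_of = {t: p for p, ts in assignment.items() for t in ts}
    preds = {t: [] for t in proc_of}
    for u, v in edges:
        preds[v].append(u)
    queues = {p: list(ts) for p, ts in assignment.items()}
    free_at = {p: 0 for p in assignment}
    finish = {}
    while any(queues.values()):
        progressed = False
        for p, queue in queues.items():
            # Run the next task on p once all of its predecessors are placed.
            if queue and all(u in finish for u in preds[queue[0]]):
                t = queue.pop(0)
                # Data from another processor arrives `delay` later.
                arrive = max(
                    (finish[u] + (delay if proc_of[u] != p else 0)
                     for u in preds[t]),
                    default=0,
                )
                finish[t] = max(free_at[p], arrive) + exec_time
                free_at[p] = finish[t]
                progressed = True
        if not progressed:
            raise ValueError("task order conflicts with the dependencies")
    return max(finish.values()), finish


assignment = {"P1": ["A", "B", "D"], "P2": ["C", "E"]}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
length, finish = gantt_schedule(assignment, edges)
print(length, finish)
```

Under these assumptions P2 idles until the data from A arrives, so C and E finish later than they would with free communication.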
31
31 Models A, B, C Example
[Figure: task graph with tasks A through M. Communication delay is displayed in the graph for A & B. Extracted labels: A 4, D 5, E 3, B 9, C 7; F, G, H, I, J, K, L, M each 1.]
Assume execution time of a task is 1. (Assume all communication delay is 1 for simplicity.)
[Table: Models A and B task assignment on processors P1, P2, P3. Rows as extracted: A | BCD | EHJ | FLK | GM | HI]
[Table: Model C task assignment on processors P1, P2, P3. Rows as extracted: A | B | BCD | EHJ | FK | GL | HM | I]
Model A
– Number of messages = 2 + 2 = 4
– Completion time = 3 + (2*4 + 2*3) = 17
Model B
– Number of messages = 2 + 1 = 3
– Completion time = 3 + (2*4 + 1*3) = 14
Model C
– Completion time = 8
32
32 Heuristics
A heuristic produces an answer in less than exponential time, but does not guarantee an optimal solution.
– Communication delay versus parallelism
– Clustering
– Duplication
33
33 Communication Delay versus Parallelism
34
34 Clustering
35
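35 Clustering Example 1 Part 1
[Figure: task graph with tasks A through G. Extracted node and edge labels: 4 3 2 1.5 2 5 1 1 1 1 2 1 1. Legend: Task Assignment, Communication Delay, NOP.]
Schedule on P1 and P2 (time: task): 1: A, 2: B, 3: C, 4: D, 6: E, 8: F, 10: G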
36
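36 Clustering Example 1 Part 2
[Figure: task graph with tasks A through G. Extracted node and edge labels: 4 3 2 1.5 2 5 1 1 1 1 2 1 1. Legend: Task Assignment, Communication Delay, NOP.]
Schedule on P1 and P2 (time: task): 1: A, 2: B, 3: C, 4: D, 6: E, 8: F, 9: G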
37
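37 Clustering Example 2
[Figure: task graph with tasks A through H. Extracted node and edge labels: 4 3 2 2 2 5 3 2 1 1 2 3 1 and 2 1 5 2 1. Legend: Task Assignment, Communication Delay, NOP.]
Schedule on P1 and P2 (time: task): 1: A, 2: B, 5: D, 6: D, 7: C and E, 8: E, 9: F, 10: F, 11: G, 13: H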
38
38 Duplications
39
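39 Duplication Example (Using Clustering Example 1 Part 2)
[Figure: task graph with tasks A through G. Extracted node and edge labels: 4 3 2 1.5 2 5 1 1 1 1 2 1 1. Legend: Task Assignment, Communication Delay, NOP.]
Schedule on P1 and P2 (time: task): 1: A on both processors, 2: B and C, 3: D, 5: E, 7: F, 8: G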
40
40 Scheduling and Grain Packing
Four major steps are involved in grain determination and scheduling optimization:
–Step 1. Construct a fine-grain program graph.
–Step 2. Schedule the fine-grain computation.
–Step 3. Grain packing to produce the coarse grains.
–Step 4. Generate a parallel schedule based on the packed graph.
41
41 Program Decomposition for Static Multiprocessor Scheduling
Two 2 x 2 matrices A and B are multiplied to compute the sum of the four elements in the resulting product matrix C = A x B. There are eight multiplications and seven additions to be performed in this program, as written below:
42
42 Example 2.5 (Cont'd)
–C_11 = A_11 B_11 + A_12 B_21
–C_12 = A_11 B_12 + A_12 B_22
–C_21 = A_21 B_11 + A_22 B_21
–C_22 = A_21 B_12 + A_22 B_22
–Sum = C_11 + C_12 + C_21 + C_22
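The operation count can be checked with a small Python sketch of this computation. It is an illustration only; the function name and the sample matrices are assumptions, and each multiplication and addition is counted explicitly as one fine-grain operation.

```python
def sum_of_product(A, B):
    """Sum of the four elements of C = A x B for 2 x 2 matrices.
    Returns (total, multiplications, additions)."""
    mults = adds = 0
    C = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            p1 = A[i][0] * B[0][j]  # first product of C[i][j]
            p2 = A[i][1] * B[1][j]  # second product of C[i][j]
            mults += 2
            C[i][j] = p1 + p2
            adds += 1
    # Three more additions combine the four elements of C.
    total = C[0][0] + C[0][1] + C[1][0] + C[1][1]
    adds += 3
    return total, mults, adds


# Hypothetical input matrices.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(sum_of_product(A, B))
```

Each of the eight products is independent of the others, which is what makes them natural fine grains before packing.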