Scheduling Algorithms


1 Scheduling Algorithms
SoC Design Automation, School of EECS, Seoul National University

2 Unconstrained minimum-latency scheduling problem
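Find $\varphi : V \to Z^+$ such that $\varphi(v_i) = t_i$, $t_i \ge t_j + d_j$ $\forall i, j \mid (v_j, v_i) \in E$, and $t_n$ is minimum.
Resource-constrained minimum-latency scheduling problem:
$|\{v_i \mid \tau(v_i) = k \text{ and } t_i \le l < t_i + d_i\}| \le a_k$ for each op type $k = 1, 2, \ldots, n_{res}$ and schedule step $l = 1, 2, \ldots, t_n$.
[Figure: operation $v_i$ occupying the steps from $t_i$ up to $t_i + d_i$]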

3 Scheduling without Resource Constraints
Unconstrained scheduling applies in two cases:
- Dedicated resources: operation types are all different, or resource cost is marginal
- Resource binding is done: resource conflicts are resolved by serializing operations that share the same resource
Unconstrained scheduling gives a lower bound on latency for constrained problems.

4 Scheduling without Resource Constraints
ASAP (As Soon As Possible) scheduling
[Figure: ASAP schedule of the example sequencing graph (v0 to vn, operations *, +, -, <) over C-steps 1 to 4]

5 Scheduling without Resource Constraints
ASAP scheduling algorithm
ASAP (G(V, E)) {
    schedule v0 by setting t_0^S = 1;
    repeat {
        select a vertex v_j whose predecessors are all scheduled;
        schedule v_j by setting t_j^S = max (t_i^S + d_i) over all (v_i, v_j) ∈ E;
    } until (v_n is scheduled);
    return (T^S);    -- T^S = {t_0^S, t_1^S, ..., t_n^S}
}
Topological sorting --> O(|V| + |E|)
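The ASAP procedure above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the edge list is an assumption, reconstructed from the dependency constraints of the running 11-operation example, and unit delays are assumed for all operations (the NOPs v0 and vn take no time).

```python
from collections import defaultdict

# Assumed edge list for the 11-operation example graph; v0 and vn are NOPs.
EDGES = [("v0", "v1"), ("v0", "v2"), ("v0", "v6"), ("v0", "v8"), ("v0", "v10"),
         ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11"),
         ("v5", "vn"), ("v9", "vn"), ("v11", "vn")]


def asap(edges, delay, source="v0"):
    """Return earliest start times t^S, visiting vertices in topological order."""
    preds, succs, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1
    start = {source: 1}              # t_0^S = 1
    ready = [source]
    while ready:
        u = ready.pop()
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:        # all predecessors of v are scheduled
                start[v] = max(start[p] + delay[p] for p in preds[v])
                ready.append(v)
    return start


# Unit delay for every operation (assumption); the NOPs take no time.
DELAY = {f"v{i}": 1 for i in range(1, 12)}
DELAY.update({"v0": 0, "vn": 0})
t_s = asap(EDGES, DELAY)
print(t_s["v5"], t_s["vn"])  # 4 5
```

Each edge is inspected once and each vertex is released once, which is where the O(|V| + |E|) bound comes from.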

6 Scheduling without Resource Constraints
ALAP (As Late As Possible) scheduling
[Figure: ALAP schedule of the example sequencing graph (v0 to vn) over C-steps 1 to 4]
Mobility m_i = t_i^L - t_i^S

7 Scheduling without Resource Constraints
ALAP scheduling algorithm
ALAP (G(V, E), l') {
    schedule v_n by setting t_n^L = l' + 1;
    repeat {
        select a vertex v_i whose successors are all scheduled;
        schedule v_i by setting t_i^L = min (t_j^L - d_i) over all (v_i, v_j) ∈ E;
    } until (v_0 is scheduled);
    return (T^L);    -- T^L = {t_0^L, t_1^L, ..., t_n^L}
}
where l' = t_n^S - t_0^S
Topological sorting --> O(|V| + |E|)
Mobility m_i = t_i^L - t_i^S
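A small sketch of ALAP and the mobility computation, under the same assumptions as the running example (the edge list is reconstructed from the dependency constraints, and all operations have unit delay). For brevity the topological order here is computed with a simple O(|V||E|) loop rather than the linear-time version; the helper names are illustrative.

```python
# Assumed edge list for the 11-operation example graph; v0 and vn are NOPs.
EDGES = [("v0", "v1"), ("v0", "v2"), ("v0", "v6"), ("v0", "v8"), ("v0", "v10"),
         ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11"),
         ("v5", "vn"), ("v9", "vn"), ("v11", "vn")]
DELAY = {f"v{i}": 1 for i in range(1, 12)}   # unit delays (assumption)
DELAY.update({"v0": 0, "vn": 0})


def topo(edges):
    """Topological order of the vertices (simple version, kept short)."""
    nodes = {x for e in edges for x in e}
    indeg = {v: sum(1 for _, w in edges if w == v) for v in nodes}
    order, ready = [], sorted(v for v in nodes if indeg[v] == 0)
    while ready:
        u = ready.pop()
        order.append(u)
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order


def asap(edges, delay):
    t = {}
    for v in topo(edges):            # predecessors are visited first
        t[v] = max((t[u] + delay[u] for u, w in edges if w == v), default=1)
    return t


def alap(edges, delay, bound):
    t = {}
    for v in reversed(topo(edges)):  # successors are visited first
        t[v] = min((t[w] for u, w in edges if u == v), default=bound + 1) - delay[v]
    return t


t_s = asap(EDGES, DELAY)
bound = t_s["vn"] - t_s["v0"]        # l' = t_n^S - t_0^S
t_l = alap(EDGES, DELAY, bound)
mobility = {v: t_l[v] - t_s[v] for v in t_s}
print(bound, mobility["v6"], mobility["v8"])  # 4 1 2
```

Operations on the critical path (v1, v2, v3, v4, v5) come out with zero mobility, which is what makes them the urgent ones in the later heuristics.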

8 Scheduling with Resource Constraints
Given a resource constraint, find area/latency trade-off points.
Integer Linear Programming (ILP) model
C.-T. Hwang, J.-H. Lee, and Y.-C. Hsu, "A formal approach to the scheduling problem in high level synthesis," IEEE Trans. on CAD, April 1991.
Exact solution but NP-complete

9 Scheduling with Resource Constraints
Minimize $c^T t = [0\ 0\ \cdots\ 1]\,[t_0\ t_1\ \cdots\ t_n]^T = t_n = \sum_{l=t_n^S}^{t_n^L} l\, x_{nl}$
subject to
$\sum_{l=t_i^S}^{t_i^L} x_{il} = 1$, $i = 0, 1, \ldots, n$ (unique start time)
$\sum_{l=t_i^S}^{t_i^L} l\, x_{il} - \sum_{l=t_j^S}^{t_j^L} l\, x_{jl} - d_j \ge 0$, $i, j = 0, 1, \ldots, n$, $(v_j, v_i) \in E$ (data dependency)
$\sum_{i:\tau(v_i)=k} \sum_{m=l-d_i+1}^{l} x_{im} \le a_k$, $k = 1, 2, \ldots, n_{res}$, $l = 1, 2, \ldots, l'+1$ (resource constraint)
$x_{il} \in \{0, 1\}$, $i = 0, 1, \ldots, n$, $l = 1, 2, \ldots, l'+1$
where
$x_{il} = 1$ if $v_i$ starts in step $l$, 0 otherwise
$d_j$: execution delay of operation j
$\tau(v_i)$: resource type of operation $v_i$
$a_k$: resource constraint
$l'$: latency obtained by a heuristic algorithm
[Figure: operation $v_i$ starting at step $t_i = l - d_i + 1$ and still executing at step $l$]

10 Scheduling with Resource Constraints
Minimize area under latency constraint --> $a_k$: variable
Objective function: $c^T a = [area_1, area_2, \ldots, area_{n_{res}}]\,[a_1\ a_2\ \cdots\ a_{n_{res}}]^T$
$\sum_{l=t_n^S}^{t_n^L} l\, x_{nl} \le l'+1$ --> added as latency constraint
redundant ($\sum_{l=t_i^S}^{t_i^L} x_{il} = 1$, $i = 0, 1, \ldots, n$)

11 Scheduling with Resource Constraints
Example 1
# mult = a1 = 2
# ALU = a2 = 2
By heuristic (list scheduling) algorithm: l' = 4
[Figure: example sequencing graph, v0 (NOP) to vn (NOP), with operations *, +, -, <]

12 Scheduling with Resource Constraints
Unique start time:
x0,1 = 1
x1,1 = 1
x2,1 = 1
x3,2 = 1
x4,3 = 1
x5,4 = 1
x6,1 + x6,2 = 1
x7,2 + x7,3 = 1
x8,1 + x8,2 + x8,3 = 1
x9,2 + x9,3 + x9,4 = 1
x10,1 + x10,2 + x10,3 = 1
x11,2 + x11,3 + x11,4 = 1
xn,5 = 1
Data dependency:
2x7,2 + 3x7,3 - x6,1 - 2x6,2 - 1 ≥ 0
2x9,2 + 3x9,3 + 4x9,4 - x8,1 - 2x8,2 - 3x8,3 - 1 ≥ 0
2x11,2 + 3x11,3 + 4x11,4 - x10,1 - 2x10,2 - 3x10,3 - 1 ≥ 0
4x5,4 - 2x7,2 - 3x7,3 - 1 ≥ 0
5xn,5 - 2x9,2 - 3x9,3 - 4x9,4 - 1 ≥ 0
5xn,5 - 2x11,2 - 3x11,3 - 4x11,4 - 1 ≥ 0
Resource constraint:
x1,1 + x2,1 + x6,1 + x8,1 ≤ 2
x3,2 + x6,2 + x7,2 + x8,2 ≤ 2
x7,3 + x8,3 ≤ 2
x10,1 ≤ 2
x9,2 + x10,2 + x11,2 ≤ 2
x4,3 + x9,3 + x10,3 + x11,3 ≤ 2
x5,4 + x9,4 + x11,4 ≤ 2
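The constraint system above is small enough to check by exhaustive enumeration. The sketch below is a brute-force stand-in for an ILP solver, not an ILP formulation: it tries every start time allowed by the unique-start-time constraints and keeps the assignments that satisfy the data-dependency and resource constraints. The names `FRAME`, `DEPS`, and `feasible_schedules` are illustrative.

```python
from itertools import product

# Time frames [t^S, t^L] read off the unique-start-time constraints.
FRAME = {"v1": (1, 1), "v2": (1, 1), "v3": (2, 2), "v4": (3, 3), "v5": (4, 4),
         "v6": (1, 2), "v7": (2, 3), "v8": (1, 3), "v9": (2, 4),
         "v10": (1, 3), "v11": (2, 4)}
# (predecessor, successor) pairs that appear in the data-dependency constraints
# between operations; the constraints on vn hold for any start time <= 4.
DEPS = [("v6", "v7"), ("v8", "v9"), ("v10", "v11"), ("v7", "v5")]
MULT = {"v1", "v2", "v3", "v6", "v7", "v8"}   # the rest are ALU operations


def feasible_schedules(a_mult, a_alu):
    """Enumerate all start-time assignments that satisfy every constraint."""
    ops = list(FRAME)
    ranges = [range(lo, hi + 1) for lo, hi in FRAME.values()]
    found = []
    for times in product(*ranges):
        t = dict(zip(ops, times))
        if any(t[v] < t[u] + 1 for u, v in DEPS):      # unit delays
            continue
        ok = True
        for step in range(1, 5):
            n_mult = sum(1 for v in ops if t[v] == step and v in MULT)
            n_alu = sum(1 for v in ops if t[v] == step and v not in MULT)
            if n_mult > a_mult or n_alu > a_alu:
                ok = False
                break
        if ok:
            found.append(t)
    return found


solutions = feasible_schedules(2, 2)
print(len(solutions) > 0)                 # True
print(feasible_schedules(1, 2) == [])     # True: v1 and v2 both start in step 1
```

With only 324 candidate assignments the enumeration is instant; a real instance would need an ILP solver, since the search space grows exponentially with the mobilities.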

13 Scheduling with Resource Constraints
[Figure: scheduled sequencing graph (v0 to vn, operations *, +, -, <) over C-steps 1 to 4]

14 Scheduling with Resource Constraints
Example 2 (minimize area under latency constraint)
$c^T a = [5, 1]\,[a_{mult}, a_{ALU}]^T$
l' = 4
Resource constraints with a1, a2 as variables:
x1,1 + x2,1 + x6,1 + x8,1 - a1 ≤ 0
x3,2 + x6,2 + x7,2 + x8,2 - a1 ≤ 0
x7,3 + x8,3 - a1 ≤ 0
x10,1 - a2 ≤ 0
x9,2 + x10,2 + x11,2 - a2 ≤ 0
x4,3 + x9,3 + x10,3 + x11,3 - a2 ≤ 0
x5,4 + x9,4 + x11,4 - a2 ≤ 0
With a1 = a2 = 2 these reduce to the constraints of Example 1:
x1,1 + x2,1 + x6,1 + x8,1 ≤ 2
x3,2 + x6,2 + x7,2 + x8,2 ≤ 2
x7,3 + x8,3 ≤ 2
x10,1 ≤ 2
x9,2 + x10,2 + x11,2 ≤ 2
x4,3 + x9,3 + x10,3 + x11,3 ≤ 2
x5,4 + x9,4 + x11,4 ≤ 2
Result: a1 = a_mult = 2, a2 = a_ALU = 2 (same as the previous example)

15 Scheduling with Resource Constraints
Heuristic scheduling algorithms
- List scheduling
- Force-directed scheduling
List scheduling (resource-constrained minimum-latency)
Priority list: weight of the longest path to sink
[Figure: example sequencing graph (v0 to vn) with each operation labeled by its priority, 4 down to 1]

16 Scheduling with Resource Constraints
LIST_L (G(V, E), a) {
    l = 1;
    repeat {
        for each resource type k = 1, 2, ..., n_res {
            Determine candidate operations C_{l,k};
            Determine unfinished operations U_{l,k};
            Select S_k ⊆ C_{l,k} vertices, such that |S_k| + |U_{l,k}| ≤ a_k;
            Schedule the S_k operations at step l by setting t_i = l ∀ i : v_i ∈ S_k;
        }
        l = l + 1;
    } until (v_n is scheduled);
    return (T);
}
[Figure: example sequencing graph (v0 to vn) with priority labels 4 down to 1]
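LIST_L can be sketched in Python as below. This is one possible reading of the pseudocode, with several assumptions: the edge list and type assignment are reconstructed from the running example, priorities are the longest-path weights to the sink, and ties are broken by operation index (the slides do not specify a tie-break rule).

```python
OPS = [f"v{i}" for i in range(1, 12)]
# Assumed dependencies between operations of the example graph.
EDGES = [("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11")]
KIND = {v: "mult" for v in ("v1", "v2", "v3", "v6", "v7", "v8")}
KIND.update({v: "alu" for v in ("v4", "v5", "v9", "v10", "v11")})


def list_schedule(ops, edges, kind, delay, limit):
    """Resource-constrained list scheduling; returns start step per operation."""
    preds = {v: [u for u, w in edges if w == v] for v in ops}
    succs = {v: [w for u, w in edges if u == v] for v in ops}
    weight = {}

    def longest(v):                  # priority: longest path from v to the sink
        if v not in weight:
            weight[v] = delay[kind[v]] + max((longest(s) for s in succs[v]),
                                             default=0)
        return weight[v]

    start, step = {}, 1
    while len(start) < len(ops):
        for k, a_k in limit.items():
            # U_{l,k}: operations of type k still executing in this step
            unfinished = sum(1 for v, t in start.items()
                             if kind[v] == k and t <= step < t + delay[k])
            # C_{l,k}: unscheduled operations whose predecessors have finished
            cands = [v for v in ops if v not in start and kind[v] == k
                     and all(p in start and start[p] + delay[kind[p]] <= step
                             for p in preds[v])]
            cands.sort(key=lambda v: (-longest(v), ops.index(v)))
            for v in cands[:max(0, a_k - unfinished)]:
                start[v] = step
        step += 1
    return start


# 2 multipliers and 2 ALUs, unit delays.
ex1 = list_schedule(OPS, EDGES, KIND, {"mult": 1, "alu": 1}, {"mult": 2, "alu": 2})
# 3 multipliers with delay 2 and 1 ALU with delay 1.
ex2 = list_schedule(OPS, EDGES, KIND, {"mult": 2, "alu": 1}, {"mult": 3, "alu": 1})
print(max(ex1.values()), ex2["v9"])  # 4 7
```

The two calls correspond to the resource settings a = [2, 2] with unit delays and a = [3, 1] with a two-step multiplier.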

17 Scheduling with Resource Constraints
Example 1
a1 = 2 mult
a2 = 2 ALU
C-step 1: {v1, v2} {v10}
C-step 2: {v3, v6} {v11}
C-step 3: {v7, v8} {v4}
C-step 4: {v5, v9}
[Figure: sequencing graph labeled with priorities (4 down to 1) and the resulting schedule over C-steps 1 to 4]

18 Scheduling with Resource Constraints
Example 2
a1 = 3 mult, mult delay = 2
a2 = 1 ALU, ALU delay = 1
C-step 1: {v1, v2, v6} {v10}
C-step 2: {v11}
C-step 3: {v3, v7, v8}
C-step 4: (no operation starts)
C-step 5: {v4}
C-step 6: {v5}
C-step 7: {v9}
[Figure: resulting schedule of the example graph over C-steps 1 to 7]

19 Scheduling with Resource Constraints
List scheduling (latency-constrained minimum-resource)
- Start with one resource per type (a = 1)
- Use the slack computed by ALAP (t_i^L - l)
- The lower the slack, the higher the urgency
- Zero slack --> schedule --> no more resource --> add resource

20 Scheduling with Resource Constraints
LIST_R (G(V, E), l') {
    a = 1;
    Compute the latest possible start times T^L by ALAP(G(V, E), l');
    if (t_0^L < 0) return (∅);
    l = 1;
    repeat {
        for each resource type k = 1, 2, ..., n_res {
            Determine candidate operations C_{l,k};
            Compute the slacks {s_i = t_i^L - l, ∀ v_i ∈ C_{l,k}};
            Schedule the candidate operations with zero slack and update a;
            Schedule the candidate operations requiring no additional resources;
        }
        l = l + 1;
    } until (v_n is scheduled);
    return (T, a);
}
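A Python sketch of LIST_R under the running-example assumptions (reconstructed edge list, unit delays). Two details are choices made for this sketch rather than something the pseudocode fixes: the feasibility test rejects any latency bound that would force an operation to start before step 1, and candidates are ordered by slack with ties broken by operation index.

```python
OPS = [f"v{i}" for i in range(1, 12)]
# Assumed dependencies between operations of the example graph.
EDGES = [("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11")]
KIND = {v: "mult" for v in ("v1", "v2", "v3", "v6", "v7", "v8")}
KIND.update({v: "alu" for v in ("v4", "v5", "v9", "v10", "v11")})
DELAY = {"mult": 1, "alu": 1}        # unit delays (assumption)


def list_r(ops, edges, kind, delay, bound):
    """Latency-constrained list scheduling; returns (start times, resources)."""
    preds = {v: [u for u, w in edges if w == v] for v in ops}
    succs = {v: [w for u, w in edges if u == v] for v in ops}
    latest = {}

    def t_late(v):                   # ALAP start time under the latency bound
        if v not in latest:
            latest[v] = min((t_late(s) for s in succs[v]),
                            default=bound + 1) - delay[kind[v]]
        return latest[v]

    if min(t_late(v) for v in ops) < 1:
        return None                  # bound is too tight for this graph
    a = {k: 1 for k in sorted(set(kind.values()))}   # one resource per type
    start, step = {}, 1
    while len(start) < len(ops):
        for k in a:
            used = sum(1 for v, t in start.items()
                       if kind[v] == k and t <= step < t + delay[k])
            cands = [v for v in ops if v not in start and kind[v] == k
                     and all(p in start and start[p] + delay[kind[p]] <= step
                             for p in preds[v])]
            cands.sort(key=lambda v: (t_late(v) - step, ops.index(v)))
            for v in cands:
                # zero slack: must start now, adding a resource if needed;
                # otherwise start only if a unit of type k is still free
                if t_late(v) - step <= 0 or used < a[k]:
                    start[v] = step
                    used += 1
                    a[k] = max(a[k], used)
        step += 1
    return start, a


schedule, resources = list_r(OPS, EDGES, KIND, DELAY, 4)
print(resources)  # {'alu': 2, 'mult': 2}
```

Sorting by slack puts the zero-slack operations first, so resources are added only when an urgent operation would otherwise miss its latest start time.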

21 Scheduling with Resource Constraints
Example
Initially a = [1, 1]^T
C-step 1: {v1, v2} ---> a = [2, 1]^T; {v10}
C-step 2: {v3, v6}; {v11}
C-step 3: {v7, v8}; {v4}
C-step 4: {v5, v9} ---> a = [2, 2]^T
Resources are added when zero-slack operations need them.
[Figure: resulting schedule of the example graph over C-steps 1 to 4, with the zero-slack operations marked]

22 Scheduling with Resource Constraints
Force-directed scheduling
P. Paulin and J. Knight, "Force-directed scheduling for the behavioral synthesis of ASIC's," IEEE Trans. on CAD, June 1989.
Time frame: $[t_i^S, t_i^L]$, $i = 0, 1, \ldots, n$ ---> width = mobility + 1
Operation probability: $p_i(l)$ = 1/(width of time frame)
Type distribution: $q_k(l)$ = sum of operation probabilities in step $l$ for operations implementable by type k --> distribution graph
[Figure: example sequencing graph and the distribution graph for the ALU over steps 1 to 4; the legible values are 1/3 and 5/3]

23 Scheduling with Resource Constraints
Example
Unit delay, latency bound = 4
ASAP, ALAP --> time frames
p1(1) = 1, p1(2) = p1(3) = p1(4) = 0
p6(1) = p6(2) = 1/2
p8(1) = p8(2) = p8(3) = 1/3
q1(1) = 1 + 1 + 1/2 + 1/3 = 17/6
[Figure: example sequencing graph and the distribution graph for the multiplier: 17/6, 7/3, 5/6 in steps 1 to 3]
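The distribution values can be recomputed from the time frames. The sketch below assumes the ASAP/ALAP time frames of the running example with latency bound 4 and the same split of operations into multiplier and ALU types; `FRAME`, `probability`, and `distribution` are illustrative names.

```python
from fractions import Fraction

# Assumed time frames [t^S, t^L] of the example (unit delay, latency bound 4).
FRAME = {"v1": (1, 1), "v2": (1, 1), "v3": (2, 2), "v4": (3, 3), "v5": (4, 4),
         "v6": (1, 2), "v7": (2, 3), "v8": (1, 3), "v9": (2, 4),
         "v10": (1, 3), "v11": (2, 4)}
MULT = {"v1", "v2", "v3", "v6", "v7", "v8"}


def probability(op, step):
    """p_i(l): uniform over the time frame, zero outside it."""
    lo, hi = FRAME[op]
    return Fraction(1, hi - lo + 1) if lo <= step <= hi else Fraction(0)


def distribution(ops, steps=range(1, 5)):
    """q_k(l) for each step: sum of the probabilities of the given operations."""
    return [sum((probability(v, l) for v in ops), Fraction(0)) for l in steps]


q_mult = distribution(MULT)
q_alu = distribution(set(FRAME) - MULT)
print([str(q) for q in q_mult])  # ['17/6', '7/3', '5/6', '0']
print([str(q) for q in q_alu])   # ['1/3', '1', '2', '5/3']
```

Exact fractions are used so the results can be compared directly with the values on the distribution graphs.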

24 Scheduling with Resource Constraints
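Spring analogy: f = kx, where the displacement x is the displacement in probability.
self-force(i, l) = $\sum k x = \sum_m q_k(m)\,(\delta_{lm} - p_i(m))$ ---> repelling force
= $q_k(l) - \left(\sum_m q_k(m)\right) / (m_i + 1)$, $m = t_i^S, \ldots, t_i^L$
The subtracted term is the average distribution over the time frame of $v_i$.
[Figure: example sequencing graph and the multiplier distribution graph (17/6, 7/3, 5/6), with the time frame of v6 over steps 1 to 4]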

25 Scheduling with Resource Constraints
Example
q1(1) = 17/6, q1(2) = 7/3
self-force(6, 1) = 17/6 (1 - 1/2) + 7/3 (0 - 1/2) = 0.25
self-force(6, 2) = 17/6 (0 - 1/2) + 7/3 (1 - 1/2) = -0.25
Predecessor/successor force
Assigning an operation to a specific step may reduce the time frame of other operations due to dependency relations.
v6 to step 2 ---> v7 to step 3
self-force(7, 3) = q1(2) (0 - p7(2)) + q1(3) (1 - p7(3)) = -0.75 = successor-force(6, 2)
total-force(6, 2) = -0.25 - 0.75 = -1
[Figure: example sequencing graph, v0 to vn]
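The force values in this example can be checked with a few lines of Python. This is a minimal sketch using exact fractions; the multiplier distribution is taken from the distribution graph (17/6, 7/3, 5/6, and 0 for step 4), and `self_force` is an illustrative helper name.

```python
from fractions import Fraction as F

# Multiplier distribution q1(l) for steps 1 to 4.
Q_MULT = {1: F(17, 6), 2: F(7, 3), 3: F(5, 6), 4: F(0)}


def self_force(q, frame, step):
    """Sum over the time frame of q(m) * (delta_{step,m} - p(m))."""
    lo, hi = frame
    p = F(1, hi - lo + 1)            # uniform probability inside the frame
    return sum(q[m] * ((1 if m == step else 0) - p) for m in range(lo, hi + 1))


f61 = self_force(Q_MULT, (1, 2), 1)      # v6 assigned to step 1
f62 = self_force(Q_MULT, (1, 2), 2)      # v6 assigned to step 2
succ = self_force(Q_MULT, (2, 3), 3)     # v7 is then forced to step 3
print(f61, f62, succ, f62 + succ)        # 1/4 -1/4 -3/4 -1
```

The negative total force for step 2 indicates that placing v6 there moves it toward the less crowded part of the multiplier distribution.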

26 Scheduling with Resource Constraints
ps-force(i, l) = $\sum_{j \in ps(i)} \left( \frac{\sum_{m'} q_k(m')}{m_j' + 1} - \frac{\sum_m q_k(m)}{m_j + 1} \right)$
where $m' \in [t_j^{S\prime}, t_j^{L\prime}]$ ---> reduced time frame
and $m \in [t_j^S, t_j^L]$ ---> initial time frame
Example
v8 to step 2 ---> v9 to step 3 or 4
ps-force(8, 2) = 1/2 (q2(3) + q2(4)) - 1/3 (q2(2) + q2(3) + q2(4)) ≈ 0.3
Compared to list scheduling, force-directed scheduling produces better results but takes longer.
[Figure: example sequencing graph, v0 to vn]
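The ps-force of the example can be evaluated as the change in average distribution over the successor's time frame. In this sketch the ALU distribution values (1/3, 1, 2, 5/3) are an assumption: they are recomputed from the time frames of the running example, and only 1/3 and 5/3 are legible on the distribution graph. The helper names are illustrative.

```python
from fractions import Fraction as F

# Assumed ALU distribution q2(l) for steps 1 to 4.
Q_ALU = {1: F(1, 3), 2: F(1), 3: F(2), 4: F(5, 3)}


def mean_distribution(q, frame):
    """Average of q over the steps of a time frame [lo, hi]."""
    lo, hi = frame
    return sum(q[m] for m in range(lo, hi + 1)) / (hi - lo + 1)


def ps_force_term(q, initial, reduced):
    """Contribution of one predecessor/successor whose frame shrinks."""
    return mean_distribution(q, reduced) - mean_distribution(q, initial)


# v8 to step 2 shrinks the frame of v9 from [2, 4] to [3, 4].
force = ps_force_term(Q_ALU, (2, 4), (3, 4))
print(force, round(float(force), 2))  # 5/18 0.28
```

The exact value 5/18 rounds to the 0.3 quoted in the example.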

27 Scheduling Graphs with Alternative Paths
Branching
$\sum_{i:\,\tau(v_i)=k \text{ and } C(v_i)=c}\ \sum_{m=l-d_i+1}^{l} x_{im} \le a_k$, $k = 1, \ldots, n_{res}$, $l = 1, \ldots, l'+1$, $c = 1, \ldots, n_c$
$C: V \to \{1, \ldots, n_c\}$ partitions V into $n_c$ groups; operations in different groups are mutually exclusive.
Mutually exclusive operations can share a resource without affecting the performance.
[Figure: branch with TRUE and FALSE paths, which are mutually exclusive]

28 Scheduling Graphs with Alternative Paths
Example
Assume path (v0, v8, v9, vn) is mutually exclusive with the remaining operations.
Original resource constraints:
x1,1 + x2,1 + x6,1 + x8,1 - a1 ≤ 0
x3,2 + x6,2 + x7,2 + x8,2 - a1 ≤ 0
x7,3 + x8,3 - a1 ≤ 0
x10,1 - a2 ≤ 0
x9,2 + x10,2 + x11,2 - a2 ≤ 0
x4,3 + x9,3 + x10,3 + x11,3 - a2 ≤ 0
x5,4 + x9,4 + x11,4 - a2 ≤ 0
Constraints for the path (v0, v8, v9, vn):
x8,1 - a1 ≤ 0
x8,2 - a1 ≤ 0
x8,3 - a1 ≤ 0
x9,2 - a2 ≤ 0
x9,3 - a2 ≤ 0
x9,4 - a2 ≤ 0
Constraints for the remaining operations:
x1,1 + x2,1 + x6,1 - a1 ≤ 0
x3,2 + x6,2 + x7,2 - a1 ≤ 0
x7,3 - a1 ≤ 0
x10,1 - a2 ≤ 0
x10,2 + x11,2 - a2 ≤ 0
x4,3 + x10,3 + x11,3 - a2 ≤ 0
x5,4 + x11,4 - a2 ≤ 0
[Figure: example sequencing graph, v0 (NOP) to vn (NOP)]

29 Scheduling Graphs with Alternative Paths
List scheduling + condition vector + branching probability
K. Wakabayashi and T. Yoshimura, "A resource sharing and control synthesis method for conditional branches," ICCAD, Proceedings of the International Conference on Computer-Aided Design, 1989.
Example:
na;
if (c1) {
    nb;
    if (c2) nc;
    else {
        nd;
        if (c3) ne;
        else nf;
    }
}
else ng;
Basic condition vectors:
na [1,1,1,1]
nb [1,1,1,0]
nc [1,0,0,0]
nd [0,1,1,0]
ne [0,1,0,0]
nf [0,0,1,0]
ng [0,0,0,1]

30 Scheduling Graphs with Alternative Paths
Basic condition vector v_i =
- one-hot encoding, for a leaf node
- v_k or v_l or ... or v_m, for a non-leaf node, where v_k, v_l, ..., and v_m are the vectors of the immediate successors of node i
Extended condition vector e_i =
- 1, for a predecessor of the sink node
- v_i, for a predecessor of a join node
- e_k or e_l or ... or e_m, in other cases, where e_k, e_l, ..., and e_m are the vectors of the immediate successors of node i
Actual condition vector a_i = e_i or e_l or ... or e_m
where n_l, ..., n_m ∈ C_i, C_i = {n_j | n_j is a conditional node related to n_i, n_j has not been scheduled yet, e_i and e_j ≠ 0}
[Figure: nodes n_i and n_j between source and sink, annotated with vectors [1,1], [0,1], [1,0]]
Statically compute the extended condition vectors and use them as the priorities.
Dynamically compute the actual condition vectors and use them to compute the number of resources used.
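The basic condition vector rule (one-hot for leaves, bitwise OR of the successors otherwise) can be sketched for the nested-if example with nodes na to ng. The tree encoding below (`CHILDREN`, `LEAVES`) is a hypothetical representation chosen for this sketch; the leaf order nc, ne, nf, ng fixes the bit positions.

```python
# Hypothetical encoding of the branch tree of the nested-if example:
# each conditional node maps to its immediate successors in the tree.
CHILDREN = {"na": ["nb", "ng"], "nb": ["nc", "nd"], "nd": ["ne", "nf"]}
LEAVES = ["nc", "ne", "nf", "ng"]    # leaf order fixes the bit positions


def basic_vector(node):
    """One-hot vector for a leaf, component-wise OR of successors otherwise."""
    if node in LEAVES:
        return [1 if leaf == node else 0 for leaf in LEAVES]
    vectors = [basic_vector(child) for child in CHILDREN[node]]
    return [int(any(bits)) for bits in zip(*vectors)]


for name in ("na", "nb", "nd", "ng"):
    print(name, basic_vector(name))
# na [1, 1, 1, 1]
# nb [1, 1, 1, 0]
# nd [0, 1, 1, 0]
# ng [0, 0, 0, 1]
```

Only the basic vectors are computed here; the extended and actual vectors additionally depend on join nodes and on which conditional nodes are already scheduled.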

31 Scheduling Graphs with Alternative Paths
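if (a > b) then x = a + b else x = c + b;
[Figure: two data-flow graphs for this statement with inputs a, b, c and output x, built from a comparator (>), adders (+), and MUXes]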

32 Scheduling Graphs with Alternative Paths
Priority function
pf(n_i) = (p_i, d_i) = $\sum_j p_{ij} \cdot d_{ij}$
where p_ij is the occurrence probability of leaf condition j, and d_i is the sum of extended condition vectors for all operation nodes in the path from the successors of n_i to the sink.
d_i is computed from the sink (when several paths merge, the largest components are adopted).
d_ij = longest path length (no nodes in the path have 0 in the j-th position of the CV for non-zero e_ij).
e is used rather than a because the schedule of conditional nodes is not known yet.
[Figure: graph from source to sink with nodes annotated by condition vectors [1,1], [1,0], [0,1]]

33 Scheduling Graphs with Alternative Paths
Algorithm
(1) Calculate e_i, d_i, pf(n_i) for all nodes in set R; current c-step l = 1
(2) Move candidate nodes for c-step l from R to set C
(3) From C, select n_i with the largest pf(n_i)
(4) If n_i is a successor of a join node, duplicate
(5) If the largest component of s_lk, k = τ(n_i), does not exceed the number of available functional units of type k, then assign n_i to c-step l; otherwise, put n_i into R
    (s_lk = Σ a_lk, for all nodes of type k scheduled at c-step l)
(6) If C is not empty, go to (3)
(7) If R is not empty, l = l + 1 and go to (2)
(8) Re-assign operation nodes (post-processing for further optimization)
(9) Synthesize control sequence

34 Scheduling Graphs with Alternative Paths
- Duplication of operation nodes (4): code lowering
[Figure: an addition with condition vector [1,1,1] on inputs a, b, c duplicated into additions with vectors [1,0,0] and [0,1,1], producing x]
- Re-assignment of operation nodes (8): the number of zeros in s_all increases by moving operations with mobility > 0
- Synthesis of control sequence (9): if some components of s_l,all are zero, then c-step l can be skipped for the corresponding branches
Duplication is used for (8) and (9). See the next slide.

35 Scheduling Graphs with Alternative Paths
[Figure: example schedule with operations (+, -, <, =) annotated by their condition vectors, together with the per-step sums s_all, s_+, and s_-]

36 Scheduling Pipelined Circuits
Structural pipelining
Pipelined resources
List scheduling can be extended to allow scheduling of overlapping operations:
- different start times
- no data dependency
Example: pipelined mult (Wallace tree + CPA)
3 pipelined mult, 1 ALU
latency 7 --> 6
ILP, FDS can also be extended
[Figure: schedule of the example graph with pipelined multipliers over C-steps 1 to 7, with v8 shown at both C-step 3 and C-step 4]

37 Scheduling Pipelined Circuits
Functional pipelining
Lower bound of number of resources of type k: $a_k' = \lceil n_k / d_0 \rceil$
where $n_k$ is the number of operations of type k and $d_0$ is the data introduction interval
$\sum_{p=0}^{\lceil l'/d_0 \rceil - 1}\ \sum_{i:\tau(v_i)=k}\ \sum_{m=l-d_i+1+p d_0}^{l+p d_0} x_{im} \le a_k$, $k = 1, \ldots, n_{res}$, $l = 1, \ldots, d_0$
$\lceil l'/d_0 \rceil$: number of pipeline stages
Example
Unit delay, $d_0 = 2$
$a_1' = \lceil 6/2 \rceil = 3$ mult
$a_2' = \lceil 5/2 \rceil = 3$ ALU
Try scheduling with a = [3, 3]^T, use ILP
Stage 1, C-step 1: v1, v2, v6 (*), v10 (+)
Stage 1, C-step 2: v3, v7, v8 (*), v11 (<)
Stage 2, C-step 1: v4 (-), v9 (+)
Stage 2, C-step 2: v5 (-)
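The resource lower bound is a one-line ceiling computation. A minimal sketch follows; the function name `min_resources` and the dictionary layout are illustrative choices.

```python
def min_resources(op_counts, d0):
    """Lower bound ceil(n_k / d0) on the units of each type.

    op_counts maps a resource type to its number of operations n_k;
    d0 is the data introduction interval.
    """
    return {k: -(-n // d0) for k, n in op_counts.items()}   # integer ceiling


bounds = min_resources({"mult": 6, "alu": 5}, d0=2)
print(bounds)  # {'mult': 3, 'alu': 3}
```

The bound holds because every one of the n_k operations must execute once per d0 steps, so fewer units could not fit them all into one interval.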

38 Scheduling Pipelined Circuits
Heuristic scheduling can be extended to schedule pipelined circuits.
List scheduling: at each control step l, check the resource bound
$\sum_{p=0}^{\lceil l/d_0 \rceil - 1}\ \sum_{i:\tau(v_i)=k}\ \sum_{m=(l \bmod d_0)-d_i+1+p d_0}^{(l \bmod d_0)+p d_0} x_{im} \le a_k$
to determine schedulable candidates.
N. Park and A.C. Parker, "Sehwa: a software package for synthesis of pipelines from behavioral specifications," IEEE Trans. on Computer-Aided Design, Mar
Force-directed scheduling: the computation of type distribution must consider the actual operation concurrency across the control step boundaries.

39 Scheduling Pipelined Circuits
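Loop folding
Pipeline the loop body
Data introduction interval: $d_l$
Loop execution delay = $(n_l - 1)\, d_l + \lceil l_l / d_l \rceil\, d_l$
$\lceil l_l / d_l \rceil$: # pipe stages
[Figure: timing diagram of the folded loop, labeled with $l_l$, $d_l$, and $n_l$]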

40 Scheduling Pipelined Circuits
Example 1
$l_l = 4$, $d_l = 2$, $n_l = 10$
Loop execution delay (without folding) = $n_l\, l_l = 40$
Loop execution delay (with folding) = $(n_l - 1)\, d_l + \lceil l_l / d_l \rceil\, d_l = 22$
[Figure: loop body with operations 1 to 5 between NOPs, shown unfolded and folded]
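The two delay formulas can be evaluated directly. In this sketch the parameter names follow the symbols of the example (n_l, l_l, d_l); the function names are illustrative.

```python
def loop_delay(n_l, l_l):
    """Execution delay without folding: the iterations run back to back."""
    return n_l * l_l


def folded_loop_delay(n_l, l_l, d_l):
    """Execution delay with folding: (n_l - 1) * d_l + ceil(l_l / d_l) * d_l."""
    stages = -(-l_l // d_l)          # number of pipe stages (integer ceiling)
    return (n_l - 1) * d_l + stages * d_l


print(loop_delay(10, 4), folded_loop_delay(10, 4, 2))  # 40 22
```

The first term counts the intervals between successive iteration starts, and the second is the time the last iteration needs to drain through all pipe stages.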

41 Scheduling Pipelined Circuits
Example 2
s = 0;                          step 1
for i = 1 to 10 {
    p[i] = c[i] + in[i];        step 2
    s = s + p[i];               step 3
}
--->
s = 0;                          step 1
p[1] = c[1] + in[1];            step 2
for i = 2 to 10 {
    s = s + p[i-1];
    p[i] = c[i] + in[i];        step 3
}
s = s + p[10];                  step 4
21 cycles --> 12 cycles
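The original and folded loops can be compared in software. The sketch below runs both versions on sample data and counts one cycle per control step, on the assumption that the two statements inside the folded loop body execute in the same step; the input values are arbitrary.

```python
def original_loop(c, inp):
    n = len(c)
    p, s, cycles = [0] * n, 0, 1         # step 1: s = 0
    for i in range(n):
        p[i] = c[i] + inp[i]             # step 2
        cycles += 1
        s = s + p[i]                     # step 3
        cycles += 1
    return s, cycles


def folded_loop(c, inp):
    n = len(c)
    p, s, cycles = [0] * n, 0, 1         # step 1: s = 0
    p[0] = c[0] + inp[0]                 # step 2: prologue
    cycles += 1
    for i in range(1, n):
        s = s + p[i - 1]                 # step 3: both statements share
        p[i] = c[i] + inp[i]             #         one control step
        cycles += 1
    s = s + p[n - 1]                     # step 4: epilogue
    cycles += 1
    return s, cycles


c = list(range(1, 11))                   # arbitrary sample inputs
inp = [2 * x for x in c]
print(original_loop(c, inp))             # (165, 21)
print(folded_loop(c, inp))               # (165, 12)
```

Both versions produce the same sum; the folded one overlaps the accumulation of iteration i-1 with the addition of iteration i.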

