High-level Synthesis: Scheduling, Allocation, Assignment
Mani Srivastava, UCLA EE Department, Room 6731-H Boelter Hall
Copyright 2003 Mani Srivastava
Note: Several slides in this lecture are from Prof. Miodrag Potkonjak, UCLA CS.
Overview
- High Level Synthesis
- Scheduling, Allocation and Assignment
- Estimations
- Transformations

Allocation, Assignment, and Scheduling
- Techniques well understood and mature

Scheduling and Assignment
[Figure: operations assigned to control steps]
ASAP Scheduling Algorithm

ASAP Scheduling Example

ASAP: Another Example
[Figure: sequence graph and its ASAP schedule]
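
The algorithm figure for ASAP did not survive in this transcript, so the following is only a minimal sketch of the usual as-soon-as-possible rule: each operation starts as soon as all of its predecessors have finished. The function name `asap_schedule`, the dictionary-based graph encoding, the unit default delay, and the six-node example graph are all assumptions made for illustration, not taken from the slides.

```python
from collections import deque


def asap_schedule(preds, delay=None):
    """Earliest start step (1-based) of every operation in a DAG.

    preds: maps each operation to the operations it depends on.
    delay: maps operations to execution time in control steps
           (assumed to be 1 for any operation not listed).
    """
    delay = delay or {}
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    pending = {v: len(ps) for v, ps in preds.items()}
    ready = deque(v for v, n in pending.items() if n == 0)
    start = {v: 1 for v in ready}  # sources start in control step 1
    done = 0
    while ready:
        v = ready.popleft()
        done += 1
        finish = start[v] + delay.get(v, 1)
        for s in succs[v]:
            # A successor cannot start before all its predecessors finish.
            start[s] = max(start.get(s, 1), finish)
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    if done != len(preds):
        raise ValueError("dependency graph contains a cycle")
    return start


# Hypothetical sequence graph: c needs a and b, d needs c, f needs e.
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": [], "f": ["e"]}
asap = asap_schedule(graph)
print(asap)
```

With unit delays the latency of the ASAP schedule equals the length of the longest path in the graph, which is 3 control steps for this hypothetical example.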
ALAP Scheduling Algorithm

ALAP Scheduling Example

ALAP: Another Example
[Figure: sequence graph and its ALAP schedule (latency constraint = 4)]
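
As with ASAP, the ALAP algorithm itself was a figure that is not preserved here. A minimal sketch of the usual as-late-as-possible rule, under a given latency constraint, could look as follows. The name `alap_schedule`, the graph encoding, the unit default delay, and the example graph are assumptions for illustration only; the latency bound of 4 mirrors the constraint mentioned on the slide.

```python
def alap_schedule(preds, latency, delay=None):
    """Latest start step (1-based) of every operation in a DAG.

    preds:   maps each operation to the operations it depends on.
    latency: number of control steps available for the whole schedule.
    delay:   execution time per operation (assumed 1 when not listed).
    """
    delay = delay or {}
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    start = {}

    def latest(v):
        if v not in start:
            # An operation must finish before its earliest-starting
            # successor begins; sinks must finish by step `latency`.
            deadline = min((latest(s) for s in succs[v]), default=latency + 1)
            start[v] = deadline - delay.get(v, 1)
        return start[v]

    for v in preds:
        latest(v)
    if start and min(start.values()) < 1:
        raise ValueError("latency constraint is too tight for this graph")
    return start


# Hypothetical sequence graph: c needs a and b, d needs c, f needs e.
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": [], "f": ["e"]}
alap = alap_schedule(graph, latency=4)
print(alap)
```

The difference between an operation's ALAP and ASAP start steps is its mobility; operations with zero mobility lie on the critical path for the chosen latency.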
Observation about ALAP & ASAP
- No priority is given to nodes on the critical path
- As a result, less critical nodes may be scheduled ahead of critical nodes
- No problem if hardware is unlimited
- However, if the resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules
- List scheduling techniques overcome this problem by using a more global node selection criterion
List Scheduling and Assignment

List Scheduling Algorithm using Decreasing Criticalness Criterion
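
The list scheduling pseudocode on this slide is not preserved, so the sketch below is one plausible reading of "decreasing criticalness": in each control step, ready operations are served in order of the longest path from the operation to a sink, subject to a per-type resource limit. Unit delays, the names `list_schedule`, `op_type`, and `limits`, and the example graph are all assumptions.

```python
def list_schedule(preds, op_type, limits):
    """Resource-constrained list scheduling for unit-delay operations.

    preds:   maps each operation to the operations it depends on.
    op_type: maps each operation to its resource type.
    limits:  maps each resource type to the number of available units.
    """
    if any(n < 1 for n in limits.values()):
        raise ValueError("every resource type needs at least one unit")
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    crit = {}

    def criticalness(v):
        # Length of the longest path from v to a sink, counting v itself.
        if v not in crit:
            crit[v] = 1 + max((criticalness(s) for s in succs[v]), default=0)
        return crit[v]

    start = {}
    step = 0
    while len(start) < len(preds):
        step += 1
        # Ready: unscheduled, and every predecessor ran in an earlier step.
        ready = [v for v in preds if v not in start
                 and all(p in start and start[p] < step for p in preds[v])]
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        ready.sort(key=criticalness, reverse=True)
        used = {}
        for v in ready:
            t = op_type[v]
            if used.get(t, 0) < limits[t]:
                start[v] = step
                used[t] = used.get(t, 0) + 1
    return start


# Hypothetical example: three multiplies and three ALU operations,
# with one multiplier and one ALU available.
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": [], "f": ["e"]}
types = {"a": "mul", "b": "mul", "e": "mul", "c": "alu", "d": "alu", "f": "alu"}
schedule = list_schedule(graph, types, {"mul": 1, "alu": 1})
print(schedule)
```

In this example the two multiplies feeding the longest chain are served before the less critical multiply `e`, which is the behavior the preceding observation slide asks for.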
Scheduling
- NP-complete problem
- Optimal
- Heuristics - Iterative Improvements
- Heuristics - Constructive
- Various versions of the problem
  - Unconstrained minimum latency
  - Resource-constrained minimum latency
  - Timing constrained
- If all resources are identical, the problem reduces to multiprocessor scheduling
  - Minimum latency multiprocessor problem is intractable

Scheduling - Optimal Techniques
- Integer Linear Programming
- Branch and Bound
Integer Linear Programming
- Given: integer-valued matrix $A_{m \times n}$, vectors $B = (b_1, b_2, \ldots, b_m)$, $C = (c_1, c_2, \ldots, c_n)$
- Minimize: $C^T X$
- Subject to: $AX \le B$, where $X = (x_1, x_2, \ldots, x_n)$ is an integer-valued vector
Integer Linear Programming
- Problem: For a set of (dependent) computations $\{t_1, t_2, \ldots, t_n\}$, find the minimum number of units needed to complete the execution by $k$ control steps.
- Integer linear programming:
  - Let $y_0$ be an integer variable.
  - For each control step $i$ ($1 \le i \le k$):
    - define variable $x_{ij}$ as $x_{ij} = 1$ if computation $t_j$ is executed in the $i$th control step, and $x_{ij} = 0$ otherwise
    - define variable $y_i = x_{i1} + x_{i2} + \cdots + x_{in}$

Integer Linear Programming
- For each computation dependency ($t_i$ has to be done before $t_j$), introduce a constraint:
  $k x_{1i} + (k-1) x_{2i} + \cdots + x_{ki} \ge k x_{1j} + (k-1) x_{2j} + \cdots + x_{kj} + 1$   (*)
- Minimize: $y_0$
- Subject to:
  - $x_{1i} + x_{2i} + \cdots + x_{ki} = 1$ for all $1 \le i \le n$
  - $y_j \le y_0$ for all $1 \le j \le k$
  - all computation dependencies of type (*)
An Example
[Figure: dependency graph over computations c1, ..., c6]
- 6 computations
- 3 control steps

An Example
- Introduce variables:
  - $x_{ij}$ for $1 \le i \le 3$, $1 \le j \le 6$
  - $y_i = x_{i1} + x_{i2} + x_{i3} + x_{i4} + x_{i5} + x_{i6}$ for $1 \le i \le 3$
  - $y_0$
- Dependency constraints: e.g. execute $c_1$ before $c_4$:
  $3x_{11} + 2x_{21} + x_{31} \ge 3x_{14} + 2x_{24} + x_{34} + 1$
- Execution constraints: $x_{1i} + x_{2i} + x_{3i} = 1$ for $1 \le i \le 6$

An Example
- Minimize: $y_0$
- Subject to: $y_i \le y_0$ for all $1 \le i \le 3$, dependency constraints, execution constraints
- One solution: $y_0 = 2$, with $x_{11} = 1$, $x_{12} = 1$, $x_{23} = 1$, $x_{24} = 1$, $x_{35} = 1$, $x_{36} = 1$. All other $x_{ij} = 0$.
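
The quoted solution can be checked mechanically against the constraints that are stated in the text. The sketch below encodes the 0/1 variables and tests the execution constraints, the one dependency that is spelled out (c1 before c4), and the value of y0. The helper names (`execution_ok`, `weight`, `precedes`) are hypothetical, and the remaining dependency edges of the example graph are not recoverable from this transcript, so they are not checked.

```python
K, N = 3, 6  # control steps, computations

# x[i, j] == 1 when computation c_j runs in control step i (1-based).
ones = {(1, 1), (1, 2), (2, 3), (2, 4), (3, 5), (3, 6)}
x = {(i, j): int((i, j) in ones)
     for i in range(1, K + 1) for j in range(1, N + 1)}


def execution_ok(x):
    """Every computation is executed in exactly one control step."""
    return all(sum(x[i, j] for i in range(1, K + 1)) == 1
               for j in range(1, N + 1))


def weight(x, j):
    """k*x_1j + (k-1)*x_2j + ... + x_kj, larger for earlier steps."""
    return sum((K - i + 1) * x[i, j] for i in range(1, K + 1))


def precedes(x, a, b):
    """Dependency constraint (*): c_a is executed strictly before c_b."""
    return weight(x, a) >= weight(x, b) + 1


# y_i counts the computations in step i; y0 must bound all of them.
y = [sum(x[i, j] for j in range(1, N + 1)) for i in range(1, K + 1)]
y0 = max(y)

print(execution_ok(x), precedes(x, 1, 4), y, y0)
```

The check confirms that two units suffice for this assignment: every control step holds exactly two computations.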
ILP Model of Scheduling
- Binary decision variables $x_{il}$
  - $i = 0, 1, \ldots, n$
  - $l = 1, 2, \ldots, \bar{\lambda} + 1$, where $\bar{\lambda}$ is an upper bound on the latency
- Start time is unique

ILP Model of Scheduling (contd.)
- Sequencing relationships must be satisfied
- Resource bounds must be met
  - let the upper bound on the # of resources of type $k$ be $a_k$

Minimum-latency Scheduling Under Resource-constraints
- Let $t$ be the vector whose entries are start times
- Formal ILP model
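
The formulas on these three slides were images and are missing from the transcript. For orientation, here is a standard textbook formulation of the model that is consistent with the bullets above; it is a reconstruction under assumptions, not necessarily the exact form shown in the lecture. The symbols $d_i$ (execution delay of operation $v_i$), $\mathcal{T}(v_i)$ (resource type of $v_i$), $E$ (edge set of the sequencing graph), and $n_{res}$ (number of resource types) are introduced here and do not appear in the slide text.

```latex
\begin{align*}
\text{minimize}\quad & c^T t \\
\text{subject to}\quad
 & \sum_{l} x_{il} = 1, && i = 0, 1, \ldots, n
   && \text{(start time is unique)} \\
 & t_i = \sum_{l} l \cdot x_{il}, && i = 0, 1, \ldots, n
   && \text{(start time of } v_i\text{)} \\
 & t_i \ge t_j + d_j, && (v_j, v_i) \in E
   && \text{(sequencing)} \\
 & \sum_{i \,:\, \mathcal{T}(v_i) = k} \;\; \sum_{m = l - d_i + 1}^{l} x_{im} \le a_k,
   && k = 1, \ldots, n_{res}, \;\; \text{every step } l
   && \text{(resource bounds)} \\
 & x_{il} \in \{0, 1\}
\end{align*}
```

With unit delays the inner sum of the resource constraint collapses to the single term $x_{il}$, so the constraint simply counts the operations of type $k$ that start in step $l$.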
Example
- Two types of resources
  - Multiplier
  - ALU: adder, subtraction, comparison
- Both take 1 cycle of execution time

Example (contd.)
- Heuristic (list scheduling) gives latency = 4 steps
- Use ALAP and ASAP (with no resource constraints) to get bounds on start times
  - ASAP matches the latency of the heuristic, so the heuristic is optimum, but let us ignore that!
- Constraints?

Example (contd.)
- Start time is unique

Example (contd.)
- Sequencing constraints
  - note: only the non-trivial ones are listed, i.e. those with more than one possible start time for at least one operation

Example (contd.)
- Resource constraints
Example (contd.)
- Consider $c = [0, 0, \ldots, 1]^T$
  - Minimum latency schedule
  - since the sink has no mobility ($x_{n,5} = 1$), any feasible schedule is optimum
- Consider $c = [1, 1, \ldots, 1]^T$
  - finds earliest start times for all operations
  - equivalently, minimizes the sum of all start times, $\sum_i t_i$
Example Solution: Optimum Schedule Under Resource Constraint

Example (contd.)
- Assume a multiplier costs 5 units of area, and an ALU costs 1 unit of area
- Same uniqueness and sequencing constraints as before
- Resource constraints are in terms of unknown variables $a_1$ and $a_2$
  - $a_1$ = # of multipliers
  - $a_2$ = # of ALUs

Example (contd.)
- Resource constraints
Example Solution
- Minimize $c^T a = 5 a_1 + 1 a_2$
- Solution with cost 12
Precedence-constrained Multiprocessor Scheduling
- All operations done by the same type of resource
  - intractable problem
  - intractable even if all operations have unit delay

Scheduling - Iterative Improvement
- Kernighan-Lin (deterministic)
- Simulated Annealing
- Lottery Iterative Improvement
- Neural Networks
- Genetic Algorithms
- Taboo Search

Scheduling - Constructive Techniques
- Most Constrained
- Least Constraining

Force Directed Scheduling
- Goal is to reduce hardware by balancing concurrency
- Iterative algorithm, one operation scheduled per iteration
- Information (i.e. speed & area) fed back into scheduler

The Force Directed Scheduling Algorithm
Step 1
- Determine ASAP and ALAP schedules
[Figure: ASAP and ALAP schedules of the example]

Step 2
- Determine Time Frame of each op
  - Length of box ~ Possible execution cycles
  - Width of box ~ Probability of assignment
  - Uniform distribution, Area assigned = 1
[Figure: time frames over C-steps 1 to 4, with box widths such as 1/2 and 1/3]

Step 3
- Create Distribution Graphs
  - Sum of probabilities of each Op type
  - Indicates concurrency of similar Ops
  - $DG(i) = \sum \mathrm{Prob}(Op, i)$
[Figure: DG for Multiply; DG for Add, Sub, Comp]
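
Steps 2 and 3 can be sketched in a few lines: each operation spreads a total probability of 1 uniformly over its time frame, and the distribution graph for an operation type is the per-step sum of those probabilities. The function name `distribution_graph` and the list of multiply time frames are assumptions; the frames were chosen so that DG(1) comes out as the 2.833 quoted in the later self-force example, but the actual figure is not preserved here.

```python
from fractions import Fraction


def distribution_graph(frames, n_steps):
    """Sum, per C-step, the uniform probabilities of operations of one type.

    frames: list of (first_step, last_step) time frames, 1-based and
            inclusive, i.e. the ASAP and ALAP start steps of each op.
    """
    dg = [Fraction(0)] * n_steps
    for first, last in frames:
        prob = Fraction(1, last - first + 1)  # area of each box is 1
        for step in range(first, last + 1):
            dg[step - 1] += prob
    return dg


# Assumed time frames of six multiplies over four C-steps.
mult_frames = [(1, 1), (1, 1), (2, 2), (1, 2), (2, 3), (1, 3)]
dg = distribution_graph(mult_frames, 4)
print([round(float(v), 3) for v in dg])
```

Exact fractions are used so that the values can be compared without rounding noise; the entries always sum to the number of operations of that type.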
Diff Eq Example: Precedence Graph Recalled

Diff Eq Example: Time Frame & Probability Calculation

Diff Eq Example: DG Calculation
Conditional Statements
- Operations in different branches are mutually exclusive
- Operations of the same type can be overlapped onto the DG
- Probability of the most likely operation is added to the DG
[Figure: DG for Add across a fork/join]

Self Forces
- Scheduling an operation will affect overall concurrency
- Every operation has a 'self force' for every C-step of its time frame
- Analogous to the effect of a spring: f = Kx
- A desirable scheduling will have negative self force
  - Will achieve better concurrency (lower potential energy)
- Force(i) = DG(i) * x(i)
  - DG(i) ~ Current Distribution Graph value
  - x(i) ~ Change in operation's probability
- Self Force(j) = $\sum_i$ [Force(i)], summed over the C-steps i of the operation's time frame
Example
- Attempt to schedule the multiply in C-step 1
  Self Force(1) = Force(1) + Force(2)
                = (DG(1) * x(1)) + (DG(2) * x(2))
                = [2.833 * (0.5)] + [2.333 * (-0.5)]
                = +0.25
- This is positive, so scheduling the multiply in the first C-step would be bad
[Figure: DG for Multiply; time frames over C-steps 1 to 4, with box widths 1/2 and 1/3]
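
The self-force computation can be written directly from Force(i) = DG(i) * x(i): fixing an operation in one C-step raises its probability there to 1 and lowers it to 0 elsewhere in its time frame. The function name `self_force` is hypothetical, and the multiply DG values below (2.833, 2.333, 0.833, 0) are assumed from the classic differential-equation example, since only 2.833 is legible in this transcript.

```python
def self_force(dg, frame, step):
    """Self force of fixing an operation in C-step `step`.

    dg:    distribution graph values, dg[i - 1] for C-step i.
    frame: (first_step, last_step) time frame of the operation.
    """
    first, last = frame
    old_prob = 1.0 / (last - first + 1)  # uniform over the time frame
    force = 0.0
    for i in range(first, last + 1):
        new_prob = 1.0 if i == step else 0.0
        force += dg[i - 1] * (new_prob - old_prob)  # DG(i) * x(i)
    return force


dg_mult = [2.833, 2.333, 0.833, 0.0]  # assumed DG for Multiply
in_step_1 = self_force(dg_mult, (1, 2), 1)
in_step_2 = self_force(dg_mult, (1, 2), 2)
print(round(in_step_1, 3), round(in_step_2, 3))
```

For a two-step time frame the two self forces are equal and opposite, so a positive force for C-step 1 implies that C-step 2 is the better choice for this operation.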
Diff Eq Example: Self Force for Node 4

Predecessor & Successor Forces
- Scheduling an operation may affect the time frames of other linked operations
- This may negate the benefits of the desired assignment
- Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations
[Figure: precedence graph of the example]

Diff Eq Example: Successor Force on Node 4
- If node 4 is scheduled in step 1
  - no effect on the time frame of successor node 8
  - Total force = Force4(1)
- If node 4 is scheduled in step 2
  - causes node 8 to be scheduled into step 3
  - must calculate successor force
Diff Eq Example: Final Time Frame and Schedule

Diff Eq Example: Final DG
Lookahead
- Temporarily modify the constant DG(i) to include the effect of the iteration being considered
  Force(i) = temp_DG(i) * x(i)
  temp_DG(i) = DG(i) + x(i)/3
- Consider previous example:
  Self Force(1) = (DG(1) + x(1)/3) x(1) + (DG(2) + x(2)/3) x(2)
                = 0.5 (2.833 + 0.5/3) - 0.5 (2.333 - 0.5/3)
                = +0.417
- This is even worse than before
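
A sketch of the lookahead variant differs from the plain self force only in using temp_DG(i) = DG(i) + x(i)/3 in place of DG(i). The function name `lookahead_self_force` is hypothetical, and the DG values are the same assumed multiply distribution (2.833, 2.333, 0.833, 0) as in the earlier example.

```python
def lookahead_self_force(dg, frame, step):
    """Self force computed with temp_DG(i) = DG(i) + x(i)/3."""
    first, last = frame
    old_prob = 1.0 / (last - first + 1)
    force = 0.0
    for i in range(first, last + 1):
        x = (1.0 if i == step else 0.0) - old_prob  # change in probability
        temp_dg = dg[i - 1] + x / 3.0               # anticipate the move
        force += temp_dg * x
    return force


dg_mult = [2.833, 2.333, 0.833, 0.0]  # assumed DG for Multiply
with_lookahead = lookahead_self_force(dg_mult, (1, 2), 1)
print(round(with_lookahead, 3))
```

Because the correction adds x(i)^2 / 3 for every C-step in the time frame, lookahead can only increase the force relative to the plain self force of the same assignment.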
Minimization of Bus Costs
- Basic algorithm suitable for a narrow class of problems
- Algorithm can be refined to consider "cost" factors
- Number of buses ~ number of concurrent data transfers
- Number of buses = maximum transfers in any C-step
- Create modified DG to include transfers: Transfer DG
  - Trans DG(i) = $\sum$ [Prob(op, i) * Opn_No_InOuts]
  - Opn_No_InOuts ~ combined distinct in/outputs for Op
- Calculate Force with this DG and add to Self Force

Minimization of Register Costs
- Minimum registers required is given by the largest number of data arcs crossing a C-step boundary
- Create Storage Operations at the output of any operation that transfers a value to a destination in a later C-step
- Generate Storage DG for these "operations"
- Length of storage operation depends on final schedule

Minimization of Register Costs (contd.)
- An average life, [avg life], is estimated for each storage operation
- Storage DG(i) takes one form when the ASAP and ALAP lifetimes do not overlap, and another when they overlap
- Calculate and add "Storage" Force to Self Force
[Figure: ASAP schedule, 7 registers minimum; force-directed schedule, 5 registers minimum]
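
The register bound stated above (the largest number of data arcs crossing a C-step boundary) is easy to compute once a schedule is fixed. The sketch below assumes each value is described by the step that produces it and the last step that consumes it; the function name `min_registers` and the example lifetimes are hypothetical and unrelated to the 7-versus-5 register figure.

```python
def min_registers(lifetimes, n_steps):
    """Largest number of values alive across any C-step boundary.

    lifetimes: list of (produced_step, last_used_step) pairs, 1-based,
               with last_used_step > produced_step for stored values.
    """
    # crossing[b - 1] counts arcs over the boundary between steps b and b+1.
    crossing = [0] * (n_steps - 1)
    for produced, last_used in lifetimes:
        for boundary in range(produced, last_used):
            crossing[boundary - 1] += 1
    return max(crossing, default=0)


# Hypothetical schedule with five stored values over four C-steps.
values = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
registers = min_registers(values, 4)
print(registers)
```

A value consumed in the same step that produces it crosses no boundary and therefore needs no register in this model.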
Pipelining
[Figure: functional pipelining with two overlapped instances (C-steps 1, 2, 3/1', 4/2', 3', 4') and the DG for Multiply; structural pipelining]
- Functional Pipelining
  - Pipelining across multiple operations
  - Must balance distribution across groups of concurrent C-steps
  - Cut DG horizontally and superimpose
  - Finally perform regular Force Directed Scheduling
- Structural Pipelining
  - Pipelining within an operation
  - For non-data-dependent operations, only the first C-step need be considered

Other Optimizations
- Local timing constraints
  - Insert dummy timing operations -> Restricted time frames
- Multiclass FUs
  - Create multiclass DG by summing probabilities of relevant ops
- Multistep/Chained operations
  - Carry propagation delay information with operation
  - Extend time frames into other C-steps as required
- Hardware constraints
  - Use Force as priority function in list scheduling algorithms
Scheduling using Simulated Annealing
Reference: Devadas, S.; Newton, A.R. Algorithms for hardware allocation in data path synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol. 8, no. 7.

Simulated Annealing
[Figure: local search; cost function over the solution space]

Statistical Mechanics and Combinatorial Optimization
- State $\{r_i\}$ (configuration: a set of atomic positions)
- Weight $e^{-E(\{r_i\})/k_B T}$ (Boltzmann distribution)
  - $E(\{r_i\})$: energy of configuration
  - $k_B$: Boltzmann constant
  - $T$: temperature
- Low temperature limit?
Analogy
Physical System          Optimization Problem
State (configuration)    Solution
Energy                   Cost Function
Ground State             Optimal Solution
Rapid Quenching          Iterative Improvement
Careful Annealing        Simulated Annealing
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet 'frozen' do the following:
   3.1 For 1 <= i <= L, do the following:
       - Pick a random neighbor S' of S
       - Let $\Delta$ = cost(S') - cost(S)
       - If $\Delta \le 0$ (downhill move), set S = S'
       - If $\Delta > 0$ (uphill move), set S = S' with probability $e^{-\Delta/T}$
   3.2 Set T = rT (reduce temperature)
4. Return S
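
The generic algorithm translates almost line for line into code. In the sketch below the 'frozen' test is approximated by a temperature floor `t_min`, which is an assumption, as are the default parameter values, the function signature, and the toy one-dimensional cost function used to exercise it.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost,
                        t0=10.0, r=0.95, L=100, t_min=1e-3, rng=None):
    """Generic simulated annealing following steps 1 to 4 of the slide.

    neighbor(s, rng) returns a random neighbor of s; cost(s) is minimized.
    """
    rng = rng or random.Random()
    s = initial                      # step 1: initial solution
    t = t0                           # step 2: initial temperature
    while t > t_min:                 # step 3: until 'frozen' (assumed floor)
        for _ in range(L):           # step 3.1: L trials per temperature
            s_new = neighbor(s, rng)
            delta = cost(s_new) - cost(s)
            # Downhill moves are always taken; uphill moves are taken
            # with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                       # step 3.2: reduce temperature
    return s                         # step 4


# Toy problem: minimize (x - 7)^2 over the integers, moving by +/- 1.
def step_neighbor(x, rng):
    return x + rng.choice((-1, 1))


def quadratic_cost(x):
    return (x - 7) ** 2


result = simulated_annealing(0, step_neighbor, quadratic_cost,
                             rng=random.Random(0))
print(result)
```

For scheduling, the same skeleton applies once the four ingredients are chosen: a solution is a schedule, a neighbor moves one operation to another control step, and the cost combines latency and resource usage.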
Basic Ingredients for S.A.
- Solution Space
- Neighborhood Structure
- Cost Function
- Annealing Schedule

Observation
- All scheduling algorithms we have discussed so far are critical path schedulers
- They can only generate schedules for an iteration period larger than or equal to the critical path
- They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints

Example
- Can one do better than an iteration period of 4?
  - Pipelining + retiming can reduce the critical path to 3, and also the # of functional units
- Approaches
  - Transformations followed by scheduling
  - Transformations integrated with scheduling
Estimations

Estimation
- Given: Computation and Available Time
- Determine: Bounds on Arithmetic Operators, Memory and Interconnect
- Goals: Initial Solution, Cost Function, Scheduling Evaluation

A Simple Approach

In Reality

Discrete Relaxation

Behavioral Level Statistical Models

Conclusions
- High Level Synthesis
- Connects Behavioral Description and Structural Description
- Scheduling, Estimations, Transformations
- High Level of Abstraction, High Impact on the Final Design