ECE Synthesis & Verification - Lecture 5
ECE 697B (667) Spring 2006
Synthesis and Verification of Digital Circuits
Scheduling: Constructive Algorithms
Scheduling – a Combinatorial Optimization Problem
  NP-complete problem
  Optimal solutions for special cases and ILP
  Heuristics – iterative improvements
  Heuristics – constructive
  Various versions of the problem
    – Unconstrained minimum latency
    – Resource-constrained minimum latency
    – Timing-constrained minimum latency
    – Latency-constrained minimum resource
  If all resources are identical, the problem reduces to multiprocessor scheduling (Hu’s algorithm)
  The minimum-latency multiprocessor problem is intractable
Scheduling - Iterative Improvement
  Kernighan-Lin (deterministic)
  Simulated Annealing
  Lottery Iterative Improvement
  Neural Networks
  Genetic Algorithms
  Tabu Search
Scheduling - Constructive Techniques
  Most Constrained
  Least Constraining
Force Directed Scheduling
  Goal is to reduce hardware by balancing concurrency
  Iterative algorithm, one operation scheduled per iteration
  Information (i.e. speed & area) fed back into scheduler
The Force Directed Scheduling Algorithm
Step 1
  Determine ASAP and ALAP schedules
  [Figure: ASAP and ALAP schedules of the example graph]
Step 2
  Determine the time frame of each op
    – Length of box ~ possible execution cycles
    – Width of box ~ probability of assignment
    – Uniform distribution, area assigned = 1
  [Figure: time frames over C-steps 1 to 4, with box widths such as 1/2 and 1/3]
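The uniform-probability rule of Step 2 can be written down directly. In this small sketch the helper name `probabilities` is an assumption; the time frame is taken to run from the ASAP step to the ALAP step inclusive.

```python
from fractions import Fraction
from typing import Dict

def probabilities(asap_step: int, alap_step: int) -> Dict[int, Fraction]:
    """Uniform probability of an op over its time frame [ASAP, ALAP]."""
    width = alap_step - asap_step + 1          # number of candidate C-steps
    p = Fraction(1, width)                     # total area assigned is 1
    return {i: p for i in range(asap_step, alap_step + 1)}

two_step = probabilities(1, 2)    # op that may run in C-step 1 or 2
three_step = probabilities(1, 3)  # op that may run in C-step 1, 2 or 3
```

An op with a two-step frame gets probability 1/2 per step and one with a three-step frame gets 1/3, matching the box widths in the time-frame figure.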
Step 3
  Create Distribution Graphs
    – Sum of probabilities of each op type
    – Indicates concurrency of similar ops
  $DG(i) = \sum_{Op} Prob(Op, i)$, summed over the ops of one type
  [Figure: DG for Multiply; DG for Add, Sub, Comp]
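A distribution graph is just a per-C-step sum of these probabilities. The sketch below uses an assumed set of six multiply time frames, chosen so that DG(1) comes out as the 2.833 quoted in the lecture's multiply example; the frames themselves are not given in the slides.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

def distribution_graph(frames: List[Tuple[int, int]], n_steps: int) -> Dict[int, Fraction]:
    """DG(i) = sum over ops of Prob(op, i), for ops of a single type."""
    dg = {i: Fraction(0) for i in range(1, n_steps + 1)}
    for first, last in frames:
        p = Fraction(1, last - first + 1)      # uniform over the time frame
        for i in range(first, last + 1):
            dg[i] += p
    return dg

# Assumed (ASAP, ALAP) frames of six multiplies; illustrative only.
mult_frames = [(1, 1), (1, 1), (2, 2), (1, 2), (2, 3), (1, 3)]
dg_mult = distribution_graph(mult_frames, n_steps=4)
```

With these frames the multiply DG is 17/6, 7/3, 5/6 and 0 for C-steps 1 to 4, that is about 2.833, 2.333, 0.833 and 0.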
Diff Eq Example: Precedence Graph Recalled
Diff Eq Example: Time Frame & Probability Calculation
Diff Eq Example: DG Calculation
Conditional Statements
  Operations in different branches are mutually exclusive
  Operations of the same type can be overlapped onto the DG
  Probability of the most likely operation is added to the DG
  [Figure: DG for Add, with fork and join]
Self Forces
  Scheduling an operation will affect overall concurrency
  Every operation has a 'self force' for every C-step of its time frame
  Analogous to the effect of a spring: f = Kx
  Desirable scheduling will have negative self force
    – Will achieve better concurrency (lower potential energy)
  $Force(i) = DG(i) \cdot x(i)$
    – DG(i) ~ current Distribution Graph value
    – x(i) ~ change in the operation’s probability
  $Self\ Force(j) = \sum_{i} Force(i)$, summed over the C-steps i of the operation's time frame
Example
  Attempt to schedule the multiply in C-step 1
  $Self\ Force(1) = Force(1) + Force(2) = DG(1) \cdot x(1) + DG(2) \cdot x(2) = 2.833 \cdot (0.5) + 2.333 \cdot (-0.5) = +0.25$
  This is positive: scheduling the multiply in the first C-step would be bad
  [Figure: DG for Multiply and time frames over C-steps 1 to 4]
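The self-force arithmetic can be checked with a short sketch. DG(1) = 2.833 is taken from the slide; DG(2) = 2.333 is an assumption here, chosen to be consistent with the +0.25 total force the lecture reports for this assignment. The function name `self_force` is hypothetical.

```python
from typing import Dict, Tuple

def self_force(dg: Dict[int, float], frame: Tuple[int, int], step: int) -> float:
    """Sum of DG(i) * x(i) over the time frame when the op is fixed at `step`."""
    first, last = frame
    old_p = 1.0 / (last - first + 1)           # uniform probability before
    force = 0.0
    for i in range(first, last + 1):
        new_p = 1.0 if i == step else 0.0      # probability after the assignment
        force += dg[i] * (new_p - old_p)       # x(i) is the change in probability
    return force

dg_mult = {1: 2.833, 2: 2.333}                 # DG(2) is an assumed value
force_step1 = self_force(dg_mult, (1, 2), 1)   # about +0.25
force_step2 = self_force(dg_mult, (1, 2), 2)   # about -0.25
```

The two candidate assignments have equal and opposite self forces, so moving the multiply to C-step 2 is the choice that lowers concurrency.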
Diff Eq Example: Self Force for Node 4
Predecessor & Successor Forces
  Scheduling an operation may affect the time frames of other linked operations
  This may negate the benefits of the desired assignment
  Predecessor/Successor Forces = sum of Self Forces of any implicitly scheduled operations
Diff Eq Example: Successor Force on Node 4
  If node 4 is scheduled in step 1
    – no effect on the time frame of successor node 8
    – Total force = Force4(1) = +0.25
  If node 4 is scheduled in step 2
    – causes node 8 to be scheduled into step 3
    – must calculate the successor force
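A rough sketch of the step-2 case follows. The slides do not list the numbers, so everything numeric here is an assumption: a multiply DG of 2.833, 2.333, 0.833, 0, a time frame of C-steps 1 to 2 for node 4, and a frame of C-steps 2 to 3 for node 8. The helper `frame_force` is hypothetical.

```python
from typing import Dict, Tuple

Frame = Tuple[int, int]

def frame_force(dg: Dict[int, float], old: Frame, new: Frame) -> float:
    """Force caused by shrinking a time frame from `old` to `new` (inclusive)."""
    old_p = 1.0 / (old[1] - old[0] + 1)
    new_p = 1.0 / (new[1] - new[0] + 1)
    total = 0.0
    for i in range(old[0], old[1] + 1):
        p_after = new_p if new[0] <= i <= new[1] else 0.0
        total += dg[i] * (p_after - old_p)     # DG(i) * x(i)
    return total

dg_mult = {1: 2.833, 2: 2.333, 3: 0.833, 4: 0.0}      # assumed DG values
self_4 = frame_force(dg_mult, (1, 2), (2, 2))         # node 4 fixed in step 2
succ_8 = frame_force(dg_mult, (2, 3), (3, 3))         # node 8 pushed to step 3
total_force = self_4 + succ_8
```

Under these assumptions the self force is about -0.25 and the successor force about -0.75, so the total of about -1.0 favours step 2 over the +0.25 of step 1.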
Diff Eq Example: Final Time Frame and Schedule
Diff Eq Example: Final DG
Lookahead
  Temporarily modify the constant DG(i) to include the effect of the iteration being considered
    $Force(i) = temp\_DG(i) \cdot x(i)$
    $temp\_DG(i) = DG(i) + x(i)/3$
  Consider the previous example:
    $Self\ Force(1) = (DG(1) + x(1)/3)\,x(1) + (DG(2) + x(2)/3)\,x(2) = 0.5\,(2.833 + 0.5/3) - 0.5\,(2.333 - 0.5/3) \approx +0.417$
  This is even worse than before
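The lookahead variant changes only the DG term, as this sketch shows. It reuses the multiply example with DG(1) = 2.833 from the slides and an assumed DG(2) = 2.333; `lookahead_self_force` is a hypothetical name.

```python
from typing import Dict

def lookahead_self_force(dg: Dict[int, float], x: Dict[int, float]) -> float:
    """Self force using temp_DG(i) = DG(i) + x(i)/3 in place of DG(i)."""
    return sum((dg[i] + x[i] / 3.0) * x[i] for i in x)

dg_mult = {1: 2.833, 2: 2.333}     # DG(2) is an assumed value
x = {1: +0.5, 2: -0.5}             # multiply fixed in C-step 1
plain = sum(dg_mult[i] * x[i] for i in x)
with_lookahead = lookahead_self_force(dg_mult, x)
```

The plain force is about 0.25 and the lookahead force about 0.417, so lookahead penalises this assignment more strongly.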
Minimization of Bus Costs
  Basic algorithm suitable for a narrow class of problems
  Algorithm can be refined to consider “cost” factors
  Number of buses ~ number of concurrent data transfers
  Number of buses = maximum number of transfers in any C-step
  Create a modified DG to include transfers: Transfer DG
    $Trans\ DG(i) = \sum_{Op} [Prob(Op, i) \cdot Opn\_No\_InOuts]$
    – Opn_No_InOuts ~ combined distinct inputs/outputs for the op
  Calculate Force with this DG and add to Self Force
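The Transfer DG is the ordinary DG with each probability weighted by the operation's transfer count. A minimal sketch, with a hypothetical operation list given as (ASAP step, ALAP step, number of distinct inputs/outputs):

```python
from typing import List, Tuple

def transfer_dg(ops: List[Tuple[int, int, int]], n_steps: int) -> List[float]:
    """Trans DG(i) = sum over ops of Prob(op, i) * Opn_No_InOuts (index 0 unused)."""
    dg = [0.0] * (n_steps + 1)
    for first, last, n_inouts in ops:
        p = 1.0 / (last - first + 1)           # uniform over the time frame
        for i in range(first, last + 1):
            dg[i] += p * n_inouts
    return dg

# Hypothetical ops: (ASAP, ALAP, distinct inputs + outputs)
ops = [(1, 1, 3), (1, 2, 3), (2, 3, 2)]
trans = transfer_dg(ops, n_steps=3)
```

A force computed from `trans` in the same way as the self force would then be added to it, so that assignments crowding many transfers into one C-step are penalised.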
Minimization of Register Costs
  Minimum no. of registers required is given by the largest number of data arcs crossing a C-step boundary
  Create Storage Operations at the output of any operation that transfers a value to a destination in a later C-step
  Generate Storage DG for these “operations”
  Length of storage operation depends on final schedule
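For a fixed schedule, the register lower bound can be counted boundary by boundary. This sketch makes one assumption beyond the slide: arcs that fan out from the same producer carry one value and share a register, so it counts live values rather than individual arcs. The schedule and edges are hypothetical.

```python
from typing import Dict, List, Tuple

def min_registers(schedule: Dict[str, int], edges: List[Tuple[str, str]]) -> int:
    """Largest number of values live across any C-step boundary."""
    last_use: Dict[str, int] = {}
    for src, dst in edges:
        last_use[src] = max(last_use.get(src, 0), schedule[dst])
    n_steps = max(schedule.values())
    worst = 0
    for b in range(1, n_steps):                # boundary between step b and b + 1
        live = sum(1 for src, end in last_use.items() if schedule[src] <= b < end)
        worst = max(worst, live)
    return worst

# Hypothetical schedule (op -> C-step) and data arcs (producer, consumer)
schedule = {"a": 1, "b": 1, "c": 2, "d": 3}
edges = [("a", "c"), ("b", "d"), ("c", "d")]
regs = min_registers(schedule, edges)
```

In this example two values are live across each boundary, so at least two registers are needed.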
Minimization of Register Costs (contd.)
  $[avg\ life] = ([ASAP\ life] + [ALAP\ life] + [MAX\ life]) / 3$
  $storage\ DG(i) = [avg\ life] / [max\ life]$ (no overlap between ASAP & ALAP)
  $storage\ DG(i) = ([avg\ life] - [overlap]) / ([max\ life] - [overlap])$ (if overlap)
  Calculate and add “Storage” Force to Self Force
  [Figure: ASAP schedule, 7 registers minimum; Force Directed schedule, 5 registers minimum]
Pipelining
  Functional Pipelining
    – Pipelining across multiple operations
    – Must balance distribution across groups of concurrent C-steps
    – Cut DG horizontally and superimpose
    – Finally perform regular Force Directed Scheduling
  Structural Pipelining
    – Pipelining within an operation
    – For non data-dependent operations, only the first C-step need be considered
  [Figure: DG for Multiply with two overlapped instances (C-steps 1, 2, 3/1’, 4/2’, 3’, 4’), illustrating functional and structural pipelining]
Other Optimizations
  Local timing constraints
    – Insert dummy timing operations -> restricted time frames
  Multiclass FUs
    – Create multiclass DG by summing probabilities of relevant ops
  Multistep/chained operations
    – Carry propagation delay information with operation
    – Extend time frames into other C-steps as required
  Hardware constraints
    – Use Force as priority function in list scheduling algorithms
Scheduling using Simulated Annealing
  Reference: Devadas, S.; Newton, A.R. Algorithms for hardware allocation in data path synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol.8, (no.7):
Simulated Annealing
  Local search
  [Figure: cost function plotted over the solution space]
Statistical Mechanics and Combinatorial Optimization
  State $\{r_i\}$ (configuration: a set of atomic positions)
  Weight $e^{-E(\{r_i\})/k_B T}$ (Boltzmann distribution)
    – $E(\{r_i\})$: energy of the configuration
    – $k_B$: Boltzmann constant
    – $T$: temperature
  Low temperature limit?
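One way to approach the low-temperature question is to compare the Boltzmann weights of two configurations. In the short worked step below, $E_0$ denotes the lowest (ground-state) energy and $E_1 > E_0$ any higher energy; both symbols are introduced here for illustration.

```latex
\frac{w(E_1)}{w(E_0)}
  = \frac{e^{-E_1 / k_B T}}{e^{-E_0 / k_B T}}
  = e^{-(E_1 - E_0) / k_B T}
  \;\longrightarrow\; 0
  \quad \text{as } T \to 0 \qquad (E_1 > E_0)
```

So as the temperature falls, the weight concentrates on the lowest-energy configurations, which correspond to the optimal solutions in the optimization analogy.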
Analogy
  Physical System          Optimization Problem
  State (configuration)    Solution
  Energy                   Cost Function
  Ground State             Optimal Solution
  Rapid Quenching          Iterative Improvement
  Careful Annealing        Simulated Annealing
Generic Simulated Annealing Algorithm
  1. Get an initial solution S
  2. Get an initial temperature T > 0
  3. While not yet 'frozen' do the following:
     3.1 For $1 \le i \le L$, do the following:
           Pick a random neighbor S' of S
           Let $\Delta = cost(S') - cost(S)$
           If $\Delta \le 0$ (downhill move), set S = S'
           If $\Delta > 0$ (uphill move), set S = S' with probability $e^{-\Delta/T}$
     3.2 Set T = rT (reduce temperature)
  4. Return S
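The generic algorithm maps almost line for line onto Python. The sketch below is illustrative only: the 'frozen' test (temperature below `t_frozen`), the default parameter values and the toy integer problem are all assumptions, since the slide leaves them open.

```python
import math
import random
from typing import Callable, Optional, TypeVar

S = TypeVar("S")

def simulated_annealing(
    initial: S,
    neighbor: Callable[[S, random.Random], S],
    cost: Callable[[S], float],
    t0: float = 10.0,         # initial temperature (assumed value)
    r: float = 0.95,          # cooling ratio, 0 < r < 1
    L: int = 100,             # moves tried at each temperature
    t_frozen: float = 1e-3,   # assumed 'frozen' criterion
    rng: Optional[random.Random] = None,
) -> S:
    rng = rng or random.Random(0)
    s, t = initial, t0
    while t > t_frozen:                         # step 3
        for _ in range(L):                      # step 3.1
            s_new = neighbor(s, rng)
            delta = cost(s_new) - cost(s)
            # Downhill moves are always taken; uphill with probability e^(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                                  # step 3.2
    return s                                    # step 4

def step(x: int, rng: random.Random) -> int:
    """Toy neighborhood: move one integer left or right."""
    return x + rng.choice((-1, 1))

result = simulated_annealing(20, step, lambda x: (x - 3) ** 2)
```

For scheduling, the solution would be an assignment of operations to C-steps, the neighbor function would move one operation within its time frame, and the cost would reflect resource usage; those choices are the four ingredients the lecture lists next.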
Basic Ingredients for S.A.
  Solution Space
  Neighborhood Structure
  Cost Function
  Annealing Schedule
Observation
  All scheduling algorithms we have discussed so far are critical path schedulers
  They can only generate schedules for an iteration period larger than or equal to the critical path
  They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints
Example
  Can one do better than an iteration period of 4?
    – Pipelining + retiming can reduce the critical path to 3, and also the # of functional units
  Approaches
    – Transformations followed by scheduling
    – Transformations integrated with scheduling