ECE 667 - Synthesis & Verification - Lecture 5 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.

Slides:



Advertisements
Similar presentations
Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.
Advertisements

CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
ECE 667 Synthesis and Verification of Digital Circuits
Courseware Integer Linear Programming approach to Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark.
Chapter 4 Retiming.
Simulated Annealing General Idea: Start with an initial solution
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 14: March 3, 2004 Scheduling Heuristics and Approximation.
COE 561 Digital System Design & Synthesis Scheduling Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
Winter 2005ICS 252-Intro to Computer Design ICS 252 Introduction to Computer Design Lecture 5-Scheudling Algorithms Winter 2005 Eli Bozorgzadeh Computer.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Clock Skewing EECS 290A Sequential Logic Synthesis and Verification.
Parallel Scheduling of Complex DAGs under Uncertainty Grzegorz Malewicz.
Best-First Search: Agendas
Martha Garcia.  Goals of Static Process Scheduling  Types of Static Process Scheduling  Future Research  References.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 17: April 2, 2008 Scheduling Introduction.
ECE Synthesis & Verification - Lecture 2 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.Baboescu, F. Tullsen, D.M. Rosu, G. Singh, S. Tullsen, D.M.Rosu,
ECE Synthesis & Verification - Lecture 3 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
MAE 552 – Heuristic Optimization Lecture 6 February 6, 2002.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Pipelining and Retiming 1 Pipelining  Adding registers along a path  split combinational logic into multiple cycles  increase clock rate  increase.
Simulated-Annealing-Based Solution By Gonzalo Zea s Shih-Fu Liu s
Courseware Force-Directed Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads,
Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification.
Simulated Annealing 10/7/2005.
EDA (CS286.5b) Day 11 Scheduling (List, Force, Approximation) N.B. no class Thursday (FPGA) …
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
ECE Synthesis & Verification - Lecture 4 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Allocation:
A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems Karam S Chatha and Ranga Vemuri Department of ECECS University of Cincinnati.
ICS 252 Introduction to Computer Design
ECE Synthesis & Verification - LP Scheduling 1 ECE 667 ECE 667 Synthesis and Verification of Digital Circuits Scheduling Algorithms Analytical approach.
ECE Synthesis & Verification 1 ECE 667 ECE 667 Synthesis and Verification of Digital Systems Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 18, 2009 Static Timing Analysis and Multi-Level Speedup.
Scheduling for Synthesis of Embedded Hardware
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Register-Transfer (RT) Synthesis Greg Stitt ECE Department University of Florida.
Optimization of thermal processes2007/2008 Optimization of thermal processes Maciej Marek Czestochowa University of Technology Institute of Thermal Machinery.
Gradual Relaxation Techniques with Applications to Behavioral Synthesis Zhiru Zhang, Yiping Fan, Miodrag Potkonjak, Jason Cong Department of Computer Science.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 12: February 13, 2002 Scheduling Heuristics and Approximation.
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 18, 2015 Architecture Synthesis (Provisioning, Allocation)
Static Process Scheduling Section 5.2 CSc 8320 Alex De Ruiter
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
Thursday, May 9 Heuristic Search: methods for solving difficult optimization problems Handouts: Lecture Notes See the introduction to the paper.
C OMPARING T HREE H EURISTIC S EARCH M ETHODS FOR F UNCTIONAL P ARTITIONING IN H ARDWARE -S OFTWARE C ODESIGN Theerayod Wiangtong, Peter Y. K. Cheung and.
1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca.
1 Scheduling Processes with Release Times, Deadlines, Precedence and Exclusion Relations J. Xu and D. L. Parnas IEEE Transactions on Software Engineering,
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 23: April 20, 2015 Static Timing Analysis and Multi-Level Speedup.
ELEC692 VLSI Signal Processing Architecture Lecture 3
Gradual Relaxation Techniques with Applications to Behavioral Synthesis Zhiru Zhang, Yiping Fan, Miodrag Potkonjak, Jason Cong Department of Computer Science.
Pipelining and Retiming
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
Optimization Problems
Static Process Scheduling
Pipelined and Parallel Computing Partition for 1 Hongtao Du AICIP Research Nov 3, 2005.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Retiming EECS 290A Sequential Logic Synthesis and Verification.
Scheduling Determines the precise start time of each task.
Graphcut Textures:Image and Video Synthesis Using Graph Cuts
Artificial Intelligence (CS 370D)
CSE 589 Applied Algorithms Spring 1999
Optimization with Meta-Heuristics
Algorithms for Budget-Constrained Survivable Topology Design
Integrated Systems Centre © Giovanni De Micheli – All rights reserved
Boltzmann Machine (BM) (§6.4)
Fast Min-Register Retiming Through Binary Max-Flow
Presentation transcript:

ECE Synthesis & Verification - Lecture 5 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling Constructive Algorithms

ECE Synthesis & Verification - Lecture 5 2 Scheduling – a Combinatorial Optimization Problem NP-complete ProblemNP-complete Problem Optimal solutions for special cases and ILPOptimal solutions for special cases and ILP Heuristics - iterative ImprovementsHeuristics - iterative Improvements Heuristics – constructiveHeuristics – constructive Various versions of the problemVarious versions of the problem Unconstrained minimum latencyUnconstrained minimum latency Resource-constrained minimum latencyResource-constrained minimum latency Timing constrained minimum latencyTiming constrained minimum latency Latency-constrained minimumLatency-constrained minimum If all resources are identical, problem is reduced to multiprocessor scheduling (Hu’s algorithm)If all resources are identical, problem is reduced to multiprocessor scheduling (Hu’s algorithm) Minimum latency multiprocessor problem is intractableMinimum latency multiprocessor problem is intractable

ECE Synthesis & Verification - Lecture 5 3 Scheduling - Iterative Improvement Kernighan - Lin (deterministic)Kernighan - Lin (deterministic) Simulated AnnealingSimulated Annealing Lottery Iterative ImprovementLottery Iterative Improvement Neural NetworksNeural Networks Genetic AlgorithmsGenetic Algorithms Taboo SearchTaboo Search

ECE Synthesis & Verification - Lecture 5 4 Scheduling - Constructive Techniques Most ConstrainedMost Constrained Least ConstrainingLeast Constraining

ECE Synthesis & Verification - Lecture 5 5 Force Directed Scheduling Goal is to reduce hardware by balancing concurrencyGoal is to reduce hardware by balancing concurrency Iterative algorithm, one operation scheduled per iterationIterative algorithm, one operation scheduled per iteration Information (i.e. speed & area) fed back into schedulerInformation (i.e. speed & area) fed back into scheduler

ECE Synthesis & Verification - Lecture 5 6 The Force Directed Scheduling Algorithm

ECE Synthesis & Verification - Lecture 5 7 Step 1 Determine ASAP and ALAP schedulesDetermine ASAP and ALAP schedules * - + * * * + < * * - * - + *** + < ** - ASAP ALAP

ECE Synthesis & Verification - Lecture 5 8 Step 2 Determine Time Frame of each opDetermine Time Frame of each op –Length of box ~ Possible execution cycles –Width of box ~ Probability of assignment –Uniform distribution, Area assigned = 1 C-step 1 C-step 2 C-step 3 C-step 4 Time Frames * - * * - * * * + < + 1/2 1/3

ECE Synthesis & Verification - Lecture 5 9 Step 3 Create Distribution GraphsCreate Distribution Graphs –Sum of probabilities of each Op type –Indicates concurrency of similar Ops DG(i) =  Prob(Op, i) DG for Multiply DG for Add, Sub, Comp

ECE Synthesis & Verification - Lecture 5 10 Diff Eq Example: Precedence Graph Recalled

ECE Synthesis & Verification - Lecture 5 11 Diff Eq Example: Time Frame & Probability Calculation

ECE Synthesis & Verification - Lecture 5 12 Diff Eq Example: DG Calculation

ECE Synthesis & Verification - Lecture 5 13 Conditional Statements Operations in different branches are mutually exclusiveOperations in different branches are mutually exclusive Operations of same type can be overlapped onto DGOperations of same type can be overlapped onto DG Probability of most likely operation is added to DGProbability of most likely operation is added to DG DG for Add Fork Join

ECE Synthesis & Verification - Lecture 5 14 Self Forces Scheduling an operation will effect overall concurrency n Every operation has 'self force' for every C-step of its time frame Analogous to the effect of a spring: f = K x Desirable scheduling will have negative self force l Will achieve better concurrency (lower potential energy ) Force(i) = DG(i) * x(i) DG(i) ~ Current Distribution Graph value x(i) ~ Change in operation’s probability Self Force(j) = [Force(i)]

ECE Synthesis & Verification - Lecture 5 15 Example n Attempt to schedule multiply in C-step 1 Self Force(1) = Force(1) + Force(2) = ( DG(1) * X(1) ) + ( DG(2) * X(2) ) = [2.833*(0.5) * (-0.5)] = n This is positive, scheduling the multiply in the first C-step would be bad DG for Multiply * - * * - * * * + < + C-step 1 C-step 2 C-step 3 C-step 4 1/2 1/3

ECE Synthesis & Verification - Lecture 5 16 Diff Eq Example: Self Force for Node 4

ECE Synthesis & Verification - Lecture 5 17 Predecessor & Successor Forces Scheduling an operation may affect the time frames of other linked operationsScheduling an operation may affect the time frames of other linked operations This may negate the benefits of the desired assignment This may negate the benefits of the desired assignment Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations * - + * * * + < * * -

ECE Synthesis & Verification - Lecture 5 18 Diff Eq Example: Successor Force on Node 4 If node 4 scheduled in step 1If node 4 scheduled in step 1 –no effect on time frame for successor node 8 Total force = Froce4(1) = +0.25Total force = Froce4(1) = If node 4 scheduled in step 2If node 4 scheduled in step 2 –causes node 8 to be scheduled into step 3 –must calculate successor force

ECE Synthesis & Verification - Lecture 5 19 Diff Eq Example: Final Time Frame and Schedule

ECE Synthesis & Verification - Lecture 5 20 Diff Eq Example: Final DG

ECE Synthesis & Verification - Lecture 5 21 Lookahead Temporarily modify the constant DG(i) to include the effect of the iteration being consideredTemporarily modify the constant DG(i) to include the effect of the iteration being considered Force (i) = temp_DG(i) * x(i) temp_DG(i) = DG(i) + x(i)/3 Consider previous example:Consider previous example: Self Force(1) = (DG(1) + x(1)/3)x(1) + (DG(2) + x(2)/3)x(2) =.5( /3) -.5( /3) =.5( /3) -.5( /3) = = This is even worse than beforeThis is even worse than before

ECE Synthesis & Verification - Lecture 5 22 Minimization of Bus Costs Basic algorithm suitable for narrow class of problemsBasic algorithm suitable for narrow class of problems Algorithm can be refined to consider “cost” factorsAlgorithm can be refined to consider “cost” factors Number of buses ~ number of concurrent data transfersNumber of buses ~ number of concurrent data transfers Number of buses = maximum # transfers in any C-stepNumber of buses = maximum # transfers in any C-step Create modified DG to include transfers: Transfer DGCreate modified DG to include transfers: Transfer DG Trans DG(i) =  [Prob (op,i) * Opn_No_InOuts] Opn_No_InOuts ~ combined distinct in/outputs for Op Calculate Force with this DG and add to Self ForceCalculate Force with this DG and add to Self Force

ECE Synthesis & Verification - Lecture 5 23 Minimization of Register Costs Minimum no. registers required is given by the largest number of data arcs crossing a C-step boundaryMinimum no. registers required is given by the largest number of data arcs crossing a C-step boundary Create Storage Operations, at output of any operation that transfers a value to a destination in a later C-stepCreate Storage Operations, at output of any operation that transfers a value to a destination in a later C-step Generate Storage DG for these “operations”Generate Storage DG for these “operations” Length of storage operation depends on final scheduleLength of storage operation depends on final schedule

ECE Synthesis & Verification - Lecture 5 24 Minimization of Register Costs ( contd.) avg life] =avg life] = storage DG(i) = (no overlap between ASAP & ALAP)storage DG(i) = (no overlap between ASAP & ALAP) storage DG(i) = (if overlap)storage DG(i) = (if overlap) Calculate and add “Storage” Force to Self ForceCalculate and add “Storage” Force to Self Force 7 registers minimum ASAPForce Directed 5 registers minimum

ECE Synthesis & Verification - Lecture 5 25 Pipelining * * * *** + + < - - * * * *** + + < - - DG for Multiply 1 2 3, 1’ 4, 2’ 3’ 4’ Instance Instance’ Functional Pipelining * * Structural Pipelining Functional PipeliningFunctional Pipelining –Pipelining across multiple operations –Must balance distribution across groups of concurrent C-steps –Cut DG horizontally and superimpose –Finally perform regular Force Directed Scheduling Structural PipeliningStructural Pipelining –Pipelining within an operation –For non data-dependant operations, only the first C-step need be considered

ECE Synthesis & Verification - Lecture 5 26 Other Optimizations Local timing constraintsLocal timing constraints –Insert dummy timing operations -> Restricted time frames Multiclass FU’sMulticlass FU’s –Create multiclass DG by summing probabilities of relevant ops Multistep/Chained operations.Multistep/Chained operations. –Carry propagation delay information with operation –Extend time frames into other C-steps as required Hardware constraintsHardware constraints –Use Force as priority function in list scheduling algorithms

ECE Synthesis & Verification - Lecture 5 27 Scheduling using Simulated Annealing Reference: Devadas, S.; Newton, A.R. Algorithms for hardware allocation in data path synthesis. Algorithms for hardware allocation in data path synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol.8, (no.7): IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol.8, (no.7):

ECE Synthesis & Verification - Lecture 5 28 Simulated Annealing Local Search Solution space Cost function ?

ECE Synthesis & Verification - Lecture 5 29 Statistical Mechanics Combinatorial Optimization State {r:} (configuration -- a set of atomic position ) weight e -E({r:])/K B T -- Boltzmann distribution E({r:]): energy of configuration K B : Boltzmann constant T: temperature Low temperature limit ??

ECE Synthesis & Verification - Lecture 5 30 Analogy Physical System State (configuration) Energy Ground State Rapid Quenching Careful Annealing Optimization Problem Solution Cost Function Optimal Solution Iteration Improvement Simulated Annealing

ECE Synthesis & Verification - Lecture 5 31 Generic Simulated Annealing Algorithm 1. Get an initial solution S 2. Get an initial temperature T > 0 3. While not yet 'frozen' do the following: 3.1 For 1  i  L, do the following: Pick a random neighbor S'of S Let  =cost(S') - cost(S) If   0 (downhill move) set S = S' If  >0 (uphill move) set S=S' with probability e -  /T 3.2 Set T = rT (reduce temperature) 4. Return S

ECE Synthesis & Verification - Lecture 5 32 Basic Ingredients for S.A. Solution SpaceSolution Space Neighborhood StructureNeighborhood Structure Cost FunctionCost Function Annealing ScheduleAnnealing Schedule

ECE Synthesis & Verification - Lecture 5 33 Observation All scheduling algorithms we have discussed so far are critical path schedulersAll scheduling algorithms we have discussed so far are critical path schedulers They can only generate schedules for iteration period larger than or equal to the critical pathThey can only generate schedules for iteration period larger than or equal to the critical path They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraintsThey only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints

ECE Synthesis & Verification - Lecture 5 34 Example Can one do better than iteration period of 4?Can one do better than iteration period of 4? –Pipelining + retiming can reduce critical path to 3, and also the # of functional units ApproachesApproaches –Transformations followed by scheduling –Transformations integrated with scheduling