Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: 310-267-2098 WWW: Copyright 2003.

Presentation transcript:

Slide 1: High-level Synthesis: Scheduling, Allocation, Assignment
Mani Srivastava, UCLA EE Department, Room 6731-H Boelter Hall. Copyright 2003 Mani Srivastava.
Note: Several slides in this lecture are from Prof. Miodrag Potkonjak, UCLA CS.

Slide 2: Overview
- High Level Synthesis
- Scheduling, Allocation and Assignment
- Estimations
- Transformations

Slide 3: Allocation, Assignment, and Scheduling Techniques Well Understood and Mature

Slide 4: Scheduling and Assignment
[Figure axis labels: Control Step]

Slide 5: ASAP Scheduling Algorithm

Slide 6: ASAP Scheduling Example

Slide 7: ASAP: Another Example
[Figure: sequence graph and its ASAP schedule]
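The ASAP algorithm slide survives only as a title in this transcript. As a minimal sketch of the usual formulation, assuming unit-delay operations and unlimited resources, each operation is placed one control step after its latest predecessor. The graph `OPS`/`EDGES` below is hypothetical and is not the sequence graph from the slides.

```python
from collections import defaultdict


def asap_schedule(ops, edges):
    """Return {op: earliest control step}, with steps numbered from 1."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    step = {}

    def earliest(op):
        # An operation starts one step after its latest predecessor;
        # operations without predecessors start in step 1.
        if op not in step:
            step[op] = 1 + max((earliest(p) for p in preds[op]), default=0)
        return step[op]

    for op in ops:
        earliest(op)
    return step


# Hypothetical sequence graph: (a, b) -> c and (c, d) -> e.
OPS = ["a", "b", "c", "d", "e"]
EDGES = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
print(asap_schedule(OPS, EDGES))
```

Note that `d` lands in step 1 even though its only consumer runs in step 3, which is the kind of slack the ALAP schedule exposes.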

Slide 8: ALAP Scheduling Algorithm

Slide 9: ALAP Scheduling Example

Slide 10: ALAP: Another Example
[Figure: sequence graph and its ALAP schedule (latency constraint = 4)]
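The ALAP algorithm is likewise not preserved in the transcript. A minimal sketch, again assuming unit-delay operations and unlimited resources, works backwards from the latency constraint: each operation is placed one step before its earliest successor. The graph is hypothetical; only the latency constraint of 4 comes from the slide.

```python
from collections import defaultdict


def alap_schedule(ops, edges, latency):
    """Return {op: latest control step} that still meets the latency bound."""
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)

    step = {}

    def latest(op):
        # An operation must finish one step before its earliest successor;
        # operations without successors go in the last step.
        if op not in step:
            step[op] = min((latest(s) for s in succs[op]),
                           default=latency + 1) - 1
        return step[op]

    for op in ops:
        latest(op)
    return step


# Hypothetical sequence graph: (a, b) -> c and (c, d) -> e.
OPS = ["a", "b", "c", "d", "e"]
EDGES = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
print(alap_schedule(OPS, EDGES, latency=4))
```

The difference between an operation's ALAP and ASAP steps is its mobility; operations with zero mobility at the minimum latency lie on the critical path.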

Slide 11: Observation about ALAP & ASAP
- No priority is given to nodes on the critical path
- As a result, less critical nodes may be scheduled ahead of critical nodes
- This is no problem if hardware is unlimited
- However, if the resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules
- List scheduling techniques overcome this problem by using a more global node selection criterion

Slide 12: List Scheduling and Assignment

Slide 13: List Scheduling Algorithm using Decreasing Criticalness Criterion
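The algorithm on this slide is not preserved in the transcript. The sketch below shows one common form of list scheduling, assuming unit-delay operations and taking "criticalness" to mean the longest path from an operation to a sink. The names `op_type` and `limits`, and the example graph, are hypothetical.

```python
from collections import defaultdict


def list_schedule(ops, edges, op_type, limits):
    """Resource-constrained list scheduling for unit-delay operations.

    Priority is the longest path from an operation to a sink, so that
    operations on the critical path are picked first.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    priority = {}

    def path_len(op):
        if op not in priority:
            priority[op] = 1 + max((path_len(s) for s in succs[op]), default=0)
        return priority[op]

    for op in ops:
        path_len(op)

    schedule, step = {}, 1
    while len(schedule) < len(ops):
        # Ready: unscheduled, with every predecessor done in an earlier step.
        ready = [op for op in ops if op not in schedule
                 and all(schedule.get(p, step) < step for p in preds[op])]
        ready.sort(key=lambda op: -priority[op])  # most critical first
        used = defaultdict(int)
        for op in ready:
            kind = op_type[op]
            if used[kind] < limits[kind]:
                schedule[op] = step
                used[kind] += 1
        step += 1
    return schedule


# Hypothetical graph: chain m1 -> m2 -> a1, plus an independent multiply m3.
OPS = ["m3", "m1", "m2", "a1"]
EDGES = [("m1", "m2"), ("m2", "a1")]
TYPES = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "alu"}
print(list_schedule(OPS, EDGES, TYPES, {"mul": 1, "alu": 1}))
```

With one multiplier, scheduling `m3` first would delay the chain and give latency 4; prioritising the critical chain gives latency 3.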

Slide 14: Scheduling
- NP-complete Problem
- Optimal
- Heuristics - Iterative Improvements
- Heuristics - Constructive
- Various versions of the problem
  - Unconstrained minimum latency
  - Resource-constrained minimum latency
  - Timing constrained
- If all resources are identical, the problem reduces to multiprocessor scheduling
  - Minimum latency multiprocessor problem is intractable

Slide 15: Scheduling - Optimal Techniques
- Integer Linear Programming
- Branch and Bound

Slide 16: Integer Linear Programming
- Given: integer-valued matrix $A_{m \times n}$, vectors $B = (b_1, b_2, \ldots, b_m)$, $C = (c_1, c_2, \ldots, c_n)$
- Minimize: $C^T X$
- Subject to: $AX \le B$, where $X = (x_1, x_2, \ldots, x_n)$ is an integer-valued vector

Slide 17: Integer Linear Programming
- Problem: For a set of (dependent) computations $\{t_1, t_2, \ldots, t_n\}$, find the minimum number of units needed to complete the execution by $k$ control steps.
- Integer linear programming:
  - Let $y_0$ be an integer variable.
  - For each control step $i$ ($1 \le i \le k$):
    - define variable $x_{ij}$ as $x_{ij} = 1$ if computation $t_j$ is executed in the $i$th control step, and $x_{ij} = 0$ otherwise.
    - define variable $y_i = x_{i1} + x_{i2} + \cdots + x_{in}$.

Slide 18: Integer Linear Programming
- Integer linear programming:
  - For each computation dependency ($t_i$ has to be done before $t_j$), introduce a constraint:
    $k\,x_{1i} + (k-1)\,x_{2i} + \cdots + x_{ki} \ge k\,x_{1j} + (k-1)\,x_{2j} + \cdots + x_{kj} + 1$   (*)
  - Minimize: $y_0$
  - Subject to:
    - $x_{1i} + x_{2i} + \cdots + x_{ki} = 1$ for all $1 \le i \le n$
    - $y_i \le y_0$ for all $1 \le i \le k$
    - all computation dependencies of type (*)

Slide 19: An Example
- 6 computations ($c_1, \ldots, c_6$), 3 control steps
[Figure: dependency graph of $c_1$ through $c_6$]

Slide 20: An Example
- Introduce variables:
  - $x_{ij}$ for $1 \le i \le 3$, $1 \le j \le 6$
  - $y_i = x_{i1} + x_{i2} + x_{i3} + x_{i4} + x_{i5} + x_{i6}$ for $1 \le i \le 3$
  - $y_0$
- Dependency constraints: e.g. execute $c_1$ before $c_4$:
  $3x_{11} + 2x_{21} + x_{31} \ge 3x_{14} + 2x_{24} + x_{34} + 1$
- Execution constraints: $x_{1i} + x_{2i} + x_{3i} = 1$ for $1 \le i \le 6$

Slide 21: An Example
- Minimize: $y_0$
- Subject to: $y_i \le y_0$ for all $1 \le i \le 3$, the dependency constraints, and the execution constraints
- One solution: $y_0 = 2$, with $x_{11} = 1$, $x_{12} = 1$, $x_{23} = 1$, $x_{24} = 1$, $x_{35} = 1$, $x_{36} = 1$, and all other $x_{ij} = 0$
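For an instance this small, the formulation can be checked by exhaustive enumeration rather than an ILP solver. The sketch below is only an illustration: the dependency graph is in a figure that the transcript does not preserve, so apart from "c1 before c4" the pairs in `DEPS` are assumptions, chosen to be consistent with the solution quoted on the slide.

```python
from itertools import product


def min_units(n_ops, k_steps, deps):
    """Exhaustively minimize y0, the largest number of ops in any step.

    deps holds pairs (i, j): computation i must run before j (1-based).
    """
    best_y0, best_assign = None, None
    # step_of[j - 1] is the control step of computation j. Choosing exactly
    # one step per computation satisfies the execution constraints.
    for step_of in product(range(1, k_steps + 1), repeat=n_ops):
        # Constraint (*) is equivalent to: i runs in a strictly earlier step.
        if any(step_of[i - 1] >= step_of[j - 1] for i, j in deps):
            continue
        y0 = max(step_of.count(s) for s in range(1, k_steps + 1))
        if best_y0 is None or y0 < best_y0:
            best_y0, best_assign = y0, step_of
    return best_y0, best_assign


# (1, 4) is from the slide; the remaining dependencies are assumed.
DEPS = [(1, 4), (2, 3), (3, 5), (4, 6)]
print(min_units(6, 3, DEPS))
```

Under these assumed dependencies the only feasible assignment puts $c_1, c_2$ in step 1, $c_3, c_4$ in step 2, and $c_5, c_6$ in step 3, which gives $y_0 = 2$ as on the slide.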

Slide 22: ILP Model of Scheduling
- Binary decision variables $x_{il}$
  - $i = 0, 1, \ldots, n$
  - $l = 1, 2, \ldots, \lambda + 1$
- Start time is unique

Slide 23: ILP Model of Scheduling (contd.)
- Sequencing relationships must be satisfied
- Resource bounds must be met
  - let the upper bound on the number of resources of type $k$ be $a_k$

Slide 24: Minimum-latency Scheduling Under Resource-constraints
- Let $t$ be the vector whose entries are start times
- Formal ILP model

Slide 25: Example
- Two types of resources
  - Multiplier
  - ALU
    - Adder
    - Subtraction
    - Comparison
- Both take 1 cycle execution time

Slide 26: Example (contd.)
- Heuristic (list scheduling) gives latency = 4 steps
- Use ALAP and ASAP (with no resource constraints) to get bounds on start times
  - ASAP matches latency of heuristic
    - so heuristic is optimum, but let us ignore it!
- Constraints?

Slide 27: Example (contd.)
- Start time is unique

Slide 28: Example (contd.)
- Sequencing constraints
  - note: only non-trivial ones listed
    - those with more than one possible start time for at least one operation

Slide 29: Example (contd.)
- Resource constraints

Slide 30: Example (contd.)
- Consider $c = [0, 0, \ldots, 1]^T$
  - Minimum latency schedule
  - since sink has no mobility ($x_{n,5} = 1$), any feasible schedule is optimum
- Consider $c = [1, 1, \ldots, 1]^T$
  - finds earliest start times for all operations
  - equivalently, [equation]

Slide 31: Example Solution: Optimum Schedule Under Resource Constraint

Slide 32: Example (contd.)
- Assume a multiplier costs 5 units of area, and an ALU costs 1 unit of area
- Same uniqueness and sequencing constraints as before
- Resource constraints are in terms of unknown variables $a_1$ and $a_2$
  - $a_1$ = # of multipliers
  - $a_2$ = # of ALUs

Slide 33: Example (contd.)
- Resource constraints

Slide 34: Example Solution
- Minimize $c^T a = 5 \cdot a_1 + 1 \cdot a_2$
- Solution with cost 12

Slide 35: Precedence-constrained Multiprocessor Scheduling
- All operations done by the same type of resource
  - intractable problem
  - intractable even if all operations have unit delay

Slide 36: Scheduling - Iterative Improvement
- Kernighan-Lin (deterministic)
- Simulated Annealing
- Lottery Iterative Improvement
- Neural Networks
- Genetic Algorithms
- Tabu Search

Slide 37: Scheduling - Constructive Techniques
- Most Constrained
- Least Constraining

Slide 38: Force Directed Scheduling
- Goal is to reduce hardware by balancing concurrency
- Iterative algorithm, one operation scheduled per iteration
- Information (i.e. speed & area) fed back into scheduler

Slide 39: The Force Directed Scheduling Algorithm

Slide 40: Step 1
- Determine ASAP and ALAP schedules
[Figure: ASAP and ALAP schedules of the example graph]

Slide 41: Step 2
- Determine Time Frame of each op
  - Length of box ~ Possible execution cycles
  - Width of box ~ Probability of assignment
  - Uniform distribution, Area assigned = 1
[Figure: Time Frames over C-step 1 to C-step 4, with box widths 1/2 and 1/3 marked]

Slide 42: Step 3
- Create Distribution Graphs
  - Sum of probabilities of each Op type
    - Indicates concurrency of similar Ops
  - $DG(i) = \sum_{Op} Prob(Op, i)$
[Figure: DG for Multiply; DG for Add, Sub, Comp]
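Steps 2 and 3 can be sketched in a few lines, assuming each operation is equally likely to land in any C-step of its time frame. The time frames in `MULTIPLY_FRAMES` are assumptions, since the figure is not preserved; they were chosen so that DG(1) matches the 2.833 quoted in a later slide.

```python
def distribution_graph(time_frames, n_steps):
    """DG(i) = sum over ops of Prob(op, i).

    Each op has uniform probability 1 / (frame length) in every C-step of
    its (asap, alap) time frame and probability 0 elsewhere.
    """
    dg = [0.0] * n_steps
    for asap, alap in time_frames:
        prob = 1.0 / (alap - asap + 1)
        for step in range(asap, alap + 1):
            dg[step - 1] += prob
    return dg


# Assumed (ASAP step, ALAP step) time frames for six multiply operations.
MULTIPLY_FRAMES = [(1, 1), (1, 1), (2, 2), (1, 2), (2, 3), (1, 3)]
print([round(v, 3) for v in distribution_graph(MULTIPLY_FRAMES, 4)])
# [2.833, 2.333, 0.833, 0.0]
```

The DG values sum to the number of operations (here 6), and a peak such as DG(1) marks a C-step where multipliers are likely to be in high demand.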

Slide 43: Diff Eq Example: Precedence Graph Recalled

Slide 44: Diff Eq Example: Time Frame & Probability Calculation

Slide 45: Diff Eq Example: DG Calculation

Slide 46: Conditional Statements
- Operations in different branches are mutually exclusive
- Operations of same type can be overlapped onto DG
- Probability of most likely operation is added to DG
[Figure: DG for Add, with Fork and Join nodes]

Slide 47: Self Forces
- Scheduling an operation will affect overall concurrency
- Every operation has a 'self force' for every C-step of its time frame
- Analogous to the effect of a spring: f = Kx
- Desirable scheduling will have negative self force
  - Will achieve better concurrency (lower potential energy)
- Force(i) = DG(i) * x(i)
  - DG(i) ~ Current Distribution Graph value
  - x(i) ~ Change in operation's probability
- $\text{Self Force}(j) = \sum_{i} \text{Force}(i)$, summed over the C-steps $i$ of the operation's time frame

Slide 48: Example
- Attempt to schedule multiply in C-step 1
  - Self Force(1) = Force(1) + Force(2) = (DG(1) * x(1)) + (DG(2) * x(2)) = [2.833 * (0.5) + DG(2) * (-0.5)]
- This is positive, so scheduling the multiply in the first C-step would be bad
[Figure: DG for Multiply; time frames over C-step 1 to C-step 4, with box widths 1/2 and 1/3 marked]
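The self-force calculation can be sketched as follows. Only DG(1) = 2.833 and the probability changes of +0.5 and -0.5 come from the slide; the remaining entries of `DG_MULTIPLY` are assumed values for illustration.

```python
def self_force(dg, frame, target_step):
    """Self force of fixing an op with time frame (asap, alap) at target_step.

    x(i) is the change in the op's probability at C-step i: it rises to 1
    at the target step and falls to 0 elsewhere in the frame.
    """
    asap, alap = frame
    old_prob = 1.0 / (alap - asap + 1)
    force = 0.0
    for step in range(asap, alap + 1):
        new_prob = 1.0 if step == target_step else 0.0
        force += dg[step - 1] * (new_prob - old_prob)
    return force


# DG(1) = 2.833 is quoted on the slide; the other DG values are assumed.
DG_MULTIPLY = [17 / 6, 14 / 6, 5 / 6, 0.0]
print(round(self_force(DG_MULTIPLY, (1, 2), 1), 3))  # 0.25
print(round(self_force(DG_MULTIPLY, (1, 2), 2), 3))  # -0.25
```

Under these assumed values the force for C-step 1 is positive and the force for C-step 2 is negative, so C-step 2 would be the preferred placement.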

Slide 49: Diff Eq Example: Self Force for Node 4

Slide 50: Predecessor & Successor Forces
- Scheduling an operation may affect the time frames of other linked operations
- This may negate the benefits of the desired assignment
- Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations
[Figure: example precedence graph]

Slide 51: Diff Eq Example: Successor Force on Node 4
- If node 4 is scheduled in step 1
  - no effect on time frame for successor node 8
  - Total force = Force4(1)
- If node 4 is scheduled in step 2
  - causes node 8 to be scheduled into step 3
  - must calculate successor force

Slide 52: Diff Eq Example: Final Time Frame and Schedule

Slide 53: Diff Eq Example: Final DG

Slide 54: Lookahead
- Temporarily modify the constant DG(i) to include the effect of the iteration being considered
  - Force(i) = temp_DG(i) * x(i)
  - temp_DG(i) = DG(i) + x(i)/3
- Consider previous example:
  - Self Force(1) = (DG(1) + x(1)/3) x(1) + (DG(2) + x(2)/3) x(2) = 0.5 (DG(1) + 0.5/3) - 0.5 (DG(2) - 0.5/3)
- This is even worse than before
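The lookahead variant changes only the DG term inside the force. In the sketch below, DG(1) = 2.833 is from the earlier slide, while DG(2) is an assumed value, so the printed numbers illustrate the formula rather than reproduce the slide's result.

```python
def lookahead_self_force(dg, changes):
    """Self force with temp_DG(i) = DG(i) + x(i)/3.

    changes maps a C-step (1-based) to x(i), the change in probability.
    """
    return sum((dg[step - 1] + x / 3.0) * x for step, x in changes.items())


def plain_self_force(dg, changes):
    """Self force without lookahead, for comparison."""
    return sum(dg[step - 1] * x for step, x in changes.items())


# DG(1) = 2.833 is quoted on the slides; DG(2) is an assumed value.
DG_MULTIPLY = [17 / 6, 14 / 6]
X = {1: 0.5, 2: -0.5}  # schedule the multiply in C-step 1
print(round(plain_self_force(DG_MULTIPLY, X), 3))      # 0.25
print(round(lookahead_self_force(DG_MULTIPLY, X), 3))  # 0.417
```

Because x(i)^2/3 is never negative, the lookahead term can only raise the force, which is consistent with the slide's remark that the result is worse than before.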

Slide 55: Minimization of Bus Costs
- Basic algorithm suitable for narrow class of problems
- Algorithm can be refined to consider “cost” factors
- Number of buses ~ number of concurrent data transfers
- Number of buses = maximum transfers in any C-step
- Create modified DG to include transfers: Transfer DG
  - Trans DG(i) = $\sum_{op}$ [Prob(op, i) * Opn_No_InOuts]
  - Opn_No_InOuts ~ combined distinct in/outputs for Op
- Calculate Force with this DG and add to Self Force

Slide 56: Minimization of Register Costs
- Minimum registers required is given by the largest number of data arcs crossing a C-step boundary
- Create Storage Operations at the output of any operation that transfers a value to a destination in a later C-step
- Generate Storage DG for these “operations”
- Length of storage operation depends on final schedule
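The register bound in the first bullet can be computed directly from a finished schedule. The sketch below assumes unit-delay operations, counts each produced value once however many consumers it has, and ignores primary inputs and outputs. The graph and both schedules are hypothetical.

```python
def min_registers(schedule, edges):
    """Largest number of values crossing any C-step boundary.

    A value produced in step s and last consumed in step e crosses the
    boundaries after steps s, s + 1, ..., e - 1.
    """
    last_use = {}
    for src, dst in edges:
        last_use[src] = max(last_use.get(src, 0), schedule[dst])

    n_steps = max(schedule.values())
    crossings = [0] * (n_steps + 1)  # crossings[s]: boundary after step s
    for src, end in last_use.items():
        for boundary in range(schedule[src], end):
            crossings[boundary] += 1
    return max(crossings)


# Hypothetical graph: (a, b) -> c and (c, d) -> e.
EDGES = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
EARLY_D = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 3}  # d as early as possible
LATE_D = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}   # d delayed by one step
print(min_registers(EARLY_D, EDGES), min_registers(LATE_D, EDGES))
```

Delaying `d` by one step shortens the lifetime of its result and lowers the count from 3 to 2, which is the effect the storage DG is meant to capture during scheduling.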

Slide 57: Minimization of Register Costs (contd.)
- [avg life] = [equation]
- storage DG(i) = [equation] (no overlap between ASAP & ALAP)
- storage DG(i) = [equation] (if overlap)
- Calculate and add “Storage” Force to Self Force
[Figure: ASAP schedule, 7 registers minimum; Force Directed schedule, 5 registers minimum]

Slide 58: Pipelining
- Functional Pipelining
  - Pipelining across multiple operations
  - Must balance distribution across groups of concurrent C-steps
  - Cut DG horizontally and superimpose
  - Finally perform regular Force Directed Scheduling
- Structural Pipelining
  - Pipelining within an operation
  - For non-data-dependent operations, only the first C-step need be considered
[Figure: functional pipelining with two overlapped instances (C-steps 1, 2, 3/1', 4/2', 3', 4') and the DG for Multiply; structural pipelining]

Slide 59: Other Optimizations
- Local timing constraints
  - Insert dummy timing operations -> Restricted time frames
- Multiclass FU's
  - Create multiclass DG by summing probabilities of relevant ops
- Multistep/Chained operations
  - Carry propagation delay information with operation
  - Extend time frames into other C-steps as required
- Hardware constraints
  - Use Force as priority function in list scheduling algorithms

Slide 60: Scheduling using Simulated Annealing
Reference: Devadas, S.; Newton, A.R. Algorithms for hardware allocation in data path synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol. 8, (no. 7):

Slide 61: Simulated Annealing
- Local Search
[Figure axis labels: Solution space, Cost function]

Slide 62: Statistical Mechanics, Combinatorial Optimization
- State $\{r_i\}$ (configuration: a set of atomic positions)
- Weight $e^{-E(\{r_i\})/k_B T}$ (Boltzmann distribution)
  - $E(\{r_i\})$: energy of configuration
  - $k_B$: Boltzmann constant
  - $T$: temperature
- Low temperature limit ??

Slide 63: Analogy
Physical System          Optimization Problem
State (configuration)    Solution
Energy                   Cost Function
Ground State             Optimal Solution
Rapid Quenching          Iterative Improvement
Careful Annealing        Simulated Annealing

Slide 64: Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet 'frozen' do the following:
   3.1 For $1 \le i \le L$, do the following:
       - Pick a random neighbor S' of S
       - Let $\Delta$ = cost(S') - cost(S)
       - If $\Delta \le 0$ (downhill move), set S = S'
       - If $\Delta > 0$ (uphill move), set S = S' with probability $e^{-\Delta/T}$
   3.2 Set T = rT (reduce temperature)
4. Return S
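The generic algorithm maps almost line for line onto code. In the sketch below the parameter names, the default values, the "frozen" test (temperature below a threshold), and the toy problem are all assumptions made for illustration; the slides do not specify an annealing schedule.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0=10.0, r=0.9,
                        moves_per_temp=100, t_frozen=1e-3, seed=0):
    """Generic simulated annealing; returns the final solution S."""
    rng = random.Random(seed)
    s, t = initial, t0                       # steps 1 and 2
    while t > t_frozen:                      # step 3: not yet "frozen"
        for _ in range(moves_per_temp):      # step 3.1: L moves per temperature
            candidate = neighbor(s, rng)
            delta = cost(candidate) - cost(s)
            # Downhill moves are always taken; uphill moves are taken
            # with probability e^(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = candidate
        t *= r                               # step 3.2: reduce temperature
    return s                                 # step 4


# Hypothetical toy problem: minimize (x - 7)^2 over the integers.
def toy_cost(x):
    return (x - 7) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


print(simulated_annealing(50, toy_cost, toy_neighbor))
```

For scheduling, the solution would be an assignment of operations to control steps, a neighbor would move one operation within its time frame, and the cost would combine latency and resource usage.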

Slide 65: Basic Ingredients for S.A.
- Solution Space
- Neighborhood Structure
- Cost Function
- Annealing Schedule

Slide 66: Observation
- All scheduling algorithms we have discussed so far are critical path schedulers
- They can only generate schedules for an iteration period larger than or equal to the critical path
- They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints

Slide 67: Example
- Can one do better than an iteration period of 4?
  - Pipelining + retiming can reduce the critical path to 3, and also the # of functional units
- Approaches
  - Transformations followed by scheduling
  - Transformations integrated with scheduling

Slide 68: Estimations

Slide 69: Estimation
- Given: Computation and Available Time
- Determine: Bounds on Arithmetic Operators, Memory and Interconnect
- Goals: Initial Solution, Cost Function, Scheduling Evaluation

Slide 70: A Simple Approach

Slide 71: In Reality

Slide 72: Discrete Relaxation

Slide 73: Behavioral Level Statistical Models

Slide 74: Conclusions
- High Level Synthesis
- Connects Behavioral Description and Structural Description
- Scheduling, Estimations, Transformations
- High Level of Abstraction, High Impact on the Final Design