Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns.

Similar presentations


Presentation on theme: "Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns."— Presentation transcript:

1 Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns in the parse trees. –Example: – a= x+ y; b = a+ 1; c = x+ y; – a= x+ y; b = a+ 1; c = a;

2 Examples of other transformations Dead-code elimination: –a= x; b = x+ 1; c = 2 * x; –a= x; can be removed if not referenced. Operator-strength reduction: –a= x 2 ; b = 3 * x; –a= x * x; t = x<<1; b = x+ t; Code motion: –for ( i = 1; i < a * b) { } –t = a * b; for ( i = 1; i < t) { }

3 Strength reduction

4 Strength Reduction

5 Control- flow based transformations Model expansion. –Expand subroutine flatten hierarchy. – Useful to expand scope of other optimization techniques. – Problematic when routine is called more than once. – Example: –x= a+ b; y= a * b; z = foo( x, y) ; –foo( p, q) {t =q-p; return(t);} –By expanding foo: –x= a+ b; y= a * b; z = y-x; Conditional expansion Transform conditional into parallel execution with test at the end. Useful when test depends on late signals. May preclude hardware sharing. Always useful for logic expressions. Example: y= ab; if ( a) x= b+d; else x= bd; can be expanded to: x= a( b+ d) + a’bd; y= ab; x= y+ d( a+ b);

6 Pipelining

7 Associativity Transformation

8 FIR Parallelization

9 FIR PARALLELIZATION

10 FIR Filter Parallelization

11 FIR parallelization: two working phases

12 IIR filter recursive function

13 Recursive Function

14 Interlaced Accumulation Programming for Low Power

15 4. Register Transfer Level Design

16 FIR3 Block Diagram and Flow Graph

17 High-Level Power Estimation P core = P DP + P MEM + P CNTR + P PROC P DP = P REG +P MUX +P FU + +P FU, where P REG is the power of the registers P MUX is the power of multiplexers P FU is the power of functional units P INT is the power of physical interconnet capacitance

18 High-Level Power Estimation: P REG Compute the lifetimes of all the variables in the given VHDL code. Represent the lifetime of each variable as a vertical line from statement i through statement i + n in the column j reserved for the corresponding varibale v j. Determine the maximum number N of overlapping lifetimes computing the maximum number of vertical lines intersecting with any horizontal cut-line. Estimate the minimal number of N of set of registers necessary to implement the code by using register sharing. Register sharing has to be applied whenever a group of variables, with the same bit-width b i. Select a possible mapping of variables into registers by using register sharing Compute the number w i of write to the variables mapped to the same set of registers. Estimate n i of each set of register dividing w i by the number of statements S: i =w i /S; hence TR imax = n i f clk. Power of latches and flip flops is consumed not only during output transitions, but also during all clock edges by the internal clock buffers The non-switching power P NSK dissipated by internal clock buffers accounts for 30% of the average power for the 0.38-micron and 3.3 V operating system. In total,

19 P CNTR After scheduling, the control is defined and optimized by the hardware mapper and further by the logic synthesis process before mapping to layout. Like interconnect, therefore, the control needs to be estimated statistically. Global control model: Local control model: the local controller account for a larger percentage of the total capacitance than the global controller. Where Ntrans is the number of tansitions, nstates is the number of states, Bf is the bus factor, and Clc is the capacitance switched in any local controller in one sample period. Bf is the ratio of the number of bus accesses to the number of busses.

20 N trans The number of transitions depends on assignment, scheduling, optimizations, logic optimization, the standard cell library used, the amount of glitchings and the statistics of the inputs.

21 Behavioral Synthesis loop unrolling : localize the data to reduce the activity of the inputs of the functional units or two output samples are computed in parallel based on two input samples. Neither the capacitance switched nor the voltage is altered. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, The transformation yields critical path of 3, thus voltage can be dropped. Clock Selection : Choose optimal system clock period Eliminate slacks/improve resource utilization and Enable greater voltage scaling Module selection : For each operation, choose library template Flow graph restructuring : pull out operations on the critical cycle.

22 High-Level Power Estimation: P MUX and P FU

23 Critical Path Longest delayed path from input to output in combinational logic Determine operating clock frequency Resizing non-critical path transistor (In-Place Optimization) Critical path in Synchronous Sequential logic

24 Loop Unrolling for Low Power

25 Retiming Flip- flop insertion to minimize hazard activity moving a flip- flop in a circuit

26 Exploiting spatial locality for interconnect power reduction Global Local Adder1 Adder2

27 Balancing maximal time-sharing and fully-parallel implementation A fourth-order parallel-form IIR filter (a) Local assignment (2 global transfers), (b) Non-local assignment (20 global transfers)

28 Retiming/pipelining for Critical path


Download ppt "Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns."

Similar presentations


Ads by Google