CSE241A VLSI Digital Circuits, Winter 2003. Recitation 2.5: Performance Coding (CSE241 RTL Performance). Kahng & Cichy, UCSD ©2003


Introduction: Performance Coding
- Overview
- Critical Paths
- Hierarchy
- RTL Operators
- Multiplexers
- Parallelism
- Order

Overview: Why Code?
- Motivation
  - Increase speed!
  - Reduce cycle times
  - Build portability

Critical Paths
- Identify critical signals
  - Slow signals, slow paths
- Reduce logic path depth
  - Fewer gates in the path = higher clock rate
- Connect the critical net closest to the output
  - Decreases the overall delay of the function
[Figure: the same function of inputs a, b, c, d, e drawn twice, with the input order changed from a b c d e to a b e d c.]
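A minimal Verilog sketch of both ideas, assuming a hypothetical five-input AND in which `late` is the critical (late-arriving) input; the module and signal names are illustrative, not from the slides. The first form chains the gates with `late` at the far end, so it passes through four gates. The second reduces the depth to three gates and connects `late` to the gate nearest the output, so it passes through only one. A timing-driven synthesis tool may restructure either form on its own.

```verilog
// Hypothetical example: `late` is assumed to be the slowest-arriving input.

// Serial chain, critical input first: `late` passes through 4 AND gates.
module and5_serial (input a, b, c, d, late, output y);
  assign y = (((late & a) & b) & c) & d;
endmodule

// Reduced depth (3 gates) with the critical input on the last gate:
// `late` passes through only 1 AND gate.
module and5_late_last (input a, b, c, d, late, output y);
  assign y = ((a & b) & (c & d)) & late;
endmodule
```

Both modules compute the same function; only the structure, and therefore the delay seen by `late`, differs.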

Hierarchy
- Block size
  - Too many blocks increase delay (logic depth)
- Too much hierarchy
  - Signals have to traverse the whole hierarchy
- The synthesizer can reduce this
  - Synopsys “ungroup” command

Resource Sharing
- Bad example
  - Adders use a lot of resources; this code infers two adders feeding a mux
    if (select) sum = a + b;
    else        sum = c + d;
[Figure: adders computing A + B and C + D feed a MUX controlled by select, which drives sum.]

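Resource Sharing
- Good example
  - Infer two muxes and a single adder (begin/end groups both assignments under each branch; blocking assignments let sum see the new tmp values)
    if (select) begin tmp1 = a; tmp2 = b; end
    else        begin tmp1 = c; tmp2 = d; end
    sum = tmp1 + tmp2;
[Figure: select-controlled muxes choose the adder operands from A, B, C, D; a single adder drives sum.]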
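A minimal sketch of the two resource-sharing versions as complete combinational modules, assuming 8-bit unsigned operands and a 9-bit result so the carry out is kept (the widths and module names are hypothetical). Whether a tool shares the adder in the first form automatically depends on the tool and its settings; the second form makes the single adder explicit, at the cost of putting the muxes in front of the adder on the `select` path.

```verilog
// Unshared: two adders followed by a mux (the "bad" example).
module add_unshared (input select, input [7:0] a, b, c, d,
                     output reg [8:0] sum);
  always @(*)
    if (select) sum = a + b;   // 9-bit context keeps the carry
    else        sum = c + d;
endmodule

// Shared: two muxes pick the operands, then one adder (the "good" example).
module add_shared (input select, input [7:0] a, b, c, d,
                   output reg [8:0] sum);
  reg [7:0] tmp1, tmp2;
  always @(*) begin
    if (select) begin tmp1 = a; tmp2 = b; end
    else        begin tmp1 = c; tmp2 = d; end
    sum = tmp1 + tmp2;         // single adder
  end
endmodule
```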

Loops
- Move operators outside loops
  - i.e., move the + (adder) outside the loop
  - Reduces the number of adders instantiated
- Make sure:
  - Critical signals are still addressed
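A sketch of hoisting the adder out of a loop, assuming a hypothetical 4-way selector: four 8-bit values are packed into `a`, `sel` picks one, and `b` is added (all names and widths are illustrative). In the first module the loop unrolls into four adders plus selection logic, unless the tool shares them; in the second the loop only selects the operand and a single adder follows. Following the slide's caution, note the new critical paths: `b` now goes through the adder only, while `sel` goes through the selection logic and then the adder.

```verilog
// Adder inside the loop: unrolls into four adders (one per iteration).
module loop_add_inside (input [1:0] sel, input [31:0] a, input [7:0] b,
                        output reg [7:0] y);
  integer i;
  always @(*) begin
    y = 8'd0;
    for (i = 0; i < 4; i = i + 1)
      if (sel == i) y = a[i*8 +: 8] + b;
  end
endmodule

// Adder hoisted out of the loop: the loop only selects, one adder remains.
module loop_add_outside (input [1:0] sel, input [31:0] a, input [7:0] b,
                         output reg [7:0] y);
  integer i;
  reg [7:0] t;
  always @(*) begin
    t = 8'd0;
    for (i = 0; i < 4; i = i + 1)
      if (sel == i) t = a[i*8 +: 8];
    y = t + b;   // single 8-bit adder (result wraps modulo 256)
  end
endmodule
```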

Muxes
- Instantiate muxes
  - Don’t wait for the tool to infer muxes
  - Create a gate-level mux and use it in the code
- If-then-else
  - Might create unwanted (priority) logic
- Use explicit case statements to infer muxes
  - If not instantiating muxes
  - Keeps the code technology independent
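A minimal sketch of both approaches. Verilog gate primitives stand in for a library mux cell here; in a real flow one would more likely instantiate the standard-cell library's own mux (an assumption, since the slide names no library). `mux2_gate`, `mux4_inst`, and `mux4_case` are hypothetical names. The first two modules build a 4:1 mux from instantiated gate-level 2:1 muxes; the third uses a full case statement that the tool maps to a mux, which keeps the code technology independent.

```verilog
// Gate-level 2:1 mux built from primitives (output is the first terminal).
module mux2_gate (input a, b, sel, output y);
  wire sel_n, a_sel, b_sel;
  not g0 (sel_n, sel);
  and g1 (a_sel, a, sel_n);   // a passes when sel = 0
  and g2 (b_sel, b, sel);     // b passes when sel = 1
  or  g3 (y, a_sel, b_sel);
endmodule

// 4:1 mux from three instantiated 2:1 muxes.
module mux4_inst (input [3:0] d, input [1:0] sel, output y);
  wire lo, hi;
  mux2_gate m0 (.a(d[0]), .b(d[1]), .sel(sel[0]), .y(lo));
  mux2_gate m1 (.a(d[2]), .b(d[3]), .sel(sel[0]), .y(hi));
  mux2_gate m2 (.a(lo),   .b(hi),   .sel(sel[1]), .y(y));
endmodule

// Technology-independent alternative: a full case statement.
module mux4_case (input [3:0] d, input [1:0] sel, output reg y);
  always @(*)
    case (sel)
      2'd0:    y = d[0];
      2'd1:    y = d[1];
      2'd2:    y = d[2];
      default: y = d[3];
    endcase
endmodule
```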

Parentheses
- Use parentheses to optimize, e.g.:
    out = a + b + c + d;
  vs.
    out = (a + b) + (c + d);
- The first statement creates 3 adders in series (a path 3 adders deep)
- The second creates 2 adders in parallel whose results feed a third adder (a path only 2 adders deep)
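The two statements as complete modules, assuming 8-bit unsigned operands and a 10-bit result, wide enough for the sum of four 8-bit values (widths and module names are illustrative). Both use three adders and compute the same value; only the grouping, and so the adder depth, differs. Timing-driven synthesis may rebalance the first form by itself, so the parentheses are best read as stating the intended structure.

```verilog
// Left-to-right grouping: ((a + b) + c) + d, three adders in series.
module sum4_chain (input [7:0] a, b, c, d, output [9:0] out);
  assign out = a + b + c + d;
endmodule

// Explicit grouping: (a + b) and (c + d) in parallel, then one more adder.
module sum4_tree (input [7:0] a, b, c, d, output [9:0] out);
  assign out = (a + b) + (c + d);
endmodule
```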

Operators: * / %
- Multiply, divide, and modulo
  - High-cost operators
  - Will create extra, non-optimized logic
- Better to create design blocks
  - Instantiate a fast tree adder
  - Instantiate a Wallace tree (for multiplication)
  - Otherwise the synthesis tool creates an RCA (ripple-carry adder)

Adder example
- Carry look-ahead adder (generic 4-bit, behavioral model)

    module Add_prop_gen (sum, c_out, a, b, c_in);
      output [3:0] sum;
      output       c_out;
      input  [3:0] a, b;
      input        c_in;

      reg  [3:0] carrychain;
      wire [3:0] g = a & b;   // carry generate: continuous assignment, bitwise AND
      wire [3:0] p = a ^ b;   // carry propagate: continuous assignment, bitwise XOR

      // @(*) re-evaluates whenever g, p, or c_in changes, so the block never
      // reads stale g/p values and no #0 delay is needed to avoid a race
      always @(*)
      begin: carry_generation // named block, allows the local declaration
        integer i;
        carrychain[0] = g[0] | (p[0] & c_in);
        for (i = 1; i <= 3; i = i + 1)
          carrychain[i] = g[i] | (p[i] & carrychain[i-1]);
      end

      wire [4:0] shiftedcarry = {carrychain, c_in}; // concatenation: carry into each bit
      assign sum   = p ^ shiftedcarry[3:0];        // summation
      assign c_out = shiftedcarry[4];              // carry out, bit select
    endmodule

Parallelizing
- Use hierarchy
  - As in hardware, use parallel connections
  - The + operator will create an RCA
- Example: carry-lookahead adder, with G_i = a_i b_i (generate) and P_i = a_i ⊕ b_i (propagate) as in the adder example; here + denotes OR and juxtaposition denotes AND:

$$
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
$$

- Only 4 gate delays vs. 9 for an RCA
- The only trade-off is more logic (more area)
  - Silicon has lots of real estate (to a point; think wiring)
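The adder example earlier writes the carries as a recurrence (each `carrychain[i]` uses `carrychain[i-1]`). A minimal sketch that instead writes C1 to C4 as the flat sums of products listed on this slide, so no carry waits on the previous one; the module name `cla4_flat` and its port names are hypothetical. Whether the gate-level netlist keeps this two-level form is up to the synthesis tool.

```verilog
// 4-bit carry-lookahead adder with every carry expanded as on the slide.
module cla4_flat (input [3:0] a, b, input c0, output [3:0] sum, output c4);
  wire [3:0] g = a & b;   // generate:  G_i = a_i & b_i
  wire [3:0] p = a ^ b;   // propagate: P_i = a_i ^ b_i
  wire c1 = g[0] | (p[0] & c0);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0);
  assign sum = p ^ {c3, c2, c1, c0};   // sum bit i = P_i xor C_i
endmodule
```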

If-then-else, Case
- Use of if statement
  - Introduces priority among nested “ifs”
- Case has no order
- Can mix case and if

Similar delay for all data signals:
    CASE state IS
      WHEN s1 => Z <= a;
      WHEN s2 => Z <= b;
      WHEN s3 => Z <= c;
      WHEN OTHERS => Z <= d;
    END CASE;

Speeds up the path selected by s2:
    CASE state IS
      WHEN s1 => tmp <= a;
      WHEN s3 => tmp <= c;
      WHEN OTHERS => tmp <= d;
    END CASE;
    IF (state = s2) THEN
      Z <= b;
    ELSE
      Z <= tmp;
    END IF;
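The same two styles in Verilog, to match the other examples in this recitation. This is a sketch under an assumed 2-bit state encoding (s1 = 1, s2 = 2, s3 = 3, anything else falls to the default); the slide gives no encoding, and the module names are hypothetical. Both modules select the same value; the second leaves `b` (selected in s2) behind only the final if, so it passes through the last level of selection logic.

```verilog
// Plain case: all data inputs see a similar path to z.
module sel_case (input [1:0] state, input a, b, c, d, output reg z);
  localparam S1 = 2'd1, S2 = 2'd2, S3 = 2'd3;   // assumed encoding
  always @(*)
    case (state)
      S1:      z = a;
      S2:      z = b;
      S3:      z = c;
      default: z = d;
    endcase
endmodule

// Mixed case + if: the if after the case puts b closest to z.
module sel_fast_s2 (input [1:0] state, input a, b, c, d, output reg z);
  localparam S1 = 2'd1, S2 = 2'd2, S3 = 2'd3;   // assumed encoding
  reg tmp;
  always @(*) begin
    case (state)             // non-critical inputs
      S1:      tmp = a;
      S3:      tmp = c;
      default: tmp = d;
    endcase
    if (state == S2) z = b;  // critical path: one selection level
    else             z = tmp;
  end
endmodule
```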

Order Dependency
- Blocking vs. non-blocking assignments
  - Use non-blocking assignments for sequential logic, such as pipelining and modeling several mutually exclusive data transfers
  - Blocking assignments within sequential processes may cause race conditions
  - Non-blocking assignments are order independent
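A minimal sketch of the order dependence, assuming a hypothetical three-stage, 8-bit pipeline (module and signal names are illustrative). With non-blocking assignments every register samples the value its predecessor held before the clock edge, so the statements can be written in any order. With blocking assignments in the order shown, `q1` updates first and `q2` and `q3` see the new value in the same edge, so the pipeline collapses to a single stage.

```verilog
// Non-blocking: a true three-stage pipeline regardless of statement order.
module pipe_nb (input clk, input [7:0] d, output reg [7:0] q3);
  reg [7:0] q1, q2;
  always @(posedge clk) begin
    q3 <= q2;   // deliberately written "backwards": order does not matter
    q1 <= d;
    q2 <= q1;
  end
endmodule

// Blocking, in this order: all three registers get d on the same edge,
// so d reaches q3 after one clock instead of three.
module pipe_blk (input clk, input [7:0] d, output reg [7:0] q3);
  reg [7:0] q1, q2;
  always @(posedge clk) begin
    q1 = d;
    q2 = q1;
    q3 = q2;
  end
endmodule
```

In this sketch, reversing the blocking statements (q3 first, q1 last) would restore the pipeline behavior, which is exactly the order dependence the slide warns about.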

