1
CSE241 RTL Performance.1, Kahng & Cichy, UCSD ©2003
CSE241A VLSI Digital Circuits, Winter 2003
Recitation 2.5: Performance Coding
2
Introduction: Performance Coding
• Overview
• Critical Paths
• Hierarchy
• RTL Operators
• Multiplexers
• Parallelism
• Order
3
Overview: Why Code?
Motivation
• Increase speed!
• Reduce cycle times
• Build portability
4
Critical Paths
Identify critical signals
• Slow signals, slow paths
• Reduce logic path depth
  - Fewer gates in the path = higher clock rate
• Connect the critical (slow) net closest to the output
  - Decreases the overall delay of the function
(Figure: a gate chain with inputs a b c d e, redrawn with the inputs reordered as a b e d c)
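A small timing model makes the last bullet concrete. It is a sketch under hypothetical assumptions: a linear chain of identical 2-input gates with one unit of delay each, and made-up arrival times in which e is the slow signal (real static timing analysis is far more detailed). Because every gate waits for its slowest input, feeding the late signal into the gate nearest the output lets the early inputs finish their work in the meantime.

```python
def chain_arrival(order, arrival, gate_delay=1):
    """Arrival time at the output of a linear chain of 2-input gates.

    order   -- input names; the first two feed the first gate and each later
               input joins one gate closer to the output.
    arrival -- dict mapping each input name to its (hypothetical) arrival time.
    """
    t = max(arrival[order[0]], arrival[order[1]]) + gate_delay
    for name in order[2:]:
        # Each gate waits for its slower input, then adds one gate delay.
        t = max(t, arrival[name]) + gate_delay
    return t


# Hypothetical arrival times: e is the slow (critical) signal.
arrival = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 3}

print(chain_arrival(["a", "b", "c", "d", "e"], arrival))  # e at the last gate  → 4
print(chain_arrival(["e", "a", "b", "c", "d"], arrival))  # e at the first gate → 7
```

With the slow signal at the first gate, its 3 units of lateness are followed by all four gate delays; at the last gate, only one gate delay follows it.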
5
Hierarchy
Block size
• Too many blocks increase delay (logic depth)
• Too much hierarchy
  - Signals have to traverse the whole hierarchy, and logic is not optimized across block boundaries
• The synthesizer can remove unneeded hierarchy
  - Synopsys “ungroup” command
6
Resource Sharing: Bad Example
• Adders use a lot of resources; this code infers two adders followed by a mux
    if (select)
      sum <= a + b;
    else
      sum <= c + d;
(Figure: two adders compute A + B and C + D; a MUX controlled by select drives sum)
7
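Resource Sharing: Good Example
• Infer two muxes and a single adder (combinational block, so blocking = assignments let sum see the tmp1/tmp2 values just assigned)
    if (select) begin tmp1 = a; tmp2 = b; end
    else        begin tmp1 = c; tmp2 = d; end
    sum = tmp1 + tmp2;
(Figure: muxes controlled by select choose A or C and B or D; one adder drives sum)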
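The two resource-sharing examples compute the same function; only the hardware differs (two adders and one mux versus one adder and two muxes). The Python sketch below checks that equivalence exhaustively, assuming hypothetical 4-bit operands and a sum truncated to 4 bits, as a 4-bit sum register would be.

```python
WIDTH = 4                  # hypothetical operand width
MASK = (1 << WIDTH) - 1    # truncate the sum to WIDTH bits


def sum_unshared(select, a, b, c, d):
    # Bad example: two adders compute both sums, a mux picks one afterwards.
    s0 = (a + b) & MASK
    s1 = (c + d) & MASK
    return s0 if select else s1


def sum_shared(select, a, b, c, d):
    # Good example: two muxes pick the operands, one adder computes the sum.
    tmp1 = a if select else c
    tmp2 = b if select else d
    return (tmp1 + tmp2) & MASK


# Exhaustive equivalence check over all operands and both select values.
values = range(1 << WIDTH)
equivalent = all(
    sum_unshared(s, a, b, c, d) == sum_shared(s, a, b, c, d)
    for s in (0, 1) for a in values for b in values for c in values for d in values
)
print(equivalent)  # → True
```

One design note: in the shared version select sits in front of the adder, so a late-arriving select lengthens the path to sum; the area saving is clearest when select is not critical.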
8
Loops
Move operators outside loops
• i.e., move the + (adder) outside the loop
• Reduces the number of adders instantiated (synthesis unrolls the loop, so a + in the loop body is replicated once per iteration)
• Make sure:
  - Critical signals are addressed
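A hypothetical loop that selects one of N words and adds b illustrates the point. In this Python sketch the adder count is tallied by hand as a stand-in for what synthesis would instantiate after unrolling; it is not a synthesis result, and N, WIDTH, and the function names are illustrative assumptions.

```python
N = 4                      # hypothetical number of words / loop iterations
WIDTH = 8                  # hypothetical word width
MASK = (1 << WIDTH) - 1    # keep results to WIDTH bits, like a hardware adder


def adder_inside_loop(a, b, sel):
    """Returns (out, adders): '+' in the loop body is replicated on unrolling."""
    out, adders = 0, 0
    for i in range(N):
        s = (a[i] + b) & MASK   # each unrolled iteration gets its own adder
        adders += 1
        if sel == i:
            out = s             # the loop then only selects one of N sums
    return out, adders


def adder_outside_loop(a, b, sel):
    """Returns (out, adders): the loop only selects an operand (a mux)."""
    tmp = 0
    for i in range(N):
        if sel == i:
            tmp = a[i]
    return (tmp + b) & MASK, 1  # a single adder after the loop


a = [3, 100, 200, 255]
print(adder_inside_loop(a, 10, 2))   # → (210, 4)
print(adder_outside_loop(a, 10, 2))  # → (210, 1)
```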
9
Muxes
Instantiate muxes
• Don't wait for the tool to infer muxes
• Create a gate-level mux and use it in the code
• If-then-else
  - Might create unwanted (priority) logic
• If you are not instantiating muxes (e.g., to stay technology independent), use explicit case statements so the tool infers them
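A gate-level 2:1 mux is just (d0 AND NOT s) OR (d1 AND s), and wider muxes can be built by instantiating it in a tree. This Python sketch models that on single bits (function names are illustrative) and checks a 4:1 tree against a behavioral, case-style lookup.

```python
from itertools import product


def mux2(s, d0, d1):
    """Gate-level 2:1 mux on single bits: (d0 AND NOT s) OR (d1 AND s)."""
    return (d0 & (1 - s)) | (d1 & s)


def mux4(s1, s0, d0, d1, d2, d3):
    """4:1 mux built by instantiating three 2:1 muxes (a tree two levels deep)."""
    return mux2(s1, mux2(s0, d0, d1), mux2(s0, d2, d3))


def mux4_case(sel, d):
    """Behavioral reference, like a case statement on a 2-bit select."""
    return d[sel]


def check_mux4():
    """Compare the gate-level tree with the reference for every input."""
    for sel in range(4):
        for d in product((0, 1), repeat=4):
            if mux4(sel >> 1, sel & 1, *d) != mux4_case(sel, d):
                return False
    return True


print(check_mux4())  # → True
```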
10
Parentheses
Use parentheses to optimize, e.g.:
    out = a + b + c + d;
vs.
    out = (a + b) + (c + d);
The first statement creates 3 adders in series (3 adder delays). The second also uses 3 adders, but (a + b) and (c + d) are computed in parallel, so the longest path is only 2 adder delays.
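The difference is the shape of the expression tree: both forms have the same number of adders, but the balanced one is shallower. This Python sketch (a hypothetical tuple encoding of the expressions) counts adders and measures depth; note that some synthesis tools may rebalance such expressions on their own.

```python
# An expression is an operand name (str) or a tuple ("+", left, right).

def adder_count(expr):
    """Total adders in the expression tree."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + adder_count(left) + adder_count(right)


def adder_depth(expr):
    """Adders on the longest input-to-output path."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(adder_depth(left), adder_depth(right))


# out = a + b + c + d parses left to right as ((a + b) + c) + d
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
# out = (a + b) + (c + d)
balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))

print(adder_count(chain), adder_depth(chain))        # → 3 3
print(adder_count(balanced), adder_depth(balanced))  # → 3 2
```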
11
Operators * / %
Multiply, divide, and modulo
• These operators are costly
• Left to the tool, they can produce large, poorly optimized logic
• Better to create dedicated design blocks
  - Instantiate a fast tree adder
  - Instantiate a Wallace tree
  - Otherwise the synthesis tool may build ripple-carry adders (RCA)
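A Wallace tree replaces a chain of adders with layers of 3:2 carry-save adders, each layer cutting the number of partial-product rows by about a third, followed by a single fast carry-propagate adder. The Python sketch below models this on integers; it reduces whole rows rather than working column by column as real Wallace hardware does, and the function names are illustrative.

```python
def csa(x, y, z):
    """3:2 carry-save adder on whole rows: x + y + z == s + c."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority = carry, shifted
    return s, c


def wallace_multiply(a, b, width):
    """Unsigned multiply via carry-save reduction of partial products.

    Returns (product, levels), where levels is the number of 3:2 stages
    needed to reduce `width` partial-product rows to two.
    """
    # Partial products: row i is `a` shifted left by i if bit i of b is set.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])                # leftover rows (0 to 2) pass through
        rows = nxt
        levels += 1
    # Final carry-propagate addition: this is where a fast adder would go.
    return sum(rows), levels


print(wallace_multiply(200, 123, 8))  # → (24600, 4)
```

Eight partial products need only four carry-save levels, and the number of levels grows roughly logarithmically with width, instead of the seven chained additions a naive shift-and-add would use.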
12
Adder Example
Carry look-ahead adder
    module Add_prop_gen (sum, c_out, a, b, c_in);
      // generic 4-bit carry look-ahead adder, behavioral model
      output [3:0] sum;
      output       c_out;
      input  [3:0] a, b;
      input        c_in;

      reg  [3:0] carrychain;
      wire [3:0] g = a & b;   // carry generate (continuous assignment, bitwise AND)
      wire [3:0] p = a ^ b;   // carry propagate (continuous assignment, bitwise XOR)

      // Sensitive to g and p rather than a and b, so the block always reads
      // updated generate/propagate values; this replaces the "#0" race workaround.
      always @(g or p or c_in)
      begin: carry_generation  // named block, so it can declare a local variable
        integer i;
        carrychain[0] = g[0] | (p[0] & c_in);
        for (i = 1; i <= 3; i = i + 1)
          carrychain[i] = g[i] | (p[i] & carrychain[i-1]);
      end

      wire [4:0] shiftedcarry = {carrychain, c_in};   // concatenation
      assign sum   = p ^ shiftedcarry[3:0];           // summation
      assign c_out = shiftedcarry[4];                 // carry-out bit select
    endmodule
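A bit-level Python model of Add_prop_gen can check the module's logic against ordinary integer addition for all 512 input combinations. This is a sketch of the logic values only (names mirror the Verilog; the helper names are hypothetical). Note that, as written, the loop computes each carry from the previous one, i.e. a rippling recurrence; whether synthesis flattens it into two-level lookahead logic depends on the tool.

```python
def _bit(x, i):
    return (x >> i) & 1


def add_prop_gen(a, b, c_in):
    """Bit-level model of Add_prop_gen: 4-bit a, b; 1-bit c_in -> (sum, c_out)."""
    g = [_bit(a, i) & _bit(b, i) for i in range(4)]   # carry generate
    p = [_bit(a, i) ^ _bit(b, i) for i in range(4)]   # carry propagate
    carrychain = [0] * 4
    carrychain[0] = g[0] | (p[0] & c_in)
    for i in range(1, 4):
        # Same recurrence as the Verilog for loop.
        carrychain[i] = g[i] | (p[i] & carrychain[i - 1])
    shiftedcarry = [c_in] + carrychain                # {carrychain, c_in}, index 0 = LSB
    total = sum((p[i] ^ shiftedcarry[i]) << i for i in range(4))
    return total, shiftedcarry[4]                     # (sum, c_out)


def check_add_prop_gen():
    """Exhaustively compare against integer addition."""
    for a in range(16):
        for b in range(16):
            for c_in in (0, 1):
                s, c_out = add_prop_gen(a, b, c_in)
                if (c_out << 4) | s != a + b + c_in:
                    return False
    return True


print(check_add_prop_gen())  # → True
```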
13
Parallelizing
Use hierarchy
• As in hardware, use parallel connections
• The + operator may be built as an RCA
• Example: carry lookahead adder (∨ denotes OR, juxtaposition denotes AND)
  $C_1 = G_0 \lor P_0 C_0$
  $C_2 = G_1 \lor P_1 G_0 \lor P_1 P_0 C_0$
  $C_3 = G_2 \lor P_2 G_1 \lor P_2 P_1 G_0 \lor P_2 P_1 P_0 C_0$
  $C_4 = G_3 \lor P_3 G_2 \lor P_3 P_2 G_1 \lor P_3 P_2 P_1 G_0 \lor P_3 P_2 P_1 P_0 C_0$
• Only 4 gate delays vs. 9 for a 4-bit RCA
• The only trade-off is more logic (more area)
  - Silicon has lots of real estate (to a point; think wiring)
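Each lookahead carry is a two-level AND-OR function of the G, P, and C0 bits, so no carry waits for another, whereas the ripple recurrence chains them. This Python sketch (function names hypothetical) writes the four equations out literally and confirms they match the ripple recurrence for every combination of inputs; it models logic values only, not gate delays.

```python
from itertools import product


def lookahead_carries(g, p, c0):
    """C1..C4 as flat sums of products, one equation per carry."""
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
          | (p3 & p2 & p1 & p0 & c0))
    return [c1, c2, c3, c4]


def ripple_carries(g, p, c0):
    """Reference: C(i+1) = G(i) | P(i) C(i), each carry waiting on the previous one."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


def equations_match():
    """Check both forms agree for every combination of G, P, and C0 bits."""
    for bits in product((0, 1), repeat=9):
        g, p, c0 = bits[:4], bits[4:8], bits[8]
        if lookahead_carries(g, p, c0) != ripple_carries(g, p, c0):
            return False
    return True


print(equations_match())  # → True
```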
14
If-Then-Else vs. Case
Use of the if statement
• Nested ifs introduce priority
• Case has no order
• Case and if can be mixed
Similar delay for all data signals:
    CASE state IS
      WHEN s1     => Z <= a;
      WHEN s2     => Z <= b;
      WHEN s3     => Z <= c;
      WHEN OTHERS => Z <= d;
    END CASE;
Speeds up the s2 path (data signal b):
    CASE state IS
      WHEN s1     => tmp <= a;
      WHEN s3     => tmp <= c;
      WHEN OTHERS => tmp <= d;
    END CASE;
    IF (state = s2) THEN
      Z <= b;
    ELSE
      Z <= tmp;
    END IF;
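The restructuring changes timing, not function: b moves to the final 2:1 mux while a, c, and d pass through the case logic first. Python cannot show the delay difference, but this sketch (names illustrative; "s4" is a hypothetical extra state standing in for WHEN OTHERS) checks that both descriptions select the same data in every state.

```python
def z_case(state, a, b, c, d):
    """Flat CASE: one selection, the same path for every data input."""
    return {"s1": a, "s2": b, "s3": c}.get(state, d)   # default = WHEN OTHERS


def z_mixed(state, a, b, c, d):
    """CASE without s2 builds tmp; a final 2:1 choice on state = s2 picks b or tmp."""
    tmp = {"s1": a, "s3": c}.get(state, d)   # WHEN OTHERS => tmp <= d
    return b if state == "s2" else tmp       # b passes through only this last mux


# "s4" is a hypothetical extra state standing in for WHEN OTHERS.
STATES = ("s1", "s2", "s3", "s4")
print(all(z_case(s, "a", "b", "c", "d") == z_mixed(s, "a", "b", "c", "d")
          for s in STATES))  # → True
```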
15
Order Dependency
Blocking vs. non-blocking assignments
• Use non-blocking assignments for sequential logic such as pipelining, and for modeling several mutually exclusive data transfers
• Blocking assignments within sequential processes may cause race conditions
• Non-blocking assignments are order independent
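The order dependency shows up clearly in a three-register shift pipeline inside one clocked always block. This Python sketch models only the update order within that block (register names q1, q2, q3 are hypothetical; the rest of Verilog's event scheduler is ignored): blocking assignments execute one after another, while non-blocking assignments sample every right-hand side before any register changes.

```python
def clock_blocking(regs, d):
    """q1 = d; q2 = q1; q3 = q2;  each statement sees values updated above it."""
    q1, q2, q3 = regs
    q1 = d
    q2 = q1
    q3 = q2
    return q1, q2, q3


def clock_blocking_reversed(regs, d):
    """q3 = q2; q2 = q1; q1 = d;  same statements, opposite order."""
    q1, q2, q3 = regs
    q3 = q2
    q2 = q1
    q1 = d
    return q1, q2, q3


def clock_nonblocking(regs, d):
    """q1 <= d; q2 <= q1; q3 <= q2;  every right-hand side sampled before any update."""
    q1, q2, q3 = regs
    return d, q1, q2


def run(clock, inputs):
    """Apply one clock edge per input value, starting from all-zero registers."""
    regs = (0, 0, 0)
    for d in inputs:
        regs = clock(regs, d)
    return regs


print(run(clock_blocking, [1, 2, 3]))           # → (3, 3, 3): pipeline collapsed
print(run(clock_blocking_reversed, [1, 2, 3]))  # → (3, 2, 1)
print(run(clock_nonblocking, [1, 2, 3]))        # → (3, 2, 1)
```

The blocking version gives a correct pipeline only when the statements happen to be written in reverse order; the non-blocking version is correct in any order.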