Logical Effort- A Way of Designing Fast CMOS Circuits

EE 671: VLSI DESIGN COURSE for Master and Bachelor Class of 2010
Logical Effort: A Way of Designing Fast CMOS Circuits
Arun N. Chandorkar
Professor of Nano-Electronics and VLSI, Department of Electrical Engineering
Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra
6th September 2010

ACKNOWLEDGEMENTS
• Ivan Sutherland, Bob Sproull and D. Harris for their great "Logical Effort". I salute them.
• D. Harris and B. Murmann of Stanford University for their course slides.
• J. Rabaey, A. Chandrakasan and Nikolic for their "Effort" in Chapter 13 of the book "Digital Integrated Circuits Design" (Prentice-Hall).
• My former students P. Agashe, Kapil Jain and Prakash Guwalani for help in PPT preparation.
• Many postgraduate students of my VLSI Design course in the last 20 years, whose queries resulted in my "Fundas"* improving.
* "Fundamentals, that is, basics"

OUTLINE
• Introduction
• Logical effort estimation for gates
• "Forks" - Amplifier chain of stages
• Branches - Designing the circuits with them
• Logical effort for circuit families
• Concluding remarks on "Logical Effort"

D.Harris

Evaluation of Total Load Capacitance

Switching Response of CMOS Inverter
Rabaey’s Book

Weak NMOS & Strong PMOS Weak PMOS & Strong NMOS

CMOS Inverter Switching Characteristics
Rabaey’s Book

Definitions for Dynamic response of an Inverter
• Switching speed: limited by the time taken to charge and discharge CL
• Rise time, tr: time for the waveform to rise from 10% to 90% of its steady-state value
• Fall time, tf: time for the waveform to fall from 90% to 10% of its steady-state value
• Delay time, td: time difference between the input transition (50%) and the 50% output level

The Propagation Delay
The propagation delay is taken as the average of the two delays (output falling and output rising).
The speed (maximum frequency) of an inverter is approximately f = 1 / (propagation delay).

The propagation delays if calculated as indicated before turn out to be, Here kn and kp are same as beta of n and p transistors

Optimizing Delay
Optimizing delay can be broken into two categories: gate size selection and transistor sizing.
• Gate size selection is done in a standard-cell design approach: the library offers cells with multiple drive strengths, and you pick the cell sizes that give the highest speed for the design. Current synthesis tools do a good job.
• Transistor sizing is done in a custom design: you size individual transistors during the design process to optimize delay. Quality depends on the individual designer, and some synthesis help is available. Iterating with simulation is a tempting option but can be time consuming.

Gate Size Selection
Many algorithms for gate size selection exist. One iterative approach is known as the Tilos algorithm.
Assumptions:
• The delay along a path of gates can be computed.
• Multiple gate sizes are available to choose from.
The algorithm will yield good results for a path delay.

Tilos Algorithm
[Figure: chain of four gates g1 to g4, all 1x, driving the load CL]
Step #1: Start with minimum gate sizes. Set current_gate to the last gate and driving_gate to current_gate - 1. Measure the delay and call it last_delay.
Step #2a: Increment the size of current_gate and compute delay_a.
Step #2b: Restore the current_gate size. Increment the size of driving_gate and compute delay_b.
Step #3: Compare delay_a and delay_b against last_delay. Whichever shows the greatest improvement, use this new gate configuration and set last_delay equal to the new delay.
Repeat Steps #2 and #3 until there is no further delay improvement. Then set current_gate to driving_gate and driving_gate to current_gate - 1, and repeat until all gates are sized. One exception: the first gate is considered a FIXED size, as in an input buffer.
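The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual Tilos implementation: it assumes a chain of inverters, a simple delay model in which each stage costs (load / size + p) delay units, and a fixed size increment. The names path_delay, tilos_size, step and p are hypothetical.

```python
def path_delay(sizes, c_load, p=1.0):
    """Chain delay in normalized units: each stage costs load/size + p (assumed model)."""
    loads = sizes[1:] + [c_load]
    return sum(load / size + p for size, load in zip(sizes, loads))


def tilos_size(n_gates, c_load, step=1.0):
    """Greedy sizing loop following Steps 1 to 3; gate 0 is the fixed input buffer."""
    sizes = [1.0] * n_gates                      # Step 1: minimum sizes
    last_delay = path_delay(sizes, c_load)
    current = n_gates - 1                        # start at the last gate
    while current >= 1:
        driving = current - 1
        while True:
            trials = []
            for idx in (current, driving):       # Steps 2a and 2b
                if idx == 0:
                    continue                     # never resize the first gate
                trial = list(sizes)
                trial[idx] += step
                trials.append((path_delay(trial, c_load), trial))
            best_delay, best_sizes = min(trials, key=lambda t: t[0])
            if best_delay >= last_delay:         # Step 3: stop when nothing improves
                break
            sizes, last_delay = best_sizes, best_delay
        current = driving                        # move one gate toward the input
    return sizes, last_delay


sizes, delay = tilos_size(4, c_load=64.0)
print(sizes, round(delay, 2))
```

A real sizing tool would use a characterized delay model and the discrete drive strengths available in the cell library rather than a fixed increment.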

Some Observations
To save execution time, you do not have to compute the entire path delay. It is enough to compute the changes in delay in a 'window' around the gate being sized.
[Figure: gate chain driving CL with one gate resized to 2x; delay changes are computed in the window around it]
Also, gate sizes do not have to be exact to get near-optimum delay. If the optimum gate size happens to be 2.5x, a choice of 2x or 3x will yield good results. This means that rough estimation of gate sizes or transistor sizes can often be satisfactory.

Rules of Thumb
• Keep fan-in low to keep the number of transistors in series low (for sub-micron, often <= 3).
• Keep fan-out < 5.
• Along a critical path, the minimum delay is achieved if each stage delay is about equal.
• Keep rise/fall times about equal.

Estimating Gate Delay, Transistor Sizing
It would be nice to have a "back of the envelope" method of sizing gates/transistors that is easy to use and yields reasonable results. The Sutherland/Sproull/Harris book "Logical Effort: Designing Fast CMOS Circuits" introduces such a method, called "Logical Effort". We will attempt to apply this method during this course to the circuits that we will look at, starting with static CMOS.

Gate Delay Model
Delay will always be normalized to dimensionless units to isolate the effects of the fabrication process:
dabs = d * τ
where τ (tau) is the delay of a minimum-sized inverter driving an identical inverter with no parasitics. Tau is NOT the no-load delay of an inverter. Also, it is not the measured delay of a 1x inverter driving a 1x inverter, since that includes the delay contribution due to parasitics.
The delay of a logic gate is composed of the parasitic delay p (no-load delay) and the delay due to the load (effort delay or stage effort f):
d = f + p

Logical Effort, Electrical Effort
The stage effort f (delay due to load) can be expressed as a product of two terms:
f = g * h
So the delay is
dabs = (f + p) * τ = (g*h + p) * τ
g captures properties of the logic gate and is called the logical effort.
h captures properties of the load and is called the electrical effort.
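As a small numeric illustration of d = g*h + p, the sketch below evaluates the stage delay for an inverter. The inverter values g = 1 and p = 1 are the usual textbook values; the figure of 10 ps for τ is a made-up placeholder, not a measured process value, and the function names are hypothetical.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = g*h + p (dimensionless)."""
    return g * h + p


def absolute_delay(g, h, p, tau):
    """Absolute delay dabs = d * tau, in the same time unit as tau."""
    return stage_delay(g, h, p) * tau


TAU_PS = 10.0  # hypothetical tau in picoseconds, for illustration only

print(stage_delay(1.0, 1.0, 1.0))             # inverter driving an identical inverter
print(stage_delay(1.0, 4.0, 1.0))             # inverter driving four times its input cap
print(absolute_delay(1.0, 4.0, 1.0, TAU_PS))  # same stage in picoseconds
```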

RC Model versus Logical Effort Model
On the surface, this does not look different from the model discussed earlier:
Logical effort: dabs = (g*h + p) * τ
Previous RC model: gate delay = K * Cload + no-load delay
where K represented the pull-up/pull-down strength of the PMOS/NMOS tree. It would help to see how the RC model can be used to derive the logical effort model.

Derivation of Logical Effort Equations via RC model
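Logic Gate Model
[Figure: logic gate model with input capacitance Cin, pull-up resistance Rui, pull-down resistance Rdi, parasitic capacitance Cpi at the output, and load capacitance Cout]
Rui: pull-up resistance
Rdi: pull-down resistance
Cpi: parasitic capacitance of the gate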

Tau - Unit Delay
Tau (τ) is the absolute delay of a 1x inverter driving a 1x inverter with no parasitics. We assume equal pull-up/pull-down resistance Rinv, and Cin = Cinv, so:
τ = κ * Rinv * Cinv
where κ is a constant characteristic of the fabrication process that relates RC time constants to delay.
Note: Tau is NOT the no-load delay of an inverter. Also, it is not the measured delay of a 1x inverter driving a 1x inverter, since that includes the parasitic delay! This means that Tau cannot be determined from one delay measurement.

Template Circuit
A template circuit is chosen as the basis upon which other gates are scaled. The scaling factor is α.
Ct is the input capacitance of the template.
Rt is the pull-up or pull-down resistance of the template.
Cpt is the parasitic capacitance of the template.
Cin = α * Ct (input capacitance scales up)
Ri = Rui = Rdi = Rt / α (channel resistance scales down)
Cpi = α * Cpt (parasitics scale up)

RC Delay Method (J. Rabaey et al.)
dabs = κ Ri (Cout + Cpi)
     = κ (Rt/α) Cin (Cout/Cin) + κ (Rt/α) (α Cpt)
     = (κ Rt Ct) (Cout/Cin) + κ Rt Cpt
Written in this form, the relation to the logical effort model is visible:
dabs = τ (g h + p)
τ = κ Rinv Cinv (previous definition)
g = (Rt Ct) / (Rinv Cinv). Note: if the template is the 1x inverter, then g = 1.
h = Cout / Cin
p = (Rt Cpt) / (Rinv Cinv). Note: the book value of pinv = 1 is only true if Cpt (parasitics) = Cinv (gate capacitance).
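The equivalence between the RC form and the logical effort form can be checked numerically. The sketch below is only a sanity check under the scaling rules of the template circuit (Cin = α Ct, Ri = Rt/α, Cpi = α Cpt); all component values are arbitrary made-up numbers and the function names are hypothetical.

```python
import math


def rc_delay(kappa, r_t, c_pt, alpha, c_out):
    """dabs = kappa * Ri * (Cout + Cpi) for a template scaled by alpha."""
    r_i = r_t / alpha          # channel resistance scales down
    c_pi = alpha * c_pt        # parasitic capacitance scales up
    return kappa * r_i * (c_out + c_pi)


def logical_effort_delay(kappa, r_t, c_t, c_pt, alpha, c_out, r_inv, c_inv):
    """dabs = tau * (g*h + p) using the definitions of tau, g, h and p."""
    tau = kappa * r_inv * c_inv
    g = (r_t * c_t) / (r_inv * c_inv)
    h = c_out / (alpha * c_t)  # Cin = alpha * Ct
    p = (r_t * c_pt) / (r_inv * c_inv)
    return tau * (g * h + p)


# Arbitrary illustrative values (not taken from any real process).
values = dict(kappa=0.7, r_t=12.0, c_pt=3.0, alpha=2.5, c_out=40.0)
d_rc = rc_delay(**values)
d_le = logical_effort_delay(c_t=4.0, r_inv=10.0, c_inv=3.0, **values)
print(d_rc, d_le, math.isclose(d_rc, d_le))
```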

“Forks” – Amplifier Chain of stages Branches –Designing the circuits with them. Design for Asymmetric Logic Gates. Logical effort for Circuit families Concluding Remarks on “Logical Effort”

Logical Effort inverter vs. nor2
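[Figure: inverter (PMOS W=2, NMOS W=1) next to a 2-input NOR with inputs A, B and output Y (series PMOS W=4, parallel NMOS W=1); NOR2 g = 5/3]
Intuitive result: g for Nor2 is higher than g for Nand2.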

Logical Effort inverter vs. Complex gate
[Figure: inverter (PMOS W=2, NMOS W=1, g = 1) next to a complex gate with inputs A, B, C and output Y; g(a) = 4/3, g(b,c) = 2]
Intuitive result: the worst-case g of the complex gate is higher than that of Nand2 or Nor2. In general, the more inputs and the more series transistors, the higher the g value.

Logical Effort vs. Electrical Effort
• The value of the logical effort g depends on which gate is chosen as the template gate (g = 1). Choosing a different template gate will alter the g values of the other gates in your library.
• The g value captures the effects of the number of inputs and the transistor topology of gates that are more complex than the template gate.
• More complex gates require more logical effort to produce the same output current as the template gate, and also present a higher input load.
• The logical effort of a 1x Nand2, 2x Nand2 and 4x Nand2 is the same. The effect of the extra load presented by the larger transistors is captured by the electrical effort parameter.

Logical Effort vs. Electrical Effort
• The electrical effort parameter h captures the driving capability of the gate via transistor sizing, and also the effect of transistor sizes on loading.
• Electrical effort is defined as h = Cout / Cin, where Cout is the load capacitance and Cin is the input capacitance of the gate.
• Note that h for a gate reduces as its transistors become wider, since Cin increases (Cout is assumed fixed).

The Parasitic Delay p
Note that the parasitic (no-load) delay p is a constant, independent of transistor size: as you increase the transistor sizes, the capacitance of the gate/source/drain areas also increases, which keeps the no-load delay constant.
To measure p (once p is known, τ can be computed):
Method #1
[Figure: 1x inverter A driving 1x inverter B; 1x inverter C driving 2x inverter D]
A_delay = (g*h + p) * τ = (1*1 + p) * τ, so τ = A_delay / (1 + p)
C_delay = (g*h + p) * τ = (1*2 + p) * τ, so C_delay = (2 + p) * A_delay / (1 + p)
p = (2*A_delay - C_delay) / (C_delay - A_delay)
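Method #1 reduces to two lines of arithmetic. The sketch below applies it to a pair of made-up delay measurements (20 ps and 30 ps, chosen so that the answer is p = 1 and τ = 10 ps); the function name and the numbers are illustrative assumptions.

```python
def p_and_tau_from_two_delays(a_delay, c_delay):
    """Solve A_delay = (1 + p)*tau and C_delay = (2 + p)*tau for p and tau."""
    p = (2.0 * a_delay - c_delay) / (c_delay - a_delay)
    tau = a_delay / (1.0 + p)
    return p, tau


# Hypothetical measurements in picoseconds: h = 1 stage and h = 2 stage.
p, tau = p_and_tau_from_two_delays(a_delay=20.0, c_delay=30.0)
print(p, tau)
```

Note that tau also equals C_delay - A_delay, the extra delay caused by one additional unit of electrical effort.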

Logical Effort (g)
In the Sutherland/Sproull model, the logical effort g is normalized to a minimum-sized inverter for static CMOS, so g for an inverter is equal to 1. The logical effort g of other gates represents how much more input capacitance a gate must present to produce the same output current as the inverter (the template gate).
[Figure: inverter with PMOS W=2 and NMOS W=1, g = 1; 2-input NAND with inputs A, B and output Y, all transistors W=2, g = 4/3]
g = Cin(nand) / Cin(inv)


Method #2: A Better Way to Measure p and Tau
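[Figure: chain of 1x inverters feeding the DUT (1x), which drives g3 (?x), followed by g4 and a fixed end load]
• The leading 1x stages provide realistic waveform shaping.
• g3 is varied over 1x, 2x, 4x, 6x, 8x; the Cload ratio is G3/DUT.
• Delay is measured from A to B.
• The fixed end load prevents the Miller effect on G3.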

Plot Delay, Fit to Straight line (delay = mX + b)
No load delay
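The straight-line fit of Method #2 can be sketched with an ordinary least-squares fit. The sketch assumes the DUT is an inverter (g = 1), so that the slope m equals τ and the intercept b equals p*τ; the delay samples are fabricated for illustration (they lie exactly on delay = 10*h + 10), and fit_line is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


load_ratios = [1.0, 2.0, 4.0, 6.0, 8.0]     # h = size of g3 relative to the DUT
delays_ps = [20.0, 30.0, 50.0, 70.0, 90.0]  # made-up measurements in picoseconds

m, b = fit_line(load_ratios, delays_ps)
tau = m        # slope = g * tau, with g = 1 assumed for an inverter DUT
p = b / m      # intercept = p * tau is the no-load delay
print(tau, p)
```

With real simulation data the points will not be perfectly collinear, which is why fitting several load ratios is preferable to the two-point Method #1.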

n-way Multiplexer Total logical effort : Logical effort per data input :

Xor Gate Total Logical effort : Logical effort per input : 2
Logical effort per input bundle :

Generalized Xor/Parity Gates
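The previous gate can be generalized to n inputs:
• 2^(n-1) pull-down chains of n transistors in series, each transistor of width n
• 2^(n-1) pull-up chains of n transistors in series, each transistor of width 2n
• Total logical effort = n^2 * 2^(n-1)
• Logical effort per input = n * 2^(n-2)
• Logical effort per input bundle = n * 2^(n-1)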

Generalized Xor/Parity Gates
e.g. for 3 inputs Total Logical effort = 36 Effort per bundle = 12
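The 3-input figures above can be reproduced by counting transistor width. The sketch assumes the usual structure for this gate, which is not spelled out in the surviving slide text: 2^(n-1) pull-down chains and 2^(n-1) pull-up chains of n series transistors, NMOS width n, PMOS width 2n, and a template inverter with total input width 3. The function name is hypothetical.

```python
def xor_logical_effort(n):
    """Return (total logical effort, effort per input bundle) of an n-input XOR/parity gate."""
    chains = 2 ** (n - 1)             # assumed number of pull-down (and pull-up) chains
    nmos_width = n                    # n series NMOS, each widened by n
    pmos_width = 2 * n                # assumed PMOS width for a 2:1 P/N ratio
    total_width = chains * n * (nmos_width + pmos_width)
    total = total_width // 3          # inverter input width is 1 + 2 = 3
    per_bundle = total // n           # each bundle (true and complement) shares equally
    return total, per_bundle


for n in (2, 3, 4):
    print(n, xor_logical_effort(n))
```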

Asymmetric Design with reduced logical effort
Total Logical effort = 24 Effort for bundle a = 6 Effort for bundle b = 12 Effort for bundle c = 6

Majority Gate Symmetric Design Total logical effort :
Logical effort per input = 4 Symmetric 3 input majority gate

Majority Gate Asymmetric Design Total Logical effort = 10
Logical effort for a = 2 Logical effort for b = 4 Logical effort for c = 4 Asymmetric 3 input majority gate

One stage of a ripple carry chain in an adder Total Logical effort = Logical effort for Effort for input g = Effort for input

Dynamic Latch Total Logical effort = 4
Logical effort per input for d =2 Logical effort of bundle

Dynamic Muller C-element
Total Logical effort Logical effort per input = n

Delay Estimation in a Path

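[Figure: N-stage ring of inverters, each with Cout/Cin = 1]
Stage delay: d = gh + p = 1x1 + 1 = 2. Oscillation frequency: 1 / (2 * d * τ * N)

[Figure: inverter with Cout/Cin = 4]
Stage delay: d = gh + p = 1x4 + 1 = 5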

Miscellaneous Comments
Note that you never size the first gate. Its size is assumed to be fixed (same as in the Tilos algorithm). If you were allowed to size this gate, you would find that the algorithm wants to make it as large as possible. This is an estimation algorithm. The author claims that sizing a gate 1.5x too big or too small still results in a path delay within 5% of the minimum.
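The 5% claim can be checked on a small example. The sketch assumes a three-inverter chain with input capacitance 1 and load 64 (so the optimum sizes are 1, 4, 16 with stage effort 4) and p = 1 per stage; these numbers and the helper chain_delay are illustrative assumptions.

```python
def chain_delay(sizes, c_load, p=1.0):
    """Inverter-chain delay in units of tau: sum of (load/size + p) over the stages."""
    loads = sizes[1:] + [c_load]
    return sum(load / size + p for size, load in zip(sizes, loads))


C_LOAD = 64.0
d_opt = chain_delay([1.0, 4.0, 16.0], C_LOAD)          # every stage has effort 4
d_big = chain_delay([1.0, 4.0 * 1.5, 16.0], C_LOAD)    # middle gate 1.5x too big
d_small = chain_delay([1.0, 4.0 / 1.5, 16.0], C_LOAD)  # middle gate 1.5x too small

print(d_opt, d_big / d_opt, d_small / d_opt)
```

In this example both mis-sized chains come out roughly 4.4% slower than the optimum.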

“Forks” – Amplifier Chain of stages Branches –Designing the circuits with them. Logical effort for Circuit families Concluding Remarks on “Logical Effort”

Forks
Applying logical effort to branching circuits is difficult.
A common case of branching is generating the true and complement forms of a signal. Such circuits are called forks. For example, multiplexer and XOR circuits require complementary signals, so they need a fork.
[Figure: a general fork circuit; one leg inverts, the other does not]

The fork circuit form
A fork consists of two strings of inverters that share a common input. One string contains an odd number of inverters and the other an even number.
[Figure: a 2-1 fork and a 3-2 fork]

A general fork with load capacitance
Cout = Ca + Cb. The input capacitance is also divided into 2 parts, one for each path: Cin = Cina + Cinb.
Total electrical effort: H = Cout/Cin
Individual electrical efforts: Ha = Ca/Cina and Hb = Cb/Cinb, respectively.
Even if Ca = Cb, Ha and Hb may not be equal.

An example
Design a 2-1 fork with input capacitance Cin = 10 and total output capacitance Cout = 200. What is the total delay of the fork?

Solution
We know the delay in a path is D = N(F)**(1/N) + P.
We have Cin = 10 and Ca = Cb = 100. Let β be the fraction of the input capacitance given to the path with 2 inverters, chosen such that the delay in both legs is equal. The equation can be written as
2(100/(10β))**(1/2) + 2 = 100/(10(1 - β)) + 1
Solving the equation gives β = 0.258, so Cina = 10β = 2.6 and Cinb = 10(1 - β) = 7.4. The delay will be 14.5 units and will be equal for both legs.

If we use a 3-2 fork, then the equation will be
3(100/(10β))**(1/3) + 3 = 2(100/(10(1 - β)))**(1/2) + 2
This equation gives β = 0.513 and a delay of 11.1; that is, a 3-2 fork is better than a 2-1 fork for this electrical effort. Now the question is: what is the best number of stages for a given electrical effort?
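Both fork examples can be solved numerically by bisection on β. The sketch below assumes every stage is an inverter with g = 1 and p = 1, so each leg has delay N*(Cload/Cin_leg)**(1/N) + N; the function names and the bisection tolerance are illustrative choices.

```python
def leg_delay(n_stages, c_in, c_load, p_inv=1.0):
    """Delay of one leg: D = N * F**(1/N) + P with F = c_load / c_in and P = N * p_inv."""
    effort = c_load / c_in
    return n_stages * effort ** (1.0 / n_stages) + n_stages * p_inv


def solve_fork(n_a, n_b, c_in, c_a, c_b, tol=1e-9):
    """Find the fraction beta of c_in given to leg a that equalizes the two leg delays."""
    def mismatch(beta):
        return leg_delay(n_a, beta * c_in, c_a) - leg_delay(n_b, (1.0 - beta) * c_in, c_b)

    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:   # leg a is still slower: give it more input capacitance
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, leg_delay(n_a, beta * c_in, c_a)


beta_21, delay_21 = solve_fork(2, 1, c_in=10.0, c_a=100.0, c_b=100.0)
beta_32, delay_32 = solve_fork(3, 2, c_in=10.0, c_a=100.0, c_b=100.0)
print(round(beta_21, 3), round(delay_21, 1))
print(round(beta_32, 3), round(delay_32, 1))
```

Bisection works here because the delay mismatch decreases monotonically as β grows: leg a speeds up while leg b slows down.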

How many stages should be used?
The following table gives the fork circuit form to be used for a given electrical effort.

Electrical Effort       Fork Structure
From        To
            9.68        2-1
9.68        38.7        3-2
38.7        146         4-3
146         538         5-4
538         1970        6-5
1970        7150        7-6

The listed electrical efforts are the breakpoints at which two circuit forms provide the same delay.

Summary
• Draw a network.
• Buffer non-critical paths with minimum-sized gates.
• Estimate the total effort along each path.
• Verify the number of stages.
• Estimate the branching ratio.
• Compute accurate delays, including parasitic effects.
• Adjust the branching ratio to minimize these delays.

Designing Asymmetric Logic Gates
If s = 1/2, the gate is symmetric. If 0 < s < 1/2, it favors input a. If 1/2 < s < 1, it favors input b.
For s = 0.01: ga ≈ 1, gb = 34, gtot = 35

Less extreme (more practical) asymmetric circuits
E.g. for s = 1/4: pull-down transistor widths = 4/3 and 4. Similarly, gb = 2 and gtot = 3.1, a little more than the 8/3 of the symmetric gate.
Stray capacitances must also be taken into account: order the transistors so that the smaller transistors are nearer the output node, which favors input a.
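The numbers for s = 1/4 and s = 0.01 can be reproduced with a short calculation. The sketch assumes an asymmetric 2-input NAND whose series pull-down widths are 1/(1 - s) for input a and 1/s for input b, with pull-up width 2 on each input and a template inverter of total input width 3. These width formulas are an assumption chosen to match the 4/3 and 4 quoted above, and the function name is hypothetical.

```python
from fractions import Fraction


def asymmetric_nand2_effort(s):
    """Return (ga, gb, gtot) for an asymmetric 2-input NAND with symmetry factor s."""
    s = Fraction(s)
    width_a = 1 / (1 - s)          # assumed NMOS width on input a
    width_b = 1 / s                # assumed NMOS width on input b
    pullup = 2                     # assumed PMOS width on each input
    g_a = (width_a + pullup) / 3   # normalize to the inverter input width of 3
    g_b = (width_b + pullup) / 3
    return g_a, g_b, g_a + g_b


for s in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 100)):
    g_a, g_b, g_tot = asymmetric_nand2_effort(s)
    print(s, float(g_a), float(g_b), round(float(g_tot), 2))
```

The s = 1/2 case gives back the symmetric NAND with g = 4/3 per input, which is a useful consistency check on the assumed formulas.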

“Forks” – Amplifier Chain of stages Branches –Designing the circuits with them. Logical effort for Circuit families Concluding Remarks on “Logical Effort”

Circuit Families Pseudo NMOS Inverter Nand Nor

For the falling transition, the current drive of the NMOS is 4/3 times that of the normal inverter, but the PMOS opposes it with 1/3 of the current drive, making the falling delay the same as for the normal inverter, so gd = 4/9. For the rising transition, the delay will be three times that of the normal inverter, so gu = 4/3. The average is g = 8/9 for the inverter.
The table below gives the logical efforts of various gates in the pseudo-NMOS logic style.

GATE        Logical Effort g
            Rising      Falling     Average
2-Nand      8/3         8/9         16/9
3-Nand      4           4/3
4-Nand      16/3                    32/9
n-Nor                   4/9
n-Mux

Symmetric NOR gate
Johnson proposed a novel structure for a 2-input NOR, shown in the figure. The gate consists of two inverters with shorted outputs, ratioed such that an inverter pulling down can overpower an inverter pulling up.
The falling logical effort is gd = 2/3.
For the rising transition there are 2 PMOS in parallel, giving gu = 1.
So the average logical effort is g = 5/6.

Dynamic logic gates
Problems with Pseudo-NMOS:
• Quiescent power dissipation.
• Contention between the pull-up and pull-down transistors.
Dynamic gates offer even better logical effort and lower power consumption by using a clocked precharge transistor instead of a pull-up that is always conducting.
There are 2 phases of logic evaluation in dynamic logic gates:
1. Precharge: the output node is precharged to logic 1 when the clock is low.
2. Evaluation: the logic is evaluated by enabling the pull-down network when the clock is high.

Logical effort of Domino circuits
Gates with evaluation transistor Inverter Nand Nor Gates without evaluation transistor Inverter Nand Nor

Table for logical effort of dynamic gates

Gate Type      Clocked evaluation transistor?    Formula     n=2    n=3    n=4
Inverter       Yes                               2/3
               No                                1/3
Nand           Yes                               (n+1)/3     1      4/3    5/3
               No                                n/3
Nor
Multiplexer

“Forks” – Amplifier Chain of stages Branches –Designing the circuits with them. Design for Asymmetric Logic Gates. Logical effort for Circuit families Concluding Remarks on “Logical Effort”

Insights from logical effort
• Logical effort allows alternative circuit topologies to be compared.
• Circuits are fastest when the effort delay of each stage is the same.
• One should select the number of stages to make this effort about 4.
• Path delay is very insensitive to modest deviations from the optimum, thus allowing "back of an envelope" calculations.

• The delay of a well-designed path is 4 (log4 G + log4 H) + P.
• The logical effort of a gate increases as the number of inputs grows.
• In practice, limit logic gates to 4 series transistors and multiplexers to 4 inputs; beyond this, split gates into multiple stages.

• Branched circuits should differ by not more than one gate between the branches.
• It is better to use 1-2 or 2-3 forks instead of 0-1 forks.
• The best P/N ratio is about the square root of the ratio that gives equal rising and falling delays. A P/N ratio of 1.5 works well for virtually all processes.
• Logical effort quantifies the benefits of different circuit families.
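The stage-effort-of-4 guideline can be illustrated by sweeping the number of stages. The sketch assumes a path made of inverters only (p = 1 per stage), so that D = N*F**(1/N) + N, and uses a path effort of 64 as an arbitrary example; the function names are hypothetical.

```python
def path_delay(path_effort, n_stages, p_stage=1.0):
    """D = N * F**(1/N) + P, with P = N * p_stage (all stages assumed to be inverters)."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_stage


def best_stage_count(path_effort, max_stages=20, p_stage=1.0):
    """Return the stage count N in 1..max_stages that minimizes the path delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: path_delay(path_effort, n, p_stage))


F = 64.0
n_best = best_stage_count(F)
print(n_best, path_delay(F, n_best), F ** (1.0 / n_best))
```

For F = 64 the best choice is 3 stages, each with effort 4, and the delay matches 4*log4(F) + P = 12 + 3 units.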

THANK YOU
