COMBINATIONAL LOGIC - 1
Overview
Combinational vs. Sequential Logic
Static CMOS Circuit
At every point in time (except during switching transients), each gate output is connected to either VDD or VSS via a low-resistance path. The outputs of the gates assume at all times the value of the Boolean function implemented by the circuit (again ignoring transient effects during switching). This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes.
Static CMOS
NMOS Transistors in Series/Parallel Connection
A transistor can be thought of as a switch controlled by its gate signal.
An NMOS switch closes when the switch control input is high.
PMOS Transistors in Series/Parallel Connection
Pull-Up and Pull-Down with NMOS and PMOS
Complementary CMOS Logic Style Construction
Example Gate: NAND
Example Gate: NOR
Example Gate: COMPLEX CMOS GATE
Cell Design
Standard cells: general-purpose logic; can be synthesized; same height, varying width.
Datapath cells: for regular, structured designs (arithmetic); include some wiring in the cell; fixed height and width.
Standard Cell Layout Methodology – 1980s
[Figure: standard cell rows with VDD and GND rails, signals, and a routing channel. Contacts and wells not shown.]
1) Route VDD and GND horizontally.
2) Route signals in poly perpendicular to VDD and GND (vertically); poly can serve as the input to both nfets and pfets. Order the inputs (consistent Euler path) to optimize the horizontal connectivity of the diffusion strips. The goal is an unbroken row of devices with abutting source/drain connections, so that there is only one strip of diffusion in each well.
3) Place diffusion in horizontal strips.
4) Interconnect appropriately. Interconnect between cells is done in "routing channels".
What does this implement? (A NAND feeding an inverter, so an AND.)
Standard Cell Layout Methodology – 1990s
[Figure: mirrored cells sharing VDD and GND rails, with no routing channels; labels M2 and M3. Contacts and wells not shown.]
What does this implement?
Standard Cells
[Figure: standard cell with N well, VDD and GND rails, In and Out pins, and the cell boundary marked.]
Cell height is 12 metal tracks ("12 pitch").
A metal track is approximately 3λ + 3λ.
Pitch = repetitive distance between objects.
Rails are about 10λ.
Standard Cells
[Figure: two cell layouts, one with minimal diffusion routing and one with silicided diffusion; labels VDD, GND, In, Out.]
Standard Cells
2-input NAND gate
[Figure: layout with VDD and GND rails, inputs A and B, and output Out.]
Stick Diagrams
Contain no dimensions.
Represent the relative positions of transistors.
[Figure: stick diagrams of an inverter (In, Out) and a NAND2 (A, B, Out), each between VDD and GND.]
Two Versions of !(C • (A + B))
[Figure: two stick-diagram layouts between VDD and GND with output X.]
Line-of-diffusion layout: abutting source-drain connections.
Note that the crossover is eliminated by the A B C ordering.
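Logic Graph
X = !(C • (A + B))
[Figure: transistor schematic and logic graph of the PUN and PDN, with nodes VDD, X, GND, internal nodes i and j, and edges A, B, C.]
A systematic approach to derive the order of the input signal wires so that the gate can be laid out to minimize area.
Note that the PUN and PDN are duals (parallel <-> series).
Vertices are the nodes (signals) of the circuit (VDD, X, GND, and the internal nodes); edges are the transistors, labeled by their inputs.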
Euler Path & Consistent Euler Path
[Figure: logic graphs of the PUN and PDN with nodes VDD, X, GND, i, j and edges A, B, C.]
An Euler path is a path through all nodes in the graph such that each edge is visited once and only once.
The sequence of signals on the path is the signal ordering for the inputs.
PUN and PDN Euler paths must be consistent (same sequence).
If you can define a consistent Euler path, then you can generate a layout with no diffusion breaks.
PUN Euler paths: A B C; C A B; B C A (no PDN); B A C; A C B (no PDN); C B A.
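The search for a consistent ordering can be sketched in a few lines of Python. This is only an illustration: the edge lists below are an assumed encoding of the C, A, B example gate (A and B in parallel between X and i, C between i and GND in the PDN; the dual in the PUN), and the function names are hypothetical.

```python
from itertools import permutations

# Each network maps an input name to the two nodes its transistor connects.
# Which of A and B touches X in the PUN is an assumption; it does not change the result.
PDN = {"A": ("X", "i"), "B": ("X", "i"), "C": ("i", "GND")}
PUN = {"A": ("VDD", "j"), "B": ("j", "X"), "C": ("VDD", "X")}

def is_euler_path(order, network):
    """Return True if traversing the edges in `order` forms one continuous walk."""
    for start in network[order[0]]:
        node = start
        ok = True
        for name in order:
            u, v = network[name]
            if node == u:
                node = v
            elif node == v:
                node = u
            else:
                ok = False
                break
        if ok:
            return True
    return False

def consistent_orderings(pun, pdn):
    """List input orderings that are Euler paths in both networks."""
    return ["".join(p) for p in permutations(sorted(pun))
            if is_euler_path(p, pun) and is_euler_path(p, pdn)]

if __name__ == "__main__":
    print(consistent_orderings(PUN, PDN))
```

Every ordering is an Euler path of this PUN, but any ordering with C in the middle fails in the PDN, which is why those two entries are marked "no PDN" on the slide.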
OAI22 Logic Graph
X = !((A + B) • (C + D))
[Figure: transistor schematic and logic graph of the PUN and PDN with nodes VDD, X, GND and edges A, B, C, D, with a consistent Euler path marked.]
AOI22 Consistent Euler Path
(a) Logic graphs for !(AB + CD)
(b) Euler paths {a b c d}
(c) Stick diagram for ordering {A B C D}
[Figure labels: VDD, X, GND; inputs A, B, C, D.]
Multi-Fingered Transistors
[Figure: one finger versus two fingers (folded).]
Folding gives less diffusion capacitance.
Properties of Complementary CMOS Gates
Full rail-to-rail swing; high noise margins.
Logic levels do not depend on the relative device sizes; ratioless.
Always a path to VDD or GND in steady state; low output impedance.
Extremely high input resistance; nearly zero steady-state input current.
No direct path between power and ground in steady state; no static power dissipation.
Propagation delay is a function of the load capacitance and the resistance of the transistors.
The Switch Model
VTC of Complementary CMOS Gates
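Analysis of Propagation Delay
[Figure: switch model of a two-input NAND with inputs A and B, pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint, and load CL.]
Delay depends on the pattern of inputs.
Low-to-high transition:
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High-to-low transition:
  both inputs go high: delay is 0.69 (2 Rn) CL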
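The three cases can be tabulated with a short Python sketch. The resistance and capacitance values are hypothetical and chosen only for illustration, and Cint is ignored, as in the first-order expressions on the slide.

```python
def nand2_delays(r_p, r_n, c_load):
    """First-order delays (seconds) of a 2-input NAND, ignoring Cint."""
    return {
        # Both PMOS devices conduct in parallel: Rp/2
        "LH, both inputs go low": 0.69 * (r_p / 2) * c_load,
        # Only one PMOS conducts: Rp
        "LH, one input goes low": 0.69 * r_p * c_load,
        # Both NMOS devices conduct in series: 2*Rn
        "HL, both inputs go high": 0.69 * (2 * r_n) * c_load,
    }

if __name__ == "__main__":
    # Hypothetical device values, not taken from the slides
    for case, t in nand2_delays(r_p=20e3, r_n=10e3, c_load=10e-15).items():
        print(f"{case}: {t * 1e12:.1f} ps")
```

The worst-case rising delay (one input switching) is twice the best case, which is why sizing is normally done for the single-input transition.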
Delay Dependence on Input Patterns
[Plot: output voltage (V) versus time (ps) for input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1.]

Input data pattern   Delay (psec)
A=B=0->1             67
A=1, B=0->1          64
A=0->1, B=1          61
A=B=1->0             45
A=1, B=1->0          80
A=1->0, B=1          76

Assumptions: NMOS = 0.5 μm/0.25 μm, PMOS = 0.75 μm/0.25 μm, CL = 100 fF. Assumes worst case RPUP = RPDN, with RPDN = 2 x [1/2 x Rn(min)] and RPUP = 3 x [1/3 x Rn(min)].

Gate sizing should result in approximately equal worst-case rise and fall times.
The difference between the last two delays is due to the internal node capacitance of the pull-down stack. When A transitions, the pull-up only has to charge CL; when A=1 and B transitions, the pull-up has to charge both CL and Cint.
For high-to-low transitions (first three cases), the delay depends on the state of the internal node. The worst case happens when the internal node is charged up to VDD - VTn.
Conclusion: estimates of delay can be fairly complex; they have to consider internal node capacitances and the data patterns.
Transistor Sizing
[Figure: sized transistor schematic with inputs A and B, resistances Rn and Rp, internal capacitance Cint, and load CL; width labels 4, 2, 2, 1.]
Assumes Rp(min) = 2 x Rn(min).
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: gate schematic with inputs A, B, C, D; width labels 1, 2, 4, 8, 6, 12; annotations "Assuming Rp(min) = 2Rn(min)" and "Assuming Rp(min) = 3Rn(min)".]
Size for symmetrical response (dc, ac) and for performance.
For class lecture: red sizing assumes Rp = Rn. Follow the short path first. Note that the PMOS devices for C and B are 4 rather than 3: the average in the pull-up chain of three is (4 + 4 + 2)/3 ≈ 3.3, close to 3.
Also note the structure of the pull-up and pull-down networks, chosen to minimize diffusion capacitance at the output (e.g., a single PMOS drain connected to the output).
Green sizing is for symmetric response and for performance (where Rp = 3Rn).
Sizing rules of thumb: PMOS = 3 * NMOS; 1 in series = 1, 2 in series = 2, 3 in series = 3, etc.
Focus on the worst case; delay is input dependent.
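The series rule of thumb can be applied mechanically to a series-parallel network. The sketch below is illustrative only: the nested-tuple encoding and the function name are hypothetical, and it assumes that series elements split the path resistance equally. The pull-down network is D in parallel with A in series with (B parallel C); the pull-up network is its dual.

```python
# ("s", ...) is a series connection, ("p", ...) a parallel connection,
# and a plain string is a single transistor named by its input.
PDN = ("p", "D", ("s", "A", ("p", "B", "C")))
PUN = ("s", "D", ("p", "A", ("s", "B", "C")))

def size_network(net, unit_width=1.0, share=1.0, widths=None):
    """Assign widths so every conducting path matches one unit_width device.

    `share` is the fraction of the path resistance allotted to `net`.
    Series children split the share equally (an assumption); parallel
    children each get the full share, since in the worst case only one
    branch conducts.
    """
    if widths is None:
        widths = {}
    if isinstance(net, str):
        widths[net] = unit_width / share
        return widths
    kind, *children = net
    child_share = share / len(children) if kind == "s" else share
    for child in children:
        size_network(child, unit_width, child_share, widths)
    return widths

if __name__ == "__main__":
    print("NMOS:", size_network(PDN))
    print("PMOS, Rp = 2Rn:", size_network(PUN, unit_width=2.0))
    print("PMOS, Rp = 3Rn:", size_network(PUN, unit_width=3.0))
```

Under these assumptions the NMOS widths come out as 1 and 2, and the PMOS widths as 4 and 8 (Rp = 2Rn) or 6 and 12 (Rp = 3Rn), which are the same values as the width labels that survive from the slide figure.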
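4-input NAND Gate
[Figure: schematic and layout of a 4-input NAND with inputs A, B, C, D, output Out, VDD and GND rails, internal node capacitances C1, C2, C3, and load CL.]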
Fan-In Considerations
[Figure: pull-down chain with internal node capacitances C1, C2, C3 and load CL.]
Distributed RC model (Elmore delay): tpHL = 0.69 Reqn (C1 + 2C2 + 3C3 + 4CL)
Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
While the output capacitance makes a full-swing transition (from VDD to 0), the internal nodes only transition from VDD - VTn to GND.
C1, C2, C3 are on the order of 0.85 fF for W/L of 0.5/0.25 NMOS and 0.375/0.25 PMOS.
CL is 3.2 fF with no output load (all diffusion capacitance: the intrinsic capacitance of the gate itself).
This gives a tpHL of 80.3 psec (simulated as 86 psec).
tp as a Function of Fan-In and Fan-Out
Fan-in: quadratic due to increasing resistance and capacitance.
Fan-out: each additional fan-out gate adds two gate capacitances to CL.
tp = a1 FI + a2 FI^2 + a3 FO
The a1 term is for the parallel chain, the a2 term is for the serial chain, and the a3 term is for fan-out.
tp as a Function of Fan-In
[Plot: tp (psec) versus fan-in, with curves for tpHL (quadratic), tpLH, and tp.]
Gates with a fan-in greater than 4 should be avoided.
Fixed fan-out (NMOS 0.5 micron, PMOS 1.5 micron).
tpLH increases linearly due to the linearly increasing value of the diffusion capacitance.
tpHL increases quadratically due to the simultaneous increase in pull-down resistance and internal capacitance.
tp as a Function of Fan-Out
[Plot: tp (psec) versus effective fan-out for tpNOR2, tpNAND2, and tpINV.]
All gates have the same drive current.
The slope is a function of the driving strength.
Propagation Delay Analysis (Example)
Numerical Examples for 0.25 μm CMOS
All NMOS: W = 0.5 μm, L = 0.25 μm
RN = 13 kΩ / 2 = 6.5 kΩ
C1 = C2 = C3 = 0.85 fF
CL (with FO = 1) = 3.2 fF
tpHL = 0.69 (6.5 kΩ)(0.85 fF + 2 x 0.85 fF + 3 x 0.85 fF + 4 x 3.2 fF)
tpHL = 80 ps
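As a quick check of the arithmetic, the Elmore expression can be evaluated directly. This is a minimal sketch with a hypothetical function name; it uses the listed values RN = 6.5 kΩ, C1 = C2 = C3 = 0.85 fF, and CL = 3.2 fF, which reproduce the quoted result of about 80 ps.

```python
def elmore_tphl(r_eq, internal_caps, c_load):
    """tpHL = 0.69 * Reqn * (C1 + 2*C2 + ... + N*CL) for an N-transistor chain.

    internal_caps lists C1, C2, ... starting at the node nearest ground,
    so the k-th capacitance discharges through k series transistors.
    """
    weighted = sum((k + 1) * c for k, c in enumerate(internal_caps))
    weighted += (len(internal_caps) + 1) * c_load
    return 0.69 * r_eq * weighted

if __name__ == "__main__":
    t = elmore_tphl(6.5e3, [0.85e-15] * 3, 3.2e-15)
    print(f"4-input NAND tpHL = {t * 1e12:.1f} ps")

    # Sweep fan-in with the same per-node values to see the growth trend
    for fan_in in range(2, 9):
        t = elmore_tphl(6.5e3, [0.85e-15] * (fan_in - 1), 3.2e-15)
        print(fan_in, f"{t * 1e12:.1f} ps")
```

The sweep holds the per-transistor resistance and the capacitances fixed, which is a simplification; it is meant only to show how the weighted sum grows with the length of the chain.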
Fast Complex Gates: Design Technique 1
Transistor sizing, as long as fan-out capacitance dominates.
Progressive sizing.
[Figure: distributed RC line of series transistors M1, M2, M3, ..., MN with inputs In1, In2, In3, ..., InN, internal capacitances C1, C2, C3, and load CL.]
M1 > M2 > M3 > ... > MN (the fet closest to the output is the smallest).
M1 has to carry the discharge current from M2, M3, ..., MN and CL, so make it the largest.
MN only has to carry the discharge current from CL (no internal capacitances).
Can reduce delay by more than 20%; gains decrease as technology shrinks.
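Fast Complex Gates: Design Technique 2
Transistor ordering
The critical input is the latest arriving signal.
Place the latest arriving signal (critical path) closest to the output.
[Figure: two orderings of a three-transistor pull-down chain (M1, M2, M3; inputs In1, In2, In3; capacitances C1, C2, CL), with the critical input switching 0->1 and the other inputs at 1. With the critical input farthest from the output, CL, C1, and C2 are all charged, and the delay is determined by the time to discharge CL, C1, and C2. With the critical input closest to the output, C1 and C2 are already discharged, and the delay is determined by the time to discharge CL.]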
Transistor Sizing and Ordering
Fast Complex Gates: Design Technique 3
Alternative logic structures
F = ABCDEFGH
Reduced fan-in -> deeper logic depth.
The reduction in fan-in offsets, by far, the extra delay incurred by the NOR gate (second configuration).
Only simulation will tell which of the last two configurations is faster or lower power.
Fast Complex Gates: Design Technique 4
Isolating fan-in from fan-out using buffer insertion
[Figure: two configurations, each driving a load CL.]
Reduce CL on large fan-in gates, especially for large CL, and size the inverters progressively to handle CL more effectively.
Fast Complex Gates: Design Technique 5
Reducing the voltage swing
tpHL = 0.69 (3/4) (CL VDD) / IDSATn for full swing; with a reduced swing, VDD is replaced by Vswing:
tpHL = 0.69 (3/4) (CL Vswing) / IDSATn
Linear reduction in delay; also reduces power consumption.
But the following gate is much slower, or it requires the use of "sense amplifiers" on the receiving end to restore the signal level (memory design).
Example: Full Adder
[Figure: transistor-level schematic with inputs A, B, Cin, internal node X, and outputs Cout and S.]
Cout = Cin & (A | B) | (A & B)
Sum = !Cout & (A | B | Cin) | (A & B & Cin)
28 transistors
Static CMOS Full Adder Circuit
[Figure: schematic with inputs A, B, Cin and outputs !Cout, !Sum.]
!Cout = !Cin & (!A | !B) | (!A & !B)
!Sum = Cout & (!A | !B | !Cin) | (!A & !B & !Cin)
24 + 4 (for the Cout and Sum inverters) transistor full adder.
No more than 3 transistors in series.
Loads: A-8, B-8, Cin-6, !Cout-2.
Number of "gate delays" to Sum: 3?
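The adder equations and their complemented forms can be checked exhaustively over the eight input combinations. The sketch below is illustrative; the function names are hypothetical, and bits are modeled as the integers 0 and 1.

```python
from itertools import product

def full_adder(a, b, cin):
    """Sum and Cout from the equations on the slide."""
    cout = (cin & (a | b)) | (a & b)
    s = ((1 - cout) & (a | b | cin)) | (a & b & cin)
    return s, cout

def inverted_form(a, b, cin):
    """!Sum and !Cout from the complemented-input expressions."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    _, cout = full_adder(a, b, cin)
    not_cout = (nc & (na | nb)) | (na & nb)
    not_sum = (cout & (na | nb | nc)) | (na & nb & nc)
    return not_sum, not_cout

def check():
    """Compare both forms against integer addition for all inputs."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        assert 2 * cout + s == a + b + cin
        assert inverted_form(a, b, cin) == (1 - s, 1 - cout)
    return True

if __name__ == "__main__":
    print("equations consistent:", check())
```

The second assertion shows the self-dual property of the adder: complementing all inputs complements both outputs, so the pull-up network can use the same topology as the pull-down network.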
A Revised Adder Circuit
Project 1
Design of Clock Driver Network
Goal: reduce the energy per transition under the given timing constraints:
· trise and tfall < 1000 psec
· tskew < 50 psec
The following design parameters are also given:
· Vsupply = 2.5 V
· trise and tfall of Clkin = 0.5 nsec
· fclk = 250 MHz (Tclk = 4 nsec)
· Use the scmos.mod technology
· Ignore wiring capacitance
Clock Skew
fclk can be as high as 2.5 GHz.
The arrival time of the clock edge depends on the position on the chip.
Clock Skew Formulation
Maximum clock skew is determined by the minimum delay between latches.
Minimum clock period is determined by the maximum delay between latches.
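A deliberately simplified sketch of these two bounds follows. It assumes ideal latches, so setup time, hold time, and clock-to-output delay are all ignored, and the function names and delay values are hypothetical.

```python
def max_clock_skew(min_delay):
    """Largest tolerable skew: data on the fastest path between latches
    must not arrive before the receiving latch sees its clock edge."""
    return min_delay

def min_clock_period(max_delay, skew=0.0):
    """Shortest period: the slowest path between latches, plus worst-case
    skew, must fit within one clock cycle."""
    return max_delay + skew

if __name__ == "__main__":
    # Hypothetical path delays, in seconds
    fastest, slowest = 200e-12, 3.5e-9
    skew = 50e-12
    print("skew ok:", skew < max_clock_skew(fastest))
    print(f"min period: {min_clock_period(slowest, skew) * 1e9:.2f} ns")
```

A real timing check would add the latch setup, hold, and propagation terms to both inequalities; they are left out here to keep the relation between skew and path delay visible.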
Clock Distribution
H-tree network.
Only relative skew is important.
Clock Distribution
Buffers reduce absolute delay and make power-down easier, but the network is sensitive to variations in buffer delay.
Clock Distribution in Alpha Processor
Project Report
Be concise and to the point (limit of 4 pages).
Demonstrate clearly that your claims are true.
Express your motivations and your reasoning.
Make sure to make it quantitative.
Be honest: we will check your SPICE files and run them!
Project Report
Do not start with "optimization by simulation". Think through the problem first and build a first-order analytical model to start; provide quantitative estimates of performance.
Comment on the approach and important design decisions.
Include sized transistor schematics.
Explain why simulated results may differ from estimates.
References
Chapters 5 & 9: study the relevant sections.
Chapter 10: study Section 10.3.3.