Published by Gabriel Bailey. Modified over 6 years ago.
1
CMOS VLSI Design
Chapter 9: Combinational Circuits
Chapter 10: Sequential Circuits
This work is protected by Canadian copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the Internet) will destroy the integrity of the work and is not permitted. The copyright holder grants permission to instructors who have adopted the textbook accompanying this work to post this material online only if the use of the website is restricted by access codes to students in the instructor's class that is using the textbook and provided the reproduced material bears this copyright notice.
Slides from David Harris, adapted by Duncan Elliott
Textbook: CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, N. H. E. Weste & D. Harris
Chapters 9,10
2
Combinational Logic
  Bubble Pushing
  Compound Gates
  Logical Effort Example
  Input Ordering
  Asymmetric Gates
  Skewed Gates
  Best P/N Ratio
3
Example 1
module mux(input s, d0, d1, output y);
  assign y = s ? d1 : d0;
endmodule
1) Sketch a design using AND, OR, and NOT gates.
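One possible answer to Example 1 is Y = (D0 AND NOT S) OR (D1 AND S). The sketch below is a minimal check of that candidate against the behavioral description, modeling signals as 0/1 integers. The function names are hypothetical and are not part of the original slides.

```python
from itertools import product


def mux_and_or_not(s: int, d0: int, d1: int) -> int:
    """2:1 mux built only from AND, OR, and NOT on 0/1 values."""
    not_s = 1 - s                    # NOT gate on the select input
    return (d0 & not_s) | (d1 & s)   # two AND gates feeding one OR gate


def mux_reference(s: int, d0: int, d1: int) -> int:
    """Behavioral model matching `assign y = s ? d1 : d0`."""
    return d1 if s else d0


# Exhaustively compare the gate-level sketch with the behavioral model.
for s, d0, d1 in product((0, 1), repeat=3):
    assert mux_and_or_not(s, d0, d1) == mux_reference(s, d0, d1)
```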
4
Example 2
2) Sketch a design using NAND, NOR, and NOT gates. Assume ~S is available.
5
Bubble Pushing
Start with a network of AND / OR gates
Convert to NAND / NOR + inverters
Push bubbles around to simplify logic
  Remember De Morgan's law
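As a small illustration of bubble pushing, an AND-OR network can be rewritten as a NAND-NAND network: by De Morgan's law, an OR gate with inverted inputs is a NAND gate. The sketch below checks that equivalence exhaustively. The function names are hypothetical.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)


def and_or(a: int, b: int, c: int, d: int) -> int:
    """AND-OR network: (a AND b) OR (c AND d)."""
    return (a & b) | (c & d)


def nand_nand(a: int, b: int, c: int, d: int) -> int:
    """Same function after bubble pushing: a NAND of two NANDs."""
    return nand(nand(a, b), nand(c, d))


# The two networks agree on all 16 input combinations.
for bits in product((0, 1), repeat=4):
    assert and_or(*bits) == nand_nand(*bits)
```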
6
Example 3
3) Sketch a design using one compound gate and one NOT gate. Assume ~S is available.
7
Compound Gates
Logical effort of compound gates
8
Example 4
The multiplexer has a maximum input capacitance of 16 units on each input. It must drive a load of 160 units. Estimate the delay of the two designs.
  H = 160 / 16 = 10
  B = 1
  N = 2
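The two designs themselves were drawn in figures that did not survive extraction. The sketch below assumes they are the NAND-NAND multiplexer (two 2-input NAND stages) and the compound-gate multiplexer (an AOI22 gate followed by an inverter) from the earlier examples, and it uses the usual logical effort values for those gates (NAND2: g = 4/3, p = 2; AOI22: g = 2, p = 4; inverter: g = 1, p = 1). Under those assumptions the minimum path delay D = N F^(1/N) + P can be estimated as follows.

```python
import math


def path_delay(g_stages, p_stages, H, B=1.0):
    """Minimum path delay D = N * F**(1/N) + P, in units of tau."""
    N = len(g_stages)
    G = math.prod(g_stages)      # path logical effort
    F = G * B * H                # path effort
    P = sum(p_stages)            # path parasitic delay
    return N * F ** (1.0 / N) + P


H = 160 / 16  # path electrical effort from the example

# Assumed design 1: two stages of 2-input NANDs (g = 4/3, p = 2 each).
d_nand = path_delay([4 / 3, 4 / 3], [2, 2], H)

# Assumed design 2: AOI22 compound gate (g = 2, p = 4) plus inverter (g = 1, p = 1).
d_compound = path_delay([2, 1], [4, 1], H)

print(f"NAND design:     {d_nand:.1f} tau")
print(f"Compound design: {d_compound:.1f} tau")
```

With these assumed values the NAND design comes out at about 12.4 tau and the compound-gate design at about 13.9 tau.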
9
Example 5
Annotate the designs with transistor sizes that achieve this delay.
10
Input Order
Our parasitic delay model was too simple
Calculate parasitic delay for Y falling
  If A arrives latest? 2τ
  If B arrives latest? 2.33τ
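The two numbers can be reproduced with an Elmore delay estimate. The sketch below assumes a unit 2-input NAND whose series nMOS transistors each have width 2 (resistance R/2 each), an output node capacitance of 6C, an internal node capacitance of 2C, and tau = 3RC. These sizes are assumptions made to match the figures, not values stated in the extracted text.

```python
from fractions import Fraction

R = Fraction(1)          # unit transistor resistance
C = Fraction(1)          # unit transistor capacitance
TAU = 3 * R * C          # delay unit: tau = 3RC

r_n = R / 2              # each series nMOS (width 2) has resistance R/2
c_out = 6 * C            # assumed capacitance on output node Y
c_internal = 2 * C       # assumed capacitance on the node between the nMOS

# Inner input (A) arrives last: the internal node is already discharged,
# so only the output node is pulled down through both transistors.
delay_inner_last = (r_n + r_n) * c_out / TAU

# Outer input (B) arrives last: both nodes must discharge.
# Elmore delay sums each capacitance times its resistance to ground.
delay_outer_last = (r_n * c_internal + (r_n + r_n) * c_out) / TAU

print(delay_inner_last, float(delay_outer_last))
```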
11
Inner & Outer Inputs
Inner input is closest to output (A)
Outer input is closest to rail (B)
If input arrival time is known
  Connect latest input to inner terminal
Mental exercise: how can you make the delays equal?
12
Asymmetric Gates
Asymmetric gates favor one input over another
Ex: suppose input A of a NAND gate is most critical
  Use smaller transistor on A (less capacitance)
  Boost size of noncritical input
  So total resistance is same
gA = 10/9
gB = 2
gtotal = gA + gB = 28/9
Asymmetric gate approaches g = 1 on critical input
But total logical effort goes up
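The sizing that gives these values was shown in a figure that is missing here. One sizing consistent with gA = 10/9 and gB = 2 is an nMOS of width 4/3 on A, an nMOS of width 4 on B, and pMOS transistors of width 2. The sketch below uses that assumed sizing and takes the unit inverter input capacitance as 3.

```python
from fractions import Fraction

# Assumed sizing, chosen to reproduce the values quoted above.
w_n_a = Fraction(4, 3)   # nMOS on critical input A (downsized)
w_n_b = Fraction(4)      # nMOS on noncritical input B (upsized)
w_p = Fraction(2)        # each parallel pMOS
INVERTER_CIN = Fraction(3)  # unit inverter: pMOS 2 + nMOS 1

# Series pulldown resistance in units of R (resistance scales as 1/width).
series_resistance = 1 / w_n_a + 1 / w_n_b

# Logical effort per input = input capacitance / unit inverter capacitance.
g_a = (w_n_a + w_p) / INVERTER_CIN
g_b = (w_n_b + w_p) / INVERTER_CIN
g_total = g_a + g_b

print(series_resistance, g_a, g_b, g_total)
```

The series resistance stays at R, so the gate delivers unit pulldown current while input A sees less capacitance than in a symmetric NAND (g = 4/3).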
13
Symmetric Gates
Inputs can be made perfectly symmetric
14
Skewed Gates
Skewed gates favor one edge over another
Ex: suppose rising output of inverter is most critical
  Downsize noncritical nMOS transistor
Calculate logical effort by comparing to unskewed inverter with same effective resistance on that edge.
  gu = 2.5 / 3 = 5/6
  gd = 2.5 / 1.5 = 5/3
15
HI- and LO-Skew
Def: Logical effort of a skewed gate for a particular transition is the ratio of the input capacitance of that gate to the input capacitance of an unskewed inverter delivering the same output current for the same transition.
Skewed gates reduce size of noncritical transistors
  HI-skew gates favor rising output (small nMOS)
  LO-skew gates favor falling output (small pMOS)
Logical effort is smaller for favored direction
But larger for the other direction
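The definition can be applied mechanically to an inverter. The sketch below assumes an unskewed inverter uses a 2:1 pMOS-to-nMOS width ratio and that the HI-skew inverter has a pMOS of width 2 and an nMOS of width 1/2, which reproduces gu = 5/6 and gd = 5/3. The function name and the sizing are illustrative assumptions.

```python
from fractions import Fraction


def skewed_inverter_efforts(w_p, w_n, mu=2):
    """Return (g_up, g_down) for an inverter with the given widths.

    mu is the assumed pMOS/nMOS width ratio of an unskewed inverter.
    """
    c_in = w_p + w_n
    # Unskewed inverter with the same pull-up: keep w_p, nMOS = w_p / mu.
    c_rise_reference = w_p + w_p / mu
    # Unskewed inverter with the same pull-down: keep w_n, pMOS = mu * w_n.
    c_fall_reference = mu * w_n + w_n
    return c_in / c_rise_reference, c_in / c_fall_reference


# HI-skew inverter: nMOS downsized from 1 to 1/2.
g_up, g_down = skewed_inverter_efforts(Fraction(2), Fraction(1, 2))
print(g_up, g_down)
```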
16
Catalog of Skewed Gates
17
Asymmetric Skew
Combine asymmetric and skewed gates
  Downsize noncritical transistor on unimportant input
  Reduces parasitic delay for critical input
18
Best P/N Ratio
We have selected P/N ratio for equal rise and fall resistance (μ = 2-3 for an inverter, with non-idealities).
Alternative: choose ratio for least average delay
Ex: inverter
  Delay driving identical inverter
  tpdf = (P+1)
  tpdr = (P+1)(μ/P)
  tpd = (P+1)(1+μ/P)/2 = (P + 1 + μ + μ/P)/2
  dtpd/dP = (1 - μ/P^2)/2 = 0
  Least delay for P = sqrt(μ)
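The result of the derivation, that the average delay is minimized at P equal to the square root of the equal-resistance ratio, can be checked numerically. The sketch below is a simple grid search over P. The function names, search range, and step count are arbitrary choices for illustration.

```python
import math


def average_delay(P, mu):
    """Average of falling and rising delay for an inverter driving a copy of itself."""
    t_pdf = P + 1
    t_pdr = (P + 1) * (mu / P)
    return (t_pdf + t_pdr) / 2


def best_ratio_by_search(mu, lo=0.1, hi=10.0, steps=20000):
    """Grid search for the P/N ratio with the least average delay."""
    candidates = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(candidates, key=lambda P: average_delay(P, mu))


for mu in (2.0, 3.0):
    P_best = best_ratio_by_search(mu)
    # The numerical optimum should sit at sqrt(mu), within the grid spacing.
    assert abs(P_best - math.sqrt(mu)) < 1e-3
    print(f"mu = {mu}: best P ~ {P_best:.3f}, sqrt(mu) = {math.sqrt(mu):.3f}")
```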
19
P/N Ratios
In general, the best P/N ratio is the square root of the ratio that gives equal rise and fall delays.
  Only improves average delay slightly for inverters
  But significantly decreases area and power
20
Observations
For speed:
  NAND vs. NOR
  Many simple stages vs. fewer high fan-in stages
  Latest-arriving input
For area and power:
21
More Circuit Families
  Pseudo-nMOS Logic
  Dynamic Logic
  Pass Transistor Logic
22
Introduction
What makes a circuit fast?
  I = C dV/dt -> tpd ∝ (C/I) ΔV
  low capacitance
  high current
  small swing
Logical effort is proportional to C/I
pMOS are the enemy!
  High capacitance for a given current
Can we take the pMOS capacitance off the input?
Various circuit families try to do this…
23
Pseudo-nMOS
In the old days, nMOS processes had no pMOS
  Instead, use pull-up transistor that is always ON
In CMOS, use a pMOS that is always ON
  Ratio issue
  Make pMOS about ¼ effective strength of pulldown network
24
Pseudo-nMOS Gates
Design for unit current on output to compare with unit inverter.
pMOS fights nMOS
26
Pseudo-nMOS Design
Ex: Design a k-input AND gate using pseudo-nMOS. Estimate the delay driving a fanout of H.
  G = 1 * 8/9 = 8/9
  F = GBH = 8H/9
  P = 1 + (4+8k)/9 = (8k+13)/9
  N = 2
  D = N F^(1/N) + P = 4 sqrt(2H)/3 + (8k+13)/9
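The quoted G and P are consistent with a two-stage design: an inverter (g = 1, p = 1) followed by a k-input pseudo-nMOS NOR (g = 8/9, p = (4+8k)/9). That reading is an inference, since the schematic is missing. Under that assumption, the sketch below evaluates D = N F^(1/N) + P and compares it with the simplified form 4 sqrt(2H)/3 + (8k+13)/9.

```python
import math


def pseudo_nmos_and_delay(k, H):
    """Delay (in tau) of the assumed inverter + pseudo-nMOS NOR path."""
    g_inv, p_inv = 1.0, 1.0
    g_nor = 8 / 9
    p_nor = (4 + 8 * k) / 9
    N = 2
    G = g_inv * g_nor            # path logical effort
    F = G * 1 * H                # path effort with branching B = 1
    P = p_inv + p_nor            # path parasitic delay
    return N * F ** (1 / N) + P


def closed_form(k, H):
    """Simplified expression for the same delay."""
    return 4 * math.sqrt(2 * H) / 3 + (8 * k + 13) / 9


# Hypothetical operating point: 8 inputs driving a fanout of 10.
print(f"{pseudo_nmos_and_delay(8, 10):.2f} tau")
```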
27
Pseudo-nMOS Power
Pseudo-nMOS draws power whenever Y = 0
  Called static power: P = IDD VDD
  A few mA / gate * 1M gates would be a problem
  Explains why nMOS went extinct
Use pseudo-nMOS sparingly for wide NORs
Turn off pMOS when not in use
28
Ratio Example
The chip contains a 32 word x 48 bit ROM
  Uses pseudo-nMOS decoder and bitline pullups
  On average, one wordline and 24 bitlines are high
Find static power drawn by the ROM
  Ion-p = 36 μA, VDD = 1.0 V
Solution:
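The worked solution is missing from the extracted text. A sketch of one way to reach it: a pseudo-nMOS gate draws static current whenever its output is low, so each low wordline and each low bitline burns VDD times the pull-up current. The calculation below takes the pull-up on-current as 36 μA, which is an assumed reading of the value given above, and the variable names are illustrative.

```python
V_DD = 1.0            # supply voltage in volts
I_ON_P = 36e-6        # assumed pull-up on-current in amperes (36 uA)

WORDLINES = 32
BITLINES = 48
wordlines_high = 1    # on average, one wordline is high
bitlines_high = 24    # on average, 24 bitlines are high

# Static power per pull-up whose output is held low.
p_pullup = V_DD * I_ON_P

# Every low wordline and low bitline has its pull-up fighting a pulldown.
lines_low = (WORDLINES - wordlines_high) + (BITLINES - bitlines_high)
p_static = lines_low * p_pullup

print(f"{lines_low} lines low, static power = {p_static * 1e3:.2f} mW")
```

Under these assumptions 55 lines are low on average and the ROM draws about 1.98 mW of static power.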
29
Dynamic Logic
Dynamic gates use a clocked pMOS pullup
Two modes: precharge and evaluate
30
The Foot
What if pulldown network is ON during precharge?
Use series evaluation transistor to prevent fight.
31
Logical Effort
32
Monotonicity
Dynamic gates require monotonically rising inputs during evaluation
  0 -> 0
  0 -> 1
  1 -> 1
  But not 1 -> 0
33
Monotonicity Woes
But dynamic gates produce monotonically falling outputs during evaluation
Illegal for one dynamic gate to drive another!
34
Domino Gates
Follow dynamic stage with inverting static gate
  Dynamic / static pair is called domino gate
Produces monotonic outputs
35
Domino Optimizations
Each domino gate triggers next one, like a string of dominos toppling over
Gates evaluate sequentially but precharge in parallel
Thus evaluation is more critical than precharge
HI-skewed static stages can perform logic
36
Dual-Rail Domino
Domino only performs noninverting functions:
  AND, OR but not NAND, NOR, or XOR
Dual-rail domino solves this problem
  Takes true and complementary inputs
  Produces true and complementary outputs

sig_h  sig_l  Meaning
0      0      Precharged
0      1      ‘0’
1      0      ‘1’
1      1      invalid
37
Example: AND/NAND
Given A_h, A_l, B_h, B_l
Compute Y_h = AB, Y_l = ~(AB)
Pulldown networks are conduction complements
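A behavioral sketch of the dual-rail AND/NAND, ignoring the clocked precharge circuitry: the true rail is A_h AND B_h, and by De Morgan's law the complementary rail is A_l OR B_l. The encoding assumed here is (sig_h, sig_l) = (1, 0) for logic 1, (0, 1) for logic 0, and (0, 0) for precharged. Function names are hypothetical.

```python
from itertools import product

PRECHARGED = (0, 0)


def encode(bit: int) -> tuple:
    """Dual-rail encoding (sig_h, sig_l) of a single logic value."""
    return (1, 0) if bit else (0, 1)


def dual_rail_and(a: tuple, b: tuple) -> tuple:
    """Return (Y_h, Y_l) where Y_h = A AND B and Y_l = NOT (A AND B)."""
    a_h, a_l = a
    b_h, b_l = b
    y_h = a_h & b_h      # true rail: series pulldown, A_h AND B_h
    y_l = a_l | b_l      # complement rail: parallel pulldown, A_l OR B_l
    return (y_h, y_l)


# Valid inputs always yield a valid, complementary output pair.
for a, b in product((0, 1), repeat=2):
    assert dual_rail_and(encode(a), encode(b)) == encode(a & b)

# While the inputs are still precharged, neither output rail is asserted.
assert dual_rail_and(PRECHARGED, PRECHARGED) == PRECHARGED
```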
38
Example: XOR/XNOR
Sometimes possible to share transistors
39
Leakage
Dynamic node floats when high during evaluation
  Transistors are leaky (IOFF ≠ 0)
  Dynamic value will leak away over time
  Formerly milliseconds, now nanoseconds
Use keeper to hold dynamic node
  Must be weak enough not to fight evaluation
40
Charge Sharing
Dynamic gates suffer from charge sharing
41
Secondary Precharge
Solution: add secondary precharge transistors
  Typically need to precharge every other node
Big load capacitance CY helps as well
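Why a large CY helps can be seen from charge conservation. As a sketch, assume the output node Y (capacitance C_Y) was precharged to V_DD, an internal node x (capacitance C_x, a symbol assumed here) was at 0 V, and the two nodes are then connected without a path to ground:

```latex
\begin{align*}
C_Y V_{DD} &= (C_x + C_Y)\, V_{final} \\
V_x = V_Y = V_{final} &= \frac{C_Y}{C_x + C_Y}\, V_{DD}
\end{align*}
```

The larger C_Y is relative to C_x, the closer the output stays to V_DD after sharing.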
42
Noise Sensitivity
Dynamic gates are very sensitive to noise
  Inputs: VIH ≈ Vtn
  Outputs: floating output susceptible to noise
Noise sources
  Capacitive crosstalk
  Charge sharing
  Power supply noise
  Feedthrough noise
  And more!
43
Power
Domino gates have high activity factors
  Output evaluates and precharges
    If output probability = 0.5, α = 0.5
  Output rises and falls on half the cycles
  Large clock load, α = 1
But, reduced Cin
Leads to high power consumption
47
Domino Summary
Domino logic is attractive for high-speed circuits
  1.3 – 2x faster than static CMOS
  But many challenges: monotonicity, leakage, charge sharing, noise
Has been widely used in high-performance microprocessors
In many-core microprocessors, heat dissipation can dominate performance
When power is the main concern, static CMOS is preferred
48
Pass Transistor Circuits
Use pass transistors like switches to do logic
Inputs drive diffusion terminals as well as gates
CMOS + Transmission Gates:
  2-input multiplexer
  Gates should be restoring
49
LEAP
LEAn integration with Pass transistors
Get rid of pMOS transistors
  Use weak pMOS feedback to pull fully high
  Ratio constraint
50
CPL
Complementary Pass-transistor Logic
  Dual-rail form of pass transistor logic
Optional cross-coupling for rail-to-rail swing
51
Pass Transistor Summary
Researchers investigated pass transistor logic for general-purpose applications in the 1990s
  Benefits over static CMOS were small or negative
  No longer generally used
However, pass transistors still have a niche in special circuits
52
Outline
  Sequencing
  Sequencing Element Design
  Max and Min-Delay
  Clock Skew
  Time Borrowing
  Two-Phase Clocking
53
Sequencing
Combinational logic
  output depends on current inputs
Sequential logic
  output depends on current and previous inputs
  Requires separating previous, current, future
  Called state or tokens
  Ex: FSM, pipeline
54
Sequencing Cont.
If tokens moved through pipeline at constant speed, no sequencing elements would be necessary
Ex: fiber-optic cable
  Light pulses (tokens) are sent down cable
  Next pulse sent before first reaches end of cable
  No need for hardware to separate pulses
  But dispersion sets min time between pulses
This is called wave pipelining in circuits
In most circuits, dispersion is high
  Delay fast tokens so they don’t catch slow ones.
55
Sequencing Overhead
Use flip-flops to delay fast tokens so they move through exactly one stage each cycle.
Inevitably adds some delay to the slow tokens
Makes circuit slower than just the logic delay
  Called sequencing overhead
Some people call this clocking overhead
  But it applies to asynchronous circuits too
  Inevitable side effect of maintaining sequence
56
Sequencing Elements
Latch: level sensitive
  a.k.a. transparent latch, D latch
Flip-flop: edge triggered
  a.k.a. master-slave flip-flop, D flip-flop, D register
Timing diagrams
  Transparent
  Opaque
  Edge-trigger
57
Latch Design
Pass transistor latch
Pros
  + Tiny
  + Low clock load
Cons
  - Vt drop
  - nonrestoring
  - backdriving
  - output noise sensitivity
  - dynamic
  - diffusion input
Used in 1970s
58
Latch Design
Transmission gate
  + No Vt drop
  - Requires inverted clock
59
Latch Design
Inverting buffer
  + Restoring
  + No backdriving
  + Fixes either
    Output noise sensitivity
    Or diffusion input
  - Inverted output
60
Latch Design
Tristate feedback
  + Static
  - Backdriving risk
Static latches are now essential because of leakage for low clock rates
61
Latch Design
Buffered input
  + Fixes diffusion input
  + Noninverting
62
Latch Design
Buffered output
  + No backdriving
Widely used in standard cells
  + Very robust (most important)
  - Rather large
  - Rather slow (1.5 – 2 FO4 delays)
63
Latch Design
Datapath latch
  + smaller
  + faster
  - unbuffered input
64
Flip-Flop Design
Flip-flop is built as pair of back-to-back latches
65
Enable
Enable: ignore clock when en = 0
  Mux: increase latch D-Q delay
  Clock gating: increase en setup time, skew, false edges
66
Reset
Force output low when reset asserted
Synchronous vs. asynchronous
67
Set / Reset
Set forces output high when enabled
Flip-flop with asynchronous set and reset
68
Sequencing Methods
  Flip-flops
  2-Phase Latches
  Pulsed Latches
69
Timing Diagrams
Contamination and Propagation Delays

tpd     Logic Prop. Delay
tcd     Logic Cont. Delay
tpcq    Latch/Flop Clk->Q Prop. Delay
tccq    Latch/Flop Clk->Q Cont. Delay
tpdq    Latch D->Q Prop. Delay
tcdq    Latch D->Q Cont. Delay
tsetup  Latch/Flop Setup Time
thold   Latch/Flop Hold Time
70
Max-Delay: Flip-Flops
71
Max Delay: 2-Phase Latches
72
Max Delay: Pulsed Latches
73
Min-Delay: Flip-Flops
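The flip-flop constraint equations were on figures that did not survive extraction. In terms of the timing parameters defined earlier, the forms commonly used are tpd <= Tc - (tsetup + tpcq) for max-delay and tcd >= thold - tccq for min-delay, assuming zero clock skew. The sketch below encodes those two checks. The function names and the example numbers (in picoseconds) are hypothetical.

```python
def flop_max_logic_delay(T_c, t_setup, t_pcq):
    """Largest logic propagation delay that still meets setup (zero skew)."""
    return T_c - (t_setup + t_pcq)


def flop_min_contamination_delay(t_hold, t_ccq):
    """Smallest logic contamination delay that still meets hold (zero skew)."""
    return t_hold - t_ccq


def flop_timing_ok(t_pd, t_cd, T_c, t_setup, t_hold, t_pcq, t_ccq):
    """Return (setup_ok, hold_ok) for one flop-to-flop path."""
    setup_ok = t_pd <= flop_max_logic_delay(T_c, t_setup, t_pcq)
    hold_ok = t_cd >= flop_min_contamination_delay(t_hold, t_ccq)
    return setup_ok, hold_ok


# Hypothetical path: 1 ns cycle, all times in picoseconds.
print(flop_timing_ok(t_pd=800, t_cd=20, T_c=1000,
                     t_setup=65, t_hold=30, t_pcq=50, t_ccq=35))
```

A setup failure can be cured by lengthening Tc, whereas the hold check does not depend on Tc at all.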
74
Min-Delay: 2-Phase Latches
Hold time reduced by nonoverlap
Paradox: hold applies twice each cycle, vs. only once for flops.
But a flop is made of two latches!
75
Min-Delay: Pulsed Latches
Hold time increased by pulse width
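The latch constraint equations are also missing from the extracted slides. For reference, the forms commonly used with the timing parameters defined earlier are sketched below, assuming zero clock skew. The symbols t_pw (pulse width) and t_nonoverlap (gap between the two phases) are introduced here as assumptions.

```latex
\begin{align*}
\text{2-phase latches (max-delay):}\quad & t_{pd} = t_{pd1} + t_{pd2} \le T_c - 2\,t_{pdq} \\
\text{pulsed latches (max-delay):}\quad & t_{pd} \le T_c - \max\left(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw}\right) \\
\text{2-phase latches (min-delay):}\quad & t_{cd1},\, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} \\
\text{pulsed latches (min-delay):}\quad & t_{cd} \ge t_{hold} - t_{ccq} + t_{pw}
\end{align*}
```

The last two lines match the remarks above: nonoverlap relaxes the hold requirement, while a wider pulse tightens it.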
76
Time Borrowing
In a flop-based system:
  Data launches on one rising edge
  Must setup before next rising edge
  If it arrives late, system fails
  If it arrives early, time is wasted
  Flops have hard edges
In a latch-based system
  Data can pass through latch while transparent
  Long cycle of logic can borrow time into next
  As long as each loop completes in one cycle
77
Time Borrowing Example
78
How Much Borrowing?
2-Phase Latches
Pulsed Latches
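The borrowing limits themselves are not in the extracted text. A sketch of the usual bounds, assuming zero clock skew and using t_borrow, t_pw, and t_nonoverlap as assumed symbols:

```latex
\begin{align*}
\text{2-phase latches:}\quad & t_{borrow} \le \frac{T_c}{2} - \left(t_{setup} + t_{nonoverlap}\right) \\
\text{pulsed latches:}\quad & t_{borrow} \le t_{pw} - t_{setup}
\end{align*}
```

In both cases data may arrive after the latch opens, but it must still set up before the latch closes.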
79
Clock Skew
We have assumed zero clock skew
Clocks really have uncertainty in arrival time
  Decreases maximum propagation delay
  Increases minimum contamination delay
  Decreases time borrowing
80
Skew: Flip-Flops
81
Skew: Latches
2-Phase Latches
Pulsed Latches
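The skew-aware constraints were shown as figures. A sketch of the commonly used forms follows, with t_skew, t_pw, t_nonoverlap, and t_borrow as assumed symbols. Each line is the zero-skew constraint with t_skew added where clock uncertainty hurts.

```latex
\begin{align*}
\text{Flip-flops:}\quad & t_{pd} \le T_c - \left(t_{pcq} + t_{setup} + t_{skew}\right) \\
& t_{cd} \ge t_{hold} - t_{ccq} + t_{skew} \\
\text{2-phase latches:}\quad & t_{pd} \le T_c - 2\,t_{pdq} \\
& t_{cd1},\, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew} \\
& t_{borrow} \le \frac{T_c}{2} - \left(t_{setup} + t_{nonoverlap} + t_{skew}\right) \\
\text{Pulsed latches:}\quad & t_{pd} \le T_c - \max\left(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew}\right) \\
& t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew} \\
& t_{borrow} \le t_{pw} - \left(t_{setup} + t_{skew}\right)
\end{align*}
```

Note that the 2-phase max-delay bound is unchanged: transparent latches absorb skew as long as data arrives while the latch is open.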
82
Two-Phase Clocking
If setup times are violated, reduce clock speed
If hold times are violated, chip fails at any speed
Working chips are most important tools to analyze clock skew
An easy way to guarantee hold times is to use 2-phase latches with big or variable nonoverlap times
Call these clocks φ1, φ2 (ph1, ph2)
83
Two-phase Clocking What if?
What if tnonoverlap > Tc/2?
What if tnonoverlap ≥ a negative time?
What if thold < 0?
84
Adaptive Sequencing
Designers include timing margin
  Voltage
  Temperature
  Process variation
  Data dependency
  Tool inaccuracies
Alternative: run faster and check for near failures
  Idea introduced as “Razor”
  Increase frequency until at the verge of error
  Can reduce cycle time by ~30%
85
Summary
Flip-Flops: Very easy to use, supported by all tools
2-Phase Transparent Latches: Lots of skew tolerance and time borrowing
Pulsed Latches: Fast, some skew tolerance and borrowing, hold time risk
86
More Synchronous Fun
87
Wave pipelining
When can we skip the pipeline registers?
Requires tight control of contamination delay (which many manufacturers don't even specify)
Advice: don't do it, but you may see more of this in the future
88
Calculating Probability of Metastability
t = required decision time, clock to stable output
τ = time constant of the rate of decay of metastability
Tw = effective size of metastability window, related to Tsetup + Thold of a D flip-flop
fclock; fdata // asynchronous to clock
MTBF = mean time between failures = ?
(10^9 hours = 114,155 years = 1/(1 FIT) is really good)
89
Calculating Probability of Metastability
There are multiple models for metastability. This is a simple one:
Problem (180nm, 1.8V)
τ = 21.7 ps; Tw = 770 ps; fclock = 1 GHz; fdata = 1 MHz; t = 100 ps
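The model's equation is missing from the extracted slide. A widely used simple model, assumed here, is MTBF = exp(t / tau) / (Tw * fclock * fdata), where tau is the metastability decay time constant. The sketch below evaluates it for the problem's numbers. The function name is hypothetical.

```python
import math


def mtbf_seconds(t, tau, t_w, f_clock, f_data):
    """Mean time between synchronizer failures under the assumed simple model."""
    return math.exp(t / tau) / (t_w * f_clock * f_data)


mtbf = mtbf_seconds(
    t=100e-12,       # decision time allowed: 100 ps
    tau=21.7e-12,    # metastability decay time constant: 21.7 ps
    t_w=770e-12,     # metastability window: 770 ps
    f_clock=1e9,     # 1 GHz clock
    f_data=1e6,      # 1 MHz asynchronous data
)
print(f"MTBF = {mtbf * 1e3:.2f} ms")
```

Under this model only 100 ps of decision time gives an MTBF of roughly 0.13 ms, far short of the 10^9-hour target, and each additional tau of decision time multiplies the MTBF by e.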
90
Arbiters
Synchronizers "arbitrate" whether data changed before or after the clock edge.
Arbiters can arbitrate which signal arrived first
  e.g. DMA bus master arbitration
Asynchronous arbiters are subject to metastability
91
Technology for handling multiple clock domains
  DLL = Delay-Locked Loop
  PLL = Phase-Locked Loop
  Clock enable
  Clock multiplexor
  FIFOs, dual-port memories
92
Synchronous design good practices
Use a single clock and clock on a single edge of that clock throughout system where possible (e.g. rising edge only)
  Exceptions to single clock: input timing forces use of multiple timing domains (asynchronous communications), power consumption, bus interfaces
Avoid dividing down the clock unless your entire system can operate at this lower clock frequency
  Instead, generate a periodic enable signal fed to enabled D-FFs
Use synchronizers on asynchronous inputs or signals from a different timing domain (different clock)
  A synchronizer can be implemented (for high speed clocks) as 2 or more D-FFs connected as a shift register
  This reduces the probability of metastability affecting internal state
Use D-FFs/latches on the output of each architecture if possible (makes timing and behaviour predictable when assembling large numbers of components)
Use Moore machines (output dependent on state)
  Cascading Mealy machines can create unbounded delays
93
…Synchronous design good practices
Don't gate clocks for individual flip-flops or use combinational circuits to drive edge-triggered inputs.
  Use D-FFs with enable (or implement with a MUX)
  Use proven circuits for decoupling clocks as required for power savings
Don't drive asynchronous D-FF reset/preset inputs from local signals or synchronous signals
  Use a synchronous reset/preset (AND or OR gates connected to D input)
If clock skew may be an issue, route clock in opposite direction with respect to data to reduce hold time violations
  This works well for long shift registers and pipelines
Follow local guidelines for testability
  Global reset of all D-FFs
  Design for test / built in self test
  Scan design, etc.