ΗΜΥ 307 Digital Integrated Circuits, Spring Semester 2017, Lectures 12-13: Designing Dynamic and Static CMOS Sequential Circuits Other handouts To.


1 ΗΜΥ 307 Digital Integrated Circuits, Spring Semester, Lectures 12-13: Designing Dynamic and Static CMOS Sequential Circuits Other handouts To handout next time ΧΑΡΗΣ ΘΕΟΧΑΡΙΔΗΣ (ack: Prof. Mary Jane Irwin and Vijay Narayanan) [Adapted from “Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.”]

2 Review: How to Choose a Logic Style
Must consider ease of design, robustness (noise immunity), area, speed, power, system clocking requirements, fan-out, functionality, and ease of testing.
4-input NAND comparison (columns: Style, # Trans, Ease, Ratioed?, Delay, Power; only the cells below survive in this transcript):
Comp Static: 8, 1, no, 3
CPL*: 12 + 2, 2, 4
domino: 6 + 2, 2 + clk
DCVSL*: 10, yes
* Dual Rail
CPL: Complementary Pass-Transistor Logic. DCVSL: Differential Cascode Voltage Switch Logic.
The current trend is towards increased use of complementary static CMOS. The trend is driven by design automation (DA) tools that emphasize optimization at the logic level rather than the circuit level and that put a premium on robustness. Static CMOS is also more amenable to voltage scaling than some of the other approaches.

3 Sequential Logic – REVIEW
Figure: a combinational logic block with Inputs and Outputs; clocked State Registers capture Next State and feed Current State back to the logic.
The output is a function of the inputs AND the current state. We have already discussed how to design the combinational logic part; now we need to focus on designing the state registers.
Mealy versus Moore state machines: in a Mealy machine both the input and the current state determine the output, whereas in a Moore machine the output depends only on the current state. Moore machines have potential implementation advantages in speed and size: speed because the control outputs, which are needed early in the clock cycle, do not depend upon the inputs. The disadvantage of Moore machines is that they may require additional states.
The State Register shown is edge-triggered (indicated by the > on the clock input). In fact, it is positive edge-triggered, i.e., it changes output state when the clock goes from 0 to 1 (indicated by the absence of a bubble, o, on the clock input).
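The Mealy/Moore distinction above can be sketched in a few lines of Python. This is only an illustrative model, not a circuit from the lecture: the parity machine, the function name `run_parity_machine`, and the input sequence are all assumptions chosen to keep the example small.

```python
def run_parity_machine(inputs):
    """Model a 1-bit state register plus combinational logic.

    The state is the parity of the 1s seen so far (hypothetical example).
    Returns the Moore and Mealy output sequences, one value per clock cycle.
    """
    state = 0  # contents of the state register before the first clock edge
    moore_out, mealy_out = [], []
    for x in inputs:
        moore_out.append(state)      # Moore: output depends on current state only
        mealy_out.append(state ^ x)  # Mealy: output depends on input AND current state
        state = state ^ x            # next state, loaded on the clock edge
    return moore_out, mealy_out


moore, mealy = run_parity_machine([1, 0, 1, 1])
print(moore)  # [0, 1, 1, 0]
print(mealy)  # [1, 1, 0, 1]
```

The Mealy output reflects the current input within the same cycle, while the Moore output only shows it one cycle later, after the state register has been updated.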

4 Timing Metrics
Figure: clock, In and Out waveforms versus time, marking the data-stable window around the clock edge (tsu, thold) and the output delay (tc-q).
tsu (set-up time): the time that the data inputs (D) must be valid before the clock transition (the 0 to 1 transition for a positive edge-triggered device).
thold (hold time): the time that the data inputs must remain valid after the clock edge.
tc-q: the worst-case propagation delay with reference to the clock edge, i.e., the time to copy D to Q.

5 System Timing Constraints
Figure: combinational logic between clocked state registers (Inputs, Outputs, Current State, Next State), with clock period T.
Contamination delay: the minimum delay of the combinational logic (tcdlogic) or of the register (tcdreg).
Timing constraints:
tcdreg + tcdlogic ≥ thold
T ≥ tc-q + tplogic + tsu
Thus, it is important to minimize the values of the timing parameters associated with the register. Modern high-performance systems are characterized by a very low logic depth, so the register propagation delay and set-up time account for a significant portion of the clock period. E.g., the DEC Alpha EV6 has a maximum logic depth of 12 gates, and the register overhead accounts for approximately 15% of the clock period.
Hold time becomes an issue when there is little logic between registers or when the clocks at different registers are somewhat out of phase due to clock skew.
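The two system timing constraints on this slide (the clock period bound and the hold-time bound) can be checked mechanically. The sketch below is a minimal illustration; the class name `RegisterTiming`, the helper names, and all numeric values are assumptions made up for the example, not data from the lecture.

```python
from dataclasses import dataclass


@dataclass
class RegisterTiming:
    """Register timing parameters, all in the same unit (here picoseconds)."""
    t_cq: float      # worst-case clock-to-Q propagation delay
    t_su: float      # set-up time
    t_hold: float    # hold time
    t_cd_reg: float  # register contamination (minimum) delay


def min_clock_period(reg: RegisterTiming, t_plogic: float) -> float:
    """Smallest T that satisfies T >= t_cq + t_plogic + t_su."""
    return reg.t_cq + t_plogic + reg.t_su


def hold_ok(reg: RegisterTiming, t_cd_logic: float) -> bool:
    """True when t_cd_reg + t_cd_logic >= t_hold."""
    return reg.t_cd_reg + t_cd_logic >= reg.t_hold


# Hypothetical numbers, chosen only for illustration.
reg = RegisterTiming(t_cq=180.0, t_su=210.0, t_hold=50.0, t_cd_reg=30.0)
print(min_clock_period(reg, t_plogic=1200.0))  # 1590.0
print(hold_ok(reg, t_cd_logic=10.0))           # False: too little logic between registers
print(hold_ok(reg, t_cd_logic=40.0))           # True
```

Note that the hold check does not involve T at all, which is why a hold violation cannot be fixed by slowing the clock.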

6 Static vs Dynamic Storage
Static storage
  preserves state as long as the power is on
  has positive feedback (regeneration) with an internal connection between the output and the input
  is useful when updates are infrequent (clock gating)
Dynamic storage
  stores state on parasitic capacitors
  only holds state for short periods of time (milliseconds to nanoseconds)
  requires periodic refresh
  is usually simpler, so higher speed and lower power
Clock gating (conditional clocks): the clock is turned off for unused modules to save power.

7 Latches vs Flipflops
Latches
  level-sensitive circuits that pass inputs to Q when the clock is high (or low): transparent mode
  the input sampled on the falling edge of the clock is held stable when the clock is low (or high): hold mode
Flipflops (edge-triggered)
  edge-sensitive circuits that sample the inputs on a clock transition
  positive edge-triggered: 0 → 1
  negative edge-triggered: 1 → 0
  built using latches (e.g., master-slave flipflops)
The latch described is a positive latch; a negative latch has the definitions flipped.
Should insert a slide with figure 7.3 after this one?

8 Review: The Regenerative Property
Figure: two cascaded inverters (Vi1, Vo1 = Vi2, Vo2) and their overlaid voltage transfer characteristics (VTCs), which intersect at points A, B and C.
Bistability principle: a circuit having two stable states that represent 0 and 1.
Consider just two inverters and plot the VTC of the first inverter together with that of the second (the latter plot is rotated to accentuate that Vi2 = Vo1). With the loop closed (Vi1 = Vo2), the resulting circuit has just three possible operation points (A, B and C).
If the gain in the transient region is larger than 1, only A and B are stable operation points; C is a metastable operation point. A small deviation from the bias point C (e.g., from noise) is amplified and regenerated around the circuit loop until either point A or B is reached. At A and B, the loop gain is much smaller than unity.
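The regeneration argument can be tried numerically. The sketch below assumes an idealized tanh-shaped inverter VTC with a midpoint gain of 5 and a 1 V supply; the function names and the VTC shape are illustrative assumptions, not the transistor-level characteristic from the slides.

```python
import math


def inverter(v_in: float, gain: float = 5.0, v_dd: float = 1.0) -> float:
    """Idealized inverter VTC: a tanh curve centered at v_dd / 2.

    `gain` is the magnitude of the slope at the switching threshold.
    """
    x = (v_in - v_dd / 2) / (v_dd / 2)
    return (v_dd / 2) * (1.0 - math.tanh(gain * x))


def settle(v_start: float, trips: int = 20) -> float:
    """Send a voltage around the two-inverter loop `trips` times."""
    v = v_start
    for _ in range(trips):
        v = inverter(inverter(v))
    return v


print(settle(0.51))  # a small push above the metastable point ends near 1
print(settle(0.49))  # a small push below it ends near 0
print(settle(0.50))  # exactly at the metastable point the model stays at 0.5
```

Because the loop gain at the midpoint is larger than 1 in this model, any deviation from the metastable point grows on each trip around the loop until the voltage saturates at one of the two stable points.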

9 Bistable Circuits
Figure: two cross-coupled inverters with nodes Vi1 and Vi2.
The cross-coupling of two inverters results in a bistable circuit (a circuit with two stable states).
To change the stored value, A (or B) must be made temporarily unstable by increasing the loop gain to a value larger than 1:
  done by applying a trigger pulse at Vi1 or Vi2
  the width of the trigger pulse need be only a little larger than the total propagation delay around the loop (twice the delay of an inverter)
Two approaches are used:
  cutting the feedback loop (mux-based latch)
  overpowering the feedback loop (as used in SRAMs)
Cutting the feedback loop is the most popular approach in today’s latches.

10 Review (from ECE 210): SR Latch
Figure: SR latch with inputs S and R and outputs Q and !Q, plus its truth table with rows labeled memory, set, reset and disallowed.

11 Review (from CSE 210): Clocked D Latch
Figure: clocked D latch (inputs D and clock, outputs Q and !Q) with its symbol and a clock waveform marking transparent mode and hold mode.

12 MUX Based Latches
Change the stored value by cutting the feedback loop.
Figure: negative latch and positive latch, each a 2-input multiplexer (inputs 0 and 1) with Q fed back to one input, D on the other, and clk as the select.
Negative latch: Q = clk & Q | !clk & D, transparent when the clock is low.
Positive latch: Q = !clk & Q | clk & D, transparent when the clock is high.
Nonratioed latch: the sizing of the devices affects performance but is not critical to functionality.
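The two multiplexer equations on this slide translate directly into code. The following is a minimal zero-delay sketch; the function names are chosen here for illustration, and signals are modeled as the integers 0 and 1.

```python
def positive_latch(clk: int, d: int, q: int) -> int:
    """Q = !clk & Q | clk & D: transparent when clk is high."""
    return ((1 - clk) & q) | (clk & d)


def negative_latch(clk: int, d: int, q: int) -> int:
    """Q = clk & Q | !clk & D: transparent when clk is low."""
    return (clk & q) | ((1 - clk) & d)


# Positive latch: follows D while clk = 1, holds the old Q while clk = 0.
print(positive_latch(clk=1, d=1, q=0))  # 1 (transparent)
print(positive_latch(clk=0, d=1, q=0))  # 0 (hold)

# Negative latch: the opposite clock polarity.
print(negative_latch(clk=0, d=1, q=0))  # 1 (transparent)
print(negative_latch(clk=1, d=0, q=1))  # 1 (hold)
```

Each function returns the next value of Q given the present value, so calling it repeatedly with the returned value models the feedback path through the multiplexer.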

13 TG MUX Based Latch Implementation
Figure: transmission-gate multiplexer latch (symbol and circuit) with input D, output Q, and clocks clk and !clk; one transmission gate samples the input (transparent mode) and the other closes the feedback loop (hold mode).
Positive latch: the latch is transparent (D is copied to Q) when clk is high (bottom transmission gate is on).
The clk load is two transistors (and two for !clk), for a clock load of 4.
Both clk and !clk must also be generated (nonoverlapping clocks).

14 PT MUX Based Latch Implementation
Figure: pass-transistor multiplexer latch with input D and outputs Q and !Q, clocked by clk and !clk; one pass transistor samples the input (transparent mode) and the other closes the feedback loop (hold mode).
Positive latch: the latch is transparent (D is copied to Q) when clk is high.
Replacing the transmission gates with pass transistors reduces the clock load, but the threshold drop at the output of the pass transistors reduces noise margins and switching performance and causes static power dissipation.
The clk load is one transistor (and one for !clk), for a clock load of 2.
Both clk and !clk must still be generated.

15 Latch Race Problem
Figure: combinational logic with output B’ feeding a latch-based state register clocked by clk, whose output B feeds back to the logic.
Which value of B is stored?
Two-sided clock constraint:
T ≥ tc-q + tplogic + tsu
Thigh < tc-q + tcdlogic
So, a solution to the latch race problem is to design with edge-triggered (master-slave) devices.

16 Master Slave Based ET Flipflop
Figure: D flipflop symbol and its implementation as two cascaded multiplexer latches, a master (D to QM) followed by a slave (QM to Q), both driven by clk, with waveforms for clk, QM and Q.
clk = 0: master transparent, slave hold.
clk = 0 → 1: master hold, slave transparent.
On the low phase of the clock, the master is transparent and the D input is passed to the master stage output QM. The slave is in hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stops sampling and goes into hold mode, and the slave starts sampling, copying QM to its output. The value of Q is therefore the value of D right before the rising edge of the clock, achieving a POSITIVE EDGE-TRIGGERED effect.
A negative edge-triggered flipflop can be built by switching the order of master and slave (master positive, slave negative).
ET = Edge Triggered.
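The master-slave behavior described above can be reproduced with a small zero-delay simulation. This is a sketch under simplifying assumptions: ideal latches, no set-up or hold effects, and no clock overlap. The function names and the test waveforms are made up for illustration.

```python
def mux_latch(transparent: bool, d: int, q: int) -> int:
    """Level-sensitive latch: follow d when transparent, otherwise hold q."""
    return d if transparent else q


def simulate_ms_flipflop(clk_wave, d_wave, qm=0, q=0):
    """Return the Q waveform of a positive edge-triggered master-slave flipflop.

    The master is a negative latch (transparent when clk = 0) and the
    slave is a positive latch (transparent when clk = 1).
    """
    q_wave = []
    for clk, d in zip(clk_wave, d_wave):
        qm = mux_latch(clk == 0, d, qm)  # master samples D while clk is low
        q = mux_latch(clk == 1, qm, q)   # slave copies QM while clk is high
        q_wave.append(q)
    return q_wave


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 1, 0, 0, 0, 0, 1]
print(simulate_ms_flipflop(clk, d))  # [0, 0, 1, 1, 1, 1, 0, 0]
```

Q changes only at the steps where clk goes from 0 to 1, and it takes the value D had just before that edge; the changes in D while clk is high (steps 3 and 7) never reach Q.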

17 MS ET Implementation
Figure (class handout version): master and slave stages between D and Q with internal node QM, inverters I1 to I6, and transmission gates T1 and T2, clocked by clk and !clk.

18 MS ET Implementation
Figure (lecture version): master and slave stages between D and Q with internal node QM, inverters I1 to I6, and transmission gates T1 and T2, clocked by clk and !clk. In one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.
Note that !clk is generated locally.
20 transistors (plus clock inverter) and 8 clock loads (4 on clk and 4 on !clk). The buffer inverter overhead can be ignored since it can be amortized over multiple register bits.

19 MS ET Timing Properties
Assume the propagation delays are tpd_inv and tpd_tx, that the contamination delay is 0, and that the inverter delay to derive !clk is 0.
Set-up time (time before the rising edge of clk that D must be valid):
Propagation delay (time for QM to reach Q):
Hold time (time D must be stable after the rising edge of clk):
For class handout

20 MS ET Timing Properties
Assume the propagation delays are tpd_inv and tpd_tx, that the contamination delay is 0, and that the inverter delay to derive !clk is 0.
Set-up time (time before the rising edge of clk that D must be valid): 3 * tpd_inv + tpd_tx
Propagation delay (time for QM to reach Q): tpd_inv + tpd_tx
Hold time (time D must be stable after the rising edge of clk): zero
For lecture
Set-up: how long before the rising edge does D have to be stable such that QM samples the value reliably? D has to propagate through I1, T1, I3 and I2 before the rising edge to ensure that the node voltages on both terminals of T2 are the same value.
Propagation delay: since the delay of I2 is included in the set-up time, the output of I4 is valid before the rising edge of clk, so the delay is simply the delay through T3 and I6.
Hold time: since T1 turns off when the clock goes high, any changes in D after clk goes high are not seen, so the hold time is 0.
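As a worked example of these three expressions, the sketch below plugs in delay values. The function name `ms_et_timing` and the 40 ps / 30 ps figures are assumptions for illustration only; the formulas themselves are the ones stated on the slide, under the same zero-contamination-delay assumptions.

```python
def ms_et_timing(tpd_inv: float, tpd_tx: float) -> dict:
    """Timing of the transmission-gate master-slave register.

    Assumes, as on the slide, zero contamination delay and a zero-delay
    inverter for deriving !clk.
    """
    return {
        # D must pass through three inverters and one transmission gate
        # before the rising edge.
        "t_su": 3 * tpd_inv + tpd_tx,
        # After the rising edge, QM passes through one transmission gate
        # and one inverter to reach Q.
        "t_cq": tpd_inv + tpd_tx,
        # T1 turns off on the rising edge, so later changes in D are ignored.
        "t_hold": 0.0,
    }


# Hypothetical delays in picoseconds.
timing = ms_et_timing(tpd_inv=40.0, tpd_tx=30.0)
print(timing)  # {'t_su': 150.0, 't_cq': 70.0, 't_hold': 0.0}
```

With these numbers the register overhead per cycle, t_su + t_cq, is 220 ps, which is the part of the clock period that is not available to the combinational logic.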

21 More Precise Setup Time

22-26 Setup/Hold Time Illustrations
Circuit before clock arrival (Setup-1 case)

27-31 Setup/Hold Time Illustrations
Hold-1 case

32 Set-up Time Simulation
Figure: Volts versus Time (ns) for D, clk, QM, Q and the I2 output with tsetup = 0.21 ns; the register works correctly.
Progressively skew the input with respect to the clock edge until the circuit fails.

33 Set-up Time Simulation
Figure: Volts versus Time (ns) for D, clk, QM, Q and the I2 output with tsetup = 0.20 ns; the register fails.
The clock is enabled before the nodes on both sides of the transmission gate T2 settle to the same value.
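The measurement procedure on these two slides (move the D edge closer to the clock edge until capture fails) can be sketched as a simple sweep. In practice each trial would be a circuit simulation; here `captures_correctly` is a hypothetical stand-in callable, and the 210 ps threshold in the example merely mimics the 0.21 ns / 0.20 ns result shown above.

```python
from typing import Callable


def find_setup_time(captures_correctly: Callable[[int], bool],
                    start_ps: int, step_ps: int) -> int:
    """Return the smallest tested D-to-clk lead time (in ps) that still works.

    Starts from `start_ps` (which must work) and moves the D edge closer
    to the clock edge in steps of `step_ps` until capture fails. The sweep
    stops at a lead time of 0, so this sketch does not explore negative
    set-up times.
    """
    if not captures_correctly(start_ps):
        raise ValueError("start_ps must be a working lead time")
    steps = 0
    while True:
        candidate = start_ps - (steps + 1) * step_ps
        if candidate < 0 or not captures_correctly(candidate):
            return start_ps - steps * step_ps
        steps += 1


# Hypothetical stand-in for a circuit simulation: capture works when D
# leads the clock edge by at least 210 ps.
def stand_in_simulation(lead_ps: int) -> bool:
    return lead_ps >= 210


print(find_setup_time(stand_in_simulation, start_ps=300, step_ps=10))  # 210
```

The resolution of the result is limited by `step_ps`; a finer step or a bisection between the last passing and first failing lead times would tighten it at the cost of more simulation runs.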

34 Propagation Delay Simulation
Figure: Volts versus Time (ns) showing the clock-to-Q delays tc-q(LH) = 160 psec and tc-q(HL) = 180 psec.
Propagation delay: measured from the 50% point of the clk edge to the 50% point of the Q output.
Hold time: the D input edge is skewed relative to the clock signal until the circuit fails.

35 Reduced Load MS ET FF
Clock load per register is important since it directly impacts the power dissipation of the clock network. The clock load can be reduced (at the cost of robustness) by making the circuit ratioed.
Figure: master-slave register with D, QM and Q, inverters I1 to I4, and transmission gates T1 and T2 clocked by clk and !clk; a reverse conduction path is marked.
12 transistors with a clock load of 4 (2 on clk and 2 on !clk), but now a ratioed design (and thus less robust).
To switch the state of the master, T1 must be sized to overpower I2 and so switch the state of the cross-coupled inverters I1 and I2. But we want to use minimum (or close to minimum) size transistors in T1 and T2 to keep the clock load small (to reduce power dissipation in the flipflops and in the clock distribution network). Thus, we probably want to downsize the transistors in I1 (making them longer and thus weaker).
Another problem is that reverse conduction is possible: the second stage can affect the state of the first latch. When the slave (second latch) is on, it is possible for a combination of T2 and I4 to influence the data stored in I1 and I2. As long as I4 is a weak device, this is not a problem. To avoid reverse conduction, I4 must be weaker than I1.

36 Non-Ideal Clocks
Figure: clk and !clk waveforms for ideal clocks and for non-ideal clocks, the latter showing clock skew with 0-0 overlap and 1-1 overlap.
clk and !clk are never perfect inversions of one another: !clk must be generated and both signals routed (variations can exist in the wires used to route the two clock signals, and load capacitances can vary).
Clock skew can result in clock overlap.

37 Example of Clock Skew Problems
Figure: register with input D, outputs Q and !Q, internal nodes A and B, inverters I1 to I4, and pass transistors P1 to P4 clocked by clk and !clk.
Race condition: there is a direct path from D to Q during the short time when both clk and !clk are high (1-1 overlap).
Undefined state: both B and D are driving A when clk and !clk are both high.
Dynamic storage: when clk and !clk are both low (0-0 overlap).
When the clock goes high, the slave should go into hold mode. But since clk and !clk are both high for a short period of time, there is a direct path from D to Q, so the data output could change on the rising edge (note that this is a negative edge-triggered device!). This is a race condition in which the value of Q is a function of whether the input D arrives at node X before or after the falling edge of !clk. Node A is driven by both D and B when clk and !clk are both high, resulting in an undefined state.

38 Pseudostatic Two-Phase ET FF
Figure: register with input D, outputs Q and !Q, internal nodes A, B and X, inverters I1 to I4, and pass transistors P1 to P4, clocked by two nonoverlapping phases clk1 and clk2 separated by tnon_overlap. One phase makes the master transparent while the slave holds, the other makes the master hold while the slave is transparent, and the register relies on dynamic storage during the nonoverlap time.
Keep the clock nonoverlap time large enough that no overlap occurs even in the presence of clock skew.
During the nonoverlap time, the flipflop is in the high-impedance state: the feedback loop is open (the loop gain is zero) and the input is disconnected. Leakage will destroy the state if this condition holds for too long, hence the name pseudostatic (the register employs a combination of static and dynamic storage approaches depending upon the state of the clock). Don’t stop the clocks when both are low!

39 Two Phase Clock Generator
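Figure: two-phase clock generator circuit that derives clk1 and clk2 from clk through internal nodes A and B, with waveforms for clk, A, B, clk1 and clk2.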

40 Power PC Flipflop
Figure (class handout version): Power PC flipflop with input D and output Q, clocked by clk and !clk.

41 Power PC Flipflop
Figure (lecture version): Power PC flipflop with input D and output Q, clocked by clk and !clk, annotated with example node values. In one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.
16 transistors with a clock load of 8 (4 on clk and 4 on !clk); fast and static.

42 Ratioed CMOS Clocked SR Latch
Figure (class handout version): ratioed CMOS clocked SR latch built from transistors M1 to M8, with inputs S, R and clk and outputs Q and !Q, annotated with on/off states.

43 Ratioed CMOS Clocked SR Latch
Figure (lecture version): the same latch built from transistors M1 to M8, annotated with the node value and transistor on/off transitions during switching.
8-transistor level-sensitive SR latch with two clock loads (sized).
No static power consumption, but it is a ratioed device where sizing is critical to ensure proper functionality.
For the case shown, M7 and M8 must succeed in bringing Q low (overcoming M4) to below the threshold of M1. Therefore, the sizes of transistors M5, M6, M7, and M8 must be increased.

44 Sizing Issues
Figure: !Q (Volts) plotted against W/L5and6, with W/L2and4 = 1.5µm/0.25µm and W/L1and3 = 0.5µm/0.25µm; the latch switches for W/L5and6 > 3.
Want VM at Vdd/2.
Assuming Q = 0, determine the minimum sizes of M5, M6, M7, and M8 to make the device switchable. Since W/L5and6 must exceed 3, the individual device ratio for M5 or M6 must be larger than approximately 6.
Analysis gives 2.26 (instead of 3) since it doesn’t take into account channel length modulation and DIBL (drain-induced barrier lowering).

45 Transient Response
Figure: Q and !Q (Volts) versus Time (ns) after SET, marking tc-!Q and tc-Q; tp!Q = 120 psec and tpQ = 230 psec.

46 6 Transistor CMOS SR Latch
Figure: 6-transistor SR latch with cross-coupled inverters (M1 to M4), outputs Q and !Q, and clocked pass transistors M5 and M6 connecting the S and R inputs.
Problems with noise margins and static power consumption due to the threshold drop across the pass transistors.
Once again, sizing is important, especially for M5 and M6.
Will see this structure again when we talk about SRAMs!

47 Review: Sequential Definitions
Static versus dynamic storage
  static uses a bistable element with feedback (regeneration) and thus preserves its state as long as the power is on
  static is preferred when updates are infrequent (clock gating)
  dynamic stores state on parasitic capacitors, so it only holds the state for a period of time (milliseconds) and requires periodic refresh
  dynamic is usually simpler (fewer transistors), higher speed, and lower power
Latch versus flipflop
  latches are level sensitive with two modes: transparent (inputs are passed to Q) and hold (output stable)
  flipflops are edge sensitive and only sample the inputs on a clock transition
Dynamic storage requires periodic refresh of the value. Reading the value of the stored signal from a capacitor without disrupting the charge requires a device with a high input impedance.

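48 Dynamic ET Flipflop
Figure (class handout version): master (T1, I1, storage capacitance C1) and slave (T2, I2, storage capacitance C2) between D and Q, with internal node QM, clocked by clk and !clk.
tsu =
thold =
tc-q =
For class handout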

49 Dynamic ET Flipflop
Figure (lecture version): master (T1, I1, storage capacitance C1) and slave (T2, I2, storage capacitance C2) between D and Q, with internal node QM, clocked by clk and !clk. In one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx
C1 is the gate capacitance of I1, the junction capacitance of T1 and the overlap gate capacitance of T1.
8 transistors, so very efficient.
The set-up time is the delay of the transmission gate (the time it takes C1 to sample the D input).
The hold time is zero since T1 is turned off on the clock edge, so further input changes are ignored.
tpFF is two inverter delays plus the delay of T2.
Remember: dynamic nodes (C1 and C2) only hold their state for so long, so the flipflop has to be refreshed periodically to prevent state loss due to charge leakage.

50 Dynamic ET FF Race Conditions
Figure: the dynamic ET flipflop (D, T1, I1, QM, T2, I2, Q, with C1 and C2) and clk/!clk waveforms showing a 0-0 overlap and a 1-1 overlap.
Clock overlap leads to race conditions.
0-0 overlap race condition: toverlap0-0 < tT1 + tI1 + tT2
1-1 overlap race condition: toverlap1-1 < thold
The 1-1 race is fixed by enforcing a hold time: data must be stable during the high-high overlap period.
The 0-0 race is fixed by making sure there is enough delay between D and C2 so that new data sampled by the master does not propagate to the slave (this can be ensured by enforcing an appropriate set-up time).
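The two overlap constraints on this slide are simple inequalities, so they can be checked for a given set of delays. The sketch below is illustrative only: the function names and every delay value are assumptions, in picoseconds.

```python
def overlap_0_0_safe(t_overlap: float, t_t1: float, t_i1: float, t_t2: float) -> bool:
    """0-0 overlap is race-free when t_overlap < t_T1 + t_I1 + t_T2."""
    return t_overlap < t_t1 + t_i1 + t_t2


def overlap_1_1_safe(t_overlap: float, t_hold: float) -> bool:
    """1-1 overlap is race-free when t_overlap < t_hold."""
    return t_overlap < t_hold


# Hypothetical delays: T1 = 30, I1 = 40, T2 = 30, enforced hold time = 60.
print(overlap_0_0_safe(t_overlap=80, t_t1=30, t_i1=40, t_t2=30))   # True: 80 < 100
print(overlap_0_0_safe(t_overlap=120, t_t1=30, t_i1=40, t_t2=30))  # False: 120 >= 100
print(overlap_1_1_safe(t_overlap=50, t_hold=60))                   # True
print(overlap_1_1_safe(t_overlap=70, t_hold=60))                   # False
```

The first check bounds the overlap by the delay of the D-to-C2 path, while the second is met by requiring D to stay stable for longer than the overlap lasts.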

51 Dynamic Two-Phase ET FF
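Figure: the dynamic ET flipflop with T1 clocked by clk1 and !clk1 and T2 clocked by clk2 and !clk2; clk1 and clk2 are nonoverlapping phases separated by tnon_overlap, one for master transparent / slave hold and the other for master hold / slave transparent.
Keep the clock nonoverlap time large enough that no overlap occurs even in the presence of clock skew.
But now there are 4 clock signals to route!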

52 Pseudostatic Dynamic Latch
Robustness considerations limit the use of dynamic FFs:
  coupling between signal nets and internal storage nodes can inject significant noise and destroy the FF state
  leakage currents cause the state to leak away with time
  internal dynamic nodes don’t track fluctuations in VDD, which reduces noise margins
A simple fix is to make the circuit pseudostatic.
Figure: dynamic latch with input D, clocked by clk and !clk, with a weak feedback inverter added.
Adding a weak feedback inverter to each latch comes at a slight cost in delay (it adds to the capacitive load) and power consumption, but it improves noise immunity significantly.
The above logic is added to all dynamic latches.

53 C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF.
Figure (class handout version): C2MOS master (M1 to M4, node QM with C1) and slave (M5 to M8, output Q with C2) between D and Q, clocked by clk and !clk.

54 C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF.
Figure (lecture version): C2MOS master (M1 to M4, node QM with C1) and slave (M5 to M8, output Q with C2) between D and Q, annotated with on/off states of the clocked transistors: in one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.
Positive edge-triggered MS flipflop, just like the one two slides ago (and again only 8 transistors and 4 clock loads), but with one important difference: a C2MOS flipflop with clk and !clk clocking is insensitive to clock overlap as long as the rise and fall times of the clock edges are sufficiently small.

55 C2MOS FF 0-0 Overlap Case
Clock-skew insensitive as long as the rise and fall times of the clock edges are sufficiently small.
Figure: the C2MOS flipflop during 0-0 overlap, showing M1, M2 and M4 in the master and M5, M6 and M8 in the slave, with D, QM, Q, C1 and C2, for two clk/!clk overlap timings.
Does any new data sampled during the overlap window propagate to Q (race)? New data is sampled on QM, but it cannot propagate to Q since M7 is off (the slave is in hold). Any new data sampled on the falling clock edge is not seen at Q.
For the clocking on the left: at the end of the overlap period !clk = 1 and both M7 and M8 turn off, putting the slave stage in hold mode.
For the clocking on the right: at the end of the overlap period clk = 1 and both M3 and M4 turn off, putting the master in hold mode (this affects the set-up time as well). This means that the FF is slower (slower tc-q time).

56 C2MOS FF 1-1 Overlap Case
Figure: the C2MOS flipflop during 1-1 overlap, with D, QM, Q, C1 and C2, for two clk/!clk overlap timings.
Does any new data sampled during the overlap window (right after the clock goes high) propagate to Q (race)? New data is sampled on QM, but it cannot propagate to Q since M8 is off (the slave is in hold). Any new data sampled on the falling clock edge is not seen at Q.
This case is a bit more problematic than 0-0 overlap. A hold time must be enforced on D, so that a change in D that makes it to QM is not copied to Q when the overlap time is over (and !clk goes to zero, turning on M8); this is the first clocking condition. Imposing a hold time on D, i.e., requiring that D be stable during clock overlap, overcomes this problem as well.
However, if the rise/fall times of the clock are sufficiently slow, a race is possible. The flipflop works correctly as long as the clock rise/fall times are smaller than approximately five times the propagation delay of the flipflop.
1-1 overlap constraint: toverlap1-1 < thold

57 C2MOS Transient Response
Figure: Volts versus Time (nsec) showing clk(0.1) and Q(0.1) for a 0.1 ns clock, and clk(3), QM(3) and Q(3) for a 3 ns clock, where a race condition exists.
For slow clocks, the potential for a race condition exists.

58 Pipelining using C2MOS
Figure: pipeline from In to Out in which C2MOS latches (transistors M1 to M8, clocked by clk and !clk, with storage capacitances C1, C2 and C3) alternate with the logic blocks F and G.
NORA Logic
What are the constraints on F and G?

59 Example Need to redo

60 NORA CMOS Modules Need to redo

61 True Single Phase Clocked (TSPC) Latches
Figure: TSPC negative latch and positive latch, each with input In, output Q, and a single clock clk.
Negative latch: transparent when clk = 0, hold when clk = 1.
Positive latch: transparent when clk = 1, hold when clk = 0.
Uses only a single clock, so there is no clock overlap (skew) to worry about; the clock load is also reduced.
Transparent mode is equivalent to two cascaded inverters (the latch is non-inverting).

62 Embedding Logic in TSPC Latch
Figure: TSPC latch with a pull-up network (PUN) and pull-down network (PDN) embedded between In and Q, and an example latch with inputs A and B, clocked by clk.
Logic can be embedded into the latch (or flipflop), which reduces the delay overhead associated with the latch. The set-up time increases, but overall performance improves: the increase in set-up time is typically smaller than the delay of an AND gate. E.g., using minimum-size devices, the set-up time of the AND latch is 140 psec, whereas the conventional approach of an AND gate followed by a latch has an effective set-up time of 600 psec.
This technique was used extensively in the design of the EV4 DEC Alpha microprocessor and many other high-performance processors.

63 TSPC ET FF
Figure (class handout version): TSPC master and slave stages between D and Q with internal node QM and a single clock clk.

64 TSPC ET FF
Figure (lecture version): TSPC master and slave stages between D and Q with internal node QM and a single clock clk, annotated with on/off states: in one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.
Clock load of 4 transistors (similar to the transmission gate or C2MOS designs) but only one clock to drive and route (12 transistors as compared to 8 in the previous two designs).
Virtually all constraints are removed: no clocks to overlap, no race.
Warning: similar to C2MOS, TSPC malfunctions when the slope of the clock is not sufficiently steep. Slow clocks cause both the NMOS and PMOS clocked transistors to be on simultaneously, resulting in undefined values of the states and race conditions. Clock slopes thus must be carefully engineered. If necessary, local buffers must be introduced to ensure the quality of the clock signal.

65 Simplified TSPC ET FF
Figure (class handout version): three clocked stages built from transistors M1 to M9 between D and Q, with internal nodes X and QM and a single clock clk.

66 Simplified TSPC ET FF
Figure (lecture version): the same circuit (D, X, QM, Q, clk) annotated with node values and on/off states for master transparent / slave hold and for master hold / slave transparent.
Positive edge-triggered (ask class why!).
Still a clock load of 4 transistors (similar to the transmission gate or C2MOS designs) but only one clock to drive and route, and now only 9 transistors (or 11 if Q rather than !Q is really needed), as compared to 8 in the previous two designs.
When clk = 0, the input inverter is sampling D onto X, the second (dynamic) inverter is in precharge mode so Y is 1, and the third inverter is in hold mode (so Q is stable). On the rising edge of the clock, the middle inverter evaluates, and since the third inverter is sampling when clk = 1, the output Q goes to its new state.
Set-up time is the time for node X to be valid: one inverter delay.
Hold time: on the positive edge of the clock, node X transitions low if D is high. Therefore, the input must be kept stable until the value on node X before the rising edge of the clock propagates to Y. This is the hold time of the register (less than 1 inverter delay, since it takes 1 inverter delay for the input to affect node X).
Propagation delay is essentially three inverter delays, since the value on node X must propagate to the output Q.

67 Sizing Issues in Simplified TSPC ET FF
Figure: Volts versus Time (nsec) showing clk, !Qorig and Qorig for the original sizing, and !Qmod and Qmod for the modified sizing.
Transistor sizing
  Original width: M4, M5 = 0.5µm; M7, M8 = 2µm
  Modified width: M4, M5 = 1µm; M7, M8 = 1µm
Sizing is critical: with improper sizing, glitches may occur due to a race condition when the clock transitions from low to high. When clk transitions from low to high, nodes Y and !Q start to discharge simultaneously (case for D low). Once Y is sufficiently low, the trend on !Q reverses. Note the glitch (red case), which also reduces the contamination delay. This can be fixed by resizing (green case) so that the relative strengths of the pull-down paths of the second and third inverters let Y discharge faster than !Q.

68 Split-Output TSPC Latches
Figure: split-output TSPC negative latch and positive latch, each with input In, internal node A, output Q, and a single clock clk.
Positive latch: transparent when clk = 1, hold when clk = 0.
Negative latch: transparent when clk = 0, hold when clk = 1.
When In = 0, A = VDD - VTn. When In = 1, A = |VTp|.
Split-output latches reduce the clock load by half (to two for a flipflop composed of a positive-negative latch pair). The downside is that not all node voltages in the latch experience full logic swing, due to the threshold drop. E.g., for the positive latch when D = 0 and clk = 1, A = Vdd - Vth. This also limits the amount of Vdd scaling possible with this latch.

69 Split-Output TSPC ET FF
Figure: split-output TSPC flipflop with input D, internal node QM, output Q and a single clock clk.
Which edge-triggered?
Now a clock load of only 2 transistors, and 8 + 2 transistors in total.

70 Pulsed FF (AMD-K6)
Pulse registers: a short pulse (glitch clock) is generated locally from the rising (or falling) edge of the system clock and is used as the clock input to the flipflop.
  race conditions are avoided by keeping the transparent mode time very short (during the pulse only)
  the advantage is reduced clock load; the disadvantage is a substantial increase in verification complexity
Figure: AMD-K6 pulsed flipflop with inputs D and clk, output Q, internal node X, the delayed inverted clock !clkd, NMOS transistors M1 to M6 and PMOS transistors P1 to P3, annotated with node values and ON/OFF states.
When the clock is low, M3 and M6 are off, and P1 is on, precharging node X. The output node Q is decoupled from X, so it is in hold mode. !clkd is a delayed inverted version of clk. On the rising edge of clk, M3 and M6 turn on while M1 and M4 stay on for a short period. During this period the flipflop is transparent and the input data D is sampled by the flipflop. Once !clkd goes low, node X is decoupled from the input and is either held or starts to precharge to Vdd by PMOS device P2. On the falling edge of the clock, node X is held at Vdd and the output is held stable by the cross-coupled inverters.
Note that the one-shot (pulse) is integrated into the register. The transparency period determines the hold time. The window must be wide enough for the input data to propagate to Q. Note also that the set-up time can be NEGATIVE (if the transparency window is longer than the delay from input to output). This is attractive, as data can arrive at the register even after the clock goes high, meaning that time can be borrowed from the previous cycle.
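The transparency window described above exists only while clk is high and the delayed inverted clock !clkd is still high. A minimal discrete-time sketch of that pulse is shown below; the function name, the unit time steps, and the delay of 2 steps are assumptions for illustration and do not model the transistor-level circuit.

```python
def transparency_window(clk_wave, delay):
    """Return clk AND !clkd for each time step.

    !clkd is clk inverted and delayed by `delay` steps; clk is assumed
    to have been low before the waveform starts.
    """
    window = []
    for t, clk in enumerate(clk_wave):
        delayed_clk = clk_wave[t - delay] if t >= delay else 0
        not_clkd = 1 - delayed_clk
        window.append(clk & not_clkd)
    return window


clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(transparency_window(clk, delay=2))
# [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

Each rising edge of clk produces a pulse whose width equals the delay of the inverting path, independent of how long clk stays high, which is why the delay directly sets the transparency period and hence the hold time.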

71 Sense Amp FF (StrongArm SA100)
Sense amplifiers are circuits that accept small-swing input signals and amplify them to full rail-to-rail signals.
The advantages of sense amplifier flipflops are reduced clock load and that they can be used as receivers for reduced-swing differential buses.
Figure: sense amplifier based flipflop with inputs D and clk, outputs Q and !Q, and transistors M1 to M10, annotated with node values.

72 Flipflop Comparison Chart
Columns: Name, Type, #clk ld, #tr, tset-up, thold, tpFF. Only the cells below survive in this transcript.
Mux: Static, 8 (clk-!clk), 20, 3tpinv+tptx, tpinv+tptx
PowerPC: 16
2-phase: Ps-Static, 8 (clk1-clk2)
T-gate: Dynamic, 4 (clk-!clk), 8, tptx, to1-1, 2tpinv+tptx
C2MOS:
TSPC: 4 (clk), 11, tpinv, 3tpinv
S-O TSPC: 2 (clk), 10
AMD K6: 5 (clk), 19
SA 100: SenseAmp, 3 (clk)

73 Choosing a Clocking Strategy
Choosing the right clocking scheme affects the functionality, speed, and power of a circuit.
Two-phase designs
  + robust and conceptually simple
  - need to generate and route two clock signals
  - have to design to accommodate possible skew between the two clock signals
Single-phase designs
  + only need to generate and route one clock signal
  + supported by most automated design methodologies
  + don’t have to worry about skew between the two clocks
  - have to have guaranteed slopes on the clock edges

