University of California Davis

1 Clocked Storage Elements for High-Performance and Low-Power Systems
ICCD 2001 Tutorial
Vojin G. Oklobdzija
University of California Davis
Integration Corp., Berkeley, CA 94708

2 Outline
• Importance of Clocked Storage Elements (CSE)
• Basic Definitions
• Difference between Latch and Flip-Flop
• Timing and Power Metrics
• Representative Designs Used in High-Performance Microprocessors
• Comparison
• Conclusion, New Directions and Some Novel Designs
9/16/2018 Prof. V.G. Oklobdzija, University of California

3 Importance of Clocked Storage Elements (CSE)

4 Trends in high-performance systems: Higher clock frequency

5 Power vs. Year
[Figure: processor power vs. year. High-end growing at 25%/year; consumer (low-end) at 13%/year; other trend lines labeled 12%/yr and 15%/yr.]

6 Predictions
[Figure. Source: Shekhar Borkar, Intel]

7 Recent Interest in Clocked Storage Elements
Trends in high-performance systems:
• Higher clock frequency: 1.8 GHz Pentium 4 (4 GHz logic presented)
• More transistors on chip (214 million, ISSCC 2001)
Consequences:
• Increased flip-flop overhead relative to cycle time
• Pipeline depth of 20 or more
• Cycle time of … FO4 delays; F-F overhead of … FO4

8 Courtesy: Doug Carmean, Hot-Chips-13 presentation

9 Processor Frequency Trend
Source: S. Borkar, Intel
• Frequency doubles each generation
• Number of gates per clock reduces by 25%

10 Pentium 3 uArchitecture
[Figure: three pipeline stages, each a logic block followed by a register. Per-stage delay: logic 0.6 ns, register 0.3 ns.]
The total delay from pipeline stage to pipeline stage is 0.9 ns, so the maximum clock rate for this design is 1/0.9 ns ≈ 1.1 GHz.

11 The Pentium 4 Depends on Pipelines
[Figure: six pipeline stages, each a logic block followed by a register. Per-stage delay: logic 0.4 ns, register 0.16 ns.]
The total delay from pipeline stage to pipeline stage is 560 ps. This design, with twice the stages, has a maximum clock rate of 1.8 GHz. As the design is broken into more pipeline stages, the logic in each stage has less delay, and the registers between stages consume a higher percentage of the cycle, causing diminishing returns. At some point the cost of adding more stages (such as a larger branch-misprediction penalty) yields only a marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
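
A short Python sketch reproduces the arithmetic of the two pipeline slides. Like the slides, it assumes each stage costs its logic delay plus a fixed register delay. The 2.4 ns of total logic in the loop is a hypothetical figure, chosen only so that 6 stages match the 0.4 ns per-stage logic quoted for the Pentium 4; it is not a published value.

```python
def max_clock_ghz(logic_ns, register_ns):
    """Maximum clock rate when every stage = logic delay + register delay (ns)."""
    return 1.0 / (logic_ns + register_ns)


def register_share(logic_ns, register_ns):
    """Fraction of the cycle consumed by the register (CSE overhead)."""
    return register_ns / (logic_ns + register_ns)


# Per-stage delays taken from the slides (ns).
p3_ghz = max_clock_ghz(0.6, 0.3)    # about 1.11 GHz
p4_ghz = max_clock_ghz(0.4, 0.16)   # about 1.79 GHz

# Diminishing returns: split a hypothetical 2.4 ns of logic into n stages,
# keeping the register delay fixed at 0.16 ns per stage.
TOTAL_LOGIC_NS = 2.4
REGISTER_NS = 0.16
for n in (3, 6, 12, 24):
    stage_logic = TOTAL_LOGIC_NS / n
    print(f"{n:2d} stages: {max_clock_ghz(stage_logic, REGISTER_NS):5.2f} GHz, "
          f"register share {register_share(stage_logic, REGISTER_NS):.0%}")
```

Under these assumptions, going from 12 to 24 stages raises the clock rate by only about 1.4x, while the register share of the cycle climbs from about 44% to about 62%.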

12 Courtesy: Doug Carmean, Hot-Chips-13 presentation

13 Why Interest in Clocked Storage Elements?
• Higher impact of storage element delay
  – High speed requires low CSE pipeline overhead: 3 FO4 or less
  – Logic embedding property
  – Limits on performance: FF delays of 10 ps to 100 ps
• Higher impact of clock skew
  – Ability to control both edges of the clock
• Higher power consumption
  – >100 W for recent processors
  – Clock system burns up to 40%, storage elements up to 20% of total power
  – Battery-powered applications
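
The overhead and power figures on this slide are simple ratios. The sketch below assumes a set of hypothetical cycle times in FO4 delays (the tutorial's own cycle-time values are not legible in this transcript) and applies the 3 FO4 overhead target and the up-to-40%/up-to-20% power shares to a 100 W processor.

```python
def cse_overhead_fraction(cycle_fo4, cse_fo4):
    """Share of the clock cycle lost to the clocked storage element."""
    return cse_fo4 / cycle_fo4


def power_budget(total_w, clock_share=0.40, cse_share=0.20):
    """Upper-bound power for the clock system and for storage elements,
    using the 'up to' percentages quoted on the slide."""
    return {"clock system": total_w * clock_share,
            "storage elements": total_w * cse_share}


# Hypothetical cycle times (in FO4 delays) against the 3 FO4 overhead target.
for cycle in (30, 20, 15, 10):
    print(f"{cycle} FO4 cycle: {cse_overhead_fraction(cycle, 3):.0%} spent in the CSE")

print(power_budget(100.0))  # a 100 W processor: up to 40 W clock, up to 20 W CSEs
```

The trend is the point: the same 3 FO4 overhead grows from 10% of a 30 FO4 cycle to 30% of a 10 FO4 cycle.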

14 Basic Definitions

15 Clock Signals
Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in a synchronous digital system. Clocking in a digital system can be either single-phase or multi-phase (usually two-phase). The clocking strategy depends on, and is largely influenced by, the choice of CSE: latch or flip-flop. The dark rectangles in the figure represent the interval during which the bistable element samples its data input. Fig. 4.2 shows the possible types of clocking techniques and the corresponding general finite-state machine structures.

16 Clock Signal Uncertainty
Uncertainty comprises jitter, skew, and duty cycle. Its effects:
• On cycle time:
  – maximum delay restriction
  – violation of setup time
• May cause a race:
  – minimum delay restriction
  – violation of hold time
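
The two restrictions above can be written as a pair of checks. This is a minimal sketch, not the tutorial's method: it lumps skew and jitter into one uncertainty term `t_unc` (a simplification), and the setup check follows the cycle-time equation T = TClk-Q + TLogic + Tsetup + Tskew from the Delay slide. The class name `PathTiming` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    t_clk_q_max: float  # slowest clock-to-Q of the launching CSE (ps)
    t_clk_q_min: float  # fastest clock-to-Q of the launching CSE (ps)
    t_logic_max: float  # longest logic path (ps)
    t_logic_min: float  # shortest logic path (ps)
    t_setup: float      # setup time of the capturing CSE (ps)
    t_hold: float       # hold time of the capturing CSE (ps)


def setup_ok(p, period, t_unc):
    """Maximum-delay restriction: the slowest data must arrive before the next edge."""
    return p.t_clk_q_max + p.t_logic_max + p.t_setup + t_unc <= period


def hold_ok(p, t_unc):
    """Minimum-delay restriction: the fastest data must not race into the next stage."""
    return p.t_clk_q_min + p.t_logic_min >= p.t_hold + t_unc


path = PathTiming(t_clk_q_max=100, t_clk_q_min=80, t_logic_max=700,
                  t_logic_min=10, t_setup=50, t_hold=40)
print(setup_ok(path, period=1000, t_unc=60))  # True: 910 ps <= 1000 ps
print(hold_ok(path, t_unc=60))                # False: 90 ps < 100 ps, a race
```

Note that a slower clock fixes a setup violation but not a hold violation, because the period does not appear in the hold check.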

17 Jitter
• Uncertainty in consecutive edges of a periodic signal
• Caused by temporal noise events
• Quantified as:
  – cycle-to-cycle or short-term jitter, tJS
  – long-term jitter, tJL
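
One common way to compute the two jitter measures from a list of measured clock-edge times is sketched below. Definitions vary between sources, so treat these as assumptions: tJS is taken as the largest change between adjacent periods, and tJL as the peak-to-peak drift of the edges from an ideal clock aligned to the first edge. The edge timestamps are made up.

```python
def periods(edges):
    """Periods between consecutive rising edges (same time unit as edges)."""
    return [b - a for a, b in zip(edges, edges[1:])]


def cycle_to_cycle_jitter(edges):
    """Short-term jitter tJS: largest change between two adjacent periods."""
    p = periods(edges)
    return max(abs(b - a) for a, b in zip(p, p[1:]))


def long_term_jitter(edges, nominal_period):
    """Long-term jitter tJL: peak-to-peak drift of the edges from an ideal
    clock that is aligned to the first edge."""
    drift = [t - (edges[0] + k * nominal_period) for k, t in enumerate(edges)]
    return max(drift) - min(drift)


edges_ps = [0, 1000, 2010, 3000, 4005]  # hypothetical measured edge times (ps)
print(cycle_to_cycle_jitter(edges_ps), long_term_jitter(edges_ps, 1000))  # -> 20 10
```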

18 Clock Skew
• Time difference between temporally equivalent (concurrent) edges of two periodic signals
• Caused by spatial noise events

19 Clocking Strategies
• Single-phase clocking and single-latch machine
• Edge-triggered clocking and flip-flop-based machine

20 Clocking Strategies
• Two-phase clocking and two-phase latch machine with single latch
• Two-phase clocking and two-phase latch machine with double latch

21 Delay Restrictions*
• The clock defines hard boundaries for edge-triggered design
• Clock boundaries are soft for level-sensitive clocking; they are:
  – tolerant to clock edge uncertainty
  – tolerant to uncertainty of data arrival
  – timing slack can be passed forward voluntarily
  – time can be borrowed forcefully
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
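
The soft boundary of level-sensitive clocking can be illustrated with an idealized two-phase latch pipeline. This is a sketch under stated assumptions, not a timing tool: latch k is transparent from k·T/2 to (k+1)·T/2 (ideal 50% duty, non-overlapping phases), data leaves latch 0 at t = 0, clock-to-Q is taken equal to the latch D-Q delay, and skew is ignored. The function name and delays are hypothetical.

```python
def latch_chain_timing(logic_delays, period, t_dq, t_setup):
    """Return (ok, borrowed) for an idealized two-phase latch pipeline.

    logic_delays[k-1] is the logic between latch k-1 and latch k.
    borrowed[k-1] is how far data arrived into latch k's transparent phase,
    i.e. the time taken from the following stage.
    """
    half = period / 2.0
    q_time = 0.0                      # data leaves latch 0 at t = 0
    borrowed = []
    for k, d in enumerate(logic_delays, start=1):
        arrival = q_time + d          # data reaches the D input of latch k
        opens, closes = k * half, (k + 1) * half
        if arrival > closes - t_setup:
            return False, borrowed    # setup violated: latch k closed first
        borrowed.append(max(0.0, arrival - opens))
        # A transparent latch passes late data on; early data waits for the opening.
        q_time = max(arrival, opens) + t_dq
    return True, borrowed


ok, borrowed = latch_chain_timing([0.6, 0.3, 0.5, 0.4], period=1.0,
                                  t_dq=0.05, t_setup=0.02)
print(ok, [round(b, 3) for b in borrowed])
```

Here the 0.6 ns first stage overruns its 0.5 ns half-cycle, which a hard edge-triggered boundary at T/2 would reject; the transparent latch lets it borrow 0.1 ns, and the short 0.3 ns second stage absorbs the debt.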

22 Single-Phase Clocking, Single Latch: Timing Constraints

23 Two-Phase Clocking with Two-Phase Double Latch

24 Two-Phase Clocking with One-Phase Double Latch
Some people erroneously refer to this clocking arrangement as a “negative-edge flip-flop”.

25 Difference between Latch and Flip-Flop

26 Difference between Latch and Flip-Flop
• Flip-flop: after the clock transition, changes in the data can no longer reach the output
• Latch: “transparent” while the clock is active
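
The difference can be seen in a behavioral model on sampled waveforms. This sketch assumes ideal, zero-delay elements: a positive-level latch that follows D while clk = 1, and a positive-edge flip-flop that samples D only on a 0-to-1 clock transition. The waveforms are made up.

```python
def simulate_latch(clk, d):
    """Positive-level latch: Q follows D while clk == 1 and holds while clk == 0."""
    q, out = 0, []
    for c, x in zip(clk, d):
        if c:
            q = x
        out.append(q)
    return out


def simulate_flip_flop(clk, d):
    """Positive-edge flip-flop: Q samples D only on a 0 -> 1 clock transition."""
    q, prev, out = 0, 0, []
    for c, x in zip(clk, d):
        if c and not prev:
            q = x
        prev = c
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 1, 1, 0]
print(simulate_latch(clk, d))      # -> [0, 1, 0, 1, 1, 1, 1, 0]
print(simulate_flip_flop(clk, d))  # -> [0, 1, 1, 1, 1, 1, 1, 1]
```

The latch output tracks the glitches on D while the clock is high; the flip-flop ignores them until the next rising edge.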

27 Flip-Flop and M-S Latch Arrangement
How can one recognize the difference without knowing what is inside the “black box”?

28 F-F and M-S Latch: Difference
Experiment: [figure]

29 F-F and M-S Latch: Difference
Structural difference: [Figure: flip-flop vs. M-S latch, labeled “No Clock”]

30 Flip-Flop vs. Latch
Flip-flop:
• Edge sensitive
• Easier to use as frequency increases
• Robust to duty cycle
• Simpler logic timing requirements
• Fits into CAD tools
Latch:
• Level sensitive
• May consume less power for the operation
• Better clock skew/jitter characteristics
• More difficult clock requirements
The choice between a flip-flop and a latch depends on each individual design and its specifications. Flip-flops are edge sensitive, giving simpler timing requirements and lower sensitivity to duty-cycle imperfections. Latches are level sensitive and simpler, giving lower power consumption and better clock skew/jitter characteristics.

31 Flip-Flop Example: HLFF (Partovi)

32 Flip-Flop Example: HLFF (Partovi)

33 Pulse-Based Flip-Flops*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

34 Flip-Flop Example: SAFF, DEC Alpha 21264
[Figure: waveforms labeled D=0, pulse, D=1]

35 Requirements in the Flip-Flop Design
• Small clock-to-output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  – Typical load ranges from 3-4 FO4 to … FO4
  – High drive should be achieved by inserting inverters and following “logical effort” rules, starting with a minimum-size CSE
• Symmetry: balanced D-Q and D-Q/not delay
• Integration of logic into the flop
• Multiplexed or clocked scan
• Cross-talk insensitivity: dynamic/high-impedance nodes are affected
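
The “logical effort” rule for adding drive can be sketched as a standard inverter-chain calculation. Assumptions (not from the tutorial): inverter logical effort g = 1, parasitic delay p_inv = 1, a target stage effort near 4, and capacitances in units of a minimum inverter's input capacitance; the load of 64 units is hypothetical. `c_in` is the first inverter's input capacitance, i.e. what the minimum-size CSE must drive.

```python
import math


def size_buffer_chain(c_in, c_load, p_inv=1.0, rho=4.0):
    """Logical-effort sizing of an inverter chain from c_in to c_load."""
    path_effort = c_load / c_in                    # F = electrical effort (g = 1)
    n = max(1, round(math.log(path_effort, rho)))  # stage count for effort near rho
    f = path_effort ** (1.0 / n)                   # equal effort in every stage
    sizes = [c_in * f ** i for i in range(n)]      # input capacitance of each stage
    delay = n * (f + p_inv)                        # normalized delay, in units of tau
    return n, f, sizes, delay


n, f, sizes, delay = size_buffer_chain(c_in=1.0, c_load=64.0)
print(n, round(f, 3), [round(s, 2) for s in sizes], round(delay, 2))
```

For a load of 64 this gives three inverters sized 1, 4 and 16 with a stage effort of 4 and a delay of 15 τ. In practice the stage count may be adjusted by one to obtain the required output polarity.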

36 Timing and Power metrics

37 Delay
The sum of setup time and Clk-Q delay is the only true measure of performance with respect to system speed:
T = TClk-Q + TLogic + TSetup + TSkew
[Figure: timing diagram showing TClk-Q, TLogic, TSetup]

38 Delay vs. Setup/Hold Times

39 Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of clock-to-output and data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, its slope starts to decrease as the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope. The data-to-clock time that corresponds to this point is termed the optimal setup time.
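
To make the 45° argument concrete, the sketch below sweeps data-to-clock time t on a hypothetical clock-to-output curve, ClkQ(t) = 100 + 400/(t + 20) ps (chosen so the answer can be checked by hand, not measured data), and finds the minimum of D-Q = t + ClkQ(t). Setting dClkQ/dt = -1 gives t = 0 ps and D-Q = 120 ps, which the grid search should reproduce.

```python
def clk_to_q(t_dc, c0=100.0, a=400.0, t_fail=-20.0):
    """Hypothetical Clk-Q curve (ps): flat for large t_dc, rising toward failure at t_fail."""
    return c0 + a / (t_dc - t_fail)


def optimal_setup(lo=-19.0, hi=100.0, step=0.1):
    """Sweep data-to-clock time and return (t_opt, minimum D-Q)."""
    best_t, best_dq = None, float("inf")
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        t_dc = lo + i * step
        d_q = t_dc + clk_to_q(t_dc)  # D-Q = data-to-clock + clock-to-output
        if d_q < best_dq:
            best_t, best_dq = t_dc, d_q
    return best_t, best_dq


t_opt, dq_min = optimal_setup()
# Slope of the Clk-Q curve at the optimum (central difference): close to -1, i.e. 45 degrees.
slope = (clk_to_q(t_opt + 0.01) - clk_to_q(t_opt - 0.01)) / 0.02
print(round(t_opt, 2), round(dq_min, 2), round(slope, 3))
```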

40 Timing parameters, details
The best point to pick on the delay curve is the minimum D-Q.

41 Simulation Condition and Testbench
Power:
• Data-activity dependence treated as a FF characteristic
• Consumption at 50% (30%) activity adopted as the figure of merit
• Dissipation of the driving inverters is part of total power consumption
To evaluate and compare flip-flops, simulation conditions and a testbench are defined according to the flip-flop characterization presented earlier. Power consumption is measured for several input activities; power at 50% input activity is adopted as the figure of merit. Total dissipation includes the dissipation of the driving inverters.

42 Simulation Condition and Testbench
Timing:
• Total FF overhead is setup + clock-to-output time
• Circuits optimized toward tD-Q
• Clock skew robustness obtained from observing the D-Q curve
Power-delay product:
• Overall performance parameter at fixed frequency
The circuit delay parameter used for evaluation is data-to-output (D-Q) time, and circuits are optimized toward this parameter. The ultimate performance parameter is the power-delay product, measured at a fixed clock frequency. It is calculated as the product of data-to-output time and total power consumption, both measured at the optimal setup time.
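
The unit bookkeeping for the power-delay product, and for the delay-weighted PD2P metric used later in the tutorial, is easy to get wrong. The sketch below keeps delay in ps and power in µW, so 1 ps × 1 µW = 10^-18 J = 10^-3 fJ. The two candidate flip-flops and their numbers are hypothetical.

```python
def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def pd2p_e24(delay_ps, power_uw):
    """Power * delay^2 in units of 1e-24 J*s (PDP in fJ times delay in ps, times 1e-3)."""
    return pdp_fj(delay_ps, power_uw) * delay_ps * 1e-3


# Hypothetical candidates: (minimum D-Q delay in ps, total power in uW).
candidates = {"FF_A": (150.0, 80.0), "FF_B": (250.0, 40.0)}
for name, (dly, pwr) in candidates.items():
    print(f"{name}: PDP {pdp_fj(dly, pwr):.1f} fJ, PD2P {pd2p_e24(dly, pwr):.2f} e-24 J*s")

best_pdp = min(candidates, key=lambda k: pdp_fj(*candidates[k]))
best_pd2p = min(candidates, key=lambda k: pd2p_e24(*candidates[k]))
```

The two metrics can disagree: the slower, lower-power FF_B wins on PDP (10 fJ vs. 12 fJ), while the faster FF_A wins once delay is weighted twice (1.8 vs. 2.5).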

43 Flip-Flop Performance Comparison
• Test bench
• Total power consumed: internal power, data power, and clock power
• Measured for four cases: no activity (0000… and 1111…), maximum activity (0101…), and average activity (random sequence)
• Delay is the minimum D-Q (Clk-Q + setup time)
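
The four stimulus cases can be generated and checked with a few lines of Python. This sketch assumes activity means the fraction of clock cycles in which D changes, so 0101… has activity 1 and a random sequence about 0.5; the sequence length and seed are arbitrary.

```python
import random


def activity(bits):
    """Fraction of clock cycles in which D changes (transition probability)."""
    toggles = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return toggles / (len(bits) - 1)


def test_sequences(n=1000, seed=1):
    """The four test-bench cases: two no-activity, maximum, and random data."""
    rng = random.Random(seed)
    return {
        "no activity (0000...)": [0] * n,
        "no activity (1111...)": [1] * n,
        "maximum activity (0101...)": [i % 2 for i in range(n)],
        "average activity (random)": [rng.randint(0, 1) for _ in range(n)],
    }


seqs = test_sequences()
acts = {name: activity(bits) for name, bits in seqs.items()}
for name, a in acts.items():
    print(f"{name}: {a:.3f}")
```

The two no-activity cases are kept separate because precharged designs burn very different power on all-zeros and all-ones data.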

44 The sources of internal power consumption

45 Design & optimization tradeoffs
Opposite goals:
• Minimal total power consumption
• Minimal delay
Power-delay tradeoff:
• Minimize the power-delay product (PDPtot)
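
Why the PDP has an interior optimum can be shown with a toy sizing model. The models below are hypothetical, not taken from the tutorial: delay = a + b/w falls with transistor width w, and power = c + d·w rises with it. Minimizing their product gives w* = sqrt(b·c/(a·d)), and a grid search confirms it.

```python
import math


def delay_ps(w, a=100.0, b=200.0):
    """Hypothetical delay model: fixed part plus a term that shrinks with width w."""
    return a + b / w


def power_uw(w, c=20.0, d=5.0):
    """Hypothetical power model: fixed part plus a term that grows with width w."""
    return c + d * w


def best_width(widths):
    """Pick the width that minimizes the power-delay product."""
    return min(widths, key=lambda w: delay_ps(w) * power_uw(w))


grid = [0.5 + 0.01 * i for i in range(951)]          # w from 0.5 to 10.0
w_grid = best_width(grid)
w_exact = math.sqrt(200.0 * 20.0 / (100.0 * 5.0))    # sqrt(b*c / (a*d)) for this model
print(round(w_grid, 2), round(w_exact, 3))
```

Minimum delay alone pushes w to the largest size and minimum power alone to the smallest; the PDP settles between them (about 2.83 here).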

46 Clocked Storage Elements in High-Performance Microprocessors

47 Master-Slave Latches
• Positive setup times
• Two clock phases: distributed globally or generated locally
• Small delay penalty for incorporating a MUX
• Some circuit tricks needed to reduce the overall delay

48 PowerPC 603 M-S Latch Combination
Used in the PowerPC family:
• Low power
• High speed
• Big clock load
• Easily embedded scan function
Our simulations show (PowerPC 603, Gerosa, JSSC 12/94):
• Small internal power consumption
• Low-power feedback
• Double the clock load compared with other latches
• Locally generated second phase (reduces overall clock load)

49 mC2MOS M-S Latch
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973
• Small clock load (local clock buffering)
• Low-power feedback
• Big positive setup time
• Robustness to clock slope, unlike the classic C2MOS structure
[Figure: our simulation results]

50 Advanced Flip-Flops

51 21264 Flip-Flop
• Used in Digital's WD21264 high-performance processor
• Runs at 600 MHz
• 450 ps Clk-Q delay, simulated in 0.35 µm technology
Our simulations show:
• Small clock load
• High internal power consumption
• The S-R latch ruins the speed by 40%
• Dynamic nodes, a potential hazard in low-power applications

52 Strong Arm 110 Flip-Flop
• Used in the SA … W low-power processor
• Runs at 200 MHz
• One transistor more than the 21264 flip-flop
• 450 ps Clk-Q delay, simulated in 0.35 µm CMOS technology
Our simulations show:
• The additional transistor provides fully static operation (robustness to leakage currents), essential for low-power applications, at the cost of slightly increased internal power consumption

53 Flip-Flops
• The first stage is a pulse generator: it generates a pulse (glitch) on the rising edge of the clock
• The second stage is a latch: it captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator.
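
The negative setup time and the hold-time risk of a pulse generator plus latch follow from the transparency window. This sketch assumes the latch is transparent from the clock edge for `pulse_width` (pulse-generator delay ignored), with `latch_setup` and `latch_hold` referenced to the end of the pulse. All names and numbers are hypothetical.

```python
def pulsed_ff_timing(pulse_width, latch_setup, latch_hold):
    """Effective setup/hold of a pulse generator + latch, referenced to the clock edge.

    Data may arrive after the edge as long as it beats the end of the pulse
    (negative setup), but it must stay stable until the pulse ends plus the
    latch hold time (hold grows with the pulse width).
    """
    setup = latch_setup - pulse_width
    hold = pulse_width + latch_hold
    return setup, hold


def min_delay_ok(t_clk_q_min, t_logic_min, hold, t_skew=0.0):
    """Hold check between two pulsed flip-flops."""
    return t_clk_q_min + t_logic_min >= hold + t_skew


setup, hold = pulsed_ff_timing(pulse_width=60, latch_setup=20, latch_hold=10)
print(setup, hold)                    # -> -40 70
print(min_delay_ok(50, 10, hold))     # -> False: 60 ps < 70 ps, padding needed
print(min_delay_ok(50, 30, hold))     # -> True
```

A wider pulse buys more negative setup time (more skew absorption) at the price of an equally larger hold requirement.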

54 Partovi’s HLFF
AMD K-6, Partovi, ISSCC’96
• Hybrid latch-flip-flop combination
• 280 ps Clk-Q delay
• Negative setup time of … ps
• Robustness to clock skew and fast clocking
Our simulations show:
• The hybrid design gains speed (negative setup time) and robustness to clock skew
• Drawbacks: sensitivity to clock slope; relatively high internal power (due to precharge)

55 Hybrid Latch Flip-Flop
Skew absorption (Partovi et al., ISSCC’96)

56 HLFF Flip-Flop
Flip-flop features:
• Single-phase clock
• Edge triggered, on one clock edge
Other features:
• Soft clock edge property
  – brief transparency, equal to 3 inverter delays
  – negative setup time
  – allows slack passing
  – absorbs skew
• Hold time is comparable to HLFF delay
  – minimum delay between flip-flops must be controlled
• Pseudo-static
• Possible to incorporate logic

57 K-6 Dual-Rail ETL
• Self-reset property
• Hybrid combination
• 260 ps Clk-Q delay, simulated in 0.35 µm CMOS technology
• Negative setup time: -20 ps
• Small clock load
Our simulations show:
• The double-ended precharge structure is the most power hungry (switching on all input combinations)
• The self-reset property increases power consumption
• Drives succeeding fast domino stages
• Precharge increases speed

58 Semi-Dynamic Flip-Flop
F. Klass, VLSI Circuits’98
• Hybrid combination used in UltraSPARC-III
• Very fast circuit (188 ps Clk-Q delay, 0.25 µm technology, 1.6 V, 105 °C)
Our simulations show:
• Negative setup time
• Small penalty for embedded logic
• Relatively high internal power consumption and clock load

59 Modified Sense Amplifier-Based Flip-Flop
Nikolic, Oklobdzija, Stojanovic, ISSCC 1999
• The delay of each output is independent of the load on the other output
• The delays of Q and Q/not are symmetrical, as opposed to the NAND-based design
• Convenient for dual-rail logic; the driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, a small setup time, and all the advantages of the original design
• Possible tradeoff between speed and robustness to cross-talk

60 Modified Sense Amplifier-Based Flip-Flop
• The first stage is the unchanged sense amplifier
• The second stage is sized to provide maximum switching speed
  – driver transistors are large
  – keeper transistors are small and disengaged during transitions
Nikolic, Oklobdzija, Stojanovic, ISSCC’99

61 New Sense Amplifier-Based Flip-Flop
• New pulse-generating stage
  – inverters relocated to decouple the gates of MN3, MN4
  – MN5, MN6 provide leakage current paths
• Second stage is unchanged
Nikolic, Oklobdzija, ESSCIRC’99

62 New Sense Amplifier-Based Flip-Flop
• Falling-edge flip-flop
• Output stage has identical topology
Nikolic, Oklobdzija, ESSCIRC’99

63 Comparison with Other Flip-Flops
Delay vs. power comparison of different flip-flops. Flip-flops are optimized for speed, with output transistor sizes limited to 7.5 µm/4.3 µm, driving 200 fF. Total transistor gate width is indicated.
Nikolic, Oklobdzija, ESSCIRC’99
[Figure: total power [µW] (10 to 70) vs. delay [ps] (100 to 500). Points labeled with total gate width: TG M-S 52 µm, original SAFF 60 µm, HLFF 54 µm, this work 69 µm, C2MOS 80 µm, SDFF 49 µm.]

64 Overall Results

65 Comparison in terms of speed and PDPtot
Delay:
• below 200 ps: SDFF, HLFF, K-6 ETL
• … ps: PowerPC latch, 21264 Alpha FF, Strong Arm FF, mC2MOS latch
• above 500 ps: SSTC latch, DSTC latch, SSTC* latch, DSTC* latch
PDPtot:
• below 30 fJ: PowerPC latch
• … fJ: HLFF, SDFF, mC2MOS latch, 21264 Alpha FF, Strong Arm FF
• … fJ: K-6 ETL
• above 70 fJ: SSTC latch, DSTC latch

66 Delay Comparison
F-F design brings the fastest structures.

67 Delay Comparison
F-F design brings the fastest structures.

68 Overall ranking, zoomed
• Real signals have activity between 0 and 0.25 (…)
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the … point

69 Overall Performance
• Real signals have activity between 0 and 0.5 (…)
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the … point

70 Conventional Clk-Q vs. minimum D-Q
• Hidden positive setup time
• Degradation of Clk-Q

71 Internal Power Distribution
Four sequences characterize the boundaries of internal power consumption:
• …010101…: maximum
• random, equal transition probability: average
• …111111…: precharge activity
• …000000…: leakage + internal clock processing

72 Comparison of Clock power consumption

73 Conclusion and New Directions

74 New Directions
• Reducing CSE power:
  – using conditional precharge techniques
  – using conditional data capture techniques
• Reducing clock distribution network power:
  – capturing data on each edge (double-edge-triggered structure)
• Improving CSE reliability:
  – fully derived CSE (ESSCIRC’99, ICCD 2000)

75 Conditional Precharge Flip-Flop Circuit
The proposed flip-flop is shown. The first stage uses feedback from the output to disable the precharge and keep the internal node low when Q is high (Mn4, Mp2). The second stage implements a conditional keeping function (Mn8, Mp3, Mp4).
Nedovic, Oklobdzija, SBCCI 2000
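
A behavioral event count (not a circuit simulation) shows why conditional precharge matters for the …111111… pattern. The simplifying assumption is one discharge/precharge event per cycle: an unconditional precharged stage cycles its internal node every clock with D=1, whereas with the output feedback described above it does so only when Q must rise (D=1 while Q=0). The function name is hypothetical.

```python
def precharge_events(d_bits, conditional):
    """Count discharge/precharge cycles of the first-stage internal node.

    Unconditional: every clock cycle with D=1 discharges the node, which must
    then be precharged again. Conditional: when Q is already 1, the precharge
    is disabled, so only 0 -> 1 output transitions cost an event.
    """
    q, events = 0, 0
    for d in d_bits:
        if d == 1 and (not conditional or q == 0):
            events += 1
        q = d  # ideal capture: Q takes the sampled value
    return events


for name, bits in {"...111111...": [1] * 10, "...010101...": [0, 1] * 5}.items():
    print(name, precharge_events(bits, False), precharge_events(bits, True))
```

With all-ones data the event count drops from one per cycle to one in total, which matches the observation that conditional flip-flops behave like M-S latches with respect to input data activity.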

76 Conditional Capture Flip-Flop (Im-CCFF: Nedovic, Oklobdzija, ICECS 2001)
• Uses the conditional capture idea: when Q=1, the 1=>0 transition of X is prohibited
• To equalize the 1=>0 and 0=>1 setup times, the signal from the middle of the stack (Y) controls the high-to-low transition on Q
• Y is the output of the first stage of a domino-like inverter, obtained almost for free
• Easy logic embedding
• The first stage has dynamic behavior only in the transparency window
Improved Conditional Capture Flip-Flop: The first stage computes nodes X and Y. If CLK=1, D=1, and CLKbb=Q=0 (i.e., if D=1 and Q=0 in the transparency window), X evaluates to 0. The lower part of the stack is used for Y: Y=not(D) while the clock is high (CLK=1). X is the ‘conditional-capture signal’, with activity equal to the activity of D; Y has larger activity. The second stage uses both X and Y: if X=0 (i.e., D=1 and Q=0 in the transparency window), Q is brought high. If Y=1 when CLKbb=1 (i.e., D=0 in the transparency window), Q is brought to 0. CLKbb is used in the second stage instead of CLK to leave time for Y to evaluate to 0 and to remove a hazard in the second stage.

77 Power Consumption Comparison: Im-CCFF: Nedovic, Oklobdzija, ICECS-2001
SBCCI 2000
Note: Conditional flip-flops behave like M-S latches with respect to input data activity.

78 Dual-Edge Triggered Flip-Flops
Structurally, two different designs are distinguished:
a) Latch-Mux (LM)
b) Pulsed Latch (PL, flip-flop)
The classification is very similar to that of single-edge-triggered (SE) designs.
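
The Latch-Mux structure can be sketched behaviorally: one latch is transparent while the clock is high, the other while it is low, and a clock-driven mux passes whichever latch is currently holding, so Q updates after both edges. The sketch assumes ideal zero-delay elements on sampled waveforms. It also computes the clock-net power C·VDD²·f for the same data rate at 500 MHz (single-edge) vs. 250 MHz (double-edge); the 1 nF clock capacitance is hypothetical, and 1.8 V and 250/500 MHz echo the double-edge summary conditions.

```python
def det_latch_mux(clk, d):
    """Latch-Mux (LM) double-edge-triggered sketch on sampled waveforms."""
    lp = ln = 0            # lp: transparent while clk = 1; ln: transparent while clk = 0
    out = []
    for c, x in zip(clk, d):
        if c:
            lp = x
        else:
            ln = x
        out.append(ln if c else lp)  # the mux selects the latch that is holding
    return out


def clock_power_w(c_clk_f, vdd_v, f_hz):
    """Switching power of a clock net: C * VDD^2 * f (the clock toggles every cycle)."""
    return c_clk_f * vdd_v ** 2 * f_hz


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [0, 1, 1, 0, 0, 1, 1, 1]
print(det_latch_mux(clk, d))  # -> [0, 0, 1, 1, 0, 0, 1, 1]: changes after rising and falling edges

# Same data rate: SET at 500 MHz vs. DET at 250 MHz (1 nF clock capacitance assumed).
ratio = clock_power_w(1e-9, 1.8, 250e6) / clock_power_w(1e-9, 1.8, 500e6)
print(ratio)
```

Halving the clock frequency halves the clock-net switching power, which is the distribution-network saving that the ‘local’ comparison of DETFFs leaves out.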

79 DETSE Overall Results
[Figure: overall results; data points labeled 1 to 4]

80 Summary: Double-Edge Flip-Flops
[Table: PDP [fJ] and PD2P [10^-24 J·s] per design]
Conditions: Fujitsu 0.18 µm, wmin = 0.22 µm, wmax = 10 µm, le = 0.18 µm, fclk = 250/500 MHz, activity = 0.5, VDD = 1.8 V, Temp = 25 °C, load = 14 min. inverters
• Even the ‘local’ performance of DETFFs (not considering the power savings in clock distribution) is comparable to that of SETFFs
• Analogy between the behavior of double-edge flip-flops and their single-edge counterparts

81 SDFF improvement: Nedovic, Oklobdzija ICCD 2000
• Eliminated glitch
• Avoided keeper overpowering
• Faster operation
• Improved power
• PDP improvement over SDFF of about 27% (the first version gave only 8%)
• Preserved logic embedding property
• Achieved strong driving capability at the output
• More robust to supply voltage scaling
Conditions: 0.25 µm bulk CMOS, VDD = 2.5 V, T = 27 °C, fclk = 500 MHz, load = 14 min. inverters

84 What to Expect in the Future?
Important:
• Incorporating logic into the CSE
• Absorbing clock skew
• Quiet state (battery-powered applications)
Pipeline boundaries will start to blur:
• CSE will be mixed with logic
• Wave pipelining, domino style, signals used to clock
• Synchronous design only in a limited domain
• Asynchronous communication between synchronous domains

85 Modified Test Bench and PD2P Optimization

86 PDP, EDP Comparison
SDFF is best; PowerPC and SAFF are competitive.

87 50% Data Activity, 1 GHz Clock, PD2P Optimization
1.8 V VDD, 0.18 µm CMOS technology

