University of California Davis


Presentation on theme: "University of California Davis"— Presentation transcript:

1 University of California Davis
Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems Vojin G. Oklobdzija University of California Davis Integration Corp. Berkeley, CA 94708

2 Prof. V.G. Oklobdzija, University of California
Outline
• Recent interest and importance
• Timing and Power metrics
• Master-Slave vs. Flip-Flop
• Design and optimization tradeoffs
• Representative designs
• Comparison
• Some novel designs
• Conclusion
11/22/2018 Prof. V.G. Oklobdzija, University of California

4 Recent Interest in Storage Elements
Trends in high-performance systems: Higher clock frequency

5 Prof. V.G. Oklobdzija, University of California
Performance: 3X / generation. Source: ISSCC, uP Report, Hot-Chips

6 Prof. V.G. Oklobdzija, University of California
Total transistors: 3X / generation. Logic transistors: 2X / generation. Source: ISSCC, uP Report, Hot-Chips

7 Processor Design Challenges
• Performance is tracking the frequency increase
• Where are the transistors contributing?
• The 3X per generation growth in transistors does not seem to be matched by a corresponding gain in performance

8 Prof. V.G. Oklobdzija, University of California
Power versus Year: high-end growing at 25% / year; consumer (low-end) at 13% / year. [Figure: other trend lines labeled 12% / yr and 15% / yr]

9 Prof. V.G. Oklobdzija, University of California
Power Trend: x4 / 3 years. [Figure: power (W) on a log scale from 0.01 to 100 versus year, '80 to '95] Courtesy of Sakurai Sensei
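The "x4 / 3 years" trend can be put on the same footing as the percent-per-year figures on the previous slide by converting it to a compound annual rate. A minimal sketch; the function name is illustrative and not part of the presentation.

```python
def annual_growth(factor: float, years: float) -> float:
    """Equivalent compound growth per year, as a fraction (0.25 means 25%/year)."""
    return factor ** (1.0 / years) - 1.0

# Power trend quoted on the slide: x4 every 3 years.
rate = annual_growth(4.0, 3.0)
print(f"x4 / 3 years is about {rate:.0%} per year")
```

Under this conversion, x4 every 3 years corresponds to roughly 59% per year.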

10 Gloom and Doom predictions
Source: Shekhar Borkar, Intel

11 Prof. V.G. Oklobdzija, University of California
Recent Interest in Storage Elements, or: Why Do Computer Architects Care?
Trends in high-performance systems:
• Higher clock frequency (1.5GHz Pentium, 4GHz presented)
• More transistors on chip (214 million, ISSCC 2001)
Consequences:
• Increased flip-flop overhead relative to cycle time
• Pipeline depth of 20 or more
• Cycle time FO4 delays, Flop overhead FO4

12 Processor Frequency Trend
Source: Intel, S. Borkar
• Frequency doubles each generation
• Number of gates/clock reduces by 25%

13 Traditional Pentium 3 uArchitecture
[Figure: three pipeline stages, each a logic block followed by a register. Delay: logic 0.6 ns, register 0.3 ns.] The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.

14 The Pentium 4 Depends on Pipelines
[Figure: six pipeline stages, each a logic block followed by a register. Delay: logic 0.3 ns, register 0.3 ns.] The total delay from pipeline stage to pipeline stage is 0.6 ns. This design, with twice the stages, only has a maximum clock rate of 1.67 GHz. As the design is broken into more pipeline stages, the logic in each stage has less delay and the registers between stages consume a higher percentage of the cycle, causing diminishing returns. At some point the cost of adding more stages, such as branch prediction, leaves only a very marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.

15 Recent Interest in Storage Elements
• Difficult to control both edges of the clock
• Higher impact of clock skew
• Higher cross-talk and substrate coupling
• Higher power consumption
• Limits on performance
Clock burns up to 40%, storage elements up to 20% of total power. I have even seen 75% recently (ISSCC 2001).

16 Solution: Faster Flip-Flops
We have developed a new fast register which can be fabricated using the standard microprocessor fabrication lines and is several times faster than registers currently used. [Figure: six pipeline stages, each a logic block followed by the new register. Delay: logic 0.3 ns, register 0.04 ns.] The total delay from pipeline stage to pipeline stage is 0.34 ns. Using our design allows a maximum nominal clock rate of 2.9 GHz. Can you achieve this performance gain with architecture?
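The clock rates quoted on this slide and on the two pipeline slides before it all follow from one relation: the maximum clock rate is the reciprocal of logic delay plus register delay. A minimal sketch using the delays from the slides; the function names and design labels are illustrative.

```python
def max_clock_ghz(logic_ns: float, register_ns: float) -> float:
    """Maximum clock rate (GHz) for a stage with the given delays in ns."""
    return 1.0 / (logic_ns + register_ns)

def register_share(logic_ns: float, register_ns: float) -> float:
    """Fraction of the cycle consumed by the register."""
    return register_ns / (logic_ns + register_ns)

# (logic delay, register delay) in ns, as quoted on the slides.
designs = {
    "P3-style pipeline": (0.6, 0.3),
    "P4-style pipeline": (0.3, 0.3),
    "fast register": (0.3, 0.04),
}

for name, (logic_ns, register_ns) in designs.items():
    print(f"{name}: {max_clock_ghz(logic_ns, register_ns):.2f} GHz, "
          f"register share {register_share(logic_ns, register_ns):.0%}")
```

The register share of the cycle rises from one third to one half when the logic delay is halved, and drops to about 12% with the 0.04 ns register.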

17 Clocked Storage Element Requirements
• High speed: high-frequency applications require low FF timing overhead; sub-nanosecond clock periods → x10ps - x100ps FF delays
• Low power: dissipation of >100W for recent processors; battery-supplied applications
• Size
• High robustness to clock imperfections
• Logic embedding property
The two major performance parameters are speed and power. The importance of delay grows as the operating frequency of the system using the state element increases. Nanosecond or sub-nanosecond clock periods can tolerate a state element overhead of several tens to at most a few hundred picoseconds. Low power consumption is another required feature. Recent processors use tens, or even hundreds, of watts, so storage elements should operate with as low power consumption as possible. Other parameters of interest include the circuit’s size, robustness to skew and other clock imperfections, the logic embedding property, etc.

18 Prof. V.G. Oklobdzija, University of California
Clock Signals
Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in the synchronous digital system. The clocking in a digital system can be either single-phase or multi-phase (usually two-phase). The clocking strategy depends on, and is largely influenced by, the choice of the storage element: latch or flip-flop. Fig. 4.2 shows the possible types of clocking techniques and corresponding general finite-state machine structures. The dark rectangles in the figure represent the interval during which the bi-stable element samples its data input.

19 Clock Signal Uncertainty
• Effects on cycle time: maximum delay restriction (violation of set-up time)
• May cause race: minimum delay restriction (violation of hold time)
• Uncertainty is: jitter, skew, and duty cycle
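The two restrictions can be written as inequalities. The sketch below assumes a flip-flop-based path and folds all clock uncertainty into a single worst-case `skew` term; every parameter name and every number in the usage lines is an illustrative assumption, not a value from the presentation.

```python
def meets_max_delay(period, clk_q_max, logic_max, setup, skew):
    """Set-up (maximum delay) restriction: the slowest path must arrive in time."""
    return clk_q_max + logic_max + setup + skew <= period

def meets_min_delay(clk_q_min, logic_min, hold, skew):
    """Hold (minimum delay) restriction: the fastest path must not race through."""
    return clk_q_min + logic_min >= hold + skew

# Hypothetical numbers, all in picoseconds.
print(meets_max_delay(period=1000, clk_q_max=200, logic_max=650, setup=50, skew=80))
print(meets_min_delay(clk_q_min=150, logic_min=20, hold=100, skew=80))
```

With these numbers the maximum-delay check passes (980 ps against a 1000 ps period) while the minimum-delay check fails (170 ps against 180 ps), which is the race condition named on the slide.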

20 Prof. V.G. Oklobdzija, University of California
Jitter
• Uncertainty in consecutive edges of a periodic signal
• Caused by temporal noise events
• Quantified as:
– cycle-to-cycle or short-term jitter, tJS
– long-term jitter, tJL

21 Prof. V.G. Oklobdzija, University of California
Clock Skew
• Time difference between temporally-equivalent or concurrent edges of two periodic signals
• Caused by spatial noise events

22 Prof. V.G. Oklobdzija, University of California
Clocking Strategies
• Single-phase clocking and single latch machine
• Edge-triggered clocking and Flip-Flop based machine

23 Prof. V.G. Oklobdzija, University of California
Clocking Strategies
• Two-phase clocking and two-phase latch machine with single latch
• Two-phase clocking and two-phase latch machine with double latch

24 Prof. V.G. Oklobdzija, University of California
Delay Restrictions
• Clock defines hard boundaries for edge-triggered design
• Clock boundaries are soft for level-sensitive clocking, and they are:
– Tolerant of clock edge uncertainty
– Tolerant of uncertainty in data arrival
• Timing slack can voluntarily be passed forward
• Time can forcefully be borrowed
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

25 Single-Phase Clocking, Single Latch: Timing Constraints

26 Two-Phase Clocking with Two-Phase Double Latch

27 Two-Phase Clocking with One-Phase Double Latch
Some people refer to this clocking arrangement as a “negative edge Flip-Flop”, erroneously!

28 Difference between Latch and Flip-Flop
After the transition of the clock, data cannot change. Latch is “transparent”.

29 Flip-Flop and M-S Latch Combination
How can one recognize the difference without knowing what is inside the “black-box”?

30 F-F and M-S Latch: Difference
Experiment:

31 F-F and M-S Latch: Difference
Structural Difference: No Clock Flip-Flop M-S Latch

32 Prof. V.G. Oklobdzija, University of California
Flip-Flop vs. Latch
Flip-Flop:
• Edge sensitive
• Easier to use as frequency increases
• Robustness on duty cycle
• Simpler logic timing requirements
• Fits into CAD tools
Latch:
• Level sensitive
• Consumes less power for the operation
• Better clock skew/jitter characteristics
The choice between FF and latch depends on each individual design and its specifications. Flip-flops are edge sensitive, with simpler timing requirements and lower sensitivity to duty cycle imperfections. Latches are level sensitive and simpler, with less power consumption and better clock skew/jitter characteristics.

33 Prof. V.G. Oklobdzija, University of California
Flip-Flop: Example HLFF (Partovi)

34 Prof. V.G. Oklobdzija, University of California
Flip-Flop: Example HLFF (Partovi)

35 Pulse-Based Flip-Flops*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

36 Prof. V.G. Oklobdzija, University of California
Flip-Flop: Example SAFF (DEC Alpha 21264) [Figure labels: D=0, pulse, D=1]

37 Requirements in the Flip-Flop Design
• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism): typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 200fF, with typical values of fF in critical paths (rule-of-thumb number for cap: Cgate = 2fF/µm)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity: dynamic/high-impedance nodes are affected
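To get a feel for these loads, the rule-of-thumb gate capacitance can be used to express a load as an equivalent transistor gate width. A small sketch; the helper name is illustrative, and the conversion assumes the 2fF/µm figure applies uniformly.

```python
CGATE_FF_PER_UM = 2.0  # rule-of-thumb gate capacitance quoted on the slide

def equivalent_gate_width_um(load_ff: float) -> float:
    """Gate width (um) whose capacitance equals the given load (fF)."""
    return load_ff / CGATE_FF_PER_UM

for load_ff in (50.0, 200.0):
    width = equivalent_gate_width_um(load_ff)
    print(f"{load_ff:.0f} fF corresponds to about {width:.0f} um of gate width")
```

Under this assumption, the quoted 50fF to 200fF range corresponds to roughly 25µm to 100µm of driven gate width.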

38 State Element Characterization
Timing:
• Propagation time (clock-to-output)
• Set-up time
• Hold time
• Skew amortization
Power consumption:
• Internal power
• Input power
The two major parameters that determine a state element’s performance are delay and power consumption. There are three defined timing parameters of a state element:
– Propagation time, or clock-to-output time, defined as the delay between the active clock edge and the subsequent output transition
– Setup time, defined as the latest allowed arrival of the input with respect to the clock in order to properly capture it
– Hold time, similarly defined as the earliest allowed arrival of the input with respect to the clock in order to properly capture the previous value
Power consumption can be broken up into:
– Internal power, the portion of power consumed by the circuit itself
– Input power, the portion of the power consumption of the circuits driving the state element that is due to the presence of the state element; this part accounts for the often overlooked contribution of the state element to the consumption of the clock distribution network

39 Prof. V.G. Oklobdzija, University of California
Flip-Flop Delay
The sum of setup time and Clk-output delay is the only true measure of performance with respect to system speed:
$T = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew}$

40 Delay vs. Setup/Hold Times

41 Timing Characteristics
The figure presents typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of clock-to-output and data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output characteristic rises. The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope, because there the growth in clock-to-output delay exactly cancels the reduction in data-to-clock time. The data-to-clock time that corresponds to this point is termed the optimal setup time.
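The optimal setup time can be located numerically from a sampled clock-to-output curve by adding the data-to-clock time to each sample and taking the minimum. The sketch below is an illustration only: `toy_clk_q` is a made-up exponential curve standing in for simulated data, and all names and constants are assumptions.

```python
import math

def optimal_setup(samples):
    """Return (t_dc, t_dq) at the minimum data-to-output delay.

    samples: iterable of (t_dc, t_clk_q) pairs, where t_dc is the
    data-to-clock time and t_clk_q the clock-to-output delay measured there.
    """
    return min(((t_dc, t_dc + t_cq) for t_dc, t_cq in samples),
               key=lambda point: point[1])

def toy_clk_q(t_dc, stable=200.0, rise=50.0, tau=20.0):
    """Hypothetical Clk-Q curve (ps): flat for early data, rising as t_dc shrinks."""
    return stable + rise * math.exp(-t_dc / tau)

# Sweep the data-to-clock time from -20 ps to 100 ps in 1 ps steps.
samples = [(t_dc, toy_clk_q(t_dc)) for t_dc in range(-20, 101)]
t_opt, dq_min = optimal_setup(samples)
print(f"optimal setup time: {t_opt} ps, minimum D-Q: {dq_min:.1f} ps")
```

For this toy curve the minimum sits where the Clk-Q slope reaches -1, the 45° point described in the text.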

42 Timing parameters, details
The best point to pick on the delay curve is minimum D-Q

43 Prof. V.G. Oklobdzija, University of California
Latch and Flip-Flop latencies (tDQ) vs. Data-to-clock Set-up Time (tDC)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

44 Clock Skew Considerations
• Need for characterization of flip-flop behavior in the presence of skew/jitter
• The soft clock edge property only qualitatively describes skew immunity
• Still, designers calculate maximum useful time by incorporating all skew into clocking overhead

45 Clock Skew Considerations
[Figure labels: Real Skew Overhead; Skew Overhead of Ideal Flip-Flop]

46 Clock Skew Considerations

47 Clock Skew Considerations
• Skew rejection: ratio of total skew and its impact on FF overhead
• Shows how the circuit reacts to clock edge uncertainty
• Helps answer the question of how far to optimize the clock distribution network

48 Simulation Condition and Testbench
Power:
• Data activity dependence as a FF characteristic
• Consumption with 50% activity adopted as a figure of merit
• Dissipation of driving inverters is part of total power consumption
In order to evaluate and compare flip-flops, simulation conditions and a testbench are defined. They are set according to the flip-flop characterization presented earlier. Power consumption is measured with several different input activities; power consumption at an input activity of 50% is adopted as the figure of merit. Total dissipation includes the dissipation of the driving inverters.

49 Simulation Condition and Testbench
Timing:
• Total FF overhead is setup + clock-to-output time
• Circuit optimization towards tD-Q
• Clock skew robustness obtained from observing the D-Q curve
Power-Delay Product:
• Overall performance parameter at fixed frequency
The circuit delay parameter used for evaluation is data-to-output time, and circuits are optimized towards this parameter. The ultimate performance parameter is the power-delay product, measured at fixed clock frequency. It is calculated as the product of data-to-output time and total power consumption, measured at the optimal setup time.
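The power-delay product is a simple multiplication, but the units are easy to get wrong. A minimal sketch with the unit conversion spelled out; the function name and the example values are hypothetical, not measurements from the presentation.

```python
def power_delay_product_fj(t_dq_ps: float, total_power_uw: float) -> float:
    """Power-delay product in femtojoules.

    1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J = 1e-3 fJ.
    """
    return t_dq_ps * total_power_uw * 1e-3

# Hypothetical flip-flop: 250 ps data-to-output delay at optimal setup,
# 150 uW total power (internal + clock + data).
pdp = power_delay_product_fj(250.0, 150.0)
print(f"PDPtot = {pdp:.1f} fJ")
```

Both inputs should be taken at the same operating point, that is, at the optimal setup time and the fixed clock frequency named above.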

50 Flip-Flop Performance Comparison
Test bench
Total power consumed:
• internal power
• data power
• clock power
Measured for four cases:
• no activity (0000… and 1111…)
• maximum activity (0101…)
• average activity (random sequence)
Delay is (minimum D-Q): Clk-Q + setup time

51 Prof. V.G. Oklobdzija, University of California
OLD TEST BENCH:
• Total Power = Drivers Power + Test Unit Power
• PDP-Optimized = Equal Trade-off on Power and Delay
• Improper Load on Drivers
NEW TEST BENCH:
• Drivers: Fixed Gain and Driving Test Unit Only
• Data-to-Output Delay
• PD2P Optimized = Best for Constant-Field Scaling

52 The sources of internal power consumption

53 Design & optimization tradeoffs
Opposite goals:
• Minimal total power consumption
• Minimal delay
Power-delay tradeoff:
• Minimize power-delay product (PDPtot)

54 High-performance microprocessor latches and flip-flops

55 Prof. V.G. Oklobdzija, University of California
21264 Flip-Flop
• Used in Digital's WD21264 high-performance processor
• Runs at 600MHz
• 450ps Clk-Q delay, simulated in 0.35u technology
Our simulations show:
• Small clock load
• High internal power consumption
• S-R latch ruins the speed by 40%
• Dynamic nodes, potential hazard in low-power applications

56 Prof. V.G. Oklobdzija, University of California
Strong Arm 110 Flip-Flop
• Used in SA W low-power processor
• Runs at 200MHz
• One transistor more than flip-flop
• 450ps Clk-Q delay, simulated in 0.35u CMOS technology
Our simulations show:
• Additional transistor provides fully static operation (robustness to leakage currents), essential for low-power applications, but slightly increases internal power consumption

57 Prof. V.G. Oklobdzija, University of California
Master-Slave Latches
• Positive setup times
• Two clock phases:
– distributed globally
– generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay

58 T-G Master-Slave Latch
• Low-power feedback
• Unbuffered input:
– input capacitance depends on the phase of the clock
– over-shoot and under-shoot with long routes
– wirelength must be restricted at the input
• Clock load is high
• Low power
• Small Clk-output delay, but positive setup
• Easily embedded scan or mux

59 T-G Master-Slave Latch
PowerPC 603 (Gerosa, JSSC 12/94)

60 PowerPC 603 M-S Latch Combination
• Used in PowerPC family
• Low-power
• High speed
• Big clock load
• Easily embedded scan function
Our simulations show:
• Small internal power consumption
• Low-power feedback
• Double the clock load compared with other latches
• Locally generated second phase (reduces overall clock load)

61 C2MOS M-S Latches (Suzuki ’73)
• Low-power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973

62 Prof. V.G. Oklobdzija, University of California
mC2MOS M-S Latch
Our simulations show:
• Small clock load (local clock buffering)
• Low-power feedback
• Big positive setup time
• Robustness to clock slope, unlike classic C2MOS structure

63 Prof. V.G. Oklobdzija, University of California
Advanced Flip-Flops

64 Prof. V.G. Oklobdzija, University of California
Flip-Flops
• First stage is a pulse generator: generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch: captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator

65 HLFF (Partovi’s) Flip-Flop
AMD K-6, Partovi, ISSCC’96

66 Prof. V.G. Oklobdzija, University of California
HLFF Operation: 1-0 and 0-1 transitions at the input with 0ps setup time

67 Hybrid Latch Flip-Flop
Skew absorption. Partovi et al, ISSCC’96

68 Prof. V.G. Oklobdzija, University of California
Partovi’s HLFF
• Hybrid Latch-Flip-Flop combination
• 280ps Clk-Q delay
• Negative set-up time of ps
• Robustness to clock skew and fast clocking
Our simulations show:
• Hybrid design
• Gains: speed (negative setup time), robustness to clock skew
• Drawbacks: sensitivity to clock slope, relatively high internal power (due to precharge)

69 Prof. V.G. Oklobdzija, University of California
HLFF Flip-Flop
Flip-flop features:
• single-phase clock
• edge triggered, on one clock edge
Features:
• Soft clock edge property:
– brief transparency, equal to 3 inverter delays
– negative setup time allows slack passing
– absorbs skew
• Hold time is comparable to HLFF delay: minimum delay between flip-flops must be controlled
• Pseudo static
• Possible to incorporate logic

70 Prof. V.G. Oklobdzija, University of California
K-6 Dual-Rail ETL
• Self-reset property: increases dynamic power, drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load

71 Flip- Flop Element of K6 (Partovi*)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

72 Pulsed Flip- Flop of K7 w/ Embedded MUX*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

73 Prof. V.G. Oklobdzija, University of California
K-6 Dual-Rail ETL
• Self-reset property
• Hybrid combination
• 260ps Clk-Q delay, simulated in .35u CMOS technology
• Negative setup time: -20ps
• Small clock load
Our simulations show:
• Double-ended precharge structure is the most power hungry (switching on all input combinations)
• Self-reset property increases power consumption and drives succeeding fast domino stages
• Precharge increases speed

74 Semi-Dynamic Flip-Flop (SDFF)
Sun UltraSparc III, Klass, VLSI Circuits’98
• Soft edge conditioned by data since first stage is precharged; cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack: faster than HLFF, but 1-1 glitch exists

75 Semi-Dynamic Flip-Flop
• Hybrid combination used in UltraSPARC-III
• Very fast circuit (188ps Clk-Q delay, .25u technology, 1.6V, 105°C)
Our simulations show:
• Negative setup time
• Small penalty for embedded logic
• Relatively high internal power consumption and clock load

76 SDFF with Improved Switching Delay Characteristic

77 Performance with Scaled Supply Voltage SDFF with Improved Switching

78 Svensson’s Family of Latches (he developed TSPC-FF)

79 Single-Transistor-Clocked MS latches
DSTC and SSTC (Yuan and Svensson, JSSC Jan. ‘97)
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Back-gate coupling and charge sharing related speed and power problems

80 Prof. V.G. Oklobdzija, University of California
SSTC latch
• Fully static
• According to Svensson, exhibits 360ps worst-case Clk-Q delay, laid out in 0.8u single-poly CMOS process
Our simulations show:
• No significant clock power savings compared to the rest of the latches
• Excessively large setup time due to the minimized Master latch
• Both Master and Slave latch have to be optimized for speed and power

81 Prof. V.G. Oklobdzija, University of California
DSTC latch
• Double-ended, dynamic latch
• 350ps Clk-Q delay, according to Svensson
Our simulations show:
• Problems with output dynamic node: capacitive coupling, charge sharing
• Delay even worse than in SSTC latch due to the capacitive coupling of dynamic drive node

82 Modified Sense Amplifier-Based Flip-Flop
Nikolic, Oklobdzija, Stojanovic, ISSCC, 1999
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and its complement is symmetrical, as opposed to the NAND-based design
• Convenient for dual-rail logic; driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time, and all the advantages of the original design
• Possible tradeoff between speed and robustness to cross-talk

83 Sense-amplifier-based flip-flop
Matsui et al.; DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On rising clock edge, monotonic S_b or R_b triggers the S-R latch
• Cross-coupled NAND: speed bottleneck
• Big power savings in reduced-swing designs
• Nice interface to/from domino logic

84 Modified Sense Amplifier-Based Flip-Flop
• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic, Oklobdzija, Stojanovic, ISSCC ‘99

85 Prof. V.G. Oklobdzija, University of California
Overall results

86 Comparison in terms of speed and PDPtot
Delay:
• below 200ps: SDFF, HLFF, K-6 ETL
• intermediate: PowerPC latch, 21264 Alpha FF, Strong Arm FF, mC2MOS latch
• above 500ps: SSTC latch, DSTC latch, SSTC* latch, DSTC* latch
PDPtot:
• below 30fJ: PowerPC latch
• intermediate: HLFF, SDFF, mC2MOS latch, 21264 Alpha FF, Strong Arm FF; then K-6 ETL
• above 70fJ: SSTC latch, DSTC latch

87 Prof. V.G. Oklobdzija, University of California
Delay comparison: F-F design brings the fastest structures

88 Prof. V.G. Oklobdzija, University of California
Delay comparison: F-F design brings the fastest structures

89 Prof. V.G. Oklobdzija, University of California
Overall ranking
• EDPtot accepted as the overall cost function
• The proposed “low-power” latches from Yuan & Svensson are not low-power compared with the other presented structures (the optimization was not properly done); the optimization is yet to be repeated under a different setup

90 Overall ranking, zoomed
• Real signals have the activity between 0 and 0.25 (α)
• Precharged hybrid structures are the fastest but their power consumption strongly depends on the probability of “ones”
• More “ones” above the α point

91 PDPtot ranges for Svensson’s family
• DSTC and SSTC have problems
• Weak drive of minimized Master increases short-circuit power consumption in Slave
* Latches are sized as proposed in the original paper

92 Prof. V.G. Oklobdzija, University of California
Overall performance
• Real signals have the activity between 0 and 0.5 (α)
• Precharged hybrid structures are the fastest but their power consumption strongly depends on the probability of “ones”
• More “ones” above the α point

93 Conventional Clk-Q vs. minimum D-Q
• Hidden positive setup time
• Degradation of Clk-Q

94 Internal Power distribution
Four sequences characterize the boundaries for internal power consumption:
• …010101…: maximum
• random, equal transition probability: average
• …111111…: precharge activity
• …000000…: leakage + internal clock processing
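The four sequences differ in two properties that drive internal power: how often the data toggles and how often it is a "one". The sketch below measures both for a bit string. It assumes data activity means the fraction of cycles in which the input toggles, which is one common reading and is not spelled out in the slides; the function names are illustrative.

```python
def data_activity(bits: str) -> float:
    """Fraction of cycle boundaries at which the data bit toggles (0.0 to 1.0)."""
    if len(bits) < 2:
        return 0.0
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)

def ones_probability(bits: str) -> float:
    """Fraction of cycles in which the data bit is 1."""
    if not bits:
        return 0.0
    return bits.count("1") / len(bits)

for seq in ("01010101", "11111111", "00000000"):
    print(seq, data_activity(seq), ones_probability(seq))
```

The all-ones and all-zeros sequences have the same activity (zero) but opposite probability of "ones", which is why a precharged structure can burn power on one and not the other.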

95 Comparison of Clock power consumption

96 Comparison of Clock power consumption

97 Prof. V.G. Oklobdzija, University of California
Using Dual-Edge Flip-Flop (run at ½ of the frequency, save on the power consumed in the clock distribution tree)
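The saving in the clock distribution tree follows from the standard dynamic power relation for a full-swing net, P = C · VDD² · f. A minimal sketch; the clock-tree capacitance is a hypothetical value, and the estimate ignores any change in the clock load presented by the dual-edge flip-flops themselves.

```python
def clock_power_w(cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Dynamic power of a full-swing clock net: P = C * Vdd^2 * f."""
    return cap_farads * vdd ** 2 * freq_hz

C_CLK = 100e-12  # hypothetical clock-tree capacitance (100 pF)
VDD = 1.8        # supply voltage used in the comparison slides

single_edge = clock_power_w(C_CLK, VDD, 500e6)  # single-edge FFs at 500 MHz
dual_edge = clock_power_w(C_CLK, VDD, 250e6)    # dual-edge FFs at 250 MHz
print(f"single-edge: {single_edge:.3f} W, dual-edge: {dual_edge:.3f} W")
```

For the same data throughput, the clock tree switches half as often, so under this model its power is halved.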

98 Dual-Edge vs. Single-Edge Flip-Flops Comparison
[Figure panels: Delay [ps]; Total Power [W]] Fujitsu 0.18u process; clock frequency 500MHz (250MHz for Dual Edge FFs); data activity ratio α = 0.5; VDD = 1.8V; Temp = 25º

99 Dual-Edge vs. Single-Edge Flip-Flops Comparison
Internal Power [W] Clock Power [W] Fujitsu 0.18u process; Clock frequency 500MHz (250MHz for Dual Edge FFs) Data activity ratio  = 0.5 VDD = 1.8V Temp = 25º Data Power [W] 11/22/2018 Prof. V.G. Oklobdzija, University of California

100 Silicon on Insulator (SOI) Technology

101 Prof. V.G. Oklobdzija, University of California
SOI Comparison: F = 1GHz, α = 0.5, Le = 0.08µm, VDD = 1.3V, T = 25C

102 Prof. V.G. Oklobdzija, University of California
Conclusion

103 Prof. V.G. Oklobdzija, University of California
Approaches
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Short period of transparency (hybrid design)
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Double-ended precharged structures
• Dynamic Slave latch
Future directions:
• Tighten the design rules for low-power, high-performance, deep-submicron structures
• Develop new latches featuring small clock load and small PDP

104 Prof. V.G. Oklobdzija, University of California
Design goals
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Pulsed design
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Dynamic Master latch
Conduct Power*Delay optimizations at constant frequency (this really optimizes the Energy*Delay product). Take into account all sources of power dissipation. ALWAYS use Clk-Q + setup time for max delay.

