University of California Davis
Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems
Vojin G. Oklobdzija
University of California Davis
Integration Corp., Berkeley, CA 94708
Prof. V.G. Oklobdzija, University of California
Outline
• Recent interest and importance
• Timing and Power metrics
• Master-Slave vs. Flip-Flop
• Design and optimization tradeoffs
• Representative designs
• Comparison
• Some novel designs
• Conclusion
11/22/2018 Prof. V.G. Oklobdzija, University of California
Recent Interest in Storage Elements
Trends in high-performance systems: higher clock frequency
Performance: 3X / generation
Source: ISSCC, uP Report, Hot-Chips
Total transistors: 3X / generation
Logic transistors: 2X / generation
Source: ISSCC, uP Report, Hot-Chips
Processor Design Challenges
• Performance is tracking the frequency increase
• Where are the transistors contributing?
• The 3X-per-generation growth in transistors does not seem to be matched by a corresponding gain in performance
Power versus Year (figure)
• High-end growing at 25% / year
• Consumer (low-end) at 13% / year
• Other curve labels: 12% / yr, 15% / yr
Power Trend (figure): power (W), from 0.01 to 100, versus year ('80 to '95); x4 / 3 years
Courtesy of Sakurai Sensei
Gloom and Doom predictions
Source: Shekhar Borkar, Intel
Recent Interest in Storage Elements, or: Why Do Computer Architects Care?
Trends in high-performance systems:
• Higher clock frequency (1.5GHz Pentium, 4GHz presented)
• More transistors on chip (214 million, ISSCC 2001)
Consequences:
• Increased Flip-Flop overhead relative to cycle time
• Pipeline depth of 20 or more
• Cycle time FO4 delays, Flop overhead FO4
Processor Frequency Trend
• Frequency doubles each generation
• Number of gates/clock reduced by 25%
Source: Intel, S. Borkar
Traditional Pentium 3 uArchitecture
Three pipeline stages, each consisting of logic followed by a register.
Delay (ns) per stage: logic 0.6, register 0.3
The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.
The Pentium 4 Depends on Pipelines
Six pipeline stages, each consisting of logic followed by a register.
Delay (ns) per stage: logic 0.3, register 0.3
The total delay from pipeline stage to pipeline stage is 0.6 ns. This design, with twice the stages, only has a maximum clock rate of 1.67 GHz. As the design is broken into more pipeline stages, the logic in each stage has less delay and the registers between stages consume a higher percentage of the cycle, causing diminishing returns. At some point the costs of adding more stages, such as branch prediction, leave only a very marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
Recent Interest in Storage Elements
• Difficult to control both edges of the clock
• Higher impact of clock skew
• Higher cross-talk and substrate coupling
• Higher power consumption
• Limits on performance
• Clock burns up to 40%, storage elements up to 20% of total power
• I have even seen 75% recently (ISSCC 2001)
Solution: Faster Flip-Flops
We have developed a new fast register that can be fabricated on standard microprocessor fabrication lines and is several times faster than the registers currently used.
Six pipeline stages, each consisting of logic followed by the new register.
Delay (ns) per stage: logic 0.3, register 0.04
The total delay from pipeline stage to pipeline stage is 0.34 ns. Using our design allows a maximum nominal clock rate of 2.9 GHz. Can you achieve this performance gain with architecture?
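The arithmetic behind the three pipeline examples can be sketched as follows. This is a minimal illustration that only reuses the per-stage delays quoted on the slides; the function name and the dictionary labels are hypothetical.

```python
def max_clock_ghz(logic_delay_ns: float, register_delay_ns: float) -> float:
    """Maximum clock rate in GHz when one pipeline stage costs
    logic delay plus register delay (both in nanoseconds)."""
    return 1.0 / (logic_delay_ns + register_delay_ns)


# Per-stage delays (ns) as quoted on the slides: (logic, register)
designs = {
    "3 stages, 0.6 ns logic, 0.3 ns register": (0.6, 0.3),
    "6 stages, 0.3 ns logic, 0.3 ns register": (0.3, 0.3),
    "6 stages, 0.3 ns logic, 0.04 ns register": (0.3, 0.04),
}

for name, (logic, reg) in designs.items():
    freq = max_clock_ghz(logic, reg)
    overhead = reg / (logic + reg)  # share of the cycle spent in the register
    print(f"{name}: {freq:.2f} GHz, register share of cycle {overhead:.0%}")
```

The register share of the cycle makes the diminishing-returns argument visible: halving the logic delay without touching the register raises that share from one third to one half.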
Clocked Storage Element Requirements
• High speed
  – High-frequency applications require low FF timing overhead
  – Sub-nanosecond clock periods
  – x10ps - x100ps FF delays
• Low power
  – Dissipation of >100W for recent processors
  – Battery-supplied applications
• Size
• High robustness to clock imperfections
• Logic embedding property
The two major performance parameters are speed and power. The importance of delay grows as the operating frequency of the system that uses the state element increases. Nanosecond or sub-nanosecond clock periods can tolerate a state element overhead of several tens to at most a few hundred picoseconds. Low power consumption is another required feature: recent processors use tens, or even hundreds, of watts, so the state element should operate with as little power as possible. Other parameters of interest include the circuit’s size, robustness to skew and other clock imperfections, the logic embedding property, etc.
Clock Signals
Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in a synchronous digital system. The clocking in a digital system can be either single-phase or multi-phase (usually two-phase). The clocking strategy depends on, and is largely influenced by, the choice of storage element: latch or flip-flop. Fig. 4.2 shows the possible types of clocking techniques and the corresponding general finite-state machine structures. The dark rectangles in the figure represent the interval during which the bi-stable element samples its data input.
Clock Signal Uncertainty
• Effects on cycle time: maximum delay restriction (violation of set-up time)
• May cause a race: minimum delay restriction (violation of hold time)
• Uncertainty is: jitter, skew, and duty cycle
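The two restrictions can be sketched as simple checks. This is a minimal illustration, assuming an edge-triggered design and lumping jitter and skew into a single uncertainty term; the function names and all picosecond values are hypothetical and do not come from the slides.

```python
def setup_ok(t_cycle, t_clk_q, t_logic_max, t_setup, t_uncertainty):
    """Maximum delay restriction: the slowest path must arrive a setup
    time before the next clock edge, even with clock uncertainty."""
    return t_clk_q + t_logic_max + t_setup + t_uncertainty <= t_cycle


def hold_ok(t_clk_q_min, t_logic_min, t_hold, t_uncertainty):
    """Minimum delay restriction: the fastest path must not overwrite
    the data before the hold time has passed (otherwise a race)."""
    return t_clk_q_min + t_logic_min >= t_hold + t_uncertainty


# Hypothetical numbers in picoseconds, for illustration only
print(setup_ok(t_cycle=1000, t_clk_q=200, t_logic_max=700,
               t_setup=50, t_uncertainty=50))
print(hold_ok(t_clk_q_min=150, t_logic_min=0,
              t_hold=100, t_uncertainty=80))
```

Note that a setup violation can be cured by slowing the clock, while a hold violation cannot, because the cycle time does not appear in the second check.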
Jitter
• Uncertainty in consecutive edges of a periodic signal
• Caused by temporal noise events
• Quantified as:
  – cycle-to-cycle or short-term jitter, tJS
  – long-term jitter, tJL
Clock Skew
• Time difference between temporally-equivalent or concurrent edges of two periodic signals
• Caused by spatial noise events
Clocking Strategies
• Single-phase clocking and single latch machine
• Edge-triggered clocking and Flip-Flop based machine
Clocking Strategies
• Two-phase clocking and two-phase latch machine with single latch
• Two-phase clocking and two-phase latch machine with double latch
Delay Restrictions
• Clock defines hard boundaries for edge-triggered design
• Clock boundaries are soft for level sensitive clocking, and they are:
  – Tolerant to clock edge uncertainty
  – Tolerant to uncertainty of data arrival
• Timing slack can voluntarily be passed forward
• Time can forcefully be borrowed
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Single-Phase Clocking, Single Latch: Timing Constraints
Two-Phase Clocking with Two-Phase Double Latch
Two-Phase Clocking with One-Phase Double Latch
Some people erroneously refer to this clocking arrangement as a “negative edge Flip-Flop”.
Difference between Latch and Flip-Flop
• After the transition of the clock, data cannot change
• Latch is “transparent”
Flip-Flop and M-S Latch Combination
How can one recognize the difference without knowing what is inside the “black-box”?
F-F and M-S Latch: Difference
Experiment:
F-F and M-S Latch: Difference
Structural Difference (figure labels): No Clock; Flip-Flop; M-S Latch
Flip-Flop vs. Latch
Flip-Flop:
• Edge sensitive
• Easier to use as frequency increases
• Robustness on duty cycle
• Simpler logic timing requirements
• Fits into CAD tools
Latch:
• Level sensitive
• Consumes less power for the operation
• Better clock skew/jitter characteristics
The choice between FF and latch depends on each individual design and its specifications. Flip-flops are edge sensitive, with simpler timing requirements and lower sensitivity to duty cycle imperfections. Latches are level sensitive and simpler, with less power consumption and better clock skew/jitter characteristics.
Flip-Flop: Example HLFF (Partovi)
Pulse-Based Flip-Flops*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Flip-Flop: Example SAFF (DEC Alpha 21264)
Figure labels: D=0, pulse, D=1
Requirements in the Flip-Flop Design
• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  – Typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 200fF, with typical values of fF in critical paths (rule-of-thumb number for cap: Cgate = 2fF/um)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity: dynamic/high impedance nodes are affected
State Element Characterization
Timing
• Propagation time (clock-to-output)
• Set-up time
• Hold time
• Skew amortization
Power consumption
• Internal power
• Input power
The two major parameters that determine a state element’s performance are delay and power consumption. There are three defined timing parameters of a state element:
– Propagation time, or clock-to-output time: the delay between the active clock edge and the subsequent output transition
– Setup time: the latest allowed arrival of the input with respect to the clock such that it is properly captured
– Hold time: the earliest allowed arrival of the next input with respect to the clock such that the previous value is still properly captured
Power consumption can be broken up into:
– Internal power: the portion of power consumed by the circuit itself
– Input power: the portion of the power consumed by the circuits driving the state element that is due to the presence of the state element; this part accounts for the often overlooked contribution of the state element to the consumption of the clock distribution network
Flip-Flop Delay
The sum of setup time and Clk-output delay is the only true measure of performance with respect to system speed:
$T = T_{\text{Clk-Q}} + T_{\text{Logic}} + T_{\text{Setup}} + T_{\text{Skew}}$
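The cycle-time relation can be turned around to show how much of the cycle is left for logic. The sketch below is only an illustration of that relation; the function names and the picosecond values are hypothetical.

```python
def min_cycle_time_ps(t_clk_q, t_logic, t_setup, t_skew):
    """Cycle time T = TClk-Q + TLogic + TSetup + TSkew (all in ps)."""
    return t_clk_q + t_logic + t_setup + t_skew


def logic_budget_ps(t_cycle, t_clk_q, t_setup, t_skew):
    """Time left for logic once the storage element overhead
    (Clk-Q + setup) and the skew are subtracted from the cycle."""
    return t_cycle - (t_clk_q + t_setup + t_skew)


# Hypothetical 1 GHz design: 1000 ps cycle, 200 ps Clk-Q, 50 ps setup, 50 ps skew
budget = logic_budget_ps(1000, t_clk_q=200, t_setup=50, t_skew=50)
print(f"logic budget: {budget} ps ({budget / 1000:.0%} of the cycle)")
```

With these assumed numbers, 30% of the cycle goes to the storage element and skew, which is why Clk-Q plus setup, rather than Clk-Q alone, is the figure to minimize.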
Delay vs. Setup/Hold Times
Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of clock-to-output and data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope. The data-to-clock time that corresponds to this point is termed the optimal setup time.
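The search for the optimal setup time can be sketched numerically. The Clk-Q curve below is a made-up analytic stand-in for simulation data (it only approaches a constant in the stable region instead of being exactly flat), and the names `toy_clk_q` and `optimal_setup` are hypothetical.

```python
def toy_clk_q(t_dc_ps, nominal=200.0, t_fail=-40.0, k=900.0):
    """Hypothetical Clk-Q delay (ps) versus data-to-clock time (ps).
    Nearly constant for early data, rising steeply as the data arrival
    approaches the failure point t_fail."""
    return nominal + k / (t_dc_ps - t_fail)


def optimal_setup(samples):
    """samples: iterable of (t_dc, t_clk_q) pairs.
    Returns (t_dc, t_dq) at the minimum of D-Q = D-Clk + Clk-Q."""
    t_dc, t_clk_q = min(samples, key=lambda s: s[0] + s[1])
    return t_dc, t_dc + t_clk_q


# Sweep the data arrival from 200 ps before the clock to 35 ps after it
sweep = [(t, toy_clk_q(t)) for t in range(-35, 205, 5)]
t_opt, dq_min = optimal_setup(sweep)
print(f"optimal setup time: {t_opt} ps, minimum D-Q: {dq_min} ps")
```

For this toy curve the minimum falls where the Clk-Q curve has slope -1, matching the 45° condition above, and the resulting optimal setup time is negative, as it is for the pulsed designs discussed later.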
Timing parameters, details
The best point to pick on the delay curve is the minimum D-Q.
Latch and Flip-Flop latencies (tDQ) vs. Data-to-clock Set-up Time (tDC)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Clock Skew Considerations
• Need for characterization of Flip-Flop behavior in the presence of skew/jitter
• Soft Clock Edge property only qualitatively describes skew immunity
• Still, designers calculate maximum useful time by incorporating all skew into clocking overhead
Clock Skew Considerations
Figure labels: Real Skew Overhead; Skew Overhead of Ideal Flip-Flop
Clock Skew Considerations
• Skew rejection: the ratio of total skew to its impact on FF overhead
• Shows how the circuit reacts to clock edge uncertainty
• Helps answer the question of how far to optimize the clock distribution network
Simulation Condition and Testbench
Power
• Data activity dependence as a FF characteristic
• Consumption with 50% activity adopted as a figure of merit
• Dissipation of driving inverters is part of total power consumption
To evaluate and compare flip-flops, simulation conditions and a testbench are defined, set according to the flip-flop characterization presented earlier. Power consumption is measured with several different input activities; power consumption at an input activity of 50% is adopted as a figure of merit. Total dissipation includes the dissipation of the driving inverters.
Simulation Condition and Testbench
Timing
• Total FF overhead is setup + clock-to-output time
• Circuit optimization towards td-q
• Clock skew robustness obtained from observing the D-Q curve
Power-Delay Product
• Overall performance parameter at fixed frequency
The circuit delay parameter used for evaluation is data-to-output time, and circuits are optimized towards this parameter. The ultimate performance parameter is the power-delay product, measured at a fixed clock frequency. It is calculated as the product of data-to-output time and total power consumption, measured at the optimal setup time.
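The figure of merit described above can be sketched as a short calculation. The power breakdown into internal, clock, and data components and all numeric values are placeholders chosen for illustration; only the definition (D-Q delay times total power) follows the slide.

```python
def total_power_uw(internal_uw, clock_uw, data_uw):
    """Total power of the storage element including its drivers (µW)."""
    return internal_uw + clock_uw + data_uw


def power_delay_product_fj(t_dq_ps, p_total_uw):
    """PDPtot in femtojoules: 1 ps * 1 µW = 1e-18 J = 1e-3 fJ."""
    return t_dq_ps * p_total_uw / 1000


# Hypothetical flip-flop: 220 ps minimum D-Q, 150 µW total at 50% activity
p_tot = total_power_uw(internal_uw=100, clock_uw=30, data_uw=20)
pdp = power_delay_product_fj(220, p_tot)
print(f"PDPtot = {pdp:.1f} fJ")
```

Because both factors are taken at the optimal setup time and at a fixed clock frequency, designs with different setup behavior can be compared on a single number.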
Flip-Flop Performance Comparison
Test bench
• Total power consumed:
  – internal power
  – data power
  – clock power
• Measured for four cases:
  – no activity (0000… and 1111…)
  – maximum activity ( )
  – average activity (random sequence)
• Delay is the minimum D-Q: Clk-Q + setup time
OLD TEST BENCH:
• Total Power = Drivers Power + Test Unit Power
• PDP-Optimized = Equal Trade-off on Power and Delay
• Improper Load on Drivers
NEW TEST BENCH:
• Drivers: Fixed Gain and Driving Test Unit Only
• Data-to-Output Delay
• PD2P Optimized = Best for Constant-Field Scaling
The sources of internal power consumption
Design & optimization tradeoffs
Opposite goals:
• Minimal total power consumption
• Minimal delay
Power-Delay tradeoff:
• Minimize the Power-Delay product (PDPtot)
High-performance microprocessor latches and flip-flops
21264 Flip-Flop
• Used in Digital's WD21264 high-performance processor
• Runs at 600MHz
• 450pS Clk-Q delay, simulated in 0.35u technology
Our simulations show:
• Small clock load
• High internal power consumption
• S-R latch degrades the speed by 40%
• Dynamic nodes, a potential hazard in low-power applications
Strong Arm 110 Flip-Flop
• Used in SA W low-power processor
• Runs at 200MHz
• One transistor more than flip-flop
• 450ps Clk-Q delay, simulated in 0.35u CMOS technology
Our simulations show:
• The additional transistor provides fully static operation (robustness to leakage currents), essential for low-power applications, but slightly increases internal power consumption
Master-Slave Latches
• Positive setup times
• Two clock phases:
  – distributed globally
  – generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay
T-G Master-Slave Latch
• Low power feedback
• Unbuffered input
  – input capacitance depends on the phase of the clock
  – over-shoot and under-shoot with long routes
  – wirelength must be restricted at the input
• Clock load is high
• Low power
• Small Clk-output delay, but positive setup
• Easily embedded scan or mux
T-G Master-Slave Latch
PowerPC 603 (Gerosa, JSSC 12/94)
PowerPC 603 M-S Latch Combination
• Used in PowerPC family
• Low-power
• High speed
• Big clock load
• Easily embedded scan function
Our simulations show:
• Small internal power consumption
• Low-power feedback
• Double the clock load compared with other latches
• Locally generated second phase (reduces overall clock load)
C2MOS M-S Latches (Suzuki ’73)
• Low power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973
mC2MOS M-S Latch
Our simulations show:
• Small clock load (local clock buffering)
• Low-power feedback
• Big positive setup time
• Robustness to clock slope, unlike the classic C2MOS structure
Advanced Flip-Flops
Flip-Flops
• First stage is a pulse generator
  – generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
  – captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator
HLFF (Partovi’s) Flip-Flop
AMD K-6, Partovi, ISSCC’96
HLFF Operation
1-0 and 0-1 transitions at the input with 0ps setup time
Hybrid Latch Flip-Flop
Skew absorption
Partovi et al., ISSCC’96
Partovi’s HLFF
• Hybrid Latch-Flip-Flop combination
• 280pS Clk-Q delay
• Negative set-up time of pS
• Robustness to clock skew and fast clocking
Our simulations show:
• Hybrid design
• Gains:
  – speed (negative setup time)
  – robustness to clock skew
• Drawbacks:
  – sensitivity to clock slope
  – relatively high internal power (due to precharge)
HLFF Flip-Flop
Flip-flop features:
• single phase clock
• edge triggered, on one clock edge
Features:
• Soft clock edge property
  – brief transparency, equal to 3 inverter delays
  – negative setup time
  – allows slack passing
  – absorbs skew
• Hold time is comparable to HLFF delay
  – minimum delay between flip-flops must be controlled
• Pseudo static
• Possible to incorporate logic
K-6 Dual-Rail ETL
• Self-reset property
  – increases dynamic power
  – drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load
Flip- Flop Element of K6 (Partovi*)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Pulsed Flip- Flop of K7 w/ Embedded MUX*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
K-6 Dual-Rail ETL
• Self-reset property
• Hybrid combination
• 260ps Clk-Q delay, simulated in .35u CMOS technology
• Negative setup time: -20ps
• Small clock load
Our simulations show:
• Double-ended, precharge structure is the most power hungry (switching on all input combinations)
• Self-reset property
  – increases power consumption
  – drives succeeding fast domino stages
• Precharge increases speed
Semi-Dynamic Flip-Flop (SDFF)
Sun UltraSparc III, Klass, VLSI Circuits’98
• Soft edge conditioned by data
• Since the first stage is precharged, a cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in the stack: faster than HLFF, but a 1-1 glitch exists
Semi-Dynamic Flip-Flop
• Hybrid combination used in UltraSPARC-III
• Very fast circuit (188ps Clk-Q delay; .25u technology, 1.6V, 105°C)
Our simulations show:
• Negative setup time
• Small penalty for embedded logic
• Relatively high internal power consumption and clock load
SDFF with Improved Switching Delay Characteristic
Performance with Scaled Supply Voltage SDFF with Improved Switching
Svensson’s Family of Latches (he developed TSPC-FF)
Single-Transistor-Clocked MS latches
DSTC and SSTC: Yuan and Svensson, JSSC Jan. ‘97
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Back-gate coupling and charge sharing related speed and power problems
SSTC latch
• Fully static
• According to Svensson, exhibits 360ps worst case Clk-Q delay, laid out in 0.8u single-poly CMOS process
Our simulations show:
• No significant clock power savings compared to the rest of the latches
• Excessively large setup time due to the minimized Master latch
• Both Master and Slave latch have to be optimized for speed and power
DSTC latch
• Double-ended, dynamic latch
• 350ps Clk-Q delay, according to Svensson
Our simulations show:
• Problems with the output dynamic node:
  – Capacitive coupling
  – Charge sharing
• Delay even worse than in the SSTC latch due to the capacitive coupling of the dynamic drive node
Modified Sense Amplifier-Based Flip-Flop
Nikolic, Oklobdzija, Stojanovic, ISSCC, 1999
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and Q-bar is symmetrical, as opposed to the NAND based design
• Convenient for dual rail logic; driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time and all the advantages of the original design
• Possible tradeoff between speed and robustness to cross-talk
Sense-amplifier-based flip-flop
Matsui et al.; DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On the rising clock edge, monotonic S_b or R_b triggers the S-R latch
• Cross-coupled NAND is the speed bottleneck
• Big power savings in reduced swing designs
• Nice interface to/from domino logic
Modified Sense Amplifier-Based Flip-Flop
• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic, Oklobdzija, Stojanovic, ISSCC ‘99
Overall results
Comparison in terms of speed and PDPtot
Delay:
• below 200ps: SDFF, HLFF, K-6 ETL
• between these bounds: PowerPC latch, 21264 Alpha FF, Strong Arm FF, mC2MOS latch
• above 500ps: SSTC latch, DSTC latch, SSTC* latch, DSTC* latch
PDPtot:
• below 30fJ: PowerPC latch
• middle ranges: HLFF, SDFF, mC2MOS latch, 21264 Alpha FF, Strong Arm FF; then K-6 ETL
• above 70fJ: SSTC latch, DSTC latch
Delay comparison
F-F design brings the fastest structures
Overall ranking
• EDPtot accepted as the overall cost function
• The “low-power” latches proposed by Yuan & Svensson are not low-power compared with the other presented structures (the optimization was not properly done); the optimization is yet to be repeated under a different setup
Overall ranking, zoomed
• Real signals have the activity between 0 and 0.25 ()
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the point
PDPtot ranges for Svensson’s family
• DSTC and SSTC have problems
• Weak drive of the minimized Master increases short circuit power consumption in the Slave
* Latches are sized as proposed in the original paper
Overall performance
• Real signals have the activity between 0 and 0.5 ()
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the point
Conventional Clk-Q vs. minimum D-Q
• Hidden positive setup time
• Degradation of Clk-Q
Internal Power distribution
Four sequences characterize the boundaries for internal power consumption:
• …010101… maximum
• random, equal transition probability: average
• …111111… precharge activity
• …000000… leakage + internal clock processing
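The data activity of the four test sequences can be sketched as follows. This is an illustrative helper, not part of the original test bench; the function name, the sequence lengths, and the random seed are assumptions.

```python
import random


def transition_activity(bits: str) -> float:
    """Fraction of clock cycles in which the data input changes value."""
    if len(bits) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / (len(bits) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
sequences = {
    "...010101... (maximum)": "01" * 8,
    "random (average)": "".join(rng.choice("01") for _ in range(10_000)),
    "...111111... (precharge activity)": "1" * 16,
    "...000000... (leakage + internal clock)": "0" * 16,
}
for name, seq in sequences.items():
    print(f"{name}: activity = {transition_activity(seq):.2f}")
```

Both constant sequences have zero data transitions, yet the slide assigns them different internal power: in precharged structures a constant "1" input still causes a precharge and evaluate cycle on every clock, which a transition count alone does not capture.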
Comparison of Clock power consumption
Using Dual-Edge Flip-Flop
Run at ½ of the frequency and save on the power consumed in the clock distribution tree.
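The saving can be estimated with the standard dynamic power relation P = C·V²·f, since the clock net switches every cycle. The sketch below assumes the clock tree capacitance stays the same when dual-edge flip-flops are used, which may not hold for a real design; the 100 pF capacitance is a hypothetical value, while the supply voltage and frequencies match the comparison conditions quoted on the following slides.

```python
def clock_tree_power_w(c_clock_farads, vdd_volts, f_clk_hz):
    """Dynamic power of a clock net that charges and discharges its
    capacitance once per cycle: P = C * Vdd^2 * f."""
    return c_clock_farads * vdd_volts ** 2 * f_clk_hz


C_CLOCK = 100e-12  # hypothetical clock tree capacitance, 100 pF
VDD = 1.8          # volts

single_edge = clock_tree_power_w(C_CLOCK, VDD, 500e6)  # 500 MHz clock
dual_edge = clock_tree_power_w(C_CLOCK, VDD, 250e6)    # same data rate at 250 MHz
print(f"single-edge: {single_edge * 1e3:.0f} mW, dual-edge: {dual_edge * 1e3:.0f} mW")
```

Under these assumptions the clock tree power is halved for the same data throughput; whether the total improves depends on the internal and clock-load power of the dual-edge flip-flop itself.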
Dual-Edge vs. Single-Edge Flip-Flops Comparison
Plots: Delay [ps]; Total Power [W]
Conditions: Fujitsu 0.18u process; clock frequency 500MHz (250MHz for Dual Edge FFs); data activity ratio = 0.5; VDD = 1.8V; Temp = 25°C
Dual-Edge vs. Single-Edge Flip-Flops Comparison
Plots: Internal Power [W]; Clock Power [W]; Data Power [W]
Conditions: Fujitsu 0.18u process; clock frequency 500MHz (250MHz for Dual Edge FFs); data activity ratio = 0.5; VDD = 1.8V; Temp = 25°C
Silicon on Insulator (SOI) Technology
SOI Comparison
F = 1GHz, = 0.5, Le = 0.08 µm, VDD = 1.3V, T = 25°C
Conclusion
Approaches
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Short period of transparency (hybrid design)
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Double-ended precharged structures
• Dynamic Slave latch
Future directions:
• Tighten the design rules for low-power, high-performance, deep-submicron structures
• Develop new latches featuring a small clock load and a small PDP
Design goals
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Pulsed design
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Dynamic Master latch
Conduct:
• Power*Delay optimizations at constant frequency, which really optimize the Energy*Delay product
• Take into account all sources of power dissipation
• ALWAYS use Clk-Q + setup time for max delay