Clocked Storage Elements: Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems
Vojin G. Oklobdzija
University of California Davis
http://www.ece.ucdavis.edu/acsel
Integration Corp., Berkeley, CA 94708
http://www.integration-corp.com
Outline
• Recent interest and importance
• Timing and Power metrics
• Master-Slave vs. Flip-Flop
• Design and optimization tradeoffs
• Representative designs
• Comparison
• Some novel designs
• Conclusion
11/22/2018 Prof. V.G. Oklobdzija, University of California
Recent Interest in Storage Elements
Trends in high-performance systems: Higher clock frequency
Performance: 3X / generation
Source: ISSCC, uP Report, Hot-Chips

Total transistors: 3X / generation
Logic transistors: 2X / generation
Source: ISSCC, uP Report, Hot-Chips
Processor Design Challenges
• Performance is tracking the frequency increase
• Where are the transistors contributing?
• The 3X-per-generation growth in transistors does not seem to be matched by a corresponding gain in performance
Power versus Year
• High-end: growing at 25% / year
• RISC: 12% / year
• x86: 15% / year
• Consumer (low-end): 13% / year

Power Trend
[Figure: power (W), log scale from 0.01 to 100, versus year ('80 to '95); trend of x4 / 3 years. Courtesy of Sakurai Sensei]

Gloom and Doom predictions
Source: Shekhar Borkar, Intel
Recent Interest in Storage Elements, or Why Do Computer Architects Care?
Trends in high-performance systems:
• Higher clock frequency (1.5GHz Pentium, 4GHz presented)
• More transistors on chip (214 million, ISSCC 2001)
Consequences:
• Increased Flip-Flop overhead relative to cycle time
• Pipeline depth of 20 or more
• Cycle time 10 - 20 FO4 delays, Flop overhead 3 - 4 FO4
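The FO4 figures on this slide imply a range for the fraction of the cycle lost to the flip-flop. A minimal sketch of that arithmetic follows; the function name is hypothetical, and pairing the smallest overhead with the longest cycle (and the reverse) is an assumption made only to bracket the range.

```python
def flop_overhead_fraction(flop_fo4: float, cycle_fo4: float) -> float:
    """Fraction of the clock cycle consumed by flip-flop overhead (both in FO4 delays)."""
    return flop_fo4 / cycle_fo4

# Slide figures: cycle time 10 - 20 FO4, flop overhead 3 - 4 FO4.
best_case = flop_overhead_fraction(3, 20)    # smallest overhead, longest cycle
worst_case = flop_overhead_fraction(4, 10)   # largest overhead, shortest cycle

print(f"flop overhead: {best_case:.0%} to {worst_case:.0%} of the cycle")
# prints "flop overhead: 15% to 40% of the cycle"
```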
Processor Frequency Trend
Source: Intel, S. Borkar
• Frequency doubles each generation
• Number of gates/clock reduces by 25%
Traditional Pentium 3 uArchitecture
Pipeline (three stages): logic | register | logic | register | logic | register
Delay (ns): 0.6 | 0.3 | 0.6 | 0.3 | 0.6 | 0.3
The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.
The Pentium 4 Depends on Pipelines
Pipeline: logic | register, repeated six times
Delay (ns): 0.3 for every logic block and every register
The total delay from pipeline stage to pipeline stage is 0.6 ns. This design, with twice the stages, has a maximum clock rate of only 1.67 GHz.
As the design is broken into more pipeline stages, the logic in each stage has less delay, and the registers between stages consume a higher percentage of the delay, causing diminishing returns. At some point the costs that come with more stages, such as those tied to branch prediction, leave only a very marginal return. The only way out of this bottleneck is a faster register.
This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
Recent Interest in Storage Elements
• Difficult to control both edges of the clock
• Higher impact of clock skew
• Higher cross-talk and substrate coupling
• Higher power consumption
• Limits on performance
• Clock burns up to 40%, storage elements up to 20% of total power
• I have even seen 75% recently (ISSCC 2001)
Solution: Faster Flip-Flops
We have developed a new fast register which can be fabricated using the standard microprocessor fabrication lines, and which is several times faster than registers currently used.
Pipeline: logic followed by the new register, repeated six times
Delay (ns): 0.3 for every logic block, 0.04 for every register
The total delay from pipeline stage to pipeline stage is 0.34 ns. Using our design allows a maximum nominal clock rate of 2.9 GHz.
Can you achieve this performance gain with architecture?
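The clock rates quoted on the three pipeline slides follow from one relation: maximum clock rate = 1 / (logic delay + register delay). A minimal sketch that reproduces them is given below; the function and dictionary names are hypothetical, and the delays are the per-stage values from the slides.

```python
def max_clock_ghz(logic_ns: float, register_ns: float) -> float:
    """Maximum clock rate in GHz for a stage made of logic followed by a register."""
    return 1.0 / (logic_ns + register_ns)

# (logic delay, register delay) in ns, taken from the slides.
designs = {
    "Pentium 3 style": (0.6, 0.3),
    "Pentium 4 style": (0.3, 0.3),
    "faster register": (0.3, 0.04),
}

for name, (logic_ns, register_ns) in designs.items():
    print(f"{name}: {max_clock_ghz(logic_ns, register_ns):.2f} GHz")
# prints:
# Pentium 3 style: 1.11 GHz
# Pentium 4 style: 1.67 GHz
# faster register: 2.94 GHz
```

Halving the logic delay raises the clock rate by only about 50% when the register delay stays at 0.3 ns, which is the diminishing return the slides describe.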
Clocked Storage Element Requirements
• High speed
– High-frequency applications require low FF timing overhead
– Sub-nanosecond clock periods
– FF delays of tens to hundreds of picoseconds
• Low power
– Dissipation of >100W for recent processors
– Battery-supplied applications
• Size
• High robustness to clock imperfections
• Logic embedding property
The two major performance parameters are speed and power. The importance of delay grows as the operating frequency of the system using the state element increases. Nanosecond or sub-nanosecond clock periods can tolerate a state element overhead of several tens to at most a few hundred picoseconds.
Low power consumption is another required feature. Recent processors use tens, or even hundreds, of watts, so state elements should operate with as low a power consumption as possible.
Other parameters of interest include circuit size, robustness to skew and other clock imperfections, the logic embedding property, etc.
Clock Signals
Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in a synchronous digital system.
The clocking in a digital system can be either single-phase or multi-phase (usually two-phase).
The clocking strategy depends on, and is largely influenced by, the choice of the storage element: latch or flip-flop.
The dark rectangles in the figure represent the interval during which the bi-stable element samples its data input.
Fig. 4.2 shows the possible types of clocking techniques and the corresponding general finite-state machine structures:
Clock Signal Uncertainty
Effects on cycle time:
– maximum delay restriction – violation of set-up time
May cause race:
– minimum delay restriction – violation of hold time
Uncertainty is: Jitter, Skew, and Duty Cycle

Jitter
• Uncertainty in consecutive edges of a periodic signal
• Caused by temporal noise events
• Quantified as:
– cycle-to-cycle or short-term jitter, tJS
– long-term jitter, tJL

Clock Skew
• Time difference between temporally-equivalent or concurrent edges of two periodic signals
• Caused by spatial noise events

Clocking Strategies
• Single-phase clocking and single latch machine
• Edge-triggered clocking and Flip-Flop based machine

Clocking Strategies
• Two-phase clocking and two-phase latch machine with single latch
• Two-phase clocking and two-phase latch machine with double latch

Delay Restrictions
• Clock defines hard boundaries for edge-triggered design
• Clock boundaries are soft for level-sensitive clocking, and they are:
– Tolerant of clock edge uncertainty
– Tolerant of uncertainty in data arrival
• Timing slack can voluntarily be passed forward
• Time can forcefully be borrowed
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

Single-Phase Clocking, Single Latch: Timing Constraints

Two-Phase Clocking with Two-Phase Double Latch

Two-Phase Clocking with One-Phase Double Latch
Some people refer to this clocking arrangement, erroneously, as a “negative edge Flip-Flop”.
Difference between Latch and Flip-Flop
• Flip-Flop: after the transition of the clock, data can not change the output
• Latch is “transparent”
Flip-Flop and M-S Latch Combination
How can one recognize the difference without knowing what is inside the “black-box”?

F-F and M-S Latch: Difference
Experiment:

F-F and M-S Latch: Difference
Structural Difference: No Clock Flip-Flop M-S Latch
Flip-Flop vs. Latch
Flip-Flop:
• Edge sensitive
• Easier to use as frequency increases
• Robust to duty-cycle variation
• Simpler logic timing requirements
• Fits into CAD tools
Latch:
• Level sensitive
• Consumes less power
• Better clock skew/jitter characteristics
The choice between an FF and a latch depends on each individual design and its specifications.
Flip-flops are edge sensitive: they have simpler timing requirements and lower sensitivity to duty cycle imperfections.
Latches are level sensitive and simpler: they consume less power and have better clock skew/jitter characteristics.
Flip-Flop: Example
HLFF (Partovi)

Pulse-Based Flip-Flops*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

Flip-Flop: Example
SAFF, DEC Alpha 21264
[Figure labels: D=0, pulse, D=1]
Requirements in the Flip-Flop Design
• Small Clk-Output delay, narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
– Typical flip-flop load in a 0.18um CMOS ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths (rule-of-thumb number for gate capacitance: Cgate = 2fF/um)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity: dynamic/high-impedance nodes are affected
State Element Characterization
• Timing
– Propagation time (clock-to-output)
– Set-up time
– Hold time
– Skew amortization
• Power consumption
– Internal power
– Input power
The two major parameters that determine the performance of a state element are delay and power consumption.
There are three defined timing parameters of a state element:
- Propagation time, or clock-to-output time, defined as the delay between the active clock edge and the subsequent output transition,
- Setup time, defined as the latest allowed arrival of the input with respect to the clock in order to capture it properly,
- Hold time, similarly defined as the earliest allowed arrival of the input with respect to the clock in order to properly capture the previous value.
Power consumption can be broken up into:
- Internal power, which is the portion of power consumed by the circuit itself,
- Input power, which is the portion of the power consumed by the circuits driving the state element that is due to the presence of the state element; this part accounts for the often overlooked contribution of the state element to the consumption of the clock distribution network.
Flip-Flop Delay
The sum of setup time and Clk-output delay is the only true measure of performance with respect to system speed:
$T = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew}$
[Figure labels: TClk-Q, TSetup, TLogic]
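The cycle-time relation on this slide can be rearranged to give the time left for logic once the flip-flop and skew overheads are paid. The sketch below is only an illustration: the function names and the example numbers (a 1000 ps cycle, 200 ps Clk-Q, 50 ps setup, 50 ps skew) are assumptions, not values from the presentation.

```python
def min_cycle_time_ps(t_clk_q: float, t_logic: float, t_setup: float, t_skew: float) -> float:
    """Cycle time T = T_Clk-Q + T_Logic + T_setup + T_skew (all in ps)."""
    return t_clk_q + t_logic + t_setup + t_skew

def logic_budget_ps(cycle: float, t_clk_q: float, t_setup: float, t_skew: float) -> float:
    """Time left for logic in one cycle after flip-flop and skew overhead (all in ps)."""
    return cycle - (t_clk_q + t_setup + t_skew)

# Hypothetical example values, chosen only for illustration.
budget = logic_budget_ps(cycle=1000.0, t_clk_q=200.0, t_setup=50.0, t_skew=50.0)
print(f"logic budget: {budget:.0f} ps")   # prints "logic budget: 700 ps"
```

A negative setup time enters this budget as a negative number, so it adds time for logic rather than removing it.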
Delay vs. Setup/Hold Times

Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point.
The data-to-output characteristic, being the simple sum of clock-to-output and data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output characteristic increases.
The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope. The data-to-clock time that corresponds to this point is termed the optimal setup time.
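The optimal setup time described here can be located numerically as the minimum of the data-to-output curve. The sketch below uses an assumed Clk-Q model (a constant plus an exponential rise as the data arrives closer to the clock); that model, its parameters, and all names are hypothetical stand-ins for simulated or measured data, and it does not model the failure region.

```python
import math

def clk_q_ps(t_dc_ps: float, t_stable: float = 150.0, a: float = 200.0, tau: float = 20.0) -> float:
    """Hypothetical Clk-Q model: flat for large data-to-clock time, rising as it shrinks."""
    return t_stable + a * math.exp(-t_dc_ps / tau)

def d_q_ps(t_dc_ps: float) -> float:
    """Data-to-output time: data-to-clock time plus clock-to-output time."""
    return t_dc_ps + clk_q_ps(t_dc_ps)

def optimal_setup_ps(lo: float = -50.0, hi: float = 200.0, step: float = 0.1) -> float:
    """Grid search for the data-to-clock time that minimizes D-Q."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=d_q_ps)

t_opt = optimal_setup_ps()
# For this model the Clk-Q slope is -(a / tau) * exp(-t / tau); it is close to -1
# (the 45-degree point) at the D-Q minimum.
slope = -(200.0 / 20.0) * math.exp(-t_opt / 20.0)
print(f"optimal setup ~ {t_opt:.1f} ps, Clk-Q slope there ~ {slope:.2f}")
```

With real data the same search would run over the simulated (data-to-clock, Clk-Q) points instead of an analytic model.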
Timing parameters, details
The best point to pick on the delay curve is minimum D-Q

Latch and Flip-Flop latencies (tDQ) vs. Data-to-clock Set-up Time (tDC)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

Clock Skew Considerations
• Need for characterization of Flip-Flop behavior in the presence of skew/jitter
• Soft Clock Edge property only qualitatively describes skew immunity
• Still, designers calculate maximum useful time by incorporating all skew into clocking overhead

Clock Skew Considerations
[Figure labels: Real Skew Overhead; Skew Overhead of Ideal Flip-Flop]

Clock Skew Considerations
• Skew Rejection: ratio of total skew and its impact on FF overhead
• Shows how the circuit reacts to clock edge uncertainty
• Helps answer the question of to what point the clock distribution network should be optimized
Simulation Condition and Testbench
Power
• Data activity dependence as an FF characteristic
• Consumption with 50% activity adopted as a figure of merit
• Dissipation of driving inverters is part of total power consumption
In order to evaluate and compare flip-flops, simulation conditions and a testbench for the simulations are defined. They are set according to the flip-flop characterization presented earlier.
Power consumption is measured with several different input activities; power consumption at an input activity of 50% is adopted as the figure of merit. Total dissipation includes the dissipation of the driving inverters.
Simulation Condition and Testbench
Timing
• Total FF overhead is setup + clock-to-output time
• Circuit optimization towards td-q
• Clock skew robustness obtained from observing the DQ curve
Power-Delay Product
• Overall performance parameter at fixed frequency
The circuit delay parameter used for evaluation is data-to-output time, and circuits are optimized towards this parameter.
The ultimate performance parameter is the power-delay product, measured at a fixed clock frequency. It is calculated as the product of data-to-output time and total power consumption, measured at the optimal setup time.
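The power-delay product defined on this slide is a single multiplication, but the units are easy to get wrong. A minimal sketch follows; the function name and the example figures (250 ps, 120 uW) are hypothetical and are not results from the presentation.

```python
def power_delay_product_fj(d_q_ps: float, total_power_uw: float) -> float:
    """Power-delay product in femtojoules.

    d_q_ps: data-to-output delay at the optimal setup time, in picoseconds.
    total_power_uw: total power at the fixed clock frequency, in microwatts.
    1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J = 1e-3 fJ.
    """
    return d_q_ps * total_power_uw * 1e-3

# Hypothetical example: 250 ps D-Q delay, 120 uW total power.
pdp = power_delay_product_fj(250.0, 120.0)
print(f"PDPtot = {pdp:.1f} fJ")   # prints "PDPtot = 30.0 fJ"
```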
Flip-Flop Performance Comparison
Test bench
• Total power consumed
– internal power
– data power
– clock power
• Measured for four cases
– no activity (0000… and 1111…)
– maximum activity (0101010..)
– average activity (random sequence)
• Delay is (minimum D-Q) Clk-Q + setup time

OLD TEST BENCH:
• Total Power = Drivers Power + Test Unit Power
• PDP-Optimized = Equal Trade-off on Power and Delay
• Improper Load on Drivers
NEW TEST BENCH:
• Drivers: Fixed Gain and Driving Test Unit Only
• Data-to-Output Delay
• PD2P Optimized = Best for Constant-Field Scaling

The sources of internal power consumption

Design & optimization tradeoffs
• Opposite Goals
– Minimal Total power consumption
– Minimal Delay
• Power-Delay tradeoff
– Minimize Power-Delay product (PDPtot)

High-performance microprocessor latches and flip-flops
21264 Flip-Flop
• Used in Digital's WD21264 high-performance processor
• Runs at 600MHz
• 450ps Clk-Q delay, simulated in 0.35u technology
Our simulations show:
• Small clock load
• High internal power consumption
• S-R latch degrades the speed by 40%
• Dynamic nodes, potential hazard in low-power applications

Strong Arm 110 Flip-Flop
• Used in SA110 0.5W low-power processor
• Runs at 200MHz
• One transistor more than 21264 flip-flop
• 450ps Clk-Q delay, simulated in 0.35u CMOS technology
Our simulations show:
• Additional transistor provides fully static operation (robustness to leakage currents), essential for low-power applications, but slightly increases internal power consumption
Master-Slave Latches
• Positive setup times
• Two clock phases:
– distributed globally
– generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay

T-G Master-Slave Latch
• Low-power feedback
• Unbuffered input
– input capacitance depends on the phase of the clock
– over-shoot and under-shoot with long routes
– wirelength must be restricted at the input
• Clock load is high
• Low power
• Small Clk-output delay, but positive setup
• Easily embedded scan or mux

T-G Master-Slave Latch
PowerPC 603 (Gerosa, JSSC 12/94)

PowerPC 603 M-S Latch Combination
• Used in PowerPC family
• Low-power
• High speed
• Big clock load
• Easily embedded scan function
Our simulations show:
• Small internal power consumption
• Low-power feedback
• Double the clock load compared with other latches
• Locally generated second phase (reduces overall clock load)

C2MOS M-S Latches (Suzuki ’73)
• Low-power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973

mC2MOS M-S Latch
Our simulations show:
• Small clock load (local clock buffering)
• Low-power feedback
• Big positive setup time
• Robustness to clock slope, unlike classic C2MOS structure
Advanced Flip-Flops

Flip-Flops
• First stage is a pulse generator
– generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
– captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator

HLFF (Partovi’s) Flip-Flop
AMD K-6, Partovi, ISSCC’96

HLFF Operation
1-0 and 0-1 transitions at the input with 0ps setup time

Hybrid Latch Flip-Flop
Skew absorption
Partovi et al, ISSCC’96
Partovi’s HLFF
• Hybrid Latch-Flip-Flop combination
• 280ps Clk-Q delay
• Negative set-up time of -100ps
• Robustness to clock skew and fast clocking
Our simulations show:
• Hybrid design
• Gains
– speed (negative setup time)
– robustness to clock skew
• Drawbacks
– sensitivity to clock slope
– relatively high internal power (due to precharge)

HLFF Flip-Flop
Flip-flop features:
• single phase clock
• edge triggered, on one clock edge
Features:
• Soft clock edge property
– brief transparency, equal to 3 inverter delays
– negative setup time
– allows slack passing
– absorbs skew
• Hold time is comparable to HLFF delay
– minimum delay between flip-flops must be controlled
• Pseudo static
• Possible to incorporate logic
K-6 Dual-Rail ETL
• Self-reset property
– increases dynamic power
– drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load

Flip-Flop Element of K6 (Partovi*)
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

Pulsed Flip-Flop of K7 w/ Embedded MUX*
*Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation

K-6 Dual-Rail ETL
• Self-reset property
• Hybrid combination
• 260ps Clk-Q delay, simulated in .35u CMOS technology
• Negative setup time: -20ps
• Small clock load
Our simulations show:
• Double-ended, precharge structure is the most power hungry (switching on all input combinations)
• Self-reset property
– increases power consumption
– drives succeeding fast domino stages
• Precharge increases speed
Semi-Dynamic Flip-Flop (SDFF)
Sun UltraSparc III, Klass, VLSI Circuits’98
• Soft edge conditioned by data since first stage is precharged; cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack, so it is faster than HLFF, but a 1-1 glitch exists

Semi-Dynamic Flip-Flop
• Hybrid combination used in UltraSPARC-III
• Very fast circuit (188ps Clk-Q delay, .25u technology, 1.6V, 105°C)
Our simulations show:
• Negative setup time
• Small penalty for embedded logic
• Relatively high internal power consumption and clock load

SDFF with Improved Switching Delay Characteristic

Performance with Scaled Supply Voltage
SDFF with Improved Switching
Svensson’s Family of Latches (he developed TSPC-FF)

Single-Transistor-Clocked MS latches
DSTC, SSTC: Yuan and Svensson, JSSC Jan. ‘97
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Back-gate coupling and charge sharing related speed and power problems

SSTC latch
• Fully static
• According to Svensson, exhibits 360ps worst case Clk-Q delay, laid out in 0.8u single-poly CMOS process
Our simulations show:
• No significant clock power savings compared to the rest of the latches
• Excessively large setup time due to the minimized Master latch
• Both Master and Slave latch have to be optimized for speed and power

DSTC latch
• Double-ended, dynamic latch
• 350ps Clk-Q delay, according to Svensson
Our simulations show:
• Problems with output dynamic node
– Capacitive coupling
– Charge sharing
• Delay even worse than in SSTC latch due to the capacitive coupling of dynamic drive node
Modified Sense Amplifier-Based Flip-Flop
Nikolic, Oklobdzija, Stojanovic, ISSCC, 1999
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and Q_b is symmetrical, as opposed to the NAND based design
• Convenient for dual rail logic, and driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time and all the advantages of the original design
• Possible tradeoff between speed and robustness to cross-talk

Sense-amplifier-based flip-flop
Matsui et al. 1994. DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On the rising clock edge, monotonic S_b or R_b triggers the S-R latch
• Cross-coupled NAND: speed bottleneck
• Big power savings in reduced swing designs
• Nice interface to/from domino logic

Modified Sense Amplifier-Based Flip-Flop
• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic, Oklobdzija, Stojanovic ISSCC ‘99
Overall results

Comparison in terms of speed and PDPtot
Delay
• below 200ps: SDFF 187ps, HLFF 199ps, K-6 ETL 200ps
• 200-300ps: PowerPC latch 266ps, 21264 Alpha FF 272ps, Strong Arm FF 275ps, mC2MOS latch 292ps
• above 500ps: SSTC latch 592ps, DSTC latch 629ps, SSTC* latch 898ps, DSTC* latch 1060ps
PDPtot
• below 30fJ: PowerPC latch 28fJ
• 30 - 50fJ: HLFF 29fJ, SDFF 39fJ, mC2MOS latch 40fJ, 21264 Alpha FF 43fJ, Strong Arm FF 45fJ
• 50 - 70fJ: K-6 ETL 70fJ
• above 70fJ: SSTC latch 95fJ, DSTC latch 125fJ
Delay comparison
F-F design brings the fastest structures

Overall ranking
• EDPtot accepted as the overall cost function
• The proposed “low-power” latches from Yuan & Svensson are not low-power compared with the other presented structures (the optimization was not properly done); the optimization is yet to be repeated under a different setup
Overall ranking, zoomed
• Real signals have the activity between 0 and 0.25
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the point

PDPtot ranges for Svensson’s family
• DSTC and SSTC have problems
• Weak drive of minimized master increases short circuit power consumption in Slave
* Latches are sized as proposed in the original paper

Overall performance
• Real signals have the activity between 0 and 0.5
• Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”
• More “ones” above the point

Conventional Clk-Q vs. minimum D-Q
• Hidden positive setup time
• Degradation of Clk-Q

Internal Power distribution
Four sequences characterize the boundaries for internal power consumption:
• …010101… maximum
• random, equal transition probability, average
• …111111… precharge activity
• …000000… leakage + internal clock processing
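The four boundary sequences listed on this slide are simple to generate for a testbench. The sketch below is a hypothetical helper, not the testbench used in the presentation; in particular, defining activity as the fraction of cycles in which the data input toggles is an assumption, and the names, sequence length, and seed are arbitrary.

```python
import random

def transitions_per_cycle(bits: list[int]) -> float:
    """Fraction of consecutive bit pairs that differ (assumed definition of activity)."""
    flips = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return flips / (len(bits) - 1)

def boundary_sequences(n: int = 1000, seed: int = 0) -> dict[str, list[int]]:
    """The four data sequences that bound internal power consumption."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return {
        "maximum (...010101...)": [i % 2 for i in range(n)],
        "random (average)": [rng.randint(0, 1) for _ in range(n)],
        "all ones (...111111...)": [1] * n,
        "all zeros (...000000...)": [0] * n,
    }

sequences = boundary_sequences()
for name, bits in sequences.items():
    print(f"{name}: activity = {transitions_per_cycle(bits):.2f}")
```

The alternating sequence toggles every cycle and the two constant sequences never toggle, while the random sequence lands near one toggle every two cycles.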
Comparison of Clock power consumption

Using Dual-Edge Flip-Flop
(run at ½ of the frequency, save on the power consumed in clock distribution tree)
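The saving claimed here follows from the usual dynamic-power relation for a clock net, P = C · VDD² · f, since the clock makes one full charge/discharge per period. The sketch below is a rough illustration under stated assumptions: the 100 pF clock-tree capacitance is hypothetical, the supply and frequencies are the ones quoted for the dual-edge comparison, and the clock load is assumed to be the same for both flip-flop types, which a real dual-edge design may not satisfy.

```python
def clock_tree_power_w(c_clk_f: float, vdd_v: float, f_clk_hz: float) -> float:
    """Dynamic power of a clock net that fully charges and discharges once per period."""
    return c_clk_f * vdd_v ** 2 * f_clk_hz

C_CLK = 100e-12   # hypothetical clock-tree capacitance (100 pF), assumed equal in both cases
VDD = 1.8         # supply voltage in volts

single_edge = clock_tree_power_w(C_CLK, VDD, 500e6)   # single-edge FFs clocked at 500 MHz
dual_edge = clock_tree_power_w(C_CLK, VDD, 250e6)     # dual-edge FFs clocked at 250 MHz

print(f"single-edge: {single_edge * 1e3:.0f} mW, dual-edge: {dual_edge * 1e3:.0f} mW")
# prints "single-edge: 162 mW, dual-edge: 81 mW"
```

Both configurations capture data at the same rate; only the clock distribution power is halved in this simplified model.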
Dual-Edge vs. Single-Edge Flip-Flops Comparison
[Figure panels: Delay [ps]; Total Power [W]]
Fujitsu 0.18u process; Clock frequency 500MHz (250MHz for Dual Edge FFs); Data activity ratio = 0.5; VDD = 1.8V; Temp = 25°

Dual-Edge vs. Single-Edge Flip-Flops Comparison
[Figure panels: Internal Power [W]; Clock Power [W]; Data Power [W]]
Fujitsu 0.18u process; Clock frequency 500MHz (250MHz for Dual Edge FFs); Data activity ratio = 0.5; VDD = 1.8V; Temp = 25°
Silicon on Insulator (SOI) Technology

SOI Comparison
F = 1GHz, data activity = 0.5, Le = 0.08um, VDD = 1.3V, T = 25C
Conclusion

Approaches
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Short period of transparency (hybrid design)
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Double-ended precharged structures
• Dynamic Slave latch
Future directions:
• Tighten the design rules for low-power, high-performance, deep-submicron structures
• Develop new latches featuring
– Small clock load
– Small PDP

Design goals
Apply:
• Small clock load
• Short direct path
• Reduced node swing
• Low-power feedback
• Pulsed design
• Optimization of both Master and Slave latch
Avoid:
• Positive setup time
• Sensitivity to clock slope and skew
• Dynamic (floating) nodes
• Dynamic Master latch
Also:
• Conduct Power*Delay optimizations at constant frequency, which really optimizes the Energy*Delay product
• Take into account all sources of power dissipation
• ALWAYS use Clk-Q + setup time for max delay