University of California Davis Clocked Storage Elements for High-Performance and Low-Power Systems ICCD 2001 Tutorial Vojin G. Oklobdzija University of California Davis http://www.ece.ucdavis.edu/acsel Integration Corp. Berkeley, CA 94708 http://www.integration-corp.com
Outline: Importance of Clocked Storage Elements (CSE); Basic Definitions; Difference between Latch and Flip-Flop; Timing and Power Metrics; Representative Designs Used in High-Performance Microprocessors; Comparison; Conclusion, New Directions and Some Novel Designs. 9/16/2018 Prof. V.G. Oklobdzija, University of California
Importance of Clocked Storage Elements (CSE)
Trends in high-performance systems: Higher clock frequency
Power vs. Year: High-end growing at 25% / year; RISC at 12% / year; X86 at 15% / year; Consumer (low-end) at 13% / year
Predictions. Source: Shekhar Borkar, Intel
Recent Interest in Clocked Storage Elements. Trends in high-performance systems: higher clock frequency (1.8 GHz Pentium 4; 4 GHz logic presented); more transistors on chip (214 million, ISSCC 2001). Consequences: increased flip-flop overhead relative to cycle time; pipeline depth of 20 or more; cycle time of 10 - 20 FO4 delays with flip-flop overhead of 3 - 4 FO4.
Courtesy: Doug Carmean, Hot-Chips-13 presentation
Processor Frequency Trend. Source: Intel, S. Borkar. Frequency doubles each generation; the number of gates per clock is reduced by 25%.
Pentium 3 uArchitecture: three pipeline stages, each made of logic (0.6 ns) followed by a register (0.3 ns). The total delay from pipeline stage to pipeline stage is 0.9 ns. The maximum clock rate for this design is 1.1 GHz.
The Pentium 4 Depends on Pipelines: six pipeline stages, each made of logic (0.4 ns) followed by a register (0.16 ns). The total delay from pipeline stage to pipeline stage is 560 ps. This design, with twice the stages, has a maximum clock rate of 1.8 GHz. As the design is broken into more pipeline stages, the logic in each stage has less delay and the registers between stages consume a higher percentage of the delay, causing diminishing returns. At some point the costs of adding more stages, such as branch prediction, leave only a very marginal return. The only way out of this bottleneck is a faster register. This is one reason why the P4 is not significantly faster than a slower-clocked P3 for many applications.
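The stage-delay arithmetic on the two preceding slides can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and the third case (logic halved again while the register stays at 0.16 ns) is a hypothetical added here to show the diminishing return, not a figure from the slides.

```python
def max_clock_ghz(logic_ns: float, register_ns: float) -> float:
    """Maximum clock rate in GHz for one stage of logic followed by a register."""
    return 1.0 / (logic_ns + register_ns)


def register_share(logic_ns: float, register_ns: float) -> float:
    """Fraction of the stage delay that is spent in the register."""
    return register_ns / (logic_ns + register_ns)


cases = {
    "Pentium 3 (slide values)": (0.6, 0.3),
    "Pentium 4 (slide values)": (0.4, 0.16),
    # Hypothetical: split the logic once more, keep the same register.
    "Deeper pipeline (assumed)": (0.2, 0.16),
}

for name, (logic_ns, register_ns) in cases.items():
    f = max_clock_ghz(logic_ns, register_ns)
    share = register_share(logic_ns, register_ns)
    print(f"{name}: {f:.2f} GHz, register share of stage delay {share:.0%}")
```

In the hypothetical third case the logic delay is halved but the clock rate rises by much less than a factor of two, because the fixed register delay now takes a larger share of the stage.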
Courtesy: Doug Carmean, Hot-Chips-13 presentation
Why Interest in Clocked Storage Elements? Higher impact of storage element delay; high speed requires low CSE pipeline overhead: 3 FO4 or less; logic embedding property; limits on performance; FF delays of 10 ps - 100 ps. Higher impact of clock skew; ability to control both edges of the clock. Higher power consumption: >100 W for recent processors; the clock system burns up to 40%, and storage elements up to 20%, of total power; battery-powered applications.
Basic Definitions
Clock Signals: Clocks are defined as pulsed, synchronizing signals that provide the time reference for the movement of data in the synchronous digital system. The clocking in a digital system can be either single-phase or multi-phase (usually two-phase). The clocking strategy depends on, and is largely influenced by, the choice of CSE: latch or flip-flop. The dark rectangles in the figure represent the interval during which the bi-stable element samples its data input. Fig. 4.2 shows the possible types of clocking techniques and the corresponding general finite-state machine structures.
Clock Signal Uncertainty. Effects on cycle time: maximum delay restriction (violation of setup time). May cause a race: minimum delay restriction (violation of hold time). Uncertainty consists of jitter, skew, and duty cycle.
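The two restrictions named on this slide can be written as slack checks. The sketch below uses the common textbook form for an edge-triggered machine, which is an assumption here rather than something stated on the slide; all parameter names and the picosecond values are hypothetical.

```python
def setup_slack(period, clk_q_max, logic_max, setup, uncertainty):
    """Maximum-delay restriction: data must arrive a setup time before the next edge.

    Negative slack means a setup violation (the cycle time is too short).
    """
    return period - (clk_q_max + logic_max + setup + uncertainty)


def hold_slack(clk_q_min, logic_min, hold, uncertainty):
    """Minimum-delay restriction: new data must not arrive before the hold time ends.

    Negative slack means a hold violation (a race through the fast path).
    """
    return (clk_q_min + logic_min) - (hold + uncertainty)


# Hypothetical numbers, all in picoseconds.
print("setup slack:", setup_slack(period=1000, clk_q_max=200, logic_max=600,
                                  setup=50, uncertainty=100))
print("hold slack: ", hold_slack(clk_q_min=100, logic_min=20,
                                 hold=80, uncertainty=100))
```

With these assumed values the long path meets timing, while the short path races. Lengthening the clock period does not help a hold violation, since the period does not appear in the minimum-delay check.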
Jitter: • Uncertainty in consecutive edges of a periodic signal • Caused by temporal noise events • Quantified as: – cycle-to-cycle or short-term jitter, tJS – long-term jitter, tJL
Clock Skew: Time difference between temporally-equivalent or concurrent edges of two periodic signals. Caused by spatial noise events.
Clocking Strategies: Single-phase clocking and single latch machine; edge-triggered clocking and flip-flop based machine
Clocking Strategies: Two-phase clocking and two-phase latch machine with single latch; two-phase clocking and two-phase latch machine with double latch
Delay Restrictions: The clock defines hard boundaries for edge-triggered design. Clock boundaries are soft for level-sensitive clocking: they are tolerant of clock edge uncertainty and of uncertainty in data arrival; timing slack can voluntarily be passed forward; time can forcefully be borrowed. *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Single-Phase Clocking, Single Latch: Timing Constraints
Two-Phase Clocking with Two-Phase Double Latch
Two-Phase Clocking with One-Phase Double Latch: Some people refer to this clocking arrangement as a “negative edge Flip-Flop”, erroneously!
Difference between Latch and Flip-Flop
Difference between Latch and Flip-Flop: After the transition of the clock, data cannot change (flip-flop); a latch is “transparent”.
Flip-Flop and M-S Latch Arrangement: How can one recognize the difference without knowing what is inside the “black box”?
F-F and M-S Latch: Difference. Experiment:
F-F and M-S Latch: Difference. Structural difference. Figure labels: No Clock; Flip-Flop; M-S Latch.
Flip-Flop vs. Latch. Flip-flop (edge sensitive): easier to use as frequency increases; robustness to duty cycle; simpler logic timing requirements; fits into CAD tools. Latch (level sensitive): may consume less power for the operation; better clock skew/jitter characteristics; more difficult clock requirements. The choice between a flip-flop and a latch depends on each individual design and its specifications. Flip-flops are edge sensitive, with simpler timing requirements and lower sensitivity to duty cycle imperfections. Latches are level sensitive and simpler, with lower power consumption and better clock skew/jitter characteristics.
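The level-sensitive versus edge-sensitive distinction can be illustrated with two small behavioral models. This is a sketch in idealized discrete time, not a circuit-level description; the function names and the test waveforms are assumptions made for illustration.

```python
def simulate_latch(clk, d, q0=0):
    """Level-sensitive latch: Q follows D while CLK is 1 and holds while CLK is 0."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x  # transparent: input passes straight to output
        out.append(q)
    return out


def simulate_flip_flop(clk, d, q0=0):
    """Positive-edge flip-flop: Q takes D only on a 0 -> 1 transition of CLK."""
    q, prev_clk, out = q0, 0, []
    for c, x in zip(clk, d):
        if prev_clk == 0 and c == 1:
            q = x  # sample once, at the rising edge
        prev_clk = c
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 0, 0, 1]

print("latch    :", simulate_latch(clk, d))      # [0, 1, 0, 1, 1, 1, 0, 1]
print("flip-flop:", simulate_flip_flop(clk, d))  # [0, 1, 1, 1, 1, 1, 0, 0]
```

While the clock is high, changes on D pass through the latch but are ignored by the flip-flop, which is the behavior one would probe for in a black-box experiment.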
Flip-Flop: Example HLFF (Partovi)
Pulse-Based Flip-Flops* *Taken from Hamid Partovi’s ISSCC-2000 GHz Processor Design Workshop presentation
Flip-Flop: Example SAFF (DEC Alpha 21264). Figure labels: D=0; pulse; D=1.
Requirements in the Flip-Flop Design: Small Clk-Output delay, narrow sampling window. Low power. Small clock load. High driving capability (increased levels of parallelism): typical load ranges from 3-4 FO4 to 15-25 FO4; high driving capability should be achieved by inserting inverters and following “logical effort” rules, starting with a minimal-size CSE. Symmetry: balanced D-Q and D-Q̄ delay. Integration of logic into the flop. Multiplexed or clock scan. Cross-talk insensitivity: dynamic/high-impedance nodes are affected.
Timing and Power metrics
Delay: The sum of setup time U and Clk-Q delay is the only true measure of performance with respect to system speed: $T = T_{\text{Clk-Q}} + T_{\text{Logic}} + T_{\text{setup}} + T_{\text{skew}}$
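Rearranging the cycle-time relation gives the time left for logic once the storage element has taken its share. The sketch below assumes delays expressed in FO4 units and uses the cycle-time range (10 - 20 FO4) and flip-flop overhead range (3 - 4 FO4) quoted earlier in the tutorial; the split of the overhead into Clk-Q and setup parts is a hypothetical choice, and skew is set to zero for simplicity.

```python
def logic_time(cycle, clk_q, setup, skew=0.0):
    """Time left for logic: T_Logic = T - (T_Clk-Q + T_setup + T_skew)."""
    return cycle - (clk_q + setup + skew)


# (cycle, Clk-Q, setup) in FO4 delays; the Clk-Q/setup split is assumed.
examples = [
    (20.0, 2.0, 1.0),  # long cycle, 3 FO4 overhead
    (10.0, 2.5, 1.5),  # short cycle, 4 FO4 overhead
]

for cycle, clk_q, setup in examples:
    t_logic = logic_time(cycle, clk_q, setup)
    print(f"cycle {cycle:.0f} FO4: {t_logic:.0f} FO4 for logic "
          f"({t_logic / cycle:.0%} of the cycle)")
```

Under these assumptions the useful fraction of the cycle falls from 85% to 60% as the cycle shrinks, which is why the sum of setup time and Clk-Q delay is treated as the figure that matters.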
Delay vs. Setup/Hold Times
Timing Characteristics: The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of clock-to-output and data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope. The data-to-clock time that corresponds to this point is termed the optimal setup time.
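The location of the optimal setup time can be found numerically once a clock-to-output curve is available. The sketch below uses a synthetic exponential curve as a stand-in for simulated data; the curve shape and its parameters (`t0`, `a`, `tau`) are assumptions for illustration only, and a real device fails outright at small data-to-clock times, which this model does not capture.

```python
import math


def clk_to_q(d_to_clk_ps, t0=200.0, a=100.0, tau=20.0):
    """Synthetic Clk-Q delay (ps): flat at t0 in the stable region, rising as setup shrinks."""
    return t0 + a * math.exp(-d_to_clk_ps / tau)


def d_to_q(d_to_clk_ps):
    """Data-to-output delay: data-to-clock time plus clock-to-output delay."""
    return d_to_clk_ps + clk_to_q(d_to_clk_ps)


def optimal_setup(lo=-50.0, hi=150.0, step=0.1):
    """Grid search for the data-to-clock time that minimizes D-Q."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=d_to_q)


s_opt = optimal_setup()
slope = (clk_to_q(s_opt + 0.01) - clk_to_q(s_opt - 0.01)) / 0.02
print(f"optimal setup time ~ {s_opt:.1f} ps, minimum D-Q ~ {d_to_q(s_opt):.1f} ps")
print(f"slope of Clk-Q curve at that point ~ {slope:.2f}")
```

The slope printed at the minimum is close to -1, i.e. the 45° condition stated on the slide: past that point every picosecond of setup time given up costs more than a picosecond of Clk-Q delay.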
Timing parameters, details: The best point to pick on the delay curve is the minimum D-Q.
Simulation Condition and Testbench (Power): Data activity dependence is a flip-flop characteristic. Consumption with 50% (30%) activity is adopted as a figure of merit. Dissipation of the driving inverters is part of total power consumption. In order to evaluate and compare flip-flops, simulation conditions and a testbench are defined, set according to the flip-flop characterization presented earlier. Power consumption is measured with several different input activities; power consumption with an input activity of 50% is adopted as a figure of merit. Total dissipation includes dissipation of the driving inverters.
Simulation Condition and Testbench (Timing): Total flip-flop overhead is setup + clock-to-output time. Circuits are optimized towards tD-Q. Clock skew robustness is obtained from observing the D-Q curve. Power-delay product is the overall performance parameter at fixed frequency. The circuit delay parameter used for evaluation is data-to-output time, and circuits are optimized towards this parameter. The ultimate performance parameter is the power-delay product, measured at fixed clock frequency: the product of data-to-output time and total power consumption, measured at the optimal setup time.
Flip-Flop Performance Comparison, Test Bench: Total power consumed: internal power, data power, clock power. Measured for four cases: no activity (0000… and 1111…), maximum activity (0101010…), average activity (random sequence). Delay is the minimum D-Q, i.e. Clk-Q + setup time.
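The four input cases can be generated and their data activity measured with a short script. The sketch assumes that activity means the fraction of cycles in which the input bit changes, so that the alternating pattern has activity 1 and a random pattern about 0.5; that definition, the sequence length, and the fixed seed are assumptions, not details taken from the test bench.

```python
import random


def switching_activity(bits):
    """Fraction of cycle boundaries at which the bit differs from the previous one."""
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)


n = 1000
rng = random.Random(0)  # fixed seed so the run is repeatable

sequences = {
    "...0000...": [0] * n,
    "...1111...": [1] * n,
    "...0101...": [i % 2 for i in range(n)],
    "random": [rng.randint(0, 1) for _ in range(n)],
}

for name, bits in sequences.items():
    print(f"{name:>10}: activity = {switching_activity(bits):.2f}")
```

The two constant sequences have the same activity but differ in the fraction of ones, which matters for precharged structures whose internal nodes switch whenever the input is high.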
The sources of internal power consumption
Design & optimization tradeoffs: Opposite goals: minimal total power consumption; minimal delay. Power-delay tradeoff: minimize the power-delay product (PDPtot).
Clocked Storage Elements in High-Performance Microprocessors
Master-Slave Latches: Positive setup times. Two clock phases: distributed globally or generated locally. Small penalty in delay for incorporating a MUX. Some circuit tricks are needed to reduce the overall delay.
PowerPC 603 M-S Latch Combination (Gerosa, JSSC 12/94): Used in the PowerPC family; low power; high speed; big clock load; easily embedded scan function. Our simulations show: small internal power consumption; low-power feedback; double the clock load compared with other latches; locally generated second phase (reduces overall clock load).
mC2MOS M-S Latch: Our simulations show: small clock load (local clock buffering); low-power feedback; big positive setup time; robustness to clock slope, unlike the classic C2MOS structure. Reference: Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973.
Advanced Flip-Flops
21264 Flip-Flop: Used in Digital's WD21264 high-performance processor; runs at 600 MHz; 450 ps Clk-Q delay, simulated in 0.35 µm technology. Our simulations show: small clock load; high internal power consumption; the S-R latch degrades speed by 40%; dynamic nodes are a potential hazard in low-power applications.
Strong Arm 110 Flip-Flop: Used in the SA110 0.5 W low-power processor; runs at 200 MHz; one transistor more than the 21264 flip-flop; 450 ps Clk-Q delay, simulated in 0.35 µm CMOS technology. Our simulations show: the additional transistor provides fully static operation (robustness to leakage currents), which is essential for low-power applications, at the cost of slightly increased internal power consumption.
Flip-Flops: The first stage is a pulse generator: it generates a pulse (glitch) on the rising edge of the clock. The second stage is a latch: it captures the pulse generated in the first stage. Pulse generation results in a negative setup time. These flip-flops frequently exhibit a soft-edge property. Hold time violations must be checked. Note: power is always consumed in the clocked pulse generator.
Partovi’s HLFF (AMD K-6, Partovi, ISSCC’96): Hybrid Latch-Flip-Flop combination; 280 ps Clk-Q delay; negative setup time of -100 ps; robustness to clock skew and fast clocking. Our simulations show: hybrid design. Gains: speed (negative setup time); robustness to clock skew. Drawbacks: sensitivity to clock slope; relatively high internal power (due to precharge).
Hybrid Latch Flip-Flop: Skew absorption. Partovi et al., ISSCC’96
HLFF Flip-Flop. Flip-flop features: single-phase clock; edge triggered, on one clock edge. Features: soft clock edge property; brief transparency, equal to 3 inverter delays; negative setup time; allows slack passing; absorbs skew. Hold time is comparable to the HLFF delay, so the minimum delay between flip-flops must be controlled. Pseudo-static. Possible to incorporate logic.
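The brief transparency window of a pulse-based flip-flop can be modeled as the overlap of the clock with a delayed, inverted copy of itself. The sketch below is an idealized unit-delay model in which each inverter costs exactly one time step; the function name, the waveform, and the treatment of samples before time zero are assumptions made for illustration.

```python
def transparency_window(clk, inverter_delays=3):
    """window[t] = CLK[t] AND NOT CLK[t - inverter_delays].

    An odd inverter chain yields a delayed, inverted clock; the flip-flop is
    transparent only while both the clock and that signal are high.
    Samples before t = 0 are taken as 0.
    """
    window = []
    for t, c in enumerate(clk):
        delayed = clk[t - inverter_delays] if t >= inverter_delays else 0
        window.append(c & (1 - delayed))
    return window


# Two clock periods: low for 4 steps, high for 8, low for 8, high for 8.
clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 8
window = transparency_window(clk)

open_steps = [t for t, w in enumerate(window) if w]
print("transparent at steps:", open_steps)  # [4, 5, 6, 20, 21, 22]
```

Each rising edge opens a window three steps long, matching the three-inverter transparency mentioned on the slide. Data arriving inside the window is still captured, which is the source of the negative setup time, and the same window is why short paths between flip-flops need a minimum delay.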
K-6 Dual-Rail ETL: Self-reset property; hybrid combination; 260 ps Clk-Q delay, simulated in 0.35 µm CMOS technology; negative setup time: -20 ps; small clock load. Our simulations show: the double-ended precharge structure is the most power hungry (switching on all input combinations); the self-reset property increases power consumption; drives succeeding fast domino stages; precharge increases speed.
Semi-Dynamic Flip-Flop (F. Klass, VLSI Circuits’98): Hybrid combination used in UltraSPARC-III. Very fast circuit (188 ps Clk-Q delay in 0.25 µm technology, 1.6 V, 105 °C). Our simulations show: negative setup time; small penalty for embedded logic; relatively high internal power consumption and clock load.
Modified Sense Amplifier-Based Flip-Flop (Nikolic, Oklobdzija, Stojanovic, ISSCC 1999): The delay of each output is independent of the load on the other output. The delays of Q and Q̄ are symmetrical, as opposed to the NAND-based design. Convenient for dual-rail logic, and the driving strength for standard CMOS is effectively doubled. The SAFF presents a small clock load, a small setup time, and all the advantages of the original design. A tradeoff between speed and robustness to cross-talk is possible.
Modified Sense Amplifier-Based Flip-Flop (Nikolic, Oklobdzija, Stojanovic, ISSCC ’99): The first stage is an unchanged sense amplifier. The second stage is sized to provide maximum switching speed: driver transistors are large; keeper transistors are small and disengaged during transitions.
New Sense Amplifier-Based Flip-Flop (Nikolic, Oklobdzija, ESSCIRC’99): New pulse-generating stage. Inverters relocated to de-couple the gates of MN3, MN4. MN5, MN6 provide leakage current paths. Second stage is unchanged.
New Sense Amplifier-Based Flip-Flop (Nikolic, Oklobdzija, ESSCIRC’99): Falling-edge flip-flop. Output stage has identical topology.
Comparison with Other Flip-Flops (Nikolic, Oklobdzija, ESSCIRC’99): Delay vs. power comparison of different flip-flops. Flip-flops are optimized for speed with output transistor sizes limited to 7.5 µm/4.3 µm, driving 200 fF. Total transistor gate width is indicated. Figure: total power [µW] (axis 10 to 70) vs. delay [ps] (axis 100 to 500). Labeled designs with total gate width: TG M-S 52 µm; Original SAFF 60 µm; HLFF 54 µm; this work 69 µm; C2MOS 80 µm; SDFF 49 µm.
Overall results
Comparison in terms of speed and PDPtot
Delay:
  below 200 ps: SDFF 187 ps; HLFF 199 ps; K-6 ETL 200 ps
  200-300 ps: PowerPC latch 266 ps; 21264 Alpha FF 272 ps; Strong Arm FF 275 ps; mC2MOS latch 292 ps
  above 500 ps: SSTC latch 592 ps; DSTC latch 629 ps; SSTC* latch 898 ps; DSTC* latch 1060 ps
PDPtot:
  below 30 fJ: PowerPC latch 28 fJ
  30 - 50 fJ: HLFF 29 fJ; SDFF 39 fJ; mC2MOS latch 40 fJ; 21264 Alpha FF 43 fJ; Strong Arm FF 45 fJ
  50 - 70 fJ: K-6 ETL 70 fJ
  above 70 fJ: SSTC latch 95 fJ; DSTC latch 125 fJ
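The two rankings on this slide can be cross-checked with a short script that uses only the delay and PDPtot values listed above. Since PDPtot is defined in this tutorial as data-to-output delay times total power, dividing the two gives an implied total power; that derived figure is a back-of-the-envelope illustration, not a value reported on the slides. The SSTC* and DSTC* latches are left out because no PDPtot is listed for them.

```python
delay_ps = {
    "SDFF": 187, "HLFF": 199, "K-6 ETL": 200,
    "PowerPC latch": 266, "21264 Alpha FF": 272, "Strong Arm FF": 275,
    "mC2MOS latch": 292, "SSTC latch": 592, "DSTC latch": 629,
}
pdp_fj = {
    "PowerPC latch": 28, "HLFF": 29, "SDFF": 39, "mC2MOS latch": 40,
    "21264 Alpha FF": 43, "Strong Arm FF": 45, "K-6 ETL": 70,
    "SSTC latch": 95, "DSTC latch": 125,
}


def implied_power_uw(name):
    """Total power implied by PDPtot / delay. fJ / ps = mW, so scale by 1000 for uW."""
    return pdp_fj[name] / delay_ps[name] * 1000.0


by_delay = sorted(delay_ps, key=delay_ps.get)
by_pdp = sorted(pdp_fj, key=pdp_fj.get)

print(f"{'design':<16}{'delay [ps]':>12}{'PDPtot [fJ]':>13}{'implied P [uW]':>16}")
for name in by_delay:
    print(f"{name:<16}{delay_ps[name]:>12}{pdp_fj[name]:>13}{implied_power_uw(name):>16.0f}")

print("fastest:", by_delay[0], "| lowest PDPtot:", by_pdp[0])
```

The fastest design and the design with the lowest PDPtot are not the same one, which is the power-delay tradeoff the comparison is meant to expose.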
Delay comparison: F-F design brings the fastest structures
Overall ranking, zoomed: Real signals have activity between 0 and 0.25. Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”. More “ones” above the point.
Overall performance: Real signals have activity between 0 and 0.5. Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”. More “ones” above the point.
Conventional Clk-Q vs. minimum D-Q: Hidden positive setup time; degradation of Clk-Q.
Internal Power distribution: Four sequences characterize the boundaries for internal power consumption: …010101… (maximum); random, equal transition probability (average); …111111… (precharge activity); …000000… (leakage + internal clock processing).
Comparison of Clock power consumption
Conclusion and New Directions
New Directions: Reducing CSE power: using conditional precharge techniques; using conditional data capture techniques. Reducing clock distribution network power: capture data on each edge (double-edge-triggered structure). Improving CSE reliability: fully derived CSE (ESSCIRC’99, ICCD 2000).
Conditional Precharge Flip-Flop (Nedovic, Oklobdzija, SBCCI 2000): The circuit of the proposed flip-flop is shown. The first stage employs feedback from the output to disable the precharge and keep the internal node at the low level if Q is high <Mn4, Mp2>. The second stage implements a conditional keeping function <Mn8, Mp3, Mp4>.
Conditional Capture Flip-Flop (Im-CCFF: Nedovic, Oklobdzija, ICECS 2001): Uses the conditional capture idea. When Q=1, the 1=>0 transition of X is prohibited. To equalize the 1=>0 and 0=>1 setup times, the signal from the middle of the stack (Y) controls the HL transition on Q. Y is the output of the first stage of a domino-like inverter, obtained almost for free. Easy logic embedding. The first stage has dynamic behavior only in the transparency window. Improved Conditional Capture Flip-Flop: The first stage computes nodes X and Y. If CLK=1, D=1, and CLKbb=Q=0 (i.e. if D=1, Q=0 in the transparency window), X evaluates to 0. The lower part of the stack is used for Y: Y=not(D) if the clock is at the high level (CLK=1). X is the ‘conditional-capture signal’, with activity equal to the activity of D. Y has larger activity. The second stage uses both X and Y: if X=0 (i.e. D=1, Q=0 in the transparency window), Q is brought to the high level. If Y=1 when CLKbb=1 (i.e. D=0 in the transparency window), Q is brought to 0. CLKbb is used in the second stage instead of CLK to leave time for Y to evaluate to 0 and to remove the hazard in the second stage.
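The effect of the capture condition on internal node activity can be seen in a heavily simplified cycle-level model. The sketch below collapses the transparency window into one step per clock cycle and ignores node Y and the CLKbb timing entirely; it only counts how often the internal node X would discharge. The unconditional baseline is a hypothetical precharged stage that evaluates whenever D=1, included for contrast.

```python
def internal_discharges(data, conditional, q0=0):
    """Count cycles in which internal node X is pulled low.

    conditional=True : X discharges only if D=1 and Q=0 (capture condition).
    conditional=False: X discharges whenever D=1 (assumed precharged baseline).
    """
    q, count = q0, 0
    for d in data:
        if conditional:
            fires = d == 1 and q == 0
        else:
            fires = d == 1
        count += int(fires)
        q = d  # in both cases the output ends the cycle equal to D
    return count


all_ones = [1] * 8
alternating = [0, 1] * 4

for name, seq in (("...1111...", all_ones), ("...0101...", alternating)):
    print(name,
          "| unconditional:", internal_discharges(seq, conditional=False),
          "| conditional:", internal_discharges(seq, conditional=True))
```

For a constant-one input the conditional version discharges X once and then stays quiet, while the baseline discharges every cycle; for the alternating input the two counts are equal. In this simplified model the internal activity therefore tracks transitions of D rather than the probability of ones.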
Power Consumption Comparison: Im-CCFF (Nedovic, Oklobdzija, ICECS 2001); SBCCI 2000. NOTE: Conditional flip-flops behave like M-S latches with respect to input data activity.
Dual-Edge Triggered Flip-Flops: Structurally, two different designs are distinguished: a) Latch-Mux (LM); b) Pulsed Latch (PL, flip-flop). The classification is very similar to that of single-edge-triggered storage elements (SE).
DETSE Overall Results
Summary: Double-Edge Flip-Flops. Figure axes: PDP [fJ]; PD2P [10^-24 Js]. Conditions: Fujitsu 0.18 µm, wmin = 0.22 µm, wmax = 10 µm, le = 0.18 µm, fclk = 250/500 MHz, activity = 0.5, VDD = 1.8 V, Temp = 25 °C, load = 14 min. inverters. Even the ‘local’ performance of DETFFs (not considering the power savings in clock distribution) is comparable to that of SETFFs. The behavior of double-edge flip-flops is analogous to that of their single-edge counterparts.
SDFF improvement (Nedovic, Oklobdzija, ICCD 2000): Eliminated glitch; avoided keeper overpowering; faster operation; improved power; PDP improvement over SDFF of about 27% (first version: only 8% improvement); preserved logic embedding property; achieved strong driving capability at the output; more robust to scaling down the supply voltage. Conditions: 0.25 µm bulk CMOS, VDD = 2.5 V, T = 27 °C, fclk = 500 MHz, load = 14 min. inverters.
What to Expect in the Future? Important: incorporating logic into the CSE; absorbing clock skew; quiet state (battery-powered applications). Pipeline boundaries will start to blur: CSEs will be mixed with logic; wave pipelining, domino style, signals used to clock. Synchronous design only in a limited domain: asynchronous communication between synchronous domains.
Modified Test Bench and PD2P Optimization
PDP, EDP Comparison: SDFF is best; PowerPC and SAFF are competitive
50% data activity, 1 GHz clock, PD2P optimization; VDD = 1.8 V, 0.18 µm CMOS technology