
Digital System Clocking: Storage Elements in High-Performance and Low-Power Systems
ISSCC 2002 uP Workshop
Vojin G. Oklobdzija, University of California Davis; Integration Corp., Berkeley, CA 94708, http://www.integration-corp.com
The final version of this presentation is available at http://www.ece.ucdavis.edu/acsel under Presentations. Also look for a book under the same title in Summer 2002.

Outline
- Why work on clocked storage elements?
- An M-S latch is not a flip-flop!
- How do we compare them?
- What are the relevant parameters?
- What is an appropriate setup?
- What do we use in high-performance microprocessors?
- How do they compare?
- What should we do for low power?
- What next? Ideas, suggestions, insights
9/16/2018 Prof. V.G. Oklobdzija, University of California

Importance

Trends in high-performance systems: Higher clock frequency (ISSCC 2002)

Courtesy: Doug Carmean, Hot-Chips-13 presentation

Processor Frequency Trend (courtesy of Intel, S. Borkar)
- Frequency doubles each generation
- The number of gates per clock is reduced by 25%

Why work on clocked storage elements?
Example: in a 2.0 GHz processor, T = 500 ps.
- Typically, the D-Q delay of a clocked storage element (CSE) is on the order of 100-150 ps.
- If one can design a faster CSE, e.g. with 80-100 ps D-Q, this represents a 10-15% performance improvement.
- If, in addition, one can absorb 20 ps of clock uncertainties and embed one level of logic, this can yield up to a 20% performance improvement.
- Try to achieve a 10-20% performance improvement by introducing new features in the architecture!
This is sufficient to turn an architect into a circuit designer!
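The arithmetic in this example can be checked with a short sketch. The function name, the pairing of "old" and "new" D-Q delays, and the assumption that every saved picosecond comes straight off the cycle time are illustrative choices, not part of the presentation.

```python
def speedup(period_ps: float, saved_ps: float) -> float:
    """Fractional clock-frequency gain when saved_ps is removed from the cycle."""
    if not 0 <= saved_ps < period_ps:
        raise ValueError("saved time must be non-negative and shorter than the period")
    return period_ps / (period_ps - saved_ps) - 1.0


T_PS = 500.0  # clock period of a 2.0 GHz processor (from the slide)

# Assumed pairing: a 150 ps CSE replaced by a 100 ps or an 80 ps design.
for old_dq, new_dq in [(150.0, 100.0), (150.0, 80.0)]:
    gain = speedup(T_PS, old_dq - new_dq)
    print(f"D-Q {old_dq:.0f} ps -> {new_dq:.0f} ps: {gain:.1%} higher clock frequency")
```

Under these assumptions the gain lands in the same range as the figures quoted on the slide; a different pairing of old and new delays would give a smaller number.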

Basic Definitions

Clock Generation and Distribution Non-idealities
Jitter
- A temporal variation of the clock signal, manifested as uncertainty of consecutive edges of a periodic clock signal
- Caused by temporal noise events
- Manifested as cycle-to-cycle (short-term) jitter, tJS, and long-term jitter, tJL
- Characteristic of the clock generation system
Skew
- A time difference between temporally equivalent or concurrent edges of two periodic signals
- Caused by spatial variations in signal propagation
- Manifested as SE-to-SE fluctuation of the clock arrival at the same time instance
- Characteristic of the clock distribution system
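One common way to quantify these two non-idealities from measured edge times is sketched below. The function names and the edge timestamps are hypothetical; the slide does not prescribe a particular measurement procedure.

```python
from typing import Sequence


def cycle_to_cycle_jitter(edges_ps: Sequence[float]) -> float:
    """Largest change between consecutive clock periods (temporal effect), in ps."""
    periods = [b - a for a, b in zip(edges_ps, edges_ps[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three edges")
    return max(abs(q - p) for p, q in zip(periods, periods[1:]))


def skew(arrival_a_ps: float, arrival_b_ps: float) -> float:
    """Arrival-time difference of the same clock edge at two storage elements."""
    return arrival_b_ps - arrival_a_ps


# Hypothetical rising-edge timestamps of a nominal 500 ps clock at one SE.
edges = [0.0, 500.0, 1004.0, 1499.0, 2000.0]
print("cycle-to-cycle jitter [ps]:", cycle_to_cycle_jitter(edges))

# Hypothetical arrival times of one edge at two different SEs (spatial effect).
print("skew [ps]:", skew(1000.0, 1012.0))
```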

Clock Skew and Jitter

Difference between Latch and Flip-Flop

Difference between Latch and Flip-Flop
- Flip-flop: after the transition of the clock, the data cannot change
- Latch: the latch is “transparent”
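The distinction can be illustrated with a small behavioral model. This is a sketch only: the class names, the discrete time steps, and the stimulus are assumptions, and no circuit-level timing is modeled.

```python
class Latch:
    """Level-sensitive: the output follows D while the clock is high."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            self.q = d  # transparent phase
        return self.q


class FlipFlop:
    """Edge-triggered: D is captured only on a 0 -> 1 clock transition."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if clk and not self._prev_clk:
            self.q = d  # capture on the rising edge only
        self._prev_clk = clk
        return self.q


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 0, 0, 1]

latch, ff = Latch(), FlipFlop()
latch_out = [latch.step(c, x) for c, x in zip(clk, d)]
ff_out = [ff.step(c, x) for c, x in zip(clk, d)]
print("latch    :", latch_out)
print("flip-flop:", ff_out)
```

While the clock stays high, changes on D reach the latch output but not the flip-flop output.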

Two-Phase Clocking with Two-Phase Double Latch

Two-Phase Clocking with One-Phase Double Latch
Some people refer to this latch arrangement as a “negative-edge flip-flop”!

Flip-Flop and M-S Latch Arrangement
How can one recognize the difference without knowing what is inside the “black box”?

F-F and M-S Latch: Difference
Experiment: Failed!

F-F and M-S Latch: Difference
Structural difference [figure labels: No Clock, Flip-Flop, M-S Latch, Pulse, Capturing Latch, S, R]

PG Theory of Operation: Sn+1

Flip-Flop: Example-2
SAFF, DEC Alpha 21264 (Madden & Bowhill, 1990; Matsui, 1994) [figure labels: S, R, pulse; D=0, D=1]

F-F Derivation using Delayed Clock
Equivalent to:

Systematically Derived ET FF
N. Nedovic, V. G. Oklobdzija, “Dynamic Flip-Flop with Improved Power”, ICCD 2000, Sept. 2000

Flip-Flop: Example (HLFF, H. Partovi)

Flip-Flop: Example (HLFF, H. Partovi)

Timing and Power metrics

Delay
The sum of the setup time U and the Clk-Q delay is the only true measure of performance with respect to system speed:
T = TClk-Q + TLogic + TSetup + TSkew
TD-Q = TClk-Q + TSetup
[Figure: clock period T divided into TClk-Q, TLogic, and TSetup]
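Rearranging the cycle-time relation gives the time left for combinational logic. The sketch below does that rearrangement; the function name and the sample numbers are hypothetical.

```python
def logic_budget_ps(period_ps: float, clk_q_ps: float,
                    setup_ps: float, skew_ps: float) -> float:
    """Time left for logic: TLogic = T - TClk-Q - TSetup - TSkew."""
    budget = period_ps - clk_q_ps - setup_ps - skew_ps
    if budget < 0:
        raise ValueError("storage element and skew overhead exceed the clock period")
    return budget


# Hypothetical numbers: 500 ps period, 100 ps Clk-Q, 20 ps setup, 20 ps skew.
print(logic_budget_ps(500.0, 100.0, 20.0, 20.0), "ps available for logic")
```

Because TClk-Q and TSetup only ever appear as a sum, the storage element contributes to the budget through TD-Q alone.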

Delay vs. Setup/Hold Times
Sampling Window

Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of the clock-to-output and data-to-clock times, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a slope of 45°. The data-to-clock time that corresponds to this point is termed the optimal setup time.
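Given a sampled clock-to-output curve, the optimal setup time can be located numerically as the minimum of the data-to-output sum. The sketch below assumes a made-up characteristic; the sample points are not taken from the presentation.

```python
from typing import Sequence, Tuple


def optimal_setup(d_to_clk_ps: Sequence[float],
                  clk_q_ps: Sequence[float]) -> Tuple[float, float]:
    """Return (data-to-clock time, D-Q delay) at the minimum of D-Q."""
    if len(d_to_clk_ps) != len(clk_q_ps) or not d_to_clk_ps:
        raise ValueError("need two non-empty sequences of equal length")
    # D-Q is the simple sum of the data-to-clock time and the Clk-Q delay.
    points = [(s, s + cq) for s, cq in zip(d_to_clk_ps, clk_q_ps)]
    return min(points, key=lambda p: p[1])


# Hypothetical sweep: data moves closer to the clock edge, Clk-Q starts to rise.
d_to_clk = [100, 80, 60, 40, 20, 10, 0]
clk_q = [120, 120, 120, 122, 135, 155, 200]

setup, d_q = optimal_setup(d_to_clk, clk_q)
print(f"optimal setup time: {setup} ps, minimum D-Q: {d_q} ps")
```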

Absorbing Clock Uncertainties

Hybrid Latch Flip-Flop: skew absorption
Partovi et al., ISSCC’96

Power Consumption
All power related to the SE can be divided into:
- Input power: data power (PD) and clock power (PCLK)
- Internal power (PINT)
- Load power (PLOAD); PLOAD can be merged into PINT
Internal power is a function of the data activity ratio (α): the number of captured data transitions with respect to the number of clock transitions (max = 100%)
- no activity (0000… and 1111…)
- maximum activity (0101010…)
- average activity (random sequence)
Glitching activity
Delay is the minimum D-Q (Clk-Q + setup time)
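The data activity ratio for a given bit stream can be computed as sketched below. The function name is hypothetical, and the sketch assumes one captured sample per clock cycle, so that an alternating sequence gives 100%.

```python
from typing import Sequence


def activity_ratio(bits: Sequence[int]) -> float:
    """Captured data transitions per clock cycle (one sample per cycle assumed)."""
    if len(bits) < 2:
        raise ValueError("need at least two samples")
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)


print(activity_ratio([0, 0, 0, 0, 0]))  # no activity
print(activity_ratio([0, 1, 0, 1, 0]))  # maximum activity
print(activity_ratio([0, 0, 1, 1, 0]))  # an intermediate case
```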

State Element Performance Metrics
It is always possible to trade power for speed.
Common metrics:
- Power-Delay Product (PDP): a misleading measure; good only if measured at constant frequency, where it is equivalent to EDP
- Energy-Delay Product (EDP): a more accurate measure
- Energy-Delay^2 Product (ED2P): a new measure, being justified by new results (Hofstee, Nowka, IBM)
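A minimal sketch of how these metrics relate, using the transmission-gate master-slave latch figures quoted later in the presentation (PTOT = 123.2 µW, tD = 300 ps, f = 500 MHz). The function name and the choice of units are assumptions. Note that the last column of the result tables matches PTOT × tD expressed in fJ.

```python
def metrics(power_uw: float, delay_ps: float, freq_mhz: float) -> dict:
    """Return PDP, energy per cycle, EDP and ED2P for one storage element."""
    pdp_fj = power_uw * delay_ps * 1e-3    # uW * ps = aJ; 1e-3 converts to fJ
    energy_fj = power_uw / freq_mhz * 1e3  # uW / MHz = pJ; 1e3 converts to fJ
    return {
        "pdp_fj": pdp_fj,
        "energy_fj": energy_fj,
        "edp_fj_ps": energy_fj * delay_ps,
        "ed2p_fj_ps2": energy_fj * delay_ps ** 2,
    }


m = metrics(power_uw=123.2, delay_ps=300.0, freq_mhz=500.0)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

At a fixed clock frequency the energy per cycle is just power divided by a constant, so ranking designs by PDP or by EDP gives the same order.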

PDP, EDP Comparison [figure labels: High Voltage, Low Voltage, Slow Corner]

Design & optimization tradeoffs
- Opposite goals: minimal total power consumption, minimal delay
- Power-delay tradeoff: minimize the power-delay product (PDPtot) at f = const.

Clocked Storage Elements: Examples

Simulation Conditions:
- Power supply voltage: VDD = 1.8 V nominal
- Temperature: T = 27 °C nominal
- Technology: 0.18 µm Fujitsu; fan-out-of-4 (FO4) delay = 75 ps
- Transistor widths: minimal 0.36 µm, maximal 10 µm
- Load: 14 minimal inverters in the technology used
- Clock frequency: 500 MHz (250 MHz for dual-edge)
- Data/clock slopes of the ideal signal: 100 ps
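The braces in the delay column of the result tables give the delay normalized to the 75 ps FO4 delay stated here. A small sketch of that normalization follows; the function name is hypothetical, and the sample delays are values that appear in the tables of this presentation.

```python
FO4_PS = 75.0  # fan-out-of-4 inverter delay in the 0.18 um technology (from the slide)


def to_fo4(delay_ps: float) -> float:
    """Express a delay in units of FO4 inverter delays."""
    return delay_ps / FO4_PS


for delay in (300.0, 354.0, 323.0, 169.0):
    print(f"{delay:.0f} ps = {to_fo4(delay):.2f} FO4")
```

Normalizing to FO4 makes the delays roughly comparable across technology nodes.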

Transmission Gate MS Latch
PowerPC 603 (Gerosa, JSSC 12/94)
- Two staticized transmission-gate transparent latches
- The direct D-Q path consists of two transmission gates and two regenerative inverters
- Two-phase clock
- Advantage: symmetric high-to-low and low-to-high transitions are achievable
- Disadvantage: large cost associated with two-phase clock distribution
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
300 {4} | 80.0 | 32.1 | 11.1 | 123.2 | 36.9
Comments: Very low internal power. Large total power due to clock and data load.

C2MOS MS Latch
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973
- The forward path consists of two clocked inverters, parts of the C2MOS latches
- Degradation of speed due to pMOS stacks
- Degradation of speed due to the non-ideal two-phase clock
- Large clock power (if not buffered locally)
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
354 {4.7} | 110.8 | 27.5 | 2.8 | 141.1 | 49.9

SAFF: StrongARM 110
- Staticized sense-amplifier flip-flop
- A weak nMOS keeps the set/reset signals low
- Second stage: non-clocked SR latch
- The additional nMOS transistor causes slightly increased power consumption and delay degradation
- Bad timing characteristics due to the latching stage: the signal propagates through three stages
- Unbalanced rise and fall times of the output signals (speed degraded by 40%)
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | PDP [fJ]
323 {4.31} | 79.7 | 4.2 | 0.5 | 84.8 | 27.4

Modified SAFF
Nikolic, Oklobdzija, Stojanovic, ISSCC’99; V. Stojanovic, US Patent No. 6,232,810
- The first stage is the unchanged sense amplifier
- The second stage is sized to provide maximum switching speed
- Driver transistors are large
- Keeper transistors are small and disengaged during transitions

Systematically Derived SAFF: Example-2
Nikolic, Oklobdzija, ESSCIRC’99; V. Stojanovic, US Patent No. 6,232,810
- New pulse-generating stage
- Inverters decoupling gates from MN3, MN4
- MN5, MN6 provide leakage current paths
- Second stage is unchanged

Sense Amplifier-based Flip-Flop (SAbFF)
V. Stojanovic, US Patent No. 6,232,810
- Emerged as a workaround for SAFF drawbacks:
  - floating nodes (keeping the Sb, Rb nodes low with additional transistors parallel to the data-controlled transistors)
  - symmetric second stage (push-pull realization)
- Internal signals still experience a transition on every clock cycle
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
169 {2.25} | 100.8 | 5.8 | 1.3 | 107.9 | 18.2

Comparison with other SAFFs (Nikolic, Oklobdzija, ESSCIRC’99)
[Figure: Clk-Output delay [ps] versus load [fF], rising and falling edges, for SAFF w/NOR, SAFF w/NAND, SAFF, and this work. Conditions: CMOS, nominal corner, Leff = 0.18 µm, VDD = 1.8 V, T = 25 °C, load on both outputs.]

Conditional Capture Flip-Flop (CCFF)
B. S. Kong et al., ISSCC 2000
Conditions: 0.18 µm Fujitsu; f = 500 MHz; VDD = 1.8 V; data activity 50%
Principle of operation:
- Suppress any transition in the flip-flop if the input to be captured is equal to the previous output value
- Double-ended realization
- FF functionality achieved by producing a clock pulse
- Static operation by use of keepers
- Second stage is a pass-transistor latch
Comments:
- Contention with keepers causes a larger first stage
- Large power consumption despite conditional signaling
tD [ps] | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
169 | 112.5 | 17.0 | 2.6 | 132.1 | 22.3
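The principle of operation can be illustrated with a toy activity count. This is not a circuit simulation: the function name, the initial output value, and the assumption that an unconditional precharged first stage evaluates in every cycle with D = 1 are all simplifications introduced for this sketch.

```python
from typing import Sequence


def first_stage_evaluations(data: Sequence[int], conditional: bool) -> int:
    """Count cycles in which the first stage makes an internal transition."""
    q = 0  # assumed initial output value
    count = 0
    for d in data:
        if conditional:
            fires = d != q  # capture only when the input differs from Q
        else:
            fires = d == 1  # assumed unconditional precharge-evaluate behavior
        count += int(fires)
        q = d  # the output takes the captured value either way
    return count


stream = [1, 1, 1, 1, 0, 0, 1, 1]
print("unconditional:", first_stage_evaluations(stream, conditional=False))
print("conditional  :", first_stage_evaluations(stream, conditional=True))
```

In this model the conditional version's internal activity tracks the number of data transitions rather than the number of cycles spent at one logic level.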

Partovi’s HLFF
AMD K-6, Partovi, ISSCC’96
- Hybrid Latch-Flip-Flop combination
- Negative set-up time of -80 ps
- Robustness to clock skew and fast clocking
Our simulations show:
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
188 {2.51} | 161.3 | 18.0 | 4.4 | 183.8 | 34.5
Gains:
- speed (negative setup time)
- robustness to clock skew
Drawbacks:
- sensitivity to clock slope
- relatively high internal power (due to precharge)

Semi-Dynamic Flip-Flop
F. Klass, VLSI Circuits’98
- Hybrid combination used in UltraSPARC-III
- Very fast circuit (173 ps Clk-Q delay; 0.18 µm technology, 1.8 V, 27 °C)
- Problem: D=Q=1
- Negative setup time
- Small penalty for embedded logic
- Relatively high internal power consumption and clock load
Our simulations show:
tD [ps] | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
169 | 188.6 | 34.1 | 2.7 | 224.9 | 38.1

Transmission Gate Flip-Flop (TGFF)
- Two transmission gates define the transparency window
- Time window with a non-precharge-evaluate structure
- Low input activity => low output activity
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
292 {3.89} | 110.5 | 8.7 | 9.3 | 128.5 | 37.5
Comments: The two transmission gates increase delay. Noticeable data power.

Comparison

Overall Results [figure labels: 4 fo4, 2 fo4]

Overall Results

Overall Results

Conventional Clk-Q vs. minimum D-Q
- Hidden positive setup time
- Degradation of Clk-Q
(Older 0.22 µm comparison results)

Internal Power distribution
Four sequences characterize the boundaries for internal power consumption:
- …010101…: maximum
- random, equal transition probability: average
- …111111…: precharge activity
- …000000…: leakage + internal clock processing
(Older 0.22 µm comparison results)

Comparison of Clock power consumption
(Older 0.22 µm comparison results)

Design for Low-Power

Conditional Pre-charge / Capture Techniques

Conditional Capture Flip-Flop (Im-CCFF: Nedovic, Oklobdzija, ICECS 2001)
- Uses the conditional capture idea: when Q=1, the 1=>0 transition of X is prohibited
- To equalize the 1=>0 and 0=>1 set-up times, the signal from the middle of the stack (Y) controls the HL transition on Q
- Y is the output of the first stage of a domino-like inverter, obtained almost for free
- Easy logic embedding
- The first stage has dynamic behavior only in the transparency window
Improved Conditional Capture Flip-Flop:
- The first stage computes nodes X and Y. If CLK=1, D=1, and CLKbb=Q=0 (i.e. if D=1, Q=0 in the transparency window), X evaluates to 0.
- The lower part of the stack is used for Y: Y=not(D) if the clock is at high level (CLK=1).
- X is the ‘conditional-capture signal’, with activity equal to the activity of D. Y has larger activity.
- The second stage uses both X and Y. If X=0 (i.e. D=1, Q=0 in the transparency window), Q is brought to high level. If Y=1 when CLKbb=1 (i.e. D=0 in the transparency window), Q is brought to 0.
- CLKbb is used in the second stage instead of CLK to leave time for Y to evaluate to 0 and to remove the hazard in the second stage.
tD [ps] {fo4} | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/500MHz]
257 {3.43} | 110.8 | 10.2 | 0.7 | 121.7 | 31.3

Power Consumption Comparison vs. Data Activity
Nedovic, Oklobdzija: SBCCI 2000, ICECS 2001
NOTE: Conditional flip-flops behave like MS latches with respect to input data activity.

Dual-Edge-Triggered Clocked Storage Elements DET-CSE

Dual-Edge Triggered CSE
- A Dual-Edge Triggered Clocked Storage Element (DET-CSE) samples the input data on both edges of the clock
- Useful for reducing overall power consumption
- Uses half of the original clock frequency for the same data throughput
- Roughly half of the clock generation, distribution, and SE clock-related power is saved
- However, an overhead of a more complex design may exist
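The throughput argument can be checked with a small sketch that lists capture instants for a single-edge element at 500 MHz and a dual-edge element at 250 MHz. The function names and the 50% duty cycle are assumptions made for illustration.

```python
from typing import List


def capture_times(period_ps: float, n_cycles: int, dual_edge: bool) -> List[float]:
    """Instants at which the storage element samples its input."""
    times = []
    for k in range(n_cycles):
        start = k * period_ps
        times.append(start)  # rising edge
        if dual_edge:
            times.append(start + period_ps / 2)  # falling edge, 50% duty cycle assumed
    return times


def clock_transitions(n_cycles: int) -> int:
    """Each clock cycle contains one rising and one falling transition."""
    return 2 * n_cycles


single = capture_times(2000.0, 4, dual_edge=False)  # 500 MHz, 4 cycles
dual = capture_times(4000.0, 2, dual_edge=True)     # 250 MHz, 2 cycles

print("same capture instants:", single == dual)
print("clock transitions:", clock_transitions(4), "vs", clock_transitions(2))
```

The same data rate is sustained with half as many clock transitions, which is where the clock-related power saving comes from.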

Dual-Edge Triggered Storage Elements
- Structurally, two different designs are distinguished: a) Latch-Mux (LM), b) Flip-Flop (FF)
- The classification is very similar to that of single-edge-triggered SEs
- Non-transparency is achieved by the MUX
[Figure labels: DET-Latch, DET-Flip-Flop]
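A behavioral sketch of the latch-mux arrangement: two latches transparent on opposite clock levels, with a multiplexer that always selects the latch currently holding its value. The class name, the discrete time steps, and the stimulus are hypothetical.

```python
class LatchMuxDET:
    """Dual-edge-triggered element built from two latches and a multiplexer."""

    def __init__(self) -> None:
        self.high_latch = 0  # transparent while clk = 1
        self.low_latch = 0   # transparent while clk = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            self.high_latch = d
            return self.low_latch  # mux selects the latch that is holding
        self.low_latch = d
        return self.high_latch


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 0, 0, 1, 1, 0, 0, 1]

det = LatchMuxDET()
out = [det.step(c, x) for c, x in zip(clk, d)]
print(out)
```

The output only changes at clock edges, even though each latch on its own is transparent for half of the cycle.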

Transmission Gate Latch-MUX (TGLM)
Conditions: 0.18 µm Fujitsu; f = 250 MHz (500 MHz for single-edge); VDD = 1.8 V; data activity 50%
- Dual-edge counterpart of the PowerPC MS latch
- Mux implemented in pass-transistor manner
- Smaller delay compared to the single-edge TGMS latch
- The original design has a single-phase input clock; the second phase is generated locally
- Better global power savings
- Degradation in speed
   | tD [ps] | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/250MHz]
DE | 322 | 83.3 | 20.5 | 7.8 | 111.6 | 35.9
SE | 300 | 80.0 | 32.1 | 11.1 | 123.2 | 36.9

C2MOS Latch-MUX
Conditions: 0.18 µm Fujitsu; f = 250 MHz (500 MHz for single-edge); VDD = 1.8 V; data activity 50%
- Latches: incomplete C2MOS with shared clocked transistors
- Mux: exactly one path is ‘ON’ at each moment
- Simple connection of the latch outputs (wired-OR mux) simplifies the design and saves performance
   | tD [ps] | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/250MHz]
DE | 268 | 122.9 | 27.3 | 8.1 | 158.3 | 42.5
SE | 354 | 110.8 | 27.5 | 2.8 | 141.1 | 49.9

TG Flip-Flop
Conditions: 0.18 µm Fujitsu; f = 250 MHz (500 MHz for single-edge); VDD = 1.8 V; data activity 50%
- Simple design with 2 pass-transistor TGs and a buffering inverter
- The original design is semi-dynamic (only the high level is kept)
- Only n-MOS pass gates
- Degraded performance: two TGs impair the driving capability
   | tD [ps] | PI [µW] | PCLK [µW] | PD [µW] | PTOT [µW] | EDP [fJ/250MHz]
DE | 374 | 104.7 | 7.8 | 13.2 | 125.7 | 47.1
SE | 292 | 110.5 | 8.7 | 9.3 | 128.5 | 37.5
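The dual-edge (DE) and single-edge (SE) figures quoted for the three designs above can be compared with a short sketch. The numbers are copied from the presentation; the helper name and the dictionary layout are illustrative.

```python
def rel_change(dual: float, single: float) -> float:
    """Relative change of the dual-edge value with respect to the single-edge one."""
    return (dual - single) / single


# (tD [ps], PCLK [uW], PTOT [uW]) for the DE and SE versions of each design.
pairs = {
    "TG latch-mux": ((322.0, 20.5, 111.6), (300.0, 32.1, 123.2)),
    "C2MOS latch-mux": ((268.0, 27.3, 158.3), (354.0, 27.5, 141.1)),
    "TG flip-flop": ((374.0, 7.8, 125.7), (292.0, 8.7, 128.5)),
}

for name, (de, se) in pairs.items():
    delay, pclk, ptot = (rel_change(a, b) for a, b in zip(de, se))
    print(f"{name}: delay {delay:+.1%}, clock power {pclk:+.1%}, total power {ptot:+.1%}")
```

These are local storage-element figures only; the additional saving in clock generation and distribution from halving the clock frequency is not included in them.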

Comparison with Single Edge SEs

Comparison with Single Edge SEs

Single and Double Edge Triggered SE: Power Consumption (α = 50%)

DET-CSE: Power vs. Delay [figure label: Fo4=2.9]

Overall Comparisons

Overall Results - EDP

Overall Results - Delay [figure label: Fo4=4]

New Structures: Clock Power Consumption

EDP vs. Data Activity

Power vs. Delay (Single-Ended)

Power vs. Delay (Differential) [figure labels: Our Designs, Fo4=2.07]

Power vs. Delay (Differential) [figure label: Fo4=2.07]

Dual-Edge vs. Conditional vs. Conventional

Dual-Edge vs. Conditional vs. Conventional [figure labels: Fo4=4, Fo4=2]

Dual-Edge vs. Conditional vs. Conventional

Dual-Edge vs. Conditional vs. Conventional

What to do and what to expect?
Important:
- Incorporating logic into the CSE
- Absorbing clock skew
- Quiet state (battery-powered applications)
Pipeline boundaries will start to blur:
- CSEs will be mixed with logic
- Wave pipelining, domino style, signals used to clock
Synchronous design only in a limited domain:
- Asynchronous communication between synchronous domains