Clocking in High-Performance and Low-Power Systems Presentation given at: EPFL Lausanne, Switzerland June 23rd, 2003 Vojin G. Oklobdzija Advanced.

Presentation transcript:

Clocking in High-Performance and Low-Power Systems
Presentation given at: EPFL Lausanne, Switzerland, June 23rd, 2003
Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California, Davis
Presentation available at: http://www.ece.ucdavis.edu/acsel

Future Directions
- Synchronous / Asynchronous paradigm
- Synchronous solutions:
  - Clock uncertainty absorption
  - Time borrowing
  - Skew-Tolerant Domino
  - Clocking with signals
  - Using both edges of the clock
- Conclusion
11/27/2018 Prof. V.G. Oklobdzija, University of California

Multi-GHz Clocking Problems
- Less logic in between pipeline stages:
  - Out of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4
  - Clock uncertainty can take another FO4
  - The total could be ½ of the time allowed for computation
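The budget on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the one-FO4 clock uncertainty is taken from the slide's "another FO4".

```python
def overhead_fraction(cycle_fo4, ff_fo4, uncertainty_fo4=1.0):
    """Fraction of the cycle consumed by the flip-flop plus clock uncertainty."""
    return (ff_fo4 + uncertainty_fo4) / cycle_fo4


# Sweep the ranges quoted on the slide: 7-10 FO4 per cycle, 2-4 FO4 per FF.
for cycle in (7, 10):
    for ff in (2, 4):
        frac = overhead_fraction(cycle, ff)
        print(f"cycle = {cycle} FO4, FF = {ff} FO4: overhead = {frac:.0%}")
```

With a 10 FO4 cycle and a 4 FO4 flip-flop the overhead reaches one half of the cycle, which matches the slide's last point; shorter cycles make it worse.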

Consequences of multi-GHz Clocks
- Pipeline boundaries start to blur
- Clocked Storage Elements must include logic
- Wave pipelining, domino style, signals used to clock …
- Synchronous design only in a limited domain
- Asynchronous communication between synchronous domains

Synchronous / Asynchronous Design on the Chip
- 1 billion transistors on the chip by 2005-6
- A 64-b, 4-way issue logic core requires ~4 million

Synchronous / Asynchronous Design on the Chip
[Figure labels: 10 million transistors; 1 billion transistor chip]

Two views of the world:
- Asynchronous
- Synchronous

Asynchronous Paradigm
- Logic stage can take any time it needs
- Max. speed limited by handshake overhead
- Increased complexity of logic (de-glitching)

Synchronous Paradigm
- Max speed determined by the slowest logic block
- Latch / FF timing overhead
- Fixed clock frequency (set by longest path)
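The contrast between the two paradigms can be made concrete with a deliberately simplified model of a single stage whose delay depends on the data. Everything here is an assumption for illustration: the function names, the delay values, and the idea that the asynchronous stage pays a fixed handshake cost per operation.

```python
def synchronous_total_ps(op_delays_ps, cse_overhead_ps):
    """Every operation is charged the worst-case delay plus latch/FF overhead."""
    period = max(op_delays_ps) + cse_overhead_ps
    return period * len(op_delays_ps)


def asynchronous_total_ps(op_delays_ps, handshake_ps):
    """Every operation takes its own delay plus a handshake."""
    return sum(delay + handshake_ps for delay in op_delays_ps)


# Hypothetical data-dependent delays of one logic stage, in picoseconds.
delays = [200, 250, 400, 220]
print("synchronous :", synchronous_total_ps(delays, cse_overhead_ps=80), "ps")
print("asynchronous:", asynchronous_total_ps(delays, handshake_ps=100), "ps")
```

In this toy model the asynchronous stage wins only while the handshake cost stays below the gap between typical and worst-case delay, which is the tradeoff the two slides describe.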

Synchronous Paradigm
- Clocked Storage Elements: flip-flops and latches should be viewed as synchronization elements, not merely as storage elements!
- Their main purpose is to synchronize fast and slow paths: prevent the fast path from corrupting the state
[Figure: fast path corrupting present state]

Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock

Clocked Storage Element Overhead
[Figure: logic placed between clocked storage elements, with the cycle T divided into T_{Clk-Q}, T_{Logic}, U and T_{skew}]
T_{D-Q} = T_{Clk-Q} + U
The time taken from the pipeline by the CSE is U plus the Clk-Q delay. Thus the D-Q delay is the relevant one, not Clk-Q:
T = T_{Clk-Q} + T_{Logic} + U + T_{skew}
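The cycle-time relation on this slide can be evaluated directly. The following is a minimal sketch; the function names and all numeric values are hypothetical and chosen only to give round numbers.

```python
def d_to_q_ps(t_clk_q, setup_u):
    """D-Q delay of the clocked storage element: T_{D-Q} = T_{Clk-Q} + U."""
    return t_clk_q + setup_u


def cycle_time_ps(t_clk_q, t_logic, setup_u, t_skew):
    """Cycle time: T = T_{Clk-Q} + T_{Logic} + U + T_{skew}."""
    return t_clk_q + t_logic + setup_u + t_skew


# Hypothetical values in picoseconds.
t = cycle_time_ps(t_clk_q=60, t_logic=150, setup_u=20, t_skew=20)
cse = d_to_q_ps(t_clk_q=60, setup_u=20)
print(f"T = {t} ps, f = {1000 / t:.1f} GHz, CSE share = {cse / t:.0%}")
```

Reducing Clk-Q alone does not help if the setup time U grows by the same amount, which is why the slide argues for D-Q as the figure of merit.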

Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output delay is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being the simple sum of the clock-to-output delay and the data-to-clock time, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease as the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a slope of 45°. The data-to-clock time that corresponds to this point is termed the optimal setup time.
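The 45° condition can be checked numerically. The sketch below assumes a purely hypothetical exponential model for the clock-to-output curve (the parameters t0, a and tau are invented, not measured); it searches for the data-to-clock time that minimizes data-to-output delay and then confirms that the clock-to-output slope there is about -1.

```python
import math


def clk_to_q(d_clk_ps, t0=100.0, a=200.0, tau=20.0):
    """Hypothetical Clk-Q delay: flat for early data, rising as data arrives late."""
    return t0 + a * math.exp(-d_clk_ps / tau)


def d_to_q(d_clk_ps):
    """D-Q delay is the sum of the data-to-clock time and the Clk-Q delay."""
    return d_clk_ps + clk_to_q(d_clk_ps)


def optimal_setup(lo=0.0, hi=200.0, step=0.01):
    """Grid search for the data-to-clock time that minimizes D-Q delay."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=d_to_q)


u_opt = optimal_setup()
# Central-difference slope of the Clk-Q curve at the optimum.
slope = (clk_to_q(u_opt + 0.01) - clk_to_q(u_opt - 0.01)) / 0.02
print(f"optimal setup = {u_opt:.2f} ps, Clk-Q slope there = {slope:.3f}")
```

For this model the minimum has a closed form, tau * ln(a / tau), so the grid search is only a stand-in for what would be done on simulated or measured curves.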

Clock Uncertainty Absorption

Clock Uncertainty Absorption
[Figure labels: worst-case D_{DQ}; nominal, early and late D_{D-Clk}; clock uncertainty t_{CU}; nominal Clk at T = 0; D, Q; D_{DQm} and D_{DQM}]

Clock Uncertainty Absorption
[Figure, two panels with Clk, D and Q waveforms:
(a) t_{CU} = 30 ps (α_{CU} = 90%): U_{Opt} = -5 ps, Q variation 3 ps, D_{DQM} = 220 ps
(b) t_{CU} = 100 ps (α_{CU} = 56%): U_{Opt} = 30 ps, Q variation 44 ps, D_{DQM} = 261 ps]
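The percentages on this slide are consistent with reading the absorption as the share of clock uncertainty that does not show up at the output. That definition is an assumption here, inferred from the numbers rather than stated on the slide, and the function name is hypothetical.

```python
def absorption(t_cu_ps, output_variation_ps):
    """Assumed definition: share of the clock uncertainty not propagated to Q."""
    return 1.0 - output_variation_ps / t_cu_ps


# Value pairs taken from the two panels of the slide.
for t_cu, variation in ((30.0, 3.0), (100.0, 44.0)):
    print(f"t_CU = {t_cu:.0f} ps: absorption = {absorption(t_cu, variation):.0%}")
```

Under this reading, a wider uncertainty window pushes the data arrival further into the rising part of the D-Q curve, so less of it is absorbed.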

Critical Path with Time Borrowing

Latches as synchronizers
- The purpose of a CSE is to synchronize data flow.
- We need to insert CSEs to prevent “fast paths” from reaching the next logic stage too early.
- If the signal arrives late, it is allowed to borrow time from the next stage.
- However, borrowing cannot go on forever …

Using Single Pulsed Latch

Single Pulsed Latch
*Courtesy of D. Markovic & Intel MRL

Optimal Single Latch Clocking
Single Latch System (Unger & Tan ‘83):
P_m = P ≥ D_{LM} + D_{DQM} {minimal clock period}
D_{Lm} > D_{LmB} ≥ W + T_T + T_L + H - D_{CQm} {shortest path}
W_opt = T_L + T_T + U + D_{CQM} - D_{DQM} {minimal clock width}
Example: 0.10 µm technology
FO4 = 25-40 ps, FF = 80 ps, T_unc = 25-35 ps, f_max = 2.5-4 GHz, T = 250-400 ps
W_opt ~ 2 T_unc ~ 50-70 ps
D_{Lm} ~ 4 T_unc + H - D_{CQm} ~ 100-140 ps {this is close to ½ of a cycle}
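The example figures on this slide can be reproduced with the two approximations it quotes. In this sketch the function name is hypothetical, H - D_{CQm} is assumed to be roughly zero (which is what the slide's 100-140 ps result implies), and pairing the smaller uncertainty with the faster clock is also an assumption.

```python
def single_latch_estimates(t_unc_ps, period_ps, hold_minus_cq_ps=0.0):
    """Approximate pulse width, minimum logic delay, and its share of the cycle."""
    w_opt = 2.0 * t_unc_ps                       # W_opt ~ 2 T_unc
    d_lm = 4.0 * t_unc_ps + hold_minus_cq_ps     # D_Lm ~ 4 T_unc + H - D_CQm
    return w_opt, d_lm, d_lm / period_ps


# Assumed pairing of the slide's ranges: (f_max in GHz, T_unc in ps).
for f_ghz, t_unc in ((4.0, 25.0), (2.5, 35.0)):
    period = 1000.0 / f_ghz
    w_opt, d_lm, share = single_latch_estimates(t_unc, period)
    print(f"T = {period:.0f} ps: W_opt ~ {w_opt:.0f} ps, "
          f"D_Lm ~ {d_lm:.0f} ps ({share:.0%} of the cycle)")
```

The point of the exercise is the last column: the shortest path must be padded to a large fraction of the cycle, which is the cost of single-latch clocking the slide highlights.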

Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing)
Intel Patent No. 5,517,136, May 14, 1996

CMOS Domino as Memory Element
- After the input changes, the output remembers it
- Pre-charge destroys the information
- Proper phasing of the clock can allow passing the information from stage to stage

Skew-Tolerant Domino

Data Used to Clock: Non-Clocked Dynamic Logic (NCD)
- Logic gate precharged by fast input, no clock
- Practical for AND logic function
- OR is also possible
Courtesy of N. Nedovic

Differential Non-Clocked Dynamic Logic
- Can implement any function
- Precharge inputs are chosen from fastest arriving inputs
Courtesy of N. Nedovic

Pipeline Flow with NCD Logic - Example
- Domino gates used as synchronizers
- Gates are non-clocked => reducing clock load
- Precharge ripples through logic in path, similar to evaluation
- Evaluation may exceed pipeline boundary
- A method needed to prevent fast precharge
Courtesy of N. Nedovic

Dual-Edge Triggered CSE
- DET-CSE samples the input data on both edges of the clock
- Reducing power consumption:
  - Half of the original clock frequency for the same data throughput
  - Half of clock generation/distribution/SE-clock-related power is saved
- However, it may introduce an overhead
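The halving argument follows from the standard dynamic power relation P = C · V² · f for a net that makes one full charge/discharge cycle per clock period. The sketch below uses hypothetical capacitance, voltage and frequency values, and it ignores the extra clock load a dual-edge element may add, which is the overhead the slide warns about.

```python
def clock_power_w(cap_f, vdd_v, freq_hz):
    """Dynamic power of a clock net: P = C * V^2 * f."""
    return cap_f * vdd_v ** 2 * freq_hz


def data_rate_hz(freq_hz, edges_per_cycle):
    """Samples captured per second by a storage element."""
    return freq_hz * edges_per_cycle


# Hypothetical clock network: 1 nF total load at 1.2 V.
single = clock_power_w(1e-9, 1.2, 2e9)   # single-edge at 2 GHz
dual = clock_power_w(1e-9, 1.2, 1e9)     # dual-edge at 1 GHz
print(f"single-edge: {single:.2f} W, dual-edge: {dual:.2f} W")
print("same throughput:", data_rate_hz(2e9, 1) == data_rate_hz(1e9, 2))
```

If the dual-edge element presents a larger clock capacitance than its single-edge counterpart, the saving shrinks in proportion, so the 50% figure is an upper bound in this simple model.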

Dual-Edge Triggered Storage Element Topologies
Structurally, there are two different designs:
- Latch-Mux (LM)
- Flip-Flop (FF)
[Figure labels: DET-Flip-Flop; DET-Latch; non-transparency achieved by MUX]

Comparison with Single Edge SEs

Comparison with Single Edge CSEs

Single and Double Edge Triggered SE: Power Consumption (α = 50%)

Conclusion
- Synchronous Design: Has not exhausted all the tricks
- Asynchronous Design: Has not solved all the problems

Design & optimization tradeoffs
- Opposite goals:
  - Minimal total power consumption
  - Minimal delay
- Power-delay tradeoff
- Minimize the power-delay product (PDP_tot) @ f = const.