Issues in System on the Chip Clocking November 6th, 2003 SoC Design Conference, Seoul, KOREA Vojin G. Oklobdzija Advanced Computer System Engineering Laboratory.

Presentation transcript:

Slide 1: Issues in System on the Chip Clocking
November 6th, 2003, SoC Design Conference, Seoul, KOREA
Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California Davis
Presentation available at:

November 6, 2003, Prof. V.G. Oklobdzija, University of California
Slide 2: Directions in SoC Clocking
- Synchronous / Asynchronous paradigm
- Synchronous solutions:
  – Clock uncertainty absorption
  – Time borrowing
  – Skew-Tolerant Domino
  – Using both edges of the clock
- Conclusion

Slide 3: Clock Frequency Trends (ISSCC 2002)

Slide 4: Processor Frequency Trends
- Frequency doubles each generation
- The number of gates per clock cycle drops by 25%
Courtesy of: Intel, S. Borkar

Slide 5: Multi-GHz Clocking Problems
- Less logic between pipeline stages:
  – Out of the 7-10 FO4 delays allocated per stage, the flip-flop (FF) can take 2-4 FO4
- Clock uncertainty can take another FO4
- Together, these could take up ½ of the time allowed for computation
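A back-of-the-envelope check of the budget above: the share of one pipeline stage consumed by the flip-flop plus the clock uncertainty, all expressed in FO4 delays. The inputs are picked from the ranges quoted on the slide, and `overhead_fraction` is a hypothetical helper name.

```python
def overhead_fraction(stage_fo4: float, ff_fo4: float, uncertainty_fo4: float) -> float:
    """Fraction of a pipeline stage taken by the flip-flop plus clock uncertainty."""
    if stage_fo4 <= 0:
        raise ValueError("stage budget must be positive")
    return (ff_fo4 + uncertainty_fo4) / stage_fo4


# Slide ranges: 7-10 FO4 per stage, 2-4 FO4 for the FF, about 1 FO4 of uncertainty.
best = overhead_fraction(stage_fo4=10, ff_fo4=2, uncertainty_fo4=1)
mid = overhead_fraction(stage_fo4=8, ff_fo4=3, uncertainty_fo4=1)
worst = overhead_fraction(stage_fo4=7, ff_fo4=4, uncertainty_fo4=1)
print(f"overhead: {best:.0%} (best), {mid:.0%} (mid-range), {worst:.0%} (worst)")
```

With mid-range values the overhead is exactly half of the stage, which is where the slide's "½" comes from.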

Slide 6: Clock Uncertainties

Slide 7: Motivation for Improving on Clocked Storage Elements
Example: in a 2.0 GHz processor, T = 500 ps.
- A typical clocked storage element (CSE) D-Q delay is on the order of … ps.
- If one can design a faster CSE, e.g. … ps D-Q, this represents a 10-15% performance improvement.
- If, in addition, one can absorb 20 ps of clock uncertainty and embed one level of logic, this can yield up to a 20% performance improvement.
- Try achieving a 10-20% performance improvement by introducing new features in the architecture!
- This is sufficient to turn an architect into a circuit designer!
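The percentages on this slide come from comparing the time saved with the cycle time. A minimal sketch, assuming performance scales directly with clock frequency (pipeline depth and logic unchanged); `period_ps` and `speedup_from_saving` are hypothetical helper names, and the D-Q values missing from the slide are not filled in.

```python
def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def speedup_from_saving(period: float, saved_ps: float) -> float:
    """Relative frequency gain if `saved_ps` is removed from the cycle time."""
    if not 0 <= saved_ps < period:
        raise ValueError("saving must be non-negative and smaller than the period")
    return period / (period - saved_ps) - 1.0


T = period_ps(2.0)  # 500 ps, as on the slide
# Hypothetical case: only the 20 ps of clock uncertainty is absorbed.
print(f"T = {T:.0f} ps, gain = {speedup_from_saving(T, 20.0):.1%}")
```

Absorbing 20 ps alone buys roughly 4% at 500 ps; the larger figures on the slide also count a faster D-Q delay and a level of logic folded into the CSE.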

Slide 8: Consequences of Multi-GHz Clocks
- Pipeline boundaries start to blur
- Clocked storage elements must include logic
- Wave pipelining, domino style, signals used to clock, …
- Synchronous design only within a limited domain
- Asynchronous communication between synchronous domains

Slide 9: Synchronous / Asynchronous Design on the Chip
- 1 billion transistors on the chip by …
- A 64-b, 4-way issue logic core requires ~2 million transistors

Table 1: Transistor count in typical RISC processors
Feature        Digital   MIPS    PowerPC 620   HP 8000   Sun US
Freq. [MHz]    …         …       …             …         …
Pipeline Stg   …         …       …             …         …
Issue Rate     4         4       4             4         4
Out-of-Ord.    6 loads   32      16            56        none
Reg-Ren./flp   none/8    32/32   8/8           56        none
Total Trans.   9.3M      5.9M    6.9M          3.9M      3.8M
Logic Trans.   1.8M      2.3M    2.2M          3.9M      2.0M
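The two figures on this slide can be combined into a rough count of how many such cores a billion-transistor chip could hold, next to the logic share of each processor in Table 1. This is a sketch that ignores caches, wiring and I/O; the variable names are illustrative only.

```python
# Transistor counts from Table 1, in millions: (total, logic)
counts = {
    "Digital": (9.3, 1.8),
    "MIPS": (5.9, 2.3),
    "PowerPC 620": (6.9, 2.2),
    "HP 8000": (3.9, 3.9),
    "Sun US": (3.8, 2.0),
}

for name, (total, logic) in counts.items():
    print(f"{name:12s} logic share = {logic / total:.0%}")

# How many ~2M-transistor 64-b, 4-way issue logic cores fit into 1 billion transistors
CHIP_TRANSISTORS = 1_000_000_000
CORE_TRANSISTORS = 2_000_000
cores = CHIP_TRANSISTORS // CORE_TRANSISTORS
print(f"cores per chip (logic only): {cores}")
```

The point of the slide follows: a chip of this size holds hundreds of core-sized blocks, far more than one synchronous clock domain can comfortably span.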

Slide 10: Synchronous / Asynchronous Design on the Chip
[Figure: 1-billion-transistor chip; block labeled 10 million transistors]

Slide 11: Two Views of the World
- Asynchronous
- Synchronous

Slide 12: Asynchronous Paradigm
- A logic stage can take as much time as it needs
- Maximum speed is limited by the handshake overhead
- Increased logic complexity (de-glitching)

Slide 13: Synchronous Paradigm
- Maximum speed is determined by the slowest logic block
- Latch / FF timing overhead
- Fixed clock frequency (set by the longest path)
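A toy latency model contrasting the two paradigms for one data item passing through a three-stage pipeline. Assumptions, not taken from the slides: the synchronous clock period is the worst-case stage delay plus a fixed latch/FF overhead, each asynchronous stage costs its actual delay plus a fixed handshake overhead, and all delays are hypothetical.

```python
from typing import Sequence


def sync_latency(worst_case_ps: Sequence[float], cse_overhead_ps: float) -> float:
    """Every stage gets one clock period, set by the slowest stage's worst case."""
    period = max(worst_case_ps) + cse_overhead_ps
    return period * len(worst_case_ps)


def async_latency(actual_ps: Sequence[float], handshake_ps: float) -> float:
    """Each stage takes only the time it needs, plus its handshake overhead."""
    return sum(d + handshake_ps for d in actual_ps)


worst = [300.0, 450.0, 200.0]   # hypothetical worst-case stage delays
actual = [220.0, 310.0, 150.0]  # hypothetical delays for one particular input
print("sync :", sync_latency(worst, cse_overhead_ps=50.0))
print("async:", async_latency(actual, handshake_ps=120.0))
```

Raising `handshake_ps` to 300 ps makes the asynchronous path slower (1580 ps versus 1500 ps), which is the "limited by handshake overhead" point; the synchronous figure, in turn, pays for the slowest stage on every stage.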

Slide 14: Synchronous Paradigm
- Clocked storage elements (flip-flops and latches) should be viewed as synchronization elements, not merely as storage elements!
- Their main purpose is to synchronize fast and slow paths:
  – prevent the fast path from corrupting the state
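The "fast path" concern is usually written as the hold (min-delay) constraint, t_Clk-Q,min + t_logic,min ≥ t_hold + t_skew. That standard textbook form is not spelled out on the slide; the sketch below simply evaluates it, and the function name and all delays are hypothetical.

```python
def hold_slack_ps(clk_q_min: float, logic_min: float, hold: float, skew: float) -> float:
    """Positive slack: the fastest path cannot overwrite the data the next
    storage element is still holding. Negative slack: a hold (race) violation."""
    return (clk_q_min + logic_min) - (hold + skew)


print(hold_slack_ps(clk_q_min=60, logic_min=40, hold=20, skew=30))  # enough logic on the short path
print(hold_slack_ps(clk_q_min=60, logic_min=0, hold=20, skew=50))   # direct connection, large skew
```

A negative result is exactly the case the slide describes: new data races through and corrupts the state before the receiving element has finished capturing the old value.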

Slide 15: Synchronous World: Tricks and Solutions
- Clocked storage elements with clock uncertainty absorption features
- Time borrowing
- Incorporation of synchronization features into the logic:
  – Skew-Tolerant Domino
- Utilizing both edges of the clock

Slide 16: Clocked Storage Element Overhead
The time the CSE takes out of the pipeline cycle is its setup time U plus its Clk-Q delay. Thus the D-Q delay, not the Clk-Q delay, is the relevant figure:
$$T = T_{\text{Clk-Q}} + T_{\text{Logic}} + U + T_{\text{skew}}$$
$$T_{\text{D-Q}} = T_{\text{Clk-Q}} + U$$
[Figure: CSE (D, Q, Clk) driving logic (T_Logic) into the next CSE, with T_Clk-Q, T_Logic, U and T_skew marked along the cycle T]
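The cycle-time equation can be evaluated directly to see how much of the period the storage element consumes. A minimal sketch, taking U as the setup time; the function names and the delay values are hypothetical.

```python
def min_period_ps(clk_q: float, logic: float, setup: float, skew: float) -> float:
    """T = T_Clk-Q + T_Logic + U + T_skew, with U the setup time."""
    return clk_q + logic + setup + skew


def cse_time_ps(clk_q: float, setup: float) -> float:
    """Time the CSE takes out of the cycle: T_D-Q = T_Clk-Q + U."""
    return clk_q + setup


T = min_period_ps(clk_q=60, logic=380, setup=20, skew=40)  # hypothetical values
share = cse_time_ps(60, 20) / T
print(f"T_min = {T} ps -> f_max = {1000 / T:.2f} GHz, CSE share of the cycle = {share:.0%}")
```

Any picosecond removed from T_D-Q (or from T_skew) goes straight back to the logic, which is why the deck focuses on the D-Q delay rather than Clk-Q alone.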

Slide 17: Delay vs. Setup/Hold Times
[Figure: Clk-Output delay [ps] vs. Data-Clk [ps], marking the setup and hold times, the sampling window, and the minimum Data-Output delay]
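The minimum Data-Output point on this kind of plot can be found from characterization data: D-Q = Data-Clk + Clk-Output, and Clk-Output grows as data arrives closer to the clock edge. A sketch, assuming a hypothetical sampled curve; `optimal_setup` is an illustrative name.

```python
def optimal_setup(samples: list[tuple[float, float]]) -> tuple[float, float]:
    """Given (Data-Clk, Clk-Output) pairs in ps, return the Data-Clk value that
    minimizes the Data-Output (D-Q) delay, together with that minimum D-Q delay."""
    best = min(samples, key=lambda s: s[0] + s[1])
    return best[0], best[0] + best[1]


# Hypothetical characterization: Clk-Q rises as data arrives closer to the clock edge.
curve = [(60, 80), (50, 80), (40, 81), (30, 83), (20, 90), (10, 110), (5, 140)]
setup, d_q = optimal_setup(curve)
print(f"optimal setup = {setup} ps, minimum D-Q = {d_q} ps")
```

Arriving earlier than the optimum wastes time waiting for the clock; arriving later makes Clk-Q grow faster than the time gained.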


Slide 19: Clock Uncertainty Absorption

Slide 20: Single-Ended Skew-Tolerant Flip-Flop (Nedovic, Oklobdzija, Walker, ISSCC 2003)

Slide 21: Clock Uncertainty Absorption
[Figure: clock uncertainty t_CU on Clk; D-Q delay vs. D-Clk delay for early, nominal (T = 0) and late clock arrival, marking the minimum D-Q delay D_DQm, the maximum D_DQM, and the worst-case D_DQ]

Slide 22: Clock Uncertainty Absorption
(a) t_CU = 30 ps (α_CU = 90%): U_opt = -5 ps, D_DQM = 220 ps, 3 ps of the uncertainty not absorbed
(b) t_CU = 100 ps (α_CU = 56%): U_opt = 30 ps, D_DQM = 261 ps, 44 ps of the uncertainty not absorbed
[Figure: Clk, D and Q waveforms of the flip-flop for cases (a) and (b)]
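Both absorption figures on this slide agree with defining the absorption ratio as the part of the clock uncertainty that does not show up as extra D-Q delay: α_CU = 1 - ΔD/t_CU, where ΔD is the 3 ps or 44 ps annotation. That definition is inferred from the numbers rather than stated on the slide, and `absorption_ratio` is a hypothetical name.

```python
def absorption_ratio(t_cu_ps: float, unabsorbed_ps: float) -> float:
    """Share of the clock uncertainty t_CU hidden by the flip-flop (assumed definition)."""
    if t_cu_ps <= 0:
        raise ValueError("clock uncertainty must be positive")
    return 1.0 - unabsorbed_ps / t_cu_ps


print(f"(a) t_CU = 30 ps:  {absorption_ratio(30, 3):.0%}")
print(f"(b) t_CU = 100 ps: {absorption_ratio(100, 44):.0%}")
```

The flip-flop hides almost all of a small uncertainty but only about half of a large one, so absorption helps most when the clock network is already reasonably tight.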


Slide 24: Time Borrowing


Slide 26: Critical Path with Time Borrowing

Slide 27: Latches as Synchronizers
- The purpose of a CSE is to synchronize the data flow.
- We need to insert CSEs to prevent "fast paths" from reaching the next logic stage too early.
- If a signal arrives late, it is allowed to borrow time from the next stage.
- However, borrowing cannot go on forever…
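A minimal sketch of time borrowing through a chain of transparent latches. It assumes every latch opens once per period and stays transparent for a fixed window, and it uses one D-Q delay both for data flowing through an open latch and for data released when the latch opens (a simplification; the latter is really a Clk-Q delay). The function name and all delays are hypothetical.

```python
def borrow_through_latches(logic_ps, period_ps, window_ps, setup_ps, d_dq_ps):
    """Propagate one data wave through a chain of transparent latches.

    Latch k opens at k * period_ps and closes window_ps later. Returns the time
    borrowed at each latch; raises ValueError when data misses a closing latch.
    """
    t = d_dq_ps  # data leaves latch 0, which opens at time 0
    borrowed = []
    for k, delay in enumerate(logic_ps, start=1):
        arrival = t + delay
        opens = k * period_ps
        closes = opens + window_ps
        if arrival > closes - setup_ps:
            raise ValueError(f"latch {k}: data at {arrival} ps, deadline {closes - setup_ps} ps")
        borrowed.append(max(0.0, arrival - opens))
        # Early data waits for the latch to open; late data passes straight through.
        t = max(arrival, opens) + d_dq_ps
    return borrowed


print(borrow_through_latches([500, 300, 400], period_ps=500, window_ps=250, setup_ps=20, d_dq_ps=50))
try:
    borrow_through_latches([650, 650, 650], period_ps=500, window_ps=250, setup_ps=20, d_dq_ps=50)
except ValueError as err:
    print("borrowing exhausted:", err)
```

In the first run the slow first stage borrows 50 ps and the faster second stage pays it back. In the second run every stage is too slow, the borrowed time keeps accumulating, and latch 2 is missed: borrowing cannot go on forever.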

Slide 28: Using a Single Pulsed Latch

Slide 29: Single Pulsed Latch (courtesy of D. Markovic & Intel MRL)

Slide 30: Optimal Single Latch Clocking
Single-latch system (Unger & Tan '83):
$$P_m = P \ge D_{LM} + D_{DQM} \quad \text{(minimal clock period)}$$
$$D_{Lm} > D_{LmB} \ge W + T_T + T_L + H - D_{CQm} \quad \text{(shortest path)}$$
$$W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM} \quad \text{(minimal clock width)}$$
Subscripts M and m denote maximum and minimum delays; W is the clock pulse width, U the setup time, and H the hold time.
Example, 0.10 µm technology: FO4 = 25-40 ps, FF = 80 ps, T_unc = 25-35 ps, f_max = … GHz, T = … ps
- W_opt ≈ 2T_unc ≈ 50-70 ps
- D_Lm ≈ 4T_unc + H - D_CQm ≈ … ps (this is close to ½ of a cycle)
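The three single-latch constraints can be evaluated directly. In this sketch T_L and T_T are taken to be the clock uncertainties on the leading and trailing clock edges (an assumption about notation the slide does not define), the class name is hypothetical, and every number is made up. Setting D_DQM = D_CQM + U and T_L = T_T = T_unc reproduces the slide's W_opt ≈ 2T_unc and D_Lm ≈ 4T_unc + H - D_CQm.

```python
from dataclasses import dataclass


@dataclass
class SingleLatchTiming:
    d_lm_max: float  # D_LM: longest logic path
    d_dq_max: float  # D_DQM: maximum D-Q delay of the latch
    d_cq_max: float  # D_CQM: maximum Clk-Q delay
    d_cq_min: float  # D_CQm: minimum Clk-Q delay
    setup: float     # U
    hold: float      # H
    t_lead: float    # T_L: assumed leading-edge clock uncertainty
    t_trail: float   # T_T: assumed trailing-edge clock uncertainty

    def min_period(self) -> float:
        """P >= D_LM + D_DQM."""
        return self.d_lm_max + self.d_dq_max

    def w_opt(self) -> float:
        """W_opt = T_L + T_T + U + D_CQM - D_DQM."""
        return self.t_lead + self.t_trail + self.setup + self.d_cq_max - self.d_dq_max

    def min_short_path(self, width: float) -> float:
        """Lower bound on the shortest logic path: W + T_T + T_L + H - D_CQm."""
        return width + self.t_trail + self.t_lead + self.hold - self.d_cq_min


t = SingleLatchTiming(d_lm_max=400, d_dq_max=80, d_cq_max=70, d_cq_min=60,
                      setup=10, hold=10, t_lead=30, t_trail=30)
w = t.w_opt()
print(f"P >= {t.min_period()} ps, W_opt = {w} ps, D_Lm >= {t.min_short_path(w)} ps")
```

The short-path bound is the price of a single latch: every path must be padded to at least that delay, which is why it ends up close to half a cycle.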


Slide 32: Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing)
Intel Patent No. 5,517,136, May 14, 1996

Slide 33: CMOS Domino as a Memory Element
- After the input changes, the output remembers it
- Precharge destroys the information
- Proper phasing of the clock can allow the information to pass from stage to stage
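A cycle-level behavioral sketch (not a transistor model) of the three bullets above. A domino output can only rise during evaluation, so it holds its value until the next precharge; with overlapping phases, stage 2 captures the value before stage 1 precharges. The `DominoStage` class and the phase waveforms are hypothetical.

```python
class DominoStage:
    """Behavioral model of one domino gate (dynamic node plus output inverter)."""

    def __init__(self) -> None:
        self.out = 0

    def step(self, clk: int, inp: int) -> int:
        if clk == 0:
            # Precharge: dynamic node pulled high, so the inverter output is 0
            self.out = 0
        elif inp == 1:
            # Evaluate: the node discharges and cannot recharge until the next
            # precharge, so the output stays 1 even if the input falls again
            self.out = 1
        return self.out


# Two stages clocked by overlapping phases phi1 and phi2 (hypothetical waveform)
s1, s2 = DominoStage(), DominoStage()
trace = []
for phi1, phi2, data in [(1, 0, 1), (1, 1, 0), (0, 1, 0)]:
    o1 = s1.step(phi1, data)
    o2 = s2.step(phi2, o1)
    trace.append((o1, o2))
print(trace)
```

In the last step stage 1 has precharged and lost its value, while stage 2 still holds it: the information has been handed forward without any latch.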

Slide 34: Skew-Tolerant Domino


Slide 36: Dual-Edge Triggered CSE
- A DET-CSE samples the input data on both edges of the clock
- Reduced power consumption:
  – Half of the original clock frequency for the same data throughput
  – Half of the clock-related power (clock generation, distribution, and storage elements) is saved
- However, it may introduce an overhead
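The "half the power" argument follows from the dynamic power formula C·Vdd²·f for a clock that toggles every cycle. A sketch, assuming the same clock load for both designs; the load, supply and data rate are hypothetical, as is the `clock_power` name.

```python
def clock_power(c_clk: float, vdd: float, f_clk: float) -> float:
    """Dynamic clock power C * Vdd^2 * f (the clock toggles every cycle)."""
    return c_clk * vdd ** 2 * f_clk


data_rate = 2e9          # hypothetical: 2 G samples/s of required throughput
c_clk, vdd = 1e-9, 1.0   # hypothetical: 1 nF of clock load at 1.0 V
p_single = clock_power(c_clk, vdd, f_clk=data_rate)    # single-edge: one sample per cycle
p_dual = clock_power(c_clk, vdd, f_clk=data_rate / 2)  # dual-edge: two samples per cycle
print(f"single-edge {p_single:.2f} W, dual-edge {p_dual:.2f} W")
```

The saving holds only if the DET storage elements do not add clock load of their own; any extra capacitance they present is the "overhead" in the last bullet and eats into the factor of two.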

Slide 37: Dual-Edge Triggered Storage Element Topologies
- Structurally, there are two different designs:
  – Latch-Mux (LM)
  – Flip-Flop (FF)
[Figure: DET-Flip-Flop and DET-Latch; in the latch-mux design, non-transparency is achieved by the MUX]
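A behavioral sketch of the Latch-Mux (LM) topology: one latch is transparent while the clock is high, the other while it is low, and the MUX always forwards the latch that is currently opaque, so Q changes on both edges yet D never flows straight through. Sampled-time modeling, the zero initial state, and the function name are assumptions for illustration.

```python
def det_latch_mux(clk, d):
    """Behavioral latch-mux dual-edge triggered storage element.

    Returns Q for each sample of the clk and d sequences."""
    q_pos = 0  # latch transparent while clk is high
    q_neg = 0  # latch transparent while clk is low
    q = []
    for c, x in zip(clk, d):
        if c:
            q_pos = x
        else:
            q_neg = x
        # The MUX selects the latch that is currently opaque (holding),
        # so D never flows straight through to Q.
        q.append(q_neg if c else q_pos)
    return q


print(det_latch_mux(clk=[0, 1, 1, 0, 0, 1], d=[1, 0, 0, 1, 0, 0]))
```

Q takes the value D had just before each clock edge: 1 after the rising edge at sample 1, 0 after the falling edge at sample 3. At sample 3, D = 1 is already present but does not reach Q, which is the non-transparency the MUX provides.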

Slide 38: Comparison with Single-Edge SEs

Slide 39: Comparison with Single-Edge CSEs

Slide 40: Single- and Double-Edge Triggered SE: Power Consumption (α = 50%)

Slide 41: [Figure; annotation: FO4 = 2.9]

Slide 42: Symmetric Pulse Generator Flip-Flop (SPG-FF) (Nedovic, Oklobdzija, Walker, ESSCIRC 2002)

Slide 43: Conclusion
- Clocking is the next challenge. Current clocking techniques may hold up to 10 GHz. Beyond that, pipeline boundaries start to vanish and more exotic clocking techniques will find their use.
- Synchronous design will be possible only in limited domains on the chip. A mix of synchronous and asynchronous design may emerge even in digital logic.
- Synchronous design:
  – has not exhausted all the tricks
- Asynchronous design:
  – has not solved all the problems
- We need solutions from both for a successful SoC design.