Demystifying Data-Driven and Pausible Clocking Schemes
Robert Mullins
Tutorial presented at the 18th UK Asynchronous Forum, Newcastle, September 2006

Synchronous to Delay-Insensitive Approaches to System Timing
  [Figure: spectrum of timing styles running from Synchronous to Delay Insensitive, with timing assumptions ranging from Global to None. Intermediate points: Multiple clocks; Data-Driven and Pausible Clocks (local clocks, interaction with data, becoming aperiodic); Bundled Data; Quasi-Delay Insensitive. Other labels: Local, Sub-System Local, Relative Wire Delay, Isochronic Forks, Less Detection.]

Introduction
  Clock Stretching Idea
  History and Uses
    – Crossing clock domains
    – Data-driven clocking
  Building local clock generators
    – Data-Driven and Pausible Clocks
    – Clock-Tree Insertion Delays
  Beyond synchronization
    – Novel Applications

Clock Stretching
  Local clock generator is constructed using a tunable delay line and an inverter (a ring oscillator)
  We are able to interrupt the ring oscillator and delay the generation of the next clock edge until some event has completed
  Length of each clock cycle may now vary
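The stretching behaviour described above can be illustrated with a small behavioural model. This is only a sketch, not a circuit from the talk: the function name `clock_edges`, the `stretch_until` table and the time units are assumptions made for illustration.

```python
def clock_edges(nominal_period, stretch_until, n_cycles):
    """Return rising-edge times of a stretchable clock (behavioural model).

    nominal_period: delay-line setting, in arbitrary time units.
    stretch_until:  hypothetical table mapping a cycle index to the time at
                    which the event blocking that cycle's edge completes.
    """
    edges = []
    t = 0.0
    for cycle in range(n_cycles):
        # The ring oscillator would produce the next edge one period later...
        t += nominal_period
        # ...unless an outstanding event holds that edge back.
        t = max(t, stretch_until.get(cycle, 0.0))
        edges.append(t)
    return edges


# Cycle 2 is held until t = 37.5, so that one cycle is longer than the rest.
edges = clock_edges(10.0, {2: 37.5}, 5)
periods = [b - a for a, b in zip([0.0] + edges, edges)]
```

Only the stretched cycle changes length; the following edges are simply shifted, which is why the clock becomes aperiodic rather than slower overall.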

Value-Safe Communication
  Metastability is always an issue when attempting to synchronize an external input to a local clock
  Typical solution is to allocate a fixed period of time for metastability to resolve
    – e.g. two-flip-flop synchronizer (1 cycle delay)
    – Accept a small chance of failure
  Pěchouček (1976)
    – Observed it was “fundamentally impossible to exclude arbitrarily long responses in any flip-flop”
    – Described circuits to allow the clock period to be stretched until metastability has been resolved
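The "small chance of failure" accepted by a fixed-time synchronizer is commonly estimated with the standard exponential MTBF model. The sketch below is not taken from the talk; the function name, parameter names and the example values are all illustrative assumptions.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronizer failures, in seconds.

    Standard model: MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data)
    t_resolve: time allowed for metastability to resolve (s)
    tau:       resolution time constant of the flip-flop (s)
    t_window:  metastability window of the flip-flop (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Assumed, purely illustrative device parameters.
params = dict(tau=20e-12, t_window=20e-12, f_clk=1e9, f_data=100e6)
half_cycle = synchronizer_mtbf(0.5e-9, **params)
full_cycle = synchronizer_mtbf(1.0e-9, **params)
```

The exponential term is the point of the model: each extra `tau` of waiting multiplies the MTBF by e, but no finite wait brings the failure probability to zero, which is the observation that motivates stretching the clock instead.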

Data-Driven Clocking
  Metastable states may be avoided if the asynchronous input starts the local clock
  Pěchouček also outlined such a system
    – No longer a synchronization issue if clock is quiescent when data arrives
  Seitz (Ch. 7, Mead and Conway book)
    – Seitz creates a self-timed interface or wrapper around synchronous blocks
    – Used in proprietary designs since 1968
    – “Assures performance and correct operation will not be compromised by wire delay in signals or in clock distribution”

Data-Driven Clocking Example
  Berkeley MAIA chip (2000)
    – GALS architecture, data-flow driven processing elements (“satellites”)

Data-Driven Clocking
  On-Chip Network Routers
    – Difficult to set global network clock frequency
    – Let local data streams determine rate of clocking
    – Need to sample other inputs on receiving an input request: who is to be admitted and who must wait (arbitration)
  Similar logic to static priority arbiter implementations

Building local clock generators
  Lots of published schemes, but similarities often hidden
    – Different circuit styles
    – Different specification & synthesis techniques
  Our approach
    – Introduce two basic types of local clock generator
        Data-driven and Pausible
        In reality both are very similar; the distinction is aimed at understanding circuits and related work
    – For each case we will look at a number of different input port behaviours we may need
        Arbitrated, Sampled and Synchronised
    – Demonstrate how a complete wrapper may be “assembled”

Data-Driven and Pausible Clocks

Data-Driven Clocks

Basic input port behaviours:
    – Arbitrated Inputs: can only progress with one input at a time
    – Synchronised Inputs: need full set of inputs to progress
    – Sampled Inputs: can progress with a variable number of data inputs (e.g. on-chip network router)
  [Figure panels: Arbitrated Inputs; Synchronized Inputs]
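The difference between the three input port behaviours can be stated as three admission rules. The sketch below is a purely logical view under assumed names (`arbitrated`, `synchronised`, `sampled`, and a `ready` table of port name to request state); it ignores the circuit-level arbitration that decides which request arrived first.

```python
def arbitrated(ready):
    """Admit exactly one ready input. Port order stands in for the arbiter's
    choice here; a real arbiter would decide on arrival order."""
    for name, is_ready in ready.items():
        if is_ready:
            return [name]
    return []


def synchronised(ready):
    """Progress only when the full set of inputs is ready."""
    return list(ready) if all(ready.values()) else []


def sampled(ready):
    """Progress with whichever inputs happen to be ready."""
    return [name for name, is_ready in ready.items() if is_ready]


partial = {"a": False, "b": True, "c": True}
full = {"a": True, "b": True, "c": True}
```

With `partial`, the arbitrated port admits one input, the synchronised port waits, and the sampled port admits both ready inputs; with `full`, the synchronised port admits all three.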

Data-Driven Clock: Sampled Inputs
  [Figure: Data-Driven Clock Template. Annotations: Sample inputs when at least one input is ready (and clock is low); Assert Lock; Either admitted or locked out]
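The sampled-input template can be approximated by an event-level model: the clock is quiescent until a request arrives, the lock is asserted, and every request already present is admitted while later ones are locked out until the next cycle. This is a simplified sketch under assumed names (`data_driven_cycles`, `arrivals`, `period`); it treats the lock as an ideal instantaneous decision and does not model arbitration of near-simultaneous requests.

```python
def data_driven_cycles(arrivals, period):
    """Return a list of (lock_time, admitted_inputs), one entry per clock cycle.

    arrivals: hypothetical table of input name -> request arrival time.
    period:   time the clock needs before it is low and may be restarted.
    """
    pending = sorted(arrivals.items(), key=lambda item: item[1])
    cycles = []
    clock_low_at = 0.0
    while pending:
        # Sample when at least one input is ready and the clock is low.
        lock_time = max(pending[0][1], clock_low_at)
        # Inputs present when the lock is asserted are admitted...
        admitted = [name for name, t in pending if t <= lock_time]
        # ...and the rest are locked out until a later cycle.
        pending = [(name, t) for name, t in pending if t > lock_time]
        cycles.append((lock_time, admitted))
        clock_low_at = lock_time + period
    return cycles


cycles = data_driven_cycles(
    {"north": 3.0, "east": 3.0, "west": 5.0, "local": 40.0}, period=10.0
)
```

In this run "west" just misses the first sample and is admitted one cycle later, while "local" restarts a clock that has gone quiescent.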

Data-Driven Clocking: Pipelines and Flushing
  May require multiple clocks to process a single data item, e.g. when logic is pipelined
  Flushing Alternatives:
    – Eager Flushing: initialise counter to make N successive requests for a new clock to be generated
    – Time-Out Flush: may wish to allow pipeline to be data-driven, time-out as last resort and flush using counter
    – Uninterrupted Flush
    – Pull-Driven Flush (driven by output)
  Flush logic is easy to add, simply appears as additional input
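Eager flushing can be sketched as a counter that acts as one more clock-requesting input. The model below is an illustration only: the function name `run_pipeline` is hypothetical, and reloading the counter with `stages - 1` is an assumption about what N should be for a pipeline with `stages` registers.

```python
from collections import deque


def run_pipeline(items, stages):
    """Clock a data-driven pipeline, using an eager flush counter.

    Returns (log, outputs): the source of each clock request ("data" or
    "flush") and the items that reached the final register.
    Items must not be None, which is used here to mark an empty stage.
    """
    regs = [None] * stages
    arrivals = deque(items)
    flush_count = 0
    log, outputs = [], []
    while arrivals or flush_count > 0:
        if arrivals:
            incoming = arrivals.popleft()
            flush_count = stages - 1  # reload: clocks still owed to this item
            log.append("data")
        else:
            incoming = None
            flush_count -= 1          # the counter itself requests a clock
            log.append("flush")
        regs = [incoming] + regs[:-1]  # one clock edge: shift every stage
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return log, outputs


log, outputs = run_pipeline(["a", "b"], stages=3)
```

Two data-driven clocks admit the items and two flush clocks push them through the remaining stages, after which the clock falls quiescent again.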

Clock Stretching Approach
  Common to see GALS wrappers built around a ‘stretchable’ clock circuit
  Idea is very similar to a data-driven clock
  Interface is reduced to a single ‘stretch’ input
    – Stretch signal is asserted synchronously
    – Removed asynchronously to permit next rising clock edge

Pausible Clocks

Pausible Clock
  In contrast to the data-driven approach, the clock is normally free-running
  Mutual-exclusion element permits clock to be interrupted or paused
  Often used to ensure value-safe communication in a GALS system
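A pausible clock can be approximated at the event level as a free-running clock whose next edge must win a mutual-exclusion element against the input port. The model below is a sketch under several assumptions that are not from the talk: the clock holds the MUTEX for the first half of each period, at most one request is granted per cycle, a granted port holds the MUTEX for a fixed `hold` time, and MUTEX resolution time is ignored.

```python
def pausible_clock(requests, period, hold, n_edges):
    """Return (edge_times, grant_times) for a simplified pausible clock.

    requests: arrival times of asynchronous requests at the input port.
    period:   free-running clock period.
    hold:     time a granted port keeps the MUTEX (assumed constant).
    """
    queue = sorted(requests)
    edges, grants = [], []
    edge = 0.0
    for _ in range(n_edges):
        edges.append(edge)
        window_open = edge + period / 2  # clock releases the MUTEX
        next_edge = edge + period        # clock asks for the MUTEX again
        if queue and queue[0] < next_edge:
            # The port wins the MUTEX; a request that arrived early waits
            # until the clock has released it.
            grant = max(queue.pop(0), window_open)
            grants.append(grant)
            # The next edge is paused until the port lets go.
            next_edge = max(next_edge, grant + hold)
        edge = next_edge
    return edges, grants


edges, grants = pausible_clock([2.0, 18.5, 19.0], period=10.0, hold=3.0, n_edges=5)
```

Only the request at 18.5 arrives late enough to pause the clock (the edge due at 20.0 moves to 21.5); the other two are served inside a normal cycle.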

Pausible Clock – Input Ports
  [Figure panels: Arbitrated Inputs; Sampled Inputs; Synchronized Inputs]

Consumer Side Interface Example
  Data is latched in first register while input holds grant from MUTEX
  MUTEX is released, new clock edge is generated, data is admitted

Arbitrated-call based pausible clock interface
  Similar consumer side interface using a single input register
  In this case, if the input is granted it also goes on to request the next clock edge
  Potential for creating a wider window for accepting data

Asynchronous Synchronisers and Q-elements
  Amulet3 interrupt synchroniser (1997)
  Similarities to pausible clock circuit
  Also interesting to look at Q-Modules again [Rosenberger/Molnar et al, 1988]

Output Ports
  Output port types:
    – Scheduled: output operation must be completed on a particular clock cycle
    – Registered: cannot overwrite output register until it has been read
    – Polled: port polls output to determine when data may be sent (synchronisation issue which may require clock period to be extended)
  Implementations are based on input port circuits already discussed

A GALS Wrapper Example
  Outline specification:
    – Free running clock
    – Asynchronous input: we know nothing about when data will arrive
        For simplicity, let's assume we can always accept new data
    – Registered output feeding asynchronous FIFO

A GALS Wrapper Example: Step 1. Local clock generator with H/S interface

A GALS Wrapper Example: Step 2. Pausible Clock Template

A GALS Wrapper Example: Step 3. Provide registered output port support (stretchable clock template)

A GALS Wrapper Example: Step 4.

Clock Tree Insertion Delays
  Delay from root to leaf of clock tree can be considerable (certainly non-zero!)
  If every clock cycle is the same, this clock insertion delay is not normally an issue
  If we stretch the clock, the insertion delay must be considered in our timing analysis (also true for clock gating in synchronous world)
  Not difficult to handle, but can increase time required to admit new data
    – Very large synchronous islands make little sense if we pursue a GALS approach anyway

Clock Tree Insertion Delays
  Must ensure input data is latched before we remove data and request!
  (a) Need to be careful not to extend clock cycle
  (b) Illustrates a simple approach if insertion delay is less than one cycle
  [Figure annotation: Can place logic here]

Clock Tree Insertion Delays
  How do we handle multi-cycle insertion delays?
  Need to ensure we admit data on the correct clock cycle
  Cannot cheat and promote data! (requires arbitration, clock edge already in tree)
  We simply remember on which clock cycle data has been scheduled to be admitted
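One way to picture "remembering the scheduled cycle" is a short queue of admit flags that travels alongside the clock edges already in the tree. This is an interpretation offered as a sketch, not the circuit from the slides; `admit_schedule`, `grants_at_root` and `insertion_cycles` are hypothetical names.

```python
from collections import deque


def admit_schedule(grants_at_root, insertion_cycles, n_edges):
    """Return, for each edge seen at the leaves, whether data is admitted.

    grants_at_root:   indices of root clock edges that data was scheduled with.
    insertion_cycles: clock tree insertion delay, in whole clock cycles.
    """
    # Edges already in the tree at start-up carry no scheduled data.
    in_flight = deque([False] * insertion_cycles)
    enables = []
    for root_edge in range(n_edges):
        # Tag the edge entering the tree; edges ahead of it keep their tags,
        # so data is never promoted onto an edge that is already in the tree.
        in_flight.append(root_edge in grants_at_root)
        # The edge now reaching the leaves admits data only if it was tagged.
        enables.append(in_flight.popleft())
    return enables


enables = admit_schedule({1, 4}, insertion_cycles=2, n_edges=7)
```

Data scheduled with root edges 1 and 4 is admitted two cycles later at the leaves, on the edges it was scheduled with and on no others.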

Summary
  Simple view of elastic clocking ideas
  Practical wrappers can be assembled from a few simple circuit templates
  Lots I haven’t talked about
    – approaching specification and synthesis more formally (next talk!)
    – formal verification
        I use Veraci (Paul Cunningham’s PhD work)
        Verification aided by modular view of wrappers
    – performance and optimisation of interfaces…

Novel Applications: Timing Speculation
  Circuit-Level Timing Speculation
    – Razor [ARM/Michigan]
  Exploit elastic clocking ideas to simplify error recovery mechanism and timing analysis

Novel Applications: Fine-Grain Clock Gating
  Power gating techniques increasingly important as static power consumption increases
  Multiple levels of sleep
    – In some cases will require restoration of state
  Exploit elastic clocking
    – Self-timed wake-up process
    – Simplify timing analysis and implementation
    – Power gate much smaller blocks
    – Speculative sleep and wake-up control

Synchronous vs. Asynchronous Control for GALS
  Synchronous alternatives
    – 2 Flip-flop synchronizers
    – Clock Gating
    – Pipeline flushing mechanisms
    – Timing analysis and implementation can get complex
    – Not necessarily transparent to IP core architecture
  Elastic clocking techniques
    – Fully transparent, robust, simple to compose blocks and techniques
    – Optimise common-case and make exceptional cases work

Conclusions
  Scaling trends suggest we will soon see hundreds of networked IP cores on a single chip
  Timing and integration issues alone suggest a GALS approach based on elastic clocking should play a role
    – Competition from well established synchronisation techniques
  The need to optimise each IP block by exploiting timing speculation, clock, signal and power gating, DVFS etc. may see significant additional advantages
    – Research opportunity