Synchronizers for Low Latency Clock Domain Transfer

Presented by Dmitry Verbitsky

Exactly Matched Frequencies
All domains operate from the same clock. Skews may be arbitrary, and the skew may vary due to clock jitter, power supply noise, temperature variations, etc.

Rationally Related Frequencies
Clocks are derived from a common source. Clock frequencies are rational multiples of each other.

Closely Matched Frequencies
Clocks are derived from independent sources. The clocks are very closely matched in frequency.

Arbitrary Frequencies
Clocks are derived from independent sources and can have arbitrary frequencies. Clock frequencies are assumed to be relatively stable, an assumption satisfied by nearly all synchronous designs.

Clock Mismatch Sources
Differences in insertion delay between the two independent clock grids; reference clock distribution networks; accumulated phase error between independent PLL sources; primary clock distribution networks.

Clock Mismatch Sources (2)
PVT variations; variation in wire parameters; different sizing of each buffering stage; the presence of adjacent wires and the amount of switching activity on them.

Interfacing Synchronous and Asynchronous Systems
To achieve a sufficiently small probability of synchronization failure for a single asynchronous input, all that is required is to allow the synchronizer a sufficiently long time to exit the metastable state.
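
To make "a sufficiently long time" concrete, the sketch below uses the common first-order estimate $\mathrm{MTBF} = e^{t_r/\tau} / (T_W f_c f_d)$, where $t_r$ is the resolution time, $\tau$ the flip-flop's resolution time constant, $T_W$ its metastability window, and $f_c$, $f_d$ the clock and input-transition rates. This is a textbook approximation rather than something taken from the slides, and every parameter value is hypothetical.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order MTBF estimate, in seconds, for a flip-flop synchronizer.

    t_resolve: time allowed for the synchronizer to leave metastability (s)
    tau:       resolution time constant of the flip-flop (s)
    t_window:  metastability window T_W (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical 1 GHz receiver: tau = T_W = 20 ps, input toggling at 100 MHz.
# Assumes each extra cycle gives a full clock period of resolution time.
for cycles in (1, 2, 3):
    mtbf = synchronizer_mtbf(cycles * 1e-9, 20e-12, 20e-12, 1e9, 1e8)
    print(f"{cycles} cycle(s): MTBF = {mtbf:.3e} s")
```

Each extra nanosecond of resolution time multiplies the MTBF by $e^{1\,\mathrm{ns}/\tau}$, which is why simply allowing more time is so effective.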

Pipelined Synchronization
Instead of transferring W bits every 1/E seconds, one can transfer kW bits every k/E seconds. The throughput is unchanged, but k times as much time is available for synchronization.
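
A small arithmetic sketch of this trade-off follows. It assumes the whole k/E interval is available for metastability resolution and reuses the first-order $e^{t/\tau}$ MTBF scaling; the word width W, rate E and $\tau$ values are hypothetical.

```python
import math

def pipelined_transfer(width_bits, rate_hz, k):
    """Group k words per transfer.

    Returns (bits per transfer, transfer interval in s, throughput in bit/s).
    """
    bits = k * width_bits
    interval = k / rate_hz
    return bits, interval, bits / interval

W, E, TAU = 32, 500e6, 25e-12   # hypothetical word width, word rate E, flip-flop tau
for k in (1, 2, 4):
    bits, interval, throughput = pipelined_transfer(W, E, k)
    # Extra resolution time relative to k = 1 improves MTBF by exp(extra / tau)
    gain = math.exp((interval - 1 / E) / TAU)
    print(f"k={k}: {bits} bits every {interval * 1e9:.0f} ns, "
          f"{throughput / 1e9:.0f} Gb/s, MTBF x {gain:.2e}")
```

The throughput column stays constant; only the latency and the resolution time grow with k.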

STARI (Self-Timed At Receiver's Input)
The transmitter and receiver are mesochronous. If the FIFO is initialized to be roughly half full, it remains roughly half full throughout operation, so there is no need to check for overflow and underflow. STARI does not require the absolute synchronization of purely synchronous methods, nor the explicit flow control mechanism of purely asynchronous methods.
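
The half-full argument can be checked with a small event simulation. This is a behavioral sketch, not the STARI circuit: both sides run at the same nominal period with a fixed phase offset, and each edge is perturbed by bounded random jitter standing in for skew variation (all values hypothetical). As long as the jitter cannot move a read past the next write or vice versa, occupancy stays within one entry of its half-full starting level.

```python
import random

def stari_occupancy(depth=8, cycles=10_000, period=1.0, phase=0.37, jitter=0.2, seed=1):
    """Simulate FIFO occupancy for a mesochronous transmitter/receiver pair.

    Writes occur at n*period and reads at n*period + phase; every edge is
    shifted by uniform random jitter in [-jitter, +jitter]. Returns the
    (min, max) occupancy seen; raises on overflow or underflow.
    """
    rng = random.Random(seed)
    events = []
    for n in range(cycles):
        events.append((n * period + rng.uniform(-jitter, jitter), +1))          # write
        events.append((n * period + phase + rng.uniform(-jitter, jitter), -1))  # read
    events.sort()
    occupancy = depth // 2               # STARI: initialize roughly half full
    lo = hi = occupancy
    for _, delta in events:
        occupancy += delta
        if not 0 <= occupancy <= depth:
            raise RuntimeError(f"FIFO {'overflow' if occupancy > depth else 'underflow'}")
        lo, hi = min(lo, occupancy), max(hi, occupancy)
    return lo, hi

# phase + jitter < period - jitter, so reads and writes reorder by at most one entry
print(stari_occupancy())
```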

MinSTARI
The FIFO reduces to a single latch, latch-X, and a latch controller. Irrespective of the phase relation between FT and FR, FX can always be generated so as to reliably transfer data from input to output.

Latch Controller State Diagram
The controller starts in state 0 and goes to state TR only after it has seen both an FT and an FR event. There are two possible cycles.

The Latch Controller Circuit

Transmitter Clock Event

Receiver Clock Event

Generate FX

Reset aT and aR

Reset c and FX
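
The state diagram's behavior can be summarized in a few lines. The sketch below models only what the slides state: the controller leaves state 0 on the first clock event, reaches TR once both an FT and an FR event have been seen (in either order, which gives the two cycles), generates FX, and resets. The intermediate state names "T" and "R" are hypothetical labels, and treating a repeated event of the same kind as an error is an assumption for the mesochronous case; this is not the gate-level circuit.

```python
class LatchController:
    """Behavioral model of the MinSTARI latch controller state diagram."""

    def __init__(self):
        self.state = "0"
        self.trace = ["0"]
        self.fx_pulses = 0

    def event(self, ev):
        if ev not in ("FT", "FR"):
            raise ValueError(f"unknown event {ev!r}")
        seen = "T" if ev == "FT" else "R"   # hypothetical intermediate state names
        if self.state == "0":
            self._go(seen)
        elif self.state != seen:            # the other clock's event has now arrived
            self._go("TR")
            self.fx_pulses += 1             # generate FX: latch-X captures the data
            self._go("0")                   # reset aT, aR, c and FX
        else:
            raise RuntimeError(f"second {ev} before the other clock event")

    def _go(self, state):
        self.state = state
        self.trace.append(state)

c = LatchController()
for ev in ("FT", "FR", "FR", "FT"):
    c.event(ev)
print(c.trace, c.fx_pulses)   # ['0', 'T', 'TR', '0', 'R', 'TR', '0'] 2
```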

Description of the Solution
A low-latency, high-speed interface is obtained by integrating three major components: a data rate matching FIFO, a pointer tracking circuit, and a digital filter.

Data rate synchronization FIFO
The FIFO is implemented as a circular queue of a given depth. The read and write pointers are expected to reside on different clock grids, and the FIFO acts as a buffer between the two domains.
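
A minimal behavioral model of such a circular queue is sketched below, with each pointer advanced only by its own domain's clock (modeled as separate method calls). It performs no full/empty checks and assumes the pointer tracking logic keeps the pointers apart; the class and method names are hypothetical.

```python
class CircularFifo:
    """Circular queue whose read and write pointers advance independently."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = [None] * depth
        self.wr_ptr = 0   # advanced only by the write-domain clock
        self.rd_ptr = 0   # advanced only by the read-domain clock

    def write_clock(self, data):
        self.entries[self.wr_ptr] = data
        self.wr_ptr = (self.wr_ptr + 1) % self.depth   # wrap around the ring

    def read_clock(self):
        data = self.entries[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        return data

fifo = CircularFifo(depth=8)
for word in range(2):               # pre-fill to a separation of 2 entries
    fifo.write_clock(word)
out = []
for word in range(2, 20):           # one write and one read per (mesochronous) cycle
    fifo.write_clock(word)
    out.append(fifo.read_clock())
print(out == list(range(18)))       # True: data emerges in order across wrap-around
```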

Data rate synchronization FIFO (2)
For mesochronous systems, there is no need to track whether the FIFO is full or empty. No additional logic is required to ensure that the FIFO pointers run at similar frequencies, since both clocks are derived from the same reference.

Data rate synchronization FIFO (3)
For heterochronous systems whose clock frequencies are ratios of one another, a control circuit is required to reduce the effective frequency of the faster clock and ensure that both pointers advance at the same average data rate.
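
One way such a control circuit could thin out the faster clock is an accumulator (Bresenham-style) enable pattern: on each fast edge, add f_slow/f_fast to an accumulator and let the pointer advance only when it overflows. This is an illustrative technique, not necessarily the circuit used in the original design, and the function name is hypothetical. The resulting pattern is non-uniform, one reason a digital filter appears later in the design.

```python
from fractions import Fraction

def throttle_pattern(f_fast, f_slow, n_edges):
    """Per-edge enable pattern (1 = pointer advances) for the faster clock.

    Over time the fraction of enabled fast edges equals f_slow / f_fast,
    so both FIFO pointers advance at the same average data rate.
    """
    ratio = Fraction(f_slow) / Fraction(f_fast)
    if not 0 < ratio <= 1:
        raise ValueError("f_slow must be positive and not exceed f_fast")
    acc, pattern = Fraction(0), []
    for _ in range(n_edges):
        acc += ratio
        if acc >= 1:
            acc -= 1
            pattern.append(1)   # pointer advances on this fast edge
        else:
            pattern.append(0)   # fast edge suppressed
    return pattern

print(throttle_pattern(3, 2, 6))   # [0, 1, 1, 0, 1, 1]
```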

Data rate synchronization FIFO (4)
For plesiochronous systems, the allowable frequency mismatch is limited by the tracking response time of the final design implementation. In all clocking topologies, any difference between the read and write pointer clock rates must be controlled so that it does not exceed the tracking bandwidth of the final design.

Pointer Tracking Circuit
Latency is reduced by minimizing the number of unread entries in the FIFO. The assumption of slow clock drift relaxes the response-time requirement and allows the latency of the tracking circuit to be removed from the data path.

Pointer Tracking Circuit (2)
Possible simplifications: there is no need to evaluate the pointer separation on every clock; one can choose to evaluate the pointers at a convenient time to remove ambiguity as they wrap around the FIFO structure; and pointer information, which is delayed while being locally synchronized, can be treated as the current state of the pointer in the other domain.

Pointer Tracking Circuit (3)
The signal used for pointer tracking is the MSB of the pointer. Because the MSB toggles only once per half trip around the FIFO, it can be safely captured through a simple synchronizer chain of flops in the other clock domain. Detecting the falling edge of the MSB gives a clear indication of when the pointer has wrapped to entry 0 of the FIFO.
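
The sketch below shows the idea on a cycle-level model: the remote pointer's MSB passes through a two-flop synchronizer chain, and a falling edge at the chain output flags the wrap to entry 0, two local cycles after it happened. It assumes a power-of-two FIFO depth and, for simplicity, that the remote pointer is sampled once per local cycle; the function names are hypothetical.

```python
def msb(ptr, depth):
    """Most significant bit of a pointer into a FIFO whose depth is a power of two."""
    return (ptr >> (depth.bit_length() - 2)) & 1

def detect_wraps(remote_ptrs, depth, stages=2):
    """Local-cycle indices at which a wrap of the remote pointer to entry 0 is detected.

    remote_ptrs[i] is the remote pointer value at local cycle i. Only its MSB
    crosses the domain boundary, through a `stages`-deep flop chain; a
    falling edge at the chain output marks the wrap.
    """
    chain = [0] * stages
    prev_out = 0
    hits = []
    for cycle, ptr in enumerate(remote_ptrs):
        out = chain[-1]                          # synchronized MSB seen this cycle
        if prev_out == 1 and out == 0:           # falling edge: pointer wrapped
            hits.append(cycle)
        prev_out = out
        chain = [msb(ptr, depth)] + chain[:-1]   # clock the synchronizer chain
    return hits

print(detect_wraps([i % 8 for i in range(20)], depth=8))   # [10, 18]
```

The pointer wraps at cycles 8 and 16; the two-cycle detection delay is the synchronizer latency that the slides allow to be treated as the "current" state of the other pointer.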

Pointer Tracking Circuit (4)
The circuit is designed to maintain the pointers at a specific, user-programmable separation. Tracking accuracy is a function of the ratio between the clock domains, the digital filter, and the pointer sampling rate. If the design fails in a particular configuration, the pointer separation can be increased to achieve functional operation.

Pointer Tracking Circuit (5)
Relevant quantities: F = number of FIFO entries; S = desired (programmed) separation; E = expected local pointer location; A = actual local pointer location; D = pointer comparison result.
If the local domain is the read pointer: E = F - S and D = A - E.
If the local domain is the write pointer: E = S and D = E - A.

Tracking Logic in RdPtr Domain
In this example, E = 6, and when the Eval signal asserts, A = 5. Thus D = 5 - 6 = -1, and the pointers are detected as being too far apart.
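
The equations translate directly into code. In the usage line, F = 8 and S = 2 are assumed values chosen so that E = F - S = 6, matching the example; the function names are hypothetical.

```python
def expected_location(F, S, local_is_read):
    """E: expected local pointer location when the remote pointer wraps to entry 0."""
    return F - S if local_is_read else S

def pointer_comparison(F, S, A, local_is_read):
    """D: > 0 means pointers too close, < 0 too far apart, 0 on target."""
    E = expected_location(F, S, local_is_read)
    return A - E if local_is_read else E - A

# Slide example in the read-pointer domain, assuming F = 8 and S = 2 (so E = 6)
print(pointer_comparison(F=8, S=2, A=5, local_is_read=True))   # -1: too far apart
```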

Timing Diagram Detecting Pointers Are Drifting Apart

Pointer Adjustment
One clock nominally faster than the other: if the pointers are too close (D > 0), suppress one fast clock; if they are too far apart (D < 0), allow one extra fast clock.
Neither clock nominally faster: if the pointers are too close (D > 0), suppress one clock on the write pointer; if they are too far apart (D < 0), suppress one clock on the read pointer.
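
A direct encoding of these rules, following the slide's sign convention (D > 0 too close, D < 0 too far apart), is sketched below. The "rd"/"wr" labels and the tuple return format are hypothetical conventions for this sketch.

```python
def adjustment(d, fast_side=None):
    """Map the comparison result D to a clock action, per the slide's rules.

    fast_side: "rd" or "wr" if that side's clock is nominally faster, else None.
    Returns (action, target) or None when no adjustment is needed.
    """
    if d == 0:
        return None
    if fast_side is not None:              # one clock nominally faster (throttled)
        return ("suppress", fast_side) if d > 0 else ("extra", fast_side)
    # neither clock nominally faster
    return ("suppress", "wr") if d > 0 else ("suppress", "rd")

print(adjustment(-1))   # ('suppress', 'rd')
```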

Digital Filter
Reasons: the tracking logic is susceptible to metastability on the synchronizer chain, and the data rate matching circuit may produce non-uniform clock patterns.
Example: make an adjustment only if, in m samples, there were n detections of the pointers being too far apart (or, conversely, too close), where n ≤ m is an integer.
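
A minimal n-of-m filter in this spirit keeps the signs of the last m comparison results and reports an adjustment only when at least n of them agree. Clearing the window after each reported adjustment is an assumption of this sketch, so that one adjustment is not triggered twice by the same samples; the class name is hypothetical.

```python
from collections import deque

class MajorityFilter:
    """n-of-m digital filter over the signs of recent comparison results D."""

    def __init__(self, n, m):
        if not 0 < n <= m:
            raise ValueError("need 0 < n <= m")
        self.n = n
        self.samples = deque(maxlen=m)   # oldest sample drops out automatically

    def update(self, d):
        """Feed one D; return +1 (too close), -1 (too far apart) or 0 (no action)."""
        self.samples.append((d > 0) - (d < 0))   # sign of D
        if sum(1 for s in self.samples if s > 0) >= self.n:
            self.samples.clear()         # restart the window after an adjustment
            return +1
        if sum(1 for s in self.samples if s < 0) >= self.n:
            self.samples.clear()
            return -1
        return 0

filt = MajorityFilter(n=2, m=3)
print([filt.update(d) for d in (-1, 0, -1, 1, 1)])   # [0, 0, -1, 0, 1]
```

A single spurious detection, such as one caused by a metastable synchronizer sample, is never enough to trigger an adjustment when n ≥ 2.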

Sampling Uncertainties
By design, any missed event is guaranteed to be captured on the very next clock. This translates to one FIFO entry of uncertainty. The other main contributor to uncertainty is the irregular duty cycle of the throttled clock.

Uncertainty Due to Sample Jitter

Tracking Response Time
F = number of FIFO entries; S = number of samples required by the digital filter; l = period of the FIFO's throttled data rate, typically the clock period of the slow domain; f = maximum clock edge mismatch, i.e., the degree of phase mismatch between the throttled clock and the data-rate clock; y = maximum allowable clock mismatch, in percent.

Tracking Response Time (2)
$y = \dfrac{l}{(F \cdot S + (F-1))\, l + f} \times 100$, where $F \cdot S$ is the total sample time and $F-1$ is the worst-case latency to the first sample (both counted in periods of l).

Tracking Response Time (3)
Simplification: with f = l,
$y = \dfrac{l}{(F \cdot S + (F-1) + 1)\, l} \times 100 = \dfrac{1}{F(S+1)} \times 100$.
By pipelining the throttled clock pattern that controls the faster domain's pointer, the equation is modified to
$y = \dfrac{1}{F(S+1) + P} \times 100$.

Tracking Response Time (4)
Example: an 8-entry FIFO (F = 8), a 3-sample filter (S = 3), 1 clock of uncertainty (f = l), and an 8-clock pipeline (P = 8) give $y = 1/(8 \cdot 4 + 8) \times 100 = 2.5\%$, or 25,000 PPM.
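
The formulas can be evaluated directly to reproduce the example. The function below folds the pipeline term P into the general expression as an extra $P \cdot l$ in the denominator; this matches the simplified form when f = l but is an assumption for other values of f. The function name is hypothetical.

```python
def max_mismatch_pct(F, S, l=1.0, f=None, P=0):
    """Maximum allowable clock mismatch y, in percent.

    F: FIFO entries, S: filter samples, l: throttled data-rate period,
    f: maximum clock edge mismatch (defaults to l), P: pipeline depth in periods of l.
    """
    if f is None:
        f = l
    return l / ((F * S + (F - 1)) * l + f + P * l) * 100

y = max_mismatch_pct(F=8, S=3, P=8)
print(f"y = {y:.2f}% = {y * 1e4:.0f} PPM")   # y = 2.50% = 25000 PPM
```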

Further Refinements
Looking at the pointer separation slightly earlier in time can predict a pointer collision before it actually occurs, for example by inverting the clock on the synchronizer chain. The digital filter can also be optimized to track pointer drift more accurately, avoiding pointer collisions when the pointer separation is reduced.

Conclusions
The design effectively reduces the latency across two clock domains in systems where clock drift is slow but unbounded in duration. The digital nature of the design allows the implementation to scale in frequency without the potential risks of self-timed circuits. The only true constraint on its use is that the domain clock frequencies must be known before the FIFO is activated, to ensure that the pointers advance within the bandwidth of the tracking logic.

A Predictive Synchronizer for Periodic Clock Domains

Synchronizer Architecture

Synchronizer Overview
The synchronizer receives the two clocks and manages safe data transfer in both directions. It produces SEND and RECV control outputs to both domains, indicating when it is safe to send and receive new data on each side, thereby avoiding missed and duplicated data due to mismatched clock frequencies.

Clock Conflicts Prediction
Conflicts can be predicted in advance because of the periodic nature of the two clocks. Assume a conflict occurs at time zero. The next conflict occurs when there exist some N and K such that $N \cdot T_{LOCAL} = K \cdot T_{EXT}$.

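Clock Conflicts Prediction (2)
Find the smallest $D \ge 0$ such that $T_{LOCAL} + D = M \cdot T_{EXT}$ for some integer M. Then
$(N-1) \cdot T_{LOCAL} = K \cdot T_{EXT} - T_{LOCAL} = (K-M) \cdot T_{EXT} + D$,
so the external clock delayed by D has a rising edge exactly one local cycle before the next conflict. Conflict prediction is therefore achieved by creating a Predictive Clock, which is a version of the external clock delayed by D.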
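
The derivation can be checked with exact rational arithmetic. The sketch computes the first conflict (N, K) and the predictive delay D, then verifies that the external clock delayed by D has an edge exactly at $(N-1) \cdot T_{LOCAL}$. Periods are given as `Fraction` values; the function names and the example periods are hypothetical.

```python
from fractions import Fraction
import math

def first_conflict(t_local, t_ext):
    """Smallest positive N, K with N * T_LOCAL = K * T_EXT (periods as Fractions)."""
    ratio = t_ext / t_local            # equals N / K, reduced to lowest terms
    return ratio.numerator, ratio.denominator

def predictive_delay(t_local, t_ext):
    """Smallest D >= 0 with T_LOCAL + D = M * T_EXT for an integer M; returns (D, M)."""
    m = math.ceil(t_local / t_ext)
    return m * t_ext - t_local, m

def prediction_holds(t_local, t_ext):
    """Check that the predicted clock has an edge one local cycle before the conflict."""
    n, k = first_conflict(t_local, t_ext)
    d, m = predictive_delay(t_local, t_ext)
    t = (n - 1) * t_local
    return t == (k - m) * t_ext + d and (t - d) % t_ext == 0

t_local, t_ext = Fraction(3), Fraction(2)      # e.g. 3 ns local, 2 ns external
print(first_conflict(t_local, t_ext), predictive_delay(t_local, t_ext))
```

For these periods N = 2, K = 3 and D = 1 ns: the predicted clock edges at 1, 3, 5, ... ns meet the local edge at 3 ns, one local cycle before the external and local clocks coincide at 6 ns.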

Clock Conflicts Prediction (3)
The predicted and local clocks conflict one $T_{LOCAL}$ cycle before the imminent conflict of the external and local clocks. Sampling of the input (which is controlled by RxCK) is then delayed by a keep-out time $T_{KO}$, where $T_{KO} > d_Z$.

Conflict Detector
FF1 and FF2 effectively sample Clk2 a time d after and a time d before the rising edge of Clk1, respectively. Either FF may become metastable; one half cycle of Clk1 is allotted for metastability resolution. If Clk2 has risen during the 2d detection period, the top AND gate is enabled and the Conflict output is generated.
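
An idealized model of the detector samples Clk2 at d before and d after a Clk1 rising edge and flags a conflict if the first sample is low and the second is high. It assumes ideal waveforms with a 50% duty cycle (high time longer than 2d) and ignores metastability, which the real circuit resolves during the half cycle of Clk1 allotted to it; the function names and timing values are hypothetical.

```python
def clk_level(t, period, phase=0.0, duty=0.5):
    """Ideal clock waveform: high for the first `duty` fraction of each period after `phase`."""
    return (t - phase) % period < duty * period

def conflict_detected(t_edge1, period2, phase2, d):
    """True if Clk2 rises inside the 2d window around a Clk1 rising edge at t_edge1."""
    ff2 = clk_level(t_edge1 - d, period2, phase2)   # FF2: Clk2 sampled d before the edge
    ff1 = clk_level(t_edge1 + d, period2, phase2)   # FF1: Clk2 sampled d after the edge
    return (not ff2) and ff1                         # low before, high after: Clk2 rose

# Clk2: period 10, rising edges at 0, 10, 20, ...; window half-width d = 0.5
print([conflict_detected(t, 10, 0, 0.5) for t in (9.0, 9.8, 10.3, 13.0)])
# [False, True, True, False]
```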

Computing Clock Cycle Time
The circuit starts with minimal delay and increases (or decreases) the delay until it equals a full cycle. The clock divider and flip-flop provide a loop delay of two local clock cycles. The time resolution of the conflict detector must be larger than the adjustment step. Once the lower delay line has converged to $T_{LOCAL}$, its programming code is copied to the upper delay line.
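
The calibration loop can be sketched as an upward sweep over a delay code. This behavioral model ignores the two-cycle loop delay and the DLL dynamics and simply steps the delay until the delayed clock edge falls within the detector window (±d) of the next local edge. It also illustrates why the detector's resolution must exceed the adjustment step: a window at least one step wide cannot be jumped over. The function name, units and values are hypothetical.

```python
def calibrate_delay_line(t_local, step, d, max_code=100_000):
    """Sweep a delay code upward from the minimum until the delayed clock edge
    lands within +/- d of the next local clock edge (delay ~= T_LOCAL).

    Returns (code, delay); the resulting precision is d.
    """
    if step > 2 * d:
        # A step wider than the 2d detection window could jump over it
        raise ValueError("adjustment step must not exceed the detector window")
    for code in range(1, max_code + 1):
        delay = code * step
        if abs(delay - t_local) <= d:      # conflict detector fires
            return code, delay
    raise RuntimeError("delay line range exhausted before reaching T_LOCAL")

print(calibrate_delay_line(t_local=1000, step=7, d=5))   # (143, 1001), e.g. in ps
```

In the circuit described above, the code found this way would then be copied to the upper delay line.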

Computing Clock Cycle Time (2)
The $T_{LOCAL}$ unit safely computes the cycle time with precision $d_L$. The DLL convergence time is:

Clock Predictor
The "Predicted clock" output provides a copy of the external clock, delayed by D, one local cycle time in advance. The loop delay must be the maximum of the two clock cycles.

Rate Reducer
The delay introduced by the Rate Reducer between successive adjustments is $4T_{LOCAL} + 4T_{EXT}$. The total tuning time of Programmable Delay 1 is:

Clock Predictor Precision
The Clock Predictor safely generates a delayed version of the external clock that periodically precedes its original version by $T_{LOCAL}$ with precision

Conflict Prevention Circuit
The $d_C$-conflict detector produces the Keep-Out signal upon a $d_C$ conflict of the local and predicted clocks. The Clock Select circuit produces RxCK depending on Keep-Out: RxCK is either the original local clock (when no conflict is predicted) or the local clock delayed by $T_{KO}$ (when a conflict is predicted).
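
Putting the pieces together, the sketch below generates RxCK edges: a predicted-clock edge within $d_C$ of a local edge asserts Keep-Out, and the next RxCK edge (where the external clock is about to conflict) is delayed by $T_{KO}$. It is an idealized model with hypothetical values (periods of 10 and 7 time units, $d_C = 1$, $T_{KO} = 3$), and it does not model the duplicate and miss control.

```python
from fractions import Fraction
import math

def rx_clock_edges(t_local, t_ext, d_c, t_ko, n_cycles):
    """Idealized RxCK rising edges for the first n_cycles local clock edges."""
    t_local, t_ext = Fraction(t_local), Fraction(t_ext)
    d = math.ceil(t_local / t_ext) * t_ext - t_local   # predictive delay D
    edges, keep_out = [], False
    for n in range(n_cycles):
        t = n * t_local
        # Clock Select: delay this edge by T_KO if the previous cycle predicted a conflict
        edges.append(t + t_ko if keep_out else t)
        # d_C-conflict detector: distance from this local edge to the nearest
        # predicted-clock edge (external edges k*T_EXT shifted by D)
        offset = (t - d) % t_ext
        keep_out = min(offset, t_ext - offset) <= d_c
    return edges

rx = rx_clock_edges(10, 7, d_c=1, t_ko=3, n_cycles=8)
print([int(e) for e in rx])   # [0, 10, 23, 30, 40, 53, 60, 73]
```

In this run, Keep-Out is asserted at the local edges at 10, 40 and 60, so the RxCK edges at 20, 50 and 70, which would otherwise sit within 1 unit of the external edges at 21, 49 and 70, are moved to 23, 53 and 73.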

Prediction Timing Diagram

$T_{KO}$ Constraint
Definitions: R is the rising-edge event of RxCK; D is an event R that occurs when Keep-Out = 1.
Theorem: If L1 and P occur within $d_C$ of each other, then D and E are safely separated by at least $d_Z$.

TKO constraint(2) Proof: Need to confirm that: By definition:

Avoiding Misses and Duplicates

Duplicate and Miss Control Circuit

Duplicate and Miss Control Circuit(2)

Conclusions
The synchronizer takes advantage of the periodic nature of the clocks to predict potential conflicts in advance, and conditionally employs an input sampling delay to avoid them. It adjusts automatically to a wide range of clock frequencies and avoids sampling duplicate data or missing any input.

References
W. L. Williams, P. E. Madrid, and S. C. Johnson, "Low Latency Clock Domain Transfer for Simultaneously Mesochronous, Plesiochronous and Heterochronous Interfaces," Proc. 13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07), pp. 196-204, 2007.
J. N. Seizovic, "Pipeline Synchronization," Proc. 1st International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 87-96, 1994.
A. Chakraborty and M. R. Greenstreet, "Efficient Self-Timed Interfaces for Crossing Clock Domains," Proc. 9th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'03), pp. 78-88, 2003.
U. Frank, T. Kapschitz, and R. Ginosar, "A Predictive Synchronizer for Periodic Clock Domains," Formal Methods in System Design (special issue on Formal Methods for Globally Asynchronous Locally Synchronous Design), 28(2):171-186, 2006.