1
Synchronizers for Low Latency Clock Domain Transfer
Presented by Dmitry Verbitsky
2
Exactly Matched Frequencies
All domains operate from the same clock.
Skews may be arbitrary.
Skew may vary due to clock jitter, power supply noise, temperature variations, etc.
3
Rationally Related Frequencies
Clocks are derived from a common source.
Clock frequencies are rational multiples of each other.
4
Closely Matched Frequencies
Clocks are derived from independent sources.
Clocks are very closely matched in frequency.
5
Arbitrary Frequencies
Clocks are derived from independent sources.
Clocks can have any arbitrary frequencies.
Clock frequencies are assumed to be relatively stable, which is satisfied by nearly all synchronous designs.
6
Clock Mismatch Sources
Difference in insertion delays between the two independent clock grids
Reference clock distribution networks
Accumulated phase error between independent PLL sources
Primary clock distribution networks
7
Clock Mismatch Sources(2)
PVT variations
Variation in parameters of the wires
Different sizing of each buffering stage
Presence of adjacent wires and the amount of switching activity between them
8
Interfacing Synchronous and Asynchronous Systems
To achieve a sufficiently small probability of synchronization failure of a single asynchronous input, all that is required is to allow a sufficiently long time for the synchronizer to exit the metastable state.
9
Pipelined Synchronization
Instead of transferring W bits every 1/E seconds, one can transfer kW bits every k/E seconds in order to allow k times as much time for synchronization.
10
STARI (Self Timed At Receiver’s Input)
Transmitter and receiver are mesochronous.
If the FIFO is initialized to be roughly half full, it remains roughly half full throughout operation.
The need to check for overflow and underflow is avoided.
Doesn't require the absolute synchronization of purely synchronous methods.
Doesn't require the explicit flow control mechanism of purely asynchronous methods.
11
MinSTARI: FIFO reduces to a single latch (latch-X) and a latch controller
Irrespective of the phase relations between FT and FR, FX can always be generated in such a way as to reliably transfer data from input to output
12
Latch Controller State Diagram
Initially starts at state 0.
Goes to state TR only when it has seen both FT and FR events.
There are 2 possible cycles.
13
The Latch Controller Circuit
14
Transmitter Clock Event
15
Receiver Clock Event
16
Generate FX
17
Reset aT and aR
18
Reset c and FX
19
Description of the Solution
Low latency, high-speed interface through the integration of three major components:
Data rate matching FIFO
Pointer tracking circuit
Digital filter
20
Data rate synchronization FIFO
Implemented as a circular queue of a given depth.
Read and write pointers are expected to exist on different clock grids.
FIFO acts as a buffer between the two domains.
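The circular-queue organization described on this slide can be sketched in software as follows. This is only an illustrative model, assuming a hypothetical class name `CircularFifo` and ignoring clocking entirely; in hardware the two pointers would be advanced by different clock domains. As in the mesochronous case, it performs no full/empty checks.

```python
class CircularFifo:
    """Fixed-depth circular queue with independent read and write pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = [None] * depth
        self.wr_ptr = 0  # would be advanced by the writer's clock domain
        self.rd_ptr = 0  # would be advanced by the reader's clock domain

    def write(self, value):
        # Store at the write pointer, then advance it with wrap-around.
        self.entries[self.wr_ptr] = value
        self.wr_ptr = (self.wr_ptr + 1) % self.depth

    def read(self):
        # Fetch from the read pointer, then advance it with wrap-around.
        value = self.entries[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        return value

    def unread(self):
        # Pointer separation modulo the depth (ambiguous if the FIFO overflows).
        return (self.wr_ptr - self.rd_ptr) % self.depth
```

The value returned by `unread()` is the pointer separation that the tracking circuit tries to keep small and constant.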
21
Data rate synchronization FIFO(2)
For mesochronous systems, there is no need to track whether the FIFO is full or empty.
No additional logic is required to ensure the FIFO pointers are running at similar frequencies, since both clocks are derived from the same reference.
22
Data rate synchronization FIFO(3)
For heterochronous systems whose clocks are ratios of one another, a control circuit is required to reduce the frequency of the faster clock and ensure both pointers are running at the same average data rate
23
Data rate synchronization FIFO(4)
For plesiochronous systems, the allowable frequency mismatch is limited by the tracking response time of the final design implementation.
In all clocking topologies, any difference between the read and write pointer clock rates must be controlled so that it does not exceed the tracking bandwidth of the final design.
24
Pointer Tracking Circuit
Minimizing the number of unread entries in the FIFO reduces latency.
The slow clock drift assumption relaxes the response time requirement and allows the latency of the tracking circuit to be removed from the data path.
25
Pointer Tracking Circuit(2)
Possible simplifications:
No need to evaluate pointer separation on every clock.
One can choose to evaluate pointers at a convenient time to remove ambiguity as they wrap around the FIFO structure.
Pointer information, which is delayed while being locally synchronized, can be treated as the current state of the pointer in the other domain.
26
Pointer Tracking Circuit(3)
The signal used for pointer tracking is the MSB of the pointer.
This ensures that the signal is safely captured by a simple synchronizer chain of flops in the other clock domain.
Detecting the falling edge of the MSB gives a clear indication of when the pointer has wrapped to entry 0 of the FIFO.
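As a rough illustration of the wrap detection described here, the sketch below scans a sequence of pointer values and reports where the MSB falls from 1 to 0. The function names `msb` and `wrap_events` are hypothetical, and the sketch assumes a FIFO depth that is a power of two; synchronizer delay and metastability are not modeled.

```python
def msb(pointer, depth):
    """Most significant bit of a pointer into a FIFO whose depth is a power of two."""
    return (pointer >> (depth.bit_length() - 2)) & 1


def wrap_events(pointers, depth):
    """Indices in `pointers` where the MSB falls from 1 to 0 (pointer wrapped to entry 0)."""
    events = []
    prev = msb(pointers[0], depth)
    for i, p in enumerate(pointers[1:], start=1):
        cur = msb(p, depth)
        if prev == 1 and cur == 0:
            events.append(i)
        prev = cur
    return events
```

For an 8-entry FIFO the MSB is 0 for entries 0 to 3 and 1 for entries 4 to 7, so it toggles only twice per trip around the FIFO.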
27
Pointer Tracking Circuit(4)
Designed to maintain the pointers at a specific, user-programmable separation.
Tracking accuracy is a function of the ratios between the clock domains, the digital filter and the pointer sampling rate.
If the design is failing in a particular configuration, the pointer separation can be increased to achieve functional operation.
28
Pointer Tracking Circuit(5)
Relevant equations:
F = Number of FIFO entries
S = Desired/programmed separation
E = Expected local pointer location
A = Actual local pointer location
D = Pointer comparison result
If the local domain is the read pointer: E = F - S and D = A - E
If the local domain is the write pointer: E = S and D = E - A
29
Tracking Logic in RdPtr Domain
In this example, E = 6 and, when the Eval signal asserts, A = 5.
Thus D = 5 - 6 = -1, and the pointers are detected as being too far apart.
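The pointer comparison can be written out as a small function. This is a sketch only: the function name `pointer_comparison` is hypothetical, and the values F = 8 and S = 2 used below are an assumption chosen so that E = F - S = 6 matches the example; the slide does not state them.

```python
def pointer_comparison(f, s, a, local_is_read):
    """Return D for FIFO size f, programmed separation s and actual local pointer a."""
    if local_is_read:
        e = f - s      # expected read-pointer location
        return a - e   # D = A - E
    e = s              # expected write-pointer location
    return e - a       # D = E - A


# Read-pointer domain, assumed F = 8 and S = 2 so that E = 6, with A = 5.
d = pointer_comparison(8, 2, 5, local_is_read=True)
print(d)  # -1: pointers too far apart
```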
30
Timing Diagram Detecting Pointers Are Drifting Apart
31
Pointer Adjustment
One clock is nominally faster than the other:
Pointers too close (D > 0): suppress one fast clock
Pointers too far (D < 0): allow one extra fast clock
Neither clock is nominally faster:
Pointers too close (D > 0): suppress one clock on the write pointer
Pointers too far (D < 0): suppress one clock on the read pointer
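The adjustment rules on this slide amount to a small decision table, sketched below. The function name `pointer_adjustment` and the returned strings are hypothetical, and the "no adjustment" result for D = 0 is an assumption, since the slide only lists the D > 0 and D < 0 cases.

```python
def pointer_adjustment(d, one_clock_faster):
    """Map the comparison result D to the clock adjustment listed on the slide."""
    if d == 0:
        return "no adjustment"  # assumption: the slide does not cover D = 0
    if one_clock_faster:
        # One clock is nominally faster: throttle or release the fast clock.
        return "suppress one fast clock" if d > 0 else "allow one extra fast clock"
    # Neither clock is nominally faster: hold back whichever pointer is ahead.
    if d > 0:
        return "suppress one clock on the write pointer"
    return "suppress one clock on the read pointer"
```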
32
Digital Filter
Reasons:
Tracking logic is susceptible to metastability on the synchronizer chain.
Data rate matching circuit may produce non-uniform clock patterns.
Example:
Make an adjustment only if, in m samples, there were n detections of the pointers being too far apart (or, conversely, too close), where n is an integer.
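One way to picture the n-of-m rule is a sliding-window vote over the most recent comparison results. The sketch below is an assumption about how such a filter might behave, not the circuit from the slides: the class name `VoteFilter`, the sliding window, and clearing the window after an adjustment are all illustrative choices.

```python
from collections import deque


class VoteFilter:
    """Request an adjustment only after n same-direction detections in the last m samples."""

    def __init__(self, m, n):
        self.n = n
        self.window = deque(maxlen=m)  # keeps only the m most recent samples

    def update(self, d):
        """Record the sign of D; return +1 (too close), -1 (too far) or 0 (no action)."""
        self.window.append((d > 0) - (d < 0))
        too_close = sum(1 for s in self.window if s > 0)
        too_far = sum(1 for s in self.window if s < 0)
        if too_close >= self.n:
            result = 1
        elif too_far >= self.n:
            result = -1
        else:
            result = 0
        if result:
            self.window.clear()  # start a fresh window after each adjustment
        return result
```

With m = 4 and n = 3, a single spurious sample caused by metastability or a non-uniform clock pattern does not trigger an adjustment on its own.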
33
Sampling Uncertainties
By design, any missed event is guaranteed to be captured on the very next clock.
This translates to one FIFO entry of uncertainty.
The other main contributor to uncertainty is the irregular duty cycle of the throttled clock.
34
Uncertainty Due to Sample Jitter
35
Tracking Response Time
F = Number of FIFO entries
S = Number of samples required by the digital filter
l = FIFO throttled data rate, typically the clock period of the slow domain
f = Maximum clock edge mismatch: the degree of phase mismatch between the throttled clock and the data-rate clock
y = Maximum percentage of allowable clock mismatch
36
Tracking Response Time(2)
y = l / (((F*S) + (F-1))*l + f) * 100
F*S: total sample time
(F-1): worst-case latency to the first sample
37
Tracking Response Time(3)
Simplification: f = l
y = l / (((F*S) + (F-1) + 1)*l) * 100 = 1 / (F*(S+1)) * 100
By pipelining the throttled clock pattern that controls the faster domain's pointer, the equation is modified to:
y = 1 / (F*(S+1) + P) * 100
38
Tracking Response Time(4)
Example:
8-entry FIFO (F)
3-sample filter (S)
1 clock of uncertainty (f)
8-clock pipeline (P)
y = 1 / (8*(3+1) + 8) * 100 = 2.5%, or 25,000 PPM
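The tracking response formulas can be checked numerically. The sketch below uses hypothetical function names; `mismatch_general` follows the general formula with l and f, and `mismatch_simplified` follows the simplified form with f = l and an optional pipeline depth P.

```python
def mismatch_general(f_entries, samples, l, f):
    """y = l / ((F*S + (F-1))*l + f) * 100, in percent."""
    return l / ((f_entries * samples + (f_entries - 1)) * l + f) * 100


def mismatch_simplified(f_entries, samples, pipeline=0):
    """y = 1 / (F*(S+1) + P) * 100, in percent (assumes f = l)."""
    return 100 / (f_entries * (samples + 1) + pipeline)


# Example from the slide: F = 8, S = 3, P = 8.
y = mismatch_simplified(8, 3, pipeline=8)
print(y)           # 2.5 (percent)
print(y * 10_000)  # 25000.0 (parts per million)
```

Setting f equal to l in the general formula gives the same value as the simplified form with P = 0, which is a quick check that the simplification is consistent.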
39
Further Refinements
Looking at the pointer separation slightly earlier in time can predict a pointer collision before it actually occurs. For example, invert the clock on the synchronizer chain.
Optimization of the digital filter by more accurate tracking of pointer drift, to avoid pointer collision when reducing their separation.
40
Conclusions
Design effectively reduces the latency across two clock domains in systems where the clock drift is slow but unbounded in duration.
The digital nature of the design allows the implementation to scale in frequency without the potential risk of self-timed circuits.
The only true constraint on its use is that the domain clock frequencies must be known prior to activating the FIFO, to ensure that pointers are advancing within the bandwidth of the tracking logic.
41
A Predictive Synchronizer for Periodic Clock Domains
42
Synchronizer Architecture
43
Synchronizer Overview
Receives the two clocks and manages safe data transfer both ways.
Produces SEND and RECV control outputs to both domains, indicating when it is safe to receive and send new data on both sides, avoiding data misses and duplicates due to mismatched clock frequencies.
44
Clock Conflicts Prediction
Conflicts can be predicted in advance because both clocks are periodic.
Assume a conflict occurs at time zero.
The next conflict occurs when there exist some N and K such that: N*TLOCAL = K*TEXT
45
Clock Conflicts Prediction(2)
Find the smallest D such that: TLOCAL + D = M*TEXT
(N-1)*TLOCAL = K*TEXT - TLOCAL
(N-1)*TLOCAL = (K-M)*TEXT + D
Conflict prediction is achieved by creating a Predictive Clock, which is a version of the external clock delayed by D.
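The relations on this slide can be worked through with exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes both periods are given as integers in a common time unit (for example picoseconds), so that N and K exist and `Fraction` arithmetic stays exact.

```python
from fractions import Fraction
from math import ceil


def next_conflict(t_local, t_ext):
    """Smallest positive N, K with N*t_local == K*t_ext."""
    ratio = Fraction(t_local, t_ext)  # equals K/N in lowest terms
    return ratio.denominator, ratio.numerator


def predictor_delay(t_local, t_ext):
    """Smallest non-negative D, and the matching M, with t_local + D == M*t_ext."""
    m = ceil(Fraction(t_local, t_ext))
    return m * t_ext - t_local, m


# Assumed example periods: TLOCAL = 3, TEXT = 5 (arbitrary units).
n, k = next_conflict(3, 5)
d, m = predictor_delay(3, 5)
print(n, k, d, m)  # 5 3 2 1
# Check the slide's identity: (N-1)*TLOCAL == (K-M)*TEXT + D
print((n - 1) * 3 == (k - m) * 5 + d)  # True
```

In this example the external clock delayed by D = 2 lines up with the local clock at time 12, one local cycle before the real conflict at time 15.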
46
Clock Conflicts Prediction(3)
Predicted and Local clocks conflict one TLOCAL cycle before the imminent conflict of the External and Local clocks.
Sampling the input (which is affected by RxCK) is delayed by a keep-out time TKO, where TKO > dZ.
47
Conflict Detector
FF1 and FF2 effectively sample Clk2 d time after and d time before the rising edge of Clk1, respectively.
Either FF may become metastable.
One half cycle of Clk1 is allotted for metastability resolution.
If Clk2 has risen during the 2d detection period, the top AND gate is enabled and the Conflict output is generated.
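The detection window can be modeled behaviorally. The sketch below is an idealized assumption rather than the circuit itself: it treats Clk2 as a 50% duty-cycle square wave, ignores metastability, requires 2d to be shorter than half a Clk2 period, and takes "Clk2 has risen during the window" to mean FF2 sampled 0 and FF1 sampled 1. The function names are hypothetical.

```python
def clk_level(t, period, rise_time):
    """Level of an ideal 50% duty-cycle clock with rising edges at rise_time + k*period."""
    return 1 if (t - rise_time) % period < period / 2 else 0


def conflict(clk1_edge, d, clk2_period, clk2_rise):
    """True if Clk2 rises within d before or d after the Clk1 rising edge."""
    ff1 = clk_level(clk1_edge + d, clk2_period, clk2_rise)  # Clk2 sampled d after the edge
    ff2 = clk_level(clk1_edge - d, clk2_period, clk2_rise)  # Clk2 sampled d before the edge
    return ff1 == 1 and ff2 == 0
```

For example, with a Clk2 period of 100 time units rising at t = 0 and d = 10, a Clk1 edge at t = 5 is flagged as a conflict, while an edge at t = 30 is not.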
48
Computing Clock Cycle Time
Circuit starts with minimal delay and increases (or decreases) the delay until it is equal to a full cycle.
The clock divider and flip-flop provide a loop delay (of two local clock cycles).
Time resolution of the Conflict detector must be larger than the adjustment step.
Once the lower delay line has converged to TLOCAL, its programming code is copied to the upper delay line.
49
Computing Clock Cycle Time(2)
The TLOCAL unit safely computes cycle time with precision dL.
DLL convergence time is:
50
Clock Predictor
"Predicted clock" output provides a copy of the external clock, delayed by D, one local cycle time in advance.
Loop delay must be the maximum of the two clock cycles.
51
Rate Reducer
The delay introduced by the Rate Reducer between successive adjustments is 4TLOCAL + 4TEXT.
Total tuning time of Programmable Delay 1 is:
52
Clock Predictor Precision
Clock Predictor safely generates a delayed version of the external clock that periodically precedes its original version by TLOCAL with precision
53
Conflict Prevention Circuit
The dC-conflict detector produces the Keep-Out signal upon a dC conflict of the local and predicted clocks.
The Clock Select circuit produces RxCK depending on Keep-Out.
RxCK is either the original local clock (when there is no predicted conflict) or the TKO-delayed local clock (when a conflict is predicted).
54
Prediction Timing Diagram
55
TKO constraint
Definition:
R: The rising edge event of RxCK
D: Event R when Keep-Out = 1
Theorem: If L1 and P occur within dC time of each other, then D and E are safely separated by at least dZ.
56
TKO constraint(2)
Proof:
Need to confirm that:
By definition:
57
Avoiding Misses and Duplicates
58
Duplicate and Miss Control Circuit
59
Duplicate and Miss Control Circuit(2)
60
Conclusions
Synchronizer takes advantage of the periodic nature of the clocks in order to predict potential conflicts in advance, and to conditionally employ an input sampling delay to avoid such conflicts.
Adjusts automatically to a wide range of clock frequencies.
Avoids sampling duplicate data or missing any input.
61
References
Wade L. Williams, Philip E. Madrid, Scott C. Johnson, "Low Latency Clock Domain Transfer for Simultaneously Mesochronous, Plesiochronous and Heterochronous Interfaces," 13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07), pp , 2007.
J.N. Seizovic, "Pipeline Synchronization," Proceedings of the 1st International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp , 1994.
A. Chakraborty and M.R. Greenstreet, "Efficient Self-Timed Interfaces for Crossing Clock Domains," Proceedings 9th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'03), pp , 2003.
U. Frank, T. Kapschitz and R. Ginosar, "A Predictive Synchronizer for Periodic Clock Domains," J. Formal Methods in System Design (special issue on Formal Methods for Globally Asynchronous Locally Synchronous Design), 28(2): , 2006.