Synchronizers for Low Latency Clock Domain Transfer Presented by Dmitry Verbitsky
Exactly Matched Frequencies
- All domains operate from the same clock
- Skews may be arbitrary
- Skew may vary due to clock jitter, power supply noise, temperature variations, etc.

Rationally Related Frequencies
- Clocks are derived from a common source
- Clock frequencies are rational multiples of each other

Closely Matched Frequencies
- Clocks are derived from independent sources
- Clocks are very closely matched in frequency

Arbitrary Frequencies
- Clocks are derived from independent sources
- Clocks can be of arbitrary frequency
- Clock frequencies are assumed to be relatively stable, which is satisfied by nearly all synchronous designs
Clock Mismatch Sources
- Difference in insertion delays between the two independent clock grids
- Reference clock distribution networks
- Accumulated phase error between independent PLL sources
- Primary clock distribution networks

Clock Mismatch Sources(2)
- PVT variations
- Variation in parameters of the wires
- Different sizing of each buffering stage
- Presence of adjacent wires and the amount of switching activity between them
Interfacing Synchronous and Asynchronous Systems
- To achieve a sufficiently small probability of synchronization failure of a single asynchronous input, all that is required is to allow a sufficiently long time for the synchronizer to exit the metastable state.
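The effect of resolution time on failure probability can be illustrated with the standard first-order metastability model, MTBF = exp(t_r / tau) / (T_W * f_clock * f_data). This model and every number below are assumptions added for illustration; the slide itself states only the qualitative claim.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clock, f_data):
    """Mean time between synchronization failures, in seconds.

    First-order model: failures become exponentially rarer as the time
    allowed for metastability resolution (t_resolve) grows.
    """
    return math.exp(t_resolve / tau) / (t_window * f_clock * f_data)

# Hypothetical device and clock parameters, chosen only for illustration.
TAU = 50e-12        # metastability time constant
T_WINDOW = 20e-12   # metastability capture window
F_CLOCK = 1e9       # sampling clock frequency
F_DATA = 100e6      # rate of asynchronous input transitions

one_cycle = synchronizer_mtbf(1e-9, TAU, T_WINDOW, F_CLOCK, F_DATA)
two_cycles = synchronizer_mtbf(2e-9, TAU, T_WINDOW, F_CLOCK, F_DATA)

# Doubling the resolution time multiplies the MTBF by exp(1e-9 / TAU).
improvement = two_cycles / one_cycle
```

Under this model, each extra clock period of waiting multiplies the MTBF by the same exponential factor, which is why a "sufficiently long time" is enough.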
Pipelined Synchronization
- Instead of transferring W bits every 1/E seconds, one can transfer kW bits every k/E seconds in order to allow k times as much time for synchronization.
STARI (Self Timed At Receiver’s Input)
- Transmitter and receiver are mesochronous
- If the FIFO is initialized to be roughly half full, it remains roughly half full throughout operation
- The need to check for overflow and underflow is avoided
- Doesn’t require the absolute synchronization of purely synchronous methods
- Doesn’t require the explicit flow control mechanism of purely asynchronous methods

MinSTARI
- FIFO reduces to a single latch (latch-X) and a latch controller
- Irrespective of the phase relation between FT and FR, FX can always be generated in such a way as to reliably transfer data from input to output

Latch Controller State Diagram
- Initially starts at 0
- Goes to state TR only when it has seen both FT and FR events
- 2 possible cycles:
The Latch Controller Circuit
Transmitter Clock Event
Receiver Clock Event
Generate FX
Reset aT and aR
Reset c and FX
Description of the Solution
- Low latency, high-speed interface through the integration of three major components:
  - Data rate matching FIFO
  - Pointer tracking circuit
  - Digital filter

Data rate synchronization FIFO
- Implemented as a circular queue of a given depth
- Read and write pointers are expected to exist on different clock grids
- FIFO acts as a buffer between the two domains

Data rate synchronization FIFO(2)
- For mesochronous systems there is no need to track whether the FIFO is full or empty
- No additional logic is required to ensure the FIFO pointers are running at similar frequencies, since both clocks are derived from the same reference

Data rate synchronization FIFO(3)
- For heterochronous systems whose clocks are ratios of one another, a control circuit is required to reduce the frequency of the faster clock and ensure both pointers run at the same average data rate

Data rate synchronization FIFO(4)
- For plesiochronous systems the allowable frequency mismatch is limited by the tracking response time of the final design implementation
- In all clocking topologies, any difference between the read and write pointer clock rates must be controlled so that it does not exceed the tracking bandwidth of the final design
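A minimal behavioral sketch of the circular-queue FIFO described on these slides, assuming the mesochronous case where both pointers advance at the same rate and no full/empty flags are kept. The class name `RateMatchingFifo` and its methods are hypothetical; real hardware would advance each pointer on its own clock grid.

```python
class RateMatchingFifo:
    """Circular queue with independent read and write pointers."""

    def __init__(self, depth):
        self.entries = [None] * depth
        self.wr_ptr = 0  # advances in the write clock domain
        self.rd_ptr = 0  # advances in the read clock domain

    def write(self, value):
        self.entries[self.wr_ptr] = value
        self.wr_ptr = (self.wr_ptr + 1) % len(self.entries)

    def read(self):
        value = self.entries[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % len(self.entries)
        return value

    def separation(self):
        """Number of unread entries between the two pointers."""
        return (self.wr_ptr - self.rd_ptr) % len(self.entries)


fifo = RateMatchingFifo(depth=8)
for v in range(3):          # pre-fill so the pointers start 3 entries apart
    fifo.write(v)

received = []
for v in range(3, 20):      # one write and one read per cycle: equal rates
    fifo.write(v)
    received.append(fifo.read())
```

With equal rates the separation never changes, so data arrives in order without any overflow or underflow check. A frequency mismatch would make `separation()` drift, which is what the pointer tracking circuit has to correct.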
Pointer Tracking Circuit
- Minimizing the number of unread entries in the FIFO reduces latency
- The slow clock drift assumption relaxes the response time requirement and allows the latency of the tracking circuit to be removed from the data path

Pointer Tracking Circuit(2)
- Possible simplifications:
  - No need to evaluate pointer separation on every clock
  - One can choose to evaluate pointers at a convenient time to remove ambiguity as they wrap around the FIFO structure
  - Pointer information, which is delayed while being locally synchronized, can be treated as the current state of the pointer in the other domain

Pointer Tracking Circuit(3)
- Signal for pointer tracking is the MSB of the pointer
- Ensures that the signal will be safely captured through a simple synchronizer chain of flops in the other clock domain
- By detecting the falling edge of the MSB, one has a clear indication of when the pointer has wrapped to entry 0 of the FIFO
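The MSB falling-edge idea can be sketched behaviorally as follows. This assumes a power-of-two FIFO depth and ignores the delay of the synchronizer chain; the function name `wrap_events` is hypothetical.

```python
def wrap_events(pointer_values, depth):
    """Return the sample indices at which the pointer MSB falls from 1 to 0.

    For a power-of-two depth, the MSB is 1 exactly when the pointer is in
    the upper half of the FIFO, so a 1 -> 0 transition marks a wrap to entry 0.
    """
    events = []
    previous_msb = 0
    for index, pointer in enumerate(pointer_values):
        msb = 1 if pointer >= depth // 2 else 0
        if previous_msb == 1 and msb == 0:
            events.append(index)
        previous_msb = msb
    return events


# A pointer counting 0..7 repeatedly through an 8-entry FIFO.
trace = [i % 8 for i in range(20)]
wraps = wrap_events(trace, depth=8)
```

Because the MSB toggles only twice per trip around the FIFO, it changes slowly enough to pass through a plain flop chain, and its falling edge gives one unambiguous evaluation point per wrap.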
Pointer Tracking Circuit(4)
- Designed to maintain the pointers at a specific, user programmable separation
- Tracking accuracy is a function of the ratios between the clock domains, the digital filter and the pointer sampling rate
- If the design is failing in a particular configuration, the pointer separation can be increased to achieve functional operation

Pointer Tracking Circuit(5)
- Relevant equations:
  F = Number of FIFO entries
  S = Desired/Programmed separation
  E = Expected local pointer location
  A = Actual local pointer location
  D = Pointer comparison result
- If the local domain is the read pointer: E = F - S and D = A - E
- If the local domain is the write pointer: E = S and D = E - A

Tracking Logic in RdPtr Domain
- In this example, E = 6 and, when the Eval signal asserts, A = 5
- Thus D = 5 - 6 = -1 and the pointers are detected as being too far apart
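The comparison equations can be checked with a short sketch. The function names are hypothetical, and F = 8 with S = 2 is an assumed configuration chosen only because it reproduces E = 6 from the example; the slides do not state F and S for that case.

```python
def pointer_comparison(fifo_entries, separation, actual, local_is_read):
    """Return D, the pointer comparison result, using the slide's equations."""
    if local_is_read:
        expected = fifo_entries - separation   # E = F - S
        return actual - expected               # D = A - E
    expected = separation                      # E = S
    return expected - actual                   # D = E - A


def classify(d):
    """Interpret the sign of D: positive is too close, negative is too far."""
    if d > 0:
        return "too close"
    if d < 0:
        return "too far"
    return "on target"


# Read-pointer domain example: E = 6, A = 5 (F and S are assumed values).
d = pointer_comparison(fifo_entries=8, separation=2, actual=5, local_is_read=True)
verdict = classify(d)
```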
Timing Diagram Detecting Pointers Are Drifting Apart
Pointer Adjustment
- One clock is nominally faster than another
  - Ptrs too close (D>0): suppress one Fast Clock
  - Ptrs too far (D<0): allow one extra Fast Clock
- Neither clock is nominally faster
  - Ptrs too close (D>0): suppress one clock on the Write Pointer
  - Ptrs too far (D<0): suppress one clock on the Read Pointer
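The adjustment rules above amount to a small decision table. A sketch, with a hypothetical function name and return strings, and with D = 0 assumed to mean that no adjustment is made:

```python
def pointer_adjustment(d, one_clock_faster):
    """Map the comparison result D to the adjustment listed on the slide."""
    if d == 0:
        return "no adjustment"  # assumption: the slide lists only D>0 and D<0
    if one_clock_faster:
        if d > 0:
            return "suppress one fast clock"
        return "allow one extra fast clock"
    if d > 0:
        return "suppress one clock on the write pointer"
    return "suppress one clock on the read pointer"


action = pointer_adjustment(d=-1, one_clock_faster=True)
```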
Digital Filter
- Reasons:
  - Tracking logic is susceptible to metastability on the synchronizer chain
  - Data rate matching circuit may produce non-uniform clock patterns
- Example: make an adjustment only if, in m samples, there were n detections of the pointers being too far apart (or, conversely, too close), where n is an integer
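One way to realize the n-of-m rule is a sliding window over the most recent comparison results. This is a sketch under assumptions: the class name `NofMFilter` is hypothetical, and clearing the window after each adjustment is a design choice made here, not something the slide specifies.

```python
from collections import deque


class NofMFilter:
    """Request an adjustment only after n same-direction detections in m samples."""

    def __init__(self, m, n):
        self.window = deque(maxlen=m)  # keeps only the last m samples
        self.n = n

    def update(self, d):
        """Feed one comparison result D; return 'too far', 'too close', or None."""
        sign = (d > 0) - (d < 0)       # +1 too close, -1 too far, 0 on target
        self.window.append(sign)
        if self.window.count(-1) >= self.n:
            self.window.clear()        # assumption: start fresh after adjusting
            return "too far"
        if self.window.count(1) >= self.n:
            self.window.clear()
            return "too close"
        return None


filt = NofMFilter(m=4, n=3)
results = [filt.update(d) for d in [-1, 1, -1, -1, 0, 0]]
```

The isolated `+1` sample in the middle of the sequence, which could be a metastability artifact, does not trigger an adjustment on its own.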
Sampling Uncertainties
- By design, any missed event is guaranteed to be captured on the very next clock
- This translates to one FIFO entry of uncertainty
- The other main contributor to uncertainty is the irregular duty cycle of the throttled clock
Uncertainty Due to Sample Jitter
Tracking Response Time
  F = Number of FIFO entries
  S = Number of samples required by the digital filter
  l = FIFO throttled data rate, typically the clock period of the slow domain
  f = Maximum clock edge mismatch: the degree of phase mismatch between the throttled clock and the data-rate clock
  y = Maximum percentage of allowable clock mismatch

Tracking Response Time(2)
  y = l / (((F*S) + (F-1))*l + f) * 100
- F*S: total sample time
- (F-1): worst case latency to the first sample

Tracking Response Time(3)
- Simplification: f = l
  y = l / (((F*S) + (F-1) + 1)*l) * 100 = 1 / (F*(S+1)) * 100
- By pipelining the throttled clock pattern which controls the faster domain's pointer, the equation is modified to:
  y = 1 / (F*(S+1) + P) * 100

Tracking Response Time(4)
- Example:
  8 Entry FIFO (F)
  3 Sample Filter (S)
  1 Clock Uncertainty (f)
  8 Clock Pipeline (P)
  y = 2.5% or 25000 PPM
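The worked example can be reproduced numerically. The function names below are hypothetical; the formulas are the ones given on the slides, with f expressed in the same time unit as l.

```python
def mismatch_percent_general(fifo_entries, filter_samples, l, f):
    """y = l / (((F*S) + (F-1))*l + f) * 100."""
    total = (fifo_entries * filter_samples + (fifo_entries - 1)) * l + f
    return l / total * 100.0


def mismatch_percent(fifo_entries, filter_samples, pipeline_clocks=0):
    """Simplified form with f = l: y = 1 / (F*(S+1) + P) * 100."""
    return 100.0 / (fifo_entries * (filter_samples + 1) + pipeline_clocks)


# Slide example: F = 8, S = 3, f = 1 clock, P = 8.
y = mismatch_percent(fifo_entries=8, filter_samples=3, pipeline_clocks=8)
ppm = y * 1e4  # 1 percent = 10,000 PPM

# With f = l and no pipeline, the general and simplified forms agree.
general = mismatch_percent_general(8, 3, l=1.0, f=1.0)
simplified = mismatch_percent(8, 3)
```

Here F*(S+1) + P = 8*4 + 8 = 40, so y = 100/40 = 2.5%, matching the slide.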
Further Refinements
- Looking at the pointer separation slightly earlier in time can predict a pointer collision before it actually occurs. For example, invert the clock on the synchronizer chain
- Optimization of the digital filter by more accurate tracking of pointer drift, to avoid pointer collision when reducing the separation

Conclusions
- Design effectively reduces the latency across two clock domains in systems where the clock drift is slow but unbounded in duration
- The digital nature of the design allows the implementation to scale in frequency without the potential risk of self-timed circuits
- The only true constraint on its use is that the domain clock frequencies must be known prior to activating the FIFO, to ensure that the pointers are advancing within the bandwidth of the tracking logic
A Predictive Synchronizer for Periodic Clock Domains
Synchronizer Architecture
Synchronizer Overview
- Receives the two clocks and manages safe data transfer both ways
- Produces SEND and RECV control outputs to both domains, indicating when it is safe to receive and send new data on both sides, avoiding data misses and duplicates due to mismatched clock frequencies

Clock Conflicts Prediction
- Conflicts can be predicted in advance due to the periodic nature of the two clocks
- Assume there is a conflict at time zero
- The next conflict occurs when there exist some N and K such that:
  N*TLOCAL = K*TEXT

Clock Conflicts Prediction(2)
- Find the smallest D such that:
  TLOCAL + D = M*TEXT
  (N-1)*TLOCAL = K*TEXT - TLOCAL
  (N-1)*TLOCAL = (K-M)*TEXT + D
- Conflict prediction is achieved by creating a Predictive Clock, which is a version of the external clock delayed by D
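The relations above can be verified with exact arithmetic. This sketch assumes both periods are exact rational numbers, which real clocks only approximate (in practice a conflict is a near-coincidence within a detection window). The function names and the example periods of 3 and 5 time units are hypothetical.

```python
from fractions import Fraction


def predictive_delay(t_local, t_ext):
    """Smallest D >= 0 with TLOCAL + D = M*TEXT; returns (D, M)."""
    t_local, t_ext = Fraction(t_local), Fraction(t_ext)
    d = (-t_local) % t_ext
    m = (t_local + d) / t_ext
    return d, m


def next_conflict(t_local, t_ext):
    """Smallest positive N, K with N*TLOCAL = K*TEXT."""
    ratio = Fraction(t_local) / Fraction(t_ext)  # reduced to lowest terms
    return ratio.denominator, ratio.numerator


T_LOCAL, T_EXT = 3, 5
d, m = predictive_delay(T_LOCAL, T_EXT)
n, k = next_conflict(T_LOCAL, T_EXT)

# One local cycle before the real conflict, the local clock coincides
# with the external clock delayed by D.
lhs = (n - 1) * T_LOCAL
rhs = (k - m) * T_EXT + d
```

For these periods D = 2 and M = 1, and the next conflict is at N = 5 local cycles (K = 3 external cycles). Both sides of the last identity equal 12, so the delayed clock conflicts with the local clock exactly one TLOCAL earlier.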
Clock Conflicts Prediction(3)
- Predicted and Local clocks conflict one TLOCAL cycle before the imminent conflict of the External and Local clocks
- Sampling the input (which is affected by RxCK) is delayed by a keep-out time TKO, where TKO > dZ

Conflict Detector
- FF1 and FF2 effectively sample Clk2 d time after and d time before the rising edge of Clk1, respectively
- Either FF may become metastable
- One half cycle of Clk1 is allotted for metastability resolution
- If Clk2 has risen during the 2d detection period, the top AND gate is enabled and the Conflict output is generated
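A behavioral sketch of the detection rule, not of the circuit itself: a conflict is flagged when Clk2 is low d before the Clk1 rising edge and high d after it. It assumes an ideal 50% duty-cycle Clk2, a window 2d shorter than half a Clk2 period, and no metastability; all names and numbers are hypothetical.

```python
def clk_level(t, period, phase=0.0):
    """Level of an ideal 50% duty-cycle clock whose rising edges are at phase + k*period."""
    return ((t - phase) % period) < period / 2


def conflict(t_clk1_edge, d, clk2_period, clk2_phase=0.0):
    """True if Clk2 rose within the 2d window centered on the Clk1 rising edge."""
    after = clk_level(t_clk1_edge + d, clk2_period, clk2_phase)    # FF1 sample
    before = clk_level(t_clk1_edge - d, clk2_period, clk2_phase)   # FF2 sample
    return after and not before


# Clk2 rises at t = 0, 10, 20, ...
near_edge = conflict(t_clk1_edge=10.3, d=0.5, clk2_period=10.0)
far_from_edge = conflict(t_clk1_edge=13.0, d=0.5, clk2_period=10.0)
```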
Computing Clock Cycle Time
- Circuit starts with minimal delay and increases (or decreases) the delay until it is equal to a full cycle
- The clock divider and flip-flop provide a loop delay (of two local clock cycles)
- Time resolution of the Conflict detector must be larger than the adjustment step
- Once the lower delay line has converged to TLOCAL, its programming code is copied to the upper delay line

Computing Clock Cycle Time(2)
- The TLOCAL unit safely computes cycle time with precision dL
- DLL convergence time is:

Clock Predictor
- “Predicted clock” output provides a copy of the external clock, delayed by D, one local cycle time in advance
- Loop delay must be the maximum of the two clock cycles

Rate Reducer
- The delay introduced by the Rate Reducer between successive adjustments is 4TLOCAL+4TEXT
- Total tuning time of Programmable Delay 1 is:

Clock Predictor Precision
- Clock Predictor safely generates a delayed version of the external clock that periodically precedes its original version by TLOCAL with precision

Conflict Prevention Circuit
- The dC-conflict detector produces the Keep-Out signal upon a dC conflict of the local and predicted clocks
- The Clock Select circuit produces RxCK depending on Keep-Out
- RxCK is either the original local clock (when there is no predicted conflict) or the TKO delayed local clock (when a conflict is predicted)
Prediction Timing Diagram
TKO constraint
- Definition:
  R: The rising edge event of RxCK
  D: Event R when Keep-Out = 1
- Theorem: If L1 and P occur within dC time of each other, then D and E are safely separated by at least dZ

TKO constraint(2)
- Proof:
- Need to confirm that:
- By definition:
Avoiding Misses and Duplicates
Duplicate and Miss Control Circuit
Duplicate and Miss Control Circuit(2)
Conclusions
- Synchronizer takes advantage of the periodic nature of the clocks in order to predict potential conflicts in advance, and to conditionally employ an input sampling delay to avoid such conflicts
- Adjusts automatically to a wide range of clock frequencies
- Avoids sampling duplicate data or missing any input
References
- W. L. Williams, P. E. Madrid and S. C. Johnson, “Low Latency Clock Domain Transfer for Simultaneously Mesochronous, Plesiochronous and Heterochronous Interfaces,” Proceedings 13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’07), pp. 196-204, 2007.
- J.N. Seizovic, “Pipeline Synchronization,” Proceedings of the 1st International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 87-96, 1994.
- A. Chakraborty and M.R. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains,” Proceedings 9th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’03), pp. 78-88, 2003.
- U. Frank, T. Kapschitz and R. Ginosar, “A Predictive Synchronizer for Periodic Clock Domains,” J. Formal Methods in System Design (special issue on Formal Methods for Globally Asynchronous Locally Synchronous Design), 28(2):171-186, 2006.