1
Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science, Columbia University
{tibi,nowick}@cs.columbia.edu
2
Introduction
Key trend in VLSI systems: systems-on-a-chip (SoC)
Two fundamental challenges:
- mixed-timing domains
- long interconnect delays
Our goal: design of efficient interface circuits
Desirable features:
- arbitrarily robust
- low-latency, high-throughput
- modularity, scalability
Few satisfactory solutions to date.
3
Timing Issues in SoC Design
[Figure: (a) single-clock system with a long interconnect; (b) mixed-timing domains, Domain #1 and Domain #2 (each sync or async), joined by a long interconnect]
4
Timing Issues in SoC Design (cont.)
Solution: provide interface circuits
(a) single-clock, long interconnect:
- Carloni et al., “relay stations”
(b) mixed-timing domains (Domain #1 and Domain #2, each sync or async), long interconnect:
- NEW: “mixed-timing FIFO’s”
- NEW: “mixed-timing relay stations”
5
Contributions
Complete set of mixed-timing interface circuits: sync-sync, async-sync, sync-async, async-async
Features:
- Arbitrary robustness: with respect to synchronization failures
- High throughput: in steady-state operation, no synchronization overhead
- Low latency: “fast restart” in empty FIFO, only synchronization overhead
- Reusability: each interface partitioned into reusable sub-components
Two contributions:
- Mixed-Timing FIFO’s
- Mixed-Timing Relay Stations
6
Contribution #1: Mixed-Timing FIFO’s
Addresses issue of interfacing mixed-timing domains
Features:
- token ring architecture
- circular array of identical cells
- shared buses: data + control
- data: “immobile” once enqueued
- distributed control: allows concurrent put/get operations
- 2 circulating tokens: define tail & head of queue
Potential benefits:
- low latency
- low power
- scalability
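The token-ring idea above can be illustrated with a small behavioral model. This is only a sketch under assumptions: the class name `TokenRingFifo` and its methods are hypothetical, clocks and synchronization are ignored entirely, and the per-cell flag merely stands in for the cell status bit. It shows that data stays in the cell where it was enqueued while two token indices (tail and head) move around the ring.

```python
class TokenRingFifo:
    """Behavioral sketch of a circular FIFO whose data never moves once stored.

    Hypothetical model for illustration; not the circuit from the slides.
    """

    def __init__(self, capacity):
        self.cells = [None] * capacity    # one data register per cell
        self.full = [False] * capacity    # per-cell status bit (cell FULL)
        self.put_tok = 0                  # cell holding the put token (tail)
        self.get_tok = 0                  # cell holding the get token (head)

    def put(self, item):
        i = self.put_tok
        if self.full[i]:
            return False                  # tail cell still occupied: FIFO full
        self.cells[i] = item              # data is written once and stays put
        self.full[i] = True
        self.put_tok = (i + 1) % len(self.cells)   # pass the put token on
        return True

    def get(self):
        i = self.get_tok
        if not self.full[i]:
            return None                   # head cell empty: nothing to dequeue
        item = self.cells[i]
        self.full[i] = False
        self.get_tok = (i + 1) % len(self.cells)   # pass the get token on
        return item
```

Because each operation touches only the cell that holds its token, puts and gets act on different cells whenever the FIFO is neither full nor empty, which is what allows them to proceed concurrently in the real design.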
7
Contribution #2: Mixed-Timing Relay Stations
Addresses issue of long interconnect delays
“Latency-Insensitive Protocols”: safely tolerate long interconnect delays between systems
Prior contribution: introduce “relay stations”
- single-clock domains (Carloni et al., ICCAD-99)
Our contribution: introduce “mixed-timing relay stations”
- mixed-clock (sync-sync)
- async-sync
First proposed solutions to date.
8
Related Work
Single-Clock Domains: handling clock discrepancies
- clock skew and jitter (Kol98, Greenstreet95)
- long interconnect delays (Carloni99)
Mixed-Timing Domains: 3 common approaches
- Use “Wrapper Logic”: add logic layer to synchronize data/control (Seitz80, Seizovic94)
  drawback: long latencies in communication
- Modify Receiver’s Clock: stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
  drawback: penalties in restarting clock
9
Related Work: Closer Approaches
Mixed-Timing Domains (cont.):
- Interface Circuits: Mixed-Clock FIFO’s (Intel, Jex et al. 1997)
  drawback: significant area overhead = synchronizer for each cell
Our approach: mixed-clock FIFO’s with only 2 synchronizers for the entire FIFO
10
Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions
11
Mixed-Clock FIFO: Block Level
Synchronous put interface:
- req_put: initiates put operations
- CLK_put: controls put operations
- data_put: bus for data items
- full: indicates when FIFO full
Synchronous get interface:
- req_get: initiates get operations
- CLK_get: controls get operations
- data_get: bus for data items
- valid_get: indicates data items validity (always 1 in this design)
- empty: indicates when FIFO empty
13
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture with Put Controller, Full Detector, Get Controller and Empty Detector; put side: full, req_put, data_put, CLK_put; get side: CLK_get, data_get, req_get, valid_get, empty; TAIL and HEAD mark the token positions]
Steady state: FIFO neither full nor empty
- Sender starts a put operation
- FIFO not full: Put Controller enables a put operation
- At the end of the clock cycle, the cell enqueues data
14
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
TAIL cell passes the put token
15
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
Get operation
16
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
Steady-state operation:
- Puts and gets “reasonably spaced”
- Zero probability of synchronization failure
- Zero synchronization overhead
17
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
18
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
FIFO FULL: put interface stalled
19
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
20
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
FIFO NOT FULL
21
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture diagram with TAIL and HEAD token positions]
22
Mixed-Clock FIFO: Cell Implementation
[Figure: FIFO cell with register REG (enable En) and SR; token signals ptok_in, ptok_out, gtok_in, gtok_out]
Reusable components: Synchronous Put Part, Synchronous Get Part, Data Validity Controller
Synchronous Put Part (CLK_put, en_put, req_put, data_put):
- en_put: enables a put operation
- data_put: data item in
- req_put: validity bit in
Synchronous Get Part (CLK_get, en_get, valid, data_get):
- en_get: enables a get operation
- data_get: data item out
- valid: validity bit out
Status bits:
- f_i: cell FULL
- e_i: cell EMPTY
23
Mixed-Clock FIFO: Architecture
[Figure: array of FIFO cells with Put Controller, Full Detector, Get Controller and Empty Detector; put interface: full, req_put, data_put, CLK_put; get interface: CLK_get, data_get, req_get, valid_get, empty]
24
Synchronization Issues
Challenge: interfaces are highly concurrent
- Global “FIFO state”: controlled by 2 different clocks
Problem #1: Metastability
- Each FIFO interface needs clean state signals
Solution: synchronize “full” & “empty” signals
- “full” with CLK_put
- “empty” with CLK_get
- Add 2 (or more) synchronizing latches to each signal
Observable “full”/“empty” safely approximate true FIFO state
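The effect of passing a state signal through two synchronizing latches can be sketched at the cycle level. This is a hedged illustration only: the class `TwoStageSynchronizer` is a hypothetical name, and the model captures just the delay of the latch chain, not metastability itself, which is an analog effect that a Boolean model cannot represent.

```python
class TwoStageSynchronizer:
    """Cycle-level sketch of a two-latch synchronizer chain (hypothetical model)."""

    def __init__(self, initial=False):
        self.stage1 = initial   # first latch, samples the raw signal
        self.stage2 = initial   # second latch, drives the observable output

    def clock(self, raw_signal):
        """Advance one edge of the receiving clock and return the output."""
        self.stage2 = self.stage1     # output takes the previously sampled value
        self.stage1 = raw_signal      # sample the raw signal from the other domain
        return self.stage2
```

In this model a change on the raw signal reaches the output only at the second clock edge after it happens, so the observable signal always lags the true FIFO state by some amount.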
25
Synchronization Issues (cont.)
Problem #2: FIFO may now underflow/overflow!
- synchronizing latches add extra latency
Solution: modify definitions of “full” and “empty”
- New FULL: 0 or 1 empty cells left
- New EMPTY: 0 or 1 full cells left
New Full Detector:
[Figure: cell status bits e_0, e_1, e_2, e_3 are checked in adjacent pairs for two consecutive empty cells; the result passes through synchronizing latches clocked by CLK_put to produce “full”]
- FIFO not full = two consecutive empty cells exist
- FIFO full = NO two consecutive empty cells
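The modified “full” definition can be written as a small predicate. This is a sketch under assumptions: the function name `new_full` is hypothetical, the pairwise check is assumed to wrap around the ring (last cell paired with the first), at least two cells are assumed, and the synchronizing latches that follow the detector are left out. The equivalence with “0 or 1 empty cells” relies on the empty cells of a ring FIFO being contiguous.

```python
def new_full(empty_bits):
    """Return True when no two adjacent cells in the ring are both empty.

    empty_bits[i] plays the role of status bit e_i. Adjacency wraps around
    (an assumption of this sketch), so the last cell is paired with the first.
    """
    n = len(empty_bits)
    two_consecutive_empty = any(
        empty_bits[i] and empty_bits[(i + 1) % n] for i in range(n)
    )
    return not two_consecutive_empty
```

Declaring the FIFO full one cell early leaves a spare cell to absorb a put that slips in while the “full” signal is still travelling through the synchronizer.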
26
Synchronization Issues (cont.)
Problem #3: Potential for deadlock
Scenario: suppose only 1 data item in quiescent FIFO
- FIFO still considered “empty” (new definition)
- Get interface: cannot dequeue data item!
Solution: bi-modal “empty detector”, combines:
- “New empty” detector (0 or 1 data items)
- “True empty” detector (0 data items)
Two results folded into single global “empty” signal
27
Synchronization Issues: Avoiding Deadlock
[Figure: bi-modal empty detector built from cell status bits f_0 to f_3 and clocked by CLK_get, with signals req_get, en_get, ne, oe and output “empty”]
- ne: detects “new empty” (0 or 1 full cells)
- oe: detects “true empty” (0 full cells)
- ne and oe are combined into the global “empty”
Bi-modal empty detection: select either ne or oe
- Reconfigure whenever the get interface is active
- When reconfigured, use “ne”: FIFO active, avoids underflow
- When NOT reconfigured, use “oe”: FIFO quiescent, avoids deadlock
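The bi-modal selection can be sketched as a pure function. This is a simplification under stated assumptions: the names `near_empty`, `true_empty`, `global_empty` and the flag `get_active` are hypothetical, the adjacent-pair check is assumed to wrap around the ring, and the synchronizing latches and the exact reconfiguration logic of the circuit are omitted.

```python
def near_empty(full_bits):
    """ne: 0 or 1 data items, detected as no two adjacent full cells."""
    n = len(full_bits)
    return not any(full_bits[i] and full_bits[(i + 1) % n] for i in range(n))


def true_empty(full_bits):
    """oe: no data items at all."""
    return not any(full_bits)


def global_empty(full_bits, get_active):
    """Use ne while the get interface is active, oe while it is quiescent."""
    return near_empty(full_bits) if get_active else true_empty(full_bits)
```

With a single item stored, an active get interface sees “empty” (conservative, avoids underflow), while a quiescent one sees “not empty” and can dequeue the last item, which is the deadlock case described above.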
28
Mixed-Clock FIFO: Architecture
[Figure: array of FIFO cells with Put Controller, Full Detector, Get Controller and Empty Detector; put interface: full, req_put, data_put, CLK_put; get interface: CLK_get, data_get, req_get, valid_get, empty]
29
Put/Get Controllers
Put Controller (signals: req_put, full, en_put):
- enables put operation
- disabled when FIFO full
Get Controller (signals: req_get, empty, valid, en_get, valid_get):
- enables get operation
- indicates when data valid
- disabled when FIFO empty
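One plausible reading of the two controllers as Boolean functions is sketched below. The gate-level equations are an assumption of this sketch, since the slide only lists the signals and the intended behavior; the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Return en_put: a put is enabled when requested and the FIFO is not full."""
    return req_put and not full


def get_controller(req_get, empty, valid):
    """Return (en_get, valid_get).

    en_get: a get is enabled when requested and the FIFO is not empty.
    valid_get: the dequeued item is flagged valid only if a get actually
    happened and the cell's validity bit was set.
    """
    en_get = req_get and not empty
    valid_get = en_get and valid
    return en_get, valid_get
```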
30
Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions
31
Relay Stations: Overview
Proposed by Carloni et al. (ICCAD’99)
Without relay stations: system 1 sends “data items” to system 2; delay > 1 cycle
With relay stations (RS): system 1 now sends “data packets” to system 2; delay = 1 cycle
- Data Packet = data item + validity bit
- “stop” control = stopIn + stopOut
  - apply counter-pressure
  - result: stall communication
Steady state: pass data on every cycle (either valid or invalid)
Problem: works only for single-clock systems (one CLK)!
32
Relay Stations: Implementation
[Figure: relay station with registers MR and AR, a mux (“switch mux”) and Control; ports packetIn, packetOut, stopIn, stopOut]
In normal operation:
- packetIn copied to MR and forwarded on packetOut
When stopped (stopIn = 1):
- stopOut raised on the next clock edge
- extra packet copied to AR
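The behavior described above can be approximated by a small cycle-level model. This is a sketch under assumptions, not the circuit from the slides: the class `RelayStation` is hypothetical, stopIn sampled at an edge is taken to mean that the receiver did not accept the packet currently in MR, the sender is assumed to hold its packet during any cycle in which it sees stopOut raised, and the restart sequence (move AR into MR, then lower stopOut) is this sketch's own choice.

```python
class RelayStation:
    """Cycle-level sketch of a single-clock relay station (hypothetical model)."""

    def __init__(self):
        self.mr = None          # MR: register driving packetOut
        self.ar = None          # AR: register holding the one extra packet
        self.stop_out = False   # registered stopOut seen by the sender

    def clock(self, packet_in, stop_in):
        """Advance one clock edge; return (packetOut, stopOut) after the edge."""
        if not stop_in:
            if self.stop_out:
                # Restart: drain the extra packet from AR, release the sender.
                self.mr, self.ar = self.ar, None
                self.stop_out = False
            else:
                # Normal operation: packetIn is copied to MR.
                self.mr = packet_in
        elif not self.stop_out:
            # First stalled edge: MR holds, the packet already in flight
            # is saved in AR, and stopOut is raised.
            self.ar = packet_in
            self.stop_out = True
        # Otherwise (stalled, stopOut already raised): hold MR and AR.
        return self.mr, self.stop_out
```

Usage: feeding packets p1, p2, p3 with stopIn raised while p3 arrives keeps p2 on packetOut, parks p3 in AR, and delivers p3 as soon as stopIn drops, so no packet is lost.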
33
Relay Station vs. Mixed-Clock FIFO
Relay Station (signals: validIn, dataIn, stopOut, validOut, dataOut, stopIn):
- Steady state: always pass data
- Data items: both valid & invalid
- Stopping mechanism: stopIn & stopOut
Mixed-Clock FIFO (signals: req_put, dataIn, full, req_get, dataOut, empty):
- Steady state: only pass data when requested
- Data items: only valid data
- Stopping mechanism: none (only full/empty)
34
Mixed-Clock Relay Stations (MCRS)
[Figure: System 1 (CLK1) and System 2 (CLK2) connected through relay stations (RS) and an MCRS]
Mixed-Clock Relay Station derived from the Mixed-Clock FIFO:
- Mixed-Clock FIFO interface: full, req_put, data_put, CLK_put; empty, req_get, valid_get, data_get, CLK_get
- Mixed-Clock Relay Station interface: packetIn (valid_put, data_put), stopOut, CLK1; packetOut (valid_get, data_get), stopIn, CLK2
NEW: change ONLY Put and Get Controllers
35
Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO
Identical:
- FIFO cells
- Full/Empty detectors (...or can simplify)
Only modify: Put & Get Controllers
- Put Controller (signals: validIn, full, en_put, “to cells”): always enqueue data (unless full)
- Get Controller (signals: stopIn, empty, valid, en_get, validOut)
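A possible Boolean reading of the modified controllers is sketched below. All equations here are assumptions made for illustration: the slide names the signals and states only that data is always enqueued unless the FIFO is full. In particular, tying stopOut to “full”, storing validIn as the cell's validity bit, and gating the get side with stopIn are guesses consistent with the signal lists, and the function names are hypothetical.

```python
def mcrs_put_controller(valid_in, full):
    """Return (en_put, stop_out, validity_bit) for the put side.

    Assumed behavior: every incoming packet, valid or not, is enqueued
    unless the FIFO is full; "full" doubles as back-pressure (stopOut);
    validIn is stored alongside the data as the cell's validity bit.
    """
    en_put = not full
    stop_out = full
    return en_put, stop_out, valid_in


def mcrs_get_controller(stop_in, empty, valid):
    """Return (en_get, valid_out) for the get side.

    Assumed behavior: a packet is dequeued on every cycle unless the
    receiver asserts stopIn or the FIFO is empty; when nothing is
    dequeued the outgoing packet is marked invalid.
    """
    en_get = (not stop_in) and (not empty)
    valid_out = en_get and valid
    return en_get, valid_out
```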
36
Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions
37
Async-Sync FIFO: Block Level
Asynchronous put interface: uses handshaking communication
- put_req: request operation
- put_ack: acknowledge completion
- no “full” signal
Synchronous get interface: no change
[Figure: Mixed-Clock FIFO (full, req_put, data_put, CLK_put; req_get, valid_get, empty, data_get, CLK_get) compared with Async-Sync FIFO (async domain: put_req, put_ack, put_data; sync domain: req_get, valid_get, empty, data_get, CLK_get)]
38
Async-Sync FIFO: Architecture
[Figure: array of cells with Get Controller and Empty Detector; put side: put_req, put_ack, put_data; get side: CLK_get, data_get, req_get, valid_get, empty]
Get interface: exactly as in Mixed-Clock FIFO
Asynchronous put interface:
- No Full Detector or Put Controller
- When FIFO full, acknowledgement withheld until safe to perform the put operation
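The withheld-acknowledgement behavior can be illustrated with an event-level model. This is a sketch under assumptions: the class `AsyncPutModel` and its methods are hypothetical, handshake phases and clocking are abstracted away, and a request that arrives while the FIFO is full is simply parked until a get frees a cell.

```python
from collections import deque


class AsyncPutModel:
    """Event-level sketch of an async put interface without a "full" signal."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.pending = None     # parked request whose put_ack is being withheld
        self.acks = 0           # number of put_ack events issued so far

    def put_req(self, item):
        """Sender raises put_req. Return True if put_ack is issued right away."""
        if self.pending is not None:
            raise RuntimeError("handshake allows only one outstanding request")
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.acks += 1
            return True
        self.pending = (item,)  # FIFO full: withhold the acknowledgement
        return False

    def get(self):
        """Dequeue one item; complete a withheld put if a cell was freed."""
        if not self.items:
            return None
        item = self.items.popleft()
        if self.pending is not None:
            self.items.append(self.pending[0])
            self.pending = None
            self.acks += 1      # the delayed put_ack is finally issued
        return item
```

The sender never needs to test a “full” flag: back-pressure is expressed purely by how long the acknowledgement takes to arrive.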
39
Async-Sync FIFO: Cell Implementation
[Figure: cell with register REG (enable En) and blocks labelled C, +, OPT and DV; signals put_req, put_data, put_ack, we, we1, f_i, e_i, gtok_in, gtok_out, CLK_get, en_get, get_data]
- Asynchronous Put Part: reusable from async FIFO (Async00)
- Synchronous Get Part: reusable (from mixed-clock FIFO)
- Data Validity Controller: new
40
Async-Sync Relay Stations (ASRS)
[Figure: System 1 (async) connected to System 2 (sync, CLK2) through ARS stages (micropipeline, optional), an ASRS, and RS stages]
41
Outline
Mixed-Clock Interfaces
- FIFO
- Relay Station
Async-Sync Interfaces
- FIFO
- Relay Station
Results
Conclusions
42
Results
Each circuit implemented using both academic and industry tools:
- MINIMALIST: Burst-Mode controllers [Nowick et al. ‘99]
- PETRIFY: Petri-Net controllers [Cortadella et al. ‘97]
Pre-layout simulations: 0.6 μm HP CMOS technology
Experiments:
- various FIFO capacities (4/8/16 cells)
- various data widths (8/16 bits)
43
Results: Latency
Experimental setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)
For each design, latency is not uniquely defined: Min/Max
Latency = time from enqueuing a data item into an empty FIFO to dequeuing it
Design           4-place        8-place        16-place
                 Min    Max     Min    Max     Min    Max
Mixed-Clock      5.43   6.34    5.79   6.64    6.14   7.17
Async-Sync       5.53   6.45    6.13   7.17    6.47   7.51
Mixed-Clock RS   5.48   6.41    6.05   7.02    6.23   7.28
Async-Sync RS    5.61   6.35    6.18   7.13    6.57   7.62
44
Results: Maximum Operating Rate
Units: synchronous interfaces in MegaHertz, asynchronous interfaces in MegaOps/sec
Design           4-place       8-place       16-place
                 Put    Get    Put    Get    Put    Get
Mixed-Clock      565    549    544    523    505    484
Async-Sync       421    549    379    523    357    484
Mixed-Clock RS   580    539    550    517    509    475
Async-Sync RS    421    539    379    517    357    475
Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get
45
Conclusions
Introduced several new low-latency interface circuits
Address 2 major issues in SoC design:
- Mixed-timing domains
  - mixed-clock FIFO
  - async-sync FIFO
- Long interconnect delays
  - mixed-clock relay station
  - async-sync relay station
Other designs implemented and simulated:
- Sync-Async FIFO + Relay Station
- Async-Async FIFO + Relay Station
Reusable components: mix & match to build circuits
Provide useful set of interface circuits for SoC design