1
1 A Modular Synchronizing FIFO for NoCs Vainbaum Yuri
2
2 A Modular Synchronizing FIFO for NoCs. Paper presented at NOC-2009. Authors: Tarik Ono (Sun Microsystems), Mark Greenstreet (University of British Columbia)
3
3 Motivation & Purpose of Synchronizing FIFO. Figure: a Network-on-Chip connects Timing Domains 1, 2 and 3, each through its own Synchronizing FIFO. Multiple clock domains in a NoC require many FIFOs.
4
4 Synchronizing FIFO Targets. Design targets for the FIFO: it can be built using standard cells (easy integration into the CAD flow); the design is modular, with a choice of clockless or clocked interfaces (a modular, simple architecture reduces NoC design time).
5
5 Talk Outline: FIFO Overview; FIFO Blocks; Clockless Put and Get Interface; Clocked Put and Get Interface; Full-Empty Control and Data Store; FIFO Latency and Throughput; Implementation Results.
6
6 FIFO Overview: Operation. Figure: a Sender in Timing Domain A drives the Put Interface and a Receiver in Timing Domain B reads from the Get Interface, with stages 1 to 3 in between. The FIFO consists of a number of stages. The Sender communicates with the Put Interface, the Receiver with the Get Interface. Tokens determine which FIFO stage is used for the next put and get operation.
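The token scheme on this slide can be sketched as a small behavioural model. This is only an illustration, assuming one put token and one get token that each advance by one stage per successful operation; the class name `TokenFifo` and its methods are hypothetical and are not taken from the paper.

```python
class TokenFifo:
    """Behavioural model: one put token and one get token circulate over the stages."""

    def __init__(self, stages):
        self.data = [None] * stages    # data store of each stage
        self.full = [False] * stages   # full-empty status of each stage
        self.put_token = 0             # stage that accepts the next put
        self.get_token = 0             # stage that serves the next get

    def put(self, value):
        """Write into the stage holding the put token; fail if that stage is full."""
        i = self.put_token
        if self.full[i]:
            return False
        self.data[i] = value
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)
        return True

    def get(self):
        """Read from the stage holding the get token; return None if it is empty."""
        i = self.get_token
        if not self.full[i]:
            return None
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)
        return self.data[i]
```

In this model the data never moves between stages: only the tokens move, which is why the stage count can change without changing the stage itself.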
7
7 FIFO Overview: Structure. Each FIFO stage has a Put Interface Cell, a Get Interface Cell, a Full-Empty Control and a Data Store. Figure: stages 1 to 3, each built from these four blocks, between the Sender (Timing Domain A) and the Receiver (Timing Domain B).
8
8 FIFO Overview: Modular Design. Mix-and-match interfaces. Figure: a Sender in Clocked Domain A and a Receiver in a clockless NoC, connected through stages 1 to 3; a clocked put interface is combined with a clockless get interface, and every stage keeps its Full-Empty Control and Data Store.
9
9 FIFO Overview: Modular Design. Mix-and-match interfaces. Figure: a Sender in Fast Clocked Domain A and a Receiver in Slow Clocked Domain B, with a clocked put interface and a clocked get interface; the figure labels a 1-flop synchronizer and a 3-flop synchronizer. Different synchronization time lengths can be used, depending on clock frequency. Changing the FIFO size doesn't affect the individual FIFO stage.
10
10 Full-Empty Control and Data Store. The Data Store consists of latches that are enabled when write is high. The Full-Empty Control consists of an SR latch: on write, the output (full signal) is set high; on read, the output is set low. The same blocks are used for clocked and clockless interfaces.
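A minimal sketch of one stage's Data Store and Full-Empty Control at signal level, assuming write and read are never high at the same time. The function name `stage_step` and its argument names are hypothetical.

```python
def stage_step(full, stored, write, read, data_in):
    """Return (full, stored) after applying the write/read levels once."""
    # Assumption: the surrounding logic keeps write and read mutually exclusive.
    assert not (write and read), "write and read must not overlap"
    if write:
        stored = data_in   # data latches are transparent while write is high
        full = True        # write sets the SR latch (full signal goes high)
    if read:
        full = False       # read resets the SR latch (full signal goes low)
    return full, stored
```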
11
11 asP* FIFO. asP*: Asynchronous Symmetric Persistent Pulse Protocol. Uses standard cells; good performance; doesn't require C-elements. The asP* handshaking protocol is chosen as the baseline for the FIFO design.
12
12 asP* FIFO simulation: initial state. SR latches keep track of the empty/full status; AND gates coordinate data transfer between stages.
13
13 asP* FIFO simulation: data arrives and req rises. SR latch EFi is set to indicate that latch Li holds valid data.
14
14 asP* FIFO simulation: data arrives and req rises (next frame, same caption).
15
15 asP* FIFO simulation: data propagates through L1.
16
16 asP* FIFO simulation: SR latch EF1 is set.
17
17 asP* FIFO simulation: enabling the L2 latch. When stage i-1 is full and stage i is empty, the AND gate goes high and loads the data into Li.
18
18 asP* FIFO simulation: clearing the EF1 latch. After the data is loaded into Li, SR latch EFi-1 is cleared to indicate that latch Li-1 is now empty.
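The transfer rule from these two frames (move data forward when stage i-1 is full and stage i is empty, then clear stage i-1) can be sketched as follows. This is a behavioural illustration only, with no timing; the function name `ripple_step` and the list-based representation are assumptions.

```python
def ripple_step(full, data):
    """Apply one round of stage-to-stage transfers, from the output end back to the input."""
    for i in range(len(full) - 1, 0, -1):
        if full[i - 1] and not full[i]:   # AND gate: previous stage full, this stage empty
            data[i] = data[i - 1]         # load latch L_i
            full[i] = True                # set EF_i
            full[i - 1] = False           # clear EF_{i-1}
    return full, data


# Usage: a single item entering stage 0 reaches the last stage after two steps.
full, data = [True, False, False], ["D", None, None]
ripple_step(full, data)
ripple_step(full, data)
```

Scanning from the output end means each item advances by at most one stage per call, which mirrors the frame-by-frame progression in the slides.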
19
19 asP* FIFO simulation (figure only).
20
20 asP* FIFO simulation: data is available at output data_R. Req_R goes high as the data arrives at the last stage.
21
21 asP* FIFO simulation (figure only).
22
22 asP* FIFO simulation: the next data item (D1) enters the FIFO. It can actually enter as soon as ack_L falls, which indicates that the first data item has been written.
23
23 asP* FIFO simulation (figure only).
24
24 asP* FIFO simulation (figure only).
25
25 asP* FIFO simulation (figure only).
26
26 asP* FIFO simulation: the next data item (D2) enters the FIFO.
27
27 asP* FIFO simulation (figure only).
28
28 asP* FIFO simulation (figure only).
29
29 asP* FIFO simulation (figure only).
30
30 asP* FIFO simulation: FIFO full. D3 gets no acknowledge until the next read-out.
31
31 asP* FIFO simulation: Ack_R rises, data is read out.
32
32 asP* FIFO simulation (figure only).
33
33 asP* FIFO simulation: data propagates to the empty space.
34
34 asP* FIFO simulation (figure only).
35
35 asP* Put Interface Cell: data propagates to the empty space.
36
36 asP* FIFO simulation (figure only).
37
37 asP* FIFO simulation (figure only).
38
38 asP* FIFO simulation: now D3 can enter the FIFO.
39
39 asP* FIFO simulation (figure only).
40
40 asP* FIFO simulation: the sender lowers Req_L.
41
41 asP* FIFO - Timing Issue (figure). Constraint, only partly recoverable from the slide: T[En->Q] ... + T_AND
42
42 asP* FIFO - Timing Issue (figure). Constraint, only partly recoverable from the slide: MinResetPulseWidth[R->Q] ... + T_AND
43
43 3-stage clockless FIFO. Write port and read port. Callouts: write request; rises if write succeeded; rises if data available at output; receiver acknowledges receipt of data.
44
44 Stage of clockless FIFO. Callouts: latches to load data, written when the cell is empty; tri-state buffer; transfers tokens.
45
45 asP* Put Interface Cell Signal from Sender (fanout to all stages)
46
46 asP* Put Interface Cell Signal to Sender (fanin from all stages)
47
47 asP* Put Interface Cell Signal to Data Store and Full-Empty Control
48
48 asP* Put Interface Cell Signal from Full-Empty Control
49
49 asP* Put Interface Cell Signal from previous stage Signal to next stage
50
50 asP* Put Interface Cell Sets in all but one cell to low
51
51 asP* Put Interface Cell
52
52 asP* Put Interface Cell
53
53 asP* Put Interface Cell
54
54 asP* Put Interface Cell
55
55 asP* Put Interface Cell
56
56 asP* Put Interface Cell
57
57 asP* Put Interface Cell
58
58 asP* Get Interface Cell Signal from Receiver
59
59 asP* Get Interface Cell Signal to Receiver
60
60 asP* Get Interface Cell Signal to Data Store and Full-Empty Control
61
61 asP* Get Interface Cell Signal from Full-Empty Control
62
62 asP* Get Interface Cell Signal to all stages
63
63 asP* Get Interface Cell simulation (figure only).
64
64 Full-empty cell. Keeps track of whether the cell is empty or full. Set by a write operation from the put interface; reset by a read operation from the get interface. An AND gate ensures mutual exclusion of Set and Reset, which avoids races and simplifies timing.
65
65 Timing requirements for FIFO The minimum low time for req_put must be at least as large as the minimum clock pulse width for the FFs in the put interfaces. The minimum high time for req_put must be at least as large as the minimum pulse width for the set signal of the SR latch in the empty/full controller. The minimum high time for got_data must be at least as large as the minimum pulse width for the set signal of the SR latch.
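The three requirements above amount to simple comparisons, sketched here as a checker. The function name, parameter names and the example numbers are hypothetical; real values would come from the cell library and from the sender and receiver timing.

```python
def check_asp_timing(req_put_low, req_put_high, got_data_high,
                     ff_min_clock_pulse, sr_min_set_pulse):
    """Return the list of violated constraints (an empty list means all three hold)."""
    violations = []
    if req_put_low < ff_min_clock_pulse:     # low time vs. FF minimum clock pulse width
        violations.append("req_put low time")
    if req_put_high < sr_min_set_pulse:      # high time vs. SR latch minimum set pulse width
        violations.append("req_put high time")
    if got_data_high < sr_min_set_pulse:     # high time vs. SR latch minimum pulse width
        violations.append("got_data high time")
    return violations


# Usage with made-up values in picoseconds.
print(check_asp_timing(100, 120, 120, ff_min_clock_pulse=80, sr_min_set_pulse=90))
```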
66
66 Protocol converters. asP* is simple and efficient, but its timing constraints make it unsuitable for long interconnect. LEDR is delay-insensitive and better suited for long interconnect. Other converters are possible.
67
67 LEDR protocol: brief overview. Dual-rail encoding: two wires per bit, delay-insensitive. "Level-encoding": the data rail holds the actual data value and the parity rail holds the parity value. Alternating-phase protocol: the encoding parity alternates between odd and even.
LEDR encoding:
Phase   Bit value   Data rail   Parity rail
Even    0           0           0
Even    1           1           1
Odd     0           0           1
Odd     1           1           0
68
68 LEDR signaling. Figure: data and parity waveforms for the bit sequence 0100111, with the phase alternating between even and odd. Data rail: carries the bit value in both phases. Parity rail: the phase alternates with each data item. Exactly one wire transition occurs for each new data item.
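A short sketch of LEDR encoding for a single bit lane, assuming the convention that the two rails are equal in the even phase and differ in the odd phase. The function names `ledr_encode` and `transitions` are hypothetical.

```python
EVEN, ODD = 0, 1


def ledr_encode(bits, first_phase=EVEN):
    """Encode a bit sequence as (data, parity) rail pairs with alternating phase."""
    codewords = []
    phase = first_phase
    for bit in bits:
        parity = bit ^ phase          # even phase: parity == data; odd phase: parity != data
        codewords.append((bit, parity))
        phase ^= 1                    # the phase alternates with each data item
    return codewords


def transitions(prev, cur):
    """Count how many of the two rails changed between consecutive codewords."""
    return (prev[0] != cur[0]) + (prev[1] != cur[1])


# Usage: the sequence 0100111 from the slide.
words = ledr_encode([0, 1, 0, 0, 1, 1, 1])
print([transitions(a, b) for a, b in zip(words, words[1:])])
```

If the bit value repeats, only the parity rail toggles; if the bit value changes, only the data rail toggles. That is why every new item costs exactly one wire transition.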
69
69 LEDR completion detector. Figures: 1-bit LEDR completion detector; N-bit LEDR completion detector.
70
70 LEDR-to-asP* converter. Callouts: completion detector per bit; even parity detector; odd parity detector. Data is stored when all data[1:n] bits have changed.
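The completion rule (store the word only once every bit has changed phase) can be sketched behaviourally. This assumes each bit is a (data, parity) pair whose phase is data XOR parity, with 0 meaning even; the names `word_phase` and `ledr_receive` are hypothetical and the sketch does not model the converter's gates.

```python
def word_phase(word):
    """Return 0 (even) or 1 (odd) if every bit is in the same phase, else None."""
    phases = {d ^ p for d, p in word}
    return phases.pop() if len(phases) == 1 else None


def ledr_receive(samples, expected_phase=0):
    """Yield one data word each time all bits have reached the expected phase."""
    for word in samples:
        if word_phase(word) == expected_phase:
            yield [d for d, _ in word]   # all bits have arrived: store the word
            expected_phase ^= 1          # the next word arrives in the opposite phase
```

Samples in which only some bits have switched phase are ignored, which is what makes the scheme tolerant of skew between wires.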
71
71 LEDR-to-asP* converter. In this example, assume the even parity phase (figure only).
72
72 LEDR-to-asP* converter (figure only).
73
73 asP*-to-LEDR converter (figure only).
74
74 asP*-to-LEDR converter (figure only).
75
75 Clocked FIFOs. The design goal is to provide all flavors of synchronization converters: synchronous-to-asynchronous, asynchronous-to-synchronous and synchronous-to-synchronous. Async-to-sync and sync-to-async are obtained by combining an async put interface with a sync get interface, and vice versa. Synchronous-to-synchronous is detailed in the next slides.
76
76 3-Stage Clocked FIFO. Callouts: indicates that data can be put into the FIFO; ensures fully synchronous behavior.
77
77 FIFO stage with clocked RX and TX
78
78 Clocked Put Interface Cell. Callouts: signal to sender; signals from sender; synchronizer. The state (full or empty) of the FIFO stage is synchronized. One 1-bit synchronizer per FIFO stage interface. Asymmetric delay.
79
79 Clocked Put Interface Cell
80
80 Clocked Put Interface Cell
81
81 Clocked Put Interface Cell
82
82 Clocked Put Interface Cell
83
83 Clocked Put Interface Cell
84
84 Clocked Put Interface Cell
85
85 Clocked Put Interface Cell
86
86 Clocked Put Interface Cell
87
87 Clocked Put Interface Cell
88
88 Clocked Put Interface Cell. The clocked get interface cell is analogous.
89
89 Example of a 1.5-cycle synchronizer (figure with signals IN, OUT and Async_OUT).
90
90 Synchronizer MTBF for different synchronizers and clock speeds (90nm technology). τ: metastability resolving constant.
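To show how τ enters the MTBF, here is a sketch using the commonly quoted first-order synchronizer model, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data). The model and all parameter values are assumptions for illustration; the slide does not state which formula or numbers were used.

```python
import math


def synchronizer_mtbf(resolve_time, tau, window, f_clk, f_data):
    """First-order MTBF estimate in seconds.

    resolve_time: time allowed for metastability to resolve (s)
    tau:          metastability resolving constant (s)
    window:       metastability window T_w of the flop (s), hypothetical parameter
    f_clk:        receiving clock frequency (Hz)
    f_data:       rate of asynchronous input transitions (Hz)
    """
    return math.exp(resolve_time / tau) / (window * f_clk * f_data)


# Usage with made-up numbers: one vs. two clock periods of resolving time at 1 GHz.
one_cycle = synchronizer_mtbf(1e-9, 20e-12, 20e-12, 1e9, 1e8)
two_cycle = synchronizer_mtbf(2e-9, 20e-12, 20e-12, 1e9, 1e8)
print(two_cycle / one_cycle)
```

Because the resolving time sits in the exponent, each extra synchronizer cycle multiplies the MTBF by exp(T_clk / τ), which is the reason longer synchronizers are attractive at high clock speeds.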
91
91 FIFO latency and throughput. Latency: the minimum time data spends in the FIFO; independent of FIFO length. Throughput: the maximum number of data transfers per unit time; depends on FIFO length.
92
92 FIFO throughput. Throughput is limited by the slower of the put and get interfaces. Put interface delay: minimum time between two successive FIFO writes. Get interface delay: minimum time between two successive FIFO reads.
93
93 Clocked FIFO throughput simulation. Scenario: 2-cycle synchronizer; same put and get frequency with zero phase shift. Result: the simulated FIFO does not allow a write every clock cycle; the FIFO needs to be increased to 6 stages. In general, a FIFO with equal put and get frequencies and an n-cycle synchronizer needs 2*(n+1) stages to support maximum throughput.
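The sizing rule quoted on this slide is easy to tabulate. The helper below only restates 2*(n+1) for the equal-frequency case; the function name is hypothetical.

```python
def min_stages_for_full_throughput(sync_cycles):
    """Stages needed for an n-cycle synchronizer with equal put and get frequencies."""
    if sync_cycles < 0:
        raise ValueError("sync_cycles must be non-negative")
    return 2 * (sync_cycles + 1)


# Usage: the 2-cycle synchronizer of the simulated scenario.
print(min_stages_for_full_throughput(2))
```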
94
94 asP* FIFO latency. Figure labels: write latency; read latency; receiver latency; Full-Empty Control.
95
95 asP* FIFO latency (clockless). Latency is measured from rising req_put to rising data_valid (220ps) plus from rising got_data to empty cell status (140ps), for a total of 360ps. Throughput is limited by the slower of the get and put interfaces; the evaluated maximum is 1.95GHz. Power: 5.27mW at 1.95GHz.
96
96 asP* FIFO latency (clocked). Latency is measured from rising clk_put to rising clk_get with valid data (it doesn't depend on FIFO length), plus tsync (173ps). Using a 6-stage FIFO gives a 2x throughput gain. A 6-stage FIFO running at 1.28GHz consumes 4.91mW.
97
97 Clocked FIFO latency. Measured from the clk_put edge that latches data into the FIFO until the clk_get edge that notifies the receiver of available data.
98
98 Clocked FIFO throughput. Throughput is determined by the slower of the put and get interfaces. There is a minimum FIFO length required to support maximum throughput. The minimum FIFO length depends on: synchronization latencies; the ratio of put and get clock speeds; the phase relationship of the put and get clocks.
99
99 Conclusions. Presented a synchronizing FIFO that: can be built using standard cells; has a modular design, in which the following properties can be chosen independently: type of put and get interface, synchronization time length, FIFO size; has simple interfaces.
100
100 References
T. Ono and M. Greenstreet. A modular synchronizing FIFO for NoCs. In Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip.
M. E. Dean, T. E. Williams, and D. L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In ARVLSI, pages 55-70. MIT Press, 1991.
C. E. Molnar, I. W. Jones, W. S. Coates, and J. K. Lexau. A FIFO ring performance experiment. In Proceedings of the Third International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 279–289, Eindhoven, Apr. 1997.
I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, June 1989. Turing Award lecture.