1
Network-on-Chip (2/2) Ben Abdallah, Abderazek The University of Aizu E-mail: benab@u-aizu.ac.jp 1 KUST University, March 2011
2
Part 3 Routing Routing Algorithms Deterministic Routing Oblivious Routing Adaptive Routing 2
3
Routing Basics Once the topology is fixed, the routing algorithm determines the path(s) from source to destination It must prevent deadlock, livelock, and starvation 3
4
Routing Deadlock 4 Without routing restrictions, a resource cycle can occur Leads to deadlock
5
Deadlock Definition Deadlock: A packet does not reach its destination because it is blocked at some intermediate resource Livelock: A packet does not reach its destination because it enters a cyclic path Starvation: A packet does not reach its destination because some resource does not grant access (while it grants access to other packets) 5
6
Routing Algorithm Attributes Number of destinations Unicast, Multicast, Broadcast? Adaptivity Deterministic, Oblivious or Adaptive? Implementation (Mechanisms) Source or node routing? Table or circuit? 6
7
Deterministic Routing 7
8
Always chooses the same path between two nodes Easy to implement and to make deadlock-free Does not exploit path diversity and is therefore poor at load balancing Packets arrive in order 8
9
Deterministic Routing - Example: Destination-Tag Routing in Butterfly Networks 9 The destination address selects the route: destination 5 = 101 in binary selects down, up, down at the successive stages. With radix-4 switches the destination address is interpreted as quaternary digits: 11 = 1011(2) = 23(4), so the digits 2 and 3 select the route. Note: Starting from any source, the same digit pattern always routes to the destination; the route depends only on the destination address, not on the source.
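The digit-selection rule can be captured in a short sketch. This is an illustrative Python fragment rather than anything from the slides; the function name and the most-significant-digit-first ordering are assumptions, and the routine simply reads the destination address digit by digit in the switch radix to obtain one output-port choice per stage.

```python
def destination_tag_route(dest, n_stages, radix=2):
    """Output-port choices, one per stage, for destination-tag routing in a
    radix-`radix` butterfly with `n_stages` stages: each digit of the
    destination address directly selects a switch output port."""
    ports = []
    for stage in reversed(range(n_stages)):            # most significant digit first
        ports.append((dest // radix**stage) % radix)
    return ports

print(destination_tag_route(5, n_stages=3, radix=2))   # 5 = 101b       -> [1, 0, 1]
print(destination_tag_route(11, n_stages=2, radix=4))  # 11 = 23 base 4 -> [2, 3]
```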
10
Deterministic Routing- Dimension-Order Routing For n-dimensional hypercubes and meshes, dimension-order routing produces deadlock-free routing algorithms. It is called XY routing in 2-D mesh and e-cube routing in hypercubes 10
11
Dimension-Order Routing - XY Routing Algorithm (Figure: an example XY route from a source S to a destination D on a 2-D mesh.) 11
12
Dimension-Order Routing - XY Routing Algorithm 12 XY routing algorithm for a 2-D mesh
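To accompany the figure, here is a minimal Python sketch of the XY decision made at each router. The coordinate convention and the port names (EAST, WEST, NORTH, SOUTH, LOCAL) are illustrative assumptions, not taken from the slides; the key property is that the x offset is always resolved before the y offset.

```python
def xy_route_port(cur, dst):
    """One XY (dimension-order) routing decision on a 2-D mesh.
    cur and dst are (x, y) coordinates; the x dimension is corrected first."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return 'EAST'
    if dx < cx:
        return 'WEST'
    if dy > cy:
        return 'NORTH'
    if dy < cy:
        return 'SOUTH'
    return 'LOCAL'                       # packet has arrived

# Hop by hop from (0, 0) to (2, 1): EAST, EAST, NORTH, then LOCAL.
```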
13
Deterministic Routing - E-cube Routing Algorithm 13 Dimension-order routing algorithm for hypercubes
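A corresponding sketch for e-cube routing in a hypercube, assuming n-bit node addresses and a fixed lowest-dimension-first correction order (a fixed order is what makes dimension-order routing deadlock-free); the function name is hypothetical.

```python
def ecube_next_dimension(cur, dst, n):
    """E-cube routing in an n-dimensional hypercube: node addresses are n-bit
    integers, and the differing address bits are corrected in a fixed order."""
    diff = cur ^ dst
    for dim in range(n):
        if diff & (1 << dim):
            return dim                    # traverse the link in this dimension
    return None                           # cur == dst: packet has arrived

# From node 0b000 to 0b101 in a 3-cube: first dimension 0, then dimension 2.
print(ecube_next_dimension(0b000, 0b101, n=3))   # -> 0
print(ecube_next_dimension(0b001, 0b101, n=3))   # -> 2
```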
14
Oblivious Routing 14
15
Oblivious Routing Always chooses a route without any knowledge of the state of the network Random algorithms that do not consider the network state are oblivious algorithms Oblivious routing includes deterministic routing algorithms as a subset 15
16
Minimal Oblivious Routing Minimal oblivious routing attempts to achieve the load balance of randomized routing without giving up locality This is done by restricting routes to minimal paths Routing is again done in two steps 1. Route to a randomly chosen intermediate node 2. Route from there to the destination 16
17
17 Minimal Oblivious Routing - (Torus) Idea: For each packet, randomly determine a node x inside the minimal quadrant, so that the packet is routed from source node s to x and then on to destination node d Assumption: At each node, routing in either the x or the y direction is allowed. (Figure: 4x4 torus, nodes 00 to 33.)
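The idea can be sketched as follows, assuming a k-ary 2-cube with (x, y) node coordinates. The helper names, the uniform choice of x, and the tie-break toward the positive direction when both ways around a ring are equally short are assumptions made for illustration; the slides only require that x lie in the minimal quadrant.

```python
import random

def minimal_offset(a, b, k):
    """Signed offset of the shorter way from a to b along one ring of a k-ary torus."""
    d = (b - a) % k
    return d if d <= k - d else d - k     # ties broken toward the positive direction

def random_intermediate(src, dst, k):
    """Pick a random node x inside the minimal quadrant between src and dst on a
    k-ary 2-cube; routing then goes src -> x -> dst, so the path stays minimal."""
    ox = minimal_offset(src[0], dst[0], k)
    oy = minimal_offset(src[1], dst[1], k)
    rx = random.choice(range(0, ox + 1) if ox >= 0 else range(ox, 1))
    ry = random.choice(range(0, oy + 1) if oy >= 0 else range(oy, 1))
    return ((src[0] + rx) % k, (src[1] + ry) % k)

# Matches the slide example: src 00, dst 21 on a 4-ary torus, so x is one of
# (0,0), (1,0), (2,0), (0,1), (1,1), (2,1) -- i.e. nodes 00, 10, 20, 01, 11, 21.
print(random_intermediate((0, 0), (2, 1), k=4))
```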
18
18 Minimal Oblivious Routing - (Torus) For each node x in the quadrant (00, 10, 20, 01, 11, 21), determine a minimal route via x Start with x = 00 Three possible routes: (00, 01, 11, 21) (p = 0.33), (00, 10, 20, 21) (p = 0.33), (00, 10, 11, 21) (p = 0.33)
19
19 Minimal Oblivious Routing - (Torus) x = 01 One possible route: (00, 01, 11, 21) (p = 1)
20
20 Minimal Oblivious Routing - (Torus) x = 10 Two possible routes: (00, 10, 20, 21) (p = 0.5), (00, 10, 11, 21) (p = 0.5)
21
21 Minimal Oblivious Routing - (Torus) x = 11 Two possible routes: (00, 10, 11, 21) (p = 0.5), (00, 01, 11, 21) (p = 0.5)
22
22 Minimal Oblivious Routing - (Torus) x = 20 One possible route: (00, 10, 20, 21) (p = 1)
23
23 Minimal Oblivious Routing - (Torus) x = 21 Three possible routes: (00, 01, 11, 21) (p = 0.33), (00, 10, 20, 21) (p = 0.33), (00, 10, 11, 21) (p = 0.33)
24
24 Minimal Oblivious Routing - (Torus) Adding the probabilities on each channel Example, link (00, 01): P = 1/3 for x = 00; P = 1 for x = 01; P = 0 for x = 10; P = 1/2 for x = 11; P = 0 for x = 20; P = 1/3 for x = 21 P(00,01) = (2·1/3 + 1/2 + 1)/6 = 2.17/6
25
25 Minimal Oblivious Routing - (Torus) Results: Load is not very balanced The path between nodes 10 and 11 is very seldom used Good locality performance is achieved at the expense of worst-case performance (Figure: resulting channel loads, e.g. p = 2.17/6, p = 3.83/6 and p = 1.67/6.)
26
Adaptive Routing (route influenced by traffic along the way) 26
27
Adaptive Routing Uses network state to make routing decisions Buffer occupancies are often used Coupled with the flow control mechanism Local information is readily available Global information is more costly to obtain Network state can change rapidly Use of local information can lead to non-optimal choices Can be minimal or non-minimal 27
28
Adaptive Routing Local information is not enough In each cycle: Node 5 sends a packet to node 6 Node 3 sends a packet to node 7 (Figure: linear array of nodes 0 to 7.) 28
29
Adaptive Routing Local information is not enough Node 3 does not know about the traffic between nodes 5 and 6 until the input buffers between nodes 3 and 5 are completely filled with packets! 29
30
Adaptive Routing Local information is not enough Adaptive routing works better with smaller buffers, since small buffers fill faster and congestion is thus propagated earlier to the sensing node (stiff backpressure) 30
31
Adaptive Routing How does the adaptive routing algorithm sense the state of the network? It can only sense current local information Global information is based on historic local information Changes in the traffic flow in the network are observed much later 31
32
Minimal Adaptive Routing Minimal adaptive routing chooses among the minimal routes from source s to destination d 32
33
Minimal Adaptive Routing At each hop a routing function generates a productive output vector that identifies which output channels of the current node will move the packet closer to its destination Network state is then used to select one of these channels for the next hop 33
34
Minimal Adaptive Routing Good at locally balancing load Poor at globally balancing load Minimal adaptive routing algorithms are unable to avoid congestion for source-destination pairs with no minimal path diversity. 34 (Figure: local congestion can be avoided in one case but cannot be avoided in the other.)
35
35 Fully Adaptive Routing Fully adaptive routing does not restrict packets to the shortest paths Misrouting is allowed This can help to avoid congested areas and improves load balance
36
36 Fully Adaptive Routing - Livelock Fully adaptive routing may result in livelock! Mechanisms must be added to prevent livelock, e.g. misrouting may only be allowed a fixed number of times
37
Summary of Routing Algorithms Deterministic routing is a simple and inexpensive routing algorithm, but it does not utilize path diversity and is thus weak on load balancing Oblivious algorithms often give good results, since they allow load balancing and their effects are easy to analyse Adaptive algorithms, though in theory superior, suffer from the fact that global information is not available at a local node 37
38
Summary of Routing Algorithms Latency is the paramount concern Minimal routing is most common for NoC Non-minimal routing can avoid congestion and deliver low latency To date, NoC research favors DOR for simplicity and deadlock freedom Only unicast routing has been covered here Recent work extends on-chip routing to support multicast 38
39
Part 4 NoC Routing Mechanisms 39
40
Routing Two approaches: Fixed routing tables at the source or at each hop Algorithmic routing uses specialized hardware to compute the route or next hop at run-time The term routing mechanics refers to the mechanism that is used to implement any routing algorithm 40
41
Table-based Routing Two approaches: Source-table routing implements all-at-once routing by looking up the entire route at the source Node-table routing performs incremental routing by looking up the hop-by-hop routing relation at each node along the route Major advantage: A routing table can support any routing relation on any topology 41
42
Table-based Routing 42 Example routing mechanism for deterministic source routing NoCs. The NI uses a LUT to store the route map.
43
Source Routing All routing decisions are made at the source terminal To route a packet: 1) the table is indexed using the packet destination 2) a route or a set of routes is returned 3) one route is selected 4) the route is prepended to the packet Because of its speed, simplicity and scalability, source routing is very often used for deterministic and oblivious routing 43
44
44 Source Routing - Example The example shows a routing table for a 4x2 torus network In this example there are two alternative routes for each destination Each node has its own routing table

Source routing table for node 00 of the 4x2 torus network:
Destination  Route 0  Route 1
00           X        X
10           EX       WWWX
20           EEX      WWX
30           WX       EEEX
01           NX       SX
11           NEX      ENX
21           NEEX     WWNX
31           NWX      WNX

(Slide note: the order of the XY digits should be the opposite, i.e. 21 -> 12.)

Example: Routing from 00 to 21 - the table is indexed with 21 - two routes are returned: NEEX and WWNX - the source arbitrarily selects NEEX
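A minimal sketch of the lookup-and-prepend procedure, using the node-00 table from this slide. The dictionary layout, function name, and the random choice between the two alternatives are illustrative assumptions; the slide only says the source selects one of the returned routes arbitrarily.

```python
import random

# Source routing table for node 00 of the 4x2 torus (copied from the slide).
# Port symbols: N, S, E, W move the packet; X exits to the destination terminal.
ROUTE_TABLE_00 = {
    '00': ('X',    'X'),
    '10': ('EX',   'WWWX'),
    '20': ('EEX',  'WWX'),
    '30': ('WX',   'EEEX'),
    '01': ('NX',   'SX'),
    '11': ('NEX',  'ENX'),
    '21': ('NEEX', 'WWNX'),
    '31': ('NWX',  'WNX'),
}

def source_route(dest):
    """Index the table with the destination and pick one of the alternatives."""
    return random.choice(ROUTE_TABLE_00[dest])

route = source_route('21')          # 'NEEX' or 'WWNX'
packet = (route, b'payload...')     # the chosen route is prepended to the packet
```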
45
Arbitrary Length Encoding of Source Routes Advantage: It can be used for arbitrary-sized networks The complexity of routing is moved from the network nodes to the terminal nodes But routers must be able to handle arbitrary length routes 45
46
Arbitrary Length Encoding The router has 16-bit phits and 32-bit flits The route has 13 hops: NENNWNNENNWNN Extra symbols: P: phit continuation selector F: flit continuation phit The table entries in the terminals must be of arbitrary length 46
47
Node-Table Routing Table-based routing can also be performed by placing the routing table in the routing nodes rather than in the terminals Node-table routing is appropriate for adaptive routing algorithms, since it can use state information at each node 47
48
Node-Table Routing A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing Scalability is sacrificed, since different nodes need tables of varying size It is difficult to give two packets arriving from different nodes different routes through the network without expanding the tables 48
49
49 Example The table shows a set of routing tables There are two choices from a source to a destination (Figure: routing table for node 00 of the 4x2 torus; bold-font ports are misroutes.)
50
50 Example Livelock can occur A packet passing through node 00 destined for node 11: if the entry for (00 -> 11) is N, the packet goes to node 10, and if the entry for (10 -> 11) is S, it returns to node 00, so the packet oscillates between nodes 00 and 10 (livelock)
51
Algorithmic Routing Instead of using a table, an algorithm can be used to compute the next route In order to be fast, these algorithms are usually kept simple and are implemented in hardware 51
52
52 Algorithmic Routing - Example Dimension-Order Routing sx and sy indicate the preferred directions: sx = 0: +x, sx = 1: -x; sy = 0: +y, sy = 1: -y x and y represent the number of hops remaining in the x and y direction The productive direction vector (PDV) is used as an input for the selection of a route; how it is used determines the type of routing The PDV indicates which channels advance the packet
53
Algorithmic Routing - Example A minimal oblivious router is implemented by randomly selecting one of the active bits of the PDV as the output direction A minimal adaptive router is achieved by making the selection based on the lengths of the respective output queues A fully adaptive router is implemented by picking an unproductive direction if the productive output queues exceed a threshold 53
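The three variants can be contrasted in one small sketch. The PDV representation, port names, queue-length inputs and the threshold value below are assumptions chosen for illustration; only the selection rules themselves come from the slide.

```python
import random

def productive_direction_vector(dx, dy, sx, sy):
    """Build the PDV: one flag per output (+x, -x, +y, -y) telling which
    channels move the packet closer.  dx/dy are remaining hop counts,
    sx/sy the preferred signs (0 means +, 1 means -)."""
    return {
        '+x': dx > 0 and sx == 0, '-x': dx > 0 and sx == 1,
        '+y': dy > 0 and sy == 0, '-y': dy > 0 and sy == 1,
    }

def select_output(pdv, queue_len, mode, threshold=4):
    """Pick an output port from the PDV (assumes the packet has not arrived yet).
    'oblivious': random productive port; 'adaptive': productive port with the
    shortest output queue; 'fully': additionally allow an unproductive port
    when every productive queue is longer than `threshold`."""
    productive = [p for p, ok in pdv.items() if ok]
    if mode == 'oblivious':
        return random.choice(productive)
    if mode == 'fully' and all(queue_len[p] > threshold for p in productive):
        return min(queue_len, key=queue_len.get)      # misroute toward shortest queue
    return min(productive, key=lambda p: queue_len[p])

pdv = productive_direction_vector(dx=2, dy=1, sx=0, sy=1)   # +x and -y are productive
print(select_output(pdv, {'+x': 3, '-x': 0, '+y': 1, '-y': 5}, mode='adaptive'))  # '+x'
```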
54
Summary Routing Mechanics Table based routing Source routing Node-table routing Algorithmic routing 54
55
Exercise Compression of source routes. In the source routes, each port selector symbol [N,S,W,E, and X] was encoded with three bits. Suggest an alternative encoding to reduce the average length (in bits) required to represent a source route. Justify your encoding in terms of typical routes that might occur on a torus. Also compare the original three bits per symbol with your encoding on the following routes: (a) NNNNNEEX (b) WNEENWWWWWNX 55
56
Part 5 NoC Flow Control Resources in a Network Node Bufferless Flow Control Buffered Flow control 56
57
Flow Control (FC) The goal is to use resources as efficiently as possible to allow a high throughput An efficient FC is a prerequisite for good network performance 57 FC determines how the resources of a network, such as channel bandwidth and buffer capacity, are allocated to packets traversing the network.
58
Flow Control FC can be viewed as a problem of Resource allocation Contention resolution Resources in form of channels, buffers and state must be allocated to each packet If two packets compete for the same channel flow control can only assign the channel to one packet, but must also deal with the other packet 58
59
Flow Control Flow Control can be divided into: 1. Bufferless flow control Packets are either dropped or misrouted 2. Buffered flow control Packets that cannot be routed via the desired channel are stored in buffers 59
60
Resources in a Network Node Control State Tracks the resources allocated to the packet in the node and the state of the packet Buffer The packet is stored in a buffer before it is sent to the next node Bandwidth To travel to the next node, bandwidth has to be allocated for the packet 60
61
Units of Resource Allocation - Packet or Flits? Contradictory requirements on packets Packets should be very large in order to reduce overhead of routing and sequencing Packets should be very small to allow efficient and fine-grained resource allocation and minimize blocking latency Flits try to eliminate this conflict Packets can be large (low overhead) Flits can be small (efficient resource allocation) 61
62
Units of Resource Allocation - Size: Phit, Flit, Packet There are no fixed rules for the size of phits, flits and packets Typical values Phits: 1 bit to 64 bits Flits: 16 bits to 512 bits Packets: 128 bits to 1024 bits 62
63
Bufferless Flow Control No buffers, hence lower implementation cost If more than one packet is to be routed to the same output, one of them has to be misrouted or dropped Example: two packets A and B (each consisting of several flits) arrive at a network node. 63
64
Bufferless Flow Control Packet B is dropped and must be resent There must be a protocol that informs the sending node that the packet has been dropped Example: resend after no acknowledgement has been received within a given time 64
65
Bufferless Flow Control Packet B is misrouted No further action is required here, but at the receiving node packets have to be sorted into original order 65
66
Circuit Switching Circuit switching is a bufferless flow control in which several channels are reserved to form a circuit A request (R) propagates from source to destination and is answered by an acknowledgement (A) Then data is sent (here two five-flit packets (D)) and a tail flit (T) is sent to deallocate the channels 66
67
Circuit Switching 67 Circuit switching does not suffer from dropping or misrouting packets However, there are two weaknesses: High latency: T = 3·H·t_r + L/b Low throughput, since the channel is used a large fraction of the time for signaling rather than for delivery of the payload
68
Circuit Switching Latency 68 T = 3·H·t_r + L/b where: H: number of hops from source to destination t_r: router/setup delay per hop L: packet length in bits b: channel bandwidth Note: 3 x the header latency, because the path from source to destination must be traversed 3 times to deliver the packet: once in each direction to set up the circuit, and then again to deliver the first flit
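A quick numeric check of this formula; the values of H, t_r, L and b below are illustrative assumptions, not taken from the slides.

```python
# Circuit-switching latency T = 3*H*t_r + L/b with assumed numbers:
# 5 hops, 2 cycles of router delay per hop, a 512-bit packet, 64 bits/cycle channels.
H, t_r, L, b = 5, 2, 512, 64
T_circuit = 3 * H * t_r + L / b
print(T_circuit)          # 30 + 8 = 38 cycles
```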
69
Buffered Flow Control More efficient flow control can be achieved by adding buffers With sufficient buffers packets do not need to be misrouted or dropped, since packets can wait for the outgoing channel to be ready 69
70
Buffered Flow Control Two main approaches: 1. Packet-Buffer Flow Control Store-And-Forward Cut-Through 2. Flit-Buffer Flow Control Wormhole Flow Control Virtual Channel Flow Control 70
71
Store & Forward Flow Control Each node along a route waits until a packet is completely received (stored) and then the packet is forwarded to the next node Two resources are needed Packet-sized buffer in the switch Exclusive use of the outgoing channel 71
72
Store & Forward Flow Control Advantage: While waiting to acquire resources, no channels are being held idle and only a single packet buffer on the current node is occupied Disadvantage: Very high latency: T = H·(t_r + L/b) 72
73
Cut-Through Flow Control Advantages Cut-through reduces the latency: T = H·t_r + L/b Disadvantages No good utilization of buffers, since they are allocated in units of packets Contention latency is increased, since packets must wait until a whole packet leaves the occupied channel 73
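Using the same illustrative numbers as before, the store-and-forward and cut-through formulas can be compared directly (all values are assumptions):

```python
# H = 5 hops, t_r = 2 cycles per hop, L = 512-bit packet, b = 64 bits/cycle.
H, t_r, L, b = 5, 2, 512, 64
T_saf = H * (t_r + L / b)     # store-and-forward: the packet is serialized at every hop
T_ct  = H * t_r + L / b       # cut-through: the packet is serialized only once
print(T_saf, T_ct)            # 50.0 vs 18.0 cycles
```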
74
Wormhole Flow Control Wormhole FC operates like cut-through, but with channels and buffers allocated to flits rather than packets When the head flit arrives at a node, it must acquire resources (a virtual channel and buffer space) before it can be forwarded to the next node Tail flits behave like body flits, but additionally release the channel 74
75
Wormhole (WH) Flow Control Virtual channels hold the state needed to coordinate the handling of the flits of a packet over a channel Compared to cut-through, wormhole flow control makes far more efficient use of buffer space Throughput may be lower, since wormhole flow control may block a channel mid-packet 75
76
Example for WH Flow Control Input virtual channel is in idle state (I) Upper output channel is occupied, allocated to lower channel (L) 76
77
Example for WH Flow Control Input channel enters waiting state (W) Head flit is buffered 77
78
78 Body flit is also buffered No more flits can be buffered, thus congestion arises if more flits want to enter the switch Example for WH Flow Control
79
Virtual channel enters active state (A) Head flit is output on upper channel Second body flit is accepted 79 Example for WH Flow Control
80
First body flit is output Tail flit is accepted 80 Example for WH Flow Control
81
Second body flit is output 81 Example for WH Flow Control
82
Tail flit is output The virtual channel is deallocated and returns to the idle state 82 Example for WH Flow Control
83
Wormhole Flow Control The main advantage of wormhole over cut-through is that the buffers in the routers do not need to hold full packets, but only a small number of flits This allows the use of smaller and faster routers 83
84
Part 6 NoC Flow Control (continued) Blocking Virtual Channel-Flow Control Virtual Channel Router Credit-Based Flow Control On/Off Flow Control Flow Control Summary 84
85
Blocking - Cut-Through and Wormhole If a packet is blocked, the flits of the wormhole packet are stored in different routers 85 (Figure: Wormhole (buffer size 2 flits) vs. Cut-Through (buffer size 1 packet), with the blocked packet marked.)
86
Wormhole Flow Control There is only one virtual channel for each physical channel Packet A is blocked and cannot acquire channel p Though channels p and q are idle packet A cannot use these channels since B owns channel p 86
87
Virtual-Channel Flow Control In virtual-channel flow control, several virtual channels are associated with a single physical channel This allows use of the bandwidth that would otherwise be left idle when a packet blocks the channel Unlike in wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for bandwidth with other flits 87
88
Virtual Channel Flow Control There are several virtual channels for each physical channel Packet A can use a second virtual channel and thus proceed over channel p and q 88
89
Virtual Channel Allocation Flits must be delivered in order: H, B, …, B, T. Only the head flit carries routing information VCs are allocated at the packet level, i.e., packet-by-packet The head flit is responsible for allocating VCs along the route. Body and tail flits must follow the VC path, and the tail flit releases the VCs. The flits of a packet cannot interleave with those of any other packet 89
90
Virtual Channel Flow Control - Fair Bandwidth Arbitration VCs interleave their flits Results in a high average latency 90
91
Virtual Channel Flow Control - Winner-Take-All Arbitration A winner-take-all arbitration reduces the average latency with no throughput penalty 91
92
Virtual Channel Flow Control - Buffer Storage Buffer storage is organized in two dimensions Number of virtual channels Number of flits that can be buffered per channel 92
93
Virtual Channel Flow Control - Buffer Storage The virtual channel buffer should be at least as deep as needed to cover the round-trip credit latency In general it is usually better to add more virtual channels than to increase the buffer size 93
94
Virtual Channel 94 A: active W: waiting I: idle
95
Virtual Channel Router 95
96
Buffer Organization 96 Single buffer per input Multiple fixed length queues per physical channel
97
Buffer Management In buffered FC there is a need for communication between nodes in order to inform about the availability of buffers Backpressure informs upstream nodes that they must stop sending to a downstream node when the buffers of that downstream node are full 97 (Figure: traffic flows from the upstream node to the downstream node.)
98
Credit-Based Flow Control The upstream router keeps a count of the number of free flit buffers in each virtual channel downstream Each time the upstream router forwards a flit, it decrements the corresponding counter If a counter reaches zero, the downstream buffer is full and the upstream node cannot send a new flit When the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream node, which increments its counter 98
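A minimal sketch of the upstream bookkeeping for one virtual channel; the class and method names are hypothetical, but the counting rules follow this slide.

```python
class CreditChannel:
    """Credit-based flow control, upstream side, for one virtual channel."""

    def __init__(self, downstream_buffers):
        self.credits = downstream_buffers     # free flit buffers downstream

    def can_send(self):
        return self.credits > 0

    def send_flit(self):
        assert self.can_send(), "no credit: downstream buffers are full"
        self.credits -= 1                     # one downstream buffer now occupied

    def receive_credit(self):
        self.credits += 1                     # downstream node forwarded a flit

vc = CreditChannel(downstream_buffers=4)
for _ in range(4):
    vc.send_flit()
print(vc.can_send())      # False: must wait for a credit
vc.receive_credit()
print(vc.can_send())      # True again
```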
99
Credit-Based Flow Control 99
100
Credit-Based Flow Control The minimum time between a credit being sent at time t1 and a credit being sent for the same buffer at time t5 is the credit round-trip delay t_crt 100 (Figure annotation: all buffers on the downstream node are full.)
101
Credit-Based Flow Control If there is only a single flit buffer, a flit must wait for a new credit, and the maximum throughput is limited to one flit per t_crt The bit rate would then be L_f / t_crt, where L_f is the length of a flit in bits 101
102
Credit-Based Flow Control If there are F flit buffers on the virtual channel, F flits can be sent before waiting for a credit, which gives a throughput of F flits per t_crt and a bit rate of F·L_f / t_crt 102
103
Credit-Based Flow Control In order not to limit the throughput by low-level flow control, the number of flit buffers should be at least F ≥ t_crt·b / L_f, where b is the bandwidth of the channel 103
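A small worked example of this bound; the values of t_crt, b and L_f below are assumptions, not taken from the slides.

```python
import math

# Credit round-trip delay of 8 cycles, 32 bits/cycle channels, 64-bit flits:
# F * L_f / t_crt must reach b, so F must be at least t_crt * b / L_f.
t_crt, b, L_f = 8, 32, 64
F_min = math.ceil(t_crt * b / L_f)
print(F_min)              # 4 flit buffers per virtual channel
```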
104
Credit-Based Flow Control For each flit sent downstream, a corresponding credit is sent upstream Thus there is a large amount of upstream signaling, which, especially for small flits, can represent a large overhead! 104
105
On/Off Flow Control On/off flow control tries to reduce the amount of upstream signaling An off signal is sent to the upstream node if the number of free buffers falls below the threshold F_off An on signal is sent to the upstream node if the number of free buffers rises above the threshold F_on With carefully dimensioned buffers, on/off flow control can achieve a very low overhead in the form of upstream signaling 105
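A sketch of the downstream bookkeeping for on/off flow control; the class and method names, and where exactly the thresholds are tested, are illustrative assumptions — only the F_off / F_on threshold behaviour comes from the slide.

```python
class OnOffReceiver:
    """On/off flow control, downstream side: signal the upstream node only
    when the free-buffer count crosses the F_off or F_on threshold."""

    def __init__(self, buffers, f_off, f_on):
        self.free = buffers
        self.f_off, self.f_on = f_off, f_on
        self.sending_allowed = True           # mirrors the last signal sent upstream

    def flit_arrived(self):
        self.free -= 1
        if self.free < self.f_off and self.sending_allowed:
            self.sending_allowed = False      # send 'off' to the upstream node

    def flit_forwarded(self):
        self.free += 1
        if self.free > self.f_on and not self.sending_allowed:
            self.sending_allowed = True       # send 'on' to the upstream node
```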
106
Ack/Nack Flow Control In ack/nack flow control the upstream node sends packets without knowing if there are free buffers in the downstream node 106
107
Ack/Nack Flow Control If no buffer is available: the downstream node sends a nack and drops the flit; the flit must be resent; flits must be reordered at the downstream node If a buffer is available: the downstream node sends an ack and stores the flit in a buffer 107
108
Buffer Management Because of its buffer and bandwidth inefficiency ack/nack is rarely used Credit-based flow control is used in systems with small numbers of buffers On/off flow control is used in systems that have large numbers of flit buffers 108
109
Flow Control Summary Bufferless flow control Dropping, misroute packets Circuit switching Buffered flow control Packet-Buffer Flow Control: SAF vs. Cut Through Flit-Buffer Flow Control: Wormhole and Virtual Channel Switch-to-switch (link level) flow control Credit-based, On/Off, Ack/Nack 109
110
Part 7 Router Architecture Virtual-channel Router Virtual channel state fields The Router Pipeline Pipeline Stalls 110
111
Router Microarchitecture - Virtual-channel Router Modern routers are pipelined and work at the flit level Head flits proceed through buffer stages that perform routing and virtual channel allocation All flits pass through switch allocation and switch traversal stages Most routers use credits to allocate buffer space 111
112
Typical Virtual Channel Router A router's functional blocks can be divided into Datapath: handles storage and movement of a packet's payload Input buffers Switch Output buffers Control: coordinates the movement of the packets through the resources of the datapath Route Computation VC Allocator Switch Allocator 112
113
Typical Virtual Channel Router The input unit contains a set of flit buffers Maintains the state for each virtual channel G = Global State R = Route O = Output VC P = Pointers C = Credits 113
114
Virtual Channel State Fields (Input) 114
115
Typical Virtual Channel Router During route computation the output port for the packet is determined Then the packet requests an output virtual channel from the virtual-channel allocator 115
116
Typical Virtual Channel Router Flits are forwarded via the virtual channel by allocating a time slot on the switch and output channel using the switch allocator Flits are forwarded to the appropriate output during this time slot The output unit forwards the flits to the next router in the packet’s path 116
117
Virtual Channel State Fields (Output) 117
118
Packet Rate and Flit Rate The control of the router operates at two distinct frequencies Packet Rate (performed once per packet) Route computation Virtual-channel allocation Flit Rate (performed once per flit) Switch allocation Pointer and credit count update 118
119
The Router Pipeline A typical router pipeline includes the following stages RC (Routing Computation) VA (Virtual-Channel Allocation) SA (Switch Allocation) ST (Switch Traversal) 119 no pipeline stalls
120
The Router Pipeline Cycle 0 Head flit arrives and the packet is directed to a virtual channel of the input port (G = I) 120 no pipeline stalls
121
The Router Pipeline Cycle 1 Routing computation Virtual channel state changes to routing (G = R) Head flit enters RC-stage First body flit arrives at router 121 no pipeline stalls
122
The Router Pipeline Cycle 2: Virtual Channel Allocation Route field (R) of the virtual channel is updated Virtual channel state is set to “waiting for output virtual channel” (G = V) Head flit enters the VA stage First body flit enters the RC stage Second body flit arrives at the router 122 no pipeline stalls
123
The Router Pipeline Cycle 2: Virtual Channel Allocation The result of the routing computation is input to the virtual-channel allocator If successful, the allocator assigns a single output virtual channel The state of the virtual channel is set to active (G = A) 123 no pipeline stalls
124
The Router Pipeline Cycle 3: Switch Allocation All further processing is done on a per-flit basis Head flit enters the SA stage Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC 124 no pipeline stalls
125
The Router Pipeline Cycle 3: Switch Allocation If successful, pointer field is updated Credit field is decremented 125 no pipeline stalls
126
The Router Pipeline Cycle 4: Switch Traversal Head flit traverses the switch Cycle 5: Head flit starts traversing the channel to the next router 126 no pipeline stalls
127
The Router Pipeline Cycle 7: Tail traverses the switch Output VC set to idle Input VC set to idle (G = I), if buffer is empty Input VC set to routing (G = R), if another head flit is in the buffer 127 no pipeline stalls
128
The Router Pipeline Only the head flits enter the RC and VA stages The body and tail flits are stored in the flit buffers until they can enter the SA stage 128 no pipeline stalls
129
Pipeline Stalls Pipeline stalls can be divided into: Packet stalls, which can occur if the virtual channel cannot advance to its R, V, or A state Flit stalls, which occur if a virtual channel is in the active state and the flit cannot successfully complete switch allocation due to lack of a flit, lack of a credit, or losing arbitration for the switch time slot 129
130
Example for Packet Stall Virtual-channel allocation stall The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel 130
131
Example for Packet Stall Virtual-channel allocation stall 131 The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel
132
Example for Flit Stalls 132 Second body flit fails to allocate the requested connection in cycle 5 Switch allocation stall
133
Example for Flit Stalls 133 Buffer empty stall Body flit 2 is delayed three cycles. However, since it does not have to enter the RC and VA stage the output is only delayed one cycle!
134
Credits A buffer is allocated in the SA stage on the upstream (transmitting) node To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node When the credit reaches the input unit of the upstream node, the buffer is available and can be reused 134
135
Credits The credit loop can be viewed as a token that starts at the SA stage of the upstream node, travels downstream with the flit, reaches the SA stage at the downstream node, and returns upstream as a credit 135
136
Credit Loop Latency The credit loop latency t_crt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth t_crt, in flit times, is the total delay around this credit loop 136
137
Credit Loop Latency If the number of buffers available per virtual channel is F, the duty factor of the channel will be d = min(1, F / t_crt) The duty factor will be 100% as long as there are sufficient flit buffers to cover the round-trip latency 137
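A quick numeric illustration of the duty-factor formula, assuming a credit-loop latency of 6 flit times (the numbers are illustrative, not from the slides):

```python
# d = min(1, F / t_crt): with t_crt = 6 flit times, 4 buffers give only 2/3 of
# the channel bandwidth, while 6 or more buffers sustain full bandwidth.
t_crt = 6.0
for F in (2, 4, 6, 8):
    print(F, min(1.0, F / t_crt))     # 0.33..., 0.66..., 1.0, 1.0
```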
138
Credit Stall 138 Virtual Channel Router with 4 flit buffers
139
Flit and Credit Encoding A. Flits and credits are sent over separate lines with separate widths B. Flits and credits are transported via the same line. This can be done by including credits in flits or by multiplexing flits and credits at the phit level Option (A) is considered more efficient. For a more detailed discussion see Section 16.6 in the Dally book 139
140
Summary NoC is a scalable platform for billion-transistor chips Several driving forces behind it Many open research questions May change the way we structure and model VLSI systems 140 Hong Kong University of Science and Technology, March 2010
141
References OASIS NoC Architecture Design in Verilog HDL, Technical Report TR-062010-OASIS, Adaptive Systems Laboratory, The University of Aizu, June 2010. OASIS NoC Project: http://web-ext.u-aizu.ac.jp/~benab/research/projects/oasis/ 141
142
Network-on-Chip Ben Abdallah, Abderazek The University of Aizu E-mail: benab@u-aizu.ac.jp 142 KUST University, March 2011