Network-on-Chip (2/2) Ben Abdallah, Abderazek The University of Aizu 1 KUST University, March 2011.


1 Network-on-Chip (2/2) Ben Abdallah, Abderazek The University of Aizu E-mail: benab@u-aizu.ac.jp 1 KUST University, March 2011

2 Part 3 Routing Routing Algorithms Deterministic Routing Oblivious Routing Adaptive Routing 2

3 Routing Basics  Once the topology is fixed  The routing algorithm determines the path(s) from source to destination  It must prevent deadlock, livelock, and starvation 3

4 Routing Deadlock 4  Without routing restrictions, a resource cycle can occur  Leads to deadlock

5 Deadlock Definition Deadlock: A packet does not reach its destination, because it is blocked at some intermediate resource Livelock: A packet does not reach its destination, because it enters a cyclic path Starvation: A packet does not reach its destination, because some resource does not grant it access (while it grants access to other packets) 5

6 Routing Algorithm Attributes  Number of destinations  Unicast, Multicast, Broadcast?  Adaptivity  Deterministic, Oblivious or Adaptive?  Implementation (Mechanisms)  Source or node routing?  Table or circuit? 6

7 Deterministic Routing 7

8  Always chooses the same path between two nodes  Easy to implement and to make deadlock-free  Does not use path diversity and is thus weak on load balancing  Packets arrive in order 8

9 Deterministic Routing - Example: Destination-Tag Routing in Butterfly Networks 9  The route depends only on the destination address, not on the source  Binary butterfly: the destination address 5 = 101 in binary selects the route down, up, down (one bit per stage)  4-ary butterfly: the destination address 11 = 1011(2) = 23(4), interpreted as quaternary digits, selects the route via output ports 2 and 3  Note: Starting from any source and using the same digit pattern always routes to the destination
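The slides give no code, so here is a minimal Python sketch of destination-tag routing (the language and the function name are my own, not from the deck): the destination address is split into base-radix digits, most significant first, and each digit selects the switch output in its stage.

```python
def destination_tag_route(dest, stages, radix=2):
    """Return the switch output selected at each stage of a butterfly.

    The destination address is read one base-`radix` digit at a time,
    most significant digit first; each digit picks the output port of
    the switch in that stage, independent of the source node.
    """
    digits = []
    for _ in range(stages):
        digits.append(dest % radix)
        dest //= radix
    return list(reversed(digits))

# Destination 5 = 101 in binary: ports 1, 0, 1 ("down, up, down").
print(destination_tag_route(5, 3))            # [1, 0, 1]
# Destination 11 = 1011 in binary = 23 in base 4: ports 2, then 3.
print(destination_tag_route(11, 2, radix=4))  # [2, 3]
```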

10 Deterministic Routing - Dimension-Order Routing  For n-dimensional hypercubes and meshes, dimension-order routing produces deadlock-free routing algorithms  It is called XY routing in 2-D meshes and e-cube routing in hypercubes 10

11 Dimension-Order Routing - XY Routing Algorithm 11 (figure: route from source S to destination D in a 2-D mesh)

12 Dimension-Order Routing - XY Routing Algorithm 12 XY routing algorithm for 2 D Mesh
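As an illustration of the XY algorithm, here is a minimal Python sketch (my own naming; I assume the convention that N is +y and E is +x): the packet first resolves its entire X offset, then its Y offset.

```python
def xy_route(src, dst):
    """XY (dimension-order) routing on a 2-D mesh.

    Moves the packet fully in the X dimension first, then in Y,
    returning the list of output ports taken ('E','W','N','S').
    """
    (sx, sy), (dx, dy) = src, dst
    ports = []
    while sx != dx:                 # resolve the X offset first
        sx, step = (sx + 1, 'E') if dx > sx else (sx - 1, 'W')
        ports.append(step)
    while sy != dy:                 # then resolve the Y offset
        sy, step = (sy + 1, 'N') if dy > sy else (sy - 1, 'S')
        ports.append(step)
    return ports

print(xy_route((0, 0), (2, 1)))   # ['E', 'E', 'N']
```

Because every packet crosses the X dimension before the Y dimension, no cyclic channel dependency can form, which is why this ordering is deadlock-free.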

13 Deterministic Routing - E-cube Routing Algorithm 13 Dimension-order routing algorithm for hypercubes
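A minimal Python sketch of e-cube routing (my own naming): the bits in which source and destination addresses differ are corrected in a fixed dimension order, one bit flip per hop.

```python
def ecube_route(src, dst, dims):
    """E-cube (dimension-order) routing on a `dims`-dimensional hypercube.

    Node addresses are bit strings; the packet corrects differing
    address bits in a fixed order (lowest dimension first), which
    makes the algorithm deadlock-free.
    """
    diff = src ^ dst                  # bits that still differ
    hops = []
    for d in range(dims):             # fixed dimension order
        if diff & (1 << d):
            src ^= (1 << d)           # flipping one bit = one hop
            hops.append(src)
    return hops                       # intermediate nodes visited

# 3-cube: routing from 000 to 101 crosses dimensions 0 and 2.
print([bin(n) for n in ecube_route(0b000, 0b101, 3)])  # ['0b1', '0b101']
```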

14 Oblivious Routing 14

15 Oblivious (unconscious) Routing  Always chooses a route without knowledge of the network state  Random algorithms that do not consider the network state are oblivious algorithms  Oblivious algorithms include deterministic routing algorithms as a subset 15

16 Minimal Oblivious Routing  Minimal oblivious routing attempts to achieve the load balance of randomized routing without giving up the locality  This is done by restricting routes to minimal paths  Again routing is done in two steps 1.Route to random node 2.Route to destination 16

17 17 Minimal Oblivious Routing - (Torus) Idea: For each packet, randomly determine a node x inside the minimal quadrant, such that the packet is routed from source node s to x and then to destination node d Assumption: At each node routing in the x or y direction is allowed (figure: 4x4 torus, nodes 00 to 33)
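The two-phase idea can be sketched in Python (my own function names; ties between the two torus directions are broken toward the forward direction as a simplification): phase 1 picks a random intermediate node x in the minimal quadrant, and routing s -> x -> d then stays minimal.

```python
import random

def minimal_quadrant(src, dst, k):
    """Nodes of the minimal quadrant between src and dst on a k-ary
    2-D torus: all nodes reachable while moving only in the shorter
    wrap-around direction of each dimension."""
    def steps(a, b):
        fwd = (b - a) % k
        if fwd <= k - fwd:                         # forward is minimal
            return [(a + i) % k for i in range(fwd + 1)]
        return [(a - i) % k for i in range((k - fwd) + 1)]
    return [(x, y) for x in steps(src[0], dst[0])
                   for y in steps(src[1], dst[1])]

def pick_intermediate(src, dst, k):
    """Phase 1 of minimal oblivious routing: choose the random node x;
    both s -> x and x -> d are then routed minimally."""
    return random.choice(minimal_quadrant(src, dst, k))

# On a 4-ary torus, routing 00 -> 21 stays inside {00, 10, 20, 01, 11, 21}.
print(sorted(minimal_quadrant((0, 0), (2, 1), 4)))
```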

18 18 For each node in the quadrant (00, 10, 20, 01, 11, 21) ◦ Determine a minimal route via x Start with x = 00 ◦ Three possible routes:  (00, 01, 11, 21) (p=0.33)  (00, 10, 20, 21) (p=0.33)  (00, 10, 11, 21) (p=0.33) Minimal Oblivious Routing - (Torus)

19 19 x = 01 ◦ One possible route:  (00, 01, 11, 21) (p=1) Minimal Oblivious Routing - (Torus)

20 20 x = 10 ◦ Two possible routes:  (00, 10, 20, 21) (p=0.5)  (00, 10, 11, 21) (p=0.5) Minimal Oblivious Routing - (Torus)

21 21 x = 11 ◦ Two possible routes:  (00, 10, 11, 21) (p=0.5)  (00, 01, 11, 21) (p=0.5) Minimal Oblivious Routing - (Torus)

22 22 x = 20 ◦ One possible route:  (00, 10, 20, 21) (p=1) Minimal Oblivious Routing - (Torus)

23 23 x = 21 ◦ Three possible routes:  (00, 01, 11, 21) (p=0.33)  (00, 10, 20, 21) (p=0.33)  (00, 10, 11, 21) (p=0.33) Minimal Oblivious Routing - (Torus)

24 24 Adding the probabilities on each channel Example, link (00,01) ◦ P=1/3, x = 00 ◦ P=1, x = 01 ◦ P=0, x = 10 ◦ P=1/2, x = 11 ◦ P=0, x = 20 ◦ P=1/3, x = 21 ◦ P(00,01) = (2*1/3 + 1/2 + 1)/6 = 2.17/6 Minimal Oblivious Routing - (Torus)

25 25 Results: ◦ Load is not very balanced ◦ The path between nodes 10 and 11 is very seldom used  Good locality performance is achieved at the expense of worst-case performance (figure: link loads p = 2.17/6, 3.83/6, 2.17/6, 1.67/6, 2.17/6, 3.83/6) Minimal Oblivious Routing - (Torus)

26 Adaptive Routing (route influenced by traffic along the way) 26

27 Adaptive Routing  Uses network state to make routing decisions  Buffer occupancies are often used  Often coupled with the flow control mechanism  Local information is readily available  Global information is more costly to obtain  Network state can change rapidly  Use of local information can lead to non-optimal choices  Can be minimal or non-minimal 27

28 Adaptive Routing - Local Information Is Not Enough  In each cycle:  Node 5 sends a packet to node 6  Node 3 sends a packet to node 7 (figure: linear array of nodes 0 to 7) 28

29 Adaptive Routing - Local Information Is Not Enough  Node 3 does not know about the traffic between 5 and 6 before the input buffers between nodes 3 and 5 are completely filled with packets! 29

30 Adaptive Routing - Local Information Is Not Enough  Adaptive routing works better with smaller buffers, since small buffers fill faster and congestion is thus propagated earlier to the sensing node (stiff backpressure) 30

31 Adaptive Routing  How does the adaptive routing algorithm sense the state of the network?  It can only sense current local information  Global information is based on historic local information  Changes in the traffic flow in the network are observed much later 31

32 Minimal Adaptive Routing  Minimal adaptive routing chooses among the minimal routes from source s to destination d 32

33 Minimal Adaptive Routing  At each hop a routing function generates a productive output vector that identifies which output channels of the current node will move the packet closer to its destination  Network state is then used to select one of these channels for the next hop 33

34 Minimal Adaptive Routing  Good at locally balancing load  Poor at globally balancing load  Minimal adaptive routing algorithms are unable to avoid congestion of source-destination pairs with no minimal path diversity 34 (figure: local congestion can be avoided where minimal paths diverge, but not where there is no minimal path diversity)
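The productive output vector and the state-based selection can be sketched in Python (my own naming; I assume a 2-D mesh with N = +y, and downstream queue lengths as the local network state):

```python
def productive_vector(cur, dst):
    """Output channels that move the packet closer to its destination
    on a 2-D mesh (the productive output vector)."""
    (cx, cy), (dx, dy) = cur, dst
    prods = []
    if dx > cx: prods.append('E')
    if dx < cx: prods.append('W')
    if dy > cy: prods.append('N')
    if dy < cy: prods.append('S')
    return prods

def minimal_adaptive_step(cur, dst, queue_len):
    """Pick, among the productive channels, the one whose downstream
    buffer occupancy (local network state) is lowest."""
    return min(productive_vector(cur, dst), key=lambda p: queue_len[p])

# At (0,0) heading for (2,1): both E and N are productive; E is less
# congested here, so it is chosen.
print(minimal_adaptive_step((0, 0), (2, 1), {'E': 1, 'N': 3}))  # E
```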

35 35 Fully Adaptive Routing  Fully adaptive routing does not restrict packets to the shortest path  Misrouting is allowed  This can help avoid congested areas and improves load balance

36 36 Fully Adaptive Routing - Livelock  Fully adaptive routing may result in livelock!  Mechanisms must be added to prevent livelock ◦ For example, misrouting may only be allowed a fixed number of times

37 Summary of Routing Algorithms  Deterministic routing is simple and inexpensive, but does not utilize path diversity and is thus weak on load balancing  Oblivious algorithms often give good results, since they allow load balancing and their effects are easy to analyse  Adaptive algorithms, though in theory superior, suffer from the fact that global information is not available at a local node 37

38 Summary of Routing Algorithms  Latency is a paramount concern  Minimal routing is most common for NoCs  Non-minimal routing can avoid congestion and deliver low latency  To date, NoC research favors DOR for simplicity and deadlock freedom  Only unicast routing was covered  There is recent work on extending on-chip routing to support multicast 38

39 Part 4 NoC Routing Mechanisms 39

40 Routing  Two approaches:  Fixed routing tables at the source or at each hop  Algorithmic routing, which uses specialized hardware to compute the route or next hop at run-time  The term routing mechanics refers to the mechanism used to implement a routing algorithm 40

41 Table-based Routing  Two approaches:  Source-table routing implements all-at-once routing by looking up the entire route at the source  Node-table routing performs incremental routing by looking up the hop-by-hop routing relation at each node along the route  Major advantage:  A routing table can support any routing relation on any topology 41

42 Table-based Routing 42 Example routing mechanism for deterministic source routing NoCs. The NI uses a LUT to store the route map.

43 Source Routing  All routing decisions are made at the source terminal  To route a packet 1) the table is indexed using the packet destination 2) a route or a set of routes is returned 3) one route is selected 4) the route is prepended to the packet  Because of its speed, simplicity and scalability, source routing is very often used for deterministic and oblivious routing 43

44 44 Source Routing - Example  The example shows the source routing table for node 00 of a 4x2 torus network  There are two alternative routes for each destination  Each node has its own routing table

Destination  Route 0  Route 1
00           X        X
10           EX       WWWX
20           EEX      WWX
30           WX       EEEX
01           NX       SX
11           NEX      ENX
21           NEEX     WWNX
31           NWX      WNX

Example: Routing from 00 to 21 - The table is indexed with 21 - Two routes are returned: NEEX and WWNX - The source arbitrarily selects NEEX (Note: In this example the order of XY should be the opposite, i.e. 21 -> 12)
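The four lookup steps can be sketched in Python (the table below transcribes the slide's table for node 00; the function name and the random selection among the two routes are my own illustration):

```python
import random

# Source-routing table for node 00 of the 4x2 torus in the slide:
# each destination maps to two alternative port strings
# (X = exit to the local terminal).
ROUTES_00 = {
    '00': ('X',    'X'),
    '10': ('EX',   'WWWX'),
    '20': ('EEX',  'WWX'),
    '30': ('WX',   'EEEX'),
    '01': ('NX',   'SX'),
    '11': ('NEX',  'ENX'),
    '21': ('NEEX', 'WWNX'),
    '31': ('NWX',  'WNX'),
}

def source_route(dest, payload):
    """Index the table with the destination, select one of the returned
    routes, and prepend it to the packet."""
    route = random.choice(ROUTES_00[dest])
    return (route, payload)            # the route travels with the packet

print(source_route('21', b'data'))     # e.g. ('NEEX', b'data')
```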

45 Arbitrary-Length Encoding of Source Routes  Advantage:  It can be used for arbitrary-sized networks  The complexity of routing is moved from the network nodes to the terminal nodes  But routers must be able to handle routes of arbitrary length 45

46 Arbitrary-Length Encoding  The router has  16-bit phits  32-bit flits  The route has 13 hops: NENNWNNENNWNN  Extra symbols:  P: Phit continuation selector  F: Flit continuation phit  The table entries in the terminals must be of arbitrary length 46

47 Node-Table Routing  Table-based routing can also be performed by placing the routing table in the routing nodes rather than in the terminals  Node-table routing is appropriate for adaptive routing algorithms, since it can use state information at each node 47

48 Node-Table Routing  A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing  Scalability is sacrificed, since different nodes need tables of varying size  It is difficult to give two packets arriving from different nodes different paths through the network without expanding the tables 48

49 49 Example  The table shows a set of routing tables  There are two choices from a source to a destination (figure: 4x2 torus and routing table for node 00; ports in bold font are misroutes)

50 50 Example  Livelock can occur  A packet passing through node 00 destined for node 11: if the entry for (00->11) is N, the packet goes to node 10; if the entry for (10->11) is S, it goes back to node 00, so the packet oscillates 00 -> 10 -> 00 (livelock)

51 Algorithmic Routing  Instead of using a table, algorithms can be used to compute the next route  In order to be fast, algorithms are usually not very complicated and implemented in hardware 51

52 52 Algorithmic Routing - Example Dimension-Order Routing ◦ sx and sy indicate the preferred directions  sx=0: +x; sx=1: -x  sy=0: +y; sy=1: -y ◦ x and y represent the number of hops in the x and y directions ◦ The PDV (productive direction vector) is used as an input for the selection of a route (figure labels: sx, sy determine the type of the routing; the PDV indicates which channels advance the packet)

53  A minimal oblivious router - Implemented by randomly selecting one of the active bits of the PDV as the selected direction  A minimal adaptive router - Achieved by making the selection based on the lengths of the respective output queues  A fully adaptive router - Implemented by allowing an unproductive direction to be picked if the output queue lengths exceed a threshold 53 Algorithmic Routing - Example
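The quantities the slide names can be computed in a few lines of Python (a sketch under my own naming; the slide's hardware computes the same values combinationally):

```python
def pdv(cur, dst):
    """Dimension-order preferred-direction computation: sx/sy give the
    sign of travel in each dimension (0 = +dir, 1 = -dir), the next two
    values are the remaining hop counts, and the productive-direction
    vector marks which of the +x/-x/+y/-y outputs advance the packet."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    sx, sy = int(dx < 0), int(dy < 0)
    vector = {'+x': dx > 0, '-x': dx < 0, '+y': dy > 0, '-y': dy < 0}
    return sx, sy, abs(dx), abs(dy), vector

print(pdv((2, 0), (0, 1)))
# (1, 0, 2, 1, {'+x': False, '-x': True, '+y': True, '-y': False})
```

A minimal oblivious router would pick randomly among the True entries of the vector; a minimal adaptive router would pick the True entry with the shortest output queue.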

54 Summary  Routing Mechanics  Table based routing  Source routing  Node-table routing  Algorithmic routing 54

55 Exercise Compression of source routes. In the source routes, each port selector symbol (N, S, W, E, and X) was encoded with three bits. Suggest an alternative encoding to reduce the average length (in bits) required to represent a source route. Justify your encoding in terms of typical routes that might occur on a torus. Also compare the original three bits per symbol with your encoding on the following routes: (a) NNNNNEEX (b) WNEENWWWWWNX 55

56 Part 5 NoC Flow Control Resources in a Network Node Bufferless Flow Control Buffered Flow control 56

57 Flow Control (FC)  The goal is to use resources as efficiently as possible to allow a high throughput  An efficient FC is a prerequisite for good network performance 57 FC determines how the resources of a network, such as channel bandwidth and buffer capacity, are allocated to packets traversing the network.

58 Flow Control  FC can be viewed as a problem of  Resource allocation  Contention resolution  Resources in the form of channels, buffers and state must be allocated to each packet  If two packets compete for the same channel, flow control can assign the channel to only one packet, but must also deal with the other packet 58

59 Flow Control Flow Control can be divided into: 1. Bufferless flow control  Packets are either dropped or misrouted 2. Buffered flow control  Packets that cannot be routed via the desired channel are stored in buffers 59

60 Resources in a Network Node  Control State  Tracks the resources allocated to the packet in the node and the state of the packet  Buffer  The packet is stored in a buffer before it is sent to the next node  Bandwidth  To travel to the next node, bandwidth has to be allocated for the packet 60

61 Units of Resource Allocation - Packets or Flits?  Contradictory requirements on packets  Packets should be very large in order to reduce the overhead of routing and sequencing  Packets should be very small to allow efficient and fine-grained resource allocation and minimize blocking latency  Flits try to eliminate this conflict  Packets can be large (low overhead)  Flits can be small (efficient resource allocation) 61

62 Units of Resource Allocation - Size: Phit, Flit, Packet  There are no fixed rules for the size of phits, flits and packets  Typical values  Phits: 1 bit to 64 bits  Flits: 16 bits to 512 bits  Packets: 128 bits to 1024 bits 62

63 Bufferless Flow Control  No buffers  less implementation cost  If more than one packet is to be routed to the same output, one has to be  Misrouted or  Dropped Example: two packets A and B (each consisting of several flits) arrive at a network node. 63

64 Bufferless Flow Control  Packet B is dropped and must be resent  There must be a protocol that informs the sending node that the packet has been dropped Example: Resend after no acknowledgement has been received within a given time 64

65 Bufferless Flow Control  Packet B is misrouted  No further action is required here, but at the receiving node packets have to be sorted back into their original order 65

66 Circuit Switching  Circuit switching is a bufferless flow control in which several channels are reserved to form a circuit  A request (R) propagates from source to destination and is answered by an acknowledgement (A)  Then data is sent (here two five-flit packets (D)) and a tail flit (T) is sent to deallocate the channels 66

67 Circuit Switching 67  Circuit switching does not suffer from dropping or misrouting packets  However, there are two weaknesses:  High latency: T = 3 H tr + L/b  Low throughput, since the channel is used a large fraction of the time for signaling rather than for delivery of the payload

68 Circuit Switching Latency 68 T = 3 H tr + L/b Where: H: number of hops from source to destination tr: per-hop router delay L: packet length in bits b: channel bandwidth Note: 3x the header latency (H tr) because the path from source to destination must be traversed 3 times to deliver the packet: once in each direction to set up the circuit and then again to deliver the first flit
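A quick numeric check of the formula (the parameter values are my own example, not from the slides):

```python
def circuit_switch_latency(H, t_r, L, b):
    """Circuit-switching latency T = 3*H*t_r + L/b: the header latency
    H*t_r is paid three times (request out, acknowledgement back, first
    flit out), then the payload serializes at L/b."""
    return 3 * H * t_r + L / b

# E.g. 4 hops, 2 cycles per router, a 512-bit packet on a 32-bit/cycle channel.
print(circuit_switch_latency(H=4, t_r=2, L=512, b=32))  # 40.0 cycles
```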

69 Buffered Flow Control  More efficient flow control can be achieved by adding buffers  With sufficient buffers packets do not need to be misrouted or dropped, since packets can wait for the outgoing channel to be ready 69

70 Buffered Flow Control Two main approaches: 1. Packet-Buffer Flow Control  Store-And-Forward  Cut-Through 2. Flit-Buffer Flow Control  Wormhole Flow Control  Virtual Channel Flow Control 70

71 Store & Forward Flow Control  Each node along the route waits until the packet has been completely received (stored) and then forwards it to the next node  Two resources are needed  A packet-sized buffer in the switch  Exclusive use of the outgoing channel 71

72  Advantage: While waiting to acquire resources, no channels are being held idle and only a single packet buffer on the current node is occupied  Disadvantage: Very high latency  T = H (tr + L/b) 72 Store & Forward Flow Control

73 Cut-Through Flow Control  Advantages  Cut-through reduces the latency  T = H tr + L/b  Disadvantages  Poor utilization of buffers, since they are allocated in units of packets  Contention latency is increased, since packets must wait until the whole preceding packet has left the occupied channel 73
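Comparing the two latency formulas numerically makes the difference concrete (same example parameters as before, my own choice): store-and-forward pays the serialization latency L/b at every hop, cut-through pays it once.

```python
def t_saf(H, t_r, L, b):
    """Store-and-forward: the full packet is serialized at every hop,
    so T = H * (t_r + L/b)."""
    return H * (t_r + L / b)

def t_cut_through(H, t_r, L, b):
    """Cut-through: serialization latency L/b is paid only once,
    so T = H * t_r + L/b."""
    return H * t_r + L / b

# 4 hops, 2 cycles per router, 512-bit packet, 32-bit/cycle channel.
print(t_saf(4, 2, 512, 32))          # 72.0
print(t_cut_through(4, 2, 512, 32))  # 24.0
```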

74 Wormhole Flow Control  Wormhole FC operates like cut-through, but with channels and buffers allocated to flits rather than packets  When the head flit arrives at a node, it must acquire resources (virtual channel, buffer) before it can be forwarded to the next node  Tail flits behave like body flits, but also release the channel 74

75 Wormhole (WH) Flow Control  Virtual channels hold the state needed to coordinate the handling of the flits of a packet over a channel  Comparison to cut-through  Wormhole flow control makes far more efficient use of buffer space  Throughput may be less, since wormhole flow control may block a channel mid-packet 75

76 Example for WH Flow Control  Input virtual channel is in idle state (I)  Upper output channel is occupied, allocated to lower channel (L) 76

77 Example for WH Flow Control  Input channel enters waiting state (W)  Head flit is buffered 77

78 78  Body flit is also buffered  No more flits can be buffered, thus congestion arises if more flits want to enter the switch Example for WH Flow Control

79  Virtual channel enters active state (A)  Head flit is output on upper channel  Second body flit is accepted 79 Example for WH Flow Control

80  First body flit is output  Tail flit is accepted 80 Example for WH Flow Control

81  Second body flit is output 81 Example for WH Flow Control

82  The tail flit is output  The virtual channel is deallocated and returns to the idle state 82 Example for WH Flow Control
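The idle/waiting/active sequence of slides 76-82 can be summarized as a toy state machine (Python sketch with my own naming; the real router holds this state per virtual channel):

```python
class VirtualChannel:
    """Wormhole virtual-channel states from the example: idle (I) ->
    waiting (W) when a head flit arrives but the output is occupied ->
    active (A) once the output channel is allocated -> idle again when
    the tail flit departs."""
    def __init__(self):
        self.state = 'I'
    def head_arrives(self, output_free):
        self.state = 'A' if output_free else 'W'
    def output_allocated(self):
        self.state = 'A'
    def tail_departs(self):
        self.state = 'I'

vc = VirtualChannel()
vc.head_arrives(output_free=False)   # upper output still owned -> W
vc.output_allocated()                # channel granted -> A
vc.tail_departs()                    # tail frees the channel -> I
print(vc.state)                      # I
```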

83 Wormhole Flow Control  The main advantage of wormhole over cut-through is that buffers in the routers do not need to hold full packets, but only a small number of flits  This allows the use of smaller and faster routers 83

84 Part 6 NoC Flow Control (continued) Blocking Virtual Channel-Flow Control Virtual Channel Router Credit-Based Flow Control On/Off Flow Control Flow Control Summary 84

85 Blocking - Cut-Through and Wormhole  If a packet is blocked, the flits of the wormhole packet are stored in different routers 85 Wormhole (Buffer-Size 2 Flits) Cut-Through (Buffer-Size 1 Packet) Blocked

86 Wormhole Flow Control  There is only one virtual channel per physical channel  Packet A is blocked and cannot acquire channel p  Though channels p and q are idle, packet A cannot use them, since B owns channel p 86

87 Virtual Channel Flow Control  In virtual channel flow control, several virtual channels are associated with a single physical channel  This allows use of bandwidth that would otherwise be left idle when a packet blocks the channel  Unlike in wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for bandwidth with other flits 87

88 Virtual Channel Flow Control  There are several virtual channels for each physical channel  Packet A can use a second virtual channel and thus proceed over channels p and q 88

89 Virtual Channel Allocation  Flits must be delivered in order: H, B, …, B, T  Only the head flit carries routing information  VCs are allocated at the packet level, i.e., packet-by-packet  The head flit is responsible for allocating VCs along the route  Body and tail flits must follow the VC path, and the tail flit releases the VCs  The flits of a packet cannot interleave with those of any other packet 89

90 Virtual Channel Flow Control - Fair Bandwidth Arbitration  VCs interleave their flits  Results in a high average latency 90

91 Virtual Channel Flow Control - Winner-Take-All Arbitration  A winner-take-all arbitration reduces the average latency with no throughput penalty 91

92 Virtual Channel Flow Control - Buffer Storage  Buffer storage is organized in two dimensions  Number of virtual channels  Number of flits that can be buffered per channel 92

93 Virtual Channel Flow Control - Buffer Storage  Virtual channel buffers should be at least as deep as needed to cover the round-trip credit latency  In general it is usually better to add more virtual channels than to increase the buffer size 93

94 Virtual Channel 94 A: active W: waiting I: idle

95 Virtual Channel Router 95

96 Buffer Organization 96 Single buffer per input Multiple fixed-length queues per physical channel

97 Buffer Management  In buffered FC there is a need for communication between nodes in order to inform them about the availability of buffers  Backpressure informs upstream nodes that they must stop sending to a downstream node when the buffers of that downstream node are full 97 upstream node downstream node Traffic Flow

98 Credit-Based Flow Control  The upstream router keeps a count of the number of free flit buffers in each virtual channel downstream  Each time the upstream router forwards a flit, it decrements the counter  If a counter reaches zero, the downstream buffer is full and the upstream node cannot send a new flit  When the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream node, which increments its counter 98
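The counter protocol can be sketched directly from the four bullets above (Python, my own naming; one instance per downstream virtual channel):

```python
class CreditChannel:
    """Credit counter kept by the upstream router for one downstream
    virtual channel."""
    def __init__(self, buffers):
        self.credits = buffers           # free downstream flit buffers
    def can_send(self):
        return self.credits > 0
    def send_flit(self):                 # upstream forwards a flit
        assert self.can_send()
        self.credits -= 1
    def credit_returned(self):           # downstream freed a buffer
        self.credits += 1

ch = CreditChannel(buffers=2)
ch.send_flit(); ch.send_flit()
print(ch.can_send())     # False: all downstream buffers are in use
ch.credit_returned()
print(ch.can_send())     # True
```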

99 Credit-Based Flow Control 99

100 Credit-Based Flow Control  The minimum time between a credit being sent at time t1 and a credit being sent for the same buffer at time t5 is the credit round-trip delay tcrt 100 All buffers on the downstream node are full

101 Credit-Based Flow Control  If there is only a single flit buffer, a flit waits for a new credit and the maximum throughput is limited to one flit per t crt  The bit rate is then L f / t crt, where L f is the length of a flit in bits 101

102 Credit-Based Flow Control  If there are F flit buffers on the virtual channel, F flits could be sent before waiting for the credit, which gives a throughput of F flits for each t crt and a bit rate of F Lf / t crt 102

103 Credit-Based Flow Control  In order not to limit the throughput by low-level flow control, the number of flit buffers per virtual channel should be at least F >= t crt b / L f, where b is the bandwidth of the channel 103
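This bound follows from the previous slide: the achievable bit rate F·Lf/tcrt must cover the channel bandwidth b. A small Python check of that arithmetic (example parameter values are my own):

```python
import math

def min_flit_buffers(t_crt, b, L_f):
    """Minimum flit buffers per VC so that credit flow control does not
    limit throughput: F >= t_crt * b / L_f, i.e. the achievable rate
    F * L_f / t_crt must reach the channel bandwidth b."""
    return math.ceil(t_crt * b / L_f)

# E.g. 8-cycle credit round trip, 32 bits/cycle channel, 64-bit flits.
print(min_flit_buffers(t_crt=8, b=32, L_f=64))  # 4
```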

104 Credit-Based Flow Control  For each flit sent downstream, a corresponding credit is sent upstream  Thus there is a large amount of upstream signaling, which especially for small flits can represent a large overhead! 104

105 On/Off Flow Control  On/off Flow control tries to reduce the amount of upstream signaling  An off signal is sent to the upstream node, if the number of free buffers falls below the threshold F off  An on signal is sent to the upstream node, if the number of free buffers rises above the threshold F on  With carefully dimensioned buffers on/off flow control can achieve a very low overhead in form of upstream signaling 105
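The two thresholds can be sketched from the downstream node's point of view (Python, my own naming; the strict below/above comparisons are my reading of "falls below" and "rises above"):

```python
class OnOffReceiver:
    """Downstream-node signalling for on/off flow control: emit 'off'
    when the free-buffer count falls below F_off, 'on' when it rises
    above F_on; otherwise stay silent (no upstream signaling)."""
    def __init__(self, buffers, f_off, f_on):
        self.free, self.f_off, self.f_on = buffers, f_off, f_on
    def flit_arrives(self):
        self.free -= 1
        return 'off' if self.free < self.f_off else None
    def flit_departs(self):
        self.free += 1
        return 'on' if self.free > self.f_on else None

rx = OnOffReceiver(buffers=4, f_off=2, f_on=3)
print(rx.flit_arrives())   # None  (3 free)
print(rx.flit_arrives())   # None  (2 free, not yet below F_off)
print(rx.flit_arrives())   # 'off' (1 free)
print(rx.flit_departs())   # None  (2 free)
print(rx.flit_departs())   # None  (3 free, not yet above F_on)
print(rx.flit_departs())   # 'on'  (4 free)
```

Only two signals cross the channel for six buffer events, which is the signaling reduction the slide describes.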

106 Ack/Nack Flow Control  In ack/nack flow control the upstream node sends packets without knowing whether there are free buffers in the downstream node 106

107 Ack/Nack Flow Control  If there is no buffer available  the downstream node sends nack and drops the flit  the flit must be resent  flits must be reordered at the downstream node  If there is a buffer available  The downstream node sends ack and stores the flit in a buffer 107

108 Buffer Management  Because of its buffer and bandwidth inefficiency ack/nack is rarely used  Credit-based flow control is used in systems with small numbers of buffers  On/off flow control is used in systems that have large numbers of flit buffers 108

109 Flow Control Summary  Bufferless flow control  Dropping, misroute packets  Circuit switching  Buffered flow control  Packet-Buffer Flow Control: SAF vs. Cut Through  Flit-Buffer Flow Control: Wormhole and Virtual Channel  Switch-to-switch (link level) flow control  Credit-based, On/Off, Ack/Nack 109

110 Part 7 Router Architecture Virtual-channel Router Virtual channel state fields The Router Pipeline Pipeline Stalls 110

111 Router Microarchitecture - Virtual-channel Router  Modern routers are pipelined and work at the flit level  Head flits proceed through buffer stages that perform routing and virtual channel allocation  All flits pass through switch allocation and switch traversal stages  Most routers use credits to allocate buffer space 111

112 Typical Virtual Channel Router  A router's functional blocks can be divided into  Datapath: handles storage and movement of a packet's payload  Input buffers  Switch  Output buffers  Control: coordinates the movement of packets through the resources of the datapath  Route Computation  VC Allocator  Switch Allocator 112

113 Typical Virtual Channel Router  The input unit contains a set of flit buffers  Maintains the state for each virtual channel  G = Global State  R = Route  O = Output VC  P = Pointers  C = Credits 113

114 Virtual Channel State Fields (Input) 114

115 Typical Virtual Channel Router  During route computation the output port for the packet is determined  Then the packet requests an output virtual channel from the virtual-channel allocator 115

116 Typical Virtual Channel Router  Flits are forwarded via the virtual channel by allocating a time slot on the switch and output channel using the switch allocator  Flits are forwarded to the appropriate output during this time slot  The output unit forwards the flits to the next router in the packet’s path 116

117 Virtual Channel State Fields (Output) 117

118 Packet Rate and Flit Rate  The control of the router operates at two distinct frequencies  Packet Rate (performed once per packet)  Route computation  Virtual-channel allocation  Flit Rate (performed once per flit)  Switch allocation  Pointer and credit count update 118

119 The Router Pipeline  A typical router pipeline includes the following stages  RC (Routing Computation)  VA (Virtual Channel Allocation)  SA (Switch Allocation)  ST (Switch Traversal) 119 no pipeline stalls

120 The Router Pipeline  Cycle 0  The head flit arrives and the packet is directed to a virtual channel of the input port (G = I) 120 no pipeline stalls

121 The Router Pipeline  Cycle 1  Routing computation  Virtual channel state changes to routing (G = R)  Head flit enters RC-stage  First body flit arrives at router 121 no pipeline stalls

122 The Router Pipeline  Cycle 2: Virtual Channel Allocation  The route field (R) of the virtual channel is updated  The virtual channel state is set to "waiting for output virtual channel" (G = V)  The head flit enters the VA stage  The first body flit enters the RC stage  The second body flit arrives at the router 122 no pipeline stalls

123 The Router Pipeline  Cycle 2: Virtual Channel Allocation  The result of the routing computation is input to the virtual channel allocator  If successful, the allocator assigns a single output virtual channel  The state of the virtual channel is set to active (G = A) 123 no pipeline stalls

124 The Router Pipeline  Cycle 3: Switch Allocation  All further processing is done on a flit basis  The head flit enters the SA stage  Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC 124 no pipeline stalls

125 The Router Pipeline  Cycle 3: Switch Allocation  If successful, pointer field is updated  Credit field is decremented 125 no pipeline stalls

126 The Router Pipeline  Cycle 4: Switch Traversal  Head flit traverses the switch  Cycle 5:  Head flit starts traversing the channel to the next router 126 no pipeline stalls

127 The Router Pipeline  Cycle 7:  The tail flit traverses the switch  The output VC is set to idle  The input VC is set to idle (G = I) if the buffer is empty  The input VC is set to routing (G = R) if another head flit is in the buffer 127 no pipeline stalls

128 The Router Pipeline  Only head flits enter the RC and VA stages  Body and tail flits are stored in the flit buffers until they can enter the SA stage 128 no pipeline stalls

129 Pipeline Stalls  Pipeline stalls can be divided into  Packet stalls  Can occur if the virtual channel cannot advance to its R, V, or A state  Flit stalls  Occur if a virtual channel is in the active state and a flit cannot successfully complete switch allocation due to  Lack of a flit  Lack of credit  Losing arbitration for the switch time slot 129

130 Example for Packet Stall  Virtual-channel allocation stall  The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel 130


132 Example for Flit Stalls 132 Switch allocation stall: The second body flit fails to allocate the requested connection in cycle 5

133 Example for Flit Stalls 133 Buffer empty stall: Body flit 2 is delayed three cycles. However, since it does not have to enter the RC and VA stages, the output is only delayed one cycle!

134 Credits  A buffer is allocated in the SA stage on the upstream (transmitting) node  To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node  When the credit reaches the input unit of the upstream node, the buffer is available and can be reused 134

135 Credits  The credit loop can be viewed by means of a token that  Starts at the SA stage of the upstream node  Travels downstream with the flit  Reaches the SA stage of the downstream node  Returns upstream as a credit 135

136 Credit Loop Latency  The credit loop latency tcrt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth  tcrt in flit times is obtained by dividing the round-trip delay of the credit loop by the time needed to transmit one flit 136

137 Credit Loop Latency  If the number of buffers available per virtual channel is F, the duty factor of the channel will be d = min (1, F / t crt)  The duty factor will be 100% as long as there are sufficient flit buffers to cover the round-trip latency 137
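The duty-factor formula is one line of arithmetic; a quick Python check with example values of my own choosing:

```python
def duty_factor(F, t_crt):
    """Fraction of channel bandwidth usable with F flit buffers per VC
    and a credit loop of t_crt flit times: d = min(1, F / t_crt)."""
    return min(1.0, F / t_crt)

print(duty_factor(F=2, t_crt=8))   # 0.25: too few buffers, credit stalls
print(duty_factor(F=8, t_crt=8))   # 1.0: enough buffers for full bandwidth
```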

138 Credit Stall 138 Virtual Channel Router with 4 flit buffers

139 Flit and Credit Encoding A. Flits and credits are sent over separate lines with separate widths B. Flits and credits are transported over the same line. This can be done by  Including credits in flits  Multiplexing flits and credits at the phit level  Option (A) is considered more efficient. For a more detailed discussion, see Section 16.6 of the Dally book 139

140 Summary  NoC is a scalable platform for billion-transistor chips  Several driving forces behind it  Many open research questions  May change the way we structure and model VLSI systems 140 Hong Kong University of Science and Technology, March 2010

141 References  OASIS NoC Architecture Design in Verilog HDL, Technical Report TR-062010-OASIS, Adaptive Systems Laboratory, The University of Aizu, June 2010  OASIS NoC Project: http://web-ext.u-aizu.ac.jp/~benab/research/projects/oasis/ 141

142 Network-on-Chip Ben Abdallah, Abderazek The University of Aizu E-mail: benab@u-aizu.ac.jp 142 KUST University, March 2011

