Destination-Based Adaptive Routing for 2D Mesh Networks
ANCS 2010
Rohit Sunkam Ramanujam, Bill Lin
Electrical and Computer Engineering, University of California, San Diego

Networks-on-Chip
– Chip-multiprocessors (CMPs) increasingly popular
– 2D-mesh networks often used as on-chip fabric
– Routing algorithm central in determining performance
[Figures: Tilera Tile64; Intel 48-core data center on die (ISSCC 2010)]

Classes of Routing Algorithms
Oblivious routing
+ Simple and fast router designs
– Poor load balancing under bursty traffic
Adaptive routing
+ Better performance (throughput, latency)
+ Better fault tolerance
– Higher router complexity

Related Work
Oblivious routing [Valiant, ROMM, O1TURN, Optimal oblivious routing]
– Optimize for worst-case and average-case performance
Adaptive routing commercially used in multiprocessors from IBM, Cray, Compaq
On-chip routing very different from off-chip:
– Lower power
– Lower area
– Lower router complexity

Outline Introduction Motivation Destination-Based Adaptive Routing (DAR) Evaluation

Minimal Adaptive Routing
Model
– Adaptive routing along minimal directions
[Figure: mesh with source S and destination D]
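A minimal sketch of what "minimal directions" means in practice: at each hop, only the output ports that move the packet closer to the destination are candidates. The function name `minimal_outputs`, the `(x, y)` coordinate scheme, and the convention that +x is East and +y is North are assumptions made for illustration; they are not taken from the slides.

```python
def minimal_outputs(src, dst):
    """Return the output ports that lie on some minimal path from src to dst.

    Nodes are (x, y) tuples. Assumed convention: +x is East, +y is North.
    """
    (sx, sy), (dx, dy) = src, dst
    ports = []
    if dx > sx:
        ports.append("E")
    elif dx < sx:
        ports.append("W")
    if dy > sy:
        ports.append("N")
    elif dy < sy:
        ports.append("S")
    # No productive direction left means the packet has arrived: eject it.
    return ports or ["Ej"]


print(minimal_outputs((0, 0), (2, 3)))  # two candidates: East and North
print(minimal_outputs((3, 2), (1, 2)))  # one candidate: West
```

With at most two candidate ports per hop, the adaptive decision reduces to choosing between them.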

Granularity of Congestion Estimation (coarse to fine): local congestion

Local Congestion
Local adaptive
– Measure local congestion metric (free VCs, free buffers)
[Figure: mesh from S to D with regions of low, moderate, and high congestion; optimal path compared with the local adaptive path]

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion

Dimension-based Congestion
RCA-1D (Gratz et al., HPCA '08)
– Exponential moving average of congestion to all nodes along a dimension
[Figure: mesh from S to D with regions of low, moderate, and high congestion; optimal path compared with the RCA-1D path]

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion

Quadrant-based Congestion
RCA-Quadrant (Gratz et al., HPCA '08)
– Exponential moving average of congestion to all nodes in the destination quadrant
[Figure: mesh from S to D with regions of low, moderate, and high congestion; optimal path compared with the RCA-quad path]

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, destination-based congestion

Ideally …
On a per-destination basis:
– Estimate end-to-end delay along all minimal paths to destination
– Choose path with least delay
[Figure: mesh from S to D with regions of low, moderate, and high congestion; optimal path highlighted]

Challenges
Limited bandwidth for congestion updates
– Congestion notification not instantaneous
Limited storage in on-chip routers
– Exponential number of paths to each destination
Limited hardware resources for computations
How can we practically emulate ideal adaptive routing?
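To make the storage challenge concrete, the number of minimal paths between two mesh nodes can be counted directly: a path with dx hops in one dimension and dy in the other is an ordering of those hops, so there are C(dx + dy, dx) of them. This is a standard combinatorial identity, not a figure from the slides; the helper name `count_minimal_paths` is hypothetical.

```python
from math import comb


def count_minimal_paths(src, dst):
    """Number of distinct minimal paths between two nodes of a 2D mesh."""
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    # Choose which of the dx + dy hops are taken in the x dimension.
    return comb(dx + dy, dx)


# Corner-to-corner path counts for the mesh sizes used in the evaluation.
print(count_minimal_paths((0, 0), (7, 7)))    # 8x8 mesh
print(count_minimal_paths((0, 0), (15, 15)))  # 16x16 mesh
```

Tracking a delay estimate per path is clearly impractical at these counts, which is why a per-output-port estimate is attractive.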

Destination-based adaptive routing (DAR)
A node estimates delay to all other nodes through candidate outputs every T cycles
[Figure: source S and destination D; at S, L[N][D] = 20 and L[E][D] = 30]

DAR: High Level
– Traffic distribution to output ports controlled using per-destination split ratios W
– Start with initial set of split ratios
– Estimate delay to destination through candidate outputs
– Shift traffic from more congested port to less congested port
[Figure: source S and destination D; W[N][D] = 0.6, W[E][D] = 0.4; L[N][D] = 20, L[E][D] = 30]

DAR: High Level
– Traffic distribution to output ports controlled using per-destination split ratios W
– Start with initial set of split ratios
– Estimate delay to destination through candidate outputs
– Shift traffic from more congested port to less congested port
[Figure: source S and destination D; after the shift, W[N][D] = 0.8, W[E][D] = 0.2; L[N][D] = 20, L[E][D] = 30]

Outline Introduction Motivation Destination-Based Adaptive Routing (DAR) – Distributed delay measurement – Split ratio adaptation – Scaling Evaluation

Distributed Delay Measurement
A node maintains:
– Per-destination traffic split ratio through candidate output ports: W[p][j]
– Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]

Distributed Delay Measurement
Every node estimates average delay to all other nodes in the network
[Figure: mesh with nodes numbered 0 to 15; Avg_10[10] shown at node 10]
1. Delay from node 10 to itself: Avg_10[10] = l_10[Ej]
2. Avg_10[10] propagated to neighbors
3. Nodes 6, 9, 14, 11 add local delay to Avg_10[10] to compute delay to node 10
4. For example, at node 9:
   L[E][10] = l[E] + Avg_10[10]
   Avg_9[10] = L[E][10]

Distributed Delay Measurement
Every node estimates delay to all other nodes in the network
[Figure: mesh with nodes numbered 0 to 15; updates Avg_6[10], Avg_9[10], Avg_11[10], Avg_14[10] sent upstream]
1. Nodes 6, 9, 14, 11 propagate estimated delay to node 10 to upstream neighbors
2. For example, node 5 receives two delay updates, from nodes 9 and 6:
   A[E][10] = Avg_6[10]
   A[N][10] = Avg_9[10]
3. Node 5 adds local link delay to received delay update:
   L[E][10] = A[E][10] + l[E]
   L[N][10] = A[N][10] + l[N]
4. Finally, average delay from node 5 to node 10 is computed as:
   Avg_5[10] = W[E][10] L[E][10] + W[N][10] L[N][10]
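The per-node computation in steps 3 and 4 can be sketched as below. The helper names (`path_delays`, `average_delay`) and all numeric values are hypothetical, chosen only to illustrate the arithmetic at node 5 for destination 10; the slides do not give concrete local delays.

```python
def path_delays(local_delay, downstream_avg):
    """Step 3: L[p][j] = A[p][j] + l[p] for every candidate port p."""
    return {p: downstream_avg[p] + local_delay[p] for p in downstream_avg}


def average_delay(split, delays):
    """Step 4: Avg[j] = sum over candidate ports of W[p][j] * L[p][j]."""
    return sum(split[p] * delays[p] for p in delays)


# Hypothetical state at node 5 for destination j = 10.
l = {"E": 4.0, "N": 6.0}    # local delay to the next hop via each port
A = {"E": 26.0, "N": 14.0}  # received updates: Avg_6[10] via E, Avg_9[10] via N
W = {"E": 0.4, "N": 0.6}    # current split ratios toward node 10

L = path_delays(l, A)
avg_5_10 = average_delay(W, L)
print(L)         # per-port delay estimates to node 10
print(avg_5_10)  # value node 5 would propagate to its own upstream neighbors
```

Each node only needs one received value per candidate port and one local delay per port, so the storage per destination stays constant regardless of how many paths exist.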

Outline Introduction Motivation Destination-Based Adaptive Routing (DAR) Distributed delay measurement – Split ratio adaptation – Scaling Evaluation

Adaptation of Split Ratio
Objective: equalize delay on candidate output ports
If only one candidate output, split ratio is 1
If two candidate outputs:
– Let p_h be the port with higher delay to destination j
– Let p_l be the port with lower delay to destination j
– W[p_h][j] + W[p_l][j] = 1
– Δ traffic shifted from p_h to p_l every T cycles
– Δ proportional to (L[p_h][j] - L[p_l][j]) / L[p_h][j]
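One way the update rule above might look in code is sketched here. The slides state only that Δ is proportional to the relative delay difference; the proportionality constant `gain`, the cap that keeps ratios within [0, 1], and the function name `adapt_split` are assumptions. The gain of 0.6 in the usage lines is picked solely so that a single step reproduces the 0.6/0.4 to 0.8/0.2 shift shown on the high-level slides.

```python
def adapt_split(W, L, gain=0.1):
    """One adaptation step for a single destination.

    W, L: dicts keyed by candidate output port (one or two ports).
    gain: assumed proportionality constant (not specified in the slides).
    """
    ports = list(W)
    if len(ports) == 1:
        return {ports[0]: 1.0}           # single candidate: all traffic
    p_h = max(ports, key=lambda p: L[p])  # port with higher delay
    p_l = min(ports, key=lambda p: L[p])  # port with lower delay
    if p_h == p_l or L[p_h] <= 0:
        return dict(W)                    # delays already equal: no shift
    delta = gain * (L[p_h] - L[p_l]) / L[p_h]
    delta = min(delta, W[p_h])            # assumed cap: ratio cannot go below 0
    new_W = dict(W)
    new_W[p_h] -= delta
    new_W[p_l] += delta
    return new_W


W = {"N": 0.6, "E": 0.4}
L = {"N": 20.0, "E": 30.0}
new_W = adapt_split(W, L, gain=0.6)
print(new_W)  # traffic shifts from the slower East port to the North port
```

Because the shift shrinks as the two delays converge, repeated application every T cycles moves the ports toward equal delay rather than oscillating at full amplitude.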

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, destination-based congestion (does not scale!)

Granularity of Congestion Estimation (coarse to fine): local congestion, dimension-based congestion, quadrant-based congestion, scalable destination-based congestion, destination-based congestion

Outline Introduction Motivation Destination-Based Adaptive Routing (DAR) Distributed delay measurement Split ratio adaptation – Scaling Evaluation

Look-ahead Window
– Node S maintains delay estimates for an MxM window centered at S
– Any node outside the window is mapped to the closest node within the window
– A packet's look-ahead window shifts as it is routed from source to destination
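The mapping of an out-of-window destination to the closest in-window node can be sketched as a per-coordinate clamp. This assumes an odd M, `(x, y)` node coordinates, and that "closest" means closest in hop distance; the function name `map_to_window` is hypothetical and the slides do not specify the exact mapping hardware.

```python
def map_to_window(cur, dst, M):
    """Map destination dst to the closest node in the MxM window centered at cur.

    Assumes M is odd, so the window extends (M - 1) // 2 hops in each direction.
    """
    r = (M - 1) // 2
    cx, cy = cur
    dx, dy = dst

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    # Clamping each coordinate independently gives the in-window node
    # with the smallest hop distance to dst.
    return (clamp(dx, cx - r, cx + r), clamp(dy, cy - r, cy + r))


print(map_to_window((8, 8), (15, 9), 7))  # far to the East: mapped to window edge
print(map_to_window((8, 8), (9, 10), 7))  # already inside: unchanged
```

Since the clamp is recomputed at every hop with the current node as the center, the window follows the packet, and the mapped node becomes the true destination once the packet is close enough.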

Window Size
– Destination D guaranteed to be within window when packet is (M-1)/2 hops away from D
– Intuition: packet has (M-1)/2 hops to route around congestion hot spots
– 7x7 look-ahead window in 16x16 mesh has comparable performance to DAR (equivalent to 31x31 look-ahead window)
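Both window-size claims can be checked by brute force on a 16x16 mesh, as in the sketch below. The helper names are hypothetical, and the check assumes an odd M and a window defined by a per-coordinate offset of at most (M-1)/2 from the current node.

```python
from itertools import product


def in_window(cur, dst, M):
    """True if dst lies inside the MxM window centered at cur (M odd)."""
    r = (M - 1) // 2
    return abs(dst[0] - cur[0]) <= r and abs(dst[1] - cur[1]) <= r


def hops(a, b):
    """Minimal hop count between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def covers_whole_mesh(k, M):
    """True if an MxM window centered at any node contains every node of a kxk mesh."""
    nodes = list(product(range(k), repeat=2))
    return all(in_window(c, d, M) for c in nodes for d in nodes)


k, M = 16, 7
nodes = list(product(range(k), repeat=2))
# Claim 1: any destination within (M - 1) / 2 hops is inside the window.
near_ok = all(in_window(c, d, M)
              for c in nodes for d in nodes if hops(c, d) <= (M - 1) // 2)
print(near_ok)
# Claim 2: full DAR on a 16x16 mesh corresponds to a 31x31 window (2k - 1).
print(covers_whole_mesh(k, 31), covers_whole_mesh(k, 29))
```

The 2k - 1 figure follows from the worst case of a corner node, which needs k - 1 hops of reach in each direction to see the opposite corner.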

Outline Introduction Related work Destination-Based Adaptive Routing (DAR) Evaluation

Experimental Setup
– Compare DAR with RCA-1D, RCA-quadrant, Local adaptive
– SPLASH-2 benchmarks + synthetic traffic patterns (uniform, transpose, shuffle)
– Cycle-accurate NoC simulator models 3-stage router pipeline
– 8 VCs, each 5 flits deep
– 1 VC used as escape VC for deadlock prevention

Splash results – 7x7 mesh 41%

Splash results – 7x7 mesh 65%

Uniform traffic – 8x8 mesh

Transpose traffic – 8x8 mesh

Shuffle traffic – 8x8 mesh

SDAR: 16x16 mesh, 7x7 window
[Figure: average latency over 100 permutation traffic patterns at 18% injection load]
[Figure: network saturation statistics at 18% injection load]

Summary
– Destination-based Adaptive Routing (DAR) for 2D mesh networks
– Scalable DAR (SDAR) uses a look-ahead window and easily scales to large networks
– DAR outperforms existing adaptive and oblivious routing
– SDAR achieves comparable performance with significantly lower overhead

Thank you!!

Key Implementation Details
– Simple router implementation: low storage, low bandwidth
– Synchronize delay updates to reuse delay computation and weight adaptation hardware
– Approximate computations to simplify implementation

Router architecture (Kim et al., DAC '05)
[Figure: router block diagram with input ports (In, N, S, E, W), virtual channels VC-1 to VC-v per port, routing unit, quadrant port pre-select, preferred output registers, congestion value registers, VC allocator, XB allocator, credit and override signals, and output ports (N, S, E, W, Ej)]

DAR Router

Distributed Delay Measurement
A node maintains:
– Per-destination traffic split ratio through candidate output ports: W[p][j]
– Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]
Using updates received from downstream nodes, a node computes:
– L[p][j]: average delay from current node to node j through output port p
– Avg[j]: average delay from current node to node j

Destination-based Adaptive Routing (DAR)
– Every router maintains per-destination split ratios which control traffic distribution to output ports
– Split ratios adjusted every T cycles based on measured delay to D through the two ports
[Figure: mesh from S to D with regions of low, moderate, and high congestion; routers along the way labeled with split ratios 0.8/0.2, 0.7/0.3, 1, 1]
