Dynamic Traffic Distribution among Hierarchy Levels in Hierarchical Networks-on-Chip
Ran Manevich, Israel Cidon, and Avinoam Kolodny
QNoC Research Group, Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel
NOCS 2013
Hierarchical un-clustered NoCs
Hierarchical Rings: S. Bourduas and Z. Zilic, “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing,” ASAP.
PyraMesh: R. Manevich, I. Cidon, and A. Kolodny, “Handling global traffic in future CMP NoCs,” SLIP.
Routing in hierarchical NoCs
Phase 1: Ascent to the highest level (L_MAX).
Phase 2: Travel on L_MAX towards the destination.
Phase 3: Descent from L_MAX to reach the destination.
Traffic distribution among hierarchy levels
The highest level L_MAX that each packet ascends to defines the distribution of traffic among the hierarchy levels.
Packets distribution policy
The highest level L_MAX is defined by the hop distance (D) a packet would travel at the bottom level.
DTh_i: the Distance Threshold of level i.
If D > DTh_i, the packet is directed to level i+1.
Example: DTh_i = 6, 12, 20

L_MAX   Bottom mesh travel distance (D)
4       D > 20
3       12 < D ≤ 20
2       6 < D ≤ 12
1       D ≤ 6
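The threshold rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `highest_level` and the list representation of the thresholds are assumptions made for the example, not part of the presented design.

```python
from bisect import bisect_left


def highest_level(distance, thresholds):
    """Return L_MAX for a packet whose bottom-mesh hop distance is `distance`.

    `thresholds` is the ascending list [DTh_1, DTh_2, ...] (hypothetical
    representation). A packet climbs one level for every threshold that
    its distance strictly exceeds (D > DTh_i).
    """
    # bisect_left counts the thresholds strictly smaller than `distance`.
    return 1 + bisect_left(thresholds, distance)


# Thresholds from the slide example: DTh_i = 6, 12, 20.
example_thresholds = [6, 12, 20]
levels = {d: highest_level(d, example_thresholds) for d in (3, 6, 7, 12, 20, 21)}
```

With the example thresholds, a distance of 6 stays on level 1, a distance of 7 goes to level 2, and a distance of 21 goes to level 4, matching the table.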
How to distribute traffic among hierarchy levels? Shortest path?
Shortest path – light load (8x8 PyraMesh, 3D illustration)
Average latency (hierarchical) < average latency (flat).
Shortest path – heavy load (8x8 PyraMesh, 3D illustration)
Congestion! Average latency (hierarchical) >> average latency (flat).
The upper levels are sparse. Shortest path, but not for all?
Shortest path only for distant packets – heavy load
Average latency (hierarchical) < average latency (flat).
Shortest path only for distant packets – light load
Traffic distribution – static vs. dynamic
Static: the traffic distribution remains constant.
Dynamic: the traffic distribution is adapted to the traffic conditions.
Dynamic traffic distribution – Two modes At light traffic loads: Under heavy loads:
Example - 16x16 and 32x32 NoCs

Topology
16x16      [5,8]       [11,19]
32x32      [4,10,50]   [23,42,61]
Traffic Locality Model - Bandwidth Version of Rent’s Rule
B = k · G^R
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0 < R < 1.
Illustrated example: a cluster with G = 16.
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
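As a worked illustration of the locality model, the snippet below evaluates the bandwidth version of Rent's rule in its usual form, B = k · G^R, using the symbols defined on this slide. The function name `external_bandwidth` and the sample values of k and R are assumptions chosen for the example.

```python
def external_bandwidth(k, g, r):
    """Cluster external bandwidth B = k * G**R (bandwidth version of Rent's rule).

    k: average bandwidth per module
    g: number of modules in the cluster
    r: Rent's exponent, required to satisfy 0 < r < 1
    """
    if not 0 < r < 1:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * g ** r


# Hypothetical values: a 16-module cluster, unit bandwidth per module.
# A larger R means less local traffic, so more bandwidth leaves the cluster.
b_local = external_bandwidth(1.0, 16, 0.5)    # 16 ** 0.5 = 4.0
b_global = external_bandwidth(1.0, 16, 0.75)  # 16 ** 0.75 = 8.0
```

For the same cluster size, raising R from 0.5 to 0.75 doubles the external bandwidth, which is why a higher Rent's exponent models less localized traffic.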
Feedback
The feedback is the average buffer occupancy at the bottleneck level among the upper levels.
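One way to read this definition is sketched below. It assumes that the "bottleneck level" is the upper level with the highest average buffer occupancy, so the feedback is the maximum of the per-level averages. That interpretation, the function name `feedback`, and the dictionary input format are all assumptions made for illustration.

```python
def feedback(occupancy_by_level):
    """Return the average buffer occupancy of the bottleneck upper level.

    `occupancy_by_level` maps an upper-level index to the list of buffer
    occupancies (fractions in [0, 1]) of the routers in that level.
    Assumption: the bottleneck is the level with the highest average.
    """
    averages = {
        level: sum(values) / len(values)
        for level, values in occupancy_by_level.items()
    }
    return max(averages.values())


# Hypothetical occupancy samples for two upper levels.
sample = {2: [0.2, 0.4], 3: [0.5, 0.7]}
fb = feedback(sample)  # level 3 is the bottleneck here
```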
Feedback vs. injection rate 32x32, 4 Levels PyraMesh; Rentian traffic with R = 0.8
DTrD control scheme Switch between distribution modes using 2 feedback thresholds:
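A two-threshold switch of this kind is commonly implemented with hysteresis, and the sketch below assumes that behaviour: the controller enters the heavy-load mode when the feedback rises above an upper threshold and returns to the light-load mode only when it falls below a lower threshold. The names `next_mode`, `low_threshold`, and `high_threshold`, and the numeric values, are hypothetical and do not come from the slides.

```python
LIGHT = "light"
HEAVY = "heavy"


def next_mode(mode, feedback, low_threshold, high_threshold):
    """Return the distribution mode for the next control interval.

    Assumed hysteresis: LIGHT -> HEAVY when feedback > high_threshold,
    HEAVY -> LIGHT when feedback < low_threshold, otherwise unchanged.
    """
    if mode == LIGHT and feedback > high_threshold:
        return HEAVY
    if mode == HEAVY and feedback < low_threshold:
        return LIGHT
    return mode


# Replay a hypothetical feedback trace through the controller.
trace = [0.1, 0.5, 0.8, 0.5, 0.2, 0.5]
mode = LIGHT
history = []
for sample in trace:
    mode = next_mode(mode, sample, low_threshold=0.3, high_threshold=0.7)
    history.append(mode)
```

The gap between the two thresholds keeps the controller from oscillating when the feedback hovers around a single switching point: in the trace above, the value 0.5 leaves the current mode unchanged in both directions.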
System architecture and implementation costs
Logic:
Feedback logic: <10K NAND gates.
Control logic: <1K gates.
Routing logic: comparable to previous schemes.
Wires:
Feedback links of 4 wires to <10% of the routers.
1 broadcast control bit to all bottom mesh routers.
Communication:
1 mode bit in head flits.
Simulation set-up
HNOCS – NoC simulation framework for OMNeT++. Yaniv Ben-Itzhak et al., NOCS 2011
Average latency vs. injection rate and Rent’s exponent
Dynamic Simulation – 32x32 NoC
Conclusions
Static traffic distribution (STrD) in hierarchical NoCs can optimize performance under either light or heavy traffic loads, but not both at the same time.
Dynamic traffic distribution (DTrD) provides optimal performance under both light and heavy loads.
DTrD is lightweight, effective, and feasible in future systems with many thousands of modules.
DTrD is useful and desirable in any un-clustered hierarchical NoC.
Thank You!