Published by Eric Edwards. Modified over 9 years ago.
1
Highly Fault-tolerant NoC Routing with Application-aware Congestion Management
Doowon Lee, Ritesh Parikh and Valeria Bertacco, University of Michigan
2
Wide Range of Applications
Applications: everyday applications, physical simulation, scientific applications, computational chemistry, computational biology, semiconductor simulation, cloud computing.
Varying computation characteristics, user requirements, etc.
(Picture sources: N-body simulation, semiconductor, computational biology, molecular structure.)
3
Application Running on Network-on-Chip
Application example: a video encoder mapped onto a chip multiprocessor with a network-on-chip (NoC).
Analysis: communication frequency (number of flits) per source-destination pair in a 64-thread simulation of SPLASH-2 (ocean). Some pairs communicate more frequently.
(Picture sources: 1. Video encoder: Gary Sullivan et al., Standardized Extensions of High Efficiency Video Coding (HEVC); 2. Tilera TILE-Gx8072.)
4
Fragile Networks-on-Chip
Tail of transistor scaling: 22 nm (Intel), 14 nm, 7 nm (IBM).
Increasing transistor density, decreasing transistor reliability: permanent faults.
The network-on-chip is a possible single point of failure.
Solution: network-on-chip routing reconfiguration.
5
How to reduce NoC degradation from faults?
Network-on-chip reconfiguration entails performance degradation.
Motivating experiment: faults vs. performance degradation.
[Plot: state-of-the-art routing reconfiguration [Aisopos 11] compared against the minimum throughput requirement and our goal.]
KEY IDEA: application-aware routing optimized to the application's communication patterns.
6
Application-Aware Routing (1/2)
How do we find adaptive routing optimized to communication patterns?
Problem: various route options (no restriction) give path diversity = 6 between S and D, but deadlock is possible.
Solution 1: avoid deadlock by restricting turns. Deadlock-free, but path diversity = 3.
7
Application-Aware Routing (2/2)
How do we find adaptive routing optimized to communication patterns?
Problem: various route options (no restriction) give path diversity = 6 between S and D, but deadlock is possible.
Solution 2: avoid deadlock by restricting turns elsewhere. Path diversity = 6.
Where to best place turn restrictions? This is an NP-complete problem.
OUR CONTRIBUTION: turn-restriction placement heuristic.
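The effect of turn restrictions on path diversity can be sketched with a small path counter. This is an illustrative sketch only: the `(node, in_dir, out_dir)` turn encoding, the function name, and the particular restricted turns in the usage note are assumptions, not taken from the slides.

```python
from functools import lru_cache

# Unit steps for the four mesh directions.
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def path_diversity(src, dst, disabled_turns=frozenset()):
    """Count minimal paths from src to dst on a 2D mesh.

    disabled_turns holds (node, in_dir, out_dir) triples: a packet that
    arrived at `node` travelling in_dir may not leave travelling out_dir.
    """
    disabled_turns = frozenset(disabled_turns)

    def productive(node):
        # Directions that reduce the remaining distance to dst.
        x, y = node
        out = []
        if dst[0] > x:
            out.append("E")
        if dst[0] < x:
            out.append("W")
        if dst[1] > y:
            out.append("N")
        if dst[1] < y:
            out.append("S")
        return out

    @lru_cache(maxsize=None)
    def count(node, in_dir):
        if node == dst:
            return 1
        total = 0
        for d in productive(node):
            # Skip the hop if it would take a disabled turn.
            if in_dir is not None and (node, in_dir, d) in disabled_turns:
                continue
            dx, dy = DIRS[d]
            total += count((node[0] + dx, node[1] + dy), d)
        return total

    return count(src, None)
```

For corner-to-corner traffic on a 3x3 mesh, `path_diversity((0, 0), (2, 2))` counts all 6 minimal paths. Disabling the north-to-east turn at nodes `(0, 1)` and `(0, 2)` leaves 3, while disabling it at every node leaves only the single XY path. Which turns are restricted therefore decides how much diversity a given source-destination pair keeps.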
8
Presentation Outline
FATE (Fault- and Application-aware Turn-model Extension)
(1) Turn-enabling rules: "How to reduce search?"
(2) Load estimation: "Which is the most valuable turn?"
(3) Overall routing computation algorithm
Experimental evaluation
Conclusions
9
How to reduce turn-restriction search?
To avoid unfruitful turn-restriction patterns…
Pattern 1: network disconnection.
Pattern 2: non-minimal restriction.
Pattern 3: possible deadlock.
10
Turn-Enabling Rules
To avoid unfruitful turn-restriction patterns, each time a turn is disabled, several others should be enabled.
Basic rules: enable adjacent turns (cycle, node, link).
Advanced rules: enable remote turns (horizontal, vertical, diagonal).
11
Traffic-Load Estimation
Which is the most valuable turn? Use traffic-load estimation to decide.
Specific goals: (1) balancing link utilization; (2) prioritizing turns that are critical.
Load calculation steps: path diversity, link load, turn load, cycle load, weight scaling.
Take into account hop-by-hop route decisions.
12
Traffic-Load Estimation Step by Step
Steps, shown on a mesh with one source and one destination: path diversity, link load, turn load, cycle load, weight scaling (multiply by communication frequency).
Legend: high traffic, medium traffic, low traffic.
13
Example: Link, Turn, Cycle Load (1/2)
Traffic-load estimation, 5 steps: diversity, link, turn, cycle, scale.
[Figure: mesh with a source and a destination, annotated with link loads (from path diversity), turn loads and a cycle load. Legible annotations: "cycle load 1/2 = 0.5", "path diversity sum: 2 4 6".]
14
Example: Link, Turn, Cycle Load (2/2)
Traffic-load estimation, 5 steps: diversity, link, turn, cycle, scale.
[Figure: the same mesh for a second case, annotated with link loads (from path diversity), turn loads and a cycle load. Legible annotations: "(no path)", "cycle load 1/2 = 0.5", "link load sum: 0.375".]
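One plausible reading of "link load from path diversity" with hop-by-hop route decisions is that one unit of traffic is split evenly among the minimal next hops at every node. The sketch below implements only that reading for a fault-free mesh with no turn restrictions. The even-split rule and the function name are assumptions, and the turn and cycle loads from the slides are not modelled.

```python
from collections import defaultdict

def link_loads(src, dst):
    """Spread one unit of traffic from src to dst over minimal paths.

    At every node the accumulated load is divided evenly among the
    distance-reducing output links. Returns {(from_node, to_node): load}.
    """
    sx = 1 if dst[0] >= src[0] else -1   # step direction in x
    sy = 1 if dst[1] >= src[1] else -1   # step direction in y

    # All nodes inside the bounding box spanned by src and dst.
    nodes = [(x, y)
             for x in range(src[0], dst[0] + sx, sx)
             for y in range(src[1], dst[1] + sy, sy)]
    # Visit nodes by hop count from src, so a node's full incoming load
    # is known before it is split.
    nodes.sort(key=lambda n: abs(n[0] - src[0]) + abs(n[1] - src[1]))

    node_load = defaultdict(float)
    node_load[src] = 1.0
    loads = {}
    for node in nodes:
        if node == dst:
            continue
        next_hops = []
        if node[0] != dst[0]:
            next_hops.append((node[0] + sx, node[1]))
        if node[1] != dst[1]:
            next_hops.append((node[0], node[1] + sy))
        share = node_load[node] / len(next_hops)
        for nxt in next_hops:
            loads[(node, nxt)] = share
            node_load[nxt] += share
    return loads
```

On a 3x3 mesh from `(0, 0)` to `(2, 2)`, the two links leaving the source each carry 0.5, the links leaving `(1, 0)` carry 0.25 each, and the last link into the destination from `(2, 1)` carries 0.5 again. Values such as 0.5, 0.25 and 0.125 arise naturally from this kind of repeated halving.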
15
Example: Weight Scaling
Communication frequency per source-destination pair: S1 to D1: 20; S2 to D2: 8.
Scaling: each pair's loads are multiplied by its communication frequency and combined.
[Figure: mesh with sources S1, S2 and destinations D1, D2, annotated with scaled loads. The most congested cycle is highlighted.]
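The scaling step can be sketched as a weighted sum. The communication frequencies 20 and 8 come from the slide. The per-pair unit loads and the cycle names `"A"` and `"B"` are made-up placeholders, and summing the scaled contributions of all pairs is an assumption about how they are combined.

```python
from collections import defaultdict

def scale_and_sum(unit_loads, frequency):
    """Combine per-pair loads into one weighted load per resource.

    unit_loads: {pair: {resource: load for one unit of traffic}}
    frequency : {pair: communication frequency}
    Returns {resource: sum over pairs of load * frequency}.
    """
    total = defaultdict(float)
    for pair, loads in unit_loads.items():
        for resource, load in loads.items():
            total[resource] += load * frequency[pair]
    return dict(total)

# Frequencies from the slide; unit loads are hypothetical.
frequency = {"S1-D1": 20, "S2-D2": 8}
unit_loads = {
    "S1-D1": {"A": 0.125, "B": 0.5},
    "S2-D2": {"A": 0.25},
}
scaled = scale_and_sum(unit_loads, frequency)
most_congested = max(scaled, key=scaled.get)
```

Here cycle `"A"` receives 0.125 × 20 + 0.25 × 8 = 4.5 and cycle `"B"` receives 0.5 × 20 = 10.0, so `"B"` is reported as the most congested. A pair with a low unit load can still dominate once its frequency is applied.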
16
Putting it all together
1) Evaluate turns, one at a time (choose the one leading to least congestion).
2) Apply turn-enabling rules.
Iterate this process until no undecided turn is left.
17
Backtracking
Deadlock is possible due to greedy turn-restriction selections: the turn-enabling rules do not resolve all deadlock-causing patterns.
When a deadlock is detected, backtrack to the last decision.
Example placement decision tree: node 5 turn NW, node 6 turn NE, node 3 turn SW, deadlock detected, backtrack.
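The slides do not say how deadlock is detected. A standard technique is to build the channel dependency graph and check it for cycles, and the sketch below uses that technique as a stand-in. The `(node, in_dir, out_dir)` turn encoding and both function names are assumptions, and U-turns are excluded for simplicity.

```python
from itertools import product

DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}

def channel_dependencies(width, height, disabled_turns=frozenset()):
    """Channel dependency graph of a width x height mesh.

    A channel is (node, dir): the link leaving `node` in direction `dir`.
    Channel a depends on channel b when a packet can arrive over a and
    leave over b, i.e. the turn at the shared node is not disabled.
    """
    def in_mesh(n):
        return 0 <= n[0] < width and 0 <= n[1] < height

    graph = {}
    for x, y in product(range(width), range(height)):
        for d, (dx, dy) in DIRS.items():
            nxt = (x + dx, y + dy)
            if not in_mesh(nxt):
                continue
            successors = []
            for d2, (dx2, dy2) in DIRS.items():
                if d2 == OPPOSITE[d]:
                    continue                      # no U-turns
                if not in_mesh((nxt[0] + dx2, nxt[1] + dy2)):
                    continue                      # link does not exist
                if d2 != d and (nxt, d, d2) in disabled_turns:
                    continue                      # restricted turn
                successors.append((nxt, d2))
            graph[((x, y), d)] = successors
    return graph

def has_cycle(graph):
    """Iterative three-colour DFS; True if the directed graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}
    for root in graph:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:
                    return True                   # back edge closes a cycle
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                colour[node] = BLACK              # all successors explored
                stack.pop()
    return False
```

On a 2x2 mesh with every turn enabled, the graph consists of a clockwise and a counterclockwise cycle, so `has_cycle` returns `True`. Disabling one turn in each, for example `((1, 1), "N", "W")` and `((1, 1), "E", "S")`, makes it acyclic. Disabling only one of the two still leaves a possible deadlock.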
18
FATE Route-Computation Procedure
Procedure flowchart. Trigger: (1) new application launch; (2) fault occurrence.
Start (trigger). Loop: estimate traffic load, choose turn to be disabled, apply turn-enabling rules. Deadlock? Disconnect? If so, backtrack. No undecided turn? Then end.
Network example legend: disabled turn, enabled turn, undecided turn; high traffic, medium traffic, low traffic.
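The control flow of the flowchart (greedy choice, rule application, validity check, backtracking) can be sketched as a recursive search. This is a schematic skeleton, not the FATE implementation: the callbacks `cost`, `enable_rules` and `is_valid` are hypothetical placeholders for load estimation, the turn-enabling rules, and the deadlock/disconnection checks, and the toy instance below is invented purely to exercise the loop.

```python
def place_restrictions(undecided, cost, enable_rules, is_valid,
                       disabled=frozenset(), enabled=frozenset()):
    """Greedy turn-restriction placement with backtracking.

    undecided            : frozenset of turns not yet decided
    cost(t, dis)         : estimated congestion if t is disabled next
    enable_rules(t, und) : undecided turns to enable once t is disabled
    is_valid(dis, en)    : False if the placement can deadlock or
                           disconnects the network
    Returns (disabled, enabled), or None if no valid placement exists.
    """
    if not is_valid(disabled, enabled):
        return None                       # caller backtracks
    if not undecided:
        return disabled, enabled          # no undecided turn left
    # Cheapest candidate first; the rest serve as fallbacks.
    for turn in sorted(undecided, key=lambda t: cost(t, disabled)):
        newly_enabled = frozenset(enable_rules(turn, undecided)) - {turn}
        result = place_restrictions(
            undecided - {turn} - newly_enabled, cost, enable_rules,
            is_valid, disabled | {turn}, enabled | newly_enabled)
        if result is not None:
            return result
    return None

# Toy instance: two 4-turn cycles, one turn per cycle must be disabled.
CYCLES = [frozenset(f"cw{i}" for i in range(4)),
          frozenset(f"ccw{i}" for i in range(4))]
WEIGHT = {t: w for w, t in enumerate(sorted(CYCLES[0] | CYCLES[1]))}

def toy_rules(turn, undecided):
    # Enable the remaining undecided turns of the same cycle.
    cycle = next(c for c in CYCLES if turn in c)
    return (cycle & undecided) - {turn}

def toy_valid(disabled, enabled):
    if "ccw0" in disabled:                # pretend this choice disconnects
        return False
    return all(not c <= enabled for c in CYCLES)  # fully enabled cycle: deadlock

result = place_restrictions(CYCLES[0] | CYCLES[1],
                            lambda t, dis: WEIGHT[t], toy_rules, toy_valid)
```

In the toy run the cheapest turn, `ccw0`, is rejected by the validity check, so the search backtracks and settles on `ccw1`, then disables `cw0` in the other cycle. The final placement has two disabled and six enabled turns.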
19
Presentation Outline
FATE routing
Experimental evaluation: experimental setup, evaluation on faulty topologies, evaluation on fault-free topologies, overheads
Conclusions
20
Experimental Setup
BookSim simulation with 8×8 mesh networks: 3-stage router pipeline, 2 VCs/protocol class, 5 flits/VC.
Fault injection: faults in bidirectional links; 5 fault rates (1 faulty link, 3%, 5%, 10%, and 15% faulty links); 10 random fault patterns for each fault rate.
Traffic benchmarks: 5 synthetic patterns (bit complement, bit reversal, shuffle, transpose, uniform random); 11 traces from SPLASH-2 multi-threaded workloads, generated from gem5 simulation with MESI cache coherence and 4 memory controllers at mesh corners.
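A fault-injection setup like the one described could be sketched as follows. The function names, the rounding of a percentage to a whole number of links, and the use of seeded sampling are assumptions; the slides do not say how fault patterns were drawn or whether disconnected patterns were filtered out.

```python
import random

def mesh_links(width, height):
    """List every bidirectional link of a width x height mesh once."""
    links = []
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.append(((x, y), (x + 1, y)))   # horizontal link
            if y + 1 < height:
                links.append(((x, y), (x, y + 1)))   # vertical link
    return links

def random_fault_pattern(width, height, fault_rate, rng):
    """Pick a random set of faulty links for the given fault rate.

    The number of faulty links is fault_rate * total, rounded to the
    nearest integer and forced to at least 1 (an assumed convention).
    """
    links = mesh_links(width, height)
    n_faulty = max(1, round(fault_rate * len(links)))
    return set(rng.sample(links, n_faulty))

# Ten patterns per fault rate, as in the setup; seeds are arbitrary.
patterns = {rate: [random_fault_pattern(8, 8, rate, random.Random(seed))
                   for seed in range(10)]
            for rate in (0.03, 0.05, 0.10, 0.15)}
```

An 8×8 mesh has 112 bidirectional links, so under this rounding the 3%, 5%, 10% and 15% rates correspond to 3, 6, 11 and 17 faulty links.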
21
Prior Routing Solutions
Fault-tolerant routing: Breadth-First Search (BFS) [Schroeder 91, Aisopos 11]; Depth-First Search (DFS) [Sancho 04].
Application-aware routing: Bandwidth-Sensitive Oblivious Routing (BSOR) [Kinsy 09, Kinsy 13]; Application-Specific Routing Algorithms (APSRA) [Palesi 08].
Fully-adaptive routing on 2D mesh (congestion management): Dynamic XY (DyXY) [Li 06]; Neighbor on Path (NoP) [Ascia 08]; Regional Congestion Awareness (RCA) [Gratz 08].
22
Saturation Throughput for Synthetic Patterns
[Plots: saturation throughput (packet/cycle/router) vs. number of faulty links, and vs. traffic pattern at 15% fault rate, for fault-tolerant routing, application-aware routing, and our solution. Gain annotations as they appear on the plot: 9.5%, 5.5%, 10.6%, -0.5%, 17.7%, 0.1%, 23.3%, 2.9%, 33.3%, 9.3%.]
Less performance degradation as faults increase: up to 33.3% higher saturation throughput than fault-tolerant routing and 9.3% higher than application-aware routing.
Gains are maximized with unbalanced load; FATE still provides a gain with uniform load.
23
Packet Latency for SPLASH-2 Traces
[Plots: average packet latency (cycles) vs. number of faulty links, and per benchmark program at 15% fault rate. Annotations: 13%, 228 cycles, 59%.]
Minimal latency increase until 5% faults.
Up to 59% (13%) latency reduction over BFS (APSRA).
Significantly lower latency in 5 programs.
24
Performance on Fault-Free Meshes
[Plot: saturation throughput (packet/cycle/router) vs. number of VCs for deterministic, fully-adaptive, fault-tolerant and application-aware routing, and our solution.]
Compared to DOR, fault-tolerant and application-aware routing, FATE always provides higher saturation throughput (due to better traffic-load estimation).
Compared to fully-adaptive routing, FATE performs better at a small number of VCs (it leaves more VCs for normal transfer).
25
Overheads
Software computation: 2-4 sec for 8×8 meshes on an Intel Xeon® processor (two orders of magnitude faster than APSRA); ~110 turn-placement attempts (little dependence on fault rate).
Hardware overheads: area increases by 6% (routing table, route-computation logic). Power consumption was not measured. Better power efficiency than APSRA. Can be more power-efficient than application-agnostic solutions when the same routing is reused multiple times.
26
Conclusions
FATE provides highly fault-tolerant routing with graceful performance degradation by leveraging application traffic patterns.
Performance improvement over existing fault-tolerant routing: 33% improvement in saturation throughput (synthetic traffic patterns); 59% improvement in packet latency (SPLASH-2 traces).
Route computation is two orders of magnitude faster than APSRA.
27
Thank you! Questions?
28
Backup Slides
29
Various Turn-Restriction Choices
Exponential increase of turn-restriction choices as network size increases.
Example 1: 4 nodes, 4 possibilities.
Example 2: 6 nodes, 16 possibilities (the other 8 cases are not shown).
A 2-D mesh with M nodes contains $4^{(\sqrt{M}-1)\times(\sqrt{M}-1)}$ possibilities.
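To give a feel for the growth, the count can be evaluated directly. The sketch assumes the slide's counting, four choices per unit square of the mesh, and generalizes it to a rows × cols mesh so that it also covers the 6-node example. The function name is illustrative.

```python
def turn_restriction_choices(rows, cols):
    """Number of turn-restriction choices, assuming 4 per unit square.

    A rows x cols mesh has (rows - 1) * (cols - 1) unit squares; for a
    square mesh with M nodes this is (sqrt(M) - 1) ** 2.
    """
    return 4 ** ((rows - 1) * (cols - 1))

small = [turn_restriction_choices(2, 2), turn_restriction_choices(2, 3)]
large = turn_restriction_choices(8, 8)   # the 8x8 mesh used in the evaluation
```

The two small cases reproduce the slide's 4 and 16 possibilities. For an 8×8 mesh the same count is 4^49, roughly 3 × 10^29, which makes exhaustive search impractical.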
30
Basic Turn-Enabling Rules (Cycle, Node, Link)
Which turns should be enabled upon a turn-restriction decision? Goals: to minimize the number of restrictions and to guarantee deadlock-freedom.
Rule 1 (cycle), rule 2 (node), rule 3 (link).
What happens if we break the rules? With a violated turn, deadlock happens.
Turn types in the figures: undecided, enabled, disabled.
31
Advanced Turn-Enabling Rules (Common Link, Opposite-corner Turn)
Rule 4: common link (horizontal enabling, vertical enabling).
Why rule 4? Let's apply the basic rules: the turn should be enabled for both candidates.
Rule 5: opposite-corner turn (diagonal enabling).
Turn types in the figures: undecided, enabled (basic), enabled (advanced), disabled, candidate.
See paper for details.
32
Applying Basic Turn-Enabling Rules to Faulty Topologies
Rule 1 (cycle): special case, no double count. A mutual turn is counted only for one cycle; deadlock occurs when only the mutual turn is disabled.
Rules 2 & 3 (node & link): no special change.
33
Applying Advanced Turn-Enabling Rules to Faulty Topologies
Rule 4 (common link): apply only towards fault-free directions.
Rule 5 (opposite-corner turn): apply as if fault-free.