Published by Roderick Floyd. Modified over 9 years ago.
1
José Vicente Escamilla, José Flich, Pedro Javier García
2
Introduction / Motivation
ICARO overview
ICARO description
◦ Detection
◦ Notification
◦ Isolation
Results
Conclusions
Questions
3
CMPs and MPSoCs use a network to interconnect nodes.
Network performance degradation is due to:
◦ Power-saving mechanisms (DVFS)
◦ Bursty traffic patterns
◦ Heterogeneous system designs
Performance degradation may lead to congestion.
[Figure: CMP and MPSoC examples; Tile-Gx (72 cores)]
4
ICARO does not remove congestion. ICARO isolates it.
Two types of traffic:
◦ Congested
◦ Non-congested
Goal: to isolate congested traffic from non-congested traffic in order to avoid HoL-blocking.
5
RCA, P. Gratz et al.
◦ Redirects traffic at each router based on congestion metrics.
◦ Metrics are piggybacked.
Vicious cycles may be created.

“Prediction-based Flow Control for Network-on-Chip Traffic”, U. Ogras et al.
◦ Injection control based on prediction models.
◦ The prediction model uses link status sent through a dedicated network.
Injection throttling may produce performance oscillations.

AVADA/FVADA, Yi Xu et al.
◦ Maps different flows to different queues based on the output port requested in the next router (lookahead routing).
Requires lookahead routing and credit-based flow control.
Congested and non-congested flows may share queues, generating some degree of HoL-blocking, since the mapping policy only considers one hop of the message path.
6
[Figure labels: Credits=2, Credits=0]
7
ICARO uses two types of Virtual Networks (VNs):
◦ Regular VN: non-congested traffic
◦ Extra VN: congested traffic
Three stages:
◦ Detection: congestion is detected at routers.
◦ Notification: routers notify all Network Interfaces (NIs).
◦ Isolation: NIs isolate congested traffic from non-congested traffic.
8
[Figure: routers SW0 to SW15, each with a network interface (NI 0 to NI 15); legend: regular VN queue, extra VN queue]
9
Detection is performed at routers.
It detects congestion points ({router, port} pairs).
When a message arrives/leaves:
◦ Buffer saturation checking
  If buffer.level > HIGH_THR, the buffer is marked as saturated.
  If buffer.level < LOW_THR, the buffer is marked as not saturated (hysteresis).
  If any buffer of an input port is marked as saturated, the whole input port is marked as saturated as well.
◦ Congestion checking
  Requests from saturated input ports to each output port are counted.
  Each output port requested by more than one saturated input port is marked as congested.
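The two checks above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' router logic: the threshold values, the class names `Buffer` and `InputPort`, and the function `congested_outputs` are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, in flits; the slides do not give actual values.
HIGH_THR = 6
LOW_THR = 2


@dataclass
class Buffer:
    level: int = 0
    saturated: bool = False

    def update(self) -> None:
        """Re-evaluate the saturation flag after a message arrives or leaves."""
        # Hysteresis: the flag only changes outside the [LOW_THR, HIGH_THR] band.
        if self.level > HIGH_THR:
            self.saturated = True
        elif self.level < LOW_THR:
            self.saturated = False


@dataclass
class InputPort:
    buffers: list  # list of Buffer
    # Output ports currently requested by messages queued at this input port.
    requested_outputs: set = field(default_factory=set)

    @property
    def saturated(self) -> bool:
        # One saturated buffer marks the whole input port as saturated.
        return any(b.saturated for b in self.buffers)


def congested_outputs(input_ports: dict) -> set:
    """Return the output ports requested by more than one saturated input port."""
    requests = {}
    for port in input_ports.values():
        if port.saturated:
            for out in port.requested_outputs:
                requests[out] = requests.get(out, 0) + 1
    return {out for out, count in requests.items() if count > 1}
```

Each output port returned by `congested_outputs`, paired with the router's identifier, would form one congestion point.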
10
Segmented ring connecting routers and NIs.
Network width (wires): $\log_2(N) + p + 1$, where N = number of nodes and p = router radix.
Process:
◦ Notifications are injected into the register (when it is free).
◦ Notifications are delivered from one register to the next at each cycle.
◦ Notifications are discarded when they reach their origin register.
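The ring process described here can be modelled with a small cycle-by-cycle simulation. The sketch below assumes one register per node and a single notification per register; the function name `simulate_ring` and its parameters are hypothetical, and details such as the exact injection timing are guesses rather than the authors' design.

```python
from collections import deque


def simulate_ring(num_nodes, pending, max_cycles=1000):
    """Simulate a notification ring with one register per node.

    pending maps a node index to a deque of notifications waiting to be injected.
    Returns a dict mapping each node to the notifications it observed, in order.
    """
    regs = [None] * num_nodes  # each entry is (origin, payload) or None
    seen = {n: [] for n in range(num_nodes)}
    for _ in range(max_cycles):
        # Shift: every register hands its content to the next one.
        shifted = [None] * num_nodes
        for i, item in enumerate(regs):
            if item is None:
                continue
            nxt = (i + 1) % num_nodes
            origin, payload = item
            if nxt == origin:
                continue  # back at its origin register: discard
            shifted[nxt] = item
            seen[nxt].append(payload)
        regs = shifted
        # Inject: a node may inject only when its own register is free.
        for node, queue in pending.items():
            if queue and regs[node] is None:
                regs[node] = (node, queue.popleft())
        if all(r is None for r in regs) and not any(pending.values()):
            break
    return seen
```

For example, `simulate_ring(4, {0: deque(["a"]), 2: deque(["b"])})` delivers both notifications to nodes 1 and 3, while each origin node sees only the other node's notification.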
11
[Figure: ring of registers carrying notifications across routers SW0 to SW15; detail of SW 7 and NI 7 showing CNN in, CNN out, notification injection and notification reception]
12
13
Notifications are stored in a cache memory.
Useless notifications are discarded:
◦ Unreachable CPs
◦ Redundant notifications (merge)

SW    Port
5     E
10    S
14
[Figure: routers SW0 to SW15 with XY routing. CPs cache at NI 0: {SW 10, Port S}, --. CPs cache at NI 4: {SW 5, Port E}, {SW 10, Port S}]
15
[Figure: routers SW0 to SW15 with XY routing. CPs cache at NI 4: {SW 5, Port E}, {SW 10, Port S}. Annotations: {SW10, Port S} notification is IGNORED; {SW5, Port E} and {SW10, Port S} notifications are MERGED]
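One possible reading of this filtering step is sketched below. It assumes a 4x4 mesh with row-major router numbering, port S pointing towards higher-numbered rows, and deterministic XY routing. The brute-force reachability test and the helper names (`xy_route`, `destinations_through`, `filter_notification`) are illustrative assumptions, not the hardware mechanism used in ICARO. A CP is treated as unreachable when no XY route from the NI crosses it, and as redundant when every route crossing it already crosses a stored CP.

```python
WIDTH = 4  # assumption: 4x4 mesh, routers numbered row by row


def xy_route(src, dst, width=WIDTH):
    """Return the (router, output port) hops of the XY route from src to dst."""
    hops = []
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    while x != dx:  # X dimension first
        hops.append((y * width + x, "E" if dx > x else "W"))
        x += 1 if dx > x else -1
    while y != dy:  # then Y dimension
        hops.append((y * width + x, "S" if dy > y else "N"))
        y += 1 if dy > y else -1
    return hops


def destinations_through(src, cp, width=WIDTH):
    """Destinations whose XY route from src crosses congestion point cp."""
    return {d for d in range(width * width)
            if d != src and cp in xy_route(src, d, width)}


def filter_notification(src, cache, cp, width=WIDTH):
    """Return the CP cache of NI src after receiving a notification for cp."""
    new = destinations_through(src, cp, width)
    if not new:
        return cache  # unreachable from this NI: ignore
    covered = [destinations_through(src, c, width) for c in cache]
    if any(new <= old for old in covered):
        return cache  # redundant: an already stored CP covers it
    # Drop stored CPs that the new one makes redundant, then store it.
    kept = [c for c, old in zip(cache, covered) if not old <= new]
    return kept + [cp]
```

Under these assumptions, NI 0 ignores {SW5, Port E} (no XY route from router 0 uses that port) but stores {SW10, Port S}, whereas NI 4 keeps only {SW5, Port E}, because every route from router 4 through {SW10, Port S} has already crossed {SW5, Port E}.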
16
Isolation is performed at NIs.
Process:
◦ Initially, all traffic is allocated to regular VNs.
◦ At each cycle, the post-processor module checks the messages at the head of all regular VNs in parallel.
◦ If the route crosses any of the CPs stored in the CPs cache memory, the message is reallocated to an extra VN.
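The per-cycle check can be sketched as follows. This is a simplified software model under stated assumptions: each message carries its precomputed route as a list of (router, output port) hops, there is a single extra VN, and the names `Message` and `isolate_step` are hypothetical. The loop stands in for a check that the hardware post-processor performs in parallel.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    dst: int
    route: list  # (router, output port) hops, precomputed by the routing function


def isolate_step(regular_vns, extra_vn, cp_cache):
    """One cycle of isolation at an NI.

    regular_vns is a list of deques, extra_vn is a deque, and cp_cache is a
    set of (router, output port) congestion points known to this NI.
    """
    for vn in regular_vns:
        # Only the message at the head of each regular VN is inspected.
        if vn and any(hop in cp_cache for hop in vn[0].route):
            extra_vn.append(vn.popleft())
```

With a cache holding {SW5, Port E} at NI 4, a message to node 12 (route S, S) stays in its regular VN, while a message to node 15, whose XY route leaves SW5 through port E, is moved to the extra VN.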
17
[Figure: Network Interface 4 with arbiter, post-processor, CPs cache ({SW 5, Port E}), regular-VN and extra-VN queues holding messages with dst:12, dst:15 and dst:6, connected to the regular-VN and extra-VN inputs of Router 4]
18
Simulation:
◦ NoC simulator developed in our research group.
Compared against FVADA/AVADA with different numbers of virtual queues:
◦ FVADA: restricted to 4 VCs
◦ ICARO: uses VNs instead of VCs
Overhead analysis:
◦ Tools used:
  Synthesis: Design Vision (Synopsys)
  Place & Route: Encounter (Cadence)
  Library: 45 nm Nangate Open Cell (typical conditions)

Parameter       Value
Topology        8x8 2D mesh
Routing         XY
Switching       Wormhole (flit-level switching)
Flow control    Credits
Flit size       128 bits
Message size    5 flits
Traffic         0.3 f/c (background) + 1 f/c (hotspot 4-to-1, from cycle 10k to 20k)
19
[Figure: results plots; panels: 4VC/VN, 2VC/VN, 8VC/VN]
20
Area overhead: ~6%.
Power overhead: varies from 6% to 10%.
21
Area overhead: varies from 3.8% to 6%.
Power overhead: varies from 4.5% to 5.4%.
22
Conclusions:
◦ A mechanism to avoid HoL-blocking in networks-on-chip has been presented.
◦ ICARO isolates harmful traffic from non-harmful traffic by using VNs, achieving an overall latency improvement of up to 82%.
Future work:
◦ To analyze a hierarchical CNN to improve scalability.
◦ To implement in-order delivery support.
23
Questions?