Published by Barry Watkins
1
A Case for Bufferless Routing in On-Chip Networks
Onur Mutlu (Carnegie Mellon University), Thomas Moscibroda (Microsoft Research)
36th International Symposium on Computer Architecture (ISCA 2009), Austin, Texas, USA
Presented by Jonas Bokstaller, 14 November 2018
2
Executive Summary
Problem: In systems on chip, the on-chip network spends the largest share of its energy and physical area on packet buffers, which are used to route packets between the different components on the chip.
Proposal: Three new routing algorithms, “FLIT-Level Routing”, “BLESS Wormhole Routing” and “BLESS with Buffers”, which aim to eliminate or reduce the need for buffers by deflecting packets inside the network.
Results: Most of the time, buffers are not needed in the NoC. Average performance decrease of only 0.5%; worst-case performance decrease of 3.2%; average network energy consumption decrease of 39.4%; area savings of 60%.
3
Outline
- Background, Problem & Goal
- Key Approach and Ideas
- Mechanisms (in some detail)
- Benefits and Limitations
- Key Results: Methodology and Evaluation
- Summary
- Strengths
- Weaknesses
- Takeaways
- Thoughts, Ideas and Discussion Starters
4
System on Chip
System on a Chip (SoC): every component is on the same chip. Small footprint and low power consumption. Commonly used in smartphones, the Internet of Things, etc.
8
Network on Chip
Network on Chip (NoC): connects the components on an SoC (cores, caches, etc.), like a typical computer network.
[Figure: components (e.g. a CPU core, a cache), each with a built-in router, connected by physical links]
11
Problem with Buffers
Energy consumption is too high. Buffers occupy chip area (75% of the NoC) and increase design complexity. Existing work assumes every router needs a buffer.
How can we save this energy? Can we get rid of buffers?! Are buffers in every router really necessary?
13
Bufferless Routing
“Hot potato” routing: a packet is too hot to keep (buffer), so it is always routed. Links act as buffers. The shortest path does not matter; the packet just keeps moving. If the right output port isn’t available, the packet is misrouted (= deflection).
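The deflection decision at a single router can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes a 2D mesh in which routers have (x, y) coordinates and four network output ports named N, S, E and W; these port names and the `"LOCAL"` ejection port are assumptions. A port is productive if it brings the flit one hop closer to its destination. If no productive port is free, the flit is misrouted instead of being held.

```python
# Sketch of "hot potato" (deflection) routing at one router in a 2D mesh.
# Assumption: ports are named N, S, E, W; "LOCAL" stands for ejection at the destination.
PORTS = ("N", "S", "E", "W")


def productive_ports(current, dest):
    """Return the output ports that move a flit one hop closer to dest."""
    (cx, cy), (dx, dy) = current, dest
    ports = []
    if dx > cx:
        ports.append("E")
    elif dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    elif dy < cy:
        ports.append("S")
    return ports


def route_hot_potato(current, dest, free_ports):
    """Choose an output port; the flit is never held back.

    Returns (port, deflected): a free productive port if one exists,
    otherwise any other free port (a deflection).
    """
    if current == dest:
        return "LOCAL", False  # arrived: eject the flit
    for port in productive_ports(current, dest):
        if port in free_ports:
            return port, False
    for port in PORTS:  # no productive port free: misroute instead of buffering
        if port in free_ports:
            return port, True
    raise RuntimeError("no free output port")
```

For example, `route_hot_potato((0, 0), (2, 0), {"N", "S"})` returns `("N", True)`. The only productive port (E) is busy, so the flit is deflected north.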
14
Main Advantages/Disadvantages
Small traffic volumes: the number of collisions is low, so packets rarely get rerouted. Result: increased bandwidth, decreased latency, decreased buffer energy.
16
Main Advantages/Disadvantages
Large traffic volumes: collisions occur and packets get rerouted. Result: reduced bandwidth, increased latency, increased link/router energy.
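A toy Monte Carlo model of a single bufferless router shows how deflections depend on traffic volume. The model is a hypothetical simplification, not the paper's simulator. Each input port carries a flit with probability `load` per cycle. Each flit has exactly one productive output port, chosen at random. A flit that finds its port taken is deflected to another free port.

```python
import random


def deflection_rate(load, ports=4, cycles=20_000, seed=1):
    """Fraction of flits deflected at one bufferless router (toy model).

    Hypothetical assumptions: each of `ports` input ports carries a flit with
    probability `load` per cycle, and each flit has exactly one productive
    output port, chosen uniformly at random.
    """
    rng = random.Random(seed)
    flits = deflected = 0
    for _ in range(cycles):
        wanted = [rng.randrange(ports) for _ in range(ports) if rng.random() < load]
        free = set(range(ports))
        for port in wanted:  # arrival order stands in for priority order
            flits += 1
            if port in free:
                free.remove(port)
            else:
                # Collision: misroute to some other free port. One always exists,
                # because there are never more flits than output ports.
                deflected += 1
                free.remove(rng.choice(sorted(free)))
    return deflected / flits if flits else 0.0


for load in (0.1, 0.5, 0.9):
    print(f"load {load:.1f}: {deflection_rate(load):.1%} of flits deflected")
```

In this model the deflected fraction grows with the load. That matches the two cases above: few collisions at small traffic volumes, and frequent rerouting at large ones.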
26
Basic Algorithm: Flit-Level Routing (I)
Flits (flow control units): large network packets are broken into smaller pieces. Each flit can take a different path, but it is always forwarded. Flits of the same packet take different paths because the same output ports are not always free.
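Because each flit may take its own path, every flit needs its own header, and the destination must reassemble the packet. The sketch below illustrates this. The header fields (`dest`, `packet_id`, `seq`, `total`) and the `Reassembler` class are hypothetical; only the 4-flits-per-packet setting comes from the evaluation setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flit:
    # Hypothetical header: in flit-level routing every flit carries its own
    # destination and sequence information, since flits may take different paths.
    dest: int
    packet_id: int
    seq: int
    total: int
    payload: bytes


def split_packet(dest, packet_id, data, flits_per_packet=4):
    """Break a packet into flits (the evaluation uses 1 packet = 4 flits)."""
    size = -(-len(data) // flits_per_packet)  # ceiling division
    return [
        Flit(dest, packet_id, i, flits_per_packet, data[i * size:(i + 1) * size])
        for i in range(flits_per_packet)
    ]


class Reassembler:
    """Receiver-side buffer that collects flits arriving in any order."""

    def __init__(self):
        self.pending = {}  # packet_id -> {seq: payload}

    def receive(self, flit):
        parts = self.pending.setdefault(flit.packet_id, {})
        parts[flit.seq] = flit.payload
        if len(parts) < flit.total:
            return None  # still waiting for flits that took other paths
        del self.pending[flit.packet_id]
        return b"".join(parts[i] for i in range(flit.total))
```

Suppose the four flits of a packet are passed to `Reassembler.receive` in any order. The first three calls return `None`, and the fourth returns the complete packet. Until then, the partial packet occupies the receiver buffer.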
36
Basic Algorithm: Flit-Level Routing (II)
If no productive output port is available, the flit is sent (deflected) to a non-productive output port. The routers form a connected graph, so a deflected flit can still reach its destination.
Flit ranking: oldest first, which avoids livelocks.
Port prioritization: different for every flit; it finds the best output ports.
Example: Flit A (age 10) and Flit B (age 5) both go to destination 1 and prefer output ports (1, 2). A is older, so it gets priority 1 and wins Output Port 1. B (priority 2) is deflected to Output Port 2.
Limitations: each flit needs a larger header; the receiver buffer size increases because flits take different paths; extra logic is needed for reassembly at the destination.
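The ranking and port-prioritization steps can be sketched as one arbitration cycle. This is a simplified illustration, not the paper's router logic. Ties in age are broken by flit name, an arbitrary choice. To reproduce the deflection in the example, the example call assumes that Output Port 1 is the only productive port for both flits.

```python
from dataclasses import dataclass, field


@dataclass
class Flit:
    name: str
    age: int                                        # cycles in the network; larger = older
    productive: list = field(default_factory=list)  # productive output ports, best first


def flit_bless_arbitrate(flits, output_ports):
    """One cycle of oldest-first deflection arbitration (a sketch of FLIT-BLESS).

    Flit ranking: oldest flit first (ties broken by name, an arbitrary choice).
    Port prioritization: each flit takes its best free productive port; if none
    is free it is deflected to a free non-productive port. Because there are
    never more flits than output ports, every flit leaves the router.
    """
    if len(flits) > len(output_ports):
        raise ValueError("more flits than output ports")
    free = list(output_ports)
    assignment, deflected = {}, []
    for flit in sorted(flits, key=lambda f: (-f.age, f.name)):
        port = next((p for p in flit.productive if p in free), None)
        if port is None:
            port = free[0]  # deflection
            deflected.append(flit.name)
        free.remove(port)
        assignment[flit.name] = port
    return assignment, deflected
```

`flit_bless_arbitrate([Flit("A", 10, [1]), Flit("B", 5, [1])], [1, 2])` returns `({"A": 1, "B": 2}, ["B"])`. The older flit always wins, so any flit eventually becomes the oldest in the network and stops being deflected. This is why oldest-first ranking avoids livelock.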
37
Optimized Version: BLESS Wormhole Routing
Only the first flit of each packet/worm contains the header information. All other flits of the packet follow the leading flit: the head flit decides where to go (including deflections), and the rest follow the “head of the worm”.
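Wormhole-style port allocation can be sketched with a per-router table that maps each output port to the worm currently holding it. The `WormRouter` class and the `"head"`/`"body"`/`"tail"` flit kinds are hypothetical. The sketch deliberately leaves out truncation and priorities.

```python
class WormRouter:
    """Sketch of wormhole-style port allocation in a bufferless router.

    Only the head flit carries routing information and picks an output port;
    body and tail flits of the same worm follow it. Hypothetical simplification:
    no truncation and no priorities.
    """

    def __init__(self, ports):
        self.owner = {p: None for p in ports}  # output port -> worm id, or None

    def route(self, worm_id, kind, productive=()):
        if kind == "head":
            free = [p for p, w in self.owner.items() if w is None]
            if not free:
                raise RuntimeError("all output ports are allocated")
            # The head decides: a free productive port if possible, else deflect.
            port = next((p for p in productive if p in free), free[0])
            self.owner[port] = worm_id
            return port
        # Body and tail flits carry no header: follow the head of the worm.
        port = next((p for p, w in self.owner.items() if w == worm_id), None)
        if port is None:
            raise KeyError(f"no port allocated to worm {worm_id}")
        if kind == "tail":
            self.owner[port] = None  # worm has passed: release the port
        return port
```

In this model, only the head flit needs header information. The trade-off is that an output port stays reserved until the tail flit has passed.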
41
Wormhole Routing: Injection Problem
Injection problem: when is it safe to inject a new worm? Whenever not all input ports are busy. If all input ports become busy while a worm is being injected, the worm is truncated.
[Figure: Worm A passes through the bufferless router; the injected Worm B is truncated (B′), and Worm C has to wait until an input port becomes available]
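The injection rule can be sketched as a toy timeline model. `all_inputs_busy[t]` is a hypothetical input saying whether every input port is busy in cycle `t`. Because a router has no more input ports than output ports, a cycle in which not all inputs are busy leaves an output free for an injected flit.

```python
def inject_worm(worm_len, all_inputs_busy):
    """Sketch of the WORM-BLESS injection rule at one router (toy model).

    all_inputs_busy[t] is True if every input port is busy in cycle t; cycles
    beyond the list are assumed idle. Flits may only be injected in cycles where
    not all input ports are busy. If all input ports become busy mid-injection,
    the worm is truncated and the remaining flits form a new worm that waits.
    Returns (start_cycle, length) for each injected (sub)worm.
    """
    def busy(t):
        return t < len(all_inputs_busy) and all_inputs_busy[t]

    pieces, t, remaining = [], 0, worm_len
    while remaining > 0:
        while busy(t):
            t += 1  # wait until an input port becomes available
        start = t
        while remaining > 0 and not busy(t):
            remaining -= 1
            t += 1
        pieces.append((start, t - start))
    return pieces
```

`inject_worm(4, [False, False, True, False, False])` returns `[(0, 2), (3, 2)]`. The worm is truncated after two flits when all inputs become busy in cycle 2, and the remainder is injected as a new worm from cycle 3.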
42
Wormhole Routing: Livelock Problem
Livelock problem (packets can be deflected forever)
Head flit: a new output port must be allocated:
- Unallocated, productive port: the worm makes progress
- Allocated, productive port: the other worm gets truncated
- Unallocated, non-productive port: the worm is deflected
- Allocated, non-productive port: the other worm gets truncated
Non-head flit: the flit is routed to the same output port as its head flit.
Other worms only get truncated if the current worm has a higher priority.
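The head-flit cases can be encoded as an ordered search over the router's output ports. This is a sketch under stated assumptions. The cases are tried in the order the slide lists them. Priority is taken to be age (older wins), matching the oldest-first ranking of FLIT-BLESS. The `(None, ...)` fallback exists only in this sketch.

```python
def allocate_head(ports, my_age):
    """Pick an output port for a WORM-BLESS head flit (sketch of the slide's cases).

    ports: list of (name, productive, owner_age); owner_age is None if the port
    is unallocated, otherwise the age of the worm holding it. Older worms have
    higher priority. Cases are tried in the order listed on the slide.
    Returns (port, outcome), or (None, ...) if every port is held by an older worm.
    """
    cases = [
        (True, False, "progress"),                         # unallocated, productive
        (True, True, "progress, other worm truncated"),    # allocated, productive
        (False, False, "deflected"),                       # unallocated, non-productive
        (False, True, "deflected, other worm truncated"),  # allocated, non-productive
    ]
    for want_productive, want_allocated, outcome in cases:
        for name, productive, owner_age in ports:
            allocated = owner_age is not None
            if productive != want_productive or allocated != want_allocated:
                continue
            # Another worm is only truncated if the current worm has higher priority.
            if allocated and owner_age >= my_age:
                continue
            return name, outcome
    return None, "no port available"
```

For example, a head flit of age 5 facing a productive port held by an older worm (age 9) and a free non-productive port is deflected: the call returns `("S", "deflected")` when the free port is named `"S"`.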
43
Combined Version: BLESS with Buffers
If good performance at high bandwidth demand is desired, buffers can be added to FLIT-BLESS or WORM-BLESS. Buffers reduce the probability of misrouting: if no productive port is available, the flit is buffered instead of deflected. Whenever an input buffer is full, the oldest flit in that buffer becomes the “must-schedule flit”, which must be sent out in the next cycle. This mechanism avoids buffer overflow.
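The must-schedule rule can be sketched for one input port. The class is a hypothetical toy model. Each cycle, at most one productive output is offered to the port, and a deflection port is assumed to be always available. The point of the sketch is that the buffer can never exceed its capacity.

```python
from collections import deque


class BlessBufferedPort:
    """Toy model of one input port in "BLESS with buffers".

    Hypothetical simplifications: each cycle at most one productive output
    port is offered to this input port (`productive_free`), and deflection
    ports are always available. Flits without a productive port are buffered
    instead of deflected; a full buffer makes its oldest flit the
    must-schedule flit, which leaves in the next cycle even if that means a
    deflection, so the buffer never overflows.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()  # oldest flit on the left

    def cycle(self, arriving, productive_free):
        """Simulate one cycle; return the (flit, how) pairs that leave the port."""
        sent = []
        if len(self.buffer) == self.capacity:
            # Must-schedule flit: it leaves now, deflected if no productive port is free.
            sent.append((self.buffer.popleft(),
                         "productive" if productive_free else "deflected"))
            productive_free = False
        elif self.buffer and productive_free:
            sent.append((self.buffer.popleft(), "productive"))  # oldest buffered flit first
            productive_free = False
        if arriving is not None:
            if productive_free:
                sent.append((arriving, "productive"))
            else:
                self.buffer.append(arriving)  # no productive port: buffer instead of deflect
        return sent
```

With `capacity=2`, suppose three flits arrive while no productive port is free. The first two are buffered. In the third cycle, the full buffer forces its oldest flit out as a deflection, and the new arrival takes its place.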
48
Benefits
No buffers: simpler/cheaper chip design and area savings.
Absence of deadlocks: #input ports ≤ #output ports, so every packet will leave the router.
Absence of livelocks: oldest-first flit ranking and port prioritization.
Router latency reduction:
- Router with input buffers (latency = 3): Buffer Write + Route Computation; Channel/Switch Allocation; Switch/Link Traversal
- BLESS bufferless router (latency = 2): Route Computation; Switch/Link Traversal
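The per-hop latencies above translate into end-to-end latency as sketched below. This is a toy model under stated assumptions, not the paper's evaluation. Contention and queuing are ignored. In a 2D mesh, every deflection moves a flit one hop away from its destination, so it costs two extra hops.

```python
def flit_latency(hops, router_cycles, deflections=0):
    """Cycles for a flit to travel `hops` routers in a 2D mesh (toy model).

    Each hop costs one trip through the router pipeline, whose last stage
    already includes link traversal (as on the slide). Assumption: in a mesh,
    every deflection moves the flit one hop away from its destination, so it
    costs two extra hops. Contention and queuing delays are ignored.
    """
    return (hops + 2 * deflections) * router_cycles


BUFFERED_ROUTER_CYCLES = 3  # buffer write/route computation, allocation, traversal
BLESS_ROUTER_CYCLES = 2     # route computation, switch/link traversal
```

Consider a 6-hop route. The buffered router needs 18 cycles and BLESS needs 12. With two deflections, however, BLESS needs 20. Frequent deflections cancel the pipeline advantage, which is the effect described under Limitations.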
49
Limitations
At high network utilization, deflections happen more often, which causes unnecessary link/router traversals. This reduces network throughput, increases latency and increases link/router energy consumption.
51
Evaluation Methodology
Cycle-accurate interconnection network simulator; routers with 5 input/output ports; 1 packet = 4 flits. Request generation from real-world applications: Matlab (the most network-intensive), Milc (physics benchmark), H264ref (video encoder benchmark).
52
Evaluation Methodology
Compared: BLESS (flit-level routing, wormhole routing) against baseline routing with buffers (3 different algorithms).
Criteria: average packet delivery latency, maximum packet delivery latency, throughput, buffering requirements at the receiver, energy consumption.
53
Results for Homogeneous Case: Matlab (I)
The performance decrease without buffers is relatively small, because the injection rates of real applications are relatively low (not many L1 misses).
[Figure: Matlab on a 4x4 mesh network with 8 processors/instances; routing algorithms with buffers compared with FLIT-BLESS and WORM-BLESS bufferless routing at 2-cycle and 1-cycle router latency]
63
Results for Homogeneous Case: Matlab (II)
BLESS significantly reduces energy consumption. Link/router energy is slightly higher due to deflections.
65
Summary
Problem: In systems on chip, the on-chip network spends the largest share of its energy and physical area on packet buffers, which are used to route packets between the different components on the chip.
Proposal: Three new routing algorithms, “FLIT-Level Routing”, “BLESS Wormhole Routing” and “BLESS with Buffers”, which aim to eliminate or reduce the need for buffers by deflecting packets inside the network.
Results: Most of the time, buffers are not needed in the NoC. Average performance decrease of only 0.5%; worst-case performance decrease of 3.2%; average network energy consumption decrease of 39.4%; area savings of 60%.
BLESS achieves significant energy savings at a low performance loss.
67
Strengths
The evaluation does not rely only on computer-generated workloads; it also uses real applications such as a video encoder benchmark and a 3D fluid benchmark.
Had an impact on current bufferless research: cited 377 times in other papers (last citation on 29 October 2018).
First paper to propose a variety of bufferless algorithms.
Buffers are everywhere: the idea can be transferred to other areas.
Early evaluation of a problem that is more important than ever (smartphones & Internet of Things).
Good foundation for further research.
69
Weaknesses
No explanation of why certain programs were chosen for the evaluation: Matlab on an SoC is not typical, and what does Matlab compute?
The paper always speaks of bufferless routing, but more buffers are needed at the receiver side. How packets are reassembled in the receiver buffer is not covered.
Some critical features are not implemented: manual priorities for different packets, and congestion control (see “Next generation on-chip networks: what kind of congestion control do we need?” by Onur Mutlu et al., 2010).
Assumes no faulty routers/links.
71
Takeaways
Very important topic, especially today.
Research on bufferless routing is still in progress. The latest research paper was published on 11 June 2018: “High-performance 3D NoC bufferless router with approximate priority comparison” by Konstantinos Tatas.
BLESS goes in the right direction, but it lacks some needed functionality. It built a foundation for further research.
73
Thoughts, Ideas and Discussion starters
Are there any questions?
74
Thoughts, Ideas and Discussion starters
In what other areas could bufferless routing be used?
“Deflection routing in IP optical networks”, Guido Maier, 2011: optical data transfer is much faster than buffers, so deflection routing is an alternative in an optical network without buffers. Today, optical networks use only a small fraction of their large capacity, since switching, processing and storage technologies aren’t that fast.
75
Thoughts, Ideas and Discussion starters
Other ideas to eliminate buffers without deflections?
“Scarab: A single cycle adaptive routing and bufferless network”, M. Hayenga, MICRO-42, 2009: drop-based bufferless routing. Packets are simply dropped when the router is congested, and a circuit-switched backend network is established for requesting retransmits. This requires extra links for the retransmit requests.
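A toy model shows the cost of the drop-based approach: every drop triggers a retransmit request and delays the packet. It is not Scarab's actual design. The single contended output port, the oldest-first winner selection and the `nack_delay` parameter are all hypothetical assumptions.

```python
def drop_and_retransmit(arrivals, nack_delay=1):
    """Toy model of drop-based bufferless routing at one contended output port.

    Hypothetical assumptions: `arrivals` maps packet -> first cycle at the router;
    the port accepts one packet per cycle (oldest first, ties by name) and drops
    the rest. A retransmit request returns over the separate circuit-switched
    network, so a dropped packet reaches the router again 1 + nack_delay cycles
    later. Returns packet -> (delivery cycle, times dropped).
    """
    next_try = dict(arrivals)
    drops = {p: 0 for p in arrivals}
    delivered = {}
    cycle = min(arrivals.values(), default=0)
    while next_try:
        contenders = [p for p, t in next_try.items() if t == cycle]
        if contenders:
            winner = min(contenders, key=lambda p: (arrivals[p], p))
            delivered[winner] = (cycle, drops[winner])
            del next_try[winner]
            for p in contenders:
                if p != winner:
                    drops[p] += 1  # dropped: costs a retransmit request on the extra links
                    next_try[p] = cycle + 1 + nack_delay
        cycle += 1
    return delivered
```

Suppose three packets arrive in the same cycle. `drop_and_retransmit({"A": 0, "B": 0, "C": 0})` delivers A in cycle 0, B in cycle 2 after one drop, and C in cycle 4 after two drops.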
76
Thoughts, Ideas and Discussion starters
Other ideas to eliminate buffers without deflections?
Ring-based interconnect: no routing is needed at all; the packet is simply forwarded along the ring until it reaches the desired node. Not suitable for large networks.
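The scaling problem of a ring can be quantified by comparing average hop counts. The comparison below assumes uniform traffic, a unidirectional ring and a k x k mesh; these topologies are illustrative assumptions.

```python
def ring_hops(src, dst, n):
    """Hops on a unidirectional ring of n nodes: forward until the packet arrives."""
    return (dst - src) % n


def average_ring_hops(n):
    """Average hops from a node to every other node (uniform traffic)."""
    return sum(ring_hops(0, d, n) for d in range(1, n)) / (n - 1)


def average_mesh_hops(k):
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in nodes for bx, by in nodes)
    return total / (len(nodes) * (len(nodes) - 1))
```

For 64 nodes, `average_ring_hops(64)` is 32.0, while `average_mesh_hops(8)` is about 5.33. The ring distance grows linearly with the number of nodes.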
77
Thoughts, Ideas and Discussion starters
Is switching between bufferless routing and routing with buffers (= hybrid routing) a good idea?
“Adaptive flow control for robust performance and energy”, Jafri et al., MICRO-43, 2010: switches between bufferless deflection routing and buffered operation depending on the needed bandwidth. Energy savings, but no area savings.
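A hybrid controller could look roughly like the sketch below. It is a generic threshold-with-hysteresis controller, not the mechanism of Jafri et al. The utilization input and the `high`/`low` thresholds are hypothetical.

```python
def choose_modes(utilization, high=0.6, low=0.4, start="bufferless"):
    """Toy hybrid flow-control controller (hypothetical thresholds).

    Switch to buffered mode when utilization rises above `high`, and back to
    bufferless deflection routing when it falls below `low`; the gap between the
    two thresholds (hysteresis) prevents rapid toggling between modes.
    """
    mode, modes = start, []
    for u in utilization:
        if mode == "bufferless" and u > high:
            mode = "buffered"
        elif mode == "buffered" and u < low:
            mode = "bufferless"
        modes.append(mode)
    return modes
```

Because the buffers must still be physically present for the buffered mode, such a scheme can save energy but not area.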
78
A Case for Bufferless Routing in On-Chip Networks
Onur Mutlu (Carnegie Mellon University), Thomas Moscibroda (Microsoft Research)
36th International Symposium on Computer Architecture, June 22, 2009, Austin, Texas, USA
Presented by Jonas Bokstaller, 14 November 2018