A Case for Bufferless Routing in On-Chip Networks

Presentation transcript:

A Case for Bufferless Routing in On-Chip Networks Onur Mutlu Carnegie Mellon University Thomas Moscibroda Microsoft Research 36th International Symposium on Computer Architecture ISCA 2009 Austin, Texas, USA Presented by Jonas Bokstaller 14 November 2018

Executive Summary
- Problem: In the on-chip networks of systems on chip, packet buffers account for most of the network's energy consumption and physical area. The buffers are used when routing packets between the components on the chip.
- Proposal: We use three new routing algorithms, "FLIT-Level Routing", "BLESS Wormhole Routing" and "BLESS with Buffers", which aim to eliminate or reduce the need for buffers by deflecting packets inside the network.
- Results:
  - Most of the time, buffers are not needed in the NoC
  - Average performance decreases by only 0.5%
  - Worst-case performance decreases by 3.2%
  - Average network energy consumption decreases by 39.4%
  - Area savings of 60%

Outline
- Background, Problem & Goal
- Key Approach and Ideas
- Mechanisms (in some detail)
- Benefits and Limitations
- Key Results: Methodology and Evaluation
- Summary
- Strengths
- Weaknesses
- Takeaways
- Thoughts, Ideas and Discussion starters

System on Chip
- System on a Chip (= SoC)
- Every component is on the same chip
- Small footprint
- Low power consumption
- Commonly used in smartphones, Internet of Things devices, etc.

Network on Chip
- Network on Chip (= NoC)
- Connects the components on an SoC (cores, caches, etc.)
- Works like a typical computer network
- Figure labels: components (e.g. a CPU core with core, cache and built-in router) connected by physical links

Problem with Buffers
- Energy consumption is too high
- Buffers occupy chip area (75% of the NoC)
- Buffers increase design complexity
- Existing work assumes that every router needs a buffer. Is that really necessary?
- How can we save this energy? Can we get rid of buffers?!

Bufferless Routing
- "Hot potato" routing: a packet is too hot to keep (buffer)
- Always route a packet; links act as buffers
- Don't care about the shortest distance → keep the packet moving
- Misroute if the right output port isn't available (= deflection)

Main Advantages/Disadvantages: small traffic volumes
- Number of collisions is low
- Rerouting is low
- Increase of bandwidth
- Decrease of latency
- Decrease of buffer energy

Main Advantages/Disadvantages: large traffic volumes
- Collisions occur
- Packets get rerouted
- Reduction of bandwidth
- Increase of latency
- Increase of link/router energy

Basic Algorithm: Flit-Level Routing (I)
- Flits are flow control units: large network packets are broken into smaller pieces
- Each flit can take a different path but is always forwarded
- Different paths are taken because the same output ports are not always free
- Figure labels: Flit 1, Flit 2, Flit 3

Basic Algorithm: Flit-Level Routing (II)
- If no productive output port is available, send (deflect) the flit to a non-productive output port
- Every router has at least as many output ports as input ports (number of input ports ≤ number of output ports)
- Routers form a connected graph
- Flit-Ranking: oldest first; avoids livelocks
- Port-Prioritization: different for every flit; finds the best output ports
- Example (bufferless router with input ports 1 and 2 and output ports 1 and 2): Flit A (age 10, dest. 1) and Flit B (age 5, dest. 1) arrive together. Oldest-first ranking gives Flit A priority 1 and Flit B priority 2; both have the port preference (1, 2). Flit A takes output port 1 and Flit B is deflected to output port 2.
- Limitations:
  - Each flit needs a larger header
  - The receiver buffer must be larger because flits take different paths
  - Extra logic is needed for reassembly at the destination
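The two steps on this slide, flit-ranking and port-prioritization, can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions, not the router logic from the paper: the `Flit` class, the `route_flits` function and the representation of ports as plain integers are all hypothetical names introduced here, and age is the only ranking criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Flit:
    name: str
    age: int  # larger value = older flit (assumed ranking key)
    productive_ports: list = field(default_factory=list)  # best port first


def route_flits(flits, output_ports):
    """Assign every incoming flit to a distinct output port in one cycle.

    Returns {flit name: (output port, was_deflected)}.
    """
    if len(flits) > len(output_ports):
        raise ValueError("need at least as many output ports as incoming flits")
    free = list(output_ports)
    result = {}
    # Flit-ranking: the oldest flit chooses first.
    for flit in sorted(flits, key=lambda f: f.age, reverse=True):
        # Port-prioritization: best productive port that is still free.
        productive = [p for p in flit.productive_ports if p in free]
        # No productive port left: deflect to any free port.
        port = productive[0] if productive else free[0]
        free.remove(port)
        result[flit.name] = (port, not productive)
    return result


# The example from the slide: both flits want output port 1.
flits = [Flit("A", age=10, productive_ports=[1]),
         Flit("B", age=5, productive_ports=[1])]
print(route_flits(flits, output_ports=[1, 2]))
# {'A': (1, False), 'B': (2, True)}
```

Because there are never more flits than output ports, the loop always finds a free port, which is the reason a flit never has to be stored in the router.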

Optimized Version: BLESS Wormhole Routing
- Only the first flit of each packet (worm) contains the header information
- All other flits of the packet follow the leading flit
- On a deflection, the head of the worm decides where to go and the rest of the worm follows it

Wormhole Routing: Injection Problem
- Injection problem: when is it safe to inject a new worm?
- Inject whenever not all input ports are busy
- If all input ports become busy while the worm is being inserted → truncate the worm
- Figure labels: bufferless router with input ports 1 and 2 and output ports 1 and 2; Worm A, Worm B, Worm B′, Worm C; "Has to wait until an input port becomes available"
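The injection rule above can be illustrated with a short Python sketch. It assumes one flit is injected per cycle and that the number of busy input ports in each cycle is known in advance; `inject_worm` and its parameters are hypothetical names chosen for this example, and how the truncated remainder is later re-injected is not modeled.

```python
def inject_worm(worm_flits, busy_inputs_per_cycle, num_input_ports):
    """Inject a worm one flit per cycle while at least one input port is idle.

    Returns (injected_flits, remaining_flits). The remaining flits are the
    truncated part of the worm that has to wait for a later injection.
    """
    injected = []
    for cycle, flit in enumerate(worm_flits):
        # Assume no incoming traffic once the given schedule runs out.
        busy = busy_inputs_per_cycle[cycle] if cycle < len(busy_inputs_per_cycle) else 0
        if busy >= num_input_ports:
            # All input ports are busy: stop here and truncate the worm.
            return injected, worm_flits[cycle:]
        injected.append(flit)
    return injected, []


done, waiting = inject_worm(["head", "body1", "body2", "tail"],
                            busy_inputs_per_cycle=[1, 2, 4, 0],
                            num_input_ports=4)
print(done, waiting)
# ['head', 'body1'] ['body2', 'tail']
```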

Wormhole Routing: Livelock Problem
- Livelock problem: packets can be deflected forever
- Head flit: a new output port must be allocated
  - Unallocated, productive port → worm makes progress
  - Allocated, productive port → other worm gets truncated
  - Unallocated, non-productive port → worm is deflected
  - Allocated, non-productive port → other worm gets truncated
- Non-head flit: routed to the same output port as the head flit
- Other worms only get truncated if the current worm has a higher priority
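The four head-flit cases can be read as a preference order, which the following Python sketch makes explicit. It is a simplified reading of the slide, not the paper's allocator: `allocate_head_flit`, the outcome strings and the `owner_rank` mapping are hypothetical, and a smaller rank number is assumed to mean higher priority (as in "Priority 1" on the earlier slide).

```python
def allocate_head_flit(productive, ports, owner_rank, my_rank):
    """Choose an output port for a head flit.

    productive: productive output ports for this worm
    ports:      all output ports of the router
    owner_rank: {port: rank of the worm currently allocated to it}
    my_rank:    rank of the arriving worm (smaller = higher priority)
    Returns (port, outcome).
    """
    def unallocated(p):
        return p not in owner_rank

    def can_truncate(p):
        # Another worm is only truncated if we outrank it.
        return p in owner_rank and my_rank < owner_rank[p]

    non_productive = [p for p in ports if p not in productive]
    cases = [
        ("progress", [p for p in productive if unallocated(p)]),
        ("truncate_other_and_progress", [p for p in productive if can_truncate(p)]),
        ("deflect", [p for p in non_productive if unallocated(p)]),
        ("truncate_other_and_deflect", [p for p in non_productive if can_truncate(p)]),
    ]
    for outcome, options in cases:
        if options:
            return options[0], outcome
    return None, "no_port"


ports = ["N", "E", "S", "W"]
print(allocate_head_flit(["E"], ports, {}, my_rank=1))        # ('E', 'progress')
print(allocate_head_flit(["E"], ports, {"E": 2}, my_rank=1))  # ('E', 'truncate_other_and_progress')
print(allocate_head_flit(["E"], ports, {"E": 1}, my_rank=2))  # ('N', 'deflect')
```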

Combined Version: BLESS with Buffers
- Use when good performance at high bandwidth rates is desired
- Add buffers to FLIT-BLESS or WORM-BLESS
- Buffers reduce the probability of misrouting: if a productive port isn't available → buffer the flit
- Whenever an input buffer is full, the oldest flit in the buffer becomes the "must-schedule flit"
- The must-schedule flit must be sent out in the next cycle
- This mechanism avoids buffer overflow
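A minimal Python sketch of the must-schedule rule follows. It assumes a FIFO input buffer in which the oldest flit is the one that arrived first; the `InputBuffer` class and its method names are hypothetical and only illustrate why a full buffer can never overflow.

```python
from collections import deque


class InputBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.flits = deque()  # oldest flit sits at the left end

    def accept(self, flit):
        if len(self.flits) >= self.capacity:
            raise OverflowError("buffer overflow")
        self.flits.append(flit)

    def must_schedule(self):
        """Return the oldest flit if the buffer is full, else None."""
        return self.flits[0] if len(self.flits) >= self.capacity else None

    def release(self, productive_port_free):
        """Decide what leaves the buffer this cycle: (flit, outcome)."""
        if self.must_schedule() is not None:
            # A full buffer forces its oldest flit out, deflected if necessary.
            flit = self.flits.popleft()
            return flit, "productive" if productive_port_free else "deflected"
        if self.flits and productive_port_free:
            return self.flits.popleft(), "productive"
        # Otherwise keep waiting in the buffer instead of misrouting.
        return None, "buffered"


buf = InputBuffer(capacity=2)
buf.accept("f1")
buf.accept("f2")
print(buf.release(productive_port_free=False))  # ('f1', 'deflected')
print(buf.release(productive_port_free=False))  # (None, 'buffered')
print(buf.release(productive_port_free=True))   # ('f2', 'productive')
```

Since a full buffer always frees one slot in the next cycle, there is room for the flit that may arrive on that input port.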

Benefits
- No buffers: simpler and cheaper chip design, area savings
- Absence of deadlocks: number of input ports ≤ number of output ports → a packet will always leave the router
- Absence of livelocks: oldest-first flit-ranking and port prioritization
- Router latency reduction:
  - Router with input buffers (latency = 3): Buffer Write / Route Computation, Channel/Switch Allocation, Switch/Link Traversal
  - BLESS bufferless router (latency = 2): Route Computation, Switch/Link Traversal

Limitations
- At high network utilization, deflections happen more often, which causes unnecessary link and router traversals
- This reduces network throughput
- It increases latency
- It increases link/routing energy consumption

Evaluation Methodology
- Cycle-accurate interconnection network simulator
- 5 input/output ports
- 1 packet = 4 flits
- Request generation from real-world applications:
  - Matlab (most network-intensive)
  - Milc (physics benchmark)
  - H264ref (video encoder benchmark)
- Compared algorithms:
  - BLESS: Flit-Level Routing and Wormhole Routing
  - Baseline routing: 3 different algorithms
- Criteria:
  - Average packet delivery
  - Maximum packet delivery
  - Throughput
  - Buffering requirements at the receiver
  - Energy consumption

Results for homogeneous Case: Matlab (I)
- Performance decrease without buffers is relatively small
- Injection rates of real applications are relatively low → not many L1 misses
- Figure legend: router algorithms with buffers; FLIT and FLIT-WORM bufferless routing with 2-cycle latency; FLIT and FLIT-WORM bufferless routing with 1-cycle latency
- Setup: 4x4 mesh network, 8 processors/instances

Results for homogeneous Case: Matlab (II)
- BLESS significantly reduces energy consumption
- Link/router energy is slightly higher due to deflections

Summary
- Problem: In the on-chip networks of systems on chip, packet buffers account for most of the network's energy consumption and physical area. The buffers are used when routing packets between the components on the chip.
- Proposal: We use three new routing algorithms, "FLIT-Level Routing", "BLESS Wormhole Routing" and "BLESS with Buffers", which aim to eliminate or reduce the need for buffers by deflecting packets inside the network.
- Results:
  - Most of the time, buffers are not needed in the NoC
  - Average performance decreases by only 0.5%
  - Worst-case performance decreases by 3.2%
  - Average network energy consumption decreases by 39.4%
  - Area savings of 60%
- → BLESS achieves significant energy savings at a low performance loss

Strengths
- The evaluation does not rely only on computer-generated workloads:
  - Video encoder benchmark
  - 3D fluid benchmark
- Had an impact on current bufferless research: cited 377 times in other papers (last citation 29 October 2018)
- First paper that proposes a variety of bufferless algorithms
- Buffers are everywhere: the idea can be transferred to other areas
- Early evaluation of a problem that is more important than ever → smartphones and Internet of Things
- Good foundation for further research

Weaknesses
- No explanation of why certain programs were chosen for the evaluation:
  - Matlab on an SoC is not typical
  - What does Matlab compute?
- The paper always speaks of bufferless routing, but more buffers are needed at the receiver side → how to reassemble packets with the receiver buffer is not covered
- Some critical features are not implemented:
  - Manual priorities for different packets
  - Congestion control: "Next generation on-chip networks: what kind of congestion control do we need?", co-authored by Onur Mutlu, 2010
- Assumes no faulty routers or links

Takeaways
- Very important topic, especially today
- Research on bufferless routing is still in progress: the latest research paper was published on 11 June 2018, "High-performance 3D NoC bufferless router with approximate priority comparison" by Konstantinos Tatas
- BLESS is going in the right direction but lacks some needed functions
- Built a foundation for further research

Thoughts, Ideas and Discussion starters Are there any questions?

Thoughts, Ideas and Discussion starters
- In what other areas could bufferless routing be used?
- "Deflection routing in IP optical networks", Guido Maier, 2011
- Optical data transfer is much faster than buffers
- Deflection routing is an alternative in an optical network without using buffers
- Today, optical networks use only a small fraction of their large capacity because switching, processing and storage technologies aren't fast enough

Thoughts, Ideas and Discussion starters
- Other ideas to eliminate buffers without deflections?
- "Scarab: A single cycle adaptive routing and bufferless network", M. Hayenga, MICRO-42, 2009
- Drop-based bufferless routing: just drop packets when the router is congested
- Establishes a circuit-switched backend for requesting retransmits
- Requires extra links for the retransmit requests

Thoughts, Ideas and Discussion starters
- Other ideas to eliminate buffers without deflections?
- Ring-based interconnect
- No routing is needed at all: just forward the packet inside the ring until it reaches the desired node
- Not suitable for large networks
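A small Python calculation shows why a ring scales poorly. It assumes a unidirectional ring whose nodes are numbered 0 to N-1 and uniformly distributed destinations; `ring_hops` and `average_hops` are hypothetical helper names for this illustration only.

```python
def ring_hops(src, dst, num_nodes):
    """Hops from src to dst when packets are only forwarded in one direction."""
    return (dst - src) % num_nodes


def average_hops(num_nodes):
    """Average hop count from node 0 to every other node in the ring."""
    total = sum(ring_hops(0, dst, num_nodes) for dst in range(1, num_nodes))
    return total / (num_nodes - 1)


print(ring_hops(6, 1, 8))  # 3
print(average_hops(8))     # 4.0
print(average_hops(64))    # 32.0
```

Under these assumptions the average distance is half the number of nodes, so latency grows linearly with the size of the ring.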

Thoughts, Ideas and Discussion starters
- Is switching between bufferless routing and routing with buffers a good idea (= hybrid routing)?
- "Adaptive flow control for robust performance and energy", Jafri et al., MICRO-43, 2010
- Energy savings but no area savings
- Switches between bufferless deflection routing and buffered operation depending on the needed bandwidth

A Case for Bufferless Routing in On-Chip Networks Onur Mutlu Carnegie Mellon University Thomas Moscibroda Microsoft Research 36th International Symposium on Computer Architecture June 22, 2009 Austin, Texas, USA Presented by Jonas Bokstaller 14 November 2018