Lecture 23: Router Design

Slides:



Advertisements
Similar presentations
Prof. Natalie Enright Jerger
Advertisements

A Novel 3D Layer-Multiplexed On-Chip Network
Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University.
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control Prof. Natalie Enright Jerger.
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
Miguel Gorgues, Dong Xiang, Jose Flich, Zhigang Yu and Jose Duato Uni. Politecnica de Valencia, Spain School of Software, Tsinghua University, China, Achieving.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
CSE 291-a Interconnection Networks Lecture 12: Deadlock Avoidance (Cont’d) Router February 28, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter.
1 Lecture 16: On-Chip Networks Today: on-chip networks background.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling.
1 Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
On-Chip Networks and Testing
Elastic-Buffer Flow-Control for On-Chip Networks
Networks-on-Chips (NoCs) Basics
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Deadlock CEG 4131 Computer Architecture III Miodrag Bolic.
George Michelogiannakis, Prof. William J. Dally Concurrent architecture & VLSI group Stanford University Elastic Buffer Flow Control for On-chip Networks.
1 Lecture 26: Networks, Storage Topics: router microarchitecture, disks, RAID (Appendix D) Final exam: Monday 30 th Apr 10:30-12:30 Same rules as the midterm.
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
NC2 (No.4) 1 Undeliverable packets & solutions Deadlock: packets are unable to progress –Prevention, avoidance, recovery Livelock: packets cannot reach.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Yu Cai Ken Mai Onur Mutlu
Router Architecture. December 21, 2015SoC Architecture2 Network-on-Chip Information in the form of packets is routed via channels and switches from one.
Lecture 16: Router Design
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Virtual-Channel Flow Control William J. Dally
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Flow Control Ben Abdallah Abderazek The University of Aizu
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
The network-on-chip protocol
Lecture 23: Interconnection Networks
Physical constraints (1/2)
Addressing: Router Design
Interconnection Networks: Flow Control
Lecture 16: On-Chip Networks
NoC Switch: Basic Design Principles &
Chapter 3 Part 3 Switching and Bridging
Lecture 17: NoC Innovations
Mechanics of Flow Control
Lecture: Interconnection Networks
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
CEG 4131 Computer Architecture III Miodrag Bolic
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Lecture: Networks Topics: TM wrap-up, networks.
Lecture: Interconnection Networks
Chapter 3 Part 3 Switching and Bridging
Lecture 25: Interconnection Networks
Presentation transcript:

Lecture 23: Router Design Papers: A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks, ISCA’06, Penn-State ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers, MICRO’06, Penn-State

Router Pipeline Four typical stages: RC routing computation: compute the output channel VA virtual-channel allocation: allocate VC for the head flit SA switch allocation: compete for output physical channel ST switch traversal: transfer data on output physical channel STALL Cycle 1 2 3 4 5 6 7 Head flit Body flit 1 Body flit 2 Tail flit RC VA SA ST RC VA SA SA ST -- -- SA ST -- -- -- SA ST -- -- SA ST -- -- -- SA ST -- -- SA ST -- -- -- SA ST

Flow Control VC allocation: when the tail flit is sent, the router knows that the downstream VC is free (or will soon be); the VC is therefore assigned to the next packet and those flits carry the VCid with them; the two routers need not exchange signals to agree on the VCid Head-of-Line (HoL) blocking: a flit at the head of the queue blocks flits (belonging to a different packet) behind it that could have progressed… example: if a VC holds multiple packets because the upstream node assumed the previous packet was handled (as above) Flow control mechanisms: Store-and-Forward: buffers/channels allocated per packet Cut-through: buffers/channels allocated per packet Wormhole: buffers allocated per flit; channels per packet Virtual channel: buffers/channels allocated per flit

Conventional Router Slide taken from presentation at OCIN’06

The RoCo Router

VC Allocation XY routing is deadlock-free; need a minimum of 8 VCs to allow every possible flit traversal: 2 dx, 2 dy, 2 txy, 1 Injxy, 1 Injyx XY-YX routing needs 2 more VCs to enable deadlock freedom Adaptive routing needs 12 VCs Additional constraints on VCs may lower performance

RoCo Router Key features: Early ejection mechanism for flits destined for the PE (saves 2 cycles since they don’t have to go through SA and xbar stages) Flits are steered to the appropriate crossbar thanks to routing info computed in previous stage – enables use of 2 2x2 crossbars instead of 1 5x5 crossbar Results show much lower contention probability for RoCo (?!) Need fewer and smaller arbiters: 2x2 xbar arbiter algorithm: (mirroring) Generic case: for each input port, one arbiter selects the winner for each output port, an arbiter selects the winner RoCo: for each input port, two arbiters select two winners for each 2x2 xbar, one arbiter selects the winner for one port and the outcome is mirrored on the 2nd port

Results

ViChaR Router buffers are a bottleneck: consume 64% of router leakage power consume up to 46% (54%) of total network power (area) high buffer depth (buffers per VC) prevents a packet from holding resources at multiple routers large number of VCs helps reduce contention under high load Primary contribution: instead of maintaining k buffers for each of the v virtual channels, maintain a unified storage of vk buffers and allow the number of VCs to dynamically vary between v and vk (buffer depth of k to 1)

Examples of Buffer Inefficiencies

Proposed Architecture

Unified Buffer Design What does it take to design such a router?

Unified Buffer Design A table to maintain the buffer entries for each VC Pointers to the head and tail of each VC A list of free buffer entries; a list of free VCs (some VCs are used as escape routes to avoid deadlock) The VCs are allocated in the upstream router – hence, when a VC is freed at a router, the upstream router is informed (this is not done in a conventional router) (process similar to credit flow to estimate buffer occupancy) Arbitration mechanism so packets can compete for the next channel

Arbitration

Area/Power

Results Salient results: With 16 buffers per input port, ViChaR out-performs the generic router by ~25%, with a 2% power increase With 8 buffers, ViChaR matches the performance of a 16-buffer generic router, yielding area/power savings of 30%/34%

Title Bullet