CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling.

Presentation transcript:

CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling Zhang

Topics
- Router (cont’d): output states
- Router pipelines and stalls
- Router datapath components: input buffer, switches, output buffer

Router: output virtual channel state fields
- G (global state): I = idle, A = active, C = waiting for credit.
- I (input VC): the input port and virtual channel that are forwarding flits to this output virtual channel.
- C (credit count): the number of free buffers available at the downstream node to hold flits from this virtual channel.
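The three fields above can be sketched as a small record. This is only an illustration: the class and method names are hypothetical, and the transition rules (when the global state moves between I, A, and C) are an assumption made for the sketch, since the slide lists the fields but not the transitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class GlobalState(Enum):
    IDLE = "I"
    ACTIVE = "A"
    WAITING_FOR_CREDIT = "C"


@dataclass
class OutputVCState:
    """State kept per output virtual channel (fields G, I, C from the slide)."""
    global_state: GlobalState = GlobalState.IDLE   # G
    input_vc: Optional[Tuple[int, int]] = None     # I: (input port, input VC)
    credit_count: int = 0                          # C: free downstream buffers

    def allocate(self, input_port: int, input_vc: int) -> None:
        # Assumed rule: an idle output VC is bound to one input VC at VA time.
        if self.global_state is not GlobalState.IDLE:
            raise ValueError("output VC is already held by another packet")
        self.input_vc = (input_port, input_vc)
        self._refresh()

    def send_flit(self) -> None:
        # Assumed rule: forwarding a flit consumes one downstream buffer.
        if self.global_state is not GlobalState.ACTIVE:
            raise ValueError("no credit available or VC not allocated")
        self.credit_count -= 1
        self._refresh()

    def receive_credit(self) -> None:
        # A returning credit means one downstream buffer was freed.
        self.credit_count += 1
        if self.global_state is not GlobalState.IDLE:
            self._refresh()

    def release(self) -> None:
        # Assumed rule: the VC returns to idle once the packet is done with it.
        self.input_vc = None
        self.global_state = GlobalState.IDLE

    def _refresh(self) -> None:
        self.global_state = (GlobalState.ACTIVE if self.credit_count > 0
                             else GlobalState.WAITING_FOR_CREDIT)
```

With two credits, for example, the channel goes active on allocation, drops to waiting-for-credit after two flits, and becomes active again when a credit returns.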

Router pipeline
- Each head flit must proceed through four stages:
  RC: routing computation
  VA: virtual channel allocation
  SA: switch allocation
  ST: switch traversal
- Body and tail flits need only SA and ST.

An example of router pipeline cycles (HF = head flit, B1/B2 = body flits, TF = tail flit)

Cycle  1   2   3   4   5   6   7
HF     RC  VA  SA  ST
B1                 SA  ST
B2                     SA  ST
TF                         SA  ST

Possible stalls in router pipeline
- Packet stalls
  VC busy: the head flit of one packet arrives before the tail flit of the previous packet has completed switch allocation.
  Route: routing is not completed.
  VA: virtual channel allocation is not successful.
- Flit stalls
  Switch busy: switch allocation was attempted but unsuccessful.
  Buffer empty: no flit is available because the input buffer is empty.
  Credit: no credit is available.

VA busy stall example (x = stall cycle; stage sequence per flit)

HF(A):  RC x x VA SA ST
TF(B):  SA ST
B1(A):  SA ST

Virtual channel 0 is busy. Packet B holds it.

Switch busy stall example (x = stall cycle)

Cycle  1   2   3   4   5   6   7   8
HF     RC  VA  SA  ST
B1                 SA  ST
B2                     x   SA  ST
B3                         x   SA  ST

B2 fails to allocate the switch in cycle 5, so B3 behind it is delayed by one cycle as well.

Buffer empty stall example (x = stall cycle)

Cycle  1   2   3   4   5   6   7   8
HF     RC  VA  SA  ST
B1                 SA  ST
B2                     x   SA  ST
B3                             SA  ST

B2 arrives late and introduces a 1-cycle stall.
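The stall diagrams above can be reproduced with a small scheduling sketch. The model is an assumption made for illustration, not something stated on the slides: a flit can be processed starting in its arrival cycle, the head flit spends one cycle each in RC and VA, at most one flit per cycle wins SA, and ST follows SA by one cycle. The function name and its parameters are hypothetical.

```python
def schedule_with_stalls(flits, arrival, sa_failures=None):
    """Return {flit: {stage: cycle}} for one packet on one virtual channel.

    flits:       flit names in order; the first one is the head flit.
    arrival:     first cycle in which each flit is in the input buffer.
    sa_failures: number of unsuccessful switch-allocation attempts per flit.
    """
    sa_failures = sa_failures or {}
    schedule = {}
    prev_sa = 0  # cycle in which the previous flit won switch allocation
    for index, name in enumerate(flits):
        stages = {}
        ready = arrival[name]
        if index == 0:
            # Only the head flit goes through RC and VA.
            stages["RC"] = ready
            stages["VA"] = ready + 1
            ready += 2
        # A flit cannot bid before it has arrived or before its predecessor
        # has been granted the switch; each failed bid costs one more cycle.
        first_try = max(ready, prev_sa + 1)
        sa = first_try + sa_failures.get(name, 0)
        stages["SA"] = sa
        stages["ST"] = sa + 1
        schedule[name] = stages
        prev_sa = sa
    return schedule


flits = ["HF", "B1", "B2", "B3"]
on_time = {"HF": 1, "B1": 2, "B2": 3, "B3": 4}

# Switch busy: B2 loses one switch-allocation attempt (cycle 5).
switch_busy = schedule_with_stalls(flits, on_time, {"B2": 1})

# Buffer empty: B2 and the flit behind it arrive late.
buffer_empty = schedule_with_stalls(flits, {**on_time, "B2": 6, "B3": 7})
```

Under these assumptions, both stalled cases put B2 in SA in cycle 6 and B3 in SA in cycle 7, while a stall-free packet finishes B3's switch traversal in cycle 7.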

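Credit stall example (x = stall cycle; stage sequence per flit; "C of" rows show the credit returned for that flit)

HF:       SA ST W1 W2 RC VA SA ST
B1:       SA ST W1 W2 SA ST
B2:       SA ST W1 W2 SA ST
B3:       SA ST W1 W2 SA ST
C of HF:  CT W1 W2 CU
B4:       x x x x x x x SA ST W1 W2 SA ST
C of B1:  CT W1 W2 CU
B5:       x x x x x x x SA ST W1 W2 SA ST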

Credit stall example
- W1 and W2 are the 2 cycles of time of flight between the two routers.
- A buffer is allocated to the head flit when it is in the upstream SA stage in cycle 1. This buffer cannot be reassigned to another flit until the head flit leaves the downstream SA stage, freeing the buffer, and a credit reflecting the free buffer propagates back to the credit update (CU) stage in cycle 11. Body flit 4 uses this credit to enter the SA stage in cycle 12.
- Credit round-trip delay: $t_{crt} = t_f + t_c + 2T_w + 1$
  $t_f$: flit pipeline delay, which is 4 cycles.
  $t_c$: credit pipeline delay, which is 2 cycles.
  $T_w$: one-way wire delay, which is 2 cycles.
  The total delay is $4 + 2 + 2 \cdot 2 + 1 = 11$ cycles.
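The credit round-trip formula can be checked numerically with a short sketch. The function names are hypothetical. The second helper is an assumption that goes one step beyond the slide: since each buffer can be reused only once per round trip, a virtual channel with F buffers can send at most F flits every t_crt cycles.

```python
def credit_round_trip(t_f: int, t_c: int, t_w: int) -> int:
    """t_crt = t_f + t_c + 2*T_w + 1, all values in cycles."""
    return t_f + t_c + 2 * t_w + 1


def buffer_limited_throughput(num_buffers: int, t_crt: int) -> float:
    """Assumed bound: fraction of channel bandwidth one VC can sustain
    when each of its buffers is reusable once per credit round trip."""
    return min(1.0, num_buffers / t_crt)


t_crt = credit_round_trip(t_f=4, t_c=2, t_w=2)    # values from the slide
share = buffer_limited_throughput(4, t_crt)       # 4 buffers, as in the example
```

With the slide's values this gives a round trip of 11 cycles, which matches body flit 4 entering SA in cycle 12, eleven cycles after the head flit claimed the same buffer in cycle 1.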

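Usage of output virtual channel (x = stall cycle; stage sequence per flit)

Conservative approach:
HF(A):    RC VA SA ST W1 W2 RC VA SA ST
TF(A):    SA ST W1 W2 SA ST
C of TF:  CT W1 W2 CU
HF(B):    RC x x x x x x x x x x x x VA SA ST

Approach for fewer stalls:
C of HF:  CT W1 W2 CU
HF(B):    RC VA SA ST W1 W2 x RC VA SA ST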

Usage of output virtual channel
- The conservative approach is to wait until the downstream flit buffer for the virtual channel is completely empty, as indicated by the arrival of the credit from the tail flit. This avoids creating a dependency between the current packet and a packet still occupying the downstream buffer.
- If that dependency is acceptable, the virtual channel can be reallocated as soon as the tail flit of the previous packet completes the SA stage.

Router datapath components: input buffer
- Central memory: good buffer utilization, but long latency and low bandwidth.
- Separated buffer: inefficient buffer utilization, but good latency and bandwidth.
- Separated buffer for each channel
- Multi-port memory

Router datapath components: switch
- Input speedup is obtained by splitting the input.

Switch
- Splitting the k inputs into sk inputs gives the switch an input speedup of s, which raises its throughput.
- If a switch has both input and output speedup, the throughput can be larger than one.
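The effect of speedup can be illustrated with a simple probabilistic sketch. The model is an assumption, not the formula from the slide: every switch input independently requests one switch output chosen uniformly at random, and each switch output serves one of its requesters per cycle. The function name and parameters are hypothetical.

```python
def switch_throughput(k: int, input_speedup: int = 1, output_speedup: int = 1) -> float:
    """Expected flits delivered per output channel per cycle.

    Assumed model: uniform random requests, one grant per switch output.
    """
    switch_inputs = input_speedup * k
    switch_outputs = output_speedup * k
    # Probability that a given switch output is requested by no input.
    p_idle = (1 - 1 / switch_outputs) ** switch_inputs
    # Each output channel is fed by `output_speedup` switch outputs.
    return output_speedup * (1 - p_idle)


no_speedup = switch_throughput(4)              # 1 - (3/4)**4
input_only = switch_throughput(4, 2)           # 1 - (3/4)**8
both = switch_throughput(4, 2, 2)              # 2 * (1 - (7/8)**8)
```

In this model input speedup alone moves the throughput closer to one, and adding output speedup pushes the switch's delivery rate above one flit per channel per cycle, which is consistent with the claim above. The channel itself still carries at most one flit per cycle.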

Router datapath components: output buffers
- A FIFO buffer 2 to 4 flits long is often sufficient to match the speed between the switch and the channel.