Router Architecture — SoC Architecture, December 21, 2015

Presentation transcript:

Router Architecture

Network-on-Chip: Information in the form of packets is routed via channels and switches from one terminal node to another. The interface between the interconnection network and the terminals (clients) is called the network interface. [Figure: mesh of switches (S) and terminal nodes (T)]

Router Architecture: The discussion concentrates on a typical virtual-channel router. Modern routers are pipelined and work at the flit level. Head flits proceed through pipeline stages that perform routing and virtual-channel allocation; all flits pass through the switch allocation and switch traversal stages. Most routers use credits to allocate buffer space.

A typical virtual-channel router: A router's functional blocks can be divided into a datapath, which handles storage and movement of a packet's payload (input buffers, switch, output buffers), and a control plane, which coordinates the movement of packets through the resources of the datapath (route computation, VC allocator, switch allocator).

A typical virtual-channel router: [Figure: control steps — routing, VC allocation, output port allocation, switch allocation, VC deallocation, switching]

A typical virtual-channel router: The input unit contains a set of flit buffers and maintains the state for each virtual channel: G = global state, R = route, O = output VC, P = pointers, C = credits.

Virtual channel state fields (Input)

A typical virtual-channel router: During route computation, the output port for the packet is determined. The packet then requests an output virtual channel from the virtual-channel allocator.

A typical virtual-channel router: Flits are forwarded via the virtual channel by allocating a time slot on the switch and output channel using the switch allocator. Flits are forwarded to the appropriate output during this time slot, and the output unit forwards them to the next router in the packet's path.

Virtual channel state fields (Output)

Packet Rate and Flit Rate: The control of the router operates at two distinct frequencies. At the packet rate (performed once per packet): route computation and virtual-channel allocation. At the flit rate (performed once per flit): switch allocation and pointer/credit count updates.

The Router Pipeline: A typical router pipeline includes the following stages: RC (routing computation), VA (virtual-channel allocation), SA (switch allocation), and ST (switch traversal). [Figure: pipeline diagram, no pipeline stalls]

The Router Pipeline, cycle 0: The head flit arrives and the packet is directed to a virtual channel of the input port (G = I).

The Router Pipeline, cycle 1: Routing computation begins and the virtual channel state changes to routing (G = R). The head flit enters the RC stage, and the first body flit arrives at the router.

The Router Pipeline, cycle 2 (virtual-channel allocation): The route field (R) of the virtual channel is updated and the virtual channel state is set to "waiting for output virtual channel" (G = V). The head flit enters the VA stage, the first body flit enters the RC stage, and the second body flit arrives at the router.

The Router Pipeline, cycle 2 (virtual-channel allocation, cont.): The result of the routing computation is input to the virtual-channel allocator. If allocation succeeds, the allocator assigns a single output virtual channel and the state of the virtual channel is set to active (G = A).

The Router Pipeline, cycle 3 (switch allocation): All further processing is done on a per-flit basis. The head flit enters the SA stage. Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC.

The Router Pipeline, cycle 3 (switch allocation, cont.): If allocation succeeds, the pointer field is updated and the credit field is decremented.

The Router Pipeline, cycle 4 (switch traversal): The head flit traverses the switch. In cycle 5, the head flit starts traversing the channel to the next router.

The Router Pipeline, cycle 7: The tail flit traverses the switch and the output VC is set to idle. The input VC is set to idle (G = I) if the buffer is empty, or to routing (G = R) if another head flit is already in the buffer.

The Router Pipeline: Only head flits enter the RC and VA stages. Body and tail flits are stored in the flit buffers until they can enter the SA stage.
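The cycle-by-cycle walkthrough above can be sketched in a few lines of Python. This is an illustrative model (function and stage names are mine, not from the slides): only the head flit occupies RC and VA, while body/tail flits wait in the input buffer and join at SA, one cycle behind their predecessor, assuming no stalls.

```python
# Stall-free schedule of the canonical 4-stage VC router pipeline
# (RC, VA, SA, ST) for a packet of `num_flits` flits; the head flit
# arrives in cycle 0 and one further flit arrives per cycle.

def pipeline_schedule(num_flits):
    """Return {flit_index: [(cycle, stage), ...]}."""
    schedule = {}
    # Head flit: RC in cycle 1, VA in cycle 2, SA in cycle 3, ST in cycle 4.
    schedule[0] = [(1, "RC"), (2, "VA"), (3, "SA"), (4, "ST")]
    for i in range(1, num_flits):
        # Flit i skips RC/VA and wins SA one cycle after flit i-1.
        sa_cycle = 3 + i
        schedule[i] = [(sa_cycle, "SA"), (sa_cycle + 1, "ST")]
    return schedule

sched = pipeline_schedule(4)   # head, two body flits, tail
# The tail flit (index 3) traverses the switch in cycle 7, matching
# the slide "Cycle 7: Tail traverses the switch".
```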

Pipeline Stalls: Pipeline stalls can be divided into packet stalls and flit stalls. Packet stalls occur when the virtual channel cannot advance to its R, V, or A state. Flit stalls occur when a virtual channel is in the active state but a flit cannot successfully complete switch allocation, due to a lack of flits, a lack of credits, or losing arbitration for the switch time slot.

Example of a packet stall (virtual-channel allocation stall): The head flit of packet A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel.

Example of a flit stall (switch allocation stall): The second body flit fails to allocate the requested connection in cycle 5.

Example of a flit stall (buffer empty stall): Body flit 2 is delayed by three cycles. However, since it does not have to enter the RC and VA stages, the output is only delayed by one cycle.

Credits: A buffer is allocated in the SA stage of the upstream (transmitting) node. To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node. When the credit reaches the input unit of the upstream node, the buffer can be reused.
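A minimal sketch of this credit mechanism (class and method names are illustrative, not from the slides): the upstream side may only send a flit while it holds a credit, and the downstream side returns a credit when a flit departs its SA stage and frees the buffer.

```python
# Credit-based flow control across one channel between two routers.

class CreditChannel:
    def __init__(self, num_buffers):
        self.credits = num_buffers   # free downstream buffers, as seen upstream
        self.downstream_buf = []     # flits held in the downstream input unit

    def send_flit(self, flit):
        if self.credits == 0:
            return False             # credit stall: no buffer guaranteed free
        self.credits -= 1            # buffer allocated at the upstream SA stage
        self.downstream_buf.append(flit)
        return True

    def downstream_departs(self):
        flit = self.downstream_buf.pop(0)  # flit wins SA downstream, buffer freed
        self.credits += 1                  # credit returned on the reverse channel
        return flit

ch = CreditChannel(num_buffers=2)
assert ch.send_flit("head") and ch.send_flit("body")
assert not ch.send_flit("tail")      # stalled: both downstream buffers occupied
ch.downstream_departs()              # credit comes back upstream
assert ch.send_flit("tail")          # now the tail can be sent
```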

Credits: The credit loop can be viewed as a token that starts at the SA stage of the upstream node, travels downstream with the flit, reaches the SA stage of the downstream node, and returns upstream as a credit.

Credit Loop Latency: The credit loop latency t_crt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth. It is given by t_crt = t_f + t_c + 2*T_w + 1, where t_f is the flit pipeline delay, t_c is the credit pipeline delay, and T_w is the one-way wire delay.

Credit Loop Latency: If the number of buffers available per virtual channel is F, the duty factor of the channel is d = min(1, F / t_crt). The duty factor is 100% as long as there are enough flit buffers to cover the round-trip latency.

Credit Stall: [Figure: timing diagram; white = upstream pipeline stages, grey = downstream pipeline stages.] With t_f = 4, t_c = 2, and T_w = 2, we get t_crt = 11, so a virtual-channel router with only 4 flit buffers incurs credit stalls.
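The numbers in this example can be checked directly against the formulas above (this is a transcription of the slides' equations, nothing more):

```python
# t_crt = t_f + t_c + 2*T_w + 1 and d = min(1, F / t_crt), per the slides.

def credit_loop_latency(t_f, t_c, T_w):
    return t_f + t_c + 2 * T_w + 1

def duty_factor(F, t_crt):
    return min(1.0, F / t_crt)

t_crt = credit_loop_latency(t_f=4, t_c=2, T_w=2)   # 11 flit times
d = duty_factor(F=4, t_crt=t_crt)                  # 4/11: well under full bandwidth
```

With only 4 buffers the channel runs at roughly 36% duty; 11 or more buffers per VC would cover the round trip and restore d = 1.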

Flit and Credit Encoding: (a) Flits and credits are sent over separate lines of possibly different widths. (b) Flits and credits are transported over the same line, either by including credits in flits or by multiplexing flits and credits at the phit level.

Network Interface. Slides are adapted from previous slides by Ingo Sander and Axel Jantsch.

Network-on-Chip: Information in the form of packets is routed via channels and switches from one terminal node to another. The interface between the interconnection network and the terminals (clients) is called the network interface. [Figure: mesh of switches (S) and terminal nodes (T)]


Network Interface: Different terminals with different interfaces must be connected to the network. The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol. [Figure: terminal node (resource) — network interface — switch]

Network Interface: The network interface plays an important role in a network-on-chip. It shall translate between the terminal protocol and the protocol of the network, enable the client to communicate at the speed of the network, not further reduce the available bandwidth of the network, and not increase the latency imposed by the network. A poorly designed network interface is a bottleneck and can increase latency considerably.

Network Interfaces: For message passing, a symmetric processor-network interface; for shared memory, asymmetric load/store interfaces (processor-network interface and memory-network interface). A packet admission/ejection (line-fabric) interface may reside in a switch or router and uses input queuing and output queuing.

Basic Functionality of Network Interfaces: Packetization/depacketization — the network delivers packets; it knows nothing about messages and transactions, so the sender side packetizes (messages to packets) and the receiver side depacketizes (packets to messages). Multiplexing/demultiplexing — scheduling packets to be sent and received when multiple threads are running; the sender multiplexes and the receiver demultiplexes. Re-ordering — a network service may not guarantee ordering. End-to-end flow control.
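The packetization and re-ordering duties above can be sketched as follows. The packet format (dict with message id, sequence number, last-flag) is purely illustrative, an assumption of this sketch rather than anything the slides prescribe:

```python
# Sender-side packetization and receiver-side depacketization with
# re-ordering: each packet carries a sequence number so the receiver
# can reassemble the message even if the network delivers out of order.

def packetize(message: bytes, payload_size: int, msg_id: int):
    num_packets = (len(message) + payload_size - 1) // payload_size
    return [
        {"msg": msg_id,
         "seq": i,
         "last": i == num_packets - 1,
         "payload": message[i * payload_size:(i + 1) * payload_size]}
        for i in range(num_packets)
    ]

def depacketize(packets):
    # Re-ordering: sort by sequence number before reassembly.
    return b"".join(p["payload"] for p in sorted(packets, key=lambda p: p["seq"]))

pkts = packetize(b"hello, network-on-chip!", payload_size=8, msg_id=7)
shuffled = pkts[::-1]                    # simulate out-of-order delivery
reassembled = depacketize(shuffled)      # original message restored
```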

Network Interfaces for message passing: two-register interface, register-mapped interface, descriptor-based interface, message reception.

Two-Register Interface: For sending, the processor writes to a specific Net-out register; for receiving, it reads a specific Net-in register. Pro: efficient for short messages. Cons: inefficient for long messages, the processor acts as the DMA controller, and it is not safe, because for longer messages the processor may block network resources forever. [Figure: register file with Net-out and Net-in registers connected to the network]

Descriptor-Based Interface: The processor composes a message in a set of dedicated message descriptor registers. Each descriptor contains an immediate value, a reference to a processor register, or a reference to a block of memory. A co-processor steps through the descriptors and composes the message. This is safe because the network is protected from the processor's software. [Figure: descriptor registers (immediate, register reference, address/length, END) feeding a send unit]
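A sketch of the co-processor's descriptor walk (the descriptor encoding, register names, and memory contents are all hypothetical, chosen only to make the three descriptor kinds from the slide concrete):

```python
# A co-processor composes an outgoing message by walking descriptors:
# immediates, references to processor registers, or (addr, length)
# blocks of memory, terminated by an END descriptor.

regs = {"R1": 0xAB, "R2": 0xCD}      # illustrative processor registers
memory = list(range(100, 110))       # illustrative local memory

def compose_message(descriptors):
    message = []
    for d in descriptors:
        kind = d[0]
        if kind == "imm":            # immediate value goes in directly
            message.append(d[1])
        elif kind == "reg":          # fetch from a processor register
            message.append(regs[d[1]])
        elif kind == "mem":          # copy a block: ("mem", addr, length)
            _, addr, length = d
            message.extend(memory[addr:addr + length])
        elif kind == "end":          # END descriptor terminates the walk
            break
    return message

msg = compose_message([("imm", 42), ("reg", "R1"), ("mem", 2, 3), ("end",)])
```

Because only the co-processor touches the network, a buggy or malicious program can corrupt its own message at worst, which is the safety property the slide points out.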

Receiving Messages: A co-processor or a dedicated thread is triggered upon reception of an incoming message. It unpacks the message, stores it in local memory, and informs the receiving task via an interrupt or a status register update.

Shared Memory Interfaces: The interconnection network is used to transmit memory read/write transactions between processors and memories. We will further discuss the processor-network interface and the memory-network interface.

Processor-Network Interface: Requests are stored in the request register and tagged so that replies can be matched to requests. In case of a cache miss, requests are stored in an MSHR (miss status holding register).

Processor-Network Interface: An uncacheable read request results in a pending read. After the message is formed and transmitted, the status changes to read requested. When the network returns the reply, the status changes to read complete. Completed MSHRs are forwarded to the reply register, and the status changes back to idle.
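The status sequence for an uncacheable read can be written as a small state machine. The state names follow the slide; the class and method names are illustrative:

```python
# Per-request state an MSHR walks through for an uncacheable read:
# idle -> read_pending -> read_requested -> read_complete -> idle.

class MSHR:
    def __init__(self):
        self.state = "idle"

    def accept_request(self):
        assert self.state == "idle"
        self.state = "read_pending"      # request latched, message not yet sent

    def message_sent(self):
        assert self.state == "read_pending"
        self.state = "read_requested"    # message formed and transmitted

    def reply_received(self):
        assert self.state == "read_requested"
        self.state = "read_complete"     # the network returned the reply

    def forward_to_reply_register(self):
        assert self.state == "read_complete"
        self.state = "idle"              # MSHR is free for the next miss

m = MSHR()
m.accept_request()
m.message_sent()
m.reply_received()
m.forward_to_reply_register()
```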

Processor-Network Interface: Cache coherence protocols change the operation of the processor-network interface: (1) complete cache lines are loaded into the cache; (2) the protocol requires a larger vocabulary, e.g. exclusive read requests and invalidation/updating of cache lines; (3) the cache coherence protocol requires the interface to send messages and update state in response to received messages.

Memory-Network Interface: The interface receives memory request messages and sends replies. Messages received from the network are stored in the TSHR (transaction status holding register).

Memory-Network Interface: A request queue holds request messages when all TSHRs are busy. The TSHR tracks messages in the same way as the MSHR. The bank control and message transmit units monitor changes in the TSHR.

Memory-Network Interface: A read request initializes a TSHR with status read pending. The subsequent memory access changes the status to bank activated. Two cycles before the first word is returned from the memory bank, the status is changed to read complete. The message transmit unit formats the reply message and injects it into the network, and the TSHR entry is marked idle. Requests can be handled out of order.

Memory-Network Interface: Cache coherence protocols can be implemented with this structure; however, the TSHR must be extended.

Packet Admission/Ejection (Line-Fabric) Interface: The network has higher bandwidth than the input and output lines, but links may be blocked due to congestion, and packets aiming for different destinations arrive at the same input port. Queues are needed to store packets that cannot enter the network because of congestion, or that cannot enter the terminal.

Packet Admission/Ejection Interface: Why parallel queues rather than a single FIFO? If there are traffic classes with different priorities, there should be a queue for every traffic class, so that high-priority traffic is not blocked by low-priority traffic. Parallel queues also alleviate head-of-line blocking and make it possible to implement an admission/ejection control policy based on priority, rate, etc.
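The per-class queueing argument above can be sketched as follows. The strict-priority drain policy and the class numbering are assumptions of this sketch, one of several policies such an interface could implement:

```python
# Parallel admission queues, one per traffic class, so high-priority
# packets are never stuck behind low-priority ones in a shared FIFO.

from collections import deque

class AdmissionInterface:
    def __init__(self, classes):
        self.queues = {c: deque() for c in classes}   # one FIFO per class

    def enqueue(self, packet, traffic_class):
        self.queues[traffic_class].append(packet)

    def next_packet(self):
        # Strict priority: drain higher-numbered classes first.
        for c in sorted(self.queues, reverse=True):
            if self.queues[c]:
                return self.queues[c].popleft()
        return None

ai = AdmissionInterface(classes=[0, 1, 2])
ai.enqueue("bulk", 0)        # low-priority packet arrives first
ai.enqueue("control", 2)     # high-priority packet arrives second
assert ai.next_packet() == "control"   # bypasses the bulk packet
assert ai.next_packet() == "bulk"
```

With a single shared FIFO the "control" packet would have waited behind "bulk"; separate queues remove that head-of-line blocking.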

Summary: Network interfaces bridge processor to processor and processor to memory. Message passing interfaces; shared memory interfaces, complicated by cache coherence. Packet admission and ejection interfaces at the network boundary are also important for using the network better (higher throughput, lower latency).