Download presentation
Presentation is loading. Please wait.
Published byΚλείτος Ιωάννου Modified over 6 years ago
1
Approaching Ideal NoC Latency with Pre-Configured Routes
George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis Institute of Computer Science (ICS) Foundation for Research & Technology - Hellas (FORTH) P.O.Box 1385, Heraklion, Crete, GR GREECE
2
Introduction Problem: Latency NoCs impose.
Motivation: Latency introduced to every communication pair. Past work: Achieves 1 cycle/hop at 500 MHz. We extend speculation to routing decisions. Goal: Approach buffered wire latency. Fraction of cycle/hop. NoCs also help communication needs from any source to any destination. NoCs significantly affect area, latency, power, fault-tolerance. May 2007 ICS-FORTH, Greece
3
Our Approach 400 ps good scenario; 1 cycle otherwise. 130 nm library
Inverters approximately every 2 mm. 130 nm implementation library. May 2007 ICS-FORTH, Greece
4
Preferred Paths Each output has one preferred input.
This pref. I/O pair is connected by a single pre-enabled tri-state driver. Pre-enabling is crucial: 200 ps pre-enabled mux; 500 ps otherwise. Later check if flits correctly forwarded. Thus, preferred paths are formed. Reconfigurable at run-time. Custom routes (shapes) allowed. Numbers with 1 mm wire ahead. Our switche’s tristate drivers are slower because: 1). They are tri-states. 2). They fanout to more bits at next hop. One input may be preferred for many outputs. Therefore paths may split. They may not converge. Safe to prevent out-of-order delivery: 1). Old preferred path idle for 1 clock cycle. 2). New preferred path’s FIFO is empty. 3). No more flits in a packet are expected. May 2007 ICS-FORTH, Greece
5
Switch Architecture - Output
Config. & arbitration logic. Stores pref. path config. & arbitrates. 400 ps 1 cycle Pref. path pre-enabled tri-states. 5 inputs – don’t care why for now. FIFOs are served if preferred path has been idle for 1 clock cycle. 1 clock cycle per hop in non-preferred path. Routing combinational logic is placed at each input: We will see next slide. Determines if a flit needs to be enqueued. (dead? Incorrectly forwarded?) Reasons for architecture: 1). Mad postman operation. Dead flits not in FIFOs. 2). We don’t have VCs, therefore crosspoint FIFOs not head of line blocking. 3). One arbitration logic per output makes for simpler logic and shorter critical paths. Routing logic tri-state. Input FIFOs. Selectable when non-empty, or flit to be enqueued. May 2007 ICS-FORTH, Greece
6
Switch Architecture - Input
Dead flits: Incorrectly eagerly forwarded. Terminated at end of preferred path. Switch resembles a buffered crossbar. Preferred paths correspond to buffered crossbar’s cut-through paths. Input is not connected to the same port’s output. Preferred path bus restrictive in configuration and simultaneous forwards. Decides if flit needs to be enqueued. May 2007 ICS-FORTH, Greece
7
Routing Algorithm Deterministic routing employed.
Non-preferred paths follow XY routing. We slightly modify XY routing to handle preferred paths: Flit correctly eagerly forwarded if it approaches the destination in any axis. Flit considered dead otherwise. Deterministic routing needed: We need to know if the previous hop regarded this switch as the best next hop at the time, or simply eagerly forwarded wrongly. Global configuration knowledge too expensive. Deadlock-free: Preferred paths the only ones not to follow XY. They do not content for resources, except when congestion. In that case, serve them first. Flits may reach destination in preferred paths not in XY routing fashion. May 2007 ICS-FORTH, Greece
8
Routing Characteristics
Flits in preferred paths may not follow XY routing. Duplicate copies of a flit may be delivered. XY routing. Pref. paths. D Out of order if the flits’ path in a packet changes. We disallow it by applying new configuration safely. Safe to prevent out-of-order delivery: 1). Old preferred path idle for 1 clock cycle. 2). New preferred path’s FIFO is empty. 3). No more flits in a packet are expected. S May 2007 ICS-FORTH, Greece
9
NoC Topology – Bar Floorplan
Application: Tiled CPU and RAM blocks. Each switch is 6x6 and serves 4 PEs. Cannot receive at lowest latency from two adjacent RAMs at the same time. Vertical outputs are slightly more congested (address ins and Y traversal). Data to a PE are sent to both. Data from a PE go through this mini 2x1 switch. May 2007 ICS-FORTH, Greece
10
Bar Floorplan Would be 8x12: Vertical links drive address inputs.
2 PE data ports served by 1 switch port. Switch distance in Y: Approx 1.1mm. Minimum A: 170μm (13% overhead) RAM blocks 4096x32 bits, column mux 16. x μm2 May 2007 ICS-FORTH, Greece
11
Cross Floorplan A: 130μm B: 140μm (18% overhead). May 2007
Advantage: Switch distance in Y minimum. Disadvantage: Area efficiency drops in this case. Gain in X axis small. RAM blocks 4096x32 bits, column mux 16. x μm2 May 2007 ICS-FORTH, Greece
12
Layout Results 130 nm implementation library. Typical case.
Pref. path latency: ps. ps (incl. 1mm). 1 cycle/node otherwise. Past work: 1 cycle/node at 500 MHz. Clock frequency 667 MHz Flit width 39 FIFO lines 2 Number of FIFOs 30 Bar area overhead 13% Cross area overhead 18% Number of cells 15 K Number of gates 45 K Total dynamic power 80 mW 400 MHz worst case Unbuffered dummy 1mm wire: 135ps. Pre-configured mux: 200ps. Cross: A: 130μm B: 140μm (18% overhead). Bar: Minimum A: 170μm (13% overhead) Power: Full traffic scenario. Typical case voltage: 1.2V. Total FIFOs: 30 32 payload bits. It’s hard to work fast without sweating. May 2007 ICS-FORTH, Greece
13
Advanced Issues Deadlock & livelock freedom.
Constraints to prevent circle. Keep NoC functional in any case. Out-of-order delivery of flits in the same packet. Apply reconfiguration at a “safe” time. Adaptive routing. May 2007 ICS-FORTH, Greece
14
Future Work Synchronization issues – A flit may arrive at any time.
Impose preferred path constraints. Implement switch asynchronously. Evaluation in complete system. Implement fault-tolerance. Setting that works: Perhaps maximum number of preferred path hops. Asynchronous implement: Easy, but handshaking will delay us. Is handshaking necessary? Seems to be. Perhaps some kind of encoding. Evaluation complete system: Study exactly the benefits. Study a method for choosing preferred paths. Adaptive routing routes around problems (congestion etc). Is beneficial. Synchronization - steps to solve: 1). Find a setting that works – define maximum number of hops. 2). Synchronizers at destinations and switch combinational logic. 3). Use asynchronous FIFOs and make all logic asynchronous. Not hard. May 2007 ICS-FORTH, Greece
15
Conclusion We approach ideal latency.
By pre-enabled tri-state paths. Our NoC is a generalized “mad-postman” [C. R. Jesshope et al, 1989]. Our NoC is easily generalized – topology may need to be changed. Past NoC research can be applied for further optimizations. Latency independent of the clock cycle. May 2007 ICS-FORTH, Greece
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.