SMART: A Single-Cycle Reconfigurable NoC for SoC Applications - Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,


1 SMART: A Single-Cycle Reconfigurable NoC for SoC Applications - Jyoti Wadhwani
Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam, Anantha P. Chandrakasan, Li-Shiuan Peh
Department of Electrical Engineering and Computer Science, MIT, Cambridge

2 ECE 284 Spring 2013 Evolution of on-chip systems

3 Challenges with this evolution
Scaling “compute” is possible: Moore’s Law. What about the communication network?

4 More “hops” are bad
At each hop, the router adds latency and power.
At the system level: delayed responses → delayed injection of fresh requests → overall shutdown → increased power budget.

5 Motivation
[Figure labels: 1mm wire segments]
NoCs should deliver low latency and high bandwidth with low power and area overhead.
Wires can be driven to multiple mm within a cycle using repeaters. The number of hops in a cycle depends on the repeater circuit and wire parasitics.
Signaling at low-voltage swing can lower energy consumption and propagation delay.
Wire delay is much shorter than a typical router cycle time, so a flit can traverse multiple hops in a single cycle by bypassing buffering and arbitration at the routers.
Router cycle time = 500ps for a 2GHz clock. Full-swing repeated wire delay is about 100ps/mm, so by bypassing the buffers we can traverse 5mm in 1 clock cycle.
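
The arithmetic on this slide can be checked with a short sketch. It uses only the slide's own numbers (2GHz clock, about 100ps/mm full-swing repeated wire, 1mm per hop). The function names are hypothetical, and the model assumes the whole clock period is available for wire traversal, ignoring crossbar, mux, and setup overheads.

```python
def max_mm_per_cycle(clock_hz: float, wire_delay_ps_per_mm: float) -> float:
    """Distance a signal can cover in one clock period (idealized)."""
    cycle_ps = 1e12 / clock_hz          # 2 GHz -> 500 ps
    return cycle_ps / wire_delay_ps_per_mm


def max_hops_per_cycle(clock_hz: float, wire_delay_ps_per_mm: float,
                       hop_mm: float = 1.0) -> int:
    """Whole hops that fit in one cycle, assuming hop_mm of wire per hop."""
    return int(max_mm_per_cycle(clock_hz, wire_delay_ps_per_mm) // hop_mm)


if __name__ == "__main__":
    print(max_mm_per_cycle(2e9, 100.0))    # distance in mm per cycle
    print(max_hops_per_cycle(2e9, 100.0))  # hops per cycle with 1mm hops
```

With the slide's figures this gives 5mm, i.e. 5 hops of 1mm, per cycle for a full-swing wire.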

6 Approaches to reduce on-chip latency
Application-specific topology reconfiguration is needed to bypass the buffering and arbitration at routers. The topology can be reconfigured to match application-specific communication patterns at:
Design time: requires knowledge of all applications and their communication graphs at design time. Overhead: wiring density to support dedicated links.
Runtime: computation of contention-free routes that allow flits to bypass the queues.
This paper performs online reconfiguration of network routers at runtime, to enable different applications to run on tailored topologies.

7 SMART Link
The voltage at node X is locked to swing near the threshold voltage of INV1x, without a decrease in drive current. The low-swing voltage level is determined by transistor sizes and link wire impedance, so simulations were performed across process corners.

8 SMART Router Microarchitecture
SMART Crossbar: if the MUX is preset to connect the incoming link to the crossbar, the bypass path is enabled. If the MUX is set to connect the input port buffer to the crossbar, the bypass path is disabled. The bypass path is disabled when the same output port is shared by multiple input ports.
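
The preset rule on this slide can be sketched for a single router as follows. This is an illustrative model, not the authors' implementation. The function name, the port names, and the `(input_port, output_port)` representation are all hypothetical. It also assumes that an input port feeding several outputs is buffered if any one of those outputs is contended, which the slide does not state.

```python
from collections import defaultdict


def bypass_presets(flows_through_router):
    """Return {input_port: True if bypass enabled, False if buffered}.

    flows_through_router: iterable of (input_port, output_port) pairs,
    one per flow crossing this router (hypothetical representation).
    """
    inputs_per_output = defaultdict(set)
    for in_port, out_port in flows_through_router:
        inputs_per_output[out_port].add(in_port)

    presets = {}
    for out_port, in_ports in inputs_per_output.items():
        shared = len(in_ports) > 1  # output port claimed by several inputs
        for in_port in in_ports:
            # Assumption: one contended output is enough to force buffering.
            presets[in_port] = presets.get(in_port, True) and not shared
    return presets


if __name__ == "__main__":
    flows = [("West", "East"), ("South", "East"), ("North", "Local")]
    print(bypass_presets(flows))
```

In this example West and South both target the East output, so both are buffered for arbitration, while North keeps its bypass path.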

9 SMART Flow
The green and purple flows do not overlap with each other, so each traverses from its source router to its destination router in a single clock cycle. The red and blue flows overlap, so they need to be stopped at routers 9 and 10 to arbitrate for the shared crossbar ports.
Reverse credit mesh network: keeps track of the free VCs at the endpoint of an arbitrary SMART route. For the blue flow, routers 3, 7 and 11 forward credits from NIC3 to router 10’s East output port. The VC queue of a router keeps track of the VCs at the input port of a router multiple hops away, and not just the neighbor.
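
A rough sketch of how overlapping flows split into single-cycle segments is given below. The routes are hypothetical: the transcript does not list them, so they were chosen only to agree with the slide's mention of routers 9 and 10 and of the blue flow ending via routers 11, 7 and 3. The rule used (a flow stops at both endpoints of any link it shares with another flow) is an assumed simplification that matches this one example, not the paper's algorithm.

```python
from collections import Counter


def links(route):
    """Directed links (u, v) traversed by a route given as router IDs."""
    return list(zip(route, route[1:]))


def stop_routers(flows):
    """Map each flow name to the routers where it must be buffered.

    Assumed rule: a link used by more than one flow is contended, and
    every flow using it stops at both endpoints of that link.
    """
    link_use = Counter(l for route in flows.values() for l in set(links(route)))
    stops = {}
    for name, route in flows.items():
        stops[name] = {r for l in links(route) if link_use[l] > 1 for r in l}
    return stops


def segments(route, stops):
    """Split a route at its stop routers; each piece is one bypass run."""
    segs, current = [], [route[0]]
    for router in route[1:]:
        current.append(router)
        if router in stops and router != route[-1]:
            segs.append(current)
            current = [router]
    segs.append(current)
    return segs


if __name__ == "__main__":
    # Hypothetical routes on a 4x4 mesh with row-major router IDs.
    flows = {"blue": [13, 9, 10, 11, 7, 3], "red": [8, 9, 10, 14], "green": [0, 1, 2]}
    stops = stop_routers(flows)
    for name, route in flows.items():
        print(name, sorted(stops[name]), segments(route, stops[name]))
```

Under these assumptions the green route stays a single segment, while blue and red are cut at routers 9 and 10. Blue's last segment starts at router 10 and runs through 11, 7 and 3, which is consistent with credits being forwarded back to router 10's East output port.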

10 Results
SMART is compared against two baselines:
Mesh: no reconfiguration. Each hop takes 3 cycles in the router and 1 cycle in the link.
Dedicated: 1-cycle dedicated links tailored to each application.
At 2GHz, the SMART NoC can traverse 8mm within a single clock cycle, i.e. 8 hops with 1mm cores. SMART is 1.5 cycles off in performance from the Dedicated baseline when one core acts as a source and another acts as a sink for most of the flows.
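
To make the baseline numbers concrete, here is a minimal latency sketch for a single contention-free path. It uses the slide's figures (3 router cycles plus 1 link cycle per Mesh hop, up to 8 hops per cycle for SMART). The function names are hypothetical. The model ignores contention, injection and ejection, and any path longer than 8 hops, so it is not expected to reproduce the measured results.

```python
def mesh_latency_cycles(hops: int, router_cycles: int = 3,
                        link_cycles: int = 1) -> int:
    """Baseline Mesh: every hop pays the router pipeline plus the link."""
    return hops * (router_cycles + link_cycles)


def smart_latency_cycles(hops: int, max_hops_per_cycle: int = 8) -> int:
    """Contention-free SMART path: one cycle if it fits in a single bypass run."""
    if hops < 1:
        raise ValueError("a path needs at least one hop")
    if hops > max_hops_per_cycle:
        # Longer paths need an intermediate stop, which is not modeled here.
        raise ValueError("path exceeds the single-cycle reach")
    return 1


if __name__ == "__main__":
    for hops in (1, 4, 8):
        print(hops, mesh_latency_cycles(hops), smart_latency_cycles(hops))
```

For an 8-hop path this idealized model gives 32 cycles on the Mesh against 1 cycle on a fully bypassed SMART route.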

11 Results
The benefits of SMART are seen more when certain tasks are tied to specific cores, resulting in longer paths. The SMART NoC gives 60% latency savings and 2.2X power savings compared to the Mesh. Power savings are due to bypassing of buffers, low-voltage signaling, and clock gating at the routers.

12 Conclusion
The paper proposes:
an NoC architecture that reconfigures and tailors a generic mesh topology for SoC applications at runtime;
a low-swing clockless repeated link circuit embedded within router crossbars that allows packets to bypass all the way from source to destination core within a single clock cycle.

13 Critiques/Comments
Wire delay does not scale with the shrinking of transistors, unlike gate delay. In multi-mode designs (operating at different voltage levels), and with wire resistance increasing as temperature rises, the repeater circuit requires careful transistor sizing, checked by simulating across all PVT corners (not just process corners).

14 THANK YOU

