SMART: A Single-Cycle Reconfigurable NoC for SoC Applications
Presented by Jyoti Wadhwani
Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam, Anantha P. Chandrakasan, Li-Shiuan Peh
Department of Electrical Engineering and Computer Science, MIT, Cambridge

ECE 284 Spring
Evolution of on-chip systems

Challenges with this evolution
Scaling "compute" is possible, thanks to Moore's Law. What about the communication network?

More "hops" are bad
At each hop, the router adds latency and consumes power. At the system level, delayed responses delay the injection of fresh requests, which slows down the overall system and increases the power budget.

Motivation
NoCs should deliver low latency and high bandwidth. Wire delay is much shorter than a typical router cycle time: with repeaters, wires can be driven across multiple millimeters within a single cycle. The number of hops that fit in a cycle depends on the repeater circuit and the wire parasitics, and signaling at a low voltage swing can lower both energy consumption and propagation delay. A flit can therefore traverse multiple hops in a single cycle by bypassing the buffering and arbitration at the routers, with low power and area overhead. For example, a 2 GHz clock gives a router cycle time of 500 ps; with a full-swing repeated-wire delay of about 100 ps/mm, bypassing the buffers lets a flit traverse 5 mm in one clock cycle.
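The cycle-time arithmetic above can be sketched as a back-of-the-envelope model. It assumes only wire delay matters (bypass MUX, crossbar, and latch setup delays are ignored) and uses a hypothetical 1 mm router spacing as the default; the function names are illustrative. The last helper inverts the calculation: fitting 8 hops of 1 mm into one 2 GHz cycle, as reported in the results, requires a wire delay of at most 62.5 ps/mm.

```python
def cycle_time_ps(clock_ghz):
    """Clock period in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / clock_ghz


def hops_per_cycle(clock_ghz, wire_delay_ps_per_mm, hop_mm=1.0):
    """Whole hops a flit can cross in one cycle if it bypasses every router.

    Simplification (assumption): only wire delay is counted; MUX, crossbar,
    and setup time at the router that finally latches the flit are ignored.
    """
    return int(cycle_time_ps(clock_ghz) // (wire_delay_ps_per_mm * hop_mm))


def max_wire_delay_ps_per_mm(clock_ghz, hops, hop_mm=1.0):
    """Largest per-mm wire delay that still fits `hops` hops in one cycle."""
    return cycle_time_ps(clock_ghz) / (hops * hop_mm)


print(cycle_time_ps(2.0))                # 500.0
print(hops_per_cycle(2.0, 100.0))        # 5  (full-swing, ~100 ps/mm)
print(max_wire_delay_ps_per_mm(2.0, 8))  # 62.5
```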

Approaches to reduce on-chip latency
Bypassing the buffering and arbitration at routers requires application-specific topology reconfiguration. The topology can be reconfigured to match application-specific communication patterns at:
Design time: this requires knowledge of all applications and their communication graphs at design time, and the dedicated links add wiring-density overhead.
Runtime: contention-free routes are computed so that flits can bypass the router queues.
This paper performs online reconfiguration of the network routers at runtime, so that different applications can run on tailored topologies.

SMART Link
Node X is voltage-locked to swing near the threshold voltage of INV1x without a decrease in drive current. The low-swing voltage level is determined by the transistor sizes and the link wire impedance, so simulations were performed across process corners.

SMART Router Microarchitecture
In the SMART crossbar, a MUX at each input port selects what drives the crossbar. If the MUX is preset to connect the incoming link directly to the crossbar, the bypass path is enabled. If the MUX is set to connect the input port buffer to the crossbar, the bypass path is disabled. The bypass path is disabled when the same output port is shared by multiple input ports.
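The MUX rule above can be sketched as a per-router decision function. This is a simplified software model, not the hardware: the port names, the dictionary encoding of flows, and the function name are hypothetical, and a real SMART router presets its MUX select lines rather than computing them this way.

```python
from collections import Counter


def preset_bypass_muxes(requests):
    """Decide each input port's MUX setting for one SMART router (sketch).

    `requests` maps an input port to the output port its flow uses,
    e.g. {"W": "E"}. An input gets the bypass path only if no other input
    wants the same output port; otherwise its flits go through the input
    buffer so they can arbitrate for the crossbar.
    """
    demand = Counter(requests.values())  # how many inputs want each output
    return {
        inp: "bypass" if demand[out] == 1 else "buffer"
        for inp, out in requests.items()
    }


# Hypothetical request pattern: W and N both want the East output port.
print(preset_bypass_muxes({"W": "E", "S": "N", "N": "E"}))
# {'W': 'buffer', 'S': 'bypass', 'N': 'buffer'}
```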

SMART Flow
The green and purple flows do not overlap with each other, so each traverses from its source router to its destination router in a single clock cycle. The red and blue flows overlap, so they must stop at routers 9 and 10 to arbitrate for the shared crossbar ports. A reverse credit mesh network keeps track of the free virtual channels (VCs) at the endpoint of an arbitrary SMART route: for the blue flow, routers 3, 7, and 11 forward credits from NIC3 to router 10's East output port. The VC queue of a router therefore keeps track of the VCs at the input port of a router that may be multiple hops away, not just at its neighbor.
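Whether flows overlap can be illustrated with a small model: compute each flow's path on a mesh and flag the routers where two flows need the same output port. This is a sketch under stated assumptions: the 4x4 mesh with row-major router IDs, XY routing, the direction convention, and the source/destination pairs are all hypothetical (the pairs are chosen so that the red and blue flows happen to meet at routers 9 and 10, as in the description). The model ignores VCs, credits, and timing.

```python
from collections import defaultdict

MESH_W = 4  # hypothetical 4x4 mesh, routers numbered row-major 0..15


def xy_route(src, dst, width=MESH_W):
    """(router, output_port) pairs used under dimension-ordered XY routing.

    Convention (an assumption): x grows eastward, y grows southward.
    """
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    hops = []
    while x != dx:  # route in X first
        step = 1 if dx > x else -1
        hops.append((y * width + x, "E" if step == 1 else "W"))
        x += step
    while y != dy:  # then in Y
        step = 1 if dy > y else -1
        hops.append((y * width + x, "S" if step == 1 else "N"))
        y += step
    return hops


def stop_routers(flows):
    """Routers where each flow must stop to arbitrate for a shared output port."""
    routes = {name: xy_route(s, d) for name, (s, d) in flows.items()}
    users = defaultdict(set)  # (router, output_port) -> flows using it
    for name, route in routes.items():
        for hop in route:
            users[hop].add(name)
    return {
        name: sorted(r for (r, port) in route if len(users[(r, port)]) > 1)
        for name, route in routes.items()
    }


# Hypothetical (source, destination) pairs, not taken from the slide's figure.
FLOWS = {"green": (0, 3), "purple": (12, 15), "red": (8, 11), "blue": (9, 3)}
print(stop_routers(FLOWS))
# {'green': [], 'purple': [], 'red': [9, 10], 'blue': [9, 10]}
```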

Results
SMART is compared against two baselines:
Mesh: no reconfiguration; each hop takes 3 cycles in the router and 1 cycle in the link.
Dedicated: 1-cycle dedicated links tailored to each application.
At 2 GHz, the SMART NoC can traverse 8 mm within a single clock cycle, i.e., 8 hops with 1 mm cores. SMART's performance is 1.5 cycles off from the Dedicated baseline when one core acts as the source and another acts as the sink for most of the flows.
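A zero-load comparison of the Mesh baseline and an uncontended SMART path can be sketched from the per-hop costs above. This is an illustrative model, not the paper's measured results: it ignores injection/ejection, serialization, and contention, assumes the SMART bypass path is already preset, and assumes (as a simplification) that a path longer than 8 hops is latched every 8 hops. The hop counts in the usage loop are arbitrary.

```python
import math

ROUTER_CYCLES = 3  # Mesh baseline: router cycles per hop (from the slide)
LINK_CYCLES = 1    # Mesh baseline: link cycles per hop (from the slide)
HPC_MAX = 8        # SMART: hops per cycle at 2 GHz with 1 mm cores (from the slide)


def mesh_latency(hops):
    """Zero-load cycles on the Mesh baseline: every hop pays router + link."""
    return hops * (ROUTER_CYCLES + LINK_CYCLES)


def smart_latency(hops, hpc_max=HPC_MAX):
    """Zero-load cycles for an uncontended, preset SMART path (simplified).

    Assumption: a path longer than hpc_max hops is split into
    ceil(hops / hpc_max) single-cycle segments.
    """
    return math.ceil(hops / hpc_max)


for h in (1, 6, 14):
    print(h, mesh_latency(h), smart_latency(h))
# 1 4 1
# 6 24 1
# 14 56 2
```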

Results
The benefits of SMART are larger when certain tasks are tied to specific cores, resulting in longer paths. The SMART NoC gives 60% latency savings and 2.2X power savings compared to the Mesh. The power savings come from bypassing the buffers, low-voltage-swing signaling, and clock gating at the routers.

Conclusion
The paper proposes:
an NoC architecture that reconfigures and tailors a generic mesh topology for SoC applications at runtime, and
a low-swing, clockless, repeated link circuit embedded within the router crossbars that allows packets to bypass all the way from the source core to the destination core within a single clock cycle.

Critiques/Comments
Unlike gate delay, wire delay does not scale down as transistors shrink. In a multi-mode design (operating at different voltage levels), and with wire resistance increasing as temperature rises, the repeater circuit needs careful transistor sizing, verified by simulating across all PVT (process, voltage, temperature) corners, not just process corners.

THANK YOU