OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Slides:



Advertisements
Similar presentations
Interconnection Networks: Flow Control and Microarchitecture.
Advertisements

QuT: A Low-Power Optical Network-on-chip
A Novel 3D Layer-Multiplexed On-Chip Network
Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis.
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
NETWORK ON CHIP ROUTER Students : Itzik Ben - shushan Jonathan Silber Instructor : Isaschar Walter Final presentation part A Winter 2006.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Link Division Multiplexing (LDM) for NoC Links IEEE 2006 LDM Link Division Multiplexing Arkadiy Morgenshtein, Avinoam Kolodny, Ran Ginosar Technion –
Network-on-Chip Examples System-on-Chip Group, CSE-IMM, DTU.
Orion: A Power-Performance Simulator for Interconnection Networks Presented by: Ilya Tabakh RC Reading Group4/19/2006.
1 Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Dragonfly Topology and Routing
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
Localized Asynchronous Packet Scheduling for Buffered Crossbar Switches Deng Pan and Yuanyuan Yang State University of New York Stony Brook.
Approaching Ideal NoC Latency with Pre-Configured Routes George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis Institute of Computer Science.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
José Vicente Escamilla José Flich Pedro Javier García 1.
Elastic-Buffer Flow-Control for On-Chip Networks
International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible- Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia.
Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones.
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic- Buffer On-Chip Networks.
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
George Michelogiannakis, Prof. William J. Dally Concurrent architecture & VLSI group Stanford University Elastic Buffer Flow Control for On-chip Networks.
Design and Evaluation of Hierarchical Rings with Deflection Routing Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, ​ Kevin Chang, Greg Nazario, Reetuparna.
O1TURN : Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks DaeHo Seo, Akif Ali, WonTaek Lim Nauman Rafique, Mithuna Thottethodi School of.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Axel Jantsch 1 Networks on Chip Axel Jantsch 1 Shashi Kumar 1, Juha-Pekka Soininen 2, Martti Forsell 2, Mikael Millberg 1, Johnny Öberg 1, Kari Tiensurjä.
University of Michigan, Ann Arbor
Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi.
Yu Cai Ken Mai Onur Mutlu
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Team LDPC, SoC Lab. Graduate Institute of CSIE, NTU Implementing LDPC Decoding on Network-On-Chip T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin.
Intel Slide 1 A Comparative Study of Arbitration Algorithms for the Alpha Pipelined Router Shubu Mukherjee*, Federico Silla !, Peter Bannon $, Joel.
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
Virtual-Channel Flow Control William J. Dally
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
1 Power-Aware System on a Chip A. Laffely, J. Liang, R. Tessier, C. A. Moritz, W. Burleson University of Massachusetts Amherst Boston Area Architecture.
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
Mohamed Abdelfattah Vaughn Betz
How to Train your Dragonfly
FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers
Lecture 23: Interconnection Networks
ESE532: System-on-a-Chip Architecture
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
Exploring Concentration and Channel Slicing in On-chip Network Router
Azeddien M. Sllame, Amani Hasan Abdelkader
Rethinking NoCs for Spatial Neural Network Accelerators
OpenSMART: An Opensource Single-cycle Multi-hop NoC Generator
Lecture 23: Router Design
Lecture 17: NoC Innovations
Network-on-Chip & NoCSim
Approaching Ideal NoC Latency with Pre-Configured Routes
Rahul Boyapati. , Jiayi Huang
Deadlock Free Hardware Router with Dynamic Arbiter
Israel Cidon, Ran Ginosar and Avinoam Kolodny
On-time Network On-chip
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Hyoukjun Kwon*, Michael Pellauer**, and Tushar Krishna*
Presentation transcript:

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April 25, 2017

Hardware Development Cost There has been no culture of IP reuse because companies try to squeeze as much performance as possible by reimplmeneting each IP for specific systems source: Todd Austin, Micro-49 keynote Low cost challenge

Many-IP Heterogeneous System Network-on-Chip (NoC) Scalability challenge Flexibility challenge

Diverse System Requirements Throughput Critical Latency Critical source: MNIST, Engadget, TheStack

Challenges for NoCs Low-cost Scalability Flexibility Low design/verification costs of custom/generic NoCs Design automation of high-performance, low-energy NoCs Scalability Many-IP heterogeneous system support Low latency Low energy Low area Flexibility Diverse connectivity Diverse latency/throughput requirements

OpenSMART OpenSMART

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

SMART NoC Single-cycle Multi-hop Asynchronous Repeated Traversal SSR (SMART Setup Request) SSR (SMART Setup Request) SSR (SMART Setup Request) S D SMART: achieves the performance of dedicated connections over a network of shared links HPCmax Krishna et al, HPCA 2013 Chen et al, DATE 2013 Krishna et al, IEEE Micro Top Picks 2014, 1-cycle (no other traffic)

Is 1-cycle Network Possible? Yes Is wire fast enough to support 1-cycle network? Wire traversal length within 1ns (1Ghz): 10-16mm Wire delay over technology: constant Chip dimension: remain similar (~20mm) On-chip wires are fast enough to transmit across the chip within 1-2 cycles at 1GHz even if the technology scales Clock frequency: remain similar (1~3GHz) Tile dimension: decrease over technology Repeaters are just Inverters and buffers. The knwon fact is that 70~80 ps is reuiqred to traverse 1mm with global repeated wires ~20mm ~20mm ~20mm

Features of SMART Low latency network Separate control path Dynamic bypass of intermediate routers between any two routers Limit: HPCmax (hops per cycle max), maximum number of “hops” that the underlying wire allows the flit to traverse within a clock cycle Separate control path HPCmax bits from every router along each direction Arbitration of multiple bypass requests on the same link No ACK required

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

OpenSMART Design Flow

OpenSMART NoC

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

OpenSMART Building Blocks input buffer + input VC arbitration output VC selection + output port arbitration + credit management switching (via crossbar) + routing calculation SSR communication & arbitration + bypass flag

OpenSMART Router Arbiter Number of VCs/VC Depth Flit Size

OpenSMART Router Arbiter

OpenSMART Router Routing Algorithm

OpenSMART Router (SMART) HPCmax SSR Prioritization - Slide 25: Here 2 points should come out when you speak. One: the SSR traversal - put a red circle there and talk about SSRs going up to HPCmax as you've done now. Two: the SSR priority. So show a red circle over that box and say that SSRs are prioritized by distance, with local getting highest priority. And say you will show this with example later. Prioritization by distance -> SSR from a nearer router gets the higher priority (Local (distance = 0) has the highest prirority)

OpenSMART Router (1cycle)

OpenSMART Router (2cycle/SMART)

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

Cycle 1: Multi-hop Bypass Walk-through Example 1 Router r4 sends a flit to router r7 HPCmax = 3 bypass, bypass, stop Cycle 1: Multi-hop Bypass Cycle 0: SSR Send 110 110 110 SSR (SMART Setup Request)

Cycle 1: Multi-hop Bypass Walk-through Example 2 Router r4 sends a flit to router r7 Router r5 sends a flit to router r7 HPCmax = 3 Cycle 1: Multi-hop Bypass Cycle 0: SSR Send SSR (SMART Setup Request) 110 110 110 100 100 - Slide 30 is the only one that shows a contention scenario so needs to be explained well. This is a crucial slide in my opinion. In the animation, once the SSR is sent, show a pop out from r5 showing the hasSSR and has Local Flit blocks and the priority arbiter setting the flag to Red. Emphasize that the arbiter is a simple arbiter that prioritizes SSRs based on distance. Winner SMART Unit in r5

OpenSMART: Features Language Flow control Buffer management BSV and Chisel Flow control VC and SMART Buffer management Credit-based buffer management Router microarchitecture 1- and 2-cycle state-of-the-art packet switching router SMART router We provide scalable NoC generator that supports irregular topologies

OpenSMART: Features Routing calculation VC selection XY, YX, and source-routing One-hot encoding hop count + shift-based routing calculation For SMART, routing calculation is done during bypasses VC selection FIFO-based dynamic VC selection Next VC is stored in a separate register For SMART, VC selection is done during bypasses We provide scalable NoC generator that supports irregular topologies

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

Case Study Configuration Network Topology: 8x8 mesh Number of VCs: 4 Traffic: Uniform-random and bit-complement HPCmax : 7 Flit Size : 52 bit (data: 32 bit) Synthesis Environment : Synopsys Design Compiler with NanGate 15nm PDK standard cell library // Xilinx Vivado (target: VC709 board) Simulation Method: Cycle-accurate RTL simulation using BSV testbench and Garnet simulation We provide scalable NoC generator that supports irregular topologies

Latency 5X 4X (a) Uniform Random (b) Bit-complement

Repeaters require less energy than clocked latches Energy Consumption Anticipated Question: How did you estimate the energy? Repeaters require less energy than clocked latches

HPCmax (a) HPCmax on ASIC (b) HPCmax on FPGA Depends on operating clock frequency and tecnhologies (a) HPCmax on ASIC (b) HPCmax on FPGA

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

Router Area

Router Power Number of Ports (a) ASIC (b) FPGA

Maximum Clock Frequency

Outline Motivation: Scalable, Flexible, and Low-cost NoCs Background: SMART NoCs OpenSMART Design Flow Building Blocks Walk-through Examples Case Studies Mesh vs. SMART High-radix vs. Low-radix Conclusions

Conclusion NoCs are crucial components to support many-IP heterogeneous systems OpenSMART provides automatic generation of NoCs RTL for many-IP heterogeneous systems OpenSMART generates not only state-of-the-art packet switching network but also low latency network, SMART. Thank you!

Paper and Source code Paper is available this link : http://synergy.ece.gatech.edu/wp-content/uploads/sites/332/2017/03/OpenSMART_ISPASS17.pdf Source code is available via this link: https://hyoukjun.github.io/OpenSMART/