High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.

Presentation transcript:

High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick Barrow-Williams

Introduction
- The semiconductor industry has grown rapidly for several decades
- Continued shrinking of device dimensions introduces new design challenges
- Moving data around a chip can now be the limiting factor of performance
- Existing solutions do not scale well

Why do existing solutions not scale?
- Die size has been growing consistently
- Global connections are longer
- Wire depth has increased to counter the decrease in width
- Parasitic capacitive effects increase and cause slow signal propagation

Why do existing solutions not scale?
- Existing system-level connections use buses
- Buses increase resource efficiency and decrease wiring congestion
- Buses are not suitable for a large number of modules
- A network-based alternative would offer higher aggregate bandwidth

Why design for FPGA systems?
- FPGA market growth has been sustained for several years
- FPGA silicon area is already dominated by wiring
- Global wires are limited in number
- Increasing gate count only increases wiring congestion

The Solution: Network-on-Chip
- Use technologies from network systems
- Replace inefficient global wiring with a high-level interconnection network
- Create scalable systems to handle large numbers of modules
- Use high metal layers to avoid parasitic effects

Existing Solutions
- Most existing systems are for ASIC designs
  - Stanford Interconnect
  - RAW
  - SCALE
  - SPIN
- PNoC: a solution for FPGAs
  - Complex
  - High hardware cost
- Other simulated solutions exist, but few are implemented

Proposal: Two network systems
- Existing solutions use either packet switching or circuit switching techniques
- Design, implement, test and synthesise one of each to compare performance and hardware cost
- Map the solutions to an FPGA platform to evaluate hardware cost in current-generation systems

Network Architecture Design
- Topology
  - Simple
  - Scalable
  - Low wiring requirements
- Solution: 2D mesh topology
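The low wiring requirement of a 2D mesh can be illustrated with a small behavioral sketch. It is written in Python purely for illustration (the routers in this presentation were coded in VHDL), and the function names `mesh_neighbours` and `mesh_link_count` are hypothetical. Each adjacent router pair is counted once, however the connection is physically wired.

```python
def mesh_neighbours(x, y, width, height):
    """Return the coordinates of the routers adjacent to (x, y) in the mesh."""
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    # Routers on an edge or corner simply have fewer neighbours.
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def mesh_link_count(width, height):
    """Count adjacent router pairs in a width x height mesh."""
    vertical = width * (height - 1)
    horizontal = height * (width - 1)
    return vertical + horizontal


if __name__ == "__main__":
    # Mesh size chosen only as an example.
    print(mesh_link_count(4, 4))
    print(mesh_neighbours(0, 0, 4, 4))
```

No router has more than four neighbours, so with one port for the locally attached module a router needs at most five ports regardless of mesh size.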

Network Architecture Design
- Routing algorithm
  - Deterministic
    - Data always follows the same path through the network
    - Simple hardware
    - Sensitive to congestion
  - Adaptive
    - Paths through the network can change according to load
    - Complex hardware
    - Avoids congestion

Network Architecture Design
- When choosing a routing algorithm, two failure modes must be avoided:
  - Deadlock
    - Solution: use unidirectional wiring and allow each node to make two connections
  - Livelock
    - Solution: use deterministic routing

Network Architecture Design
- Flow control methods
  - Circuit switched
    - Circuit request propagates through the network
    - Path is reserved to the destination
    - Grant signal propagates back
    - Data is sent, then the circuit is deallocated
  - Packet switched
    - Packets use a header, body and tail
    - Wormhole routing
      - Forward the header and body without waiting for the tail
      - Buffers are needed to store stalled packets
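The packet-switched case can be sketched as a small behavioral model. This is an illustration in Python, not the VHDL used for the routers, and everything named here (`make_packet`, `wormhole_forward`, the flit tuples, one flit arriving per cycle, the tail flit carrying the last payload word) is an assumption made for the example.

```python
from collections import deque


def make_packet(dest, payload):
    """Split a non-empty payload into header, body and tail flits."""
    flits = [("HEAD", dest)]
    flits += [("BODY", word) for word in payload[:-1]]
    flits.append(("TAIL", payload[-1]))
    return flits


def wormhole_forward(flits, output_ready):
    """Forward flits as they arrive, buffering them while the output stalls.

    One flit arrives per cycle. output_ready[i] says whether the downstream
    port accepts a flit on cycle i. Returns the flits sent, in order, and the
    largest FIFO occupancy seen at the end of any cycle.
    """
    fifo = deque()
    sent = []
    max_depth = 0
    arriving = iter(flits)
    for ready in output_ready:
        flit = next(arriving, None)
        if flit is not None:
            fifo.append(flit)
        if ready and fifo:
            # The header can leave before the tail has even arrived.
            sent.append(fifo.popleft())
        max_depth = max(max_depth, len(fifo))
    return sent, max_depth


if __name__ == "__main__":
    packet = make_packet((1, 0), [10, 11, 12])
    sent, depth = wormhole_forward(
        packet, [True, False, False, True, True, True])
    print(sent, depth)
```

In this run the header is forwarded on the first cycle, two cycles of stall then fill the buffer, and the remaining flits drain in order once the output is ready again, which is why the packet-switched router needs FIFOs and the circuit-switched one does not.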

Router Design
- Each router contains a number of modules
  - FIFOs (only present in the packet switched router)
  - Address to port-request decoder
  - Arbiter
  - Control finite state machines
  - Crossbar

Router Design: Address decoder
- Takes the addresses from each of the five input ports
- Outputs the direction in which to route the packet
[Figure: address decoder block. Labels: Addresses In, Port Requests Out, Router Address Registers, Logic]
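A behavioral sketch of such a decoder is given here in Python for illustration. The slides do not state the exact routing rule, so the sketch assumes dimension-order (X then Y) routing, a common deterministic scheme for 2D meshes. The port names, the convention that increasing y means NORTH, and the functions `decode_port` and `decode_all` are likewise assumptions.

```python
PORTS = ("LOCAL", "NORTH", "EAST", "SOUTH", "WEST")


def decode_port(router_addr, dest_addr):
    """Compare a destination with this router's address and pick a port.

    Assumes X-then-Y routing: correct the x coordinate first, then y.
    """
    rx, ry = router_addr
    dx, dy = dest_addr
    if dx > rx:
        return "EAST"
    if dx < rx:
        return "WEST"
    if dy > ry:
        return "NORTH"
    if dy < ry:
        return "SOUTH"
    return "LOCAL"  # The packet has arrived at its destination router.


def decode_all(router_addr, input_addrs):
    """Decode one destination per input port; None means no packet waiting."""
    return {port: (decode_port(router_addr, input_addrs[port])
                   if input_addrs.get(port) is not None else None)
            for port in PORTS}


if __name__ == "__main__":
    requests = decode_all((1, 1), {"WEST": (3, 1), "LOCAL": (1, 0)})
    print(requests)
```

Because the decision depends only on the router's own address and the destination, every packet between the same pair of nodes takes the same path, which is the deterministic behaviour described earlier.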

Router Design: Control FSMs
- Each FSM has multiplexed inputs and outputs
  - This reduces the size of the FSM considerably
- The example here is from the circuit switched router
[Figure: control FSM. Labels: Requests In, Grant Out, Grant In, Requests Out, In Port, Out Port]

Router Design: Crossbar
- Each crossbar can make two connections to avoid deadlock
- Pipelined design to increase router throughput
[Figure: crossbar. Labels: Data In, Data Out, In Port x 2, Out Port x 2]

Circuit Switched Router Structure
[Figure: circuit switched router block diagram. Blocks: In & Out Ports, Address to Port Decoder, Arbiter, FSM, Crossbar. Signals: Request In, Request Out, Grant In, Grant Out, Data In, Data Out]

Packet Switched Router Structure
[Figure: packet switched router block diagram. Blocks: In & Out Ports, 5 x Queue (each a FIFO with its FSM), Address to Port Decoder, Arbiter, Control, Crossbar. Queue signals: Data In, Full, Write, Grant, Req, Data. Router signals: Request From FIFOs, Request In, Write Out, Full In, Grant Out, Data From FIFOs, Data Out]

Router Implementation and Testing
- Both routers were coded in VHDL
- Simulation and testing used a combination of ModelSim and Xilinx ISE 9.1
- Ad-hoc tests were used for individual modules
- A VHDL testbench was used for system verification

Testbench Structure
[Figure: testbench block diagram. Blocks: Command File, Read Input, Input Tables, Test Table, Source, Mesh Network, Sink, Output Table, Compare, Output File, Clock Gen, Reset Gen, Cycle Count]

Output file (sample):
  Success: ID: 1  Source: (0,3)  Dest: (1,0)  Hops: 4  Latency: 34
  Success: ID: 2  Source: (0,2)  Dest: (1,0)  Hops: 3  Latency: 27
  Success: ID: 3  Source: (3,2)  Dest: (1,1)  Hops: 3  Latency: 22
  Success: ID: 4  Source: (1,3)  Dest: (0,1)  Hops: 3  Latency: 22
  Success: ID: 5  Source: (3,0)  Dest: (3,1)  Hops: 1  Latency: 12

Command file (column header):
  # START SOURCE DEST SIZE ID
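The hop counts in the sample output above match the shortest-path (Manhattan) distance between source and destination on a 2D mesh. The Python sketch below reproduces that check. The records are copied from the output above; the tuple layout and the name `expected_hops` are illustrative only and are not taken from the testbench itself.

```python
def expected_hops(source, dest):
    """Minimal hop count between two mesh nodes (Manhattan distance)."""
    return abs(source[0] - dest[0]) + abs(source[1] - dest[1])


# (id, source, dest, reported hops, reported latency in cycles)
RESULTS = [
    (1, (0, 3), (1, 0), 4, 34),
    (2, (0, 2), (1, 0), 3, 27),
    (3, (3, 2), (1, 1), 3, 22),
    (4, (1, 3), (0, 1), 3, 22),
    (5, (3, 0), (3, 1), 1, 12),
]

if __name__ == "__main__":
    for packet_id, source, dest, hops, latency in RESULTS:
        status = "ok" if expected_hops(source, dest) == hops else "MISMATCH"
        print(f"ID {packet_id}: reported {hops} hops, "
              f"expected {expected_hops(source, dest)} ({status})")
```

All five records agree with the minimal distance, which is what one would expect from a deterministic router that always takes a shortest path. Latency is not a fixed multiple of the hop count in these records (IDs 2, 3 and 4 all take 3 hops but report 27, 22 and 22 cycles).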