SCORES: A Scalable and Parametric Streams-Based Communication Architecture for Modular Reconfigurable Systems Abelardo Jara-Berrocal, Ann Gordon-Ross NSF.

Presentation transcript:

SCORES: A Scalable and Parametric Streams-Based Communication Architecture for Modular Reconfigurable Systems Abelardo Jara-Berrocal, Ann Gordon-Ross NSF Center for High-Performance Reconfigurable Computing (CHREC) Department of Electrical and Computer Engineering University of Florida

2 of 16 Introduction – Parallel Computation
- To leverage parallel computation speedups, a system can be decomposed into smaller tasks.
- Problem: speedup can be limited by inefficient communication!
- How do designers provide efficient module communication?
[Figure: a high-performance application is decomposed into a task graph (Source, FIR, FFT, Matrix, IFFT, Angle, Sink) whose edges indicate communication volume. After system formulation (step 1) and task allocation / system placement (step 3), the tasks are mapped onto modules (uProc, MEM, DSP1, ASIC, DSP2) that communicate in parallel. Profile 1: DSP 0.5 ms, uProc 2.2 ms. Profile 2: ASIC 0.5 ms, DSP 2.5 ms.]

3 of 16 Communication Architectures
Bus
- Advantages:
  - Very well known
  - Smaller hardware overhead
  - SoC standards: CoreConnect®, AMBA®, Wishbone
- Disadvantages:
  - Does not scale well as the number of modules increases
  - High power consumption due to long wires
  - Cross-talk issues
Network-on-Chip (NoC)
- Advantages:
  - Scalable
  - Very high bandwidth
  - Wires are broken into smaller segments
  - Multiple and simultaneous parallel communications
- Disadvantages:
  - Significant area overhead, exacerbated by store-and-forward routers
  - Interfaces between modules and nodes are not standard: specific signals and handshaking protocols for each design
[Figure: (a) bus connecting uProc, MEM, DSP1, ASIC, and DSP2; (b) Network-on-Chip connecting the same modules through NoC nodes.]

4 of 16 General NoC architecture
- NoC node: routers (packet switching) or switches (circuit switching)
- NoC link
- NoC interface
- NoC topology: varies across designs; commonly a 2D mesh or torus [1]
[Figure: NoC nodes joined by NoC links, with modules (MEM, uProc, DSP1, ASIC, DSP2, I/O, Slave) attached through NoC interfaces.]
[1] Salminen et al. Survey of Network-on-Chip Proposals. White Paper, OCP-IP, March 2008.

5 of 16 Motivation
- Relevant NoC metrics: throughput, latency, area, power
- 2D mesh NoC: high throughput, low latency, high communication parallelism
- Due to these advantages, some commercial 2D NoCs for ASICs have appeared: Arteris®
- How about NoC implementations in FPGAs?
  - FPGAs are increasingly used in digital designs: they are reconfigurable and cost less than ASICs
  - NoC area overhead becomes a problem: a 3x3 2D mesh NoC consumed 28.72% of a Xilinx V2P30 [2] (for a maximum throughput of 9.5 Gbps for the complete 3x3 2D NoC)
  - The problem is exacerbated on low-capacity, low-cost FPGA devices
[Figure: 3x3 2D mesh of nodes N1 to N9 with attached modules; Arteris NoC.]
[2] B. Sethuraman, P. Bhattacharya, J. Khan, R. Vemuri. LiPaR: A light-weight parallel router for FPGA-based networks-on-chip. ACM Great Lakes Symposium on VLSI, 2005.

6 of 16 SCORES - Contributions
SCORES = Scalable Communication Architecture for Reconfigurable Embedded Systems
Main contributions:
- High throughput / bandwidth: circuit switching scheme
- Low area overhead: linear topology
- Multiple clock domains
- Scalability:
  - VHDL model with numerous architectural parameters
  - Allows customization for the communication needs of different SoCs
Implemented in a Xilinx VLX25 FPGA.
[Figure: reconfigurable device (FPGA) with Modules 1 to 3 in different clock domains (clk1, clk2, clk3), each attached through a SCORES interface to SCORES, which runs on scores-clk.]

7 of 16 SCORES – Top Level Design
SCORES main components:
- Switches: communication nodes inside SCORES
- Interfaces: communication between modules and SCORES
- Channels: communication links between switches and other switches or interfaces
Modules access interfaces through local input ports and local output ports.
[Figure: reconfigurable device (FPGA) with Modules 1 to 3 (clk1, clk2, clk3), each connected through an interface to a SCORES switch; SCORES runs on clk.]

8 of 16 SCORES – Parametric Architecture
Parameters:
- kl: number of left switch channels
- kr: number of right switch channels
- ko: number of local output ports from the interface
- ki: number of local input ports to the interface
Additional parameters:
- N: number of modules
- W: width of a channel in bits
Parameters enable SCORES to conform to custom communication requirements.
[Figure: Modules 1 to 4, each attached through an interface to its SCORES switch.]
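
The parameter set above can be captured in a small configuration record. The following is a minimal Python sketch, not the authors' VHDL generics: the class name `ScoresParams`, the field names, the validation rules, and the helper `endpoints_per_switch` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoresParams:
    """Hypothetical container for the SCORES parameters named on the slide."""
    n_modules: int   # N: number of modules
    width_bits: int  # W: width of a channel in bits
    kl: int          # number of left switch channels
    kr: int          # number of right switch channels
    ki: int          # number of local input ports to the interface
    ko: int          # number of local output ports from the interface

    def __post_init__(self):
        # Assumed sanity rules (not from the slides): N and W are positive,
        # channel and port counts are non-negative.
        if self.n_modules < 1 or self.width_bits < 1:
            raise ValueError("N and W must be at least 1")
        if min(self.kl, self.kr, self.ki, self.ko) < 0:
            raise ValueError("channel and port counts cannot be negative")

    def endpoints_per_switch(self) -> int:
        """Total channels and local ports attached to one switch."""
        return self.kl + self.kr + self.ki + self.ko


# Configuration matching the customized switch reported in the results slides:
# 32-bit channels, 2 left and 2 right channels, 1 local input, 1 local output.
# N = 4 is taken from the four modules drawn in the figure.
params = ScoresParams(n_modules=4, width_bits=32, kl=2, kr=2, ki=1, ko=1)
print(params.endpoints_per_switch())
```

Grouping the parameters this way makes it easy to sweep one value at a time, which is how the results slides vary Kr, Kl, Ki, and Ko.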

9 of 16 SCORES – Terminology
- Producer: module which transmits data
- Consumer: module which receives data
- Streaming Data Channel (SDC): dedicated path between a producer and a consumer
  - Dynamically created and destroyed inside SCORES
  - Bidirectional path: data flows from producer to consumer, and control synchronization signals flow from consumer to producer
[Figure: Modules 1 to 4 attached to SCORES through interfaces, with an SDC running from the producer to the consumer.]

10 of 16 SCORES – Communication Phases
Three communication phases
Phase I: Channel establishment
- Producer requests a path to the consumer
- Path is iteratively created inside the switches between the producer and the consumer
- If a switch has no available channels:
  - It sends a DENY signal to the producer
  - Producer can drop or maintain the request
- If successful, the Streaming Data Channel (SDC) is created between the producer and the consumer
[Figure: Modules 1 to 4 attached to SCORES through interfaces, with an SDC running from the producer to the consumer.]
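
Channel establishment over a linear topology can be sketched as hop-by-hop reservation. The Python model below is an illustrative assumption rather than the SCORES switch logic: the class `LinearScores`, its methods, the zero-based module indices, and the choice to roll back partial reservations on DENY are all hypothetical. It also includes the matching release step so that the example is self-contained.

```python
class LinearScores:
    """Toy model of circuit-switched paths over a line of switches.

    Assumptions (not from the slides): module i attaches to switch i,
    rightward traffic uses the kr channels of each link, leftward traffic
    uses the kl channels, and a denied request frees whatever it reserved.
    """

    def __init__(self, n_modules: int, kr: int, kl: int):
        links = n_modules - 1  # link j joins switch j and switch j + 1
        self.free = {+1: [kr] * links, -1: [kl] * links}
        self.sdcs = {}
        self._next_id = 0

    def request(self, producer: int, consumer: int):
        """Try to build an SDC; return its id, or None on DENY."""
        if producer == consumer:
            raise ValueError("producer and consumer must differ")
        step = 1 if consumer > producer else -1
        if step == 1:
            hops = list(range(producer, consumer))
        else:
            hops = list(range(producer - 1, consumer - 1, -1))
        reserved = []
        for link in hops:  # the path is created one switch at a time
            if self.free[step][link] == 0:
                for done in reserved:  # DENY: undo the partial path
                    self.free[step][done] += 1
                return None
            self.free[step][link] -= 1
            reserved.append(link)
        sdc_id = self._next_id
        self._next_id += 1
        self.sdcs[sdc_id] = (step, reserved)
        return sdc_id

    def release(self, sdc_id: int) -> None:
        """Tear the path down hop by hop."""
        step, reserved = self.sdcs.pop(sdc_id)
        for link in reversed(reserved):
            self.free[step][link] += 1


net = LinearScores(n_modules=4, kr=1, kl=1)
first = net.request(0, 3)    # occupies every rightward channel
denied = net.request(1, 2)   # DENY: no rightward channel left on link 1
reverse = net.request(3, 0)  # leftward channels are still free
net.release(first)
retry = net.request(1, 2)    # succeeds once the first SDC is released
print(first, denied, reverse, retry)
```

With more than one channel per direction, the same model shows several SDCs sharing a link at the same time.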

11 of 16 SCORES – Communication Phases
Phase II: Streaming transmission
- Pipelined operation
- If the consumer buffer is full, the consumer asserts "Full" to inform the producer to pause transmission
- Interfaces are built around asynchronous FIFOs, which eases crossing different clock domains
Phase III: Channel release
- Producer deasserts its request
- Path between the producer and the consumer is iteratively destroyed
[Figure: Modules 1 to 4 attached to SCORES through interfaces; the SDC from producer to consumer passes through pipeline registers.]
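
The "Full" back-pressure of Phase II can be illustrated with a cycle-by-cycle toy simulation. This Python sketch makes several simplifying assumptions that are not in the slides: a single clock for both sides (the real interfaces use asynchronous FIFOs across clock domains), a consumer that reads one word every `consumer_period` cycles, and a Full flag sampled at the start of each cycle. The function name `stream` and its parameters are hypothetical.

```python
from collections import deque


def stream(words, fifo_depth, consumer_period):
    """Push words through a bounded FIFO and count producer stalls.

    Returns (received_words, stall_cycles).
    """
    pending = deque(words)  # words the producer still has to send
    fifo = deque()          # consumer-side buffer
    received = []
    stalls = 0
    cycle = 0
    while pending or fifo:
        # Consumer's Full flag as the producer sees it this cycle.
        full = len(fifo) >= fifo_depth
        # Consumer drains one word every consumer_period cycles.
        if cycle % consumer_period == 0 and fifo:
            received.append(fifo.popleft())
        # Producer sends one word per cycle unless Full is asserted.
        if pending:
            if full:
                stalls += 1
            else:
                fifo.append(pending.popleft())
        cycle += 1
    return received, stalls


data = list(range(1, 7))
slow_rx, slow_stalls = stream(data, fifo_depth=2, consumer_period=3)
fast_rx, fast_stalls = stream(data, fifo_depth=2, consumer_period=1)
print(slow_rx == data, slow_stalls > 0, fast_stalls)
```

In both runs every word arrives in order; the slow consumer forces the producer to pause, while the consumer that keeps up never asserts Full.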

12 of 16 SCORES – Simultaneous Data Transfers
- A set of FSM controllers runs at each switch
- This allows SCORES to establish and operate multiple SDCs in parallel
[Figure: Switches 1 to 4, each with input registers, MUXes, and an interface; free channels are marked.]

13 of 16 Results – Clock Frequency
[Figure: three plots of frequency (MHz) versus (a) number of right switch channels (Kr), with 1 left switch channel; (b) number of left and right switch channels (Kr, Kl), with 1 local input and 1 local output port per switch; (c) number of local input and output ports (Ki, Ko) per switch, with 1 left and 1 right switch channel.]
- The maximum clock frequency achieved by SCORES determines its maximum throughput.
- A customized SCORES switch with 32-bit channels, 2 left and right switch channels, and 1 local input and 1 local output port operates at 254 MHz (throughput = 8.0 Gbps, post place-and-route timing report).
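
The link between clock frequency and throughput can be checked with a short calculation. The sketch below assumes that a channel moves one W-bit word per clock cycle, which the slide does not state explicitly; the function name `channel_throughput_gbps` is hypothetical.

```python
def channel_throughput_gbps(freq_mhz: float, width_bits: int) -> float:
    """Peak throughput of one channel, assuming one word per clock cycle."""
    return freq_mhz * 1e6 * width_bits / 1e9


# Reported configuration: 32-bit channels at 254 MHz.
peak = channel_throughput_gbps(254, 32)
print(round(peak, 3))
```

Under that assumption the peak is about 8.13 Gbps per channel, close to the 8.0 Gbps reported on the slide; the slide does not say how its figure was derived.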

14 of 16 Results - Area
[Figure: three plots of area (slices) versus (a) number of right switch channels (Kr), with 1 left switch channel; (b) number of left and right switch channels (Kr, Kl), with 1 local input and 1 local output port per switch; (c) number of local input and output ports (Ki, Ko) per switch, with 1 left and 1 right switch channel.]
- A customized SCORES switch with 32-bit channels, 2 left and right switch channels, and 1 local input and 1 local output port consumes 315 slices (1.41% of Virtex 4 VLX25).

15 of 16 Conclusions
We developed SCORES (Scalable Communication Architecture for Reconfigurable Embedded Systems), a highly parametric communication architecture.
SCORES contributions:
- Low area overhead (315 slices for a 32-bit switch with multiple ports)
- Modules can run at different and independent clock frequencies
- Highly parametric design, which enables architecture optimization
Future work:
- Optimization of switch FSM controllers
- Development of algorithms for module placement inside SCORES
- Tools for automatic determination of SCORES parameter values

16 of 16 Questions