Reconfigurable Computing Systems: An Overview

Reconfigurable Computing Systems: An Overview Presented by: Gurwant Kaur Koonar Vijay Pandya 14th March 2003

Introduction
Reconfigurable Computing (RC) is an emerging paradigm for digital systems design. Its key feature is the ability to perform computations in hardware, aiming for the performance of an ASIC together with the flexibility of a general-purpose (GP) processor. Technology improvements have made new programmable logic devices (FPGAs, CPLDs) possible.
Objective of the talk: give an overview of reconfigurable computing, its hardware architectures, and the software that targets these machines, such as compilation tools.

Definition
Reconfigurable Computing (RC) is a computing paradigm in which algorithms are implemented as a temporally and spatially ordered set of very complex tasks. These tasks are executed on a large set of interconnected programmable hardware elements.

Definition (cont'd)
- Computing paradigm: defines the basic RC computing model without reference to an implementation.
- Very complex tasks: commonly referred to as configurations. RC tasks take more time than general-purpose computing instructions and more area than a typical general-purpose execution unit.
- Spatial and temporal partitioning: algorithms are decomposed into tasks in both the space and time domains.
- Hardware elements: at their core, RC devices consist of a very large set of simple programmable elements, collectively called the Reconfigurable Execution Unit (REU).

General Characteristics of RC
- Algorithms are stored as configurations rather than as software
- Pipelined architectures are common
- Suited to real-time applications
Advantages
- Flexible: the hardware is configurable
- Cost comparable to a GPP; hardware is readily available
- Shorter development cycle than ASICs
- Parallelism: algorithm parallelism is exploited in a custom architecture, with problem-specific operators and control
- High performance: reduced memory dependence and exploitation of fine-grained algorithm parallelism
- Timesharing: hardware can be time-multiplexed by multiple applications

Disadvantages
- Additional area requirements: configuration memory (internal or external), internal switches, and other hardware overhead
- Time overhead: device configuration and internal switches

Traditional Computing
Using Application-Specific Integrated Circuits (ASICs) to "hard-wire" an algorithm in hardware:
- Extremely fast
- Require less silicon area
- Less power-hungry than GP architectures
- Extremely inflexible
- Expensive in both design and fabrication
- Errors are difficult to correct
Examples: consumer electronics, telecommunications, automotive industry

Traditional Computing (cont'd)
General-purpose hardware combined with application-specific software:
- Extremely flexible due to a versatile instruction set
- Much less expensive to develop
- Poor performance compared to ASICs
- Errors can be patched dynamically
Examples: commodity PC hardware running commercial software

Reasons for Poor Software Performance
- Fetching of instructions
- Interpretation of instructions
- Scheduling of instructions
- A mix of hardware resources that does not suit a particular application's needs
Reconfigurable computing is therefore intended to fill the gap between hardware and software.

Flexibility and Efficiency Tradeoffs

Can FPGAs Be Called Reconfigurable Processing Units?
- Traditional FPGAs are configurable, but not run-time reconfigurable.
- Traditional FPGAs expect to read their configuration out of a serial EEPROM, one bit at a time.
- Therefore, the FPGA must be reprogrammed in its entirety, and its previous internal state cannot be captured beforehand.
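Because the configuration is shifted in serially, load time grows with bitstream size, which is one reason partial reprogrammability matters. The sketch below is a back-of-the-envelope model under stated assumptions: `config_time_s` is a hypothetical helper, and the 6 Mbit bitstream, 50 MHz clock, and 10% partial region are illustrative numbers, not figures for any real device.

```python
def config_time_s(bitstream_bits: int, bits_per_cycle: int, clock_hz: float) -> float:
    """Seconds needed to shift a configuration bitstream into a device.

    bits_per_cycle=1 models a serial, one-bit-at-a-time load;
    a wider configuration port would load several bits per clock.
    """
    if bitstream_bits < 0 or bits_per_cycle <= 0 or clock_hz <= 0:
        raise ValueError("sizes and clock rate must be positive")
    cycles = -(-bitstream_bits // bits_per_cycle)  # ceiling division
    return cycles / clock_hz


# Hypothetical numbers, not taken from any datasheet:
# a 6 Mbit full bitstream and a 50 MHz serial configuration clock.
full = config_time_s(6_000_000, 1, 50e6)
# Partial reconfiguration that rewrites only a hypothetical 10% region.
partial = config_time_s(600_000, 1, 50e6)
print(f"full: {full * 1e3:.1f} ms, partial: {partial * 1e3:.1f} ms")
# prints: full: 120.0 ms, partial: 12.0 ms
```

Under this model, reloading only the region that changes cuts configuration time in proportion to the region's share of the bitstream.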

Features for Reconfigurable Hardware
- On-the-fly reprogrammability
- Partial reprogrammability
- Externally visible internal state

Kress ALU Array-III (KrAA-III)
- Instruction-level parallelism
- Transparently scalable
- Fast routing and placement (seconds only)
- Dynamically and partially reconfigurable (microseconds)
- Suitable for full-custom design
- On a microprocessor chip: much higher acceleration than caches
- On a microprocessor chip: fast and low power through full-custom design
- Acceleration by massive migration of work from run time to compile time

Kress ALU Array-III (KrAA-III)
KrAA-III consists of processing elements (PEs) called rDPU-III (reconfigurable DataPath Unit III) arranged in a NEWS (north, east, west, south) nearest-neighbor network. The figure shows the KrAA-III chip containing 9 rDPUs.

Basic Architecture of Today's Commercial Reconfigurable Processors

Devices Combining an FPGA with a Standard Processor Core
- Triscend's E5 and A7
- Altera's two Excalibur families
- Atmel's FPSLIC
- Chameleon Systems' CS2000

Zippy Architecture
- Used to develop reconfigurable processor technology for the domain of handheld and wearable computing
- Investigates new trade-offs between performance, power consumption, and system cost
- An international research effort led by the Swiss Federal Institute of Technology

Reconfigurable Computing Merging Efficiency and Versatility

Hardware Design steps

Examples
SPLASH II: a multi-FPGA parallel computer that uses orchestrated systolic communication to perform inter-FPGA data transfer
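To make "systolic communication" concrete, the sketch below is a simplified, hypothetical model of the style: a linear chain of processing elements (standing in for FPGAs), each of which applies its configured function and hands its result to the next element on every clock. The stage functions are invented for illustration, and the model does not attempt to capture the real machine's interconnect or memories.

```python
from typing import Callable, List, Sequence, Tuple

_EMPTY = object()  # marks a pipeline register that holds no data


def run_systolic(stages: Sequence[Callable[[int], int]],
                 inputs: Sequence[int]) -> Tuple[List[int], int]:
    """Clock a linear systolic chain until every input has left the array.

    Returns the outputs in order and the number of clock cycles used.
    """
    if not stages:
        raise ValueError("need at least one stage")
    regs = [_EMPTY] * len(stages)  # output register of each stage
    pending = list(inputs)
    outputs: List[int] = []
    cycles = 0
    while pending or any(r is not _EMPTY for r in regs):
        # The last stage's register drains off the end of the array.
        if regs[-1] is not _EMPTY:
            outputs.append(regs[-1])
        # Every value moves exactly one hop per clock (update back to front).
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not _EMPTY else _EMPTY
        # Feed the next input into the first stage.
        regs[0] = stages[0](pending.pop(0)) if pending else _EMPTY
        cycles += 1
    return outputs, cycles


# Three hypothetical stages, each doing one simple operation per clock.
outs, n = run_systolic([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], [1, 2, 3])
print(outs, n)  # [1, 3, 5] 6
```

With k stages and m inputs the loop runs k + m cycles, so once the chain is full it delivers one result per clock while each element only ever talks to its neighbor.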

Garp
For general-purpose loop acceleration

CMC Rapid Prototyping Platform

RC Applications
RC has demonstrated a >10x performance-density advantage over microprocessors and DSPs.
- Pattern matching
- Data encryption
- Data compression
- Video and image processing
Commercial Push
- Handheld devices: PDAs, mobile phones, specialized tools
- Networks: telecom switches, network routers, network bridges
- High-performance computing: supercomputers, medical appliances, robot navigation and planning
- Defense: ballistic missiles, KV navigation, spacecraft processing

RC Implementations
Hardware:
- Catalina Research Incorporated: http://www.catalinaresearch.com/Chameleon
- Annapolis Microsystems: http://www.annapmicro.com/Wildstar
- Alpha Data Parallel Systems: http://www.alpha-data.com
Tools:
- Celoxica: http://www.celoxica.com
- Star Bridge Systems: http://www.starbridgesystems.com
- Annapolis Microsystems: http://www.annapmicro.com/CoreFire

Content
- Coupling approaches (reconfigurable hardware with a general-purpose processor)
- Granularity of the FPGA as an RCS
- Implementation approaches: compile-time reconfiguration and run-time reconfiguration
- Some more advantages
- Challenges
- Software-like design environment

Coupling Approaches for Reconfigurable Hardware (RH)
RH can be coupled to a GP processor as:
- A functional unit (tight coupling)
- A coprocessor
- An attached processing unit
- A standalone processing unit (loose coupling)

Coupling Approaches (cont'd)
As a functional unit:
- Within a host general-purpose (GP) processor
- Uses the data path of the host machine
As a coprocessor:
- Operates without constant supervision by the GP processor; the GP processor initializes the RH
- Independent parallel computation
- Less communication overhead

Coupling Approaches (cont'd)
As an attached processing unit:
- Behaves as an additional processor
- The host's memory caches are not visible to it
- Independent computation, but high communication overhead
As a standalone processing unit:
- The most loosely coupled to the GP processor
- Infrequent communication with the GP processor
- Independent computation for very long periods of time

Different Levels of Coupling
[Figure: workstation block diagram (CPU with functional unit (FU), memory caches, I/O interface) showing where a coprocessor, an attached processing unit, and a standalone processing unit connect]

Pros and Cons of Different Coupling Approaches
Tight integration:
- Very little communication overhead
- The RH cannot operate "alone" for long periods of time
- The amount of reconfigurable logic is limited
Loose integration:
- Greater parallelism
- Higher communication overhead
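The trade-off can be framed as a break-even check: offloading pays only if RH compute time plus communication time beats the software time. The sketch below is a deliberately simple model that assumes communication and computation do not overlap; `OffloadEstimate` and all the timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    sw_time: float    # time for the task on the host GP processor alone
    hw_time: float    # compute time on the reconfigurable hardware (RH)
    comm_time: float  # host <-> RH data transfer and synchronization time

    def hw_total(self) -> float:
        # Assumes communication and RH computation do not overlap.
        return self.hw_time + self.comm_time

    def speedup(self) -> float:
        return self.sw_time / self.hw_total()

    def worth_offloading(self) -> bool:
        return self.hw_total() < self.sw_time


# Hypothetical task: 10 time units in software, 2 units on the RH.
tight = OffloadEstimate(sw_time=10.0, hw_time=2.0, comm_time=0.1)  # functional unit
loose = OffloadEstimate(sw_time=10.0, hw_time=2.0, comm_time=9.0)  # attached unit
print(tight.worth_offloading(), loose.worth_offloading())  # True False
```

In this model, loose coupling pays off when the RH runs for a long time per transfer, i.e. when `sw_time` and `hw_time` grow while `comm_time` stays roughly fixed, which matches the "infrequent communication" profile of a standalone unit.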

Logic Block Granularity
Granularity refers to the size and complexity of the configurable logic block (CLB).
Fine-grained logic blocks:
- Less complex; e.g., the Altera FLEX 10K logic block consists of a single 4-input LUT with a flip-flop
- Useful for bit-level manipulation
- Can exceed the performance of a GP processor on operations with variable bit-width data
- Small area, high amount of computation (compact)
- Suited to encryption and image processing applications
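A 4-input LUT is just a 16-entry truth table addressed by its inputs, so "configuring" it means choosing 16 bits. The sketch below models such a cell with an optional output flip-flop. The helper names and the input-to-address bit order are assumptions for illustration; this is not any vendor's bitstream format.

```python
from typing import Callable

Bit = int  # 0 or 1


def lut4_config(fn: Callable[[Bit, Bit, Bit, Bit], Bit]) -> int:
    """Return the 16 configuration bits that make a 4-input LUT implement fn.

    Bit i of the result holds fn's output for inputs (a, b, c, d) taken from
    the binary digits of i, with a as the least significant bit (assumed order).
    """
    config = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        if fn(a, b, c, d):
            config |= 1 << i
    return config


class LogicCell:
    """A fine-grained cell: one 4-input LUT followed by an optional flip-flop."""

    def __init__(self, config: int, registered: bool = False):
        if not 0 <= config < 1 << 16:
            raise ValueError("a 4-input LUT takes exactly 16 configuration bits")
        self.config = config
        self.registered = registered
        self.q: Bit = 0  # flip-flop state, cleared at power-up

    def step(self, a: Bit, b: Bit, c: Bit, d: Bit) -> Bit:
        """Evaluate one clock cycle and return the cell output."""
        y = (self.config >> (a | b << 1 | c << 2 | d << 3)) & 1  # LUT lookup
        if not self.registered:
            return y
        out, self.q = self.q, y  # registered output lags the LUT by one clock
        return out


parity = lut4_config(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(parity))  # 0x6996
```

Any Boolean function of four inputs fits in one such cell; wider functions need several cells plus routing, which is where fine-grained fabrics spend their area.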

Logic Block Granularity (cont'd)
Coarse-grained logic blocks:
- Larger CLB granularity
- Help perform more complex operations
- Four 2-bit inputs per logic block (Garp), and a multiplier in each logic block for 4 x 4 multiplication
- Finite state machines
- Implementation of word-width (16-bit) data path circuits
Very coarse-grained structures:
- The logic block is closer to a small processor
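For contrast with the LUT cell, a coarse-grained cell is configured at the word level: one configuration field selects a whole 16-bit operation rather than 16 truth-table bits per output. The sketch below is a hypothetical cell; the operation set and `WordCell` name are assumptions, and only the 16-bit width comes from the slide.

```python
import operator

WORD_BITS = 16                 # word width from the slide; assumed for the sketch
MASK = (1 << WORD_BITS) - 1

# Hypothetical operation set; each cell's configuration picks one entry.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "xor": operator.xor,
}


class WordCell:
    """Coarse-grained cell: a whole 16-bit operation chosen by one config field."""

    def __init__(self, op: str):
        if op not in OPS:
            raise ValueError(f"unsupported operation: {op}")
        self.op = op

    def eval(self, x: int, y: int) -> int:
        # Operands and result wrap to the datapath width, like a 16-bit register.
        return OPS[self.op](x & MASK, y & MASK) & MASK


# The same cell type configured for different word-level operations.
print(WordCell("add").eval(0xFFFF, 1))  # 0 (wraps around)
print(WordCell("mul").eval(300, 300))   # 24464, i.e. 90000 mod 65536
```

The trade-off is visible in the interface: word-level arithmetic is one cell and one small configuration field, but bit-level tricks that a LUT handles directly are awkward here.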

Implementation Approaches
Compile-Time Reconfiguration (CTR):
- Static implementation strategy
- Single system-wide configuration
- Configuration does not change during computation
- Similar to using an ASIC for application acceleration
Run-Time Reconfiguration (RTR):
- Dynamic implementation strategy
- Multiple time-exclusive configurations
- Dynamic hardware allocation at run time

RTR
Main task: dividing the algorithm into time-exclusive segments
Global RTR:
- Allocates the whole FPGA's resources to each configuration
- A single system-wide configuration for each phase
Local RTR:
- Locally reconfigures subsets of the logic at run time
- Partial reconfiguration, flexibility
- Functional division of labor
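Dividing an algorithm into time-exclusive segments for global RTR is a temporal partitioning problem. The sketch below uses a simple greedy first-fit pass over tasks already listed in dependency order, then estimates total time assuming every configuration is a full reload. This heuristic is an illustrative stand-in, not a method from the talk; the task names, areas, capacity, and timings are all hypothetical.

```python
from typing import List, Sequence, Tuple


def temporal_partition(tasks: Sequence[Tuple[str, int]], capacity: int) -> List[List[str]]:
    """Greedy first-fit temporal partitioning for global RTR.

    tasks: (name, area) pairs, already in an order that respects data dependencies.
    capacity: logic area available to one full-device configuration.
    Each returned partition becomes one time-exclusive configuration.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, area in tasks:
        if area > capacity:
            raise ValueError(f"task {name!r} does not fit on the device")
        if used + area > capacity:  # device full: close this configuration
            partitions.append(current)
            current, used = [], 0
        current.append(name)
        used += area
    if current:
        partitions.append(current)
    return partitions


def global_rtr_time(num_configs: int, load_time: float, exec_times: Sequence[float]) -> float:
    """Total time when every configuration is a full reload followed by execution."""
    return num_configs * load_time + sum(exec_times)


# Hypothetical task graph (areas in arbitrary units) and device capacity.
parts = temporal_partition([("A", 40), ("B", 50), ("C", 30), ("D", 60)], capacity=100)
print(parts)  # [['A', 'B'], ['C', 'D']]
print(global_rtr_time(len(parts), load_time=5.0, exec_times=[10.0, 12.0]))  # 32.0
```

The reload term grows with the number of partitions, which is why partitioning and configuration time have to be considered together.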

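RTR (cont'd)
[Figure: timelines comparing Global RTR (LOAD A, EXE. A, LOAD B, EXE. B, LOAD C, EXE. C in sequence) with Local RTR (configurations A, B, C, and D loaded and executed on parts of the device)]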

Implementation Issues
- Temporal partitioning is an iterative process
- Global RTR: possibly inefficient usage of FPGA resources
- Simulation
- Local RTR: efficient usage of hardware
- Current CAD tools are a poor match for local RTR (examples of local RTR: RRANN-2 and DISC)

Power Savings in RC Systems
- Exploitation of the numerical properties of an application
- Higher number of operations per clock due to deep pipelines
- Sensor/actuator pre-conditioning and "glue logic" functions on chip

Some Challenges
- Access to RCS development is restricted to hardware developers
- Run-time environment and RTR scheduling
- Difficulties in routing for RC hardware with a large number of CLBs
- Connection scheme in multi-FPGA systems

Software Aspect
- Software-like design environment: SystemC (Synopsys), Handel-C (Celoxica)
- Hardware-software co-design (ARM Rapid Prototyping Platform, RPP)
- Generation of a detailed gate-level description (netlist) from an HLL (high-level language)
- Technology mapping, placement, and routing
- Generation of .bit files (the configuration bitstream loaded into the FPGA)

Software Aspect (cont'd)
Programming language/HDL:
- An SoC consists of 50 to 90% software
- Wide acceptance of C/C++
Simulation timing:
- Simulation takes a long time in current CAD tools
- C/C++ debuggers are very efficient

RC1000 Celoxica Platform
- DK1 design suite (Handel-C)
- RC1000 plug-in card with PCI bus interface
- Xilinx Virtex-1000 FPGA (1 million gates)
Design flow (diagram labels): Handel-C source files; generate VHDL/Verilog; compile; simulate & netlist; generate EDIF (netlist); place & route tools; bitstream generation

Hardware-Software Co-design
Amdahl's Law:
$$T = \frac{1}{(1 - a) + \frac{a}{s}}$$
where
- T = overall speedup
- a = fraction of the original program that can be enhanced by transferring it to hardware
- s = speedup obtained for that fraction of the program
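A quick numeric check of the formula shows why the fraction a matters more than the hardware speedup s. The function name and the example values (a = 0.8, s = 10) are hypothetical choices for illustration.

```python
def amdahl_speedup(a: float, s: float) -> float:
    """Overall speedup T when a fraction a of the run time is sped up by s."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - a) + a / s)


# Hypothetical partitioning: 80% of the run time moves to hardware that is 10x faster.
print(f"{amdahl_speedup(0.8, 10):.2f}")    # 3.57
# Even near-infinitely fast hardware cannot beat 1 / (1 - a) = 5 for a = 0.8.
print(f"{amdahl_speedup(0.8, 1e12):.2f}")  # 5.00
```

The bound 1/(1 - a) caps the benefit regardless of s, so the co-design effort goes into enlarging the fraction that can be moved to hardware.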

Summary
- RCS bridges the gap between software and hardware (flexibility and performance)
- The FPGA is an ideal candidate for RH: spatial execution and reprogrammability
- Design time: design and synthesis flow of CAD tools
- Hybrid architectures
- Recent advancements in CAD tools

Questions?