ECE 697F Reconfigurable Computing, Lecture 16: Reconfigurable Computing Applications (November 3, 2004)

Presentation transcript:


Overview
- Perhaps the best-known reconfigurable computer is Splash/Splash 2
- Implemented as a linear, systolic array
- Developed at the Supercomputing Research Center ( )
- Memory is tightly coupled with each FPGA
- Multiple Splash boards can be combined to form a larger system

Splash 2 Architecture

Splash 2 Models of Computation
- Linear (systolic) array
  - All communication is near-neighbor and pipelined
  - Clock rates of 20-30 MHz achieved (very fast at the time)
  - All FPGAs have the same program
- SIMD array
  - Instructions fanned out to all processing elements
  - Data across all elements collected at the end
[Figures: linear array of elements X1, X2, X3, ..., X16; SIMD array of elements X1, X2, X3 with X0]

Splash 2 Programming Environment
- Three components to be programmed
  - Splash board: crossbar configurations and FPGA configurations are determined individually
  - Splash interface: a FIFO controls data flow to the boards
  - Host interface: driver software controls application execution and collection of results
- Somewhat less automated than PAM
  - Typically comparable to programming a parallel multiprocessor system

Example Application Flow
- Frequently an iterative process
[Figure: design flow with blocks labeled Module VHDL Description, VHDL Interface Description, Module Simulation, System Simulation, Logic Synthesis, FPGA Place + Route, Crossbar Configuration, FPGA bitstreams]

Application #1: Text Searching
- Search through a dictionary of words for a data hit
- Applicable to internet search engines and databases
- Opportunities for search parallelism
- Splash implementation uses systolic communication

Data Access
- Each FPGA is used to look into its local memory
- Longer data words are “hashed” into an 18-bit address
- A valid bit in memory indicates whether the data value is currently stored
- A value could be stored in several locations
[Figure: FPGA X with attached RAM]

Example Hash Function
- XOR the two-character value with the temporary result and the hash function
- Rotate the result
- A different hash function is used for each FPGA
- Shift amount: 7 bits
[Worked example on the slide, bit patterns not preserved: hash function; clear hash register; input the letters “th”; temporary result; result for “th”; input the letters “e_”; temporary result; result for “the_”]
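The XOR-and-rotate step can be sketched in a few lines of Python. This is a minimal illustration, not the Splash 2 implementation: the 18-bit register width is taken from the Data Access slide, the 7-bit rotate from this slide, and everything else (packing two 8-bit characters into one value, padding odd-length words with "_", the names `hash_step`, `hash_word`, and `hash_const`) is an assumption made for the example.

```python
ADDR_BITS = 18                    # address width from the "Data Access" slide
ROTATE_BITS = 7                   # shift amount from this slide
MASK = (1 << ADDR_BITS) - 1


def rotate_left(value, amount, width=ADDR_BITS):
    """Rotate a width-bit value left by amount bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


def hash_step(temp, pair, hash_const):
    """XOR the two-character value with the running result and the
    per-FPGA constant, then rotate the result."""
    # Assumed packing: two 8-bit characters side by side (hypothetical).
    value = (ord(pair[0]) << 8) | ord(pair[1])
    return rotate_left((temp ^ value ^ hash_const) & MASK, ROTATE_BITS)


def hash_word(word, hash_const):
    """Hash a word two characters at a time into an 18-bit address."""
    if len(word) % 2:
        word += "_"               # assumed padding for odd-length words
    temp = 0                      # "clear hash register"
    for i in range(0, len(word), 2):
        temp = hash_step(temp, word[i:i + 2], hash_const)
    return temp
```

Calling `hash_word("the", c)` with a different constant `c` per FPGA gives each device its own address for the same word, which is the property the slide relies on.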

Text Searching Tips
- Distribute the dictionary in parallel to all memories
- Collect word values in FIFOs
- Distribute words two characters at a time across all devices
- Perform local hashing and lookup in parallel
- Collect the “hit” result at the end
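The flow above (load every memory, hash locally, look up in parallel, collect a hit) can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the Splash 2 design: the class `SearchArray`, the hash constants, and the rule that a word is a hit only when every device finds its valid bit set are all hypothetical choices for illustration. Under that rule the structure behaves like a Bloom filter: dictionary words always hit, and a non-dictionary word can occasionally hit by collision.

```python
ADDR_BITS = 18
MASK = (1 << ADDR_BITS) - 1


def word_hash(word, const):
    """XOR-and-rotate hash over character pairs (illustrative only)."""
    if len(word) % 2:
        word += "_"                               # assumed padding
    h = 0
    for i in range(0, len(word), 2):
        pair = (ord(word[i]) << 8) | ord(word[i + 1])
        h = (h ^ pair ^ const) & MASK
        h = ((h << 7) | (h >> (ADDR_BITS - 7))) & MASK   # rotate left by 7
    return h


class SearchArray:
    """Software model of several FPGAs, each with its own valid-bit memory."""

    def __init__(self, consts):
        self.consts = list(consts)                # one hash constant per device
        self.valid = [bytearray(1 << ADDR_BITS) for _ in self.consts]

    def load(self, dictionary):
        """Distribute the dictionary to all memories."""
        for word in dictionary:
            for mem, const in zip(self.valid, self.consts):
                mem[word_hash(word, const)] = 1

    def lookup(self, word):
        """Hash and look up on every device, then combine the results."""
        return all(mem[word_hash(word, const)]
                   for mem, const in zip(self.valid, self.consts))


array = SearchArray([0x00155, 0x02AA0, 0x1F0F0])  # hypothetical constants
array.load(["th", "is"])
hit = array.lookup("th")                          # True: "th" was loaded
```

In hardware the per-device lookups run concurrently; the Python loop only models the combined outcome.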

Results
- Splash 2 implementation runs at 25 MHz
- Three phases needed
  - Fetch 2 bit-sliced characters
  - Perform hash
  - Table look-up
- Takes advantage of both systolic and SIMD modes

Application #2: Genetic Pattern Matching
- Evaluate similarities between pairs of genetic sequences
- Edit distance is used as the measure of similarity between two sequences (example pair: abqrt, acqsdh)
- Operations include deleting characters, inserting characters, and substituting characters
- The existing approach is iterative (a dynamic program), comparing one position at a time
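The iterative dynamic program mentioned above can be sketched as follows. The unit costs for insertion, deletion, and substitution are an assumption for illustration; the slides do not state the cost values used on Splash 2, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(source, target, insert_cost=1, delete_cost=1,
                  substitute_cost=1):
    """Edit distance by dynamic programming, one table row at a time."""
    m, n = len(source), len(target)
    # Row 0: turning an empty prefix of source into target[:j] needs j inserts.
    prev = [j * insert_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: turning source[:i] into an empty string needs i deletes.
        curr = [i * delete_cost] + [0] * n
        for j in range(1, n + 1):
            match = 0 if source[i - 1] == target[j - 1] else substitute_cost
            curr[j] = min(prev[j - 1] + match,       # match or substitute
                          prev[j] + delete_cost,     # delete from source
                          curr[j - 1] + insert_cost) # insert into source
        prev = curr
    return prev[n]
```

With these unit costs, `edit_distance("abqrt", "acqsdh")` evaluates to 4 (three substitutions and one insertion). A different cost assignment would change that value.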

Base Comparison Cell

Genetic Search Implementation
- A bidirectional linear array is used to transfer information back and forth
- Run time is O(mn) for compares/accumulates
[Figure: comparison cell using D(i-1, j-1), D(i-1, j), and D(i, j-1), with source characters S0, S1 and target characters T0, T1]
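Each cell D(i, j) depends only on D(i-1, j-1), D(i-1, j), and D(i, j-1), so all cells on one anti-diagonal (constant i + j) are independent of each other. The sketch below evaluates the table in that wavefront order and counts the wavefront steps. It illustrates the general idea that a linear array can exploit, assuming unit edit costs and one cell per table entry on a diagonal; it does not reproduce the Splash 2 cell design or its bidirectional data movement, and the function name is hypothetical.

```python
def edit_distance_wavefront(source, target):
    """Return (distance, wavefront_steps) using anti-diagonal evaluation."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # i deletions
    for j in range(n + 1):
        d[0][j] = j                      # j insertions
    steps = 0
    for k in range(2, m + n + 1):        # anti-diagonal: i + j == k
        cells = [(i, k - i)
                 for i in range(max(1, k - n), min(m, k - 1) + 1)]
        # Every cell in this list could be updated at the same time,
        # because each reads only values from diagonals k-1 and k-2.
        for i, j in cells:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
        if cells:
            steps += 1
    return d[m][n], steps
```

The total work is still O(mn) cell updates, but for non-empty strings the number of wavefront steps is m + n - 1, which is why spreading the cells across an array pays off.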

Splash 2 Data Flow

Splash 2 Data Flow (continued)

Genetic Search Result
- Nearly linear scaling in cell updates per second (CUPs)
- Need to reuse the array for large patterns

Application #3: Building Pyramids
- Reconfigurable computers are well suited to image processing because of its high parallelism and specialization (filtering)
- Algorithms change fast enough that ASIC implementations become outdated
- Examine two issues with Splash
  - Image compression and image error estimation
- Parallelize across the array in SIMD and systolic fashion

Pyramid Operations
- Gaussian Pyramid
  - Down-sample the image to compress its size for communication
  - Average over a set of points to create a new point
- Laplacian Pyramid
  - Determine the error introduced by the Gaussian Pyramid
  - Expand the contracted picture and compare it with the original

Gaussian Pyramid Implementation
- Systolic array in which each device performs a separate function
- Limited by the clock rate of the slowest device

Laplacian Pyramid
- Use interpolation to expand the reduced image
- The error calculation can be used to tune the reduction operation (filtering)
[Figure: original image -> Reduce -> Expand -> subtract from original -> Error]
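One level of the reduce, expand, and error computation can be sketched as below. The choices here are deliberately the simplest possible and are assumptions, not the filters used on Splash 2: reduction averages each 2 x 2 block, expansion repeats each pixel (nearest-neighbor interpolation), and image dimensions are assumed even. The function names are hypothetical.

```python
def reduce_image(image):
    """Gaussian-style step: average each 2x2 block into one pixel."""
    rows, cols = len(image), len(image[0])
    return [[(image[r][c] + image[r][c + 1] +
              image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def expand_image(image):
    """Expand back to full size by repeating each pixel 2x2."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def laplacian_level(image):
    """Return (reduced image, error image) for one pyramid level."""
    reduced = reduce_image(image)
    expanded = expand_image(reduced)
    error = [[orig - approx for orig, approx in zip(orow, erow)]
             for orow, erow in zip(image, expanded)]
    return reduced, error


image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 8, 8],
         [2, 2, 8, 8]]
reduced, error = laplacian_level(image)
```

Adding the error image back onto the expanded image recovers the original exactly, which is what makes the error term usable for tuning the reduction filter.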

Gaussian/Laplacian Pyramid Flow
- Generates both the Gaussian and the Laplacian pyramid for a 512 x 480 image in 22.7 ms at 15.7 MHz
- Comparable to custom devices
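The quoted figures can be turned into a rough throughput estimate. The arithmetic below uses only the numbers on the slide; the derived quantities (clock cycles per input pixel, images per second) are back-of-the-envelope ratios and assume the 22.7 ms covers the whole computation for one image.

```python
width, height = 512, 480
elapsed_s = 22.7e-3          # time to build both pyramids, from the slide
clock_hz = 15.7e6            # clock rate, from the slide

pixels = width * height                   # 245,760 input pixels
cycles = elapsed_s * clock_hz             # about 356,000 clock cycles
cycles_per_pixel = cycles / pixels        # about 1.45 cycles per input pixel
images_per_second = 1 / elapsed_s         # about 44 images per second
```

A figure close to one cycle per input pixel is consistent with a pipelined, streaming design, although the slides do not break the time down further.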

Other Image Processing
- Target recognition
- Break the image into “chips”
- Each chip is passed through the linear array in an attempt to match it with a stored image
- Images can be rotated or mirrored
- Zoom in if a suspicious object is found
[Figure: linear array with attached RAM]

Summary
- Splash 2 is effective because of its scalability and programming model
- Parameterizable applications that are regular and distributed benefit
- High bandwidth is effective for searching and signal processing
- Challenges remain in software development