JIT FPGA Ideas


Presentation transcript:

Slide 1: JIT FPGA Ideas

Frank Vahid
Dept. of CS&E, University of California, Riverside
Associate Director, Center for Embedded Computer Systems, UC Irvine

Contributing Ph.D. Students:
Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona)
Greg Stitt (Ph.D. 2007, now Asst. Prof. at Univ. of Florida, Gainesville)
Scotty Sirowy (current)
David Sheldon (current)
Chen Huang (current)

This research was supported in part by the National Science Foundation, the Semiconductor Research Corporation, Intel, Freescale, IBM, and Xilinx.

Slide 2: SystemC Bytecode for FPGAs
Demo

Slide 3: FPGA Common Presence
Caches, FPUs, GPUs, FPGAs
App developers may expect an FPGA to be present
How do we create and distribute apps that make good use of an FPGA if one is present?
[Figure: a binary running on a µP alongside a cache, FPU, GPU, and FPGA]

Slide 4: "Spatial" Algorithms for FPGAs
Example: count patterns

Sequential algorithm: hash table, 10s of cycles per pattern

    int patterns[1000];
    int counts[1000];
    while (1) {
        WaitForPattern();
        CurrPattern = X;
        hash = HashFct(CurrPattern);
        item = Find(patterns, CurrPattern, hash);
        if (item) {
            counts[item]++;
        }
    }

Spatial algorithm: pipelined stages
[Figure: CurrPattern travels on a pattern bus through Level 1, Level 2, ..., Level m; each level has comparison logic, a stored pattern, and a count]

Essence is the connectivity of components, not the sequencing of instructions
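To make the pipelined version on slide 4 concrete, here is a minimal software model of it. It is a sketch under assumptions, not the authors' circuit: the class name `PatternPipeline`, the `tick`/`flush` interface, and the use of one stored pattern per stage are all hypothetical. It only illustrates that each stage does one comparison per clock, so the pipeline accepts a new pattern every cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Software model of the pipelined pattern counter: stage i stores one
// pattern and its count; an incoming pattern advances one stage per clock.
class PatternPipeline {
public:
    explicit PatternPipeline(const std::vector<int>& patterns)
        : stored_(patterns),
          counts_(patterns.size(), 0),
          inFlight_(patterns.size(), 0),
          valid_(patterns.size(), 0) {}

    // One clock cycle. All stages work at once in hardware; the loops
    // here only emulate that parallel behavior.
    void tick(bool hasInput, int input) {
        if (stored_.empty()) return;
        // Shift in-flight patterns one stage down the pipeline.
        for (std::size_t i = stored_.size() - 1; i > 0; --i) {
            inFlight_[i] = inFlight_[i - 1];
            valid_[i] = valid_[i - 1];
        }
        inFlight_[0] = input;
        valid_[0] = hasInput ? 1 : 0;
        // Each stage compares the pattern it currently holds.
        for (std::size_t i = 0; i < stored_.size(); ++i) {
            if (valid_[i] && inFlight_[i] == stored_[i]) ++counts_[i];
        }
    }

    // Feed bubbles until every in-flight pattern has left the pipeline.
    void flush() {
        for (std::size_t i = 0; i < stored_.size(); ++i) tick(false, 0);
    }

    unsigned countAtStage(std::size_t stage) const {
        assert(stage < counts_.size());
        return counts_[stage];
    }

private:
    std::vector<int> stored_;      // pattern held by each stage
    std::vector<unsigned> counts_; // match count per stage
    std::vector<int> inFlight_;    // pattern currently passing each stage
    std::vector<char> valid_;      // 1 if the stage holds a real pattern
};
```

Calling `tick` once per incoming pattern and `flush` at the end gives the same counts as the sequential loop, but in hardware the per-pattern cost is one cycle of throughput rather than a hash lookup.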

Slide 5: Bytecode
Modern portability approach: Java, C#
Virtual Machine (VM): program that executes bytecode
May JIT compile to native architecture
[Figure: a compiler produces bytecode; a VM runs it on a Pentium, Atom, or Opteron]

Slide 6: SystemC Bytecode?
[Figure: a compiler turns SystemC into SystemC bytecode; VMs run the bytecode on a Pentium, an FPGA, or an Opteron + FPGA]

Slide 7: UCR SystemC Bytecode and Compiler

SystemC:

    class EDGE_DETECTOR : public sc_module {
        // signal declarations
        …
        EDGE_DETECTOR() {
            SC_METHOD(mainComp);
            sensitive << dataReady;
            SC_METHOD(getPixel);
            sensitive << clock.pos();
        }
        void getPixel() {
            …
            dataReady.write(1);
        }
        void mainComp() {
            int i, j;
            for (i = 0; i < 3; i++) {
                for (j = 0; j < 3; j++) {
                    sumX = sumX + mem.read() * GX[i][j];
                }
            }
            …
            edge.write(sumX + sumY);
        }
    };

UCR's SystemC-to-bytecode compiler translates this into UCR's SystemC bytecode:

    --header
    signal clock : 1
    signal reset : 1
    signal memory_in : 32
    signal fb_data : 32
    signal leds : 4

    process(clock)
    READ $1 memory_in
    ADD $2 $0 3
    ADD $3 $2 $1
    WRITE $3 s1
    ADDI $1 $0 1
    WRITE $1 dataReady
    END

    process(dataReady)
    READ $5 val6
    SW $5 24($0)
    READ $5 val7
    …
    ADDI $10 $0 0
    ADDI $7 $0 0
    ADDI $13 $0 8
    …
    END

Callouts on the bytecode: spatial constructs (the signal and process declarations) and MIPS-like sequential instructions (the process bodies).
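As an illustration of how a virtual machine might execute one process from the bytecode on slide 7, the sketch below interprets a small subset of the instructions (READ, WRITE, ADD, ADDI, END). The instruction semantics are assumptions inferred from the MIPS-like mnemonics, not UCR's specification: READ copies a signal into a register, WRITE copies a register to a signal, `$0` always reads as zero, and ADD accepts either a register or an immediate as its last operand (as in `ADD $2 $0 3`). The names `runProcess` and `Signals` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Signal name -> current value (stands in for the signal memories).
typedef std::map<std::string, int> Signals;

// Parses a register token such as "$3" and returns its index.
int regIndex(const std::string& tok) {
    assert(!tok.empty() && tok[0] == '$');
    return std::stoi(tok.substr(1));
}

// An operand is either a register ("$3") or a decimal immediate ("3").
int operandValue(const std::string& tok, const std::vector<int>& regs) {
    if (tok[0] == '$') return regs.at(regIndex(tok));
    return std::stoi(tok);
}

// Runs one process body until END. Only the subset listed above is handled.
void runProcess(const std::vector<std::string>& body, Signals& signals) {
    std::vector<int> regs(32, 0);
    for (const std::string& line : body) {
        std::istringstream in(line);
        std::string op, a, b, c;
        in >> op >> a >> b >> c;  // missing operands simply stay empty
        if (op == "END") break;
        if (op == "READ") {
            regs.at(regIndex(a)) = signals[b];
        } else if (op == "WRITE") {
            signals[b] = operandValue(a, regs);
        } else if (op == "ADD" || op == "ADDI") {
            regs.at(regIndex(a)) = operandValue(b, regs) + operandValue(c, regs);
        } else {
            assert(false && "opcode outside this sketch's subset");
        }
        regs[0] = 0;  // assumed MIPS convention: $0 always reads as zero
    }
}
```

Running the `process(clock)` body from the slide with `memory_in` set to 10 leaves 13 in signal `s1` and 1 in `dataReady` under these assumed semantics. A real emulator would also re-trigger `process(dataReady)` when that signal changes, which this sketch leaves out.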

Slide 8: SystemC Bytecode Emulator
Bytecode uploadable via USB drive
Accelerators speed up emulation
[Figure: FPGA running the emulator: main processor, instruction memory, read and write signal memories, input and output memories, UART, buttons, LEDs, and a USB interface that receives the SystemC bytecode]

Slide 9: SystemC Bytecode Accelerators
Implementation:
MIPS-like multicycle RISC datapath
100 MHz clock
~33 million instr/sec
Communicates with the core emulator through memory-mapped registers
Area: ~5000 slices
Number of accelerators limited to the number of masters allowed on the bus
~1200 lines of VHDL
[Figure: the emulator from slide 8 plus Accelerators 1 to 3; each accelerator contains a RISC datapath, register file, local memory, and bus, start, and load logic]

Slide 10: Dynamic SystemC Accelerator Management
Only a limited number of SystemC accelerators can fit on an FPGA fabric
Dynamically map processes to accelerators based on process usage
Involves online algorithms
Image filter example
[Figure: the emulator with Accelerators 1 to 3 on the FPGA, running SystemC bytecode]
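The slide does not say which online algorithm is used, so the following is only one simple possibility, shown to make the idea of usage-based mapping concrete: count how often each process is activated and give the accelerators to the most active ones. The function name `pickResidentProcesses` and the activation-count metric are assumptions; `getPixel` and `mainComp` in the usage note are the process names from slide 7.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, unsigned> ProcessUse;

// Returns the names of the processes that should occupy the accelerators:
// the `numAccelerators` processes with the highest activation counts.
// Ties keep the map's alphabetical order.
std::vector<std::string> pickResidentProcesses(
        const std::map<std::string, unsigned>& activations,
        std::size_t numAccelerators) {
    std::vector<ProcessUse> ranked(activations.begin(), activations.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const ProcessUse& a, const ProcessUse& b) {
                         return a.second > b.second;  // most used first
                     });
    std::vector<std::string> resident;
    for (std::size_t i = 0; i < ranked.size() && i < numAccelerators; ++i) {
        resident.push_back(ranked[i].first);
    }
    assert(resident.size() <= numAccelerators);
    return resident;
}
```

With two accelerators and counts of 200 for `mainComp`, 50 for `getPixel`, and 5 for a third process, the first two would be mapped to accelerators and the third would stay on the main processor. A practical policy would likely also weigh the cost of loading a process into an accelerator, which this sketch ignores.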

Slide 11: Just-in-Time Synthesis
Send SystemC bytecode to a synthesis server
Receive an FPGA-specific bitstream
Dynamically reconfigure some or all of the FPGA
Possible to even perform synthesis on-chip: "warp processing" (previous UCR work)
[Figure: the emulator with Accelerators 1 to 3 on the FPGA, running SystemC bytecode]

Slide 12: Spatial Algorithms for FPGAs
Even better spatial algorithm for pattern counting: pipelined binary tree
[Figure: CurrPattern passes through Level 1, Level 2, Level 3, ..., Level n; each level has comparison logic and a memory holding 1, 2, 4, ..., 2^n patterns with their counts]
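The pipelined binary tree can be modeled in software as a binary search tree stored level by level, where each level is a separate memory and performs exactly one comparison. The sketch below is an assumption about how such a tree might be organized (class name `TreeCounter`, 0-based levels holding 1, 2, 4, ... patterns, sorted distinct input); the slide gives only the block diagram.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Level lv (0-based) is its own memory with 2^lv patterns and counts.
// A lookup does one comparison per level, so in hardware the levels can
// form a pipeline that accepts a new pattern every clock.
class TreeCounter {
public:
    // `sorted` must be ascending and distinct, with 2^levels - 1 entries.
    TreeCounter(const std::vector<int>& sorted, std::size_t levels)
        : pattern_(levels), count_(levels) {
        assert(sorted.size() + 1 == (std::size_t(1) << levels));
        for (std::size_t lv = 0; lv < levels; ++lv) {
            const std::size_t nodes = std::size_t(1) << lv;
            const std::size_t stride = (sorted.size() + 1) / nodes;
            pattern_[lv].resize(nodes);
            count_[lv].assign(nodes, 0);
            // Node i of this level is the median of its slice of `sorted`.
            for (std::size_t i = 0; i < nodes; ++i) {
                pattern_[lv][i] = sorted[i * stride + stride / 2 - 1];
            }
        }
    }

    // Counts one occurrence; returns false if the pattern is not stored.
    bool record(int currPattern) {
        std::size_t idx = 0;  // node index handed from level to level
        for (std::size_t lv = 0; lv < pattern_.size(); ++lv) {
            const int node = pattern_[lv][idx];
            if (currPattern == node) { ++count_[lv][idx]; return true; }
            idx = 2 * idx + (currPattern > node ? 1 : 0);
        }
        return false;
    }

    unsigned countOf(int pattern) const {
        std::size_t idx = 0;
        for (std::size_t lv = 0; lv < pattern_.size(); ++lv) {
            const int node = pattern_[lv][idx];
            if (pattern == node) return count_[lv][idx];
            idx = 2 * idx + (pattern > node ? 1 : 0);
        }
        return 0;
    }

private:
    std::vector<std::vector<int> > pattern_;
    std::vector<std::vector<unsigned> > count_;
};
```

Compared with the linear pipeline on slide 4, the number of stages grows with the logarithm of the number of patterns rather than linearly, while throughput stays at one pattern per clock.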

Slide 13: Study of Spatial Algorithms in FCCM (Sirowy, FPGA'2008)
FCCM papers describing a fast application on an FPGA
Examined 35 in depth (every other one)
6 used device-specific features
9 represented the circuit expected from synthesizing the obvious sequential algorithm
20 were spatially oriented applications akin to the earlier pipelined binary tree

Year  Application                                                  Type
2001  3D Vec. Normalization                                        Spatial
2001  Efficient CAM Automated Sensor                               Temporal
2001  Regular Expression                                           Spatial
2002  Hyperspectral Image                                          Spatial
2002  Machine Vision                                               Spatial
2002  RC4                                                          Temporal
2002  Set Covering                                                 Spatial
2002  Template Matching                                            Spatial
2002  Triangle Mesh                                                Spatial
2003  Congruential Sieves                                          Temporal
2003  Content Scanning                                             Temporal
2003  F.P and Square Root                                          Spatial
2003  Gaussian Noise                                               Spatial
2003  TRNG D FDTD Method                                           Spatial
2004  Deep Packet Filter Online Floating Point Molecular Dynamics  Spatial
2004  Pattern Matching                                             Spatial
2004  Seismic Migration                                            Spatial
2004  Software Deceleration V.M Window Data Mining                 Spatial
2005  Cell Automata                                                Temporal
2005  Particle Graphics                                            Spatial
2005  Radiosity                                                    Temporal
2005  Transient Waves                                              Spatial
2005  Road Traffic                                                 Temporal
2006  All Pairs Shortest Path                                      Spatial
2006  Apriori Data Mining                                          Spatial
2006  Molecular Dynamics                                           Spatial
2006  Gaussian Elimination                                         Spatial
2006  Radiation Dose                                               Temporal
2006  Random Variates                                              Spatial

Slide 14: Portable Spatial Applications?
Current portable microprocessor binaries: sequential
Extensions for threads, processes, ...
How to support spatial constructs?
Ports, connections, timing model
SystemC:
Adds libraries and macros, still standard C++
Sequential and spatial constructs
Compiling links in the simulation kernel
Self-executing simulation
Intended for SoC simulation

Slide 15: Transmuting Coprocessors
Demo

Slide 16: FPGA is a Size-Limited Coprocessing Resource
FPGA implements coprocessors
Upload app profile info
Select coproc. set, generate new FPGA bitstream
Send back new bitstream, reprogram FPGA
Speedup with previous apps
App executions change. Must decide which coprocessors should be FPGA-resident at a given time: transmuting coprocessors

Slide 17: Transmuting Coprocessor Demo
Three image filters, each in a small and a large version (S/L):
Blur filter (S/L): blur the image
Sobel filter (S/L): find the edges in the image
Emboss filter (S/L): emboss the image
Platform: Virtex 2P (XC2VP30): PPC + coprocessors
PPC frequency: 100 MHz
Coproc. frequency: 50 MHz

                 Small   Large
Speedup          30x     120x
Size (slices):
  Blur           30      120
  Sobel
  Emboss         81      324

Slide 18: Demo architecture
Image (128*128 pixels, 24-bit color): 24 BRAMs
Soft version: Read (Image BRAM) -> Execution (PPC) -> Write (Display BRAM)
Coprocessor version: Read (Image BRAM) -> Execution (Coproc) -> Write (Display BRAM)
Dock: send the profile information through UART
[Figure: block diagram with PPC, peripherals, instruction BRAM, image BRAM, coprocessor, display BRAM, VGA control, VGA display, UART, push button, and PLB; regions labeled EDK and ISE; interface to external display]
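The figure of 24 BRAMs for the image can be checked with a short calculation. The sketch assumes each block RAM provides 16 Kbit of data storage (the 18 Kbit block RAMs of this device family when the parity bits are not used); that capacity is an assumption brought in for the check, not something stated on the slide.

```cpp
#include <cassert>

// Image parameters from the slide.
constexpr long kWidth = 128;
constexpr long kHeight = 128;
constexpr long kBitsPerPixel = 24;

// Assumption: 16 Kbit of usable data per block RAM (parity bits unused).
constexpr long kBramDataBits = 16L * 1024L;

constexpr long kImageBits = kWidth * kHeight * kBitsPerPixel;  // 393,216 bits

// Number of block RAMs needed to hold `bits`, rounding up.
long bramsNeeded(long bits) {
    assert(bits >= 0);
    return (bits + kBramDataBits - 1) / kBramDataBits;
}

static_assert(kImageBits / kBramDataBits == 24,
              "128*128 pixels at 24 bits fill exactly 24 block RAMs");
```

Under that assumption the image fills exactly 24 block RAMs, one per bit of color depth, which matches the slide.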

Slide 19: Coprocessor configurations
Choose the configuration according to app profile info:
Microprocessor only
Small blur + small sobel
Small blur + small emboss
Small sobel + small emboss
Large blur
Large sobel
Large emboss
[Figure: Virtex2P with PPC, peripherals, memory, and a coprocessor region holding one of: Blur (S) + Sobel (S), Blur (S) + Emboss (S), Sobel (S) + Emboss (S), Blur (L), Sobel (L), Emboss (L)]
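One way to choose among the seven configurations is to estimate total filtering time for each and take the smallest. The sketch below does this using the per-batch times reported on slide 20 (30 s on the PPC, 1 s on a small coprocessor, 0.25 s on a large one, per 128 frames). Applying those times equally to all three filters, and minimizing total time at all, are assumptions; the deck notes that different objectives and heuristics are possible. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum Impl { Soft, Small, Large };
enum Filter { Blur = 0, Sobel = 1, Emboss = 2 };

struct Config {
    std::string name;
    std::array<Impl, 3> impl;  // implementation per filter, indexed by Filter
};

// Assumed seconds per 128-frame batch, taken from slide 20 for every filter.
double secondsPerBatch(Impl impl) {
    switch (impl) {
        case Small: return 1.0;
        case Large: return 0.25;
        default:    return 30.0;  // software on the PPC
    }
}

// The seven configurations listed on the slide.
std::vector<Config> demoConfigs() {
    return {
        {"Microprocessor only",        {{Soft,  Soft,  Soft}}},
        {"Small blur + small sobel",   {{Small, Small, Soft}}},
        {"Small blur + small emboss",  {{Small, Soft,  Small}}},
        {"Small sobel + small emboss", {{Soft,  Small, Small}}},
        {"Large blur",                 {{Large, Soft,  Soft}}},
        {"Large sobel",                {{Soft,  Large, Soft}}},
        {"Large emboss",               {{Soft,  Soft,  Large}}},
    };
}

// Picks the configuration with the lowest estimated time for a profile
// given as batches of 128 frames per filter. First entry wins ties.
std::size_t chooseConfig(const std::vector<Config>& configs,
                         const std::array<double, 3>& batches) {
    assert(!configs.empty());
    std::size_t best = 0;
    double bestTime = -1.0;
    for (std::size_t c = 0; c < configs.size(); ++c) {
        double t = 0.0;
        for (std::size_t f = 0; f < 3; ++f) {
            t += batches[f] * secondsPerBatch(configs[c].impl[f]);
        }
        if (bestTime < 0.0 || t < bestTime) { bestTime = t; best = c; }
    }
    return best;
}
```

A profile dominated by one filter favors that filter's large coprocessor, while a profile split across two filters favors the matching pair of small coprocessors.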

Slide 20: Video demo program flow
Flow steps: execution; read profile info from UART; update profile information; dock; select new program file; reprogram FPGA
Different objectives and different heuristics.

Time information:
Step                                 Time
Dock + CP selection                  0.001 s
Start IMPACT + FPGA reprogramming    12 s
Filter, PPC only (128 frames)        30 s
Filter, CP small (128 frames)        1 s
Filter, CP large (128 frames)        0.25 s
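The timings above imply a break-even point: reprogramming costs about 12 s, so switching configurations only pays off if enough frames are processed afterward. The sketch below computes that point from the slide's numbers. Treating selection plus reprogramming as the whole switching cost, and the function name `breakEvenFrames`, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Overheads from the slide's timing table.
constexpr double kSelectSeconds = 0.001;    // dock + CP selection
constexpr double kReprogramSeconds = 12.0;  // start IMPACT + reprogram FPGA
constexpr double kFramesPerBatch = 128.0;   // frames behind each filter time

// Smallest number of frames after which switching from the old to the new
// implementation has repaid the selection and reprogramming overhead.
long breakEvenFrames(double oldSecondsPerBatch, double newSecondsPerBatch) {
    assert(oldSecondsPerBatch > newSecondsPerBatch);
    const double savedPerFrame =
        (oldSecondsPerBatch - newSecondsPerBatch) / kFramesPerBatch;
    const double overhead = kSelectSeconds + kReprogramSeconds;
    return static_cast<long>(std::ceil(overhead / savedPerFrame));
}
```

For example, `breakEvenFrames(30.0, 1.0)` gives 53 frames for moving a filter from the PPC to a small coprocessor, while `breakEvenFrames(1.0, 0.25)` gives 2049 frames for moving from a small to a large coprocessor, so the second switch is only worthwhile for long runs of the same filter.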