Comparative analysis of High Level Programming for Reconfigurable Computers: Methodology and Empirical Study. Wen-qian Wu

Presentation transcript:

Comparative analysis of High Level Programming for Reconfigurable Computers: Methodology and Empirical Study
Wen-qian Wu
EEL 6935

Contents
1. Reconfigurable Computing (RC) design flow
2. High level language (HLL) based design
3. Metrics of HLL and quantification
4. Experiment and result

RC design flow
• A Reconfigurable Computing (RC) system takes advantage of hardware (e.g. FPGA) acceleration
• Two main mechanisms for acceleration:
  - Loop unrolling and pipelining
• Not all computations are a good fit for FPGAs
  - Control-intensive applications
  - Floating-point operations
  - Data transfer overhead
• An RC system is naturally a hybrid system consisting of both software and hardware
  - SW is programmed in C/C++/Java, HW in VHDL/Verilog
  - HW and SW partitioning
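The interplay between unrolling, pipelining and data transfer overhead can be illustrated with a rough cycle-count model. This is a minimal sketch under textbook assumptions (fixed loop-body latency, no stalls); the function names, parameters and numbers are hypothetical and do not come from the presentation.

```python
import math

def loop_cycles(n_iters, depth, unroll=1, pipelined=False, ii=1):
    """Estimate the cycles needed to run a loop in hardware (illustrative model).

    n_iters   : number of loop iterations in the original code
    depth     : latency of one loop body, in cycles
    unroll    : how many iterations are replicated side by side in hardware
    pipelined : if True, a new (unrolled) iteration starts every `ii` cycles
    """
    trips = math.ceil(n_iters / unroll)   # iterations left after unrolling
    if trips == 0:
        return 0
    if pipelined:
        return depth + (trips - 1) * ii   # fill the pipeline once, then stream
    return trips * depth                  # each trip waits for the previous one

def end_to_end_speedup(sw_seconds, hw_cycles, clock_hz, transfer_seconds):
    """Speedup over software once data transfer time is added to the FPGA time."""
    hw_seconds = hw_cycles / clock_hz + transfer_seconds
    return sw_seconds / hw_seconds

# Hypothetical numbers: 1000 iterations, 10-cycle body, 100 MHz FPGA clock.
cycles = loop_cycles(1000, 10, unroll=4, pipelined=True)
print(cycles)
print(end_to_end_speedup(1.0, cycles, 100e6, transfer_seconds=0.5))
```

In this model the transfer term quickly dominates: however fast the kernel becomes, the speedup cannot exceed the software time divided by the transfer time.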

HDL or C
• HDL is not so friendly
  - Ratio of VHDL to C developers 1:
  - Function is dependent on architecture, not portable
  - Tedious, slow development cycle…
• Most application developers are willing to give up some performance and chip utilization in exchange for productivity
• Hence the rise of high-level design tools!

High Level design flow
• Describing HW using HLLs is possible, and has been tried in several commercial products, such as Xilinx Forge, Celoxica Handel-C, Impulse-C, Mitrion-C, DSPLogic, etc.
[Figure: high-level design flow. Labels: C/C++, Java, Simulink, etc.; High-level Synthesis; HDL; RT Synthesis; Netlist; Physical Design (Technology Mapping, Placement, Routing); Bitfile; Processor; FPGA]

High Level Language (HLL)
• Benefits of HLL
  - Allow mathematicians and computer scientists to develop entire applications without relying on hardware designers
  - Substantially increase the productivity of the design process
• Challenges of HLL
  - Compiling a high-level description to HDL
  - Mutual synchronization and data transfer between microprocessor and FPGA
  - Facilitating HW and SW co-simulation and code reuse
• This paper reviews 3 HLLs
  - Impulse-C (C-based)
  - Mitrion-C (C-based)
  - DSPLogic (graphical-based)

High Level Language (HLL) – cont.
• Why these 3 HLLs were chosen for study
  1) Each is a fully developed HLL tool with support for a variety of platforms.
  2) Each aims to reach a broader audience of potential RC users.
  3) Each has a distinct vision of how to realize its goal.
     Impulse-C: imperative programming
     Mitrion-C: functional programming
     DSPLogic: schematic programming

Impulse C
• Translates C code to HDL
• A subset of ANSI C
• Pragmas: pipelining and scheduling features may be controlled at the level of the C source code through predefined pragmas, such as CO PIPELINE and CO UNROLL.

Impulse C
• Built-in communication
  - Separate processes are created for the SW (CPU) and HW (FPGA)
  - Communication via co_stream
  - Synchronization via co_signal
• HW/SW co-simulation
  - Test vectors are generated automatically during C-level simulation.
  - The test vectors are used to write the testbench for the HDL.
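The process-and-stream model described above can be mimicked in ordinary software. The sketch below is only an analogy in Python, using threads and bounded queues; it does not use the Impulse C API, and the names `producer`, `hw_process`, `run` and `END` are hypothetical.

```python
import queue
import threading

END = object()  # sentinel standing in for "stream closed"

def producer(out_stream, data):
    """Software-side process: writes every input item to the stream."""
    for x in data:
        out_stream.put(x)
    out_stream.put(END)

def hw_process(in_stream, out_stream):
    """Stand-in for the hardware process: reads, transforms, writes back."""
    while True:
        x = in_stream.get()
        if x is END:
            out_stream.put(END)   # propagate end-of-stream downstream
            break
        out_stream.put(x * 2)     # placeholder computation

def run(data):
    """Wire the two processes together with bounded streams and collect results."""
    to_hw = queue.Queue(maxsize=4)
    from_hw = queue.Queue(maxsize=4)
    threads = [
        threading.Thread(target=producer, args=(to_hw, data)),
        threading.Thread(target=hw_process, args=(to_hw, from_hw)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        y = from_hw.get()
        if y is END:
            break
        results.append(y)
    for t in threads:
        t.join()
    return results

print(run([1, 2, 3]))
```

The bounded queues play the role of fixed-depth stream buffers: a writer blocks when the buffer is full, which is how the two sides stay in step without sharing any other state.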

Mitrion C
• The Mitrion-C programming language is implicitly parallel, with syntax similar to C
  - Focuses on parallelism and data dependencies.
  - Any operation may be executed as soon as its data dependencies are fulfilled.
• Software written in the Mitrion-C language is compiled into a configuration of the Mitrion Virtual Processor (MVP)
  - The MVP is a fine-grain, massively parallel, reconfigurable soft processor.
  - The MVP configuration is downloaded to the FPGA.
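The rule that an operation may run as soon as its data dependencies are fulfilled can be sketched as a simple scheduler. This is an illustrative Python model of dataflow execution in general, not of the Mitrion-C compiler or the MVP; the function name `dataflow_schedule` and the example graph are hypothetical.

```python
def dataflow_schedule(deps):
    """Group operations into steps; all operations in a step can run in parallel.

    deps maps each operation name to the set of operations whose results it needs.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    steps = []
    while remaining:
        # An operation is ready once every one of its inputs has been produced.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfied dependency")
        steps.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return steps

# Hypothetical graph: c needs a and b, d needs a, e needs c and d.
graph = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"a"}, "e": {"c", "d"}}
print(dataflow_schedule(graph))
```

Nothing in the input states an execution order; the parallelism falls out of the dependency sets alone, which is the sense in which such a language is implicitly parallel.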

Mitrion C
• Mitrion-C programming flow:

DSPLogic
• DSPLogic provides the RC Toolbox
  - Offers a graphical (as well as text-based) programming environment
  - Based on the Xilinx System Generator for DSP package
  - Enables the use of the MATLAB/Simulink package
  - Blocks from both the RC Toolbox and Xilinx System Generator are used to create a data flow graph.

HLL Metrics
• How to evaluate an HLL?
  - Metrics: ease-of-use, efficiency
• Ease-of-use
  - Measured as total acquisition time + total development time
  - Acquisition time depends on the type of paradigm being adopted
  - Development time depends on both the paradigm and the application being developed
  - Also affected by the explicitness of the programming model: the more explicit the model, the more architectural details must be handled, and the longer both acquisition and development take
  - The user's/developer's previous experience also needs to be considered

HLL Metrics
• Efficiency
  - Ability to extract the maximum possible parallelism/performance at the lowest cost
  - Measured in terms of end-to-end throughput, maximum achievable frequency and resource usage
  - Again, the user's/developer's experience should be taken into consideration

Quantification
• Ease-of-use
  - The average ease-of-use factor is defined in terms of:
    the average acquisition time of a reference language (usually VHDL)
    the difference in acquisition time between the reference language and a given HLL
  - Assumption: the number of users included in the experiment, the number of applications considered, the number of languages for each paradigm and the number of platforms used as testbeds should be as large as possible to achieve accurate results.

Quantification
• Efficiency
  - Average efficiency factor of language x relative to a reference language, e.g. VHDL
• The average ease-of-use factor ranges from 0 to 1
  - 0 represents the language that is most difficult to use, like VHDL
  - 1 represents the easiest-to-use language
• Similarly, the average efficiency factor ranges from 0 to 1
  - 0 represents the least efficient language
  - 1 represents the most efficient language, like VHDL
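The transcript does not preserve the factor equations, so the following is only a sketch of one normalization that is consistent with the stated 0-to-1 ranges and endpoints. Both formulas, the function names and the sample numbers are assumptions for illustration, not the paper's definitions.

```python
def ease_of_use_factor(t_ref, t_x):
    """Assumed form: time saved relative to the reference language, clamped to [0, 1].

    t_ref : time (acquisition or development) for the reference language, e.g. VHDL
    t_x   : the same time measured for language x
    """
    if t_ref <= 0:
        raise ValueError("reference time must be positive")
    return min(1.0, max(0.0, (t_ref - t_x) / t_ref))

def efficiency_factor(perf_ref, perf_x):
    """Assumed form: performance of x as a fraction of the reference, clamped to [0, 1].

    perf_ref : a performance figure (e.g. throughput) for the reference design
    perf_x   : the same figure for the design produced from language x
    """
    if perf_ref <= 0:
        raise ValueError("reference performance must be positive")
    return min(1.0, max(0.0, perf_x / perf_ref))

def average(values):
    """Plain arithmetic mean over users, applications or platforms."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical measurements for one HLL across three applications.
ease = average(ease_of_use_factor(40.0, t) for t in (10.0, 20.0, 30.0))
eff = average(efficiency_factor(100.0, p) for p in (80.0, 60.0, 100.0))
print(ease, eff)
```

Under these assumed forms the reference language scores 0 for ease-of-use and 1 for efficiency, matching the endpoints listed on the slide.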

Applications
• 4 workloads were selected for implementation on the Cray XD1
  - Pass-through: reads input data from the microprocessor (uP) and sends it back. Purpose: measure the overhead of each HLL using the simplest possible application
  - Discrete wavelet transform (DWT)
  - Data encryption standard (DES) algorithm
  - DES breaking

Cray XD1 platform
• General structure of the XD1:
  - One chassis houses 6 compute cards
  - Cards are connected via the RapidArray switch fabric
  - 2 AMD Opteron uPs, 2 RapidArray processors and 1 FPGA accelerator on each compute card
[Figure: XD1 development flow]

Result 1
• Ease-of-use in terms of acquisition time

Result 2
• Efficiency and development time

Result 3
• Tools: efficiency vs. ease-of-use

Result 3 – cont.
• Tools: efficiency vs. ease-of-use

Conclusions
• A formal methodology and framework is established to evaluate different high-level programming paradigms for RC applications
• The metrics devised were ease-of-use, and efficiency of generating hardware, characterized by high throughput at the lowest cost in resource usage
• Encouraging:
  - Preliminary results achieved by HLLs are close to manual HDL
  - SW-to-HW porting becomes considerably easier and more efficient with every new release
• Lessons learned:
  - Optimizing hardware is not the same as optimizing software
  - Some hardware knowledge is still needed for performance optimization, for creating a new platform wrapper, etc.
  - Debugging hardware is another major problem. Tracing is difficult since the internal VHDL signals are unknown.
