Development Tools Beyond HDL

Presentation transcript:

Development Tools Beyond HDL
Wolfgang Kühn, Univ. Giessen

Overview
  - Introduction
  - FPGA Design Challenges
  - VHDL
  - Tools with higher abstraction level
  - Handel-C
      - Features
      - Differences to VHDL
      - Advantages
      - Drawbacks
      - Possible applications

Common Prejudice ...
FPGAs are good for applications which
  - involve simple algorithms which can be executed in parallel
  - require high-speed (few-ns level) response to real-time events
  - do not need frequent redesigns (expert knowledge required!)
DSPs are good for applications which
  - involve complex algorithms with many arithmetic operations
  - are less demanding in real-time requirements
  - require programming in C / C++, because sometimes even a physicist needs to change part of the code

FPGA / DSP Performance (3/2003)
Example: XILINX Virtex-II and Virtex-II Pro

Function                              Industry's Fastest       Xilinx Virtex-II       Xilinx Virtex-II Pro
                                      DSP Processor Core
8 x 8 Multiply Accumulate (MAC)       4.8 billion MAC/s        0.5 Tera MAC/s         1 Tera MAC/s
FIR Filter, 256 taps, linear phase,   9.3 MSPS at 600 MHz      180 MSPS at 180 MHz    300 MSPS at 300 MHz
  16-bit data/coefficients
Complex FFT, 1024 point,              10.2 µs                  1 µs* at 140 MHz       1 µs** at 150 MHz
  16-bit data

*  Using 96 embedded multipliers in the largest Virtex-II device (XC2V8000)
** Using 96 embedded multipliers and 176 block RAMs in the Virtex-II Pro (XC2VP125)

Notes:

DSP processor: TI C64x.
  - 8x8 MAC: the TMS320C64x has four 16x16 MACs running at 600 MHz in parallel. Each can be used as two 8x8 MACs, so 8 x 600 MHz = 4.8 billion MAC/s.

XC2V8000 (Virtex-II):
  - 8x8 MAC: the 2V8000 has 46,592 slices; therefore we can fit 46,592 / 4,800 = 9.7 256-tap FIR filters. Rounding to 10 256-tap FIR filters running at 180 MHz gives 10 x 256 x 180 MHz = 461 BMAC/s. Add to this 168 embedded multipliers running at ~180 MHz (for an 8x8): 168 x 180 MHz = 30 BMAC/s, for a grand total of 461 + 30 = 491 BMAC/s, almost 0.5 Tera MAC/s.
  - 256-tap FIR filter: based on the DA (Distributed Arithmetic) FIR filter implementation with coefficients generated by MATLAB for a low-pass filter, a 256-tap, 8-bit data, symmetric filter uses 4,800 slices. Such a filter can run at 180 MHz in parallel, and hence reaches 180 MSPS.
  - 1024-point FFT: the latest 1024-point FFT core has an execution time of 7.3 µs at fclk = 140 MHz and requires 12 multipliers, ~2,500 slices and 12 block memories. Including eight such FFTs in one FPGA allows a new transform to be completed every microsecond.

XC2VP125 (Virtex-II Pro):
The 2VP125 has 62,568 slices and 556 embedded multipliers. There are two ways to implement a FIR filter. The MAC approach requires the multiplier and the accumulator. The Distributed Arithmetic approach uses slices exclusively.
  - 8x8 MACs using embedded multipliers: 556. If we add a 16-bit accumulator to each, this also requires 556 x 8 = 4,448 slices. 556 multipliers running at 300 MHz result in 556 x 300 MHz = 167 BMAC/s.
  - 8x8 MACs implemented in slices: one 8-bit data, 8-bit coefficient, symmetric, 256-tap FIR filter consumes 4,800 slices when implemented in slices. We can therefore fit (62,568 - 4,448) / 4,800 = 12.1 256-tap FIR filters. Rounding to 12 256-tap FIR filters running at 260 MHz gives 12 x 256 x 260 MHz = 798 BMAC/s, for a grand total of 167 + 798 = 965 BMAC/s, almost 1 Tera MAC/s.
  - 256-tap FIR filter: an implementation using 256 embedded multipliers in parallel (or 128 embedded multipliers if we take advantage of coefficient symmetry), along with fully pipelined adders, can be clocked at 300 MHz. As a new sample can be sent on each clock cycle, this results in a 300 MSPS FIR filter.
  - 1024-point FFT: the latest 1024-point FFT core has an execution time of 7.3 µs at fclk = 150 MHz and requires 12 multipliers, ~2,500 slices and 22 block memories. Including eight such FFTs in one FPGA allows a new transform to be completed every microsecond.
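The throughput estimates in the notes above are plain arithmetic, so they can be re-derived in a few lines. The following Python sketch is only a check of the quoted figures; the variable and function names are illustrative and do not come from the slides. Two details become visible this way: the notes round 9.7 filters up to 10 (ten filters would need 48,000 slices, slightly more than the 46,592 available), and the Virtex-II Pro total comes out at 965.52 BMAC/s unrounded, which the notes report as 965.

```python
"""Re-derive the MAC and FFT throughput estimates quoted in the notes."""

TAPS = 256               # taps per FIR filter (from the notes)
SLICES_PER_FIR = 4_800   # slices for one 256-tap DA FIR filter (from the notes)


def bmacs(units: int, macs_per_unit: int, clock_mhz: float) -> float:
    """Billions of MAC/s delivered by `units` blocks at `clock_mhz`."""
    return units * macs_per_unit * clock_mhz / 1_000


# TI C64x: four 16x16 MACs, each usable as two 8x8 MACs, at 600 MHz.
c64x = bmacs(4, 2, 600)

# XC2V8000: slice-based FIR filters plus 168 embedded multipliers at 180 MHz.
v2_fit = 46_592 / SLICES_PER_FIR          # filters that fit (the notes round up to 10)
v2_fir = bmacs(10, TAPS, 180)
v2_mult = bmacs(168, 1, 180)
v2_total = v2_fir + v2_mult

# XC2VP125: 556 embedded multipliers at 300 MHz, remaining slices used for FIRs.
acc_slices = 556 * 8                      # slices for the 16-bit accumulators
v2p_mult = bmacs(556, 1, 300)
v2p_fit = (62_568 - acc_slices) / SLICES_PER_FIR
v2p_fir = bmacs(12, TAPS, 260)
v2p_total = v2p_mult + v2p_fir

# Eight FFT cores of 7.3 us each, started in a staggered fashion.
fft_interval_us = 7.3 / 8

print(f"C64x:          {c64x:.1f} BMAC/s")
print(f"Virtex-II:     {v2_fit:.2f} filters fit, total {v2_total:.2f} BMAC/s")
print(f"Virtex-II Pro: {v2p_fit:.2f} filters fit, total {v2p_total:.2f} BMAC/s")
print(f"FFT: one result every {fft_interval_us:.4f} us")
```

The helper `bmacs` makes the unit conversion explicit: MHz times the number of MAC units gives millions of MAC/s, and dividing by 1,000 gives the BMAC/s figures used in the notes.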


Typical FPGA Design Flow
Stages shown in the flow diagram:
  - Plan & Budget
  - Create Code / Schematic
  - HDL RTL Simulation
  - Synthesize to create netlist
  - Functional Simulation
  - Implement: Translate, Map, Place & Route
  - Attain Timing Closure
  - Timing Simulation
  - Create Bit File

These are the major stages of implementing a design in a Xilinx device. The implementation stage consists of three steps (Translate, Map, Place & Route), which will be discussed later in this presentation. Although simulation can happen in other parts of the design cycle, the three simulation points in the diagram are the ones Xilinx recommends. More details on timing closure follow in a coming slide. For more detailed flow diagrams, refer to Chapter 2 (Design Flow) of the Development System Reference Guide at support.xilinx.com > Software Manuals.

XILINX Tools for Digital Signal Processing
Blocks shown in the diagram:
  - DSP modeling: MATLAB®, Simulink®
  - Automatic translation, generating VHDL/Verilog and IP cores
  - Synthesis: XST®, Leonardo Spectrum®, Synplify®
  - Implementation & verification: ISE® 4.1i

Xilinx offers the most advanced tool suite for doing DSP design on FPGAs. For the front end, customers can use popular industry standards for developing DSP models and algorithms. From then on, Xilinx offers a complete front-to-back DSP design flow for FPGAs, which includes System Generator v2.1, XST for synthesis and the industry's best implementation tool suite, ISE 4.2i. Simulink DSP systems modeled with System Generator can be translated into VHDL or Verilog for synthesis. Synthesis can be performed using third-party tools from companies like Exemplar or Synplicity, or using XST, which is integrated into ISE. The benefit of this flow is that professors, researchers and students who are not familiar with FPGAs can still use tools they know (e.g. MATLAB and Simulink from The MathWorks) and let the Xilinx tools do the rest.

MATLAB
  - MATLAB™, the most popular system design tool, is a programming language, interpreter, and modeling environment
  - Extensive libraries for math functions, signal processing, DSP, communications, and much more
  - Visualization: large array of functions to plot and visualize your data and system/design
  - Open architecture: software model based on a base system and domain-specific plug-ins

The MathWorks has been developing system design tools since 1984. Its latest product is MATLAB 6.5 (from MATLAB tools release 13, July 2002). Visit The MathWorks website at http://www.mathworks.com for further details.

Other vendors of system-level modeling packages are:
  - Visual data flow: SPW (Cadence), COSSAP (Synopsys), Ptolemy (UC Berkeley), SystemView (Elanix)
  - Programming language based:
      - C++: SystemC, OCAPI (IMEC)
      - C: Streams-C (Gokhale et al.), Handel-C (Celoxica)
      - Java: JHDL (BYU)

The relative success of each approach, at least to date, is indicated by the predominance of commercial offerings in VDF (Visual Data Flow) as compared to research activity.

Simulink
Simulink™: visual data flow environment for modeling and simulation of dynamical systems
  - Fully integrated with the MATLAB engine
  - Graphical block editor
  - Event-driven simulator
  - Models parallelism
  - Extensive library of parameterizable functions
      - Simulink Blockset: math, sinks, sources
      - DSP Blockset: filters, transforms, etc.
      - Communications Blockset: modulation, etc.

Simulink, The MathWorks' visual data flow tool, presents an alternative to using programming languages for system design. It lets designers visualize the dynamic nature of their system while describing the complete system in a way that stays close to the hardware design. Most hardware design starts out with a block diagram description and specification of the system, very similar to a Simulink design. The main part of Simulink is the Library Browser, which contains all the building blocks available to the user. This library is expandable and each block is parameterizable. Users can even create their own libraries from functions they have written.

An important point about Simulink is that it can model concurrency in a system. Unlike sequential software code, a Simulink model can be seen to execute sections of a design at the same time (in parallel). This notion is fundamental to a high-performance hardware implementation.

Traditional Simulink FPGA Flow
Blocks shown in the diagram, with a GAP between the two development paths:
  - System Architect: Simulink, System Verification
  - FPGA Designer: HDL, Synthesis, Functional Simulation, Implementation, Timing Simulation, Download, In-Circuit Verification
  - Verify Equivalence (between the two paths)

In the past, a DSP designer who wanted to target an FPGA had no option but a "dual path" of development. The DSP designer writes an algorithm in pseudo-C, using filters, certain C code, certain precision. He may know everything about DSP and Simulink models, but may not know anything about FPGAs. Not only does he not know how to target an FPGA, he also doesn't know how to take advantage of the FPGA architecture, or how to write a design that avoids a bad FPGA implementation. When he is done with his DSP design, he may have a working model in Simulink, but he must design the same thing again in VHDL, or hand his design to an FPGA implementer (who may know nothing about DSP) who writes the VHDL for him. The implementer might end up using a core that doesn't do exactly what the designer wants; not being a DSP expert, the FPGA implementer is just trying to translate the pseudo code he received into VHDL for an FPGA. There is also no way to co-simulate: one is simulating in C in MATLAB, the other in VHDL in a behavioral simulation. Only when they get into the lab and simulate the board, late in the process, do they find out that something is wrong.

XILINX System Generator
Blocks shown in the diagram:
  - MATLAB/Simulink: System Verification
  - System Generator, producing: VHDL, IP, Testbench, Constraints File
  - HDL: Synthesis, Functional Simulation, Implementation, Timing Simulation, Download, In-Circuit Verification

Now, with System Generator, our DSP designer has a single development path: there is no need for a parallel development effort, and no possibility of two different results. Currently, only XST from Xilinx, Synplify (from Synplicity) and Leonardo Spectrum (from Exemplar) support the VHDL code generated by System Generator. There is no schedule for FPGA Express to support the VHDL code.

Handel-C ( http://www.celoxica.com )
  - Handel-C is a language for programming applications
      - Handel-C is not an HDL. It is not C used as an HDL
      - Handel-C is meaningful to both s/w and h/w engineers
  - Focus on describing solutions to problems as algorithms
      - VHDL/Verilog focus on describing the structure of a system capable of performing an algorithm
  - Hardware design means controlling space (parallelism) and time (sequential processing)
      - The par command gives control over space
      - The single clock assignment rule gives control over time

Handel-C Core Language Features
  - Standard ISO-C (ANSI-C)
      - Control commands: if, while, switch etc.
      - Functions, structures, pointers
  - Extensions for hardware implementation
      - par{…} construct: specifies spatial-parallel architecture
      - Single cycle assignment: specifies temporal architecture
      - Arbitrary widths on variables, expressions etc.
      - Type-checked bit-width inference system
      - Recursive macro expansion system
      - Multiple clock domains with automatic metastability resolution
      - Powerful bit manipulation operators
      - Signals, channels, interfaces to pins, external IP cores
      - RAMs/ROMs and external pin connections

Timing is predictable
  - Designer has control over timing
  - Simple model: assignments take one clock cycle
  - Cycle-accurate, fast simulator
Parallelism is deterministic
  - Language extensions include parallel processing and communications between parallel elements
  - Parallelism based on sound mathematical formalism
Changes are predictable
  - Changes in Handel-C code produce predictable changes in hardware
  - Enables fast iterative refinement
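The timing model described above (every assignment takes one clock cycle, and par runs its branches side by side) can be illustrated with a small cycle counter. The Python sketch below is not Handel-C and not part of the DK1 tools; the class names `Assign`, `Seq` and `Par` and the function `cycles` are hypothetical, introduced only for this illustration. It assumes that a sequence costs the sum of its statements and that a parallel block finishes when its longest branch finishes, and it ignores everything else in the language (channels, delays, loops, multiple clock domains).

```python
"""Toy cycle-count model of the 'one assignment = one clock cycle' rule."""

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Assign:
    """A single assignment; costs exactly one clock cycle in this model."""
    target: str


@dataclass
class Seq:
    """Statements executed one after another (control over time)."""
    body: List["Node"] = field(default_factory=list)


@dataclass
class Par:
    """Branches executed side by side in hardware (control over space)."""
    branches: List["Node"] = field(default_factory=list)


Node = Union[Assign, Seq, Par]


def cycles(node: Node) -> int:
    """Return the number of clock cycles the statement takes in this model."""
    if isinstance(node, Assign):
        return 1
    if isinstance(node, Seq):
        return sum(cycles(child) for child in node.body)
    if isinstance(node, Par):
        # A parallel block is done when its slowest branch is done.
        return max((cycles(branch) for branch in node.branches), default=0)
    raise TypeError(f"unsupported node: {node!r}")


# Three assignments in sequence, in parallel, and in a mixed arrangement.
sequential = Seq([Assign("a"), Assign("b"), Assign("c")])
parallel = Par([Assign("a"), Assign("b"), Assign("c")])
mixed = Par([Seq([Assign("a"), Assign("b")]), Assign("c")])

print(cycles(sequential), cycles(parallel), cycles(mixed))
```

Under these assumptions the designer can read the cycle count directly off the source structure, which is the sense in which the slide calls timing predictable: moving assignments into a parallel block trades hardware area for fewer clock cycles.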

Hardware/Software Co-Design
  - Enables development of complete systems, ideal for:
      - Board-level prototyping
      - Reconfigurable SoC designs
      - Hybrid CPU & FPGA devices
  - Design kit (DK1) facilitates co-design with:
      - Instruction set simulators
      - VHDL simulators
      - External C test benches
  - Enables hardware/software partitioning decisions later in the design cycle
  - Rapid conversion of software algorithms into custom hardware

DK1 Design Suite Features
  - Compiler output is:
      - Optimised
      - Deterministic
      - Target specific
  - Targets Xilinx and Altera netlists directly (EDIF)
  - RTL VHDL output
  - Generation of IP cores (Handel-C, EDIF, VHDL)
  - Inclusion of IP cores as 'black boxes'
  - GUI for integrated project management, code editing and source-level debugging

Steps shown in the flow diagram: Handel-C; Simulate (fast simulation/debug); Compile; Netlist; Place and Route (FPGA vendor's tools); Configure.

Conclusions
  - Exploiting the power of modern FPGAs gets increasingly difficult using only "traditional" HDL design methods
      - A 1 million gate XILINX Spartan III costs only $12!
  - New areas of application beyond traditional FPGA domains require higher levels of abstraction
  - Tools such as Handel-C look promising
  - Experience with real designs is needed