1
Development Tools Beyond HDL
Wolfgang Kühn, Univ. Giessen
2
Overview
- Introduction
- FPGA design challenges
- VHDL
- Tools with higher abstraction level
- Handel-C: features, differences to VHDL, advantages, drawbacks
- Possible applications
3
Common Prejudice …
FPGAs are good for applications which
- involve simple algorithms which can be executed in parallel
- require high-speed (few-ns level) response to real-time events
- do not need frequent redesigns (expert knowledge required!)
DSPs are good for applications which
- involve complex algorithms with many arithmetic operations
- are less demanding in their real-time requirements
- require programming in C/C++, because sometimes even a physicist needs to change part of the code
4
FPGA / DSP Performance (3/2003)
Example: Xilinx Virtex-II and Virtex-II Pro

Function | Industry's fastest DSP processor core (TI C64x) | Xilinx Virtex-II | Virtex-II Pro
8 x 8 multiply-accumulate (MAC) | 4.8 billion MAC/s | 0.5 Tera MAC/s | 1 Tera MAC/s
FIR filter: 256 taps, linear phase, 16-bit data/coefficients | 9.3 MSPS @ 600 MHz | 180 MSPS @ 180 MHz | 300 MSPS @ 300 MHz
Complex FFT: 1024-point, 16-bit data | 10.2 µs | 1 µs* @ 140 MHz | 1 µs** @ 150 MHz

* Using 96 embedded multipliers in the largest Virtex-II device (XC2V8000)
** Using 96 embedded multipliers and 176 block RAMs in the Virtex-II Pro (XC2VP125)

Notes:
DSP processor: TI C64x.
8x8 MAC: the TMS320C64x has four 16x16 MACs running in parallel at 600 MHz. Each can be used as two 8x8 MACs, giving 8 x 600 MHz = 4.8 billion MAC/s.

XC2V8000 (Virtex-II): the 2V8000 has 46,592 slices, so we can fit 46,592/4,800 ≈ 9.7 256-tap FIR filters. Rounding to ten 256-tap FIR filters running at 180 MHz gives 10 x 256 x 180 MHz = 461 BMAC/s. Adding the 168 embedded multipliers running at ~180 MHz (for an 8x8) contributes 168 x 180 MHz = 30 BMAC/s, for a grand total of 461 + 30 = 491 BMAC/s, almost 0.5 Tera MAC/s.
256-tap FIR filter: based on the distributed-arithmetic (DA) FIR filter implementation, with coefficients generated by MATLAB for a low-pass filter, a 256-tap, 8-bit-data, symmetric filter uses 4,800 slices. Such a filter can run in parallel at 180 MHz and hence reaches 180 MSPS.
1024-point FFT: the latest 1024-point FFT core has an execution time of 7.3 µs at 140 MHz and requires 12 multipliers, ~2,500 slices and 12 block memories. Including eight such FFTs in one FPGA allows a new transform to be completed every microsecond.

XC2VP125 (Virtex-II Pro): the 2VP125 has 62,568 slices and 556 embedded multipliers. There are two ways to implement a FIR filter: the MAC approach requires a multiplier and an accumulator, while the distributed-arithmetic approach uses slices exclusively.
- 8x8 MACs using embedded multipliers: 556. Adding a 16-bit accumulator to each requires 556 x 8 = 4,448 slices. 556 multipliers running at 300 MHz give 556 x 300 MHz = 167 BMAC/s.
- 8x8 MACs implemented in slices: one 8-bit-data, 8-bit-coefficient, symmetric, 256-tap FIR filter consumes 4,800 slices. We can therefore fit (62,568 - 4,448)/4,800 ≈ 12.1 256-tap FIR filters. Rounding to twelve 256-tap FIR filters running at 260 MHz gives 12 x 256 x 260 MHz = 798 BMAC/s, for a grand total of 167 + 798 = 965 BMAC/s, almost 1 Tera MAC/s.
256-tap FIR filter: implemented with 256 embedded multipliers in parallel (or 128 if coefficient symmetry is exploited) along with fully pipelined adders, the filter can be clocked at 300 MHz. As a new sample can be accepted on each clock cycle, this results in a 300 MSPS FIR filter.
1024-point FFT: the latest 1024-point FFT core has an execution time of 7.3 µs at 150 MHz and requires 12 multipliers, ~2,500 slices and 22 block memories. Including eight such FFTs in one FPGA allows a new transform to be completed every microsecond.
5
6
Typical FPGA Design Flow
Plan & Budget → Create Code/Schematic (HDL) → RTL Simulation → Synthesize to create netlist → Functional Simulation → Implement (Translate, Map, Place & Route) → Attain Timing Closure → Timing Simulation → Create Bit File
These are the major stages of implementing a design in a Xilinx device. The implementation stage consists of three steps: Translate, Map, and Place & Route. Although simulation can happen at other points in the design cycle, the three simulation points in the diagram (RTL, functional, and timing simulation) are the ones Xilinx recommends. For more detailed flow diagrams, refer to Chapter 2 (Design Flow) of the Development System Reference Guide at support.xilinx.com > Software Manuals.
7
XILINX Tools for Digital Signal Processing
Simulink® / MATLAB®: DSP modeling
Automatic translation: generate VHDL/Verilog and IP cores
Synthesis: XST®, Leonardo Spectrum®, Synplify®
ISE® 4.1i: implementation & verification
Xilinx offers the most advanced tool suite for doing DSP design on FPGAs. For the front end, customers can use popular industry standards for developing DSP models and algorithms. From there on, Xilinx offers a complete front-to-back DSP design flow for FPGAs, which includes System Generator v2.1, XST for synthesis, and the industry's best implementation tool suite, ISE 4.2i. Simulink DSP systems modeled with System Generator can be translated into VHDL or Verilog for synthesis. Synthesis can be performed using third-party tools from companies like Exemplar or Synplicity, or using XST, which is integrated into ISE. The benefit of this flow is that professors, researchers, and students who are not familiar with FPGAs can still use tools they already know (e.g. MATLAB and Simulink from The MathWorks) and let the Xilinx tools do the rest.
8
MATLAB
- MATLAB™, the most popular system design tool, is a programming language, interpreter, and modeling environment
- Extensive libraries for math functions, signal processing, DSP, communications, and much more
- Visualization: a large array of functions to plot and visualize your data and system/design
- Open architecture: software model based on a base system and domain-specific plug-ins
The MathWorks has been developing system design tools since […]. Its latest product is MATLAB 6.5 (MATLAB tools Release 13, July 2002). Visit The MathWorks website at […] for further details.
Other vendors of system-level modeling packages are:
- Visual data flow: SPW (Cadence), COSSAP (Synopsys), Ptolemy (UC Berkeley), SystemView (Elanix)
- Programming-language based:
  - C++: SystemC, OCAPI (IMEC)
  - C: Streams-C (Gokhale et al.), Handel-C (Celoxica)
  - Java: JHDL (BYU)
The relative success of each approach, at least to date, is indicated by the predominance of commercial offerings in VDF (visual data flow) compared to research activity.
9
Simulink
- Simulink™: visual data flow environment for modeling and simulation of dynamical systems
- Fully integrated with the MATLAB engine
- Graphical block editor
- Event-driven simulator
- Models parallelism
- Extensive library of parameterizable functions
  - Simulink Blockset: math, sinks, sources
  - DSP Blockset: filters, transforms, etc.
  - Communications Blockset: modulation, etc.
Simulink, The MathWorks' visual data flow tool, presents an alternative to programming languages for system design. It enables designers to visualize the dynamic nature of their system while illustrating the complete system in a realistic fashion with respect to the hardware design. Most hardware design starts out with a block diagram description and specification of the system, very similar to a Simulink design.
The main part of Simulink is the Library Browser, which contains all the building blocks available to the user. This library is expandable and each block is parameterizable; users can even create libraries of functions they have written themselves.
An important point about Simulink is that it can model concurrency in a system. Unlike sequential software code, a Simulink model can be seen to execute sections of a design at the same time (in parallel). This notion is fundamental to a high-performance hardware implementation.
10
Traditional Simulink FPGA Flow
System architect: Simulink → System Verification
GAP
FPGA designer: HDL → Synthesis → Functional Simulation → Verify Equivalence → Implementation → Timing Simulation → Download → In-Circuit Verification
In the past, if a DSP designer wanted to target an FPGA, he had no option but a "dual path" of development. The DSP designer writes an algorithm in pseudo-C, using filters, certain C code, and a certain precision. He may know everything about DSP and Simulink models but nothing about FPGAs: not only does he not know how to target an FPGA, he doesn't know how to take advantage of the FPGA architecture or how to write a design that avoids a bad FPGA implementation. When he's done with his DSP design, he may have a working model in Simulink, but he must either design the same thing in VHDL himself or hand his design to an FPGA implementer (who may know nothing about DSP) who writes the VHDL for him. The implementer might end up using a core that doesn't do exactly what the designer wants; not being a DSP expert, the FPGA implementer is just trying to translate the pseudo-code he received into VHDL for an FPGA. There is also no way to co-simulate: one is simulating C in MATLAB, the other VHDL in a behavioral simulation. Only when they get into the lab and test the board, late in the process, do they find out that something is wrong.
11
XILINX System Generator
MATLAB/Simulink (System Verification) → System Generator → VHDL, IP, testbench, constraints file (HDL) → Synthesis → Functional Simulation → Implementation → Timing Simulation → Download → In-Circuit Verification
Now, with System Generator, our DSP designer has a single development path: there is no need for a parallel development effort and no possibility of two different results. Currently, only XST from Xilinx, Synplify (from Synplicity) and Leonardo Spectrum (from Exemplar) support the VHDL code generated by System Generator. There is no schedule for FPGA Express to support it.
12
Handel-C (http://www.celoxica.com)
- Handel-C is a language for programming applications
- Handel-C is not an HDL, nor is it C used as an HDL
- Handel-C is meaningful to both software and hardware engineers
- Its focus is on describing solutions to problems as algorithms, whereas VHDL/Verilog focus on describing the structure of a system capable of performing an algorithm
- Hardware design means controlling space (parallelism) and time (sequential processing)
  - The par command gives control over space
  - The single clock assignment rule gives control over time
13
Handel-C Core Language Features
Standard ISO C (ANSI C)
- Control commands: if, while, switch, etc.
- Functions, structures, pointers
Extensions for hardware implementation
- par{…} construct: specifies spatially parallel architecture
- Single-cycle assignment: specifies temporal architecture
- Arbitrary widths on variables, expressions, etc.
- Type-checked bit-width inference system
- Recursive macro expansion system
- Multiple clock domains with automatic metastability resolution
- Powerful bit manipulation operators
- Signals, channels, interfaces to pins, external IP cores
- RAMs/ROMs and external pin connections
14
Timing is predictable
- Designer has control over timing
- Simple model: assignments take one clock cycle
- Cycle-accurate, fast simulator
Parallelism is deterministic
- Language extensions include parallel processing and communication between parallel elements
- Parallelism is based on a sound mathematical formalism
Changes are predictable
- Changes in Handel-C code produce predictable changes in hardware
- Enables fast iterative refinement
15
Hardware/Software Co-Design
Enables development of complete systems, ideal for:
- Board-level prototyping
- Reconfigurable SoC designs
- Hybrid CPU & FPGA devices
Design kit (DK1) facilitates co-design with:
- Instruction set simulators
- VHDL simulators
- External C test benches
Enables hardware/software partitioning decisions later in the design cycle
Rapid conversion of software algorithms into custom hardware
16
DK1 Design Suite Features
Handel-C compiler output is:
- Optimised
- Deterministic
- Target specific
Targets Xilinx and Altera netlists directly (EDIF)
RTL VHDL output
Generation of IP cores (Handel-C, EDIF, VHDL)
Inclusion of IP cores as 'black boxes'
GUI for integrated project management, code editing and source-level debugging
Fast simulation/debug
Flow: Handel-C → Simulate / Compile → Netlist → Place and Route (FPGA vendor's tools) → Configure
17
Conclusions
- Exploiting the power of modern FPGAs is getting increasingly difficult using only "traditional" HDL design methods
- A 1-million-gate Xilinx Spartan III costs only $12!
- New areas of application beyond traditional FPGA domains require higher levels of abstraction
- Tools such as Handel-C look promising
- Experience with real designs is needed