Digital Design using VHDL and Xilinx FPGA
VHDL based synthesis
VHDL code
  architecture RTL1 of RESOURCE is
  begin
    seq : process (RSTn, CLOCK)
    begin
      -- asynchronous, active-low reset
      if (RSTn = '0') then
        DOUT <= (others => '0');
      elsif (CLOCK'event and CLOCK = '1') then
        -- select one operand and decrement it
        case SEL is
          when "00"   => DOUT <= unsigned(A) - 1;
          when "01"   => DOUT <= unsigned(B) - 1;
          when "10"   => DOUT <= unsigned(C) - 1;
          when others => DOUT <= unsigned(D) - 1;
        end case;
      end if;
    end process;
  end RTL1;
Synthesized schematic for RTL1 of RESOURCE
  Delay: 57 ns
  Area: 65
  Number of flip-flops: 16
HDL
Implement your design using VHDL.
[Figure: design flow (HDL/VHDL, Synthesis, Implementation, Download) alongside the design verification steps (Behavioral Simulation, Functional Simulation, Timing Simulation, In-Circuit Verification)]
In the HDL flow, two sets of code must be written: the HDL design and the verification code (testbench) for behavioral and functional simulation. The designer is responsible for testing and should create the environment for verification.
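To make the "two sets of code" point concrete, here is a minimal sketch of a design plus its testbench, built around the RTL1 architecture shown earlier. The entity declaration is an assumption: the slides show only the architecture, so the port types and the 16-bit width (chosen to match the 16 flip-flops reported by synthesis) are guesses. The testbench name RESOURCE_TB, the stimulus values, and the 20 ns clock are likewise hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Assumed entity: not shown on the slides; widths and types are guesses.
entity RESOURCE is
  port (
    RSTn, CLOCK : in  std_logic;
    SEL         : in  std_logic_vector(1 downto 0);
    A, B, C, D  : in  std_logic_vector(15 downto 0);
    DOUT        : out unsigned(15 downto 0)
  );
end RESOURCE;

-- Design code: the architecture from the slide.
architecture RTL1 of RESOURCE is
begin
  seq : process (RSTn, CLOCK)
  begin
    if (RSTn = '0') then
      DOUT <= (others => '0');
    elsif (CLOCK'event and CLOCK = '1') then
      case SEL is
        when "00"   => DOUT <= unsigned(A) - 1;
        when "01"   => DOUT <= unsigned(B) - 1;
        when "10"   => DOUT <= unsigned(C) - 1;
        when others => DOUT <= unsigned(D) - 1;
      end case;
    end if;
  end process;
end RTL1;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Verification code: a self-checking testbench (hypothetical name).
entity RESOURCE_TB is
end RESOURCE_TB;

architecture SIM of RESOURCE_TB is
  signal RSTn  : std_logic := '0';
  signal CLOCK : std_logic := '0';
  signal SEL   : std_logic_vector(1 downto 0)  := "00";
  signal A     : std_logic_vector(15 downto 0) := x"0010";  -- 16
  signal B     : std_logic_vector(15 downto 0) := x"0020";  -- 32
  signal C     : std_logic_vector(15 downto 0) := x"0030";  -- 48
  signal D     : std_logic_vector(15 downto 0) := x"0040";  -- 64
  signal DOUT  : unsigned(15 downto 0);
  signal done  : boolean := false;
begin
  dut : entity work.RESOURCE(RTL1)
    port map (RSTn => RSTn, CLOCK => CLOCK, SEL => SEL,
              A => A, B => B, C => C, D => D, DOUT => DOUT);

  -- 20 ns clock that stops toggling once the stimulus is finished
  CLOCK <= not CLOCK after 10 ns when not done else '0';

  stim : process
  begin
    wait for 25 ns;                       -- hold reset across one clock edge
    assert DOUT = 0 report "reset value wrong" severity error;
    RSTn <= '1';

    SEL <= "00";
    wait until rising_edge(CLOCK);
    wait for 1 ns;                        -- let the register output settle
    assert DOUT = 15 report "SEL=00: expected A - 1" severity error;

    SEL <= "01";
    wait until rising_edge(CLOCK);
    wait for 1 ns;
    assert DOUT = 31 report "SEL=01: expected B - 1" severity error;

    SEL <= "10";
    wait until rising_edge(CLOCK);
    wait for 1 ns;
    assert DOUT = 47 report "SEL=10: expected C - 1" severity error;

    SEL <= "11";
    wait until rising_edge(CLOCK);
    wait for 1 ns;
    assert DOUT = 63 report "SEL=11: expected D - 1" severity error;

    done <= true;                         -- stop the clock so simulation ends
    wait;
  end process;
end SIM;
```

The same testbench can in principle be reused for behavioral simulation and for later simulations of the synthesized or placed-and-routed netlist, provided the netlist keeps the same port names and types.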
Synthesis
Synthesize the design to create an FPGA netlist.
Implementation
Translate, place and route, and generate a bitstream to download into the FPGA.
If the design does not meet its performance requirements, the HDL code may have to be modified.
On-Chip Verification
[Figure: ChipScope ILA system diagram. A PC running ChipScope connects through a MultiLINX Cable or Parallel Cable III (JTAG connection) to the target board. Inside the target FPGA, the JTAG control block reaches the ILA cores attached to the user functions.]
Digilab D2FT & DIO5 Boards
The Digilab 2FT/DIO5 board combination is an FPGA-based development platform with a large FPGA and I/O devices to support a wide range of digital circuits, including a complete computer system.
4-bit Shift Register
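The slide content for this example did not survive beyond its title, so the following is only a minimal sketch of what a 4-bit serial-in shift register can look like in VHDL. The entity name SHIFT4, the port names, and the choice of an asynchronous active-low reset are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and port names; only the slide title is known.
entity SHIFT4 is
  port (
    CLK  : in  std_logic;
    RSTn : in  std_logic;                     -- asynchronous, active low
    SIN  : in  std_logic;                     -- serial input
    Q    : out std_logic_vector(3 downto 0)   -- all four stages in parallel
  );
end SHIFT4;

architecture RTL of SHIFT4 is
  signal reg : std_logic_vector(3 downto 0);
begin
  process (RSTn, CLK)
  begin
    if RSTn = '0' then
      reg <= (others => '0');
    elsif rising_edge(CLK) then
      -- drop the oldest bit, shift the rest up, take SIN into bit 0
      reg <= reg(2 downto 0) & SIN;
    end if;
  end process;

  Q <= reg;
end RTL;
```

Because every stage has a reset and is visible on Q, a description like this would normally map onto four flip-flops rather than onto a LUT-based shift register.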
Xilinx FPGA Architecture
All Xilinx FPGA devices contain these basic resources:
  Input/Output Blocks (IOBs)
  Configurable Logic Blocks (CLBs)
  Programmable Interconnect (also known as routing)
  Tristate Buffers (TBUFs or BUFTs)
  Global Resources
The design tips in this module focus on the Global and CLB resources.
CLB
Flexible resources
  Wide-input functions: 16:1 multiplexer in 1 CLB
  Fast arithmetic functions: two dedicated carry chains
  Cascadable shift registers in LUTs: 128-bit shift register in 1 CLB
Ease of performance
  Direct routing enabling high speed
[Figure: CLB with four slices (S0 to S3), switch matrix, TBUF, fast connects, CIN/COUT carry chains, and the SHIFT chain]
Become familiar with the FPGA terminology, because it is used to define and compare design sizes.
Slice
Each slice contains two:
  Four-input lookup tables (LUTs), each also usable as
    16-bit distributed SelectRAM (RAM16)
    16-bit shift register (SRL16)
  Registers
Each register:
  D flip-flop
  Latch
Dedicated logic:
  Muxes (MUXFx)
  Arithmetic logic
  MULT_AND
  Carry chain
[Figure: slice with two LUTs (G and F), two registers, carry logic (CY), MUXF5, and arithmetic logic]
CLB Structure
Each slice has 2 LUT-FF pairs with associated carry logic.
Two 3-state buffers (BUFT) are associated with each CLB and are accessible by all CLB outputs.
[Figure: slice schematic with LUT inputs G1 to G4 and F1 to F4, carry and control logic (CIN, COUT, F5IN, BY, SR), two D flip-flops (CLK, CE), and outputs Y, YB, X, XB]
The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUTs), carry and control logic, and two registers. There are two 3-state buffers associated with each CLB, which can be accessed by all the outputs of the CLB.
Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase performance and lower CLB utilization for wide multiplexer functions. The Xilinx internal bus can also be extended off chip.
Four-Input LUT
LUT = truth table = 4-input logic function
  Implements combinatorial logic
  Any 4-input logic function
  Cascaded for wide-input functions
[Figure: LUT with inputs A, B, C, D and output Z]
The LUT is also known as a function generator. It can be used to form any function of its four inputs. The software automatically cascades LUTs to build wide-input logic functions.
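The "LUT = truth table" idea can be sketched as a behavioral model: sixteen stored bits, indexed by the four inputs. This is an illustrative model only, not the Xilinx library primitive; the entity name LUT4_MODEL, the INIT generic, and the bit ordering (D as the most significant address bit) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 4-input LUT.
entity LUT4_MODEL is
  generic (
    -- Truth table: bit i is the output when the inputs (D C B A), read as
    -- a 4-bit number, equal i. x"8000" has only bit 15 set: a 4-input AND.
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    A, B, C, D : in  std_logic;
    Z          : out std_logic
  );
end LUT4_MODEL;

architecture BEHAV of LUT4_MODEL is
  signal addr : unsigned(3 downto 0);
begin
  -- The four inputs form the address into the truth table.
  addr <= D & C & B & A;

  -- The output is simply the stored bit at that address.
  Z <= INIT(to_integer(addr));
end BEHAV;
```

Changing only the INIT value changes the function: for example, x"0001" would make the model a 4-input NOR and x"FFFE" a 4-input OR, with no change to the structure.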
Distributed RAM
CLB LUT configurable as distributed RAM
  A LUT equals a 16x1 RAM
  Implements single-port and dual-port RAM
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/asynchronous read
  Accompanying flip-flops used for synchronous read
[Figure: one LUT as RAM16X1S (D, WE, WCLK, A0 to A3, O); two LUTs as RAM32X1S (D, WE, WCLK, A0 to A4, O) or as RAM16X2S (D0, D1, WE, WCLK, A0 to A3, O0, O1)]
When the CLB LUT is configured as memory, it can implement a 16x1 synchronous RAM. One LUT can implement a 16x1 single-port RAM. Two LUTs are used to implement a 16x1 dual-port RAM. The LUTs can be cascaded for the desired memory depth and width.
The write operation is synchronous. The read operation is asynchronous and can be made synchronous by using the flip-flops that accompany the CLB LUT.
Distributed RAM is compact and fast, which makes it ideal for small RAM-based functions.
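The synchronous-write, asynchronous-read behavior described above can be sketched in VHDL as follows. This is a behavioral model with port names borrowed from the RAM16X1S labels on the slide; the entity name RAM16X1_MODEL is hypothetical, and whether a given synthesis tool maps such code onto LUT RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 16x1 single-port distributed RAM.
entity RAM16X1_MODEL is
  port (
    WCLK : in  std_logic;                     -- write clock
    WE   : in  std_logic;                     -- write enable
    A    : in  std_logic_vector(3 downto 0);  -- address (A3 downto A0)
    D    : in  std_logic;                     -- data in
    O    : out std_logic                      -- data out
  );
end RAM16X1_MODEL;

architecture RTL of RAM16X1_MODEL is
  -- 16 one-bit locations, i.e. the 16 configuration bits of one LUT
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Synchronous write: the addressed bit changes only on a clock edge.
  write_port : process (WCLK)
  begin
    if rising_edge(WCLK) then
      if WE = '1' then
        mem(to_integer(unsigned(A))) <= D;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address combinationally.
  O <= mem(to_integer(unsigned(A)));
end RTL;
```

Registering O in a clocked process would give the synchronous read mentioned in the notes, using the flip-flop next to the LUT.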
Shift Register
Each LUT can be configured as a shift register
  Serial in, serial out
Dynamically addressable delay of up to 16 cycles
  For programmable pipelines
Cascade for greater cycle delays
Use CLB flip-flops to add depth
[Figure: LUT as shift register with IN, CE, CLK, DEPTH[3:0], and OUT]
The LUT can be configured as a shift register (serial in, serial out) whose length is programmable from 1 to 16 bits. For example, DEPTH[3:0] = 0010 (binary) means that the shift register is 3 bits long. In the simplest case, a 16-bit shift register can be implemented in one LUT, eliminating the need for 16 flip-flops and the extra routing resources that would otherwise have lowered performance.
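The relationship between DEPTH and delay can be sketched with a behavioral model of a 16-stage, dynamically addressable shift register. The entity name SRL16_MODEL is hypothetical, and the data ports are called DIN and DOUT because IN and OUT are reserved words in VHDL. Like the LUT-based shift register, the model has no reset.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a LUT-based 16-bit shift register.
entity SRL16_MODEL is
  port (
    CLK   : in  std_logic;
    CE    : in  std_logic;                     -- shift enable
    DIN   : in  std_logic;                     -- serial input
    DEPTH : in  std_logic_vector(3 downto 0);  -- selects the tap
    DOUT  : out std_logic                      -- serial output
  );
end SRL16_MODEL;

architecture RTL of SRL16_MODEL is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Shift one position per enabled clock edge; stage 0 holds the newest bit.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        sr <= sr(14 downto 0) & DIN;
      end if;
    end if;
  end process;

  -- DEPTH = n taps stage n, which is DIN delayed by n + 1 clock cycles.
  -- DEPTH = "0010" therefore gives a 3-cycle delay, "1111" gives 16.
  DOUT <= sr(to_integer(unsigned(DEPTH)));
end RTL;
```

Because the tap is a combinational read of the stored bits, DEPTH can change while the register is running, which is what makes the delay dynamically addressable.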
12-Input OR Function
Utilization
  3 LUTs and 3 MUXCYs, as opposed to 4 LUTs
Performance
  1 logic level, as opposed to 2 logic levels
[Figure: LUT1 (inputs A to D), LUT2 (inputs E to H), and LUT3 (inputs I to L), each holding the 4-input NOR truth table (INIT=0001), driving a chain of MUXCYs with constant 0 and 1 (Vcc) inputs to produce the output]
The example shows the use of the carry chain to implement a wider-input logic function.
High-Performance Routing
Local routing
  Direct connections
General Routing Matrix (GRM)
  Single-length lines, long lines, buffered hex lines
Dedicated routing
  Internal 3-state busses
Global routing
  Primary clock buffer lines, secondary lines
The sophisticated routing structure in the Spartan-IIE family gives the software tools flexibility, higher performance, and faster compile times.
Improved Clock-to-out Using DLL
Spartan-II clock-to-out delays reduced by over 50%
Test conditions: output standard = LVTTL Fast 16 mA (OBUF_F_16), temperature = room, Vdd = 2.5 V, Vcco = 3.3 V
Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed)
Timing     r->r     r->f
w/o DLL    3.6 ns   3.5 ns
w/ DLL     1.4 ns   1.4 ns
A key benefit of the DLL is the ability to “remove” delay from the clock path and so improve the effective clock-to-out delay.
CORE Generator
Instantiate optimized IP within the VHDL code.
[Figure: the design flow and verification steps, with a COREGen block added next to the VHDL source]
Synthesize, Implement, Download
Synthesize, implement, and download the bitstream, similar to the original design flow.
Xilinx IP Solutions
Key: $ = License Fee, P = Parameterized, S = Project License Available, BOLD = Available in the Xilinx Blockset for the System Generator for DSP
DSP Functions
  $P Additive White Gaussian Noise (AWGN)
  $P Reed Solomon
  $  3GPP Turbo Code
  $P Viterbi Decoder
  P  Convolution Encoder
  $P Interleaver/De-interleaver
  P  LFSR
  P  1D DCT
  P  2D DCT
  P  DA FIR
  P  MAC
  P  MAC-based FIR filter
     Fixed FFTs: 16, 64, 256, 1024 points
  P  FFT: 16 to 16384 points
  P  FFT: 32 point
  P  Sine Cosine Look-Up Tables
  $P Turbo Product Code (TPC)
  P  Direct Digital Synthesizer
  P  Cascaded Integrator Comb
  P  Bit Correlator
  P  Digital Down Converter
Math Functions
  P  Multiplier Generator
       Parallel Multiplier
       Dyn Constant Coefficient Mult
       Serial Sequential Multiplier
       Multiplier Enhancements
  P  Pipelined Divider
  P  CORDIC
Memory Functions
  P  Asynchronous FIFO
  P  Block Memory modules
  P  Distributed Memory
  P  Distributed Mem Enhance
  P  Sync FIFO (SRL16)
  P  Sync FIFO (Block RAM)
  P  CAM (SRL16)
  P  CAM (Block RAM)
Base Functions
  P  Binary Decoder
  P  Twos Complement
  P  Shift Register RAM/FF
  P  Gate modules
  P  Multiplexer functions
  P  Registers, FF & latch based
  P  Adder/Subtractor
  P  Accumulator
  P  Comparator
  P  Binary Counter
IP CENTER: http://www.xilinx.com/ipcenter
Although it is continuously growing, the list of IP is limited and you will not always find the function of interest. Several options are available in that case: build the required function from lower-level block IP, or choose a mixed mode where part of the function is specified in HDL or schematic and the rest in IP.
Xilinx CORE Generator
List of available IP from Xilinx or Alliance partners
Fully parameterizable
The Xilinx CORE Generator is the delivery vehicle for IP. IP from Xilinx and from Alliance partners is listed in the CORE Generator, although only the Xilinx LogiCORE IP can be generated from this tool. IP is fully parameterizable via a customization GUI and can be generated for any type of HDL or schematic flow, as long as that flow is officially supported by Xilinx.
Xilinx Smart-IP Technology
Pre-defined placement and routing enhances performance and predictability
Performance is independent of:
  Core placement
  Number of cores
  Device size
Relative placement
  Other logic has no effect on the core
  Guarantees I/O and logic predictability
Fixed placement and pre-defined routing
  Guarantees performance
[Figure labels: Fixed Placement, I/Os, 200 MHz]
With relative placement of the logic within a core, you get logic predictability. Because the logic has consistent internal placement, the performance of a core remains constant regardless of its position in the device. This is the intelligent software part of Smart-IP technology.
In addition to the modular routing capability, we can keep track of the relative location of a core’s logic. Hence, we can floorplan the core or fix its placement with respect to the I/O. For guaranteed performance, we can even fix the placement and predefine the routing. For example, the Xilinx DSP cores use only relative placement, but for the more performance-sensitive PCI design we use the fixed placement and predefined routing strategy.
Because the IP modules use regular local logic and interconnect as well as segmented routing, they can be placed anywhere on the device without impacting performance. For the same reason, you can place multiple copies of the same module on a device and they will all continue to function at the published performance speeds, and you can migrate an IP module from smaller devices to larger devices without degrading the module’s performance.
MATLAB
MATLAB™, the most popular system design tool, is a programming language, interpreter, and modeling environment
  Extensive libraries for math functions, signal processing, DSP, communications, and much more
  Visualization: large array of functions to plot and visualize your data and system/design
  Open architecture: software model based on a base system and domain-specific plug-ins
The MathWorks has been developing system design tools since 1984. Its latest product is MATLAB 6.5 (from MATLAB tools release 13, July 2002). Visit The MathWorks website at http://www.mathworks.com for further details.
Other vendors of system-level modeling packages are:
  Visual data flow: SPW (Cadence), COSSAP (Synopsys), Ptolemy (UC Berkeley), SystemView (Elanix)
  Programming language based:
    C++: SystemC, OCAPI (IMEC)
    C: Streams-C (Gokhale et al.), Handel-C (Celoxica)
    Java: JHDL (BYU)
The relative success of each approach, at least to date, is indicated by the predominance of commercial offerings in visual data flow (VDF) as compared to research activity.
MATLAB
[Figure: frequency response of input sound file]
This is an excellent example of how designers can visualize their signals at any point in their algorithm and analyze the effects their systems are having on their design. This is far more difficult, if not impossible, to do with current FPGA design tools.
Another strong reason for using these system-level design tools is the speed and ease of implementing algorithms, concepts, and ideas. This example was executed in three lines of code in no time at all. To implement such an algorithm in an FPGA design would take large amounts of design work, code, and time.
Notice the Workspace window pane on the left. This pane is used to view the variables that have been created by the designer and are accessible to algorithms. The two variables that can currently be seen are Fw (the frequency response vector from zero to the Nyquist frequency) and voice (the variable that stores the sound information). Other useful windows include the Current Directory window (a navigation window).
Simulink
Simulink™: visual data flow environment for modeling and simulation of dynamical systems
  Fully integrated with the MATLAB engine
  Graphical block editor
  Event-driven simulator
  Models parallelism
  Extensive library of parameterizable functions
    Simulink Blockset: math, sinks, sources
    DSP Blockset: filters, transforms, etc.
    Communications Blockset: modulation, DPCM, etc.
Simulink, The MathWorks’ visual data flow tool, presents an alternative to using programming languages for system design. It enables designers to visualize the dynamic nature of their system while illustrating the complete system in a realistic fashion with respect to the hardware design. Most hardware design starts out with a block diagram description and specification of the system, very similar to the Simulink design.
The main part of Simulink is the Library Browser, which contains all the building blocks available to the user. This library is expandable and each block is parameterizable. Users can even create their own libraries of functions.
An important point about Simulink is that it can model concurrency in a system. Unlike the sequential manner of software code, the Simulink model can be seen to execute sections of a design at the same time (in parallel). This notion is fundamental to achieving a high-performance hardware implementation.
MATLAB/Simulink
[Figure: real-time frequency response from a microphone, emphasizing the dynamic nature of Simulink]
It is also possible to interface Simulink and MATLAB, which enables users to bring M-files into their Simulink model, save signals to the MATLAB workspace, and vice versa.
Use the FIND feature at the top of the library browser to search for blocks. There are many available and it is not easy to remember where they all reside.
Traditional Simulink FPGA Flow
[Figure: the System Architect works in Simulink (system verification) and the FPGA Designer works in VHDL (functional simulation, synthesis, implementation, timing simulation, download, in-circuit verification), separated by a GAP; equivalence between the two must be verified]
In the past, a DSP designer who wanted to target an FPGA had no option but a “dual path” of development. The DSP designer writes an algorithm in pseudo-C, using filters, certain C code, and a certain precision. He may know everything about DSP and Simulink models, but may not know anything about FPGAs. Not only does he not know how to target an FPGA, he does not know how to take advantage of the FPGA architecture or how to write a design that avoids a bad FPGA implementation.
When he is done with his DSP design, he may have a working model in Simulink, but he must design the same thing again in VHDL, or give his design to an FPGA implementer (who may know nothing about DSP) to write the VHDL for him. The implementer might end up using a core that does not do exactly what the designer wants: not being a DSP expert, the FPGA implementer is just trying to translate the pseudo-code he was given into VHDL for an FPGA.
There is also no way to co-simulate: one is simulating in C in MATLAB, the other is simulating in VHDL in a behavioral simulation. It is only when they get into the lab and try the design on the board, late in the process, that they find out something is wrong.
Creating a System Generator Design
Invoke the Simulink library browser
  To open the Simulink library browser, click the Simulink library browser button or type “simulink” in the MATLAB console
  The library browser contains all the blocks available to designers
  Start a new design by clicking the new sheet button
The .mdl (MATLAB Description Language) file is the main design file for Simulink designs. You can also open .mdl files directly through the MATLAB console GUI or from the MATLAB command line.
Make sure your MATLAB current directory always points to your working directory. You can “cd” to the correct directory.
Creating a System Generator Design
Build the design by dragging and dropping blocks from the Xilinx blockset onto your new sheet.
  Design entry is similar to a schematic editor
  Connect blocks by pulling the arrows on the sides of each block
The next few slides show the process that a system designer may use to get a design into an FPGA via the System Generator and Simulink.
Right-click a block to format it (rotate, drop shadow, font) and to change its foreground and background colors.
Creating a System Generator Design
I/O blocks are used as the interface between the Xilinx Blockset and other Simulink blocks
[Figure: Simulink sources, SysGen blocks realizable in hardware, and Simulink sinks and library functions]
A System Generator design can be divided into very distinct sections. The “hardware realizable” section is shown in BLUE. This is the section that will go into the FPGA. The YELLOW blocks represent the gateways into and out of the Xilinx blockset, which are needed because the design must be implemented in fixed-point arithmetic. The gateways also correspond to the inputs and outputs of your VHDL top-level entity (the pins of the device).
Standard Simulink blocks have no color. Most designs will use the Simulink/DSP sources and sinks to drive and test the design. Any Simulink block (although not hardware-realizable unless manually created in VHDL) can be interfaced to the Xilinx Blockset with the aid of the gateway blocks, which convert between “doubles” and “fixed-point numbers.”
The System Generator token is required at the top level of the design for a design using Xilinx blocks to be simulated.
Using the Scope
Click Properties to change the number of axes displayed and the time range value (X-axis)
Use the Data History tab to control how many values are stored and displayed on the scope
  The output can also be directed to the workspace
Click Autoscale to let the tools quickly configure the display to the correct axis values
Right-click on the Y-axis to set its value
The Scope will be the most frequently used Simulink block. To display multiple signals on one scope, use the “MUX” block (in the Simulink Signals & Systems library) to combine signals.
For more information on using Simulink blocks, see the help documentation available from the parameterization GUI of each block.
Design and Simulate in Simulink
Push “play” to simulate the design.
Go to “Simulation Parameters” under the “Simulation” menu to control the length of simulations.
This example shows a Costas Loop, which is used in communications to account for Doppler shifts in transmitted signals. The eye of the eye diagram is clearly open and, in the absence of any channel impairments, the receiver can easily make correct symbol decisions using this waveform. The diagram is produced by plotting segments of successive matched filter outputs on top of each other.
You can enter “inf” as the end time so that a simulation will run forever.
Generate the VHDL Code
Once complete, double-click the System Generator token
  Select the target device
  Select to generate the testbench
  Set the desired System Clock Period
  Generate the VHDL
Once a design is completed and successfully simulated, double-click the System Generator token, which should reside on the highest hierarchy level that includes Xilinx blocks. Double-clicking opens the Options window, where you can specify the target device, the synthesis tool you will be using, and whether testbench generation is desired.
Warning: if “create testbench” is selected, the simulation will be run again to capture the DAT files and create the testbench.
The Simulink System Period is the period at which the system must run so that every block in the design achieves its desired sample period. More on this later (multi-rate systems module).
You must set the System Clock Period in this GUI. This value translates to the period constraint in the UCF file; PAR will aim for this value when laying out the design.
You can also specify whether you want cores to be generated (you may not want to if it has been done before, to save time) and globally specify simulation in doubles, if desired.
Click the Generate button and the System Generator will create all the files that were outlined earlier.
Inputting Data from the Workspace
The “From Workspace” block can be used to input MATLAB data to a Simulink model
Format:
  t = 0:time_step:final_time;
  x = func(t);
  Combine these into a matrix for Simulink
Example: in the MATLAB console, type:
  t = 0:0.01:1;
  x = sin(2*pi*t);
  simin = [t', x'];
Type ‘FromWorkspace’ to view the example from the c:\training\dsp_flow\labs\lab3 folder
One way to use a variable in the MATLAB workspace as an input signal in Simulink is to use the “From Workspace” block in the Simulink Sources library. The variable must have a specific format: the first column holds the time sequence, and the following columns hold the corresponding signal data. If required, Simulink will linearly interpolate the data at time steps that are not defined in the variable.
Another example: type at the MATLAB prompt
  >> t = [0 3 6 9 10];
  >> x = [-1 1 -1 1 1/3];
  >> simin = [t', x'];
The [t', x'] expression transposes the two row vectors and combines them into a two-column matrix. This creates a triangular waveform. You can view it by typing
  >> plot(t, x);
Outputting Data to the Workspace
The “To Workspace” block can be used to output a signal to the MATLAB workspace
The output is written to the workspace when the simulation has finished or is paused
Data can be saved as a structure (including time) or as an array
Type ‘ToWorkspace’ to view the example from the c:\training\dsp_flow\labs\lab3 folder
There are a number of ways that Simulink can write data out to the workspace. The simplest is to use the “To Workspace” block from the Simulink Sinks library. The variable name is specified in the parameters window, as well as the number of data points to save.
The save format is also worth noting. It can be an array that contains the signal values at each time step, or a structure, which has the signal values as one of its fields and contains more information, such as the label of the signal. You may also choose to save time information as part of the structure. To access a structure, use the following syntax at the command prompt:
  >> simout.signals.values
Help reference: Using Simulink: Analyzing Simulation Results: Using the To Workspace Block