High-Throughput FFT: Final Presentation, Spring 2011. Student: Andrey Kuyel. Supervised by Mony Orbach. High Speed Digital Systems Laboratory, Department of Electrical Engineering, Technion - Israel Institute of Technology.
Presentation overview: project motivation and goals; theory study; FFT 16/32 core definitions; encountered problems; selecting the optimal algorithm; FFT core design and development; validation and verification; Xilinx development board demo.
Project goals: The goal of the project is to design and implement, on an FPGA device, an FFT capable of high-rate data processing (rates up to 10 MSamp/sec*). The design will be written in VHDL and tested on a Xilinx development board. The project combines aspects of signal processing, logic design, and high-rate data processing. * 5 MSamp/sec for each of the I and Q components.
FFT - Theoretical overview: The DFT of a length-N vector $x$ is defined as $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$ for $k = 0, \dots, N-1$. The time complexity of the direct DFT is $O(N^2)$. The FFT algorithm (first published by J. W. Cooley and John Tukey in 1965) reduces the time complexity of the DFT to $O(N \log N)$. This algorithm is called the "Cooley–Tukey radix-2 FFT algorithm" and is one of the most common FFT algorithms.
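The relationship between the direct DFT and the radix-2 FFT can be sketched in a few lines of Python. This is only an illustrative software model, not the project's VHDL design; the function names `dft` and `fft_radix2` and the test signal are assumptions made for the example.

```python
import cmath


def dft(x):
    """Direct DFT: N output bins, each a sum over N inputs (O(N^2) work)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # FFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd bin
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Hypothetical 16-point complex test signal
signal = [complex(i % 4, -(i % 3)) for i in range(16)]
max_diff = max(abs(a - b) for a, b in zip(dft(signal), fft_radix2(signal)))
```

Both functions compute the same transform; `max_diff` measures only floating-point rounding differences between the two evaluation orders.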
Radix 4 algorithm
Studying and examining different parallel FFT algorithms. Figures: the FFT (N=16) radix-2 data flow; the FFT (N=8) radix-2 data flow; sixteen-point radix-4 decimation-in-time algorithm; length-16, decimation-in-frequency, in-order input, radix-4 FFT.
FFT core features: The FFT core will have the following features: real and imaginary inputs, 8 bits wide each; real and imaginary outputs, 20 bits wide each, with the 12 MSBs for the integer part and the 8 LSBs for the fractional part; drop-in module for Virtex-6 (xc6vlx240T); forward complex FFT; transform sizes N = 16/32; arithmetic type: fixed-point; truncation after the butterfly; natural input/output order; input data rate of 10 MSamp/sec (total rate of the real and imaginary parts of the data).
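The output format listed above (20 bits, 8 of them fractional, with truncation) can be illustrated with a small Python sketch. It assumes two's-complement signed values with the sign bit counted among the 12 integer bits, which the slides do not state; the helper names `to_fixed`, `to_real`, and `fixed_mul_trunc` are hypothetical.

```python
import math

FRAC_BITS = 8    # from the slide: the 8 LSBs are the fractional part
TOTAL_BITS = 20  # from the slide: 20-bit real and imaginary outputs

# Assumption: two's-complement signed representation
RAW_MIN = -(1 << (TOTAL_BITS - 1))
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_fixed(value):
    """Convert a real number to a raw integer with 8 fractional bits, truncating."""
    return math.floor(value * (1 << FRAC_BITS))


def to_real(raw):
    """Convert a raw fixed-point integer back to a real number."""
    return raw / (1 << FRAC_BITS)


def fixed_mul_trunc(a_raw, b_raw):
    """Multiply two values with 8 fractional bits each and truncate back to 8."""
    # The full product has 16 fractional bits; the arithmetic right shift
    # discards the lowest 8, which is truncation toward minus infinity.
    return (a_raw * b_raw) >> FRAC_BITS


twiddle = to_fixed(0.70710678)                       # cos(pi/4), stored as 181
product = fixed_mul_trunc(to_fixed(100.0), twiddle)  # 100 * cos(pi/4) in fixed point
```

Under these assumptions the representable range is `to_real(RAW_MIN)` to `to_real(RAW_MAX)`, that is, -2048 to just under 2048 in steps of 1/256, and `to_real(product)` gives 70.703125 where the exact value is about 70.7107.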
FFT core general schematics (16-point complex parallel FFT). Inputs: Clock, Start, rst; real-part data inputs x0_re to x15_re [7:0] (x16); imaginary-part data inputs y0_im to y15_im [7:0] (x16). Outputs: FFT real data out fx0_re to fx15_re (20q8); FFT imaginary data out fy0_im to fy15_im (20q8); Done; Edone.
Selected FFT 16/32 core algorithm (minimal DSP slice utilization). Figures: sixteen-point radix-4 decimation-in-time algorithm; basic butterfly computation in a radix-4 FFT algorithm.
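A radix-4 butterfly is a 4-point DFT, and its internal factors are only +1, -1, +j, and -j, so the butterfly itself needs additions and real/imaginary swaps but no multipliers. The Python sketch below shows this, together with a recursive sixteen-point radix-4 decimation-in-time FFT built from it. It is a floating-point illustration under assumed names (`mul_neg_j`, `butterfly4`, `fft_radix4`), not the pipelined hardware structure from the slides.

```python
import cmath


def mul_neg_j(z):
    """Multiply by -j using only a swap and a negation (no multiplier)."""
    return complex(z.imag, -z.real)


def butterfly4(a, b, c, d):
    """Radix-4 butterfly: the 4-point DFT of (a, b, c, d)."""
    s0, s1 = a + c, a - c
    s2, s3 = b + d, mul_neg_j(b - d)
    return s0 + s2, s1 + s3, s0 - s2, s1 - s3


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of four."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    quarter = n // 4
    # Four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(quarter):
        # Apply twiddle factors W_N^(r*k), then combine with one butterfly
        tw = [cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k] for r in range(4)]
        y = butterfly4(*tw)
        for m in range(4):
            out[k + m * quarter] = y[m]
    return out


spectrum = fft_radix4([complex(i, 0) for i in range(16)])
```

For N = 16 the recursion has two levels, so the only general complex multiplications are the twiddle factors applied between the two butterfly stages.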
XC6VLX240T FPGA utilization
FFT size    Maximal frequency         DSP slice utilization
16 points   383 MHz (12 GSamp/sec)    27
32 points   335 MHz (21 GSamp/sec)    102 = 27*2 + 16*3
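The throughput figures in the table appear consistent with one full transform being accepted per clock cycle, with the real and imaginary parts counted as separate samples. That reading is an assumption, not something the slides state. The short Python check below reproduces the arithmetic under that assumption; `throughput_msps` is a hypothetical helper.

```python
def throughput_msps(clock_mhz, points):
    """Samples per second in MSamp/sec, assuming one transform per clock
    and two samples (real and imaginary) per complex point."""
    return clock_mhz * points * 2


rate_16 = throughput_msps(383, 16)  # 12256 MSamp/sec, about 12 GSamp/sec
rate_32 = throughput_msps(335, 32)  # 21440 MSamp/sec, about 21 GSamp/sec

# DSP slice count for the 32-point core, as written in the table
dsp_32 = 27 * 2 + 16 * 3
```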
Debugging and verification: RTL Matlab model of the FFT core, with signal values at each pipeline stage; Xilinx simulator; Xilinx development board verification using ChipScope; quantization error estimation against the Matlab double-precision FFT; validation of operation at maximal frequency.
Xilinx development board design validation (block diagram). Blocks: stimulus ROM (input data); control logic; data path with the 16-point FFT; PLL frequency multiplier (input clock in, increased clock to all modules); FFT results memory with output data control logic; ChipScope (to PC); comparison against Matlab results.
FFT 16/32 core design validation and error estimation: results verification between the Matlab fft function and the 32-point FFT core running at 320 MHz on the Xilinx development board. Figures: imaginary part of Matlab fft vs. FFT core; quantization error estimation.
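The kind of comparison described here, a fixed-point FFT checked against a double-precision reference, can be sketched in Python. The model below is a generic radix-2 FFT with 8 fractional bits, rounded twiddle factors, and truncation after each twiddle multiplication. It is an assumption-laden stand-in, not the project's Matlab model or its radix-4 core, and the input samples are made up for the example.

```python
import cmath

FRAC = 8          # fractional bits, as in the core's output format
ONE = 1 << FRAC   # raw value that represents 1.0


def fixed_fft(x):
    """Radix-2 DIT FFT on (re, im) integer pairs with 8 fractional bits.
    Twiddles are rounded to 8 fractional bits (an assumption); each
    twiddle product is truncated back to 8 fractional bits."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fixed_fft(x[0::2])
    odd = fixed_fft(x[1::2])
    out = [None] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        wr, wi = round(w.real * ONE), round(w.imag * ONE)
        o_re, o_im = odd[k]
        t_re = (o_re * wr - o_im * wi) >> FRAC  # truncate the complex product
        t_im = (o_re * wi + o_im * wr) >> FRAC
        e_re, e_im = even[k]
        out[k] = (e_re + t_re, e_im + t_im)
        out[k + n // 2] = (e_re - t_re, e_im - t_im)
    return out


def reference_dft(x):
    """Double-precision direct DFT used as the reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical 8-bit signed test samples in the range -128..127
re_in = [((37 * i) % 256) - 128 for i in range(16)]
im_in = [((91 * i + 5) % 256) - 128 for i in range(16)]

fixed_out = fixed_fft([(r << FRAC, i << FRAC) for r, i in zip(re_in, im_in)])
ref_out = reference_dft([complex(r, i) for r, i in zip(re_in, im_in)])

# Largest deviation between the fixed-point result and the reference
max_err = max(
    abs(complex(re, im) / ONE - ref)
    for (re, im), ref in zip(fixed_out, ref_out)
)
```

`max_err` plays the role of the quantization error estimate: it is the largest magnitude difference between the fixed-point bins, scaled back to real units, and the double-precision bins.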
FFT 16/32 core Xilinx development board demo (block diagram). Blocks: FFT 32/16 core (real data and imaginary data in; transform real data and transform imaginary data out); 4 different signals in bank A; 4 different signals in bank B; wrap around; error estimation; PLL (input clock in, operational FFT clock out).