Published by Elinor Armstrong. Modified over 8 years ago.
1
Parallel Implementation of Fast Fourier Transform on a Multi-core System
Tao Liu, Chi-Li Yu
Nov. 29, 2007
2
Goal
Implement and optimize a 2D FFT on an FPGA platform.
Evaluate multi-core architectures with varying numbers of cores.
Design memory structures suitable for the various multi-core architectures.
3
Basic method and the problem
N-point 1D FFT: generated by Xilinx LogiCORE. Throughput: 1 sample per clock, at up to 150 MHz.
N*N matrix: stored in a dual-port SRAM built from Xilinx BRAM.
Total latency: row-wise + column-wise = N^2 + N^2 = 2N^2 clocks.
Our target is to reduce this latency.
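The row-column method on this slide can be sketched in software: run a 1D FFT over every row, then over every column. This is a minimal pure-Python illustration, not the FPGA design. The names `fft1d`, `fft2d_row_column`, and `single_core_latency` are made up for this sketch, and the recursive radix-2 FFT only stands in for the Xilinx core. The clock model assumes one sample per clock, as the slide states.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2d_row_column(matrix):
    """2D FFT of a square matrix: 1D FFT over each row, then over each column."""
    n = len(matrix)
    rows = [fft1d(row) for row in matrix]
    cols = [fft1d([rows[r][c] for r in range(n)]) for c in range(n)]
    # cols[c][r] holds output element (r, c), so transpose back.
    return [[cols[c][r] for c in range(n)] for r in range(n)]

def single_core_latency(n):
    """Clock model from the slide: N^2 samples row-wise plus N^2 column-wise."""
    return n * n + n * n

if __name__ == "__main__":
    # An impulse at the origin transforms to a matrix of ones.
    impulse = [[1 if (r, c) == (0, 0) else 0 for c in range(4)] for r in range(4)]
    spectrum = fft2d_row_column(impulse)
    print(all(abs(v - 1) < 1e-12 for row in spectrum for v in row))  # True
    print(single_core_latency(128))  # 32768
```

For N = 128 the model gives 32768 clocks, which is the single-core figure that the multi-core designs aim to reduce.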
4
Quad-Core Architecture
Four (N/2)-point 1D FFTs.
Lower latency: the local 2D FFT takes only 1/4 of the single-core latency (N^2/2 clocks).
Overhead: 2 radix-2 butterflies are required for preprocessing.
Extra latency: 2*(N/2)*(N/2) = N^2/2 clocks.
Total latency: N^2 clocks (single-core: 2N^2).
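The slide does not spell out the decomposition, so the following is one plausible reading rather than the authors' design: a 2x2 decimation-in-frequency split, in which radix-2 butterflies (plus twiddle factors) preprocess the N*N input into four (N/2)*(N/2) blocks, and each block is then transformed independently, one per core. All function names are hypothetical, and a direct DFT stands in for each core's local 2D FFT to keep the sketch short.

```python
import cmath

def dft2(x):
    """Direct 2D DFT; used as the per-core transform and as a reference."""
    n = len(x)
    def w(e):
        return cmath.exp(-2j * cmath.pi * e / n)
    return [[sum(x[a][b] * w(a * k1 + b * k2)
                 for a in range(n) for b in range(n))
             for k2 in range(n)] for k1 in range(n)]

def quad_core_fft2(x):
    """2D DFT of an N*N matrix via four independent (N/2)*(N/2) transforms."""
    n = len(x)
    m = n // 2
    out = [[0j] * n for _ in range(n)]
    for a in (0, 1):          # parity of the output row index
        for b in (0, 1):      # parity of the output column index
            sub = [[0j] * m for _ in range(m)]
            for r in range(m):
                for c in range(m):
                    # Radix-2 butterfly across the column halves...
                    top = x[r][c] + (-1) ** b * x[r][c + m]
                    bottom = x[r + m][c] + (-1) ** b * x[r + m][c + m]
                    # ...then across the row halves, followed by a twiddle factor.
                    twiddle = cmath.exp(-2j * cmath.pi * (a * r + b * c) / n)
                    sub[r][c] = (top + (-1) ** a * bottom) * twiddle
            local = dft2(sub)  # the work assigned to one core in this sketch
            for q1 in range(m):
                for q2 in range(m):
                    out[2 * q1 + a][2 * q2 + b] = local[q1][q2]
    return out

if __name__ == "__main__":
    x = [[complex(r + 2 * c, r * c) for c in range(4)] for r in range(4)]
    ref = dft2(x)
    got = quad_core_fft2(x)
    print(max(abs(got[i][j] - ref[i][j]) for i in range(4) for j in range(4)) < 1e-9)
```

Each of the four sub-transforms touches only its own preprocessed block, which is why they can run in parallel once the butterfly stage has finished.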
5
8-Core Architecture
Eight (N/2)-point 1D FFTs. Local 2D FFT latency: N^2/4 clocks.
16 banks of memory.
8 radix-2 butterflies.
Extra latency is reduced to N^2/8 clocks.
Total latency: 3N^2/8 clocks.
6
16-Core Architecture
Sixteen (N/4)-point 1D FFTs.
16 banks of memory.
4 radix-4 butterflies.
Latency: N^2/4 clocks.
The FPGA does not have enough hardware resources for this design!
(Figure: radix-4 butterfly)
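For readers unfamiliar with the radix-4 butterfly named on this slide, here is a minimal sketch of its arithmetic core: a 4-point DFT that needs only additions and multiplications by ±1 and ±j. The function name is illustrative, twiddle factors between stages are left out, and the slides do not show how the hardware butterfly is actually wired.

```python
import cmath

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only adds and multiplies by +/-1, +/-j."""
    s0, s1 = a + c, a - c
    s2, s3 = b + d, b - d
    return (s0 + s2, s1 - 1j * s3, s0 - s2, s1 + 1j * s3)

def dft4_reference(values):
    """Direct 4-point DFT, used only to check the butterfly."""
    return tuple(
        sum(v * cmath.exp(-2j * cmath.pi * k * n / 4) for n, v in enumerate(values))
        for k in range(4)
    )

if __name__ == "__main__":
    sample = (1 + 2j, -3 + 0.5j, 4 - 1j, 0.25 + 0j)
    fast = radix4_butterfly(*sample)
    slow = dft4_reference(sample)
    print(all(abs(f - s) < 1e-12 for f, s in zip(fast, slow)))
```

Because the 4-point DFT has no non-trivial multiplications, one radix-4 butterfly replaces two stages of radix-2 butterflies.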
7
Implementation
We implemented the architectures in the Verilog hardware description language.
We used Xilinx ISE Foundation to synthesize the designs.
The target FPGA platform is the Digilent XUP V2 Pro.
8
Comparisons

| | Single-core | Quad-core | 8-core | 16-core | 16-core (stripped-down ver.) |
|---|---|---|---|---|---|
| Butterfly | 0 | Radix-2 butterfly × 2 | Radix-2 butterfly × 8 | Radix-4 butterfly × 4 | Radix-4 butterfly × 4 |
| 1D FFT | N-point × 1 | (N/2)-point × 4 | (N/2)-point × 8 | (N/4)-point × 16 | (N/4)-point × 8 |
| Banks of memory | 1 | 4 | 16 | 16 | 16 |
| FPGA occupation* (slices) | 4299 (10%) | 12565 (28%) | 26434 (59%) | 47744 (109%) | 23872 (54%) |
| Latency (butterfly) | 0 | 2*(N/2)*(N/2) | 2*(N/4)*(N/4) | 2*(N/4)*(N/4) | 2*(N/4)*(N/4) |
| Latency (local 2D FFT) | 2*N*N | 2*(N/2)*(N/2) | (N/2)*(N/2) | 2*(N/4)*(N/4) | 4*(N/4)*(N/4) |
| Total latency | 2*N^2 | 1*N^2 | (3/8)*N^2 | (1/4)*N^2 | (3/8)*N^2 |
| Total latency* (measured, clocks) | 32,998 | 16,614 | 6,374 | 4,326 | 6,374 |

*: For a 128x128 2D FFT. Target FPGA: Xilinx XC2VP100, which contains 44096 slices.
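The theoretical and measured rows of the comparison can be checked against each other with a few lines of arithmetic. The sketch below copies the total-latency coefficients, measured clock counts, and slice counts from the comparison slide; the dictionary and function names are illustrative only.

```python
from fractions import Fraction

# Total latency as a fraction of N^2, from the "Total latency" row.
THEORY = {
    "single-core": Fraction(2),
    "quad-core": Fraction(1),
    "8-core": Fraction(3, 8),
    "16-core": Fraction(1, 4),
    "16-core stripped-down": Fraction(3, 8),
}
# Measured clocks for a 128x128 2D FFT, from the "measured" row.
MEASURED = {
    "single-core": 32998,
    "quad-core": 16614,
    "8-core": 6374,
    "16-core": 4326,
    "16-core stripped-down": 6374,
}
# Slice usage from the "FPGA occupation" row; the XC2VP100 has 44096 slices.
SLICES = {
    "single-core": 4299,
    "quad-core": 12565,
    "8-core": 26434,
    "16-core": 47744,
    "16-core stripped-down": 23872,
}
TOTAL_SLICES = 44096

def predicted_clocks(arch, n):
    """Theoretical total latency in clocks for an N*N transform."""
    return int(THEORY[arch] * n * n)

def report(n=128):
    """Return (architecture, predicted, measured, difference, fits_on_fpga) rows."""
    rows = []
    for arch in THEORY:
        predicted = predicted_clocks(arch, n)
        diff = MEASURED[arch] - predicted
        rows.append((arch, predicted, MEASURED[arch], diff, SLICES[arch] <= TOTAL_SLICES))
    return rows

if __name__ == "__main__":
    for row in report():
        print(row)
```

With N = 128, every measured value exceeds its prediction by the same 230 clocks, which would be consistent with a fixed startup or pipeline cost, although the slides do not say where it comes from. The full 16-core design is the only one whose slice count exceeds the 44096 available.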
9
Conclusion
Implemented a 2D FFT on an FPGA.
Evaluated various multi-core architectures.
Designed and optimized memory structures for every multi-core architecture.
Experimental results agree with the theoretical predictions.