Published by Peyton Keys. Modified over 10 years ago.
0
Rader’s FFT algorithm acceleration using Maxeler
Author: Tadej Matek
1
Fourier Transform
The Fourier transform decomposes a signal into its frequency components.
Used in telecommunications, data compression, digital signal processing, fast multiplication of polynomials ...
2
Fourier Transform and computers
Transformation: Discrete Fourier Transform (DFT)
Time: O(n^2)
Algorithm(s): Fast Fourier Transform (FFT) (Cooley-Tukey, Bruun’s FFT, Rader’s FFT, Bluestein’s FFT …)
Time: O(n log n)
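To make the O(n^2) cost concrete, here is a minimal sketch of the DFT evaluated straight from its definition. The function names and the sign convention of the exponent (negative) are assumptions for illustration; the slides do not state a convention.

```python
import cmath

def naive_dft(x):
    """Evaluate the DFT definition directly: n output bins, n terms each."""
    n = len(x)
    out = []
    for k in range(n):              # one pass per output bin
        acc = 0j
        for j in range(n):          # n multiply-adds per bin, so n * n in total
            acc += x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
        out.append(acc)
    return out

def multiplications(n):
    """Number of complex multiplications naive_dft performs for length n."""
    return n * n

# A unit impulse has a flat spectrum: every bin equals 1.
spectrum = naive_dft([1, 0, 0, 0])
```

Doubling the input length quadruples the work here, which is the cost the FFT variants listed above avoid.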
3
Why is FFT faster than DFT
Divide & conquer + properties of primitive roots
Primitive root of unity:
Conquer step (butterfly):
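The formulas on this slide are not preserved in the transcript, so the following is a sketch of the textbook radix-2 form rather than the author's exact notation. It assumes the primitive n-th root of unity w_n = exp(-2*pi*i/n) (the sign is a convention choice) and uses the property w_n^(k + n/2) = -w_n^k, which is what lets one multiplication serve two outputs in the butterfly.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])                 # divide: transform of even-indexed samples
    odd = fft(x[1::2])                  # divide: transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # w_n^k under the assumed convention
        t = w * odd[k]
        out[k] = even[k] + t            # butterfly, upper output
        out[k + n // 2] = even[k] - t   # butterfly, lower output: w_n^(k+n/2) = -w_n^k
    return out

result = fft([1, 2, 3, 4])
```

Each of the log2(n) recursion levels does O(n) butterfly work, which gives the O(n log n) total.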
4
Rader’s FFT algorithm overview
Primitive root defined as:
Bit reversal rev_k(i):
rev_4(3): 3 (base 10) = 0011 (base 2) → 1100 (base 2) = 12 (base 10)
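The bit reversal rev_k(i) can be sketched in a few lines. The function name `rev` and its argument order are illustrative choices, not taken from the slides.

```python
def rev(i, k):
    """Reverse the k lowest bits of the non-negative integer i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)   # move the lowest bit of i onto the end of r
        i >>= 1
    return r

example = rev(3, 4)                        # 0011 reversed over 4 bits is 1100
two_bit = [rev(i, 2) for i in range(4)]    # reversal table for k = 2
```

With k = 4 this reproduces the slide's example, mapping 3 to 12.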
5
Example of calculation
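p = 13, input sequence: 8, 2, 2, 4 (n = 4, k = log(n) = 2), z: the primitive root used in the steps below
Level 1 (i = 0, 1; s = rev_k(i) takes the values 0 and 2):
(8 + z^0 * 2) % 13 = 10
(8 + z^2 * 2) % 13 = 6
(2 + z^0 * 4) % 13 = 6
(2 + z^2 * 4) % 13 = 11
Result after level 1: 10, 6, 6, 11
Level 2 (i = 0, 1, 2, 3; s = rev_k(i) takes each of the values 0, 1, 2, 3):
Result after level 2: 3, 4, 9, 3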
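The numbers in this example can be checked with a short script. The value of z is not preserved in the transcript; the sketch below assumes z = 5, chosen because 5^2 % 13 = 12, which is what the level-1 arithmetic requires. Under that assumption the direct transform modulo 13, read in bit-reversed index order, matches the final row of the example.

```python
P = 13             # modulus from the slide
A = [8, 2, 2, 4]   # input sequence from the slide
Z = 5              # ASSUMPTION: not recovered from the slide, only consistent with it

def direct_transform(a, z, p):
    """O(n^2) evaluation of sum_j a[j] * z^(j*r) mod p for every r."""
    n = len(a)
    return [sum(a[j] * pow(z, j * r, p) for j in range(n)) % p for r in range(n)]

# The four level-1 expressions exactly as written in the example.
level1 = [
    (8 + pow(Z, 0, P) * 2) % P,
    (8 + pow(Z, 2, P) * 2) % P,
    (2 + pow(Z, 0, P) * 4) % P,
    (2 + pow(Z, 2, P) * 4) % P,
]

natural = direct_transform(A, Z, P)
# Reorder indices 0, 1, 2, 3 into their 2-bit reversals 0, 2, 1, 3.
bit_reversed = [natural[0], natural[2], natural[1], natural[3]]
```

The other square root of 12 modulo 13, z = 8, satisfies the level-1 lines as well but swaps the last two outputs, so z = 5 is the choice that agrees with the final row.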
6
Example: fast multiplication
How to multiply two large polynomials?
Basic approach: multiply each coefficient of the 1st with each coefficient of the 2nd -> O(n^2)
Using FFT: compute the DFT of both polynomials, multiply pointwise in O(n) time and apply the inverse FFT -> O(n log n)
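The three steps (forward transform, pointwise product, inverse transform) can be sketched as follows. This is a floating-point illustration with hypothetical function names, assuming integer coefficients listed from the lowest degree upward; it is not the modular variant used elsewhere in the talk.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; the inverse is left unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def multiply(p, q):
    """Multiply two integer-coefficient polynomials via the FFT."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:            # pad to a power of two that fits the product
        size *= 2
    fp = fft(list(p) + [0] * (size - len(p)))
    fq = fft(list(q) + [0] * (size - len(q)))
    pointwise = [a * b for a, b in zip(fp, fq)]      # the O(n) step
    coeffs = fft(pointwise, inverse=True)
    # Divide by size to finish the inverse, then round away floating-point noise.
    return [round((c / size).real) for c in coeffs[:result_len]]

product = multiply([1, 2, 3], [4, 5])   # (1 + 2x + 3x^2) * (4 + 5x)
```

Rounding is safe only while the coefficients stay small enough for double precision; a modular transform avoids that limit.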
7
Dataflow implementation (1)
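8, 2, 2, 4 -> 10, 6, 6, 11
Data dependency: the kernel needs the updated data from the previous level before it can process the next level.
Solution: keep the intermediate data in LMem (the DFE's on-board memory).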
8
Dataflow implementation (2)
[Diagram: CPU, Manager, Kernel and LMem, with numbered steps (1) to (3); labels: input sequence, call kernel k times, output sequence]
The Manager streams data in and out of the Kernel.
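The level-by-level structure can be simulated on a CPU to show why each kernel call needs the complete output of the previous one. This is a plain Python sketch, not Maxeler code: `kernel_pass`, `run` and the `lmem` list are hypothetical stand-ins, z = 5 is an assumption consistent with the worked example, and the pairing rule is one reading of the algorithm that reproduces the example's numbers.

```python
def rev(i, k):
    """Reverse the k lowest bits of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def kernel_pass(data, level, k, z_powers, p):
    """One simulated kernel call: read a whole level, produce the next one."""
    n = len(data)
    m = n >> level                       # distance between paired elements
    out = [0] * n
    for j in range(n):
        block = j // m
        s = rev(block, k)                # exponent of z used at this position
        top = j if block % 2 == 0 else j - m
        out[j] = (data[top] + z_powers[s] * data[top + m]) % p
    return out

def run(a, z, p):
    n = len(a)
    k = n.bit_length() - 1               # log2(n); n is assumed to be a power of two
    z_powers = [pow(z, e, p) for e in range(n)]   # prepared once on the host side
    lmem = list(a)                       # stands in for the on-board memory
    history = []
    for level in range(1, k + 1):        # "call kernel k times"
        lmem = kernel_pass(lmem, level, k, z_powers, p)
        history.append(list(lmem))
    return history

history = run([8, 2, 2, 4], 5, 13)       # one entry per level
```

Every output of a pass reads two values of the previous level, so a pass cannot start until the previous one has written all of its results; the final level comes out in bit-reversed index order.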
9
Dataflow implementation (3)
LMem works in bursts (example: 384 B, but the size depends on the DFE)
Good for consecutive calculations
The z values are calculated on the CPU and written to LMem
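A small calculation illustrates why burst-based memory favours many consecutive calculations. The sketch assumes that transfers are rounded up to whole bursts and that each element occupies 4 bytes; both are assumptions for illustration, and only the 384-byte figure comes from the slide.

```python
BURST_BYTES = 384   # example burst size from the slide; the real value depends on the DFE

def padded_size(num_elements, bytes_per_element=4, burst_bytes=BURST_BYTES):
    """Bytes occupied once the data is rounded up to a whole number of bursts."""
    raw = num_elements * bytes_per_element
    bursts = -(-raw // burst_bytes)      # ceiling division
    return bursts * burst_bytes

def utilisation(num_elements, bytes_per_element=4, burst_bytes=BURST_BYTES):
    """Fraction of the transferred bytes that carry useful data."""
    raw = num_elements * bytes_per_element
    return raw / padded_size(num_elements, bytes_per_element, burst_bytes)

single = utilisation(32)        # one sequence of length 32 on its own
batched = utilisation(3 * 32)   # three such sequences packed back to back
```

Under these assumptions a single length-32 sequence fills only a third of a burst, while three packed together fill it completely.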
10
Performance & results (1)
CPU used for testing: Intel Core2 Quad Processor Q GHz
DFE used for testing: Maxeler card of type MAX2336B
11
Performance & results (2)
Conditions: BIG data, 95% of run time spent in loops
Type of experiments: consecutive calculations, from 10K up to 10M
Consecutive calculations for input sequences of length 32, 64, 128 and 256
12
Performance & results (3)
Execution time, N = 32, for CPU and DFE
13
Performance & results (4)
Speedup as a function of the number of consecutive calculations for N = 32
14
Performance & results (5)
Speedup as a function of the number of consecutive calculations for N = 64
15
Performance & results (6)
Speedup as a function of the number of consecutive calculations for N = 256
16
Performance & results (7)
Speedup as a function of the input sequence size (for 100K calculations)
17
Conclusion
FFTs are among the most widely used algorithms today.
A massive speedup is possible, but it requires consecutive calculations.
Power usage: reduced due to the lower clock frequency (200 MHz vs 2.86 GHz)