Fast Fourier Transform
CS 498LVK
Hassan Jafri
Overview
An FFT is an efficient algorithm for computing the Discrete Fourier Transform (DFT) and its inverse.
Direct computation of the DFT has complexity O(n^2).
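To make the O(n^2) cost concrete, here is a minimal sketch of the direct computation. The function name `dft` and the sign and scaling convention (negative exponent forward, 1/n on the inverse) are assumptions chosen for illustration; the slides do not fix a convention.

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT: n outputs, each a sum over n inputs, so O(n^2) work.

    Assumed convention: forward uses exp(-2*pi*i*j*k/n); the inverse
    uses the opposite sign and divides by n.
    """
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out
```

Applying `dft` and then `dft(..., inverse=True)` should return the original sequence up to rounding error.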
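FFT Algorithms
FFT algorithms reduce the complexity to O(n log n).
Examples: radix-2, radix-4, radix-8, etc.
However, these algorithms are not cache friendly: once the input no longer fits in cache, their strided memory accesses make poor use of it.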
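As an illustration of the radix-2 case, the following is a minimal recursive decimation-in-time sketch. The name `fft_radix2` is hypothetical, the input length is assumed to be a power of two, and the forward sign convention exp(-2*pi*i*k/n) is an assumption.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT, O(n log n).

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed samples (stride-2 access).
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t           # butterfly, upper output
        out[k + n // 2] = even[k] - t  # butterfly, lower output
    return out
```

The stride-2 split at every level is the kind of access pattern that, in an in-place implementation on a large array, pairs elements that lie far apart in memory.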
The Matrix Algorithm
The Matrix Fourier Algorithm (MFA, also called the 4-step algorithm) has better cache locality.
It works for composite data lengths.
For an input of size n = R x C, treat the input array as an R x C matrix.
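The reason a composite length helps can be seen from the index arithmetic. The derivation below is a sketch under assumptions the slides do not state: the array is stored row-major, so element (r, c) is x[Cr + c], and the forward transform uses \omega_n = e^{-2\pi i/n}.

```latex
\begin{align*}
X[k] &= \sum_{j=0}^{n-1} x[j]\,\omega_n^{jk},
  \qquad \omega_n = e^{-2\pi i/n}, \qquad n = RC \\
j &= Cr + c, \qquad k = k_1 + R k_2,
  \qquad 0 \le r, k_1 < R, \quad 0 \le c, k_2 < C \\
\omega_n^{jk} &= \omega_n^{C r k_1}\,\omega_n^{RC\, r k_2}\,
  \omega_n^{c k_1}\,\omega_n^{R c k_2}
  = \omega_R^{r k_1}\,\omega_n^{c k_1}\,\omega_C^{c k_2} \\
X[k_1 + R k_2] &= \sum_{c=0}^{C-1} \omega_C^{c k_2}
  \left[\, \omega_n^{c k_1} \sum_{r=0}^{R-1} x[Cr + c]\,\omega_R^{r k_1} \right]
\end{align*}
```

The inner sum is a length-R transform down column c, the factor \omega_n^{c k_1} is the twiddle factor for element (k_1, c), and the outer sum is a length-C transform along row k_1. The result for (k_1, k_2) belongs at index k_1 + R k_2, which is the transposed position.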
The Matrix Algorithm
THE ALGORITHM
1. Apply a length-R FFT to each column.
2. Multiply each matrix element (index r, c) by the twiddle factor.
3. Apply a length-C transform to each row.
4. Transpose the matrix.
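The four steps can be sketched as follows. This is an illustration, not the course's implementation: the names `dft` and `four_step_fft` are hypothetical, the matrix is assumed to be stored row-major, the twiddle factor for element (r, c) is taken to be exp(-2*pi*i*r*c/n), and a direct DFT stands in for the short column and row FFTs to keep the example small.

```python
import cmath

def dft(x):
    """Direct DFT, used here as a stand-in for the short row/column FFTs."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, R, C):
    """Matrix Fourier (4-step) algorithm for n = R * C, row-major layout assumed."""
    n = R * C
    if len(x) != n:
        raise ValueError("len(x) must equal R * C")
    # View the input as an R x C matrix: m[r][c] = x[C*r + c].
    m = [[complex(x[C * r + c]) for c in range(C)] for r in range(R)]
    # Step 1: length-R transform on each column.
    for c in range(C):
        col = dft([m[r][c] for r in range(R)])
        for r in range(R):
            m[r][c] = col[r]
    # Step 2: multiply element (r, c) by its twiddle factor.
    for r in range(R):
        for c in range(C):
            m[r][c] *= cmath.exp(-2j * cmath.pi * r * c / n)
    # Step 3: length-C transform on each row.
    m = [dft(row) for row in m]
    # Step 4: transpose, then read out row by row.
    return [m[r][c] for c in range(C) for r in range(R)]
```

For example, `four_step_fft(list(range(12)), 3, 4)` should agree with `dft(list(range(12)))` up to rounding error. The locality benefit comes from each short transform touching only R or C elements at a time.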
MFA with Slight Variation
1. n1 simultaneous n2-point multirow FFTs, with twiddle factor multiplication
2. n2 individual n1-point multicolumn FFTs
3. Transpose
The Code
    subroutine parallel_fft(A, W, U, N)
    double complex A(*), W(*), U(*)
    ! Base case: the problem fits in cache, so use the in-cache FFT
    if (N .LE. CACHESIZE) then
       CALL in_cache_fft(A, W, U, N)
       return
    end if
Step 1
    ! Radix-2 butterflies: combine the two halves of A into W
    !$OMP PARALLEL
    !$OMP DO
    do I=1, N/2
       W(I) = A(I) + A(I+N/2)
       W(I+N/2) = (A(I)-A(I+N/2)) * U(I)
    end do
Step 2
    ! Transform each half of W with a recursive call
    !$OMP DO
    do J=1, 2
       call rec_fft(W((J-1)*(N/2)+1), A((J-1)*(N/2)+1), U(N/2+1), N/2)
    end do
Step 3
    ! Interleave the two transformed halves back into A
    !$OMP DO
    do I=1, N/2
       A(2*I-1) = W(I)
       A(2*I) = W(I+N/2)
    end do
    !$OMP END PARALLEL
    return
    end
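The three steps amount to a radix-2 decimation-in-frequency FFT: butterflies first, then two half-size transforms, then an interleave. The serial Python sketch below mirrors that structure for checking the logic. Several things are assumptions rather than facts from the slides: the names `build_twiddles` and `dif_fft` are hypothetical; the twiddle table is assumed to hold the N/2 factors for size N followed by the factors for each smaller size (inferred from `U(N/2+1)` being passed to the recursive call); the recursion stops at length 1 instead of at a cache-size threshold; and no parallelism is modelled.

```python
import cmath

def build_twiddles(n):
    """Assumed table layout: n/2 factors for size n, then n/4 for size n/2, ..."""
    table = []
    size = n
    while size >= 2:
        table.extend(cmath.exp(-2j * cmath.pi * i / size) for i in range(size // 2))
        size //= 2
    return table

def dif_fft(a, u, offset=0):
    """Radix-2 decimation-in-frequency FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    half = n // 2
    # Step 1: butterflies combine the two halves; the difference gets a twiddle.
    w_lo = [a[i] + a[i + half] for i in range(half)]
    w_hi = [(a[i] - a[i + half]) * u[offset + i] for i in range(half)]
    # Step 2: transform each half, using the next section of the twiddle table.
    lo = dif_fft(w_lo, u, offset + half)
    hi = dif_fft(w_hi, u, offset + half)
    # Step 3: interleave; the first half gives even outputs, the second odd.
    out = [0j] * n
    out[0::2] = lo
    out[1::2] = hi
    return out
```

For the input `[1, 2, 3, 4]` with `build_twiddles(4)`, the result should match the direct DFT of the same sequence, namely 10, -2+2i, -2, -2-2i up to rounding error.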
For Reference
Swarztrauber, P.N.: Multiprocessor FFTs. Parallel Computing 5 (1987) 197–210
Cochrane, W.T., Cooley, J.W., Favin, D.L., Helms, H.D., Kaenel, R.A., Lang, W.W., Maling, Jr., G.C., Nelson, D.E., Rader, C.M., Welch, P.D.: What is the fast Fourier transform? IEEE Trans. Audio Electroacoust. 15 (1967) 45–55
Swarztrauber, P.N.: FFT algorithms for vector computers. Parallel Computing 1 (1984) 45–63
Frigo, M., Johnson, S.G.: FFTW. (http://www.fftw.org)
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19 (1965) 297–301
Takahashi, D., Sato, M., Boku, T.: An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors. WOMPAT 2003: 99–108
Wadleigh, K.R.: High performance FFT algorithms for cache-coherent multiprocessors. The International Journal of High Performance Computing Applications 13 (1999) 163–171
Takahashi, D.: A blocking algorithm for parallel 1-D FFT on shared-memory parallel computers. In: Proc. 6th International Conference on Applied Parallel Computing (PARA 2002). Volume 2367 of Lecture Notes in Computer Science, Springer-Verlag (2002) 380–389