Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: 10-11 MWF.

Similar presentations


Presentation on theme: "CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: 10-11 MWF."— Presentation transcript:

1 CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Annexstein @proffreda fred.annexstein@uc.edu Office Hours: 10-11 MWF or by appointment Tel: 513-556-1807 Meeting: Mondays 6:00-8:50PM Baldwin 645

2 Outline Fourier analysis Discrete Fourier transform Fast Fourier transform Parallel implementation

3 Discrete Fourier Transform Many applications in science, engineering Examples – Voice recognition – Image processing Straightforward implementation:  (n 2 ) Fast Fourier transform:  (n log n) Parallel FFT  (log n)

4 Fourier Analysis Fourier analysis: Represent periodic continuous functions by (potentially infinite) series of sine and cosine functions Discrete Fourier transform: Map a sequence over time to another sequence over frequency – Signal strength as a function of time  – Fourier coefficients as a function of frequency

5 DFT Example (1/4) 16 data points representing signal strength over time

6 DFT Example (2/4) DFT yields amplitudes and frequencies of sine/cosine functions

7 DFT Example (3/4) Plot of four constituent sine/cosine functions and their sum

8 DFT Example (4/4) Continuous function and original 16 samples.

9 DFT of Speech Sample “An gorra cats are furrier...” Signal Frequency and amplitude Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute

10 Computing DFT Matrix-vector product F n x – x is input vector (n signal samples) – F n is the nth order Fourier Matrix – f i,j =  n ij for 0  i, j < n and  n is primitive nth root of unity

11 11 Discrete Fourier Transform Given a polynomial a 0 + a 1 x +... + a n-1 x n-1, evaluate it at n distinct points x 0,..., x n-1. Key idea: choose x k =  k where  is principal n th root of unity.

12 Example 1 Compute DFT of vector (2, 3)  2, the primitive square root of unity, is -1

13 Example 2 Compute DFT of vector (1, 2, 4, 3) The primitive 4th root of unity is i

14 14 Roots of Unity Def. An n th root of unity is a complex number x such that x n = 1. Fact. The n th roots of unity are:  0,  1, …,  n-1 where  = e 2  i / n. Pf. (  k ) n = (e 2  i k / n ) n = (e  i ) 2k = (-1) 2k = 1. Fact. The n/2 th roots of unity are: 0, 1, …, n/2-1 where = e 4  i / n. Fact.  2 = and (  2 ) k = k.

15 11  2 = 1 = i 33  4 = 2 = -1 55  6 = 3 = -i 77 n = 8  0 = 0 = 1

16 16 Fast Fourier Transform via Divide and Conquer Goal. Evaluate a degree n-1 polynomial A(x) = a 0 +... + a n-1 x n-1 at its n th roots of unity:  0,  1, …,  n-1. Divide. Break polynomial up into even and odd powers. – A even (x) = a 0 + a 2 x + a 4 x 2 + … + a n/2-2 x (n-1)/2. – A odd (x) = a 1 + a 3 x + a 5 x 2 + … + a n/2-1 x (n-1)/2. – A(x) = A even (x 2 ) + x A odd (x 2 ). Conquer. Evaluate degree A even (x) and A odd (x) at the ½n th roots of unity: 0, 1, …, n/2-1. Combine. – A(  k+n ) = A even ( k ) +  k A odd ( k ), 0  k < n/2 – A(  k+n ) = A even ( k ) -  k A odd ( k ), 0  k < n/2  k+n = -  k k = (  k ) 2 = (  k+n ) 2

17 17 fft(n, a 0,a 1,…,a n-1 ) { if (n == 1) return a 0 (e 0,e 1,…,e n/2-1 )  FFT(n/2, a 0,a 2,a 4,…,a n-2 ) (d 0,d 1,…,d n/2-1 )  FFT(n/2, a 1,a 3,a 5,…,a n-1 ) for k = 0 to n/2 - 1 {  k  e 2  ik/n y k+n/2  e k +  k d k y k+n/2  e k -  k d k } return (y 0,y 1,…,y n-1 ) } FFT Algorithm

18 18 Odd-Even Recursion Tree a 0, a 1, a 2, a 3, a 4, a 5, a 6, a 7 a 1, a 3, a 5, a 7 a 0, a 2, a 4, a 6 a 3, a 7 a 1, a 5 a 0, a 4 a 2, a 6 a0a0 a4a4 a2a2 a6a6 a1a1 a5a5 a3a3 a7a7 "bit-reversed" order 000100 010110001101011111 perfect shuffle

19 Phases of Parallel FFT Algorithm Phase 1: Processes permute a’s (global bit reversal data communication pattern) Phase 2: – First log n – log p iterations of FFT – Handled in shared memory -No global communication is required Phase 3: – Final log p iteration steps must be handled globally – Organized as logical hypercube – In each iteration every process swaps values with partner across a hypercube dimension

20 20 FFT in Practice: Sequential and Parallel Fastest Fourier transform in the West. [Frigo and Johnson] – Optimized C library. – Features: DFT, DCT, real, complex, any size, any dimension. – Won 1999 Wilkinson Prize for Numerical Software. – Portable, competitive with vendor-tuned code. The NVIDIA CUDA Fast Fourier Transform library (cuFFT) provides a simple interface for computing FFTs up to 10x faster. By using hundreds of processor cores inside NVIDIA GPUs, cuFFT delivers the floating‐point performance of a GPU without having to develop your own custom GPU FFT. Reference: http://www.fftw.org

21 Summary Discrete Fourier transform used in many scientific and engineering applications Fast Fourier transform important because it implements DFT in time  (n log n) Developed parallel implementation of FFT Why isn’t scalability better? –  (n log n) sequential algorithm – Parallel version requires bit reversal data exchange – Log n parallel phase steps


Download ppt "CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: 10-11 MWF."

Similar presentations


Ads by Google