CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: 10-11 MWF.

Presentation transcript:

CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: MWF or by appointment Tel: Meeting: Mondays 6:00-8:50PM Baldwin 645

Outline
– Fourier analysis
– Discrete Fourier transform
– Fast Fourier transform
– Parallel implementation

Discrete Fourier Transform
– Many applications in science and engineering
– Examples: voice recognition, image processing
– Straightforward implementation: $\Theta(n^2)$
– Fast Fourier transform: $\Theta(n \log n)$
– Parallel FFT: $\Theta(\log n)$

Fourier Analysis
– Fourier analysis: represent periodic continuous functions by a (potentially infinite) series of sine and cosine functions
– Discrete Fourier transform: map a sequence over time to another sequence over frequency
  – Input: signal strength as a function of time
  – Output: Fourier coefficients as a function of frequency

DFT Example (1/4) 16 data points representing signal strength over time

DFT Example (2/4) DFT yields amplitudes and frequencies of sine/cosine functions

DFT Example (3/4) Plot of four constituent sine/cosine functions and their sum

DFT Example (4/4) Continuous function and original 16 samples.

DFT of Speech Sample: "Angora cats are furrier..." Panels: signal; frequency and amplitude. Figure courtesy of Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute.

Computing DFT
– Matrix-vector product $F_n x$
– $x$ is the input vector ($n$ signal samples)
– $F_n$ is the $n$th-order Fourier matrix
– $f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, where $\omega_n$ is a primitive $n$th root of unity
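The matrix-vector formulation above can be sketched directly in Python. This is a minimal illustration, not code from the lecture: the function names `fourier_matrix` and `dft` are hypothetical, and the sketch assumes the slides' sign convention $\omega_n = e^{2\pi i/n}$ (many libraries use the opposite sign in the exponent for the forward transform).

```python
import cmath

def fourier_matrix(n):
    """Build the n x n Fourier matrix with entries f[i][j] = omega**(i*j).

    Assumes omega = e^(2*pi*i/n), the convention used on the slides.
    """
    omega = cmath.exp(2j * cmath.pi / n)
    return [[omega ** (i * j) for j in range(n)] for i in range(n)]

def dft(x):
    """Naive DFT as the matrix-vector product F_n x.

    Each of the n outputs needs n multiplications, hence Theta(n^2) work.
    """
    n = len(x)
    F = fourier_matrix(n)
    return [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
```

The doubly nested loop over `i` and `j` is where the quadratic cost comes from, and it is this cost that the FFT reduces.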

Discrete Fourier Transform. Given a polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$, evaluate it at $n$ distinct points $x_0, \ldots, x_{n-1}$. Key idea: choose $x_k = \omega^k$, where $\omega$ is a principal $n$th root of unity.

Example 1: Compute the DFT of the vector (2, 3). $\omega_2$, the primitive square root of unity, is $-1$.
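The slide's worked computation did not survive in the transcript. Assuming the definition $f_{i,j} = \omega_n^{ij}$ with $\omega_2 = -1$, the product works out as follows:

$$
F_2 \begin{pmatrix} 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
  \begin{pmatrix} 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 2 + 3 \\ 2 - 3 \end{pmatrix}
= \begin{pmatrix} 5 \\ -1 \end{pmatrix}
$$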

Example 2: Compute the DFT of the vector (1, 2, 4, 3). The primitive 4th root of unity is $i$.
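The slide's worked computation is missing from the transcript here as well. Assuming $f_{i,j} = \omega_4^{ij}$ with $\omega_4 = i$, the product can be reconstructed as:

$$
F_4 \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
= \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & i & -1 & -i \\
1 & -1 & 1 & -1 \\
1 & -i & -1 & i
\end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
= \begin{pmatrix} 10 \\ -3 - i \\ 0 \\ -3 + i \end{pmatrix}
$$

For instance, the second entry is $1 + 2i + 4(-1) + 3(-i) = -3 - i$.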

Roots of Unity
– Def. An $n$th root of unity is a complex number $x$ such that $x^n = 1$.
– Fact. The $n$th roots of unity are $\omega^0, \omega^1, \ldots, \omega^{n-1}$, where $\omega = e^{2\pi i/n}$.
– Pf. $(\omega^k)^n = (e^{2\pi i k/n})^n = (e^{\pi i})^{2k} = (-1)^{2k} = 1$.
– Fact. The $(n/2)$th roots of unity are $\nu^0, \nu^1, \ldots, \nu^{n/2-1}$, where $\nu = e^{4\pi i/n}$.
– Fact. $\omega^2 = \nu$ and $(\omega^2)^k = \nu^k$.
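These facts can be checked numerically. The sketch below is only an illustration for $n = 8$; the helper `close` and the tolerance are assumptions introduced here, since floating-point powers are only approximately equal to their exact values.

```python
import cmath

n = 8
omega = cmath.exp(2j * cmath.pi / n)  # principal n-th root of unity
nu = cmath.exp(4j * cmath.pi / n)     # principal (n/2)-th root of unity

def close(a, b, tol=1e-12):
    """Approximate equality for complex numbers (hypothetical helper)."""
    return abs(a - b) < tol

roots = [omega ** k for k in range(n)]

# Every power of omega is an n-th root of unity.
assert all(close(r ** n, 1) for r in roots)

# Squaring omega gives the principal (n/2)-th root of unity.
assert close(omega ** 2, nu)
assert all(close((omega ** 2) ** k, nu ** k) for k in range(n // 2))

# Roots half a turn apart are negatives of each other, so they share a square.
assert all(close(omega ** (k + n // 2), -(omega ** k)) for k in range(n // 2))
```

The last check is the property that lets the FFT reuse each half-size evaluation twice.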

11  2 = 1 = i 33  4 = 2 = -1 55  6 = 3 = -i 77 n = 8  0 = 0 = 1

Fast Fourier Transform via Divide and Conquer
– Goal. Evaluate a degree $n-1$ polynomial $A(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$ at the $n$th roots of unity: $\omega^0, \omega^1, \ldots, \omega^{n-1}$.
– Divide. Break the polynomial up into even and odd powers.
  – $A_{even}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2-1}$
  – $A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2-1}$
  – $A(x) = A_{even}(x^2) + x\,A_{odd}(x^2)$
– Conquer. Evaluate $A_{even}(x)$ and $A_{odd}(x)$ at the $(n/2)$th roots of unity: $\nu^0, \nu^1, \ldots, \nu^{n/2-1}$.
– Combine.
  – $A(\omega^k) = A_{even}(\nu^k) + \omega^k A_{odd}(\nu^k)$, $0 \le k < n/2$
  – $A(\omega^{k+n/2}) = A_{even}(\nu^k) - \omega^k A_{odd}(\nu^k)$, $0 \le k < n/2$
– The combine step uses $\omega^{k+n/2} = -\omega^k$ and $\nu^k = (\omega^k)^2 = (\omega^{k+n/2})^2$.

FFT Algorithm
fft(n, a_0, a_1, …, a_{n-1}) {
    if (n == 1) return a_0
    (e_0, e_1, …, e_{n/2-1}) ← fft(n/2, a_0, a_2, a_4, …, a_{n-2})
    (d_0, d_1, …, d_{n/2-1}) ← fft(n/2, a_1, a_3, a_5, …, a_{n-1})
    for k = 0 to n/2 - 1 {
        ω^k ← e^{2πik/n}
        y_k ← e_k + ω^k d_k
        y_{k+n/2} ← e_k - ω^k d_k
    }
    return (y_0, y_1, …, y_{n-1})
}
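A runnable Python rendering of the recursive algorithm might look like the following. It is a sketch under two assumptions: the input length is a power of two, and the slides' convention $\omega = e^{2\pi i/n}$ is used. The function name `fft` mirrors the pseudocode; everything else is illustrative.

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT; assumes len(a) is a power of two.

    Evaluates the polynomial with coefficients a at the n-th roots of unity
    omega^k, where omega = e^(2*pi*i/n) as on the slides.
    """
    n = len(a)
    if n == 1:
        return list(a)
    e = fft(a[0::2])  # even-indexed coefficients
    d = fft(a[1::2])  # odd-indexed coefficients
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)  # omega^k
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]       # uses omega^(k+n/2) = -omega^k
    return y
```

Each level of recursion does $\Theta(n)$ work in the combine loop, and there are $\log n$ levels, which gives the $\Theta(n \log n)$ bound.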

Odd-Even Recursion Tree (each level is a perfect shuffle of the level above)
– Level 0: (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7)
– Level 1: (a_0, a_2, a_4, a_6) (a_1, a_3, a_5, a_7)
– Level 2: (a_0, a_4) (a_2, a_6) (a_1, a_5) (a_3, a_7)
– Leaves: a_0, a_4, a_2, a_6, a_1, a_5, a_3, a_7 ("bit-reversed" order)
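The leaf order of the recursion tree can be computed directly by reversing the bits of each index. The sketch below uses hypothetical helper names (`bit_reverse`, `bit_reversal_permutation`) and assumes the input length is a power of two.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i onto r
        i >>= 1
    return r

def bit_reversal_permutation(a):
    """Reorder a so that position i holds a[bit_reverse(i)].

    Assumes len(a) is a power of two.
    """
    n = len(a)
    bits = n.bit_length() - 1  # log2(n) for a power of two
    return [a[bit_reverse(i, bits)] for i in range(n)]

leaves = bit_reversal_permutation(["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"])
```

For eight elements, index 1 (binary 001) maps to index 4 (binary 100), which is why a_4 appears second among the leaves.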

Phases of Parallel FFT Algorithm
– Phase 1: Processes permute the a's (global bit-reversal data communication pattern)
– Phase 2: First $\log n - \log p$ iterations of the FFT
  – Handled in shared memory
  – No global communication is required
– Phase 3: Final $\log p$ iterations must be handled globally
  – Organized as a logical hypercube
  – In each iteration, every process swaps values with its partner across one hypercube dimension
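The three phases can be illustrated with a sequential simulation of the iterative FFT. This is a sketch, not the lecture's implementation: it assumes $n$ and $p$ are powers of two with $p \le n$, that each process owns a contiguous block of $n/p$ elements after the bit-reversal permutation, and the names `iterative_fft` and `stages` are hypothetical. No actual parallel execution or communication takes place; the code only records which butterfly stages would stay inside one block.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def iterative_fft(a, p):
    """Iterative radix-2 FFT that labels each stage as local or global.

    Assumes len(a) and p are powers of two, p <= len(a), and a block
    distribution of n/p contiguous elements per process (an assumption).
    """
    n = len(a)
    bits = n.bit_length() - 1
    # Phase 1: bit-reversal permutation of the input.
    y = [complex(a[bit_reverse(i, bits)]) for i in range(n)]
    block = n // p
    stages = []
    half = 1  # distance between butterfly partners: i and i XOR half
    while half < n:
        # Partners lie in the same block exactly when half < block.
        stages.append("local" if 2 * half <= block else "global")
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(2j * cmath.pi * k / (2 * half))
                u, t = y[start + k], w * y[start + k + half]
                y[start + k] = u + t
                y[start + k + half] = u - t
        half *= 2
    return y, stages

result, stages = iterative_fft([1, 2, 3, 4, 5, 6, 7, 8], p=2)
```

With $n = 8$ and $p = 2$, the first $\log n - \log p = 2$ stages are labeled local (Phase 2) and the final $\log p = 1$ stage is labeled global (Phase 3), where each process would exchange data with its hypercube partner.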

FFT in Practice: Sequential and Parallel
– Fastest Fourier Transform in the West (FFTW) [Frigo and Johnson]
  – Optimized C library.
  – Features: DFT, DCT, real, complex, any size, any dimension.
  – Won the 1999 Wilkinson Prize for Numerical Software.
  – Portable, competitive with vendor-tuned code.
– The NVIDIA CUDA Fast Fourier Transform library (cuFFT) provides a simple interface for computing FFTs up to 10x faster. By using hundreds of processor cores inside NVIDIA GPUs, cuFFT delivers the floating-point performance of a GPU without having to develop your own custom GPU FFT.
– Reference:

Summary
– The discrete Fourier transform is used in many scientific and engineering applications
– The fast Fourier transform is important because it computes the DFT in time $\Theta(n \log n)$
– Developed a parallel implementation of the FFT
– Why isn't scalability better?
  – $\Theta(n \log n)$ sequential algorithm
  – Parallel version requires a bit-reversal data exchange
  – $\log n$ parallel phase steps