Parallel Processing (CS 730) Lecture 7: Shared Memory FFTs*


Jeremy R. Johnson
Wed. Feb. 14, 2001
*Parts of this lecture were derived from Chapter IX in Lipson.

Introduction
Objective: To derive and implement a shared-memory parallel program for computing the fast Fourier transform (FFT).
Topics
Derivation of the FFT
Recursive version
Iterative version
A parallel divide & conquer algorithm using threads
A parallel loop version using OpenMP
Obtaining additional parallelism

FFT as a Matrix Factorization
Compute y = F_n x, where F_n is the n-point Fourier matrix.
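As a baseline for the factorizations that follow, here is a minimal Python sketch of the direct product y = F_n x. The function names and the sign convention omega = exp(-2*pi*i/n) are assumptions chosen to match the iterative implementations later in the lecture; the slide's own definition of F_n did not survive.

```python
import cmath

def fourier_matrix(n):
    """Return F_n as a list of rows, F_n[j][k] = omega**(j*k).

    Assumption: omega = exp(-2*pi*i/n), the sign used by the
    iterative implementations in this lecture.
    """
    omega = cmath.exp(-2j * cmath.pi / n)
    return [[omega ** (j * k) for k in range(n)] for j in range(n)]

def dft(x):
    """Compute y = F_n x directly, using O(n^2) multiplications."""
    n = len(x)
    F = fourier_matrix(n)
    return [sum(F[j][k] * x[k] for k in range(n)) for j in range(n)]

# A unit impulse transforms to a vector of ones (up to rounding).
y = dft([1, 0, 0, 0])
```

The FFT variants below compute the same y in O(n log n) operations by factoring F_n into sparse matrices.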

Matrix Factorizations and Algorithms

function y = fft(x)
% Recursive radix-2 FFT. x is a row vector of length n = 2^t.
% Note: the name fft shadows MATLAB's built-in function.
n = length(x);
if n == 1
    y = x;
else
    % [x0 x1] = L^n_2 x
    x0 = x(1:2:n-1);
    x1 = x(2:2:n);
    % [t0 t1] = (I_2 tensor F_m)[x0 x1]
    t0 = fft(x0);
    t1 = fft(x1);
    % w = W_m(omega_n), with omega_n = exp(-2*pi*i/n) as in the iterative versions
    w = exp((-2*pi*1i/n)*(0:n/2-1));
    % y = [y0 y1] = (F_2 tensor I_m) T^n_m [t0 t1]
    y0 = t0 + w.*t1;
    y1 = t0 - w.*t1;
    y = [y0 y1];
end
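Read off the comments in the recursive code, one level of the recursion corresponds to the factorization below. This is a sketch: the explicit forms of T^n_m and W_m are the usual ones and are assumed here, since the slide's own formula did not survive.

```latex
F_n = (F_2 \otimes I_m)\, T^n_m\, (I_2 \otimes F_m)\, L^n_2, \qquad n = 2m,
\qquad
T^n_m = I_m \oplus W_m(\omega_n), \quad
W_m(\omega_n) = \operatorname{diag}\bigl(1, \omega_n, \ldots, \omega_n^{m-1}\bigr), \quad
F_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
```

Reading right to left: L^n_2 splits x into its even- and odd-indexed halves, I_2 ⊗ F_m transforms each half, T^n_m scales the second half by the twiddle factors, and F_2 ⊗ I_m combines the halves with butterflies.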

Rewrite Rules

FFT Variants
Cooley-Tukey
Recursive FFT
Iterative FFT
Vector FFT (Stockham)
Vector FFT (Korn-Lambiotte)
Parallel FFT (Pease)

Tensor Permutations
A natural class of permutations compatible with the FFT.
Let σ be a permutation of {1,…,t}.
Mixed-radix counting permutation of vector indices.
Well-known examples are stride permutations and bit reversal.

Example (Stride Permutation)
000  000
001  100
010  001
011  101
100  010
101  110
110  011
111  111

Example (Bit Reversal)
000  000
001  100
010  010
011  110
100  001
101  101
110  011
111  111
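Both examples can be viewed as permutations of the bits of each index. The Python sketch below makes that concrete for t = 3. The function names, the 0-based bit numbering (the slide numbers positions 1 to t), and the list `sigma` giving the source bit for each output bit are all assumptions made for illustration.

```python
def permute_index_bits(index, t, sigma):
    """Return the t-bit index whose bit j equals bit sigma[j] of `index`.

    Bit 0 is the least significant bit; sigma is a permutation of range(t).
    """
    out = 0
    for j in range(t):
        out |= ((index >> sigma[j]) & 1) << j
    return out

def permutation_table(t, sigma):
    """List the permuted index, as a t-digit binary string, for every input index."""
    return [format(permute_index_bits(i, t, sigma), "0{}b".format(t))
            for i in range(2 ** t)]

t = 3
rotate = [1, 2, 0]    # cyclic rotation of the index bits: a stride permutation
reverse = [2, 1, 0]   # reversal of the index bits: bit reversal

stride_table = permutation_table(t, rotate)
reversal_table = permutation_table(t, reverse)
```

With `rotate`, index 001 maps to 100 and 010 maps to 001; with `reverse`, 001 maps to 100 and 011 maps to 110.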

Iterative Cooley-Tukey Algorithm
[Figure: data-flow diagram with labels R, Stage 0, Stage 1, Stage 2, Stage 3.]


Modified Pease Algorithm
[Figure: data-flow diagram with labels Stage 0, Stage 1, Stage 2, Stage 3; nodes are numbered in natural order on one side and in even-then-odd order on the other.]

Iterative Implementation

function y = ifft2(x)
% Input: x a column vector of length n. n = 2^t, t an integer, t >= 0.
% Output: y = F_{2^t} x
% Algorithm: Iterative.
% Notation: @ is the tensor product, (+) the direct sum.
% F_{2^t} = { Prod_{c=1}^t (I_{2^{c-1}} @ F_2 @ I_{2^{t-c}})
%                          (I_{2^{c-1}} @ T^{2^{t-c+1}}_{2^{t-c}}) } R^{2^t}
n = length(x);
t = ceil(log2(n));
xt = bitreversal(x);   % helper (not shown): applies R^{2^t}
yt = xt;               % so that y = x when t = 0
for c = t:-1:1
    m = 2^(c-1);
    p = 2^(t-c);
    % W = W_p(omega_{2p})
    W = exp((2*pi*1i)/(2*p)*-(0:p-1)');
    % yt = (I_m @ F_2 @ I_p)xt
    for j = 0:m-1
        % y^{2p}_{j*2p+1} = (F_2 @ I_p) T^{2p}_p x^{2p}_{j*2p+1}
        %                 = (F_2 @ I_p)(I_p (+) W) x^{2p}_{j*2p+1}
        xt((j*2+1)*p+1:(j+1)*2*p) = W .* xt((j*2+1)*p+1:(j+1)*2*p);
        yt(j*2*p+1:(j*2+1)*p) = xt(j*2*p+1:(j*2+1)*p) + xt((j*2+1)*p+1:(j+1)*2*p);
        yt((j*2+1)*p+1:(j+1)*2*p) = xt(j*2*p+1:(j*2+1)*p) - xt((j*2+1)*p+1:(j+1)*2*p);
    end
    xt = yt;
end
y = yt;
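The slides do not show the `bitreversal` helper. The Python sketch below assumes it reorders the input so that element i lands at the index whose t bits are those of i reversed, and then applies the same stages as the iterative code, working on one buffer instead of two. All names are illustrative.

```python
import cmath

def reverse_bits(i, t):
    """Reverse the t low-order bits of i."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reversal(x):
    """Assumed behaviour of the slides' bitreversal helper."""
    n = len(x)
    t = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[reverse_bits(i, t)] = complex(x[i])
    return out

def fft_iterative(x):
    """Iterative radix-2 FFT of a list whose length is a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    t = n.bit_length() - 1
    y = bit_reversal(x)
    for c in range(t, 0, -1):
        m = 2 ** (c - 1)          # number of blocks in this stage
        p = 2 ** (t - c)          # half the block length
        w = [cmath.exp(-2j * cmath.pi * k / (2 * p)) for k in range(p)]
        for j in range(m):
            base = j * 2 * p
            for k in range(p):
                a = y[base + k]
                b = w[k] * y[base + p + k]   # twiddle, then butterfly
                y[base + k] = a + b
                y[base + p + k] = a - b
    return y

result = fft_iterative([1, 2, 3, 4])
```

For the input [1, 2, 3, 4] the result is [10, -2+2i, -2, -2-2i] up to rounding.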

Iterative Implementation

function y = ipfft2(x)
% In-place Pease FFT algorithm.
% Input: x a vector of length n. n = 2^t, t an integer, t >= 0.
% Output: y = F_{2^t} x
% Algorithm: Conjugated Pease.
% F_{2^t} = { Prod_{c=1}^t L^n_{2^{t-c}} (I_{2^{t-1}} @ F_2) T_c L^n_{2^c} } R^{2^t}
%
n = length(x);
t = ceil(log2(n));
y = bitreversal(x);   % helper (not shown): applies R^{2^t}
w = exp(-2*pi*1i/n);
for c = t-1:-1:0
    % One stage: 2^(t-1) butterflies, each on its own pair (a0, a1).
    for r = 0:2^(t-1)-1
        r0 = mod(r,2^c);
        r1 = floor(r/2^c);
        a0 = r0*2^(t-c) + r1;
        a1 = a0 + 2^(t-c-1);
        y0 = y(a0+1);
        y1 = w^(r1*2^c) * y(a1+1);
        y(a0+1) = y0 + y1;
        y(a1+1) = y0 - y1;
    end
end
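Within one stage of the Pease code, every value of r touches a different pair (a0, a1), so the inner loop can be divided among threads with a barrier between stages. The Python sketch below shows that structure only. The function names, the `workers` parameter, and the use of a thread pool are assumptions, and Python threads will not speed up this CPU-bound loop. In C, the analogous step would be an OpenMP parallel for over r.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def reverse_bits(i, t):
    """Reverse the t low-order bits of i."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def pease_butterflies(y, w, t, c, r_begin, r_end):
    """Apply butterflies r_begin <= r < r_end of stage c, in place."""
    for r in range(r_begin, r_end):
        r0 = r % (1 << c)
        r1 = r >> c
        a0 = (r0 << (t - c)) + r1
        a1 = a0 + (1 << (t - c - 1))
        y0 = y[a0]
        y1 = w[r1 << c] * y[a1]
        y[a0] = y0 + y1
        y[a1] = y0 - y1

def pease_fft(x, workers=2):
    """In-place Pease-style FFT with each stage split across a thread pool."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    t = n.bit_length() - 1
    y = [0j] * n
    for i in range(n):
        y[reverse_bits(i, t)] = complex(x[i])
    half = n // 2
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(max(half, 1))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for c in range(t - 1, -1, -1):
            chunk = -(-half // workers)   # ceiling division
            tasks = [pool.submit(pease_butterflies, y, w, t, c,
                                 lo, min(lo + chunk, half))
                     for lo in range(0, half, chunk)]
            for task in tasks:
                task.result()             # barrier before the next stage
    return y

result = pease_fft([1, 2, 3, 4])
```

The barrier is needed because stage c reads values written by stage c+1; within a stage no two butterflies share an element.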