Published by Abel Hopkins. Modified over 9 years ago.
1
FFT: Accelerator Project
Rohit Prakash, Anand Silodia
2
Work done till now
– Studied various FFT algorithms
– Implemented radix-4, recursive and iterative algorithms
– Optimized these implementations
– Compared the results with FFTW
– Result: FFTW performs better than our implementation
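For reference, a recursive radix-4 FFT of the kind listed above can be sketched in a few lines of Python. This is a minimal sketch, not our actual implementation: the names fft_radix4 and dft are hypothetical, the input length is assumed to be a power of 4, and the naive dft is included only to check the result.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    # A power of 4 is a power of 2 whose single set bit sits at an even position.
    if n < 1 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    if n == 1:
        return [complex(x[0])]
    # Four interleaved subsequences x[r::4], each transformed recursively.
    sub = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        a = sub[0][k]
        b = sub[1][k] * w
        c = sub[2][k] * w * w
        d = sub[3][k] * w * w * w
        # 4-point butterfly; multiplying by -1j is a rotation, not a general multiply.
        t0, t1 = a + c, a - c
        t2, t3 = b + d, (b - d) * -1j
        out[k] = t0 + t2
        out[k + q] = t1 + t3
        out[k + 2 * q] = t0 - t2
        out[k + 3 * q] = t1 - t3
    return out


def dft(x):
    """Naive O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


x = [complex(i % 5, (i * i) % 7) for i in range(64)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_radix4(x), dft(x)))
```

Splitting by 4 instead of 2 halves the number of recursion levels, and the multiplication by -1j inside each butterfly costs no real multiplications.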
3
Current objectives
– Validate the number of complex computations in our implementation against the theoretical count
– Document the work done so far
– Make a website for the project
– Study the FFTW code (and figure out the reasons for its efficiency)
– Run the code with the Intel compiler (icc) / Visual C++
4
Validating the computations
– Theoretical formula for the number of complex computations:
  – (11/4)*n*log4(n) = 8960 (correct)
  – (3/4)*n*log4(n) = 3840 (incorrect; the formula given on cnx.org)
– Actual count in our implementation: 8960
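The two coefficients can be related by counting operations level by level. The sketch below assumes the recursive radix-4 structure (n/4 butterflies at each of the log4(n) levels) and a counting convention in which every butterfly costs 3 complex twiddle multiplications and 8 complex additions, with trivial twiddles (w^0 = 1) still counted. Under those assumptions, multiplications alone come to (3/4)*n*log4(n), and multiplications plus additions come to (11/4)*n*log4(n). The function name radix4_op_count is hypothetical.

```python
def radix4_op_count(n):
    """Return (complex multiplications, complex additions) for a recursive radix-4 FFT.

    Counting convention (an assumption): every butterfly costs 3 twiddle
    multiplications and 8 complex additions, and trivial twiddles (w^0 = 1)
    are still counted.
    """
    if n < 1 or (n > 1 and n % 4):
        raise ValueError("n must be a power of 4")
    if n == 1:
        return 0, 0  # a size-1 transform is the identity
    sub_mults, sub_adds = radix4_op_count(n // 4)
    butterflies = n // 4
    # Four sub-transforms of size n/4, then n/4 butterflies to combine them.
    return 4 * sub_mults + 3 * butterflies, 4 * sub_adds + 8 * butterflies


for n in (4, 16, 64, 256, 1024):
    levels = (n.bit_length() - 1) // 2  # log4(n) for a power of 4
    mults, adds = radix4_op_count(n)
    assert mults == 3 * n * levels // 4          # (3/4) * n * log4(n)
    assert mults + adds == 11 * n * levels // 4  # (11/4) * n * log4(n)
```

A different convention, for example skipping the trivial twiddles, changes the totals, so the convention has to be fixed before comparing a formula with a measured count.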
5
Documentation and website
– Website of the project:
  – www.cse.iitd.ac.in/~cs1030186/btp
– Includes the details and results of our experiments (up to last week)
6
Running with the Intel compiler (icc)
– No improvement
– Possible reasons:
  – Tested on an Intel Pentium Mobile processor
  – This processor does not support optimizations such as exploiting SSE3 instructions (-fast flag)
7
FFTW code
– 56,489+ LOC (contains code written in OCaml and C)
– We decided to study why FFTW is so fast before going into the code itself
– Texts we came across in this context:
  – The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
  – Documentation of FFTW
8
Why is FFTW fast?
– The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets
– At runtime, a ‘planner’ finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm
– The executor interprets the plan with negligible overhead
– Codelets are generated automatically and are fast
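The planner/executor split can be illustrated with a toy model. This is a sketch under stated assumptions, not FFTW's planner or API: execute, measure and plan_for are hypothetical names, a naive DFT stands in for real codelets, and the only plans considered are "direct" or a Cooley-Tukey split by radix 2, 4 or 8. Memoizing plan_for turns the search into a dynamic program over transform sizes.

```python
import cmath
import time
from functools import lru_cache


def naive_dft(x):
    """Stand-in for a codelet: direct O(n^2) DFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def execute(plan, x):
    """Executor: interpret a plan, either ("direct",) or ("split", r, subplan)."""
    if plan[0] == "direct":
        return naive_dft(x)
    _, r, subplan = plan
    n = len(x)
    m = n // r
    # Cooley-Tukey step: r sub-transforms of size m on the stride-r subsequences.
    subs = [execute(subplan, x[s::r]) for s in range(r)]
    # Combine: X[k] = sum_s W_n^(s*k) * F_s[k mod m].
    return [sum(subs[s][k % m] * cmath.exp(-2j * cmath.pi * s * k / n) for s in range(r))
            for k in range(n)]


def measure(plan, n, trials=3):
    """Best-of-`trials` wall-clock time of running `plan` on a size-n input."""
    x = [complex(i % 7, i % 3) for i in range(n)]
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        execute(plan, x)
        best = min(best, time.perf_counter() - start)
    return best


@lru_cache(maxsize=None)
def plan_for(n):
    """Planner: time each candidate plan for size n and keep the fastest.

    The memoized plan_for(n // r) is reused as the sub-plan, i.e. the best plan
    for a smaller size is assumed to stay best inside a larger one (the
    dynamic-programming assumption).
    """
    candidates = [("direct",)]
    for r in (2, 4, 8):
        if n % r == 0:
            candidates.append(("split", r, plan_for(n // r)))
    return min(candidates, key=lambda p: measure(p, n))


x = [complex(i, -i) for i in range(64)]
plan = plan_for(64)  # the chosen plan may differ from machine to machine
assert all(abs(a - b) < 1e-6 for a, b in zip(execute(plan, x), naive_dft(x)))
```

Because the choice is driven by timings, the selected plan can differ between machines or even between runs, which is the sense in which a measuring planner adapts to the hardware.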
9
Contd…
– The executor implements the recursive divide-and-conquer Cooley-Tukey FFT algorithm
– Basically, it adapts to the hardware in order to maximize performance
– ‘Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that C compiler can schedule it efficiently’
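FFTW's codelets are C code produced by its OCaml generator (genfft); the Python below is only a hand-written illustration of their shape, assuming a fixed-size 4-point DFT is representative. The transform is written as straight-line code with the multiplication by -i reduced to a swap and a sign change, so in a compiled language the compiler sees independent operations it can schedule freely. Python itself gains nothing from this, and the name dft4_codelet is hypothetical.

```python
def dft4_codelet(x0, x1, x2, x3):
    """Hypothetical hand-written size-4 "codelet": straight-line code with no
    loops, no index arithmetic and no twiddle-factor calls."""
    # (t0, t1) depend only on x0, x2 and (t2, t3) only on x1, x3, so in
    # compiled code the two pairs could be scheduled in parallel.
    t0 = x0 + x2
    t1 = x0 - x2
    t2 = x1 + x3
    t3 = x1 - x3
    # Multiplying by -i just swaps real and imaginary parts and flips a sign.
    t3_rot = complex(t3.imag, -t3.real)
    return (t0 + t2, t1 + t3_rot, t0 - t2, t1 - t3_rot)


print(dft4_codelet(1, 2, 3, 4))  # (10, (-2+2j), -2, (-2-2j))
```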
10
Contd…
– It uses some tricky optimizations
– It also exploits SIMD instructions
11
Further plan?
– Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as is.
12
References
– www.fftw.org
– The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
– The Fastest Fourier Transform in the West (Matteo Frigo and Steven G. Johnson)
13
Thank You