FFT: Accelerator Project
Rohit Prakash, Anand Silodia

Work done till now
- Studied various FFT algorithms
- Implemented radix-4, recursive and iterative algorithms
- Optimized these implementations
- Compared the results with FFTW
- Result: FFTW fares better than our implementation
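The slides do not show the implementation itself. As a minimal sketch only (in Python for readability; the function names `fft_radix4` and `dft` are hypothetical), a recursive radix-4 decimation-in-time FFT splits the input into four interleaved subsequences, transforms each one, and combines them with one radix-4 butterfly per output index. It assumes the input length is a power of 4.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    # Transform the four interleaved subsequences x[0::4], x[1::4], ... recursively.
    sub = [fft_radix4(x[r::4]) for r in range(4)]
    m = n // 4
    out = [0j] * n
    for k in range(m):
        # Twiddle factors W_n^(r*k) applied to sub-transforms r = 1, 2, 3.
        a = sub[0][k]
        b = sub[1][k] * cmath.exp(-2j * cmath.pi * k / n)
        c = sub[2][k] * cmath.exp(-2j * cmath.pi * 2 * k / n)
        d = sub[3][k] * cmath.exp(-2j * cmath.pi * 3 * k / n)
        # Radix-4 butterfly: a 4-point DFT of (a, b, c, d) with W_4 = -j.
        out[k] = a + b + c + d
        out[k + m] = a - 1j * b - c + 1j * d
        out[k + 2 * m] = a - b + c - d
        out[k + 3 * m] = a + 1j * b - c - 1j * d
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Comparing against a direct DFT such as `dft` is a simple way to validate an optimized variant before timing it against FFTW.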

Current Objectives
- Validate the number of complex computations in our implementation against the theoretical number of computations
- Document the work done till now
- Make a website for the project
- Study the FFTW code (and figure out the reasons for its efficiency)
- Run the code with the Intel compiler (icc) / Visual C++

Validating the computations
- Incorrect theoretical formula (cnx.org)
- Theoretical formula (for the number of complex computations):
  - (11/4) * n * log4(n) = 8960 (correct)
  - (3/4) * n * log4(n) = 3840 (incorrect)
- Actual: 8960
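One way to see where the two formulas come from is to count per butterfly under the usual textbook accounting. This is an assumption, since the slide does not say how operations were counted. A radix-4 FFT of length n has log4(n) stages of n/4 butterflies, each with 3 twiddle multiplications and 8 complex additions. The multiplications alone therefore give (3/4) * n * log4(n), and adding the 2 * n * log4(n) additions gives (11/4) * n * log4(n). The hypothetical helper `radix4_op_count` below counts by following the recursion rather than by applying the formula:

```python
def radix4_op_count(n):
    """Return (complex_mults, complex_adds) for a recursive radix-4 FFT of length n.

    Assumed accounting: each butterfly costs 3 twiddle multiplications (counted
    even when the twiddle is 1) and 8 complex additions; the multiplications by
    +/-1 and +/-j inside the 4-point DFT are treated as free.
    """
    if n == 1:
        return 0, 0
    if n % 4 != 0:
        raise ValueError("n must be a power of 4")
    mults, adds = radix4_op_count(n // 4)
    butterflies = n // 4  # one butterfly per output index k in 0..n/4-1
    return 4 * mults + 3 * butterflies, 4 * adds + 8 * butterflies


def log4(n):
    """Exact base-4 logarithm for powers of 4."""
    k = 0
    while n > 1:
        n //= 4
        k += 1
    return k


def closed_form(n):
    """(3/4) n log4(n) multiplications and 2 n log4(n) additions."""
    k = log4(n)
    return 3 * n * k // 4, 2 * n * k
```

For every power of 4, `radix4_op_count(n)` agrees with `closed_form(n)`, and the sum of the two counts equals (11/4) * n * log4(n).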

Documentation and website
- Website of the project: www.cse.iitd.ac.in/~cs1030186/btp
- Includes the details and results of our experiments (till last week)

Running with the Intel compiler (icc)
- No improvement
- Possible reasons:
  - Tested on an Intel Pentium Mobile
  - This processor does not support optimizations such as SSE3 instructions (-fast flag)

FFTW code
- 56,489+ lines of code (written in OCaml and C)
- We decided to study why FFTW is so fast before going into the code itself
- Texts we came across in this context:
  - The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
  - The FFTW documentation

Why is FFTW fast?
- The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets
- At runtime, a 'planner' finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm
- The executor interprets the plan with negligible overhead
- Codelets are generated automatically and are fast
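The planner's dynamic programming can be illustrated with a toy version. This sketch is not FFTW's planner. FFTW measures the actual run time of candidate plans, while this sketch uses a made-up cost table (`CODELET_COST` and `TWIDDLE_COST` are hypothetical), and it only considers splitting n by a radix that has a codelet. Because the best plan for each sub-size is memoized and reused, each size is solved once.

```python
from functools import lru_cache

# Hypothetical per-call costs of straight-line "codelets" for small sizes.
# FFTW times real code instead; these numbers are invented for illustration.
CODELET_COST = {2: 1.0, 4: 3.0, 8: 9.0, 16: 26.0}
TWIDDLE_COST = 0.5  # hypothetical cost of one twiddle multiplication


@lru_cache(maxsize=None)
def best_plan(n):
    """Return (cost, radices) for the cheapest size-n plan under the toy cost model.

    The plan is a tuple of radices whose product is n; its last entry is the
    codelet applied directly at the leaves of the recursion.
    """
    options = []
    if n in CODELET_COST:
        # Option 1: solve the whole problem with a single codelet.
        options.append((CODELET_COST[n], (n,)))
    for r, c in CODELET_COST.items():
        if r < n and n % r == 0:
            m = n // r
            sub_cost, sub_plan = best_plan(m)  # memoized optimal sub-plan
            # r sub-transforms of size m, then m radix-r butterflies with twiddles.
            cost = r * sub_cost + m * (c + TWIDDLE_COST * (r - 1))
            options.append((cost, (r,) + sub_plan))
    if not options:
        raise ValueError(f"no plan for size {n}")
    return min(options)
```

Under this cost model `best_plan(16)` picks the single size-16 codelet; with measured timings instead of a fixed table, the chosen plan would depend on the machine.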

Contd…
- The executor implements the recursive divide-and-conquer Cooley-Tukey FFT algorithm
- In short, FFTW adapts to the hardware in order to maximize performance
- 'Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that C compiler can schedule it efficiently.'
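To make the executor idea concrete, here is a minimal sketch of a recursive mixed-radix Cooley-Tukey executor that follows a plan given as a tuple of radices. The plan format and the function names `execute` and `dft_codelet` are hypothetical, not FFTW's. A direct DFT stands in for the generated straight-line codelets, so the sketch shows the control structure only, not FFTW's speed.

```python
import cmath
import math


def dft_codelet(x):
    """Leaf 'codelet': a direct O(r^2) DFT standing in for generated straight-line code."""
    r = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / r) for t in range(r))
            for k in range(r)]


def execute(plan, x):
    """Recursive mixed-radix decimation-in-time Cooley-Tukey FFT following `plan`.

    `plan` is a tuple of radices whose product equals len(x), e.g. (2, 4, 8) for n = 64.
    """
    n = len(x)
    if math.prod(plan) != n:
        raise ValueError("product of radices must equal the input length")
    if len(plan) == 1:
        return dft_codelet(x)
    r = plan[0]
    m = n // r
    # Decimation in time: r interleaved subsequences of length m, each solved by the sub-plan.
    subs = [execute(plan[1:], x[s::r]) for s in range(r)]
    out = [0j] * n
    for k in range(m):
        # Twiddle the s-th sub-transform by W_n^(s*k).
        t = [subs[s][k] * cmath.exp(-2j * cmath.pi * s * k / n) for s in range(r)]
        # Radix-r butterfly: an r-point DFT of the twiddled values.
        y = dft_codelet(t)
        for q in range(r):
            out[k + q * m] = y[q]
    return out
```

The same input can be run with different plans, e.g. `execute((2, 4, 8), x)` or `execute((8, 8), x)` for a length-64 `x`; all give the same transform, and only the cost differs, which is what a planner chooses between.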

Contd…
- It uses some tricky optimizations like:
- It also exploits SIMD instructions

Further plan?
- Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as it is.

References
- www.fftw.org
- The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
- The Fastest Fourier Transform in the West (Matteo Frigo and Steven G. Johnson)

Thank You