AHPCC/NCSA WORKSHOP: Fast Fourier Transform Using FFTw. Guobin Ma (1), Sirpa Saarinen (2). Spring 2000 FFTw workshop, 4/18/00.

1 AHPCC/NCSA WORKSHOP: Fast Fourier Transform Using FFTw (Spring 2000 FFTw workshop, 4/18/00)
Guobin Ma (1) (gbma@ahpcc.unm.edu), Sirpa Saarinen (2) (sirpa@ncsa.uiuc.edu), and Paul M. Alsing (1) (alsing@ahpcc.unm.edu)
(1) AHPCC, (2) NCSA
http://www.ahpcc.unm.edu/Workshop/FFTW

2 Contents
- FFT basics (Paul)
  - What is FFT and why FFT
- FFTw
  - Outline of FFTW (Guobin)
    - Characteristics
    - C routines
  - Performance and C example codes (Sirpa)
  - Fortran wrappers and example codes (Guobin)
  - Exercises (skipped)

3 FFT Basics: What is FFT and why FFT, by Paul Alsing

4 Fourier Transform: frequency analysis of time series data.
- DFT: Discrete Fourier Transform (N time/frequency points)
- FFT: Fast Fourier Transform, an efficient implementation of the DFT, ~O(N log_2 N)
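
For reference, the DFT of N samples can be written as below. The symbols x_j and X_k are not from the slides, and the negative sign in the exponent is an assumption chosen to match the forward direction (-1) that the later FFTw slides use.

```latex
X_k = \sum_{j=0}^{N-1} x_j \, e^{-2\pi i\, jk/N}, \qquad k = 0, 1, \dots, N-1
```

Evaluating this sum directly for every k costs O(N^2) operations, which is the cost the FFT brings down to O(N log_2 N).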

5 Aliasing issues: Let f_c = Nyquist frequency = 1/(2 Δt), where Δt is the sampling interval. A sine wave of frequency f_c is sampled at only 2 points per period, the peak and the trough. Frequency components with |f| > f_c are falsely folded back into the range -f_c < f < f_c.
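
The folding can be illustrated with integer frequency indices. The sketch below assumes frequencies are measured in units of 1/(N Δt), so the Nyquist frequency is index N/2; the function name `alias_index` is hypothetical, and placing the Nyquist component at -N/2 rather than +N/2 is a convention chosen here.

```c
#include <assert.h>

/* Fold a frequency index k (in units of 1/(N*dt)) into the range
 * -N/2 <= k < N/2 that N samples can represent.
 * n must be positive and even. */
int alias_index(int k, int n)
{
    assert(n > 0 && n % 2 == 0);
    int half = n / 2;
    int shifted = (k + half) % n;   /* C's % may return a negative value */
    if (shifted < 0)
        shifted += n;
    return shifted - half;
}
```

With N = 8 samples, a component at index 5 lies above the Nyquist index 4 and is indistinguishable from one at index -3.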

6 Fourier Transform: radix 2, Danielson-Lanczos

7 Fourier Transform: radix 2, Danielson-Lanczos (cont.)
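
The lemma named in these slide titles can be sketched as follows. The notation (f_j, F_k, W) and the forward sign convention are assumptions, not taken from the slides: for even N, a length-N DFT splits into one length-N/2 DFT over the even-indexed samples and one over the odd-indexed samples.

```latex
\begin{aligned}
F_k &= \sum_{j=0}^{N-1} f_j \, e^{-2\pi i\, jk/N} \\
    &= \sum_{j=0}^{N/2-1} f_{2j} \, e^{-2\pi i\, jk/(N/2)}
     + W^k \sum_{j=0}^{N/2-1} f_{2j+1} \, e^{-2\pi i\, jk/(N/2)} \\
    &= F_k^{e} + W^k F_k^{o}, \qquad W = e^{-2\pi i/N}
\end{aligned}
```

Applying the same split recursively to F^e and F^o gives log_2 N levels when N is a power of 2.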

8 Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm
We finally get down to 1-point transforms such as F_k^{eoeeoeo...oee} = f_m for some m.
The question is: which value of m corresponds to which pattern of e's and o's?
The answer is: let e = 0 and o = 1. Reverse the pattern of e's and o's and you will have the value of m in binary.

9 Bit reversal: The Cooley-Tukey algorithm first rearranges the data into bit-reversed order, then builds up the transform in N log_2 N operations (decimation in time).
[Figure: bit-reversal reordering, with the inputs labelled by the patterns eee, eeo, eoe, eoo, oee, oeo, ooe, ...]
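
The reordering step can be sketched in a few lines of C. This is a minimal illustration, not FFTw's internal code; the function names are hypothetical and real-valued data is used only to keep the example short.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of index m (e = 0, o = 1 in the slides). */
unsigned bit_reverse(unsigned m, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (m & 1u);
        m >>= 1;
    }
    return r;
}

/* Permute data[0..n-1] into bit-reversed order in place; n must be 2^bits. */
void bit_reverse_permute(double *data, unsigned n, unsigned bits)
{
    assert(n == (1u << bits));
    for (unsigned i = 0; i < n; ++i) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {            /* swap each pair only once */
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For N = 8, index 1 (binary 001) trades places with index 4 (binary 100), while palindromic indices such as 0, 2, 5, and 7 stay where they are.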

10 Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.

11 Example Application: Quantum Mechanics. Propagation of the (dimensionless) Schrödinger wave function.

12 Serial FFTs
The x data is contiguous in memory (Fortran). Transpose the data to keep the y transforms contiguous in memory.
[Figure: the (x, y) array is transposed to (y, x)]
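
A minimal sketch of the transpose step, written in C with row-major storage (the slide itself describes Fortran's column-major layout, where the roles of the two indices are swapped). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of a rows x cols row-major matrix:
 * dst (cols x rows) receives dst[c][r] = src[r][c]. */
void transpose(const double *src, double *dst, size_t rows, size_t cols)
{
    assert(src != dst);   /* this simple version cannot work in place */
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}
```

After the transpose, the elements that used to be separated by a full row are adjacent, so a 1D transform along that direction reads contiguous memory.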

13 Parallel FFTs
In parallel, all x transforms are local operations on each processor (no communication). To perform the transpose, the processors must carry out an all-to-all communication.
[Figure: the (x, y) data distributed over processors P0 to P3, before and after the transpose]

14 Outline of FFTw, by Guobin Ma

15 Characteristics of FFTw
- C routines generated by Caml-Light ML
- 1D/nD, real/complex data
- Arbitrary input size, not necessarily 2^n
- Serial/parallel, shared/distributed memory
- High performance: typically faster than other publicly available FFT codes
- Portable; adapts automatically to the machine

16 Two Phases of FFTw
- Hardware-dependent algorithm
- Planner
  - 'Learns' the fastest way on your machine
  - Produces a data structure, the 'plan'
  - The plan is reusable
- Executor
  - Computes the transform
- Applies to all FFTw operation modes
  - 1D/nD, complex/real, serial/parallel

17 C Routines of FFTw
- Routines
  - 1D/nD complex
  - 1D/nD real
  - Corresponding parallel (MPI) ones
- Arguments
- Special notes
- Data formats

18 1D Complex Transform
- Typical call
    #include <fftw.h>
    ...
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        ...
        /* arguments: int n, fftw_direction dir, int flags */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        ...
        fftw_one(p, in, out);
        ...
        fftw_destroy_plan(p);
    }

19 1D Complex Transform (cont.)
- Routines
  - fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
  - void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
  - fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags, fftw_complex *in, int istride, fftw_complex *out, int ostride);

20 1D Complex Transform (cont.)
- Routines (cont.)
  - void fftw(fftw_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_complex *out, int ostride, int odist);
  - void fftw_destroy_plan(fftw_plan plan);

21 1D Complex Transform (cont.)
- Arguments
  - plan: data structure containing all the information needed to compute the transform
  - n: data size
  - dir: FFTW_FORWARD (-1) or FFTW_BACKWARD (+1)
  - flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, combined with |
  - howmany: number of transforms (input arrays)
  - in, istride, idist: input arrays; element i of transform j is in[i*istride + j*idist]
  - out, ostride, odist: output arrays, addressed in the same way
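
The stride/dist addressing rule can be tried out without the library. The sketch below is an illustration only: the helper names are hypothetical, and plain `double` stands in for `fftw_complex` to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element i of transform j in a strided batch layout,
 * following the rule in[i*istride + j*idist] from the slide. */
size_t batch_offset(size_t i, size_t j, size_t stride, size_t dist)
{
    return i * stride + j * dist;
}

/* Copy transform j (n elements) out of a batch into a contiguous buffer. */
void gather_transform(const double *batch, double *dst, size_t n, size_t j,
                      size_t stride, size_t dist)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = batch[batch_offset(i, j, stride, dist)];
}
```

Two common choices are stride = 1 with dist = n (each transform stored contiguously, one after another) and stride = howmany with dist = 1 (the transforms interleaved element by element).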

22 1D Complex Transform (cont.)
- Notes
  - Out of place (default): separate arrays in[N] and out[N]
  - In place: saves memory but costs more time; ostride and odist are ignored, and out is ignored
  - In-order output: the zero frequency is at out[0]
  - Unnormalized: a forward transform followed by a backward transform scales the data by a factor of N
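
The factor of N can be checked by hand with a naive 4-point transform. This sketch does not call FFTw; `cplx` and `dft4` are hypothetical names, and N = 4 is chosen because its twiddle factors are exactly 1, ±i, and -1, so no trigonometric functions are needed.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;   /* hypothetical stand-in type */

/* Naive 4-point DFT. sign = -1 for forward, +1 for backward, mirroring
 * the slide's FFTW_FORWARD / FFTW_BACKWARD values. For N = 4 the twiddle
 * factors exp(sign * 2*pi*i * m / 4) are exactly 1, sign*i, -1, -sign*i. */
void dft4(const cplx *in, cplx *out, int sign)
{
    assert(sign == -1 || sign == 1);
    const double wre[4] = { 1.0, 0.0, -1.0, 0.0 };
    const double wim[4] = { 0.0, (double)sign, 0.0, (double)-sign };
    for (int k = 0; k < 4; ++k) {
        out[k].re = 0.0;
        out[k].im = 0.0;
        for (int j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            out[k].re += in[j].re * wre[m] - in[j].im * wim[m];
            out[k].im += in[j].re * wim[m] + in[j].im * wre[m];
        }
    }
}
```

Calling `dft4` with sign -1 and then with sign +1 returns every input value multiplied by 4, so a caller who wants the original data back has to divide by N.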

23 nD Complex Transform
- Routines, similar to the 1D case, except …
  - fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
  - void fftwnd_one(fftwnd_plan plan, ...);
  - fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, ...);
  - void fftwnd(fftwnd_plan plan, ...);
  - void fftwnd_destroy_plan(fftwnd_plan plan);
  - Arguments shown as ... are the same as in the 1D case

24 nD Complex Transform (cont.)
- Arguments
  - rank: dimensionality of the arrays to be transformed
  - n: pointer to an array of length rank holding the size of each dimension, e.g. n = {8, 4, 5}
  - Row-major order for C, column-major order for Fortran
- Special routines for the 2D and 3D cases
  - nd -> 2d, 3d
  - n_dim -> nx, ny or nx, ny, nz

25 1D Real Transform
- Routines, similar to the 1D complex case, except …
  - rfftw_plan rfftw_create_plan(...); (same arguments as fftw_create_plan)
  - void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
  - rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags, fftw_real *in, int istride, fftw_real *out, int ostride);
  - void rfftw(rfftw_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_real *out, int ostride, int odist);
  - void rfftw_destroy_plan(rfftw_plan plan);

26 1D Real Transform (cont.)
- Arguments
  - dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1, FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = +1
  - The other arguments have the same meaning as before

27 nD Real Transform
- Routines, similar to the 1D real case, but …
  - rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
  - void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);
  - void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);
  - void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_complex *out, int ostride, int odist);

28 nD Real Transform (cont.)
- Routines (cont.)
  - void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_real *out, int ostride, int odist);
  - void rfftwnd_destroy_plan(rfftwnd_plan plan);
- Special 2D and 3D routines

29 nD Array Format
nD arrays are stored as a single contiguous block.
- C order (row-major order)
  - The first index varies most slowly, the last most quickly
- Fortran order (column-major order)
  - The first index varies most quickly, the last most slowly
- Static arrays: no problem
- Dynamically allocated arrays: may cause problems in the nD case, since the data must still form one contiguous block
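
The two orderings differ only in which index is multiplied out first. A minimal sketch, with hypothetical function names and zero-based indices in both cases (unlike Fortran's default of 1):

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (i0, ..., i_{rank-1}) in C (row-major) order:
 * the last index varies fastest. */
size_t offset_row_major(const size_t *idx, const size_t *n, size_t rank)
{
    size_t off = 0;
    for (size_t d = 0; d < rank; ++d) {
        assert(idx[d] < n[d]);
        off = off * n[d] + idx[d];
    }
    return off;
}

/* Same element in Fortran (column-major) order: the first index varies
 * fastest. */
size_t offset_col_major(const size_t *idx, const size_t *n, size_t rank)
{
    size_t off = 0;
    for (size_t d = rank; d-- > 0; ) {
        assert(idx[d] < n[d]);
        off = off * n[d] + idx[d];
    }
    return off;
}
```

For an 8 x 4 x 5 array, element (1, 2, 3) sits at offset (1*4 + 2)*5 + 3 = 33 in C order and at 1 + 8*(2 + 4*3) = 113 in Fortran order, which is why the dimension array has to be given in the order that matches the calling language.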

30 Parallel FFTw
- Multi-thread (skipped)
- MPI
  - nD complex
    - Routines
    - Notes
    - Data layout
  - 1D complex
  - nD real

31 nD Complex MPI FFTw
- Routines, similar to the uniprocessor case, except for mpi …
  - fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n, fftw_direction dir, int flags);
  - void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p, int *local_first, int *local_first_start, int *local_second_after_transpose, int *local_second_start_after_transpose, int *total_local_size);
  - local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
  - work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

32 nD Complex MPI FFTw (cont.)
- Routines (cont.)
  - void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work, fftw_mpi_output_order output_order);
  - void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);

33 nD Complex MPI FFTw (cont.)
- Notes
  - First argument: comm, the MPI communicator
  - Data layout
  - All fftw_mpi transforms are in-place
  - work:
    - Optional
    - Same size as local_data
    - Greater efficiency at the cost of extra storage
  - output_order: normal/transposed
    - transposed: improves performance, but the data must be reshaped manually and this may sometimes cause problems

34 nD Complex MPI FFTw (cont.)
- Data layout
  - Distributed data
    - Divided by rows (1st dimension) in C
    - Divided by columns (last dimension) in Fortran
  - Given the plan, all other data-layout parameters are determined by fftwnd_mpi_local_sizes
  - total_local_size = (n1/np)*n2*…*nk*n_fields
  - Transposed order: n2 becomes the 1st dimension of the output; for the inverse transform use n = [n2, n1, n3, ..., nk]
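
To make the idea of dividing the first dimension among processes concrete, here is a sketch of one simple even block split. It is an assumption for illustration, not FFTw's own rule: in real code the local extent and starting row must be taken from fftwnd_mpi_local_sizes, and `block_split` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical even block split of n1 rows over np processes.
 * Not FFTw's own rule: in real code the values come from
 * fftwnd_mpi_local_sizes. */
void block_split(int n1, int np, int rank, int *local_n, int *local_start)
{
    assert(n1 > 0 && np > 0 && rank >= 0 && rank < np);
    int base = n1 / np;      /* rows every process gets */
    int extra = n1 % np;     /* first `extra` processes get one more */
    *local_n = base + (rank < extra ? 1 : 0);
    *local_start = rank * base + (rank < extra ? rank : extra);
}
```

With 10 rows on 4 processes this gives 3, 3, 2, and 2 rows starting at rows 0, 3, 6, and 8. Each process then holds its local rows times the full extent of the remaining dimensions.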

35 1D Complex MPI FFTw
- Routines, similar to the nD case, except without nd …
  - fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);
  - void fftw_mpi_local_sizes(fftw_mpi_plan p, int *local_n, int *local_n_start, int *local_n_after_transpose, int *local_start_after_transpose, int *total_local_size);

36 1D Complex MPI FFTw (cont.)
- Routines (cont.)
  - void fftw_mpi(fftw_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work, fftw_mpi_output_order output_order);
  - void fftw_mpi_destroy_plan(fftw_mpi_plan p);
- Generally worse speedup than the nD case; best suited to large sizes

37 nD Real MPI FFTw
- Similar to the uniprocessor and complex MPI cases
- About a factor of 2 speedup and half the storage, at the expense of a more complicated data format
- Can have transposed-order output data
- There is no 1D real MPI FFTw

38 Break

39 FFTw Performance, by Sirpa Saarinen. http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt

40 C Example Codes, by Sirpa Saarinen. http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt

41 FFTW Fortran Wrappers and Example Codes, by Guobin Ma

42 FFTw Fortran-Callable Wrappers
- Routine names: append _f77 to the prefix of the C routine name
  - fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
  - fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
  - e.g. fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE) -> fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)

43 FFTw Fortran-Callable Wrappers
- Notes
  - Any function that returns a value is converted into a subroutine with an additional (first) parameter for the result
  - There is no null in Fortran, so an array must be allocated and passed for out
  - nD arrays are in column-major (Fortran) order
  - Plan variables must be declared as integer
- Constants
  - FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, … are combined with '+' instead of '|'
  - Defined in the files fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i
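
Replacing '|' with '+' is safe only because each flag occupies its own bit and is used at most once. The sketch below demonstrates this with made-up flag values; the `DEMO_` constants and function names are hypothetical, and the real constants are defined in the FFTw headers.

```c
#include <assert.h>

/* Hypothetical single-bit flag values, for illustration only;
 * the real constants are defined in the FFTw headers. */
enum { DEMO_MEASURE = 1, DEMO_IN_PLACE = 8, DEMO_USE_WISDOM = 16 };

/* Returns 1 when adding two flag words gives the same result as OR-ing
 * them, which holds exactly when they share no set bits. */
int sum_equals_or(unsigned a, unsigned b)
{
    return (a + b) == (a | b);
}

/* Combine flags the way a Fortran caller would, with '+'. */
unsigned combine_with_plus(unsigned a, unsigned b)
{
    assert((a & b) == 0);   /* a repeated flag would carry into another bit */
    return a + b;
}
```

Adding the same flag twice breaks the equivalence: in this example 8 + 8 yields 16, which would be read as a different flag, so each constant should appear only once in a Fortran flag sum.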

44 Fortran Examples
- Source codes at AHPCC (tested on Turing, BB, SGI):
  - ~gbma/workshop/fftw/codes or
  - http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
- Complex data
  - 1D serial: fftw_1d.f90
  - 1D parallel: fftw_1d_p.f90
  - nD serial: fftw_3d.f90
  - nD parallel
    - Normal order: fftw_3d_p_n.f90
    - Transposed order: fftw_3d_p_t.f90

45 Fortran Examples (cont.)
- 1D case
  - Input
  - Forward output
  - Inverse output
- nD case
  - Input
  - Forward output
  - Inverse output

46 1D Serial Fortran Example
- FFTw codes
    ...
    call fftw_f77_create_plan(plan_forward, N, &
                              FFTW_FORWARD, FFTW_ESTIMATE)
    call fftw_f77_create_plan(plan_reverse, N, &
                              FFTW_BACKWARD, FFTW_ESTIMATE)
    ...
    call fftw_f77_one(plan_forward, in, out)
    ...
    call fftw_f77_one(plan_reverse, out, in)
    ...
    call fftw_f77_destroy_plan(plan_forward)
    call fftw_f77_destroy_plan(plan_reverse)

47 1D Parallel Fortran Example
- FFTw codes
    ...
    call fftw_f77_mpi_create_plan(p_fwd, MPI_COMM_WORLD, N, &
                                  FFTW_FORWARD, FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
         local_n_after_trans, local_start_after_trans, total_local_size)
    ...
    allocate( psi_local(0:total_local_size-1) )
    ...
    allocate( work(0:total_local_size-1) )

48 1D Parallel Fortran Example (cont.)
- FFTw codes (cont.)
    ...
    call fftw_f77_mpi(p_fwd, 1, psi_local, work, USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_fwd)
    ...
    call fftw_f77_mpi_create_plan(p_rvs, MPI_COMM_WORLD, N, &
                                  FFTW_BACKWARD, FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi(p_rvs, 1, psi_local, work, USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_rvs)

49 nD Serial Fortran Example
- FFTw codes
    call fftwnd_f77_create_plan(p_fwd, nd, n_dim, &
         FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_fwd, psi, 0)
    call fftwnd_f77_destroy_plan(p_fwd)
    call fftwnd_f77_create_plan(p_rvs, nd, n_dim, &
         FFTW_BACKWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_rvs, psi, 0)
    call fftwnd_f77_destroy_plan(p_rvs)

50 nD Parallel Fortran Example
- FFTw codes, normal order, nD local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd, MPI_COMM_WORLD, &
         nd, n_dim, FFTW_FORWARD, FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
    allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )

51 nD Parallel Fortran Example (cont.)
- FFTw codes, normal order, nD local array (cont.)
    call fftwnd_f77_mpi(p_fwd, 1, psi_local, work, USE_WORK, order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    call fftwnd_f77_mpi_create_plan(p_rvs, MPI_COMM_WORLD, &
         nd, n_dim, FFTW_BACKWARD, FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs, 1, psi_local, work, USE_WORK, order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

52 nD Parallel Fortran Example (cont.)
- FFTw codes, transposed order, 1D local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd, MPI_COMM_WORLD, &
         nd, n_dim, FFTW_FORWARD, FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:total_local_size-1) )
    allocate( work(0:total_local_size-1) )

53 nD Parallel Fortran Example (cont.)
- FFTw codes, transposed order, 1D local array (cont.)
    call fftwnd_f77_mpi(p_fwd, 1, psi_local, work, USE_WORK, order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
    call fftwnd_f77_mpi_create_plan(p_rvs, MPI_COMM_WORLD, &
         nd, n_dim, FFTW_BACKWARD, FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs, 1, psi_local, work, USE_WORK, order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

54 nD Parallel Fortran Example (cont.)
- Notes
  - Normal order
    - Easy to code, 'low' performance
  - Transposed order
    - 'High' performance, but complicated to code; the user must reorder the data
  - Using the work array
    - Higher efficiency, but needs more memory

55 Run the Examples at AHPCC
- Copy the files to your directory
    cp ~gbma/workshop/fftw/codes/*.* .
- Compile
    make filename.tur
    make filename.bb
    make filename.sgi
  with link specification -lfftw -lfftw_mpi (-lfftw_mpi only for MPI)
- Run
  - BB:
      qsub -I -l nodes=2
      mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
  - Turing: filename.tur
  - SGI: mpirun -np 2 filename.sgi

56 References
- Numerical Recipes (FORTRAN), by William T. Vetterling et al., New York: Cambridge University Press, 1992
- Numerical Integration, by P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co., 1967
- www.fftw.org
- FFTW User's Manual, by M. Frigo & S. G. Johnson

57 Acknowledgement
- Brian Baltz
  - installation of FFTw at AHPCC
  - running MPI at AHPCC
- John Greenfield
  - setting up the grid access
- Andrew Pineda
  - computer work environment at AHPCC
- Brian Smith & Susan Atlas
  - many stimulating discussions
- Many others...

