Published by Valerie Stone. Modified over 9 years ago.
1
4/18/00, Spring 2000 FFTw workshop
AHPCC/NCSA Workshop: Fast Fourier Transform Using FFTw
Guobin Ma (1) (gbma@ahpcc.unm.edu), Sirpa Saarinen (2) (sirpa@ncsa.uiuc.edu), and Paul M. Alsing (1) (alsing@ahpcc.unm.edu)
(1) AHPCC, (2) NCSA
http://www.ahpcc.unm.edu/Workshop/FFTW
2
Contents
- FFT basics (Paul): what the FFT is and why to use it
- FFTw
  - Outline of FFTw (Guobin): characteristics, C routines
  - Performance and C example codes (Sirpa)
  - Fortran wrappers and example codes (Guobin)
  - Exercises (skipped)
3
FFT Basics: What the FFT Is and Why to Use It
By Paul Alsing
4
Fourier Transform: frequency analysis of time series data.
DFT: Discrete Fourier Transform (N time/frequency points).
FFT: Fast Fourier Transform, an efficient implementation of the DFT, ~O(N log2 N).
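To make the ~O(N log2 N) figure concrete, the sketch below compares operation counts. The function names are hypothetical, and the counts are the usual textbook estimates (N^2 for evaluating the DFT sum directly, N log2 N for a radix-2 FFT), not measurements of FFTw.

```c
#include <assert.h>

/* Textbook operation count for evaluating the DFT sum directly:
   N outputs, each a sum over N inputs. */
unsigned long long direct_dft_ops(unsigned long long n)
{
    return n * n;
}

/* Textbook operation count for a radix-2 FFT: log2(N) stages of N
   operations each. Assumes n is a power of two. */
unsigned long long radix2_fft_ops(unsigned long long n)
{
    unsigned long long stages = 0;
    assert(n > 0 && (n & (n - 1)) == 0);   /* power of two only */
    for (unsigned long long m = n; m > 1; m >>= 1)
        ++stages;
    return n * stages;
}
```

For N = 1024 these estimates give 1,048,576 operations for the direct sum against 10,240 for the FFT, roughly a factor of 100.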
5
Aliasing issues:
Let f_c = Nyquist frequency = 1/(2 Δt), where Δt is the sampling interval.
A sine wave of frequency f_c is sampled at only 2 points per cycle, the peak and the trough.
Frequency components with |f| > f_c are falsely folded back into the range -f_c < f < f_c.
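The folding described above can be sketched in a few lines of C. This is an illustration under stated assumptions: the function names are hypothetical, and the folding rule used is the standard one (subtract the nearest multiple of the sampling rate 1/Δt).

```c
#include <assert.h>

/* Nyquist frequency f_c = 1/(2*dt) for sampling interval dt > 0. */
double nyquist_frequency(double dt)
{
    assert(dt > 0.0);
    return 1.0 / (2.0 * dt);
}

/* Nearest integer to x, rounding halves away from zero. */
long nearest_integer(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Frequency at which a component of true frequency f appears after
   sampling with interval dt: f minus the nearest multiple of the
   sampling rate 1/dt. The result lies in [-f_c, f_c]. */
double aliased_frequency(double f, double dt)
{
    double fs = 1.0 / dt;                     /* sampling rate */
    return f - fs * (double)nearest_integer(f / fs);
}
```

With Δt = 0.5 the Nyquist frequency is 1, so a component at f = 1.5 shows up at -0.5, while a component at f = 0.25 is left where it is.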
6
Fourier Transform: radix 2, Danielson-Lanczos
7
Fourier Transform: radix 2, Danielson-Lanczos (cont.)
8
Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm
We finally get down to 1-point transforms, each of which is just one input sample f_m for some m.
The question is: which value of m corresponds to which pattern of e's and o's?
The answer is: let {e=0, o=1}. Reverse the pattern of e's and o's and you will have the value of m in binary.
9
Bit reversal: The Cooley-Tukey algorithm first rearranges the data in bit-reversed order, then builds up the transform in log2 N stages of N operations each, about N log2 N operations in total (decimation in time).
(Diagram: the e/o index patterns eee, eeo, eoe, eoo, oee, oeo, ooe, ... before and after bit reversal.)
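The bit-reversal step can be sketched as follows. The function names are hypothetical and the code only illustrates the reordering with {e=0, o=1}; it is not taken from FFTw, which chooses its own algorithms.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of m, e.g. 001 -> 100 for bits = 3. */
unsigned bit_reverse(unsigned m, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (m & 1u);   /* move the lowest bit of m onto r */
        m >>= 1;
    }
    return r;
}

/* Reorder data[0 .. 2^bits - 1] in place into bit-reversed order.
   Each pair (i, j) is swapped once, when j > i. */
void bit_reverse_permute(double *data, unsigned bits)
{
    unsigned n = 1u << bits;
    assert(data != 0);
    for (unsigned i = 0; i < n; ++i) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For N = 8 the samples 0, 1, 2, ..., 7 end up in the order 0, 4, 2, 6, 1, 5, 3, 7.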
10
Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.
11
Example Application: Quantum Mechanics
Propagation of the (dimensionless) Schrödinger wave function
12
Serial FFTs
x data is contiguous in memory (Fortran).
Transpose the data to keep the y transforms contiguous in memory.
(Diagram: the x-y array and its y-x transpose.)
13
Parallel FFTs
In parallel, all x transforms are local operations on each processor (no communication).
To perform the transpose, the processors must carry out an all-to-all communication.
(Diagram: the x-y array distributed over processors P0 to P3, before and after the transpose.)
14
Outline of FFTw
By Guobin Ma
15
Characteristics of FFTw
- C routines generated by a program written in Caml Light (an ML dialect)
- 1D/nD, real/complex data
- Arbitrary input size, not necessarily a power of 2 (2^n)
- Serial/parallel, shared/distributed memory
- High performance, typically faster than other portable FFT codes
- Portable, adapts automatically to the machine
16
Two Phases of FFTw
Hardware-dependent algorithm
- Planner: 'learns' the fastest way to compute the transform on your machine and produces a data structure, the 'plan', which is reusable
- Executor: computes the transform according to the plan
Applies to all FFTw operation modes: 1D/nD, complex/real, serial/parallel
17
C Routines of FFTw
- Routines
  - 1D/nD complex
  - 1D/nD real
  - Corresponding parallel (MPI) ones
- Arguments
- Special notes
- Data formats
18
1D Complex Transform
Typical call
    #include …
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        …
        /* plan a forward transform of size N */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        …
        fftw_one(p, in, out);    /* execute the plan: in -> out */
        …
        fftw_destroy_plan(p);    /* release the plan */
    }
19
1D Complex Transform (cont.)
Routines
    fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
    void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
    fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                        fftw_complex *in, int istride,
                                        fftw_complex *out, int ostride);
20
1D Complex Transform (cont.)
Routines (cont.)
    void fftw(fftw_plan plan, int howmany,
              fftw_complex *in, int istride, int idist,
              fftw_complex *out, int ostride, int odist);
    void fftw_destroy_plan(fftw_plan plan);
21
1D Complex Transform (cont.)
Arguments
- plan: data structure containing all the information needed to compute the transform
- n: size of the transform
- dir: FFTW_FORWARD (-1) or FFTW_BACKWARD (+1)
- flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, combined with |
- howmany: number of transforms (input arrays)
- in, istride, idist: input arrays; element i of transform j is in[i*istride + j*idist]
- out, ostride, odist: output arrays, addressed the same way
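The stride/dist addressing in[i*istride + j*idist] is easier to see on a small buffer. The helpers below are hypothetical and use plain double data so that the sketch compiles without FFTw; the library does this addressing internally.

```c
#include <assert.h>

/* Offset of element i of transform j: i*istride + j*idist. */
int element_offset(int i, int j, int istride, int idist)
{
    return i * istride + j * idist;
}

/* Copy transform j (n elements) out of a strided buffer into a
   contiguous one. Illustration only. */
void gather_transform(const double *in, int n, int j,
                      int istride, int idist, double *contiguous)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        contiguous[i] = in[element_offset(i, j, istride, idist)];
}
```

With howmany = 3 transforms of size n = 4 in one buffer of 12 values, storing the arrays back to back corresponds to istride = 1, idist = 4, while interleaving them element by element corresponds to istride = 3, idist = 1.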
22
1D Complex Transform (cont.)
Notes
- Out of place (default): separate arrays in[N], out[N]
- In place: saves memory, costs more time; out, ostride and odist are ignored
- In-order output: the 0 frequency is at out[0]
- Unnormalized: a forward transform followed by a backward transform multiplies the data by N
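The factor of N can be checked with a tiny hand-written transform. The sketch below is not FFTw code: it is an unnormalized 4-point DFT whose twiddle factors are just powers of ±i, so it needs no library at all. The cplx type and the function names are assumptions made for the illustration; cplx mirrors the re/im layout of fftw_complex.

```c
#include <assert.h>

/* Stand-in for fftw_complex (same re/im layout), declared locally so
   the sketch compiles without FFTw. */
typedef struct { double re, im; } cplx;

/* Multiply z by (sign*i)^p. These are the only twiddle factors a
   4-point DFT needs: exp(sign*2*pi*i/4) = sign*i. */
cplx rotate(cplx z, int p, int sign)
{
    cplx r = z;
    assert(sign == 1 || sign == -1);
    for (int s = 0; s < (p % 4); ++s) {
        /* (a + ib) * (sign*i) = -sign*b + i*sign*a */
        cplx t = { -sign * r.im, sign * r.re };
        r = t;
    }
    return r;
}

/* Unnormalized 4-point DFT:
   out[k] = sum_j in[j] * exp(sign*2*pi*i*j*k/4),
   with sign = -1 for forward and +1 for backward. */
void dft4(const cplx in[4], cplx out[4], int sign)
{
    for (int k = 0; k < 4; ++k) {
        cplx sum = { 0.0, 0.0 };
        for (int j = 0; j < 4; ++j) {
            cplx term = rotate(in[j], j * k, sign);
            sum.re += term.re;
            sum.im += term.im;
        }
        out[k] = sum;
    }
}
```

Calling dft4 with sign = -1 and then with sign = +1 on the result returns each input value multiplied by 4, so the caller has to divide by N to recover the original data.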
23
nD Complex Transform
Routines, similar to the 1D case, except …
    fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
    void fftwnd_one(fftwnd_plan plan, …, …);
    fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, …, …, …, …, …);
    void fftwnd(fftwnd_plan plan, …, …, …, …, …, …, …);
    void fftwnd_destroy_plan(fftwnd_plan plan);
24
nD Complex Transform (cont.)
Arguments
- rank: dimensionality of the arrays to be transformed
- n: pointer to an array of length rank holding the size of each dimension, e.g. n = [8, 4, 5]
- Row-major order for C, column-major order for Fortran
Special routines for the 2D and 3D cases
- nd -> 2d, 3d
- n_dim -> nx, ny or nx, ny, nz
25
1D Real Transform
Routines, similar to the 1D complex case, except …
    rfftw_plan rfftw_create_plan(…, …, …);
    void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
    rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                          fftw_real *in, int istride,
                                          fftw_real *out, int ostride);
    void rfftw(rfftw_plan plan, int howmany,
               fftw_real *in, int istride, int idist,
               fftw_real *out, int ostride, int odist);
    void rfftw_destroy_plan(rfftw_plan plan);
26
1D Real Transform (cont.)
Arguments
- dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1, FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = 1
- The other arguments have the same meaning as before
27
nD Real Transform
Routines, similar to the 1D real case, but …
    rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
    void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);
    void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);
    void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
                                 fftw_real *in, int istride, int idist,
                                 fftw_complex *out, int ostride, int odist);
28
nD Real Transform (cont.)
Routines (cont.)
    void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
                                 fftw_complex *in, int istride, int idist,
                                 fftw_real *out, int ostride, int odist);
    void rfftwnd_destroy_plan(rfftwnd_plan plan);
Special 2D and 3D routines
29
nD Array Format
nD arrays are stored as a single contiguous block.
- C order (row-major): the first index varies most slowly, the last most quickly
- Fortran order (column-major): the first index varies most quickly, the last most slowly
- Static arrays: no problem
- Dynamically allocated arrays: may cause problems in the nD case, since the data must still form one contiguous block
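The two storage orders can be written down as offset formulas. The helper names below are hypothetical; the code only shows how a multi-index maps to a position in the single contiguous block, using 0-based indices in both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major (C order): the last index varies most quickly. */
size_t row_major_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    assert(rank > 0);
    for (int d = 0; d < rank; ++d)
        off = off * (size_t)n[d] + (size_t)idx[d];
    return off;
}

/* Column-major (Fortran order): the first index varies most quickly. */
size_t column_major_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    assert(rank > 0);
    for (int d = rank - 1; d >= 0; --d)
        off = off * (size_t)n[d] + (size_t)idx[d];
    return off;
}
```

For an 8 x 4 x 5 array, element (1, 2, 3) sits at offset (1*4 + 2)*5 + 3 = 33 in C order and at offset 1 + 8*(2 + 4*3) = 113 in Fortran order, which is why the dimension array has to be given in the order that matches the calling language.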
30
Parallel FFTw
- Multi-thread (skipped)
- MPI
  - nD complex: routines, notes, data layout
  - 1D complex
  - nD real
31
nD Complex MPI FFTw
Routines, similar to the uniprocessor case, except for the mpi in the names …
    fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n,
                                           fftw_direction dir, int flags);
    void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
                                int *local_first, int *local_first_start,
                                int *local_second_after_transpose,
                                int *local_second_start_after_transpose,
                                int *total_local_size);
    local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
    work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
32
nD Complex MPI FFTw (cont.)
Routines (cont.)
    void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields,
                    fftw_complex *local_data, fftw_complex *work,
                    fftwnd_mpi_output_order output_order);
    void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);
33
nD Complex MPI FFTw (cont.)
Notes
- First argument: comm, the MPI communicator
- Data layout: see below
- All fftw_mpi transforms are in place
- work: optional, same size as local_data; the extra storage buys greater efficiency
- output_order: normal or transposed
- Transposed order improves performance, but the data must be reshaped manually, which can be error-prone
34
nD Complex MPI FFTw (cont.)
Data layout
- Distributed data
  - Divided by rows (first dimension) in C
  - Divided by columns (last dimension) in Fortran
- Given the plan, all other data-layout parameters are determined by fftwnd_mpi_local_sizes
- total_local_size is roughly (n1/np)*n2*…*nk*n_fields
- Transposed order: n2 becomes the first dimension of the output; for the inverse transform use n = [n2, n1, n3, ..., nk]
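To give a feel for the numbers that the local-sizes routine hands back, here is a sketch of a plain block decomposition of the first dimension. It is an assumption for illustration only: the type and function names are hypothetical, and the actual split and total_local_size must always be taken from the library call, which may choose a different layout.

```c
#include <assert.h>

/* Slab of the first dimension owned by one process. */
typedef struct { int local_n; int local_start; } slab;

/* Block decomposition of n1 rows over np processes: every process gets
   ceil(n1/np) rows, except that the last ones may get fewer or none. */
slab block_slab(int n1, int np, int rank)
{
    assert(n1 > 0 && np > 0 && rank >= 0 && rank < np);
    int block = (n1 + np - 1) / np;          /* ceil(n1 / np) */
    int start = rank * block;
    if (start > n1) start = n1;
    int end = start + block;
    if (end > n1) end = n1;
    slab s = { end - start, start };
    return s;
}

/* Number of complex elements in a slab of local_n rows, where each row
   holds rest = n2*...*nk points with n_fields values per point. */
long slab_elements(int local_n, long rest, int n_fields)
{
    return (long)local_n * rest * (long)n_fields;
}
```

With n1 = 10 rows on np = 4 processes this gives slabs of 3, 3, 3 and 1 rows starting at rows 0, 3, 6 and 9; when np divides n1 every process holds (n1/np)*n2*…*nk*n_fields elements.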
35
1D Complex MPI FFTw
Routines, similar to the nD case, except without the nd …
    fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);
    void fftw_mpi_local_sizes(fftw_mpi_plan p,
                              int *local_n, int *local_n_start,
                              int *local_n_after_transpose,
                              int *local_start_after_transpose,
                              int *total_local_size);
36
1D Complex MPI FFTw (cont.)
Routines (cont.)
    void fftw_mpi(fftw_mpi_plan p, int n_fields,
                  fftw_complex *local_data, fftw_complex *work);
    void fftw_mpi_destroy_plan(fftw_mpi_plan p);
Generally gives a worse speedup than the nD case; best suited to large transform sizes
37
nD Real MPI FFTw
- Similar to the uniprocessor real and the complex MPI routines
- Roughly a factor of 2 in speed and half the storage compared with the complex transform, at the expense of a more complicated data format
- Can produce transposed-order output data
- There is no 1D real MPI FFTw
38
Break
39
FFTw Performance
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
40
C Example Codes
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
41
FFTw Fortran Wrappers and Example Codes
By Guobin Ma
42
FFTw Fortran-Callable Wrappers
Routine names: insert _f77 after the routine prefix of the C name
- fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
- fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
e.g.
    fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
becomes
    fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
43
FFTw Fortran-Callable Wrappers
Notes
- Any function that returns a value is converted into a subroutine with an additional (first) parameter for the result.
- There is no null in Fortran, so an array must be allocated and passed for out.
- nD arrays are in column-major (Fortran) order.
- Plan variables must be declared as integer.
- Constants (FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, …) are combined with '+' instead of '|'.
- The constants are defined in the files fortran/fftw_f77.i, fftw_f90.i and fftw_f90_mpi.i.
44
Fortran Examples
Source codes at AHPCC (tested on Turing, BB, SGI): ~gbma/workshop/fftw/codes or http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data
- 1D serial: fftw_1d.f90
- 1D parallel: fftw_1d_p.f90
- nD serial: fftw_3d.f90
- nD parallel
  - Normal order: fftw_3d_p_n.f90
  - Transposed order: fftw_3d_p_t.f90
45
Fortran Examples (cont.)
(Figures: input, forward output and inverse output for the 1D case and for the nD case.)
46
1D Serial Fortran Example
FFTw codes
    ...
    call fftw_f77_create_plan(plan_forward,N, &
                              FFTW_FORWARD,FFTW_ESTIMATE)
    call fftw_f77_create_plan(plan_reverse,N, &
                              FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_one(plan_forward,in,out)
    ...
    call fftw_f77_one(plan_reverse,out,in)
    ...
    call fftw_f77_destroy_plan(plan_forward)
    call fftw_f77_destroy_plan(plan_reverse)
47
1D Parallel Fortran Example
FFTw codes
    ...
    call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
                                  FFTW_FORWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
         local_n_after_trans, local_start_after_trans, total_local_size)
    ...
    allocate( psi_local(0:total_local_size-1) )
    ...
    allocate( work(0:total_local_size-1) )
48
1D Parallel Fortran Example (cont.)
FFTw codes (cont.)
    ...
    call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_fwd)
    ...
    call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
                                  FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_rvs)
49
nD Serial Fortran Example
FFTw codes
    call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
         FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_fwd,psi,0)
    call fftwnd_f77_destroy_plan(p_fwd)
    call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
         FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_rvs,psi,0)
    call fftwnd_f77_destroy_plan(p_rvs)
50
nD Parallel Fortran Example
FFTw codes, normal order, nD local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
         nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
    allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )
51
nD Parallel Fortran Example (cont.)
FFTw codes, normal order, nD local array (cont.)
    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
         nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)
52
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
         nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:total_local_size-1) )
    allocate( work(0:total_local_size-1) )
53
nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array (cont.)
    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
         nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)
54
nD Parallel Fortran Example (cont.)
Notes
- Normal order: easy to code, 'low' performance
- Transposed order: 'high' performance, complicated to code, the user must reorder the data
- USE_WORK: higher efficiency at the cost of more memory
55
Run the Examples at AHPCC
Copy files to your directory
    cp ~gbma/workshop/fftw/codes/*.* .
Compile
    make filename.tur
    make filename.bb
    make filename.sgi
with link specification -lfftw -lfftw_mpi (the latter only for MPI)
Run
    BB:     qsub -I -l nodes=2
            mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
    Turing: filename.tur
    SGI:    mpirun -np 2 filename.sgi
56
References
- Numerical Recipes (FORTRAN), William T. Vetterling et al., New York: Cambridge University Press, 1992
- Numerical Integration, P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co., 1967
- www.fftw.org
- FFTW User's Manual, M. Frigo & S. G. Johnson
57
Acknowledgement
- Brian Baltz: installation of FFTw at AHPCC, running MPI at AHPCC
- John Greenfield: setting up the grid access
- Andrew Pineda: computer work environment at AHPCC
- Brian Smith & Susan Atlas: many stimulating discussions
- Many others...