AHPCC/NCSA Workshop, Spring 2000 (4/18/00): Fast Fourier Transform Using FFTw. Guobin Ma, Sirpa Saarinen, and Paul M. Alsing


Slide 1: AHPCC/NCSA Workshop: Fast Fourier Transform Using FFTw. Guobin Ma (AHPCC), Sirpa Saarinen (NCSA), and Paul M. Alsing (AHPCC). Spring 2000 FFTw workshop, 4/18/00.

Slide 2: Contents
- FFT basics (Paul)
  - What is FFT and why FFT
- FFTw
  - Outline of FFTw (Guobin)
    - Characteristics
    - C routines
  - Performance and C example codes (Sirpa)
  - Fortran wrappers and example codes (Guobin)
- Exercises (skipped)

Slide 3: FFT Basics: What is FFT and why FFT, by Paul Alsing

Slide 4: Fourier Transform: frequency analysis of time series data.
- DFT: Discrete Fourier Transform (N time/frequency points)
- FFT: Fast Fourier Transform, an efficient implementation of the DFT, ~O(N log_2 N)

Slide 5: Aliasing issues. Let f_c = Nyquist frequency = 1/(2 Δt), where Δt is the sampling interval. A sine wave of frequency f_c is sampled at only 2 points per cycle, the peak and the trough. Frequency components with |f| > f_c are falsely folded back (aliased) into the range -f_c < f < f_c.

Slide 6: Fourier Transform: radix 2, Danielson-Lanczos

Slide 7: Fourier Transform: radix 2, Danielson-Lanczos (cont.)
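The equations on these two slides did not survive in the transcript. For orientation, the standard statement of the Danielson-Lanczos lemma is sketched below; the symbols f_j, F_k and W_N are assumed notation, with s = -1 for a forward and s = +1 for a backward transform.

```latex
F_k = \sum_{j=0}^{N-1} f_j \, W_N^{jk}
    = \sum_{j=0}^{N/2-1} f_{2j} \, W_{N/2}^{jk}
      + W_N^{k} \sum_{j=0}^{N/2-1} f_{2j+1} \, W_{N/2}^{jk}
    = F_k^{e} + W_N^{k} \, F_k^{o},
\qquad W_N = e^{s \, 2\pi i / N}
```

A length-N transform is thus the sum of two length-N/2 transforms over the even-indexed (e) and odd-indexed (o) points; applying the split recursively gives the log_2 N levels behind the O(N log_2 N) cost.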

Slide 8: Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm. We finally get down to 1-point transforms, each of which is simply one input value f_m for some m. The question is: which value of m corresponds to which pattern of e's and o's? The answer is: let e = 0 and o = 1, reverse the pattern of e's and o's, and you have the value of m in binary.
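The e/o rule amounts to reversing the bits of an index. A small sketch in C; `bit_reverse` is an illustrative name, not an FFTw routine.

```c
/* Reverse the lowest `bits` bits of m.
   With e = 0 and o = 1, the pattern "eeo" (binary 001) maps to m = 4 (binary 100). */
unsigned bit_reverse(unsigned m, int bits)
{
    unsigned r = 0;
    for (int b = 0; b < bits; ++b) {
        r = (r << 1) | (m & 1u);  /* move the lowest bit of m onto the end of r */
        m >>= 1;
    }
    return r;
}
```

For an 8-point transform (3 bits), indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.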

Slide 9: Bit reversal: The Cooley-Tukey algorithm first rearranges the data into bit-reversed order, then builds up the transform in log_2 N stages of N/2 butterflies each, i.e. O(N log_2 N) operations in total (decimation in time). (Figure: the three-letter e/o patterns eee, eeo, eoe, eoo, oee, oeo, ooe, ... before and after bit reversal.)

Slide 10: Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.

Slide 11: Example Application: Quantum Mechanics. Propagation of the (dimensionless) Schrödinger wave function.
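The figure for this slide is not in the transcript. As a sketch of the usual in-order DFT convention (zero frequency first, then positive frequencies, then negative ones), the mapping from output index to frequency looks like this; `bin_frequency` is an illustrative helper, and assigning the index n/2 to the positive Nyquist frequency is an arbitrary choice.

```c
/* Frequency of output index k for an n-point transform of data sampled at
   interval dt: k = 0 .. n/2 are the non-negative frequencies k/(n*dt),
   k = n/2+1 .. n-1 are the negative frequencies (k-n)/(n*dt). */
double bin_frequency(int k, int n, double dt)
{
    int signed_k = (k <= n / 2) ? k : k - n;
    return (double)signed_k / ((double)n * dt);
}
```

For n = 8 and dt = 0.125 the eight outputs correspond to frequencies 0, 1, 2, 3, 4, -3, -2, -1.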

Slide 12: Serial FFTs. The x data is contiguous in memory (Fortran). Transpose the data so that the y transforms are also contiguous in memory. (Figure: x-y array before and after the transpose.)
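A minimal sketch of the transpose step, written in C with row-major storage (so here the last index is the contiguous one, the mirror image of the Fortran case on the slide). The function name and the use of plain `double` data are assumptions made for brevity.

```c
#include <stddef.h>

/* Out-of-place transpose of an nx-by-ny row-major array.
   After the call, what used to be a strided column of `in`
   is a contiguous row of `out`. */
void transpose(size_t nx, size_t ny, const double *in, double *out)
{
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            out[j * nx + i] = in[i * ny + j];
}
```

A 2D transform can then be done as 1D transforms along the contiguous direction, a transpose, and a second set of 1D transforms.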

Slide 13: Parallel FFTs. In parallel, all x transforms are local operations on each processor (no communication). To perform the transpose, the processors must carry out an all-to-all communication. (Figure: x-y data distributed across processors P0 to P3 before and after the transpose.)

Slide 14: Outline of FFTw, by Guobin Ma

Slide 15: Characteristics of FFTw
- C routines generated by a Caml Light (ML) program
- 1D/nD, real/complex data
- Arbitrary input size, not necessarily 2^n
- Serial/parallel, shared/distributed memory
- High performance, presented as faster than the alternatives
- Portable, automatically adapts to the machine

Slide 16: Two Phases of FFTw
- Hardware-dependent algorithm
- Planner
  - 'Learns' the fastest way on your machine
  - Produces a data structure, the 'plan'
  - The plan is reusable
- Executor
  - Computes the transform
- Applies to all FFTw operation modes
  - 1D/nD, complex/real, serial/parallel

Slide 17: C Routines of FFTw
- Routines
  - 1D/nD complex
  - 1D/nD real
  - Corresponding parallel (MPI) ones
- Arguments
- Special notes
- Data formats

Slide 18: 1D Complex Transform
- Typical call

    #include <fftw.h>
    ...
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        ...
        /* create the plan once, e.g. for a forward transform of size N */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        ...
        fftw_one(p, in, out);    /* execute; the plan can be reused */
        ...
        fftw_destroy_plan(p);
    }

Slide 19: 1D Complex Transform (cont.)
- Routines
    fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
    void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
    fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                        fftw_complex *in, int istride,
                                        fftw_complex *out, int ostride);

Slide 20: 1D Complex Transform (cont.)
- Routines (cont.)
    void fftw(fftw_plan plan, int howmany,
              fftw_complex *in, int istride, int idist,
              fftw_complex *out, int ostride, int odist);
    void fftw_destroy_plan(fftw_plan plan);

Slide 21: 1D Complex Transform (cont.)
- Arguments
  - plan: data structure containing all the information needed to compute the transform
  - n: data size
  - dir: FFTW_FORWARD (-1) or FFTW_BACKWARD (+1)
  - flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, combined with |
  - howmany: number of transforms (input arrays)
  - in, istride, idist: input arrays; element i of transform j is in[i*istride + j*idist]
  - out, ostride, odist: output arrays, addressed in the same way
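The stride/dist addressing is easiest to see with a small stand-alone example. The sketch below does not call FFTw; `strided_offset` and `gather_transform` are hypothetical helpers, and `double` stands in for `fftw_complex` to keep it short.

```c
#include <stddef.h>

/* Offset of element i of transform j, following in[i*istride + j*idist]. */
size_t strided_offset(size_t i, size_t j, size_t stride, size_t dist)
{
    return i * stride + j * dist;
}

/* Copy the n elements of transform j out of a strided buffer
   into contiguous storage. */
void gather_transform(size_t n, size_t j, const double *in,
                      size_t istride, size_t idist, double *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[strided_offset(i, j, istride, idist)];
}
```

Three transforms of length 4 stored back to back use istride = 1, idist = 4; the same data stored interleaved (element 0 of every transform first, then element 1, and so on) uses istride = 3, idist = 1.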

Slide 22: 1D Complex Transform (cont.)
- Notes
  - Out of place (default): separate arrays in[N] and out[N]
  - In place: saves memory but costs more time; ostride, odist and out are ignored
  - In-order output, with zero frequency at out[0]
  - Unnormalized: a forward transform followed by a backward transform multiplies the data by N

Slide 23: nD Complex Transform
- Routines, similar to the 1D case, except ...
    fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
    void fftwnd_one(fftwnd_plan plan, ...);
    fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, ...);
    void fftwnd(fftwnd_plan plan, ...);
    void fftwnd_destroy_plan(fftwnd_plan plan);

Slide 24: nD Complex Transform (cont.)
- Arguments
  - rank: dimensionality of the arrays to be transformed
  - n: pointer to an array of length rank holding the size of each dimension, e.g. n = {8, 4, 5}
  - Row-major for C, column-major for Fortran
- Special routines for the 2D and 3D cases
  - nd -> 2d, 3d
  - n_dim -> nx, ny or nx, ny, nz

Slide 25: 1D Real Transform
- Routines, similar to the 1D complex case, except ...
    rfftw_plan rfftw_create_plan(...);
    void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
    rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                          fftw_real *in, int istride,
                                          fftw_real *out, int ostride);
    void rfftw(rfftw_plan plan, int howmany,
               fftw_real *in, int istride, int idist,
               fftw_real *out, int ostride, int odist);
    void rfftw_destroy_plan(rfftw_plan plan);

Slide 26: 1D Real Transform (cont.)
- Arguments
  - dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1, FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = +1
  - The other arguments have the same meaning as before

Slide 27: nD Real Transform
- Routines, similar to the 1D real case, but ...
    rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
    void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);
    void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);
    void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
                                 fftw_real *in, int istride, int idist,
                                 fftw_complex *out, int ostride, int odist);

Slide 28: nD Real Transform (cont.)
- Routines (cont.)
    void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
                                 fftw_complex *in, int istride, int idist,
                                 fftw_real *out, int ostride, int odist);
    void rfftwnd_destroy_plan(rfftwnd_plan plan);
- Special 2D and 3D routines

Slide 29: nD Array Format. nD arrays are stored as a single contiguous block.
- C order (row-major)
  - First index varies most slowly, last index most quickly
- Fortran order (column-major)
  - First index varies most quickly, last index most slowly
- Static arrays: no problem
- Dynamic arrays: may cause problems in the nD case, because the data must still form one contiguous block
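The two orderings differ only in which index is folded in first. A short sketch; the helper names are illustrative and not part of FFTw.

```c
#include <stddef.h>

/* Offset of element idx[0..rank-1] in a contiguous block with dimensions
   n[0..rank-1], C order: the last index varies most quickly. */
size_t row_major_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    for (int d = 0; d < rank; ++d)
        off = off * (size_t)n[d] + (size_t)idx[d];
    return off;
}

/* Same element in Fortran order: the first index varies most quickly
   (indices taken as 0-based here). */
size_t column_major_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    for (int d = rank - 1; d >= 0; --d)
        off = off * (size_t)n[d] + (size_t)idx[d];
    return off;
}
```

For an 8 x 4 x 5 array, element (1, 2, 3) sits at offset (1*4 + 2)*5 + 3 = 33 in C order and at 1 + 8*(2 + 4*3) = 113 in Fortran order. A dynamically allocated nD array is safe to pass as long as it is one block addressed this way, rather than an array of pointers to separately allocated rows.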

Slide 30: Parallel FFTw
- Multi-thread (skipped)
- MPI
  - nD complex
    - Routines
    - Notes
    - Data layout
  - 1D complex
  - nD real

Slide 31: nD Complex MPI FFTw
- Routines, similar to the uniprocessor case, except for mpi ...
    fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n,
                                           fftw_direction dir, int flags);
    void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
                                int *local_first, int *local_first_start,
                                int *local_second_after_transpose,
                                int *local_second_start_after_transpose,
                                int *total_local_size);
    local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
    work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

Slide 32: nD Complex MPI FFTw (cont.)
- Routines (cont.)
    void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields,
                    fftw_complex *local_data, fftw_complex *work,
                    fftw_mpi_output_order output_order);
    void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);

Slide 33: nD Complex MPI FFTw (cont.)
- Notes
  - First argument: comm, the MPI communicator
  - Data layout (see the next slide)
  - All fftw_mpi transforms are in-place
  - work:
    - Optional
    - Same size as local_data
    - Improves efficiency at the cost of extra storage
  - output_order: normal or transposed
    - Transposed: better performance, but the data must be reshaped manually, which can sometimes cause problems

Slide 34: nD Complex MPI FFTw (cont.)
- Data layout
  - Distributed data
    - Divided by rows (1st dimension) in C
    - Divided by columns (last dimension) in Fortran
  - Given the plan, all other data-layout parameters are determined by fftwnd_mpi_local_sizes
  - total_local_size = (n1/np)*n2*...*nk*n_fields
  - Transposed order: n2 becomes the 1st dimension of the output, so the inverse transform uses n = {n2, n1, n3, ..., nk}
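To picture what "divided by rows" means, here is a sketch of a simple block (slab) split of the first dimension across np processes. This is an assumed scheme for illustration only: in real code the values must come from fftwnd_mpi_local_sizes, which may choose a different split, and `slab_for_rank` is a hypothetical helper.

```c
/* Hypothetical block distribution of n1 rows over np processes.
   Each process gets ceil(n1/np) rows until the rows run out;
   trailing processes may receive fewer rows, or none. */
void slab_for_rank(int n1, int np, int rank, int *local_n, int *local_start)
{
    int block = (n1 + np - 1) / np;   /* ceil(n1 / np) */
    int start = rank * block;
    if (start > n1)
        start = n1;
    int count = n1 - start;
    if (count > block)
        count = block;
    *local_start = start;
    *local_n = count;
}
```

With n1 = 10 and np = 4 this gives slabs of 3, 3, 3 and 1 rows starting at rows 0, 3, 6 and 9. Each process then stores local_n * n2 * ... * nk elements, which matches the total_local_size formula when n1 divides evenly by np.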

Slide 35: 1D Complex MPI FFTw
- Routines, similar to the nD case, except without nd ...
    fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);
    void fftw_mpi_local_sizes(fftw_mpi_plan p, int *local_n, int *local_n_start,
                              int *local_n_after_transpose,
                              int *local_start_after_transpose,
                              int *total_local_size);

Slide 36: 1D Complex MPI FFTw (cont.)
- Routines (cont.)
    void fftw_mpi(fftw_mpi_plan p, int n_fields,
                  fftw_complex *local_data, fftw_complex *work,
                  fftw_mpi_output_order output_order);
    void fftw_mpi_destroy_plan(fftw_mpi_plan p);
- Generally poorer speedup than the nD case; best suited to large sizes

Slide 37: nD Real MPI FFTw
- Similar to the uniprocessor and complex MPI versions
- About a factor of 2 speedup and half the storage, at the expense of a more complicated data format
- Can produce transposed-order output data
- There is no 1D real MPI FFTw

Slide 38: Break

Slide 39: FFTw Performance, by Sirpa Saarinen

Slide 40: C Example Codes, by Sirpa Saarinen

Slide 41: FFTw Fortran Wrappers and Example Codes, by Guobin Ma

Slide 42: FFTw Fortran-Callable Wrappers
- Routine names: add _f77 to the prefix of the C routine name
  - fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
  - fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
  - e.g. fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
    -> fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)

Slide 43: FFTw Fortran-Callable Wrappers (cont.)
- Notes
  - Any function that returns a value is converted into a subroutine with an additional (first) parameter that receives the result.
  - There is no null pointer in Fortran; an array must be allocated and passed for out.
  - nD arrays are column-major (Fortran order)
  - Plan variables must be declared as integer
- Constants
  - FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, ... are combined with '+' instead of '|'
  - Defined in the files fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i

Slide 44: Fortran Examples
- Source codes at AHPCC (tested on Turing, BB, SGI):
  - ~gbma/workshop/fftw/codes
- Complex data
  - 1D serial: fftw_1d.f90
  - 1D parallel: fftw_1d_p.f90
  - nD serial: fftw_3d.f90
  - nD parallel
    - Normal order: fftw_3d_p_n.f90
    - Transposed order: fftw_3d_p_t.f90

Slide 45: Fortran Examples (cont.)
- 1D case
  - Input
  - Forward output
  - Inverse output
- nD case
  - Input
  - Forward output
  - Inverse output

Slide 46: 1D Serial Fortran Example
- FFTw codes
    ...
    call fftw_f77_create_plan(plan_forward,N, &
                              FFTW_FORWARD,FFTW_ESTIMATE)
    call fftw_f77_create_plan(plan_reverse,N, &
                              FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_one(plan_forward,in,out)
    ...
    call fftw_f77_one(plan_reverse,out,in)
    ...
    call fftw_f77_destroy_plan(plan_forward)
    call fftw_f77_destroy_plan(plan_reverse)

Slide 47: 1D Parallel Fortran Example
- FFTw codes
    ...
    call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
                                  FFTW_FORWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
         local_n_after_trans, local_start_after_trans, total_local_size)
    ...
    allocate( psi_local(0:total_local_size-1) )
    ...
    allocate( work(0:total_local_size-1) )

Slide 48: 1D Parallel Fortran Example (cont.)
- FFTw codes (cont.)
    ...
    call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_fwd)
    ...
    call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
                                  FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_rvs)

Slide 49: nD Serial Fortran Example
- FFTw codes
    call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
                                FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_fwd,psi,0)
    call fftwnd_f77_destroy_plan(p_fwd)
    call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
                                FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_rvs,psi,0)
    call fftwnd_f77_destroy_plan(p_rvs)

Slide 50: nD Parallel Fortran Example
- FFTw codes, normal order, nD local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
    allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )

Slide 51: nD Parallel Fortran Example (cont.)
- FFTw codes, normal order, nD local array (cont.)
    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

Slide 52: nD Parallel Fortran Example (cont.)
- FFTw codes, transposed order, 1D local array
    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:total_local_size-1) )
    allocate( work(0:total_local_size-1) )

Slide 53: nD Parallel Fortran Example (cont.)
- FFTw codes, transposed order, 1D local array (cont.)
    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)
    n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

Slide 54: nD Parallel Fortran Example (cont.)
- Notes
  - Normal order
    - Easy to code, 'low' performance
  - Transposed order
    - 'High' performance, but complicated to code; the user must reorder the data
  - Use-work (passing a work array)
    - Higher efficiency, at the cost of more memory

Slide 55: Run the Examples at AHPCC
- Copy the files to your directory
    cp ~gbma/workshop/fftw/codes/*.* .
- Compile
    make filename.tur
    make filename.bb
    make filename.sgi
  with link options -lfftw -lfftw_mpi (the latter only for MPI)
- Run
    BB:     qsub -I -l nodes=2
            mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
    Turing: filename.tur
    SGI:    mpirun -np 2 filename.sgi

Slide 56: References
- Numerical Recipes (FORTRAN), William T. Vetterling et al., New York: Cambridge University Press, 1992
- Numerical Integration, P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co.
- FFTW User's Manual, M. Frigo & S. G. Johnson

Slide 57: Acknowledgement
- Brian Baltz
  - Installation of FFTw at AHPCC
  - Running MPI at AHPCC
- John Greenfield
  - Setting up the grid access
- Andrew Pineda
  - Computer work environment at AHPCC
- Brian Smith & Susan Atlas
  - Many stimulating discussions
- Many others...