HP-SEE FFTs Using FFTW and FFTE Libraries Aleksandar Jović Institute of Physics Belgrade, Serbia Scientific Computing Laboratory

The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no

Overview
• Introduction
• FFT algorithms and FFT libraries
• FFTE library
  ◦ Overview of the most commonly used subroutines
  ◦ Tests, compiling and running (serial, OpenMP, MPI, MPI+OpenMP)
• FFTW library
  ◦ Overview of the most commonly used subroutines
  ◦ Tests, compiling and running (serial, OpenMP, MPI, MPI+OpenMP)

Introduction
• The Discrete Fourier Transform (DFT) plays an important role in many scientific and technical applications, including time series and waveform analysis, solutions to linear partial differential equations, convolution, digital signal processing, and image filtering.
• DFT: $X_k = \sum_{n=0}^{N-1} x_n \, \omega_N^{kn}, \quad k = 0, \dots, N-1$
• where $\omega_N = e^{-2\pi i/N}$
• IDFT: $x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, \omega_N^{-kn}, \quad n = 0, \dots, N-1$
• A fast Fourier transform (FFT) is an efficient algorithm for computing the DFT and its inverse.

FFT
• FFT algorithms
  ◦ Cooley–Tukey algorithm
  ◦ Prime-factor FFT algorithm (PFA)
  ◦ Bruun's FFT algorithm
  ◦ Rader's FFT algorithm
  ◦ Bluestein's FFT algorithm
  ◦ Odlyzko–Schönhage algorithm
• FFT libraries
  ◦ FFTW
  ◦ P3DFFT
  ◦ FFTPACK
  ◦ ACML
  ◦ GSL
  ◦ ESSL
  ◦ FFTE

FFTE library
• FFTE (“Fastest Fourier Transform in the East”) was developed by Daisuke Takahashi of the Center for Computational Sciences, Graduate School of Systems and Information Engineering, University of Tsukuba, Japan.
• The name is more of a tribute to FFTW than a signal of any serious attempt to offer a production-ready library to rival it, even though FFTE has been observed to slightly outperform FFTW on very large FFTs.
• FFTE is a Fortran (77 and 90) subroutine library for computing the FFT in one or more dimensions.
• FFTE supports radix-2, 3, and 5 Discrete Fourier Transforms (DFTs), including optimized routines for radix-8, and comes in pure MPI and hybrid MPI+OpenMP versions.
• More about these algorithms can be found in FAST FOURIER TRANSFORM ALGORITHMS WITH APPLICATION
• FFTE is open source but comes with little documentation, so it is necessary to examine the source code in order to use it.

FFTE library
• Extract the archive with tar xvzf ffte-5.0.tgz
  $ cd ffte-5.0
• Files: zfft1d.f, zfft2d.f, zfft3d.f, zdfft2d.f, zdfft3d.f, fft235.f, kernel.f, mfft235.f, vzfft1d.f, vzfft2d.f, vzfft3d.f, dzfft2d.f, dzfft3d.f, param.h, readme.txt
• tests/ (test directory, serial and OpenMP)
• mpi/ (MPI version directory)
  ◦ pzfft1d.f, pzfft2d.f, pzfft3d.f, pdzfft2d.f, pdzfft3d.f, pvzfft1d.f, pvzfft2d.f, pvzfft3d.f, pzdfft2d.f, pzdfft3d.f, pzfft3dv.f
  ◦ tests/ (MPI and MPI+OpenMP test directory)

ZFFT1D
• ZFFT1D(A,N,IOPT,B) is the 1D complex FFT routine
• FORTRAN 77 + OpenMP
• A(N) is the complex input/output vector (COMPLEX*16)
• B(N*2) is the work/coefficient vector (COMPLEX*16)
• N = (2**IP) * (3**IQ) * (5**IR) is the length of the transform
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the INVERSE transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use zfft1d.f
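Because the routine only accepts lengths whose prime factors are 2, 3, and 5, it can help to check a candidate length before calling it. The sketch below shows one way to do that in C. The functions factor235 and next_235_length are hypothetical helpers written for this example and are not part of FFTE. Rounding a length up is only useful if the application is free to choose its transform size.

```c
#include <assert.h>
#include <stdbool.h>

/* Try to write n as 2^ip * 3^iq * 5^ir.
 * Returns true and fills the exponents if n has no other prime factors.
 * Hypothetical helper for illustration only. */
bool factor235(long n, int *ip, int *iq, int *ir)
{
    assert(ip != 0 && iq != 0 && ir != 0);
    *ip = *iq = *ir = 0;
    if (n < 1)
        return false;
    while (n % 2 == 0) { n /= 2; ++*ip; }
    while (n % 3 == 0) { n /= 3; ++*iq; }
    while (n % 5 == 0) { n /= 5; ++*ir; }
    return n == 1;
}

/* Smallest length >= n that has only the prime factors 2, 3 and 5. */
long next_235_length(long n)
{
    int ip, iq, ir;
    if (n < 1)
        n = 1;
    while (!factor235(n, &ip, &iq, &ir))
        ++n;
    return n;
}
```

For example, 360 factors as 2^3 * 3^2 * 5 and is accepted, while 14 is rejected because of the factor 7.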

ZFFT2D
• ZFFT2D(A,NX,NY,IOPT) is the 2D complex FFT routine
• FORTRAN 77 + OpenMP
• A(NX*NY) is the complex input/output vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the INVERSE transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use zfft2d.f

ZFFT3D
• ZFFT3D(A,NX,NY,NZ,IOPT) is the 3D complex FFT routine
• FORTRAN 77 + OpenMP
• A(NX*NY*NZ) is the complex input/output vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• NZ = (2**KP) * (3**KQ) * (5**KR) is the length of the transforms in the Z-direction
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the INVERSE transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use zfft3d.f
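The 3D routine receives its data as one flat vector of NX*NY*NZ elements. Fortran stores arrays in column-major order, so the sketch below assumes the vector is laid out as A(NX,NY,NZ) with the X index varying fastest. That layout is an assumption to be checked against the FFTE source. The helper idx3 is hypothetical and uses zero-based indices, as would be natural when preparing the data in C.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j, k), zero-based, inside a flat array that holds
 * an NX x NY x NZ grid in Fortran (column-major) order: i varies fastest,
 * then j, then k. Hypothetical helper for illustration only. */
size_t idx3(size_t i, size_t j, size_t k,
            size_t nx, size_t ny, size_t nz)
{
    assert(i < nx && j < ny && k < nz);
    return i + nx * (j + ny * k);
}
```

With this mapping, stepping along X touches consecutive memory locations, while stepping along Z jumps by NX*NY elements.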

DZFFT2D
• DZFFT2D(A,NX,NY,IOPT,B) is the 2D real-to-complex FFT routine
• FORTRAN 77 + OpenMP
• A(NX,NY) is the real input vector (REAL*8)
• A(NX/2+1,NY) is the complex output vector (COMPLEX*16)
• B(NX/2+1,NY) is the work/coefficient vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use dzfft2d.f

ZDFFT2D
• ZDFFT2D(A,NX,NY,IOPT,B) is the 2D complex-to-real FFT routine
• FORTRAN 77 + OpenMP
• A(NX/2+1,NY) is the complex input vector (COMPLEX*16)
• A(NX,NY) is the real output vector (REAL*8)
• B(NX/2+1,NY) is the work/coefficient vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• IOPT = 0 for initializing the coefficients, +1 for the BACKWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use zdfft2d.f

DZFFT3D
• DZFFT3D(A,NX,NY,NZ,IOPT,B) is the 3D real-to-complex FFT routine
• FORTRAN 77 + OpenMP
• A(NX,NY,NZ) is the real input vector (REAL*8)
• A(NX/2+1,NY,NZ) is the complex output vector (COMPLEX*16)
• B(NX/2+1,NY,NZ) is the work/coefficient vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• NZ = (2**KP) * (3**KQ) * (5**KR) is the length of the transforms in the Z-direction
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use dzfft3d.f

ZDFFT3D
• ZDFFT3D(A,NX,NY,NZ,IOPT,B) is the 3D complex-to-real FFT routine
• FORTRAN 77 + OpenMP
• A(NX/2+1,NY,NZ) is the complex input vector (COMPLEX*16)
• A(NX,NY,NZ) is the real output vector (REAL*8)
• B(NX/2+1,NY,NZ) is the work/coefficient vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• NZ = (2**KP) * (3**KQ) * (5**KR) is the length of the transforms in the Z-direction
• IOPT = 0 for initializing the coefficients, +1 for the BACKWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use zdfft3d.f

EXAMPLES
• test1d.f: test program for zfft1d.f
• test2d.f: test program for zfft2d.f
• test3d.f: test program for zfft3d.f
• speed3d.f: speed test program for zfft3d.f
• rtest3d.f: test program for zdfft2d.f and dzfft2d.f
• CALL ZFFT1D(A,N,0,B) ! this call is needed to initialize FFTE
• Compiling:
  $ gfortran -O3 -fomit-frame-pointer -fopenmp test1d.f zfft1d.f kernel.f fft235.f
  $ ifort -O3 -openmp test1d.f zfft1d.f kernel.f fft235.f

Compiling and running FFTE
• Serial job:
  $ gfortran test1d.f zfft1d.f kernel.f fft235.f -o test1d
  (ifort or f95 can be used in the same way)
  $ ./test1d
• OpenMP job:
  $ gfortran -O3 -fomit-frame-pointer -fopenmp test1d.f zfft1d.f kernel.f fft235.f -o test1d_omp
  $ ifort -O3 -xHost -openmp test1d.f zfft1d.f kernel.f fft235.f -o test1d_omp
  $ OMP_NUM_THREADS=<number of threads> ./test1d_omp

mpi/
• PZFFT1D(A,B,W,N,ICOMM,ME,NPU,IOPT) is the 1D complex FFT routine
• FORTRAN 90 + MPI source program
• A(N/NPU) is the complex input vector (COMPLEX*16)
• B(N/NPU) is the complex output vector (COMPLEX*16)
• W(N/NPU) is the coefficient vector (COMPLEX*16)
• N = (2**IP) * (3**IQ) * (5**IR) is the length of the transform
• ICOMM is the communicator (INTEGER*4)
• ME is the rank
• NPU is the number of processors
• IOPT = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the BACKWARD transform
• IMPORTANT: the source files fft235.f, kernel.f and zfft1d.f and the header file param.h are needed to use pzfft1d.f

PZFFT2D
• PZFFT2D(A,B,NX,NY,ICOMM,NPU,IOPT) is the 2D complex FFT routine
• FORTRAN 77 + MPI source program
• A(NX,NY/NPU) is the complex input vector (COMPLEX*16)
• B(NX,NY/NPU) is the complex output vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• ICOMM is the communicator (INTEGER*4)
• NPU is the number of processors
• IOPT (INTEGER*4) = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the BACKWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use pzfft2d.f

PZFFT3D
• PZFFT3D(A,B,NX,NY,NZ,ICOMM,NPU,IOPT) is the 3D complex FFT routine
• FORTRAN 77 + MPI source program
• A(NX,NY,NZ/NPU) is the complex input vector (COMPLEX*16)
• B(NX,NY,NZ/NPU) is the complex output vector (COMPLEX*16)
• NX = (2**IP) * (3**IQ) * (5**IR) is the length of the transforms in the X-direction
• NY = (2**JP) * (3**JQ) * (5**JR) is the length of the transforms in the Y-direction
• NZ = (2**KP) * (3**KQ) * (5**KR) is the length of the transforms in the Z-direction
• ICOMM is the communicator (INTEGER*4)
• NPU is the number of processors
• IOPT (INTEGER*4) = 0 for initializing the coefficients, -1 for the FORWARD transform, +1 for the BACKWARD transform
• IMPORTANT: the source files fft235.f and kernel.f and the header file param.h are needed to use pzfft3d.f
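The array shapes A(NX,NY,NZ/NPU) and B(NX,NY,NZ/NPU) indicate a slab decomposition: each process holds a contiguous block of Z planes. The sketch below works out the local sizes in C. It assumes that NZ is divisible by NPU and that slabs are assigned in rank order. Both are assumptions for this illustration and should be checked against the FFTE source. The type slab_t and both functions are hypothetical names.

```c
#include <assert.h>

/* Part of the global Z range owned by one process. */
typedef struct {
    int local_nz; /* number of Z planes held by this process */
    int z_start;  /* zero-based global index of its first plane */
} slab_t;

/* Slab owned by rank `me` out of `npu` processes.
 * Assumes nz divides evenly and slabs follow rank order (both assumed). */
slab_t slab_for_rank(int nz, int npu, int me)
{
    slab_t s;
    assert(npu > 0 && me >= 0 && me < npu);
    assert(nz % npu == 0);
    s.local_nz = nz / npu;
    s.z_start = me * s.local_nz;
    return s;
}

/* Number of complex elements each process needs in A and again in B. */
long local_elements(int nx, int ny, int nz, int npu)
{
    assert(npu > 0 && nz % npu == 0);
    return (long)nx * (long)ny * (long)(nz / npu);
}
```

For a 64 x 64 x 64 grid on 4 processes, each process would hold 16 planes, that is 65536 complex elements per array.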

Examples, compiling and running
• example_mpi_3D.f
• MPI job:
  $ mpif90 example_mpi_3D.f pzfft3d.f kernel.f fft235.f -o mpi3d
  $ mpiexec -np 4 ./mpi3d
• Hybrid job:
  $ mpif90 -openmp example_mpi_3D.f pzfft3d.f kernel.f fft235.f -o mpi3d
  $ OMP_NUM_THREADS=2 mpiexec -np 2 ./mpi3d

FFTW library
• The Fastest Fourier Transform in the West (FFTW) is a software library for computing discrete Fourier transforms (DFTs), developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology.
• FFTW is written in the C language.
• It can compute transforms of real and complex-valued arrays of arbitrary size and dimension in O(n log n) time.
• It works best on arrays whose sizes have small prime factors, with powers of 2 being optimal and large primes being the worst case.
• For prime sizes it uses either Rader's or Bluestein's FFT algorithm.
• In 1999, FFTW won the J. H. Wilkinson Prize for Numerical Software.

Installation on Linux
• Extract the archive and build:
  $ tar xvzf fftw-<version>.tar.gz
  $ cd fftw-<version>
  $ ./configure --enable-openmp --enable-mpi --prefix=/home/ajovic/QE/name_directory
  $ make
  $ make install
• The configure script chooses the gcc compiler by default, if it is available; you can select some other compiler with: ./configure CC="<the name of your C compiler>"
• --enable-mpi: enables compilation and installation of the FFTW MPI library
• --enable-openmp: enables compilation and installation of the FFTW OpenMP library

COMPLEX 1D FFTW
    #include <fftw3.h>
    ...
    {
        fftw_complex *in, *out;
        fftw_plan p;
        ...
        in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
        out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
        p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
        ...
        fftw_execute(p); /* repeat as needed */
        ...
        fftw_destroy_plan(p);
        fftw_free(in); fftw_free(out);
    }
• This function creates the plan:
    fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
• n is the size of the transform
• in and out are pointers to the input and output arrays of the transform
• sign can be either FFTW_FORWARD (-1) or FFTW_BACKWARD (+1)

COMPLEX 1D FFTW
• The flags: FFTW_MEASURE or FFTW_ESTIMATE
• FFTW_MEASURE instructs FFTW to run and measure the execution time of several FFTs in order to find the best way to compute the transform of size n. This process takes some time (usually a few seconds), depending on your machine and on the size of the transform.
• FFTW_ESTIMATE, on the contrary, does not run any computation and just builds a reasonable plan that is probably sub-optimal.
• The actual transforms are computed via fftw_execute(plan):
    void fftw_execute(const fftw_plan plan);
• If you want to transform a different array of the same size, you can create a new plan with fftw_plan_dft_1d, and FFTW automatically reuses the information from the previous plan, if possible.

COMPLEX 2D and 3D FFTW
• 2- and 3-dimensional transforms work much the same way as 1-dimensional transforms
• The 2D and 3D routines have the following signatures:
    fftw_plan fftw_plan_dft_2d(int n1, int n2, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
    fftw_plan fftw_plan_dft_3d(int n1, int n2, int n3, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
• 1D FFTW of real-to-complex and complex-to-real data (these take a single size and no sign argument, since r2c is always a forward and c2r always a backward transform):
    fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags);
    fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags);

EXAMPLE 3D FFTW
• Example: fftw-ser.c
• You must link this code with the fftw3 library. On Unix systems, link with -lfftw3 -lm
  $ icc fftw-ser.c -L/home/ajovic/QE/fftw-3.3.2/lib -I/home/ajovic/QE/fftw-3.3.2/include -lfftw3 -lm -o serijski
  $ ./serijski

Usage of OpenMP in FFTW
• Before calling any FFTW routines, you should call the function:
    int fftw_init_threads(void);
• Before creating a plan that you want to parallelize, you should call:
    void fftw_plan_with_nthreads(int nthreads);
• The nthreads argument indicates the number of threads you want FFTW to use
• If you want to get rid of all memory and other resources allocated internally by FFTW, you can call:
    void fftw_cleanup_threads(void);
• Programs using the parallel complex transforms should be linked with -lfftw3_omp -lfftw3 -lm if you compiled with OpenMP

EXAMPLE with OpenMP
• fftw-omp.c
  $ icc fftw-omp.c -L/home/ajovic/QE/fftw-3.3.2/lib -I/home/ajovic/QE/fftw-3.3.2/include -openmp -lfftw3_omp -lfftw3 -lm -o tredovan
  $ gcc fftw-omp.c -L/home/ajovic/QE/fftw-3.3.2/lib -I/home/ajovic/QE/fftw-3.3.2/include -fopenmp -lfftw3_omp -lfftw3 -lm -o tredovan

MPI FFTW
• All programs using FFTW's MPI support should include its header file: #include <fftw3-mpi.h>
• We must call fftw_mpi_init() after calling MPI_Init()
• When we create the plan with fftw_mpi_plan_dft_3d, analogous to fftw_plan_dft_3d, we pass an additional argument: the communicator, indicating which processes will participate in the transform (here MPI_COMM_WORLD, indicating all processes)
