Published by Lynn Gilbert. Modified over 9 years ago.
1
CS 471 Final Project
2D Advection/Wave Equation Using Fourier Methods
December 10, 2003
Jose L. Rodriguez
jlrod@cs.unm.edu
2
Project Description
Use a spectral method (Fourier method) for the 2D advection/wave equation.
Use the JST Runge-Kutta time integrator for each time step.
3
Algorithm
For each time step that we take, we do s sub-stages.
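The stage formula itself is not preserved in this transcript. As a rough sketch only, the following assumes the low-storage multi-stage form often associated with JST-style Runge-Kutta schemes, u_k = u_0 + dt/(s-k+1) * L(u_{k-1}) for k = 1..s. The names `jst_rk_step`, `rhs_fn`, and `linear_rhs` are hypothetical and are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: writes the spatial operator L(u) into out. */
typedef void (*rhs_fn)(const double *u, double *out, size_t n, void *ctx);

/* One time step made of s sub-stages, ASSUMING the low-storage form
 *   u_k = u_0 + dt / (s - k + 1) * L(u_{k-1}),  k = 1..s.
 * On entry u holds the old solution; on exit it holds the new one.
 * u0 and work are caller-provided scratch arrays of length n. */
void jst_rk_step(double *u, double *u0, double *work, size_t n,
                 int s, double dt, rhs_fn rhs, void *ctx)
{
    assert(s >= 1);
    for (size_t i = 0; i < n; ++i)
        u0[i] = u[i];                       /* keep the start-of-step state */

    for (int k = 1; k <= s; ++k) {
        double alpha = dt / (double)(s - k + 1);
        rhs(u, work, n, ctx);               /* work = L(u_{k-1}) */
        for (size_t i = 0; i < n; ++i)
            u[i] = u0[i] + alpha * work[i]; /* u_k */
    }
}

/* Toy right-hand side for du/dt = lambda * u, for checking only. */
void linear_rhs(const double *u, double *out, size_t n, void *ctx)
{
    double lambda = *(const double *)ctx;
    for (size_t i = 0; i < n; ++i)
        out[i] = lambda * u[i];
}
```

For a linear right-hand side, s stages of this form reproduce the Taylor polynomial of the exponential up to order s, which gives a quick sanity check before plugging in the spectral operator.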
4
Algorithm with Spectral Representation
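The slide's formulas are not preserved here. In a Fourier method a spatial derivative becomes multiplication by i*k in spectral space, so a minimal sketch of that step might look like the following. It assumes a 2*pi-periodic domain (integer wavenumbers) and the usual FFT output ordering. The helper names are hypothetical.

```c
#include <assert.h>

/* Map FFT output index j (0..n-1) to a signed integer wavenumber,
 * ASSUMING a 2*pi-periodic domain and the usual FFT ordering
 * (for even n: 0, 1, ..., n/2-1, -n/2, ..., -1). */
int fourier_wavenumber(int j, int n)
{
    assert(n > 0 && j >= 0 && j < n);
    return (j <= (n - 1) / 2) ? j : j - n;
}

/* Differentiate one Fourier coefficient in place:
 * (i*k) * (re + i*im) = -k*im + i*(k*re). */
void spectral_derivative(double *re, double *im, int k)
{
    double r = *re;
    *re = -(double)k * (*im);
    *im = (double)k * r;
}
```

Some codes zero the Nyquist mode (index n/2 for even n) when taking odd derivatives. Whether the project did so is not stated.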
5
Code Development
Develop serial C code based on the given Matlab code, using the FFTW libraries for the fft and ifft calls.
  Very straightforward.
  Verification that the code works correctly was simply a comparison with the Matlab result.
Develop parallel C code based on the serial C code.
  The FFTW libraries provide fft and ifft calls that do all the MPI calls for you.
  The tricky part of this development was placing the data correctly on each processor for the fft and ifft calls.
  Verification that the code works correctly was again a comparison with the Matlab result.
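The comparison against the Matlab result can be automated with a maximum absolute difference between the two solution arrays. The sketch below assumes both results have already been loaded into plain arrays of equal length. The function name `max_abs_diff` is hypothetical, and the acceptable tolerance is left to the user.

```c
#include <assert.h>
#include <stddef.h>

/* Largest pointwise difference between two solution arrays of length n,
 * e.g. the C result and a reference result exported from Matlab. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A value near machine round-off, scaled by the number of time steps, suggests the two codes agree. A value of order one usually points to a data-placement or indexing error.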
6
Results: N=512, 1000 Iterations
10
Usage of FFTW Libraries in Parallel: Function Calls
Notice: Message passing is transparent to the user.
11
Usage of FFTW Libraries in Parallel: MPI Data Layout
The transform data used by the MPI FFTW routines is distributed: a distinct portion of it resides with each process involved in the transform. This allows the transform to be parallelized, for example, over a cluster of workstations, each with its own separate memory, so that you can take advantage of the total memory of all the processors you are parallelizing over. In particular, the array is divided according to the rows (first dimension) of the data: each process gets a subset of the rows of the data. (This is sometimes called a "slab decomposition.") One consequence of this is that you can't take advantage of more processors than you have rows (e.g. 64x64x64 matrix can at most use 64 processors). This isn't usually much of a limitation, however, as each processor needs a fair amount of data in order for the parallel-computation benefits to outweigh the communications costs.
Taken from the FFTW website/documentation.
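To make the slab decomposition concrete, the sketch below splits nx rows as evenly as possible over nprocs processes. This is an illustrative assumption only: the library chooses its own distribution, and real code should use the values it reports rather than recomputing them. The function name `slab_rows` is hypothetical.

```c
#include <assert.h>

/* Illustrative even split of nx rows over nprocs processes.
 * NOT necessarily the distribution the FFT library picks. */
void slab_rows(int nx, int nprocs, int rank,
               int *local_nx, int *local_x_start)
{
    assert(nx > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = nx / nprocs;    /* rows every process gets */
    int extra = nx % nprocs;   /* first 'extra' ranks get one more row */
    *local_nx = base + (rank < extra ? 1 : 0);
    *local_x_start = rank * base + (rank < extra ? rank : extra);
}
```

With more processes than rows, some ranks receive zero rows, which is the limitation the quoted documentation describes.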
12
Usage of FFTW Libraries in Parallel: MPI Data Layout
These calls are needed to create the fft and ifft plans and to find out how much memory each process must allocate.
13
Usage of FFTW Libraries in Parallel: MPI Data Layout
ilocal_x_start tells us which row of the global 2D array the local data starts at, and ilocal_nx tells us how many rows are stored on the current processor. The data is stored in row-major format.
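The indexing described above can be sketched as follows. The example fills a local slab so that each entry holds its own global row-major index, which makes the local-to-global mapping easy to inspect. The helper names are hypothetical. `ilocal_nx` and `ilocal_x_start` are the values named on the slide, assumed here to be a row count and a starting global row.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i_local, j) inside the local row-major slab. */
size_t local_offset(int i_local, int j, int ny)
{
    assert(i_local >= 0 && j >= 0 && j < ny);
    return (size_t)i_local * (size_t)ny + (size_t)j;
}

/* Global row index that corresponds to a local row. */
int global_row(int ilocal_x_start, int i_local)
{
    return ilocal_x_start + i_local;
}

/* Fill the local slab so each entry stores its global row-major index. */
void fill_local_slab(double *data, int ilocal_nx, int ilocal_x_start, int ny)
{
    assert(data != NULL && ilocal_nx >= 0 && ny > 0);
    for (int i = 0; i < ilocal_nx; ++i) {
        int gi = global_row(ilocal_x_start, i);
        for (int j = 0; j < ny; ++j)
            data[local_offset(i, j, ny)] = (double)gi * ny + j;
    }
}
```

In the actual solver the fill loop would evaluate the initial condition at the grid point (gi, j) instead of storing an index.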
14
Notice: Message Passing is transparent to the user
15
Parallel Results
Two versions were written.
A Non-Efficient version that is not optimized for the FFTW MPI calls:
  An extra work array is not used.
  An extra un-transposing of the data is done before returning from the fft calls.
An Efficient version that is optimized for the FFTW MPI calls:
  An extra work array is used.
  Data is left transposed, so the extra communication step of un-transposing the data is not done.
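As a serial illustration of what "left transposed" means, the sketch below converts an nx-by-ny row-major array into its ny-by-nx transpose. In the parallel code this reshuffle moves data between processes, which is why skipping the final un-transpose saves a communication step. This is a single-process analogy under that assumption, not the library's implementation, and `transpose_row_major` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of an nx-by-ny row-major array.
 * in[x*ny + y] ends up at out[y*nx + x]. */
void transpose_row_major(const double *in, double *out, int nx, int ny)
{
    assert(in != NULL && out != NULL && in != out);
    assert(nx > 0 && ny > 0);
    for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y)
            out[(size_t)y * nx + x] = in[(size_t)x * ny + y];
}
```

A solver that keeps the spectral data transposed only has to swap the roles of the two wavenumber indices when applying the derivative operator, which is much cheaper than moving the data back.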
16
Notice: The slight differences
17
The Efficient Version is faster and has higher parallel efficiency.
18
We begin to see some scaling. However, efficiency starts to taper off, indicating that much of the time is spent in communication.
19
Overall, we see the same trend as N increases: some scaling as the number of processors increases, but the speedup starts to flatten and efficiency steadily decreases.
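The scaling and efficiency figures discussed on these slides are conventionally computed from wall-clock times as speedup = T(1)/T(P) and efficiency = speedup/P. The sketch below assumes those standard definitions. The slides do not state the exact formulas used, and the function names are hypothetical.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the one-process run. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup per process (1.0 means ideal scaling). */
double efficiency(double t_serial, double t_parallel, int nprocs)
{
    assert(nprocs > 0);
    return speedup(t_serial, t_parallel) / (double)nprocs;
}
```

A speedup curve that flattens while the process count keeps growing shows up here as efficiency falling toward zero.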
20
The Sea of Black for the Non-Efficient Version: N=256, 10 Iterations
21
A lot of communication between processors.
22
Each processor communicates with every other processor using MPI_Sendrecv, since each processor needs data from all the others. We can actually see here when an fft is being performed.
23
8 processors and 16 processors: same trend of communication.
24
The Sea of White for the Efficient Version: N=256, 10 Iterations
25
The Efficient Version uses MPI_Alltoall for its communication between all processors.
26
Again, the white bars for each process show when an fft call is being performed.
27
8 processors and 16 processors: same trend of communication.
28
Conclusions
A lot of time is spent in communication, since each process communicates with every other process.
Efficiency goes down as a result: as the number of processes increases for a given size N, more communication is needed.
We saw some scaling, but it starts to drop off as the number of processors increases (efficiency issues).
Time spent on this project:
  Code development: ~8 hours with debugging
  Data collection: ~2 days
  Overall: quite a bit of time