Accelerating Spherical Harmonic Transforms on the NVIDIA® GPGPU


1 Accelerating Spherical Harmonic Transforms on the NVIDIA® GPGPU
ECE 734 Project, Vikrant Soman

2 Agenda
Problem Statement
Motivation
Introduction to SHT: analysis and synthesis
Overview of GPU architecture
CPU-GPU implementation
Results
Conclusions and Future work
References and Acknowledgements

3 Problem Statement
SHTs are a critical computational kernel in numerical weather prediction, climate modeling, and other global geopotential applications. Satellite resolution is improving, so enormous global datasets of very high degree and order are becoming available.

4 Motivation
The computational aspects of SHTs have become challenging and time consuming: larger datasets make the SHT more data intensive and slower. No one appears to have tried using a GPU for SHTs before; try a Google search for “Spherical Harmonic Transforms on GPU”.

5 Spherical Harmonic Transforms
Spherical Harmonic Transforms (SHTs) are essentially Fourier transforms on the sphere. An SHT consists of an analysis step and a synthesis step. Analysis: project grid point data on the sphere onto the spectral modes. Synthesis: the inverse transform reconstructs grid point data from the spectral information.

6 Analysis and Synthesis
Analysis: FFT of grid point data along longitudes (F) * Gaussian weights (G) * Legendre polynomial functions gives the spectral values (S).
Synthesis: spectral values (X) * Legendre polynomial functions, then compute the IFFT and normalize the results.
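In formulas, the two steps can be sketched as below. The notation is an assumption and does not come from the slides: $f(\mu_j, \lambda_k)$ is the grid data on $J$ Gaussian latitudes $\mu_j$ and $K$ equally spaced longitudes $\lambda_k$, $w_j$ are the Gaussian weights, $N$ is the truncation degree, and $\bar{P}_n^{|m|}$ are Legendre functions normalized so that $\int_{-1}^{1} (\bar{P}_n^{|m|})^2 \, d\mu = 1$. Where the $1/K$ factor is applied is a convention and may differ from the implementation described here.

```latex
% Analysis: Fourier transform along longitude, then Gaussian quadrature in latitude
F_m(\mu_j) = \frac{1}{K} \sum_{k=0}^{K-1} f(\mu_j, \lambda_k)\, e^{-\mathrm{i} m \lambda_k},
\qquad
S_n^m = \sum_{j=1}^{J} w_j\, F_m(\mu_j)\, \bar{P}_n^{|m|}(\mu_j)

% Synthesis: Legendre sum in latitude, then inverse Fourier transform along longitude
f(\mu_j, \lambda_k) = \sum_{m=-N}^{N} \left[ \sum_{n=|m|}^{N} S_n^m\, \bar{P}_n^{|m|}(\mu_j) \right] e^{\mathrm{i} m \lambda_k}
```

The bracketed inner sum is the Legendre part of the transform; the outer sums over longitude are the parts handled by the FFT and IFFT.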

7 GPU architecture - Overview
The GPU exposes four types of memory: global (device), shared, constant, and texture.

8 CUDA
CUDA extends C by allowing the programmer to define C functions, called kernels, that are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C functions.
// Kernel definition
__global__ void vecAdd(float* A, float* B, float* C)
{
}
int main()
{
    // Kernel invocation
    vecAdd<<<1, N>>>(A, B, C);
}
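A complete version of the vector-add example might look like the sketch below. The kernel body, the extra length parameter `n`, the `const` qualifiers, and the host wrapper `vec_add_on_gpu` are illustrative additions, not taken from the slides. The wrapper assumes `n` is positive and small enough to fit in a single thread block.

```cuda
#include <cuda_runtime.h>

// Kernel: each thread of the single block adds one pair of elements.
__global__ void vecAdd(const float* A, const float* B, float* C, int n)
{
    int i = threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host wrapper: copy the inputs to the device, launch one block
// of n threads (as in the slide's invocation), and copy the result back.
// Returns 0 on success and -1 if any CUDA call fails.
int vec_add_on_gpu(const float* hA, const float* hB, float* hC, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    int status = -1;

    if (cudaMalloc((void**)&dA, bytes) == cudaSuccess &&
        cudaMalloc((void**)&dB, bytes) == cudaSuccess &&
        cudaMalloc((void**)&dC, bytes) == cudaSuccess &&
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        vecAdd<<<1, n>>>(dA, dB, dC, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }
    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return status;
}
```

The wrapper also shows the offload pattern in miniature: host data goes to device memory, the kernel runs, and the result is copied back to the host.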

9 One of the best parts of the GPGPU: heterogeneous programming
Accelerates BLAS operations. Allows the combined CPU-GPU implementation used in this project.

10 Implementation Details
Exploit the heterogeneous programming model.
CPU code implemented in MATLAB.
Identify the data-intensive loops in the code.
Map the loop indexing to the GPGPU architecture to exploit parallelism.
Offload the computation to the GPU, then retrieve the data back to the CPU.

11 Part of the kernel program; loop mapped to GPU
Part of the kernel program:
    AS(ty, tx) = A[k*wA*wA + aBegin + wA * ty + tx];
    BS(ty, tx) = B[bBegin_x + wB * ty + tx];
    Csub(ty,tx) = 0;
    // Synchronize to make sure the matrices are loaded
    __syncthreads();
    Csub(ty,tx) = AS(ty,tx) * BS(ty,tx);
    int c = bx*BLOCK_SIZE + by*BLOCK_SIZE*BLOCK_SIZE*(wA/BLOCK_SIZE);
    A[k*wA*wA + c + tx + ty*wA] = Csub(ty,tx);
Loop mapped to GPU (MATLAB):
    for n=0:nn
        Pn = (legendre(n,yg))'; % Note error in Matlab normalization
        for m= 0:n
            Nmn = (-1)^m * sqrt((2*n+1)/2 * factorial(n-m)/factorial(n+m) );
            P(1:njo2,n+1,m+1) = Nmn*Pn(1:njo2,m+1);
        end
    end

12 Legendre polynomial calculation
Offload data intensive operation to GPU
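One way to offload the normalization loop is sketched below. It is not the author's kernel. The names `legendre_norm`, `normalize_legendre`, and `normalize_on_gpu` are hypothetical, and the array is assumed to hold unnormalized values in MATLAB's column-major layout for P(j, n+1, m+1), that is, at index j + nlat*(n + (nn+1)*m). Like the factorial-based MATLAB expression, the ratio loses all precision once n+m grows large, so this is only meant for modest degrees.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Nmn = (-1)^m * sqrt((2n+1)/2 * (n-m)!/(n+m)!), with the factorial ratio
// accumulated as a running quotient instead of forming the factorials.
__host__ __device__ inline float legendre_norm(int n, int m)
{
    double ratio = 1.0;
    for (int k = n - m + 1; k <= n + m; ++k) {
        ratio /= (double)k;
    }
    double sign = (m % 2 == 0) ? 1.0 : -1.0;
    return (float)(sign * sqrt((2.0 * n + 1.0) / 2.0 * ratio));
}

// One thread per array element: the loops over j, n and m become a flat index.
__global__ void normalize_legendre(float* P, int nlat, int nn)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int nmodes = nn + 1;
    int total = nlat * nmodes * nmodes;
    if (idx >= total) return;

    int q = idx / nlat;   // combined (n, m) index
    int n = q % nmodes;   // degree
    int m = q / nmodes;   // order
    if (m > n) return;    // entries with m > n are unused and left untouched
    P[idx] *= legendre_norm(n, m);
}

// Hypothetical host wrapper: offload, scale in place on the GPU, copy back.
// Returns 0 on success and -1 if any CUDA call fails.
int normalize_on_gpu(float* hP, int nlat, int nn)
{
    int total = nlat * (nn + 1) * (nn + 1);
    size_t bytes = (size_t)total * sizeof(float);
    float* dP = nullptr;
    int status = -1;

    if (cudaMalloc((void**)&dP, bytes) == cudaSuccess &&
        cudaMemcpy(dP, hP, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;
        int blocks = (total + threads - 1) / threads;
        normalize_legendre<<<blocks, threads>>>(dP, nlat, nn);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hP, dP, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }
    cudaFree(dP);
    return status;
}
```

Flattening the three loop indices into one thread index keeps the launch configuration simple, and neighbouring threads touch neighbouring latitudes in memory.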

13 Analysis step Compute FFT on CPU side.
MATLAB has highly optimized FFT operation.

14 Synthesis step
The IFFT is again given to the CPU.
The GPU FFT pays off only for very large transforms (more than about 10,000 points).
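The Legendre summation that precedes the IFFT is the natural part of synthesis to run on the GPU. The sketch below is an illustration, not the author's code. The names `legendre_synthesis` and `synthesis_on_gpu` are hypothetical, the spectral values X are assumed stored at n + (nn+1)*m, the normalized Legendre values P at j + nlat*(n + (nn+1)*m), and the data is treated as real (a complex field would apply the same kernel to the real and imaginary parts).

```cuda
#include <cuda_runtime.h>

// One thread per (latitude j, order m): G(j, m) = sum over n >= m of
// X(n, m) * P(j, n, m). The result G is what the CPU-side IFFT consumes.
__global__ void legendre_synthesis(const float* X, const float* P, float* G,
                                   int nlat, int nn)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // latitude index
    int m = blockIdx.y;                             // order
    int nmodes = nn + 1;
    if (j >= nlat || m >= nmodes) return;

    float sum = 0.0f;
    for (int n = m; n <= nn; ++n) {
        sum += X[n + nmodes * m] * P[j + nlat * (n + nmodes * m)];
    }
    G[j + nlat * m] = sum;
}

// Hypothetical host wrapper. Returns 0 on success, -1 on any CUDA error.
int synthesis_on_gpu(const float* hX, const float* hP, float* hG,
                     int nlat, int nn)
{
    int nmodes = nn + 1;
    size_t xBytes = (size_t)nmodes * nmodes * sizeof(float);
    size_t pBytes = (size_t)nlat * nmodes * nmodes * sizeof(float);
    size_t gBytes = (size_t)nlat * nmodes * sizeof(float);
    float *dX = nullptr, *dP = nullptr, *dG = nullptr;
    int status = -1;

    if (cudaMalloc((void**)&dX, xBytes) == cudaSuccess &&
        cudaMalloc((void**)&dP, pBytes) == cudaSuccess &&
        cudaMalloc((void**)&dG, gBytes) == cudaSuccess &&
        cudaMemcpy(dX, hX, xBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dP, hP, pBytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        dim3 block(128);
        dim3 grid((nlat + block.x - 1) / block.x, nmodes);
        legendre_synthesis<<<grid, block>>>(dX, dP, dG, nlat, nn);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hG, dG, gBytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }
    cudaFree(dX);
    cudaFree(dP);
    cudaFree(dG);
    return status;
}
```

Putting the order m on the grid's y dimension limits nn+1 to the maximum y grid size, which is ample for the grid sizes discussed here.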

15 CPU side: Dell, Intel quad-core @ 2.5 GHz, 2.5 GB RAM
GPU: NVIDIA® 8800 GT
CPU-side code in MATLAB.
GPU code written with NVMEX, the MATLAB extension provided by NVIDIA®.
CPU-GPU interfacing via the plug-in for MATLAB.

16 Results
For a grid size of 512, the speedup is almost 42x.
The speedup trends upward for larger grid sizes.
There is not much speedup for the analysis kernel; the values are comparable, though.

17 Conclusions and Future work
Improves the on-the-fly Legendre polynomial calculation.
Good speedup overall.
Errors are low (less than 1e-10 on average).
Need to look into performance for larger grid sizes.
Complete the synthesis step results.
Possible exchange of ideas with a PhD student at SMU, Dallas.

18 References
1. Drake, J. B., Worley, P., and D’Azevedo, E. Algorithm 888: Spherical harmonic transform algorithms. ACM Trans. Math. Softw. 35, 3, Article 23 (October 2008).
2. Akshara Kaginalkar, Sharad Purohit. Benchmarking of Medium Range Weather Forecasting Model on PARAM - A parallel machine. Center for Development of Advanced Computing (C-DAC), Pune University Campus, Pune, India.
3. Martin J. Mohlenkamp. A Fast Transform for Spherical Harmonics. The Journal of Fourier Analysis and Applications, 1999.
4. Huadong Xiao, Yang Lu. Parallel computation for spherical harmonic synthesis and analysis. Computers & Geosciences, Volume 33, Issue 3, March 2007.
5. NVIDIA CUDA Programming Guide 2.0.
Acknowledgements: Special thanks to Prof. Dan Negrut and Makarand Datar, UW Mech department, for access to their GPU machines.

