Published by Ruth Lucas. Modified over 9 years ago.
1
Computing Spherical Harmonic Transforms on CUDA-Compatible GPUs
Wangqun Lin, Fengshun Lu
College of Computer, National University of Defense Technology
CACHES 2011, Tucson, Arizona, June 4th, 2011
2
Outline
- Motivation
- Spherical Harmonic Transforms (SHT)
- Methods
  - Direct Method
  - Efficiency of Threads Utilization
  - Reshaped Method
  - Concurrent Kernel Execution
- Experiments
3
Motivation
- Computing the SHT with GPUs
  - The SHT is widely used, but its complexity is O(N^3)
  - GPUs are powerful
- Performance metric at the SM level
  - Current practice emphasizes only OCCUPANCY
  - Goal: find another metric that measures how efficiently the launched threads are used
4
Spherical Harmonic Transforms (1/2)
- ξ: state variable
- ξ_n^m: spectral coefficients of the state variable ξ
- μ: Gaussian latitude
- λ: longitude
- M: model truncation wavenumber
- N(m): highest degree of the associated Legendre function for wavenumber m
- P_n^m(μ) e^{imλ}: spherical harmonic functions, where P_n^m(μ) is the associated Legendre function
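The equation on this slide did not survive extraction. As a reading aid, the symbols in the legend fit the standard truncated spectral expansion shown below. This is the textbook form, assumed here rather than copied from the slide.

```latex
\xi(\lambda,\mu) \;=\; \sum_{m=-M}^{M} \; \sum_{n=|m|}^{N(m)} \xi_n^m \, P_n^m(\mu)\, e^{im\lambda}
```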
5
Spherical Harmonic Transforms (2/2)
- Forward Fourier
- Forward Legendre
- Inverse Legendre
- Inverse Fourier
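The four stages are listed without their formulas. A common discrete formulation, given here as an assumption, is sketched below. The longitude count I, the latitude count J, and the Gaussian weights w_j are standard notation introduced for illustration; the slide text itself defines only ξ, ξ_n^m, μ, λ, M, N(m), and P_n^m.

```latex
% Forward Fourier (grid -> Fourier coefficients at each latitude)
\xi^m(\mu_j) = \frac{1}{I} \sum_{i=0}^{I-1} \xi(\lambda_i,\mu_j)\, e^{-im\lambda_i}

% Forward Legendre (Gaussian quadrature over latitudes)
\xi_n^m = \sum_{j=1}^{J} w_j \, \xi^m(\mu_j)\, P_n^m(\mu_j)

% Inverse Legendre (spectral -> Fourier coefficients)
\xi^m(\mu_j) = \sum_{n=|m|}^{N(m)} \xi_n^m \, P_n^m(\mu_j)

% Inverse Fourier (Fourier coefficients -> grid)
\xi(\lambda_i,\mu_j) = \sum_{m=-M}^{M} \xi^m(\mu_j)\, e^{im\lambda_i}
```

The two Legendre stages are the ones the Methods slides map onto CUDA threads and thread blocks.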
6
Methods – Direct (1/9)
Forward Legendre
[Figure labels: m ≤ n; CUDA Thread; Thread Block]
7
Methods – Direct (2/9)
Inverse Legendre
[Figure labels: m ≤ n; CUDA Threads of block j]
8
Methods – ETU Metric (3/9)
- Efficiency of Thread Utilization (ETU)
  - Measures the proportion of launched threads doing useful work during the entire execution interval
  - Mainly used as an algorithm design guideline
- Assumption
  - Algorithms consist of many micro steps
- tu(t, s) function
  - t: thread
  - s: micro step
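The slide names tu(t, s) but its formula is not in the extracted text. One reading consistent with the description above, offered as an assumption, treats tu(t, s) as an indicator and averages it over T launched threads and S micro steps (T and S are notation introduced here):

```latex
tu(t,s) =
\begin{cases}
1 & \text{if thread } t \text{ does useful work in micro step } s,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathrm{ETU} = \frac{1}{T\,S} \sum_{t=1}^{T} \sum_{s=1}^{S} tu(t,s)
```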
9
Methods – ETU (4/9)
ETU Metric Example
Algorithm 2: Direct Inverse Legendre Transform (DILT)
  Input: ξ_n^m, P_n^m, J, M
  Output: ξ^m
  Execution configuration: (J, M+1)
  Declaration: tid, bid, fc_sh(M+1)    // fc_sh: shared memory
  initialize fc_sh(tid) to zero;       // 1 micro step
  for n = 0 to M do                    // M+1 micro steps
    if tid ≤ n then
      fc_sh(tid) += ξ_n^tid × P_n^tid(μ_bid);
    end if
  end for
  ξ^tid(μ_bid) = fc_sh(tid);           // 1 micro step
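To make the micro-step counting in Algorithm 2 concrete, here is a minimal Python emulation of one thread block. It is a sketch under stated assumptions, not the authors' CUDA kernel: the function name `dilt_block`, the nested-list layout `spec[n][m]` and `leg[n][m]`, the real-valued inputs, and the rule that a thread counts as "useful" in a micro step only when it executes that step's statement are all choices made for illustration.

```python
def dilt_block(spec, leg, M):
    """Emulate one block (one latitude) of the direct inverse Legendre transform.

    spec[n][m]: spectral coefficient for degree n, wavenumber m (0 <= m <= n <= M)
    leg[n][m]:  Legendre value P_n^m at this block's latitude
    Returns (fourier_coeffs, etu).
    """
    threads = M + 1                      # one thread per wavenumber m = tid
    fc_sh = [0.0] * threads              # stands in for shared memory

    useful = threads                     # init micro step: every thread works
    steps = 1

    for n in range(M + 1):               # M+1 micro steps
        steps += 1
        for tid in range(threads):       # threads run in lockstep on a GPU
            if tid <= n:                 # only tid <= n contributes at degree n
                fc_sh[tid] += spec[n][tid] * leg[n][tid]
                useful += 1

    out = list(fc_sh)                    # write-back micro step: every thread works
    useful += threads
    steps += 1

    return out, useful / (threads * steps)


if __name__ == "__main__":
    M = 213                              # T213 truncation, as used later in the deck
    ones = [[1.0] * (M + 1) for _ in range(M + 1)]
    _, etu = dilt_block(ones, ones, M)
    print(f"ETU for direct inverse Legendre, M={M}: {etu:.4f}")
```

Under these assumptions the count works out to ETU = (M + 6) / (2(M + 3)), which approaches 1/2 as M grows. That agrees with the "ETU ≈ 1/2" figure the deck gives for the direct layout.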
10
Methods – Reshaped (5/9)
Forward Legendre
[Figure: reshape; ETU ≈ 1/2 before reshaping, ETU ≈ 1 after]
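The reshaping figures are not recoverable, so the sketch below does not reproduce the authors' layout. It shows one generic way to balance a triangular workload, swapped in purely to illustrate how ETU can rise from about 1/2 to 1: pairing wavenumber t with wavenumber M - t so that every worker processes exactly M + 2 terms. The function names and data layout are hypothetical.

```python
def inverse_legendre_direct(spec, leg, M):
    """Reference result: for each m, sum over n = m..M of spec[n][m] * leg[n][m]."""
    return [sum(spec[n][m] * leg[n][m] for n in range(m, M + 1))
            for m in range(M + 1)]


def inverse_legendre_folded(spec, leg, M):
    """Balanced variant: worker t handles m = t and m = M - t (M must be odd).

    Wavenumber t has M - t + 1 terms and wavenumber M - t has t + 1 terms,
    so each worker always processes M + 2 terms and none sits idle.
    Returns (fourier_coeffs, etu).
    """
    assert M % 2 == 1, "this simple pairing assumes an even number of wavenumbers"
    workers = (M + 1) // 2
    out = [0.0] * (M + 1)
    useful = 0
    for t in range(workers):
        lo, hi = t, M - t
        # Concatenate the two columns into one work list of constant length.
        work = [(lo, n) for n in range(lo, M + 1)] + \
               [(hi, n) for n in range(hi, M + 1)]
        for m, n in work:
            out[m] += spec[n][m] * leg[n][m]
            useful += 1
    etu = useful / (workers * (M + 2))   # every slot in every micro step is used
    return out, etu
```

The folded version returns the same coefficients as the direct sum while keeping every worker busy in every step. The deck's later slides mention trapezia α and β for the T213 model, which suggests the authors' reshaping is more elaborate than this pairing.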
11
Methods – Reshaped (6/9)
Inverse Legendre, T213 model
[Figure: reshape]
12
Methods – Reshaped (7/9)
Inverse Legendre, T213 model
[Figure: reconstruct]
13
Methods – Reshaped (8/9)
Inverse Legendre, T213 model
[Figure: computation for trapezium α and β]
14
Methods – Concurrent Kernel (9/9)
Concurrent Kernel Execution
- Supported by Fermi and later architectures
- Programs with many small kernels can be executed efficiently on GPUs
- Takes future software scalability into consideration

T213 model

Kernel   Concurrent Forward Legendre         Concurrent Inverse Legendre
         n          Grid size  Block size    m          Grid size  Block size
1        [0,53]     54         64            [0,53]     320        64
2        [54,117]   64         128           [54,117]   320        64
3        [118,213]  96         224           [118,213]  320        96
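The launch configurations in the T213 table follow a regular pattern, which the short sketch below reproduces. The pattern is inferred from the numbers, not stated in the deck: the forward kernels appear to use one block per degree n with enough threads for wavenumbers 0..n, the inverse kernels appear to use one block per latitude (320 for T213) with one thread per wavenumber in the range, and block sizes appear to be rounded up to a multiple of the warp size, 32. All function names are hypothetical.

```python
WARP = 32
RANGES = [(0, 53), (54, 117), (118, 213)]   # index ranges from the T213 table


def round_up(x, multiple=WARP):
    """Smallest multiple of `multiple` that is >= x."""
    return -(-x // multiple) * multiple


def forward_config(lo, hi):
    """(grid, block) for a forward Legendre kernel covering degrees lo..hi."""
    grid = hi - lo + 1            # one block per degree n in the range
    block = round_up(hi + 1)      # threads for wavenumbers 0..hi, padded to a warp
    return grid, block


def inverse_config(lo, hi, latitudes=320):
    """(grid, block) for an inverse Legendre kernel covering wavenumbers lo..hi."""
    grid = latitudes              # one block per Gaussian latitude
    block = round_up(hi - lo + 1) # one thread per wavenumber in the range
    return grid, block


if __name__ == "__main__":
    for k, (lo, hi) in enumerate(RANGES, start=1):
        print(k, forward_config(lo, hi), inverse_config(lo, hi))
```

Splitting the triangular index space into three ranges keeps each kernel's block size close to the work it actually has, and the three small kernels can then overlap on hardware that supports concurrent kernel execution.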
15
Experiments (1/4)
Validation of the ETU metric
- T341 model
- Variable block size (BS)

BS    ETU     OCCUPANCY  Time (ms)
96    0.8039  0.312      1.975
128   0.7480  0.417      2.239
160   0.7831  0.417      2.038
192   0.6519  0.625      2.198

Observations
- In general, a larger ETU indicates better performance
- OCCUPANCY shows no direct relationship with performance
- Equal OCCUPANCY does not mean equal performance
- At equal OCCUPANCY, the larger ETU gives better performance
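The observations can be checked directly against the four table rows. The snippet below is a small illustrative check using only the values from the slide; the helper names are hypothetical.

```python
# (block size, ETU, occupancy, time in ms), copied from the T341 table
ROWS = [
    (96, 0.8039, 0.312, 1.975),
    (128, 0.7480, 0.417, 2.239),
    (160, 0.7831, 0.417, 2.038),
    (192, 0.6519, 0.625, 2.198),
]


def fastest(rows):
    """Row with the smallest execution time."""
    return min(rows, key=lambda r: r[3])


def same_occupancy_pairs(rows):
    """All pairs of rows that share an occupancy value."""
    return [(a, b) for i, a in enumerate(rows) for b in rows[i + 1:]
            if a[2] == b[2]]


if __name__ == "__main__":
    bs, etu, occ, ms = fastest(ROWS)
    print(f"fastest: BS={bs}, ETU={etu}, OCCUPANCY={occ}, time={ms} ms")
    for a, b in same_occupancy_pairs(ROWS):
        better = a if a[1] > b[1] else b
        print(f"OCCUPANCY={a[2]}: larger ETU is BS={better[0]} at {better[3]} ms")
```

The fastest configuration (BS = 96) has the highest ETU and the lowest OCCUPANCY, and among the two rows with OCCUPANCY 0.417 the one with the larger ETU (BS = 160) is faster. The trend is not strictly monotonic: BS = 192 has the lowest ETU but is slightly faster than BS = 128, which matches the deck's hedged wording.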
16
Experiments (2/4)
Performance
[Figure panels: Forward Legendre; Inverse Legendre]
17
Experiments (3/4)
Case Study: STSWM
- A global shallow water model based on the SHT
- Exhibits many mathematical and computational properties of more complete models
- Used to investigate and compare numerical methods for simulating atmospheric models
- T213 truncation
  - Forward Legendre: ftrnve, ftrndi and ftrnpi
  - Inverse Legendre: shtrns
18
Experiments (4/4)
Case Study: STSWM
19
Review
- Motivation
- Spherical Harmonic Transforms
- Methods
  - Direct Method
  - Efficiency of Threads Utilization
  - Reshaped Method
  - Concurrent Kernel Execution
- Experiments