Application of Emerging Computational Architectures (GPU, MIC) to Atmospheric Modeling
Tom Henderson, NOAA Global Systems Division, Thomas.B.Henderson@noaa.gov
Mark Govett, Jacques Middlecoff, Paul Madden, James Rosinski, Craig Tierney
Correlation of Forecast Skill and Compute Power [figure]
HPC-Enabled Scientific Goals

NIM
- 2013: Run @ global 4km resolution, aqua-planet
- 2013: Run @ global 30km resolution with real data & topography
- 2014: Run @ global 4km resolution with real data & topography

FIM
- 2013: Run 60-100 ensemble members @ global 15km resolution
- 2014: 100+ members @ 10km, coupled to ocean-FIM

However…
HPC Challenges

- CPU clock rates have stalled
- Emerging "accelerator" architectures crowd many (10s-100s) "cores" onto a chip
  - Graphics Processing Units (GPU): NVIDIA
  - Many Integrated Core (MIC): Intel
- But they require exploitation of fine-grained parallelism
HPC Challenges

- Atmospheric modeling has a lot of fine-grained parallelism…
- …but it is memory-bandwidth bound (see the rough estimate below)
- How do we write software that runs efficiently on GPU & MIC?
- How can we leverage our existing software investments?
- Do we need new algorithms/formulations?

Enter GSD's ACS…
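Why "memory-bandwidth bound" bites: a rough back-of-envelope estimate, using generic illustrative numbers rather than measurements from this work. A stencil-style update performing about 2 flops per grid point while moving about 12 bytes to and from memory has arithmetic intensity

\[ I = \frac{2\ \text{flops}}{12\ \text{bytes}} \approx 0.17\ \text{flops/byte}, \qquad P_{\text{sustained}} \approx I \times B \approx 0.17 \times 30\ \text{GB/s} \approx 5\ \text{Gflops/s} \]

so a socket with ~30 GB/s of memory bandwidth sustains only ~5 Gflops/s on such a loop, no matter how high its peak compute rate is. Adding cores or threads does not help once bandwidth saturates, which is why the questions above matter for both GPU and MIC.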
ESRL's Advanced Computing Section (Sandy's Vision in 1991)

- Lead HPC R&D group at NOAA for 20+ years
  - Vector, MPP, COTS, fine-grained "accelerators" (GPU and MIC)
- Focus on software challenges
  - HPC: MPI, OpenMP, OpenACC, etc.
  - Provide modern SE support
  - Emphasize performance-portability
- Early adoption of new HPC technology
  - Benefit: competitive HPC procurements
  - Top500 #8 in 2002 with a modest budget
"We did it before, we'll do it again"

- GSD MPP (1992- )
- 1st operational NCEP MPP (2000)
- GSD GPU (2008- )
Current "Accelerator" Research

GPU (NVIDIA)
- NIM dynamical core
- FIM dynamical core
- Selected WRF physics packages

MIC (Intel)
- FIM dynamical core

- Ongoing close interaction with technical staff at NVIDIA, Intel, & compiler vendors
- Technology transfer to commercial GPU compiler vendors
GPU vs. MIC

GPU
- >512 cores, 10,000s of "thin" threads
- Many threads allow overlap of memory latency with useful computation
- Limited working-set size
- Code restructuring often required
- Hardware relatively mature

MIC
- Fewer cores, fewer threads
- Likely easier to port code (x86)
- Code restructuring requirements unclear
- Hardware still beta; Intel gag order
Performance-Portable Programming Approaches

GPU
- Commercial directive-based compilers
  - CAPS HMPP 3.0.5
  - Portland Group PGI Accelerator 11.10
  - Cray (beta), Pathscale (beta)
- Directive syntax converging to OpenACC (sketched below)
- OpenMP long-term

MIC
- OpenMP plus compiler vectorization
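To make the directive-based approach concrete, here is a minimal sketch in the OpenACC style the slide says vendors are converging on. All names (advance, nip, nvl, u, tend) are invented for illustration; this is not actual NIM or FIM code.

```fortran
! Minimal OpenACC sketch (illustrative only; names are hypothetical,
! not actual NIM/FIM code). The outer horizontal loop maps to GPU
! gangs (thread blocks) and the inner vertical loop to vector lanes.
subroutine advance(nip, nvl, dt, tend, u)
  implicit none
  integer, intent(in)    :: nip, nvl       ! horizontal columns, vertical levels
  real,    intent(in)    :: dt
  real,    intent(in)    :: tend(nvl, nip) ! precomputed tendencies
  real,    intent(inout) :: u(nvl, nip)    ! prognostic field (single precision)
  integer :: ipn, k

!$acc parallel loop gang copyin(tend) copy(u)
  do ipn = 1, nip
!$acc loop vector
    do k = 1, nvl
      u(k, ipn) = u(k, ipn) + dt * tend(k, ipn)
    end do
  end do
end subroutine advance
```

Stripped of the !$acc lines, the loop nest is ordinary Fortran, which is the point of the directive approach: one maintainable source for CPU and accelerator.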
NIM NWP Dynamical Core

- Science-SE collaboration from the start
- "GPU-friendly" design (also good for CPU)
  - Single-precision floating-point computations
  - Computations structured as simple vector ops with horizontal indirect addressing and a directly-addressed inner vertical loop (sketched below)
- Coarse-grained (MPI) parallelism via SMS directives
- Initial fine-grained (GPU) parallelism via locally-developed "F2C-ACC", followed by PGI Accelerator and CAPS HMPP
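A hedged sketch of the loop structure just described. The names (edge_sum, prox, nprox, flux, s, flux_sum) are invented for illustration, not the real NIM data structures, but the shape of the loops follows the slide: indirect addressing in the horizontal, direct stride-1 addressing in the vertical.

```fortran
! Sketch of the NIM-style loop structure (illustrative names only).
! Horizontal neighbors on the icosahedral grid are reached through an
! indirection table (prox); the inner vertical loop is directly addressed.
subroutine edge_sum(nip, nvl, nprox_max, nprox, prox, flux, s, flux_sum)
  implicit none
  integer, intent(in)  :: nip, nvl, nprox_max
  integer, intent(in)  :: nprox(nip)            ! neighbor count per column
  integer, intent(in)  :: prox(nprox_max, nip)  ! neighbor indirection table
  real,    intent(in)  :: flux(nvl, nprox_max, nip)
  real,    intent(in)  :: s(nvl, nip)
  real,    intent(out) :: flux_sum(nvl, nip)
  integer :: ipn, isn, ipp, k

  flux_sum = 0.0
  do ipn = 1, nip                ! horizontal loop over columns
    do isn = 1, nprox(ipn)       ! this column's edge neighbors
      ipp = prox(isn, ipn)       ! indirect address of the neighbor column
      do k = 1, nvl              ! directly-addressed, stride-1 vertical loop
        flux_sum(k, ipn) = flux_sum(k, ipn) + flux(k, isn, ipn) * s(k, ipp)
      end do
    end do
  end do
end subroutine edge_sum
```

The directly-addressed inner k loop is contiguous in memory, so it vectorizes on CPUs and maps naturally onto GPU threads, while the irregular grid lookups are confined to the cheap outer loops.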
Initial NIM Performance Results on GPU

- "G5-L96" test case: 10242 columns, 96 levels, 1000 time steps
- Expect a similar number of columns per GPU at the ~3km target resolution
- Optimized for both CPU and GPU
  - CPU = Intel Westmere (2.66GHz)
  - GPU = NVIDIA C2050 "Fermi"
- ~27% of peak on a 2.8 GHz Westmere CPU; quite respectable for an NWP dynamical core!
Fermi GPU vs. Single/Multiple Westmere CPU cores, "G5-L96"

NIM routine | CPU 1-core (sec) | CPU 6-core (sec) | F2C-ACC GPU (sec) | HMPP GPU (sec) | PGI GPU (sec) | F2C-ACC speedup vs. 6-core CPU
Total*      | 8654             | 2068             | 449               | --**           | --            | 4.6
vdmints     | 4559             | 1062             | 196               | 192            | 197           | 5.4
vdmintv     | 2119             | 446              | 91                | 101            | 88            | 4.9
flux        | 964              | 175              | 26                | 24             | 26            | 6.7
vdn         | 131              | 86               | 18                | 17             | 18            | 4.8
diag        | 389              | 74               | 42                | 33             | --            | 1.8
force       | 80               | 33               | 7                 | 11             | 13            | 4.7

* Total time includes I/O, PCIe, etc.
** Recent result: HMPP now complete
FIM MIC & GPU Work

MIC
- Added OpenMP parallelism to FIM, alongside SMS (see the sketch below)
- Working closely with Intel staff to analyze and tune kernel performance on MIC
- Installed new "Knight's Corner" boards at GSD
- Gag order…

GPU
- Encouraging early results
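A minimal sketch of what adding OpenMP alongside SMS can look like (names are invented, and exact SMS directive syntax is deliberately omitted). SMS supplies the coarse-grained MPI decomposition that sets each rank's local column bounds; OpenMP threads the horizontal loop, which is the fine-grained parallelism MIC needs.

```fortran
! Illustrative only: names (omp_step, tracer, tend, ips, ipe, nvl) are
! invented, and SMS directives are merely described, not reproduced.
! SMS handles the MPI decomposition (each rank owns columns ips:ipe);
! OpenMP adds the per-node threading required for MIC.
subroutine omp_step(ips, ipe, nvl, dt, tend, tracer)
  implicit none
  integer, intent(in)    :: ips, ipe, nvl
  real,    intent(in)    :: dt
  real,    intent(in)    :: tend(nvl, ips:ipe)
  real,    intent(inout) :: tracer(nvl, ips:ipe)
  integer :: ipn, k

!$omp parallel do private(k)
  do ipn = ips, ipe        ! horizontal columns owned by this MPI rank
    do k = 1, nvl          ! direct, stride-1 vertical loop; compiler-vectorized
      tracer(k, ipn) = tracer(k, ipn) + dt * tend(k, ipn)
    end do
  end do
!$omp end parallel do
end subroutine omp_step
```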
Plans

2013
- NIM aqua-planet runs on DoE Titan (GPU)
- FIM dynamics GPU & MIC parallelization complete
- Single source code for GPU/MIC/MPI (sketched below)

2014
- Full FIM and NIM models with physics running on GPU and MIC nodes
- Add FIM-ocean model on MIC/GPU

Ongoing
- Continue interactions with vendors
- Compiler technology transfer
- We benefit from multiple successful commercial hardware/software solutions!
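One way "single source code for GPU/MIC/MPI" can work in practice (an assumption on my part, with invented names): the same loop carries both OpenACC and OpenMP directives, and whichever family the compiler is asked to honor becomes active while the other is treated as a comment.

```fortran
! Illustrative single-source sketch (hypothetical names, not FIM/NIM code).
! Built with OpenACC enabled, the !$acc line drives the GPU; built with
! OpenMP enabled, the !$omp line drives MIC/CPU threads. Either way the
! unrecognized directive is a plain comment, so one source serves all.
subroutine single_source_step(nip, nvl, dt, tend, u)
  implicit none
  integer, intent(in)    :: nip, nvl
  real,    intent(in)    :: dt
  real,    intent(in)    :: tend(nvl, nip)
  real,    intent(inout) :: u(nvl, nip)
  integer :: ipn, k

!$acc parallel loop copyin(tend) copy(u)
!$omp parallel do private(k)
  do ipn = 1, nip
    do k = 1, nvl
      u(k, ipn) = u(k, ipn) + dt * tend(k, ipn)
    end do
  end do
end subroutine single_source_step
```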