Is RRTMGP suited for GPUs?
Expectations
Embarrassingly parallel: columns can be split up and computed in parallel.
Memory-intensive computations: memory is faster on the GPU than on the CPU.
Answer to the title question: yes.
Context
Speed-up GPU vs CPU
The expected speed-up will be determined largely by memory performance: GDDR is faster than DDR, and this advantage is expected to grow further in future GPU generations.
Memory bandwidth is approximately 1.5 to 2 times higher on the GPU.
Drawback: GDDR is typically smaller than DDR.
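A minimal sketch of the reasoning above, assuming a purely memory-bound kernel whose run time is roughly the bytes it moves divided by the sustained bandwidth. Under that assumption the GPU/CPU speed-up reduces to the bandwidth ratio, and it only applies if the working set fits in the smaller GDDR. The function names and the model are hypothetical illustrations, not measurements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the run time of a memory-bound kernel is approximated
   by the bytes it moves divided by the sustained memory bandwidth. */
double memory_bound_time_s(double bytes_moved, double bandwidth_bytes_per_s) {
    assert(bandwidth_bytes_per_s > 0.0);
    return bytes_moved / bandwidth_bytes_per_s;
}

/* GPU-over-CPU speed-up under the same model. *fits_on_gpu reports whether
   the working set fits in the (typically smaller) GPU memory. */
double memory_bound_speedup(double bytes_moved,
                            double cpu_bandwidth, double gpu_bandwidth,
                            double working_set_bytes, double gpu_capacity_bytes,
                            bool *fits_on_gpu) {
    *fits_on_gpu = working_set_bytes <= gpu_capacity_bytes;
    return memory_bound_time_s(bytes_moved, cpu_bandwidth) /
           memory_bound_time_s(bytes_moved, gpu_bandwidth);
}
```

With a bandwidth ratio of 2 the modelled speed-up is exactly 2; compute-bound sections, transfers between host and device, and kernel launch overheads are all outside this model.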
Computations in RRTMGP
There are multiple components to parallelize (gas optics, flux solver, etc.) and multiple sub-components, each with its own logic.
The computations are relatively lightweight: terms and factors from multiple sources, often arrays, are combined using basic arithmetic.
Static data, e.g. k-coefficients, can be parked in GPU memory.
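A minimal sketch, in C with OpenACC directives, of parking static data on the GPU: the k-coefficient array is copied to the device once with an `enter data` directive, and later loops reference it through a `present` clause so it is not re-transferred on every call. The function names and the flat one-dimensional layout are hypothetical; real k-coefficient tables are multi-dimensional and interpolated. Without an OpenACC compiler the pragmas are ignored and the code runs serially on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: copy static k-coefficients to the device once. */
void park_static_data(const double *kcoef, size_t n) {
    #pragma acc enter data copyin(kcoef[0:n])
}

/* Hypothetical: free the device copy when it is no longer needed. */
void release_static_data(const double *kcoef, size_t n) {
    #pragma acc exit data delete(kcoef[0:n])
}

/* Lightweight arithmetic: optical depth = absorption coefficient times
   absorber amount. kcoef must already be on the device (present clause);
   the per-call inputs and outputs are transferred each time. */
void optical_depth(const double *kcoef, const double *col_amount,
                   double *tau, size_t n) {
    assert(kcoef && col_amount && tau);
    #pragma acc parallel loop present(kcoef[0:n]) \
        copyin(col_amount[0:n]) copyout(tau[0:n])
    for (size_t i = 0; i < n; ++i) {
        tau[i] = kcoef[i] * col_amount[i];
    }
}
```

The intended call order is `park_static_data` once, many calls to `optical_depth`, then `release_static_data`. With an OpenACC compiler, forgetting the `enter data` step makes the `present` clause fail at run time instead of silently adding a transfer.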
Scale
Dimensions (approximate): 40,000+ columns, 100 layers, 250 pseudo-spectral points, 10 other.
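The scale above makes the GDDR size drawback concrete. One double-precision array over columns × layers × pseudo-spectral points needs 40,000 × 100 × 250 × 8 bytes = 8 × 10^9 bytes (about 8 GB), so several such arrays may not fit in GPU memory at once. The sketch does that arithmetic and, as one possible response that the slides do not prescribe, computes how many columns could be processed per block under a hypothetical memory budget; the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for one array spanning all columns, layers and g-points. */
uint64_t array_bytes(uint64_t ncol, uint64_t nlay, uint64_t ngpt,
                     uint64_t elem_size) {
    return ncol * nlay * ngpt * elem_size;
}

/* Hypothetical helper: how many columns fit in a memory budget when
   `narrays` arrays of this shape must be resident at the same time. */
uint64_t max_columns_in_budget(uint64_t budget_bytes, uint64_t narrays,
                               uint64_t nlay, uint64_t ngpt,
                               uint64_t elem_size) {
    uint64_t per_column = narrays * array_bytes(1, nlay, ngpt, elem_size);
    assert(per_column > 0);
    return budget_bytes / per_column;
}
```

With a 4 × 10^9-byte budget and four resident double-precision arrays, 5,000 columns fit per block, so 40,000 columns would take eight blocks.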
Memory Access Patterns
Memory access is mostly sequential. Local interpolations interfere with perfectly sequential access, but these disruptions are only local.
Array index ordering changes between components:
Gas optics: (pseudo-spectral, layer, column)
Flux solver: (column, layer, pseudo-spectral)
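The two orderings can be made concrete with Fortran-style column-major offsets, where the first index varies fastest. This sketch converts an array from the gas-optics layout (gpt, layer, column) to the solver layout (column, layer, gpt); the macro and function names are hypothetical. In any such reordering one side is accessed with a stride, so the copy is a cost paid for letting the two components use different layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major (Fortran-style) offsets: the first index varies fastest.
   Gas-optics layout a(gpt, lay, col) and solver layout a(col, lay, gpt). */
#define IDX_GAS(igpt, ilay, icol, ngpt, nlay) \
    ((igpt) + (ngpt) * ((ilay) + (nlay) * (icol)))
#define IDX_SOLVER(icol, ilay, igpt, ncol, nlay) \
    ((icol) + (ncol) * ((ilay) + (nlay) * (igpt)))

/* Hypothetical reorder from gas-optics to solver layout. The innermost
   loop runs over columns, so writes to dst are contiguous and reads from
   src are strided by ngpt * nlay. */
void gas_to_solver(const double *src, double *dst,
                   size_t ncol, size_t nlay, size_t ngpt) {
    assert(src != dst); /* the reorder is not done in place */
    #pragma acc parallel loop collapse(3) \
        copyin(src[0:ncol*nlay*ngpt]) copyout(dst[0:ncol*nlay*ngpt])
    for (size_t igpt = 0; igpt < ngpt; ++igpt)
        for (size_t ilay = 0; ilay < nlay; ++ilay)
            for (size_t icol = 0; icol < ncol; ++icol)
                dst[IDX_SOLVER(icol, ilay, igpt, ncol, nlay)] =
                    src[IDX_GAS(igpt, ilay, icol, ngpt, nlay)];
}
```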
Lessons learned
Overview
Compilers struggle with newer Fortran, OpenACC, and libraries: the Fortran 2003 standard and the NetCDF library for I/O.
With OpenACC we tested PGI and Cray.
Without OpenACC we tested Intel, PGI, GNU, Cray, and NAG.
Success: Cray and OpenACC
We got gas optics to work to the extent that it compiled and computed the correct answers to 15-digit precision on the GPU.
This works with !$ACC PARALLEL; with !$ACC KERNELS it crashes.
Error messages could be better.
Issues:
Member variables and OpenACC are not workable.
Function calls within parallel regions are not supported by the compiler.
Optional arguments and OpenACC are not workable.
Defining dynamic dimensions of variables in member functions.
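The issues above concern Fortran derived types and type-bound procedures. A C analogue of common workarounds is sketched here, assuming a hypothetical `optical_props` struct: member arrays are copied into local pointers before the parallel region, the helper called inside the region is marked as a device routine with `#pragma acc routine seq` (the Fortran form is `!$acc routine seq`), and an optional argument is replaced by an explicit parameter. Whether a given compiler version accepts the routine directive is an assumption to verify; the slides report that function calls in parallel regions were not supported by the compiler tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of a derived type with array members. */
typedef struct {
    size_t n;
    double *tau; /* optical depth */
    double *ssa; /* single-scattering albedo */
} optical_props;

/* Marked as a sequential device routine so it may be called inside an
   OpenACC parallel region. Returns the absorption optical depth. */
#pragma acc routine seq
static double absorption_part(double tau, double ssa) {
    return tau * (1.0 - ssa);
}

/* Workarounds: members are copied to local pointers before the region,
   and `scale` is an explicit argument instead of an optional one
   (callers pass 1.0 where the optional argument would be absent). */
void absorption_tau(const optical_props *props, double scale, double *out) {
    assert(props && out);
    const size_t n = props->n;
    const double *tau = props->tau;
    const double *ssa = props->ssa;
    #pragma acc parallel loop copyin(tau[0:n], ssa[0:n]) copyout(out[0:n])
    for (size_t i = 0; i < n; ++i)
        out[i] = scale * absorption_part(tau[i], ssa[i]);
}
```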
PGI and NetCDF
Failure: PGI and NetCDF do not play nicely together.
ERROR: Segmentation fault (pgi/15.3, netcdf/4.3.3.1 on Janus @ rc.colorado.edu)
This prevented us from testing OpenACC with PGI, even though the PGI compiler is one of the prime choices for OpenACC.
Q: What is the standard NetCDF library for Python: netCDF4, scipy.io.netcdf, or Scientific.IO.NetCDF?
Intel
Does not support OpenACC for practical purposes.
A few hiccups with the Fortran 2003 standard, but overall “thumbs up”.
Side note: the compiler is sometimes too lenient in the syntax it accepts.
intel/15.0.2, netcdf/4.3.3.1
GNU
Does not support OpenACC for practical purposes.
A few hiccups with the Fortran 2003 standard, but overall “thumbs up”.
Does not support some Fortran 2003 implicit memory allocations.
Expected to be slower than the other compilers.
gnu/4.9.2, netcdf/4.3.3.1
Extra slides
Parallelism in RRTMGP
Dimensions available for parallelism: columns, layers, pseudo-spectral points (gpts), and other.
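When a gas-optics-style computation is independent across columns, layers and g-points, the three loops can be collapsed into one parallel iteration space. This hypothetical kernel linearly interpolates an absorption coefficient between two reference temperatures, a heavily simplified stand-in for the local interpolations mentioned under memory access patterns (RRTMGP's actual interpolation involves more variables). Names, the temperature-only interpolation and the storage layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified gas-optics kernel.
   k_ref[j * ngpt + igpt]: coefficient at reference temperature j.
   jtemp[p], ftemp[p]: lower reference index and interpolation weight for
   the (layer, column) point p = ilay + nlay * icol; requires jtemp[p] + 1 < ntemp.
   k_out uses the gas-optics layout (gpt, lay, col), gpt fastest. */
void interp_k(const double *k_ref, size_t ntemp,
              const size_t *jtemp, const double *ftemp,
              double *k_out, size_t ncol, size_t nlay, size_t ngpt) {
    assert(ntemp >= 2);
    /* Every (column, layer, g-point) triple is independent. */
    #pragma acc parallel loop collapse(3) \
        copyin(k_ref[0:ntemp*ngpt], jtemp[0:ncol*nlay], ftemp[0:ncol*nlay]) \
        copyout(k_out[0:ncol*nlay*ngpt])
    for (size_t icol = 0; icol < ncol; ++icol)
        for (size_t ilay = 0; ilay < nlay; ++ilay)
            for (size_t igpt = 0; igpt < ngpt; ++igpt) {
                size_t p = ilay + nlay * icol;
                size_t j = jtemp[p];   /* local lookup: breaks pure streaming */
                double f = ftemp[p];
                k_out[igpt + ngpt * p] = (1.0 - f) * k_ref[j * ngpt + igpt]
                                       + f * k_ref[(j + 1) * ngpt + igpt];
            }
}
```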
Strategies for OpenACC Parallelism
[Figure: parallelization strategies for the solver and for gas optics]
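One plausible solver strategy, sketched under the assumption of a simplified downward sweep in which the flux at each level is the flux above times the layer transmittance plus a layer source: columns and g-points are collapsed into parallel work, while the layer loop stays sequential because each level depends on the one above. Array names, the recurrence and the storage order (solver layout, column index fastest) are illustrative assumptions, not the RRTMGP solver itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified downward sweep.
   trans, source: (col, lay, gpt), column index fastest.
   flux_top: (col, gpt). flux_dn: (col, lev, gpt) with nlay + 1 levels. */
void sweep_down(const double *trans, const double *source,
                const double *flux_top, double *flux_dn,
                size_t ncol, size_t nlay, size_t ngpt) {
    const size_t nlev = nlay + 1;
    assert(ncol > 0 && ngpt > 0);
    /* Columns and g-points are independent: parallelize over both. */
    #pragma acc parallel loop collapse(2) \
        copyin(trans[0:ncol*nlay*ngpt], source[0:ncol*nlay*ngpt], \
               flux_top[0:ncol*ngpt]) \
        copyout(flux_dn[0:ncol*nlev*ngpt])
    for (size_t igpt = 0; igpt < ngpt; ++igpt)
        for (size_t icol = 0; icol < ncol; ++icol) {
            flux_dn[icol + ncol * nlev * igpt] = flux_top[icol + ncol * igpt];
            /* Level ilay + 1 depends on level ilay: keep this loop sequential. */
            #pragma acc loop seq
            for (size_t ilay = 0; ilay < nlay; ++ilay) {
                size_t lay = icol + ncol * (ilay + nlay * igpt);
                size_t above = icol + ncol * (ilay + nlev * igpt);
                flux_dn[above + ncol] = flux_dn[above] * trans[lay] + source[lay];
            }
        }
}
```

Keeping the column index fastest means neighbouring parallel iterations touch neighbouring memory addresses, which matches the solver ordering listed under memory access patterns.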
OpenACC – example gas optics
Future Outlook
C++ implementation, hackathon, etc.
http://www.openacc.org/content/openacc-hackathon-tu-dresdenforschungzentrum-julich