The Library Approach to GPU Computations of Initial Value Problems Dave Yuen University of Minnesota, U.S.A. with Larry Hanyk and Radek Matyska Charles.


1 The Library Approach to GPU Computations of Initial Value Problems Dave Yuen University of Minnesota, U.S.A. with Larry Hanyk and Radek Matyska Charles University Prague, Czech Republic

2 NVIDIA GPUs
CUDA: Compute Unified Device Architecture
- the most popular GPGPU parallel programming model today
- from notebooks and personal desktops to high-performance computing
- the GPU, with its many cores, runs a kernel concurrently in many threads
- two-level hardware parallelism on a GPU: SIMD and MIMD
- the programming model reflects the hardware parallelism: blocks and grids
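The block-and-grid model can be illustrated with a sequential emulation. This is only a sketch in plain Python, not CUDA code: the names `launch` and `add_scalar_kernel` are hypothetical, and the two nested loops stand in for work that a GPU would run concurrently.

```python
# Sequential emulation of the CUDA thread hierarchy (illustration only).
# launch() and add_scalar_kernel() are hypothetical names, not CUDA API calls.

def add_scalar_kernel(block_idx, block_dim, thread_idx, a, z):
    """Kernel body: each thread updates at most one array element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(a):                          # guard threads past the array end
        a[i] += z


def launch(kernel, grid_dim, block_dim, *args):
    """Call the kernel once per (block, thread) pair of a 1D grid."""
    for block_idx in range(grid_dim):        # grid level: blocks
        for thread_idx in range(block_dim):  # block level: threads
            kernel(block_idx, block_dim, thread_idx, *args)


a = [float(i) for i in range(10)]
block_dim = 4
grid_dim = (len(a) + block_dim - 1) // block_dim  # ceiling division
launch(add_scalar_kernel, grid_dim, block_dim, a, 0.5)
```

The ceiling division launches enough blocks to cover the whole array, which is why the kernel needs the bounds check for the surplus threads of the last block.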

3 NVIDIA GPUs
Architectures and compute capability
G80 (since 2006): compute capability 1.0, 1.1
- features 1.1: 8 cores/multiprocessor, single-precision real arithmetic
GT200 (since 2008): compute capability 1.2, 1.3
- features 1.3: 8 cores/MP, max. 30 x 8 = 240 cores, first double precision
Fermi (since 2010): compute capability 2.0, 2.1
- features 2.0: 32 cores/MP, max. 16 x 32 = 512 cores, fast double precision, cache
- features 2.1: 48 cores/MP, max. 8 x 48 = 384 cores, slower double precision
Kepler (since 2012): compute capability 3.0
- features 3.0: 192 cores/MP, max. 8 x 192 = 1536 cores, slower double precision
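The maximum core counts on this slide are simply the number of multiprocessors times the cores per multiprocessor. A small sketch that recomputes them from the figures quoted above (the dictionary layout and the function name `max_cores` are illustrative choices, not part of the talk):

```python
# (architecture, compute capability) -> (multiprocessors, cores per multiprocessor)
# The figures are the maxima quoted on the slide.
MAX_CONFIG = {
    ("GT200", "1.3"): (30, 8),
    ("Fermi", "2.0"): (16, 32),
    ("Fermi", "2.1"): (8, 48),
    ("Kepler", "3.0"): (8, 192),
}


def max_cores(multiprocessors, cores_per_mp):
    """Total number of cores on a fully populated chip."""
    return multiprocessors * cores_per_mp


totals = {key: max_cores(*config) for key, config in MAX_CONFIG.items()}
for (arch, cc), total in totals.items():
    print(arch, cc, total)
```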

4 NVIDIA GPUs
Product lines (classes)
- GeForce: for games and PC graphics, most often found in desktops and notebooks
- Quadro: for professional graphics
- Tesla: for high-performance computing (CC 1.3, 2.0)
http://developer.nvidia.com/cuda-gpus

5 NVIDIA GPUs
GeForce class and GPU computing
GeForce for desktops
- fast double precision (throughput lower than in Tesla)
- compute capability 2.0 most appropriate: in DP, better than 2.1, 3.0
- up to 2 x 16 multiprocessors, 32 x 32 = 1024 cores (e.g., GTX 590)
GeForce for notebooks
- most often equipped with compute capability 2.1
- up to 8 multiprocessors, 8 x 48 = 384 cores (e.g., GTX 675M)

6 CUDA Developer Tools
Library approach: Languages vs. Libraries
1. Languages: intrinsic difficulty in learning the low-level CUDA model
- NVIDIA toolkit, PGI compiler suite (Fortran and C), OpenCL
2. Directives: higher-level OpenMP-like model
- PGI and Cray directives, standardized OpenACC directives
3. Libraries:
- fundamental libraries by NVIDIA (CUBLAS, CUFFT,...)
- libraries for linear algebra (MAGMA, CULA Tools,...)
- MATLAB approach (Jacket, Parallel Computing Toolbox,...)
- GPU support in NAG and IMSL
Overview: http://developer.nvidia.com/cuda-tools-ecosystem
Directives and libraries simplify the programmer's effort and bring great benefits without requiring CUDA expertise.

7 CUDA Developer Tools
Fundamental libraries by NVIDIA
CUDA Toolkit
- C/C++ low-level compiler (nvcc) and API library
- debugger, profiler
- numerical libraries:
  - CUBLAS (accelerated BLAS)
  - CUSPARSE (accelerated Sparse BLAS)
  - CUFFT (accelerated FFT in 1D, 2D and 3D)
  - CURAND (accelerated random-number generators)
  - CUDA Math, Performance Primitives, Thrust,...
- current version: 4.2 (incl. Kepler support), coming soon: 5.0

8 CUDA Developer Tools
OpenCL and MATLAB tools
OpenCL (Open Computing Language)
- C-based language
- http://www.khronos.org
Jacket and ArrayFire
- Jacket: GPU platform for MATLAB ("better than the Parallel Computing Toolbox" by MathWorks)
- ArrayFire: GPU library for C, C++, Fortran, Python
- http://www.accelereyes.com

9 CUDA Developer Tools
Linear algebra libraries
MAGMA (Matrix Algebra on GPU and Multicore Architectures)
- hybrid library for dense linear algebra (LAPACK) for multicore CPUs and GPUs
- http://icl.cs.utk.edu/magma
CULA Tools (CUDA Linear Algebra)
- CULA Dense: dense solvers (LU, QR, LSQ, EV, SVD)
- CULA Sparse: sparse solvers (CG, BCG, GMRES, preconditioners)
- http://www.culatools.com
FLAME, SuperLU, Iterative CUDA and more
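To make the CG entry concrete, here is a minimal reference sketch of the unpreconditioned conjugate gradient method in plain Python. It is not CULA Sparse code and uses none of its API; the function names and the dense list-of-lists matrix are assumptions chosen for readability. A GPU library would run the same matrix-vector products and dot products on the device.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # converged: residual norm is small enough
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)   # exact solution is (1/11, 7/11)
```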

10 CUDA Developer Tools
Languages and directives by Portland Group
PGI compiler suite
- PGI Workstation, Server, CUDA Fortran, CUDA-x86, Accelerator
- current version 12.5 (5th minor version in 2012, 10 versions in 2011)
- for Linux, Mac OS X and Microsoft Windows, 32/64 bit
- with OpenMP, MPI, parallel debugger and profiler
- with ACML and IMSL libraries, linkable with Intel MKL
- with Eclipse IDE (Linux) or Microsoft Visual Studio (Windows)
- http://www.pgroup.com

11 CUDA Developer Tools
Languages and directives by Portland Group
PGI compilers
- CUDA Fortran and C extensions
- CUDA-x86: emulation of CUDA Fortran/C codes on multicore CPUs
- NVIDIA compiler and libraries included
Directives
- PGI Accelerator: OpenMP-like prototype directives
- OpenACC directives: a new (2012) standardized specification by NVIDIA, PGI, Cray and CAPS

12 CUDA Developer Tools
OpenACC directives
OpenACC example
CPU serial code
    do i=1,imax
      a(i)=a(i)+z
    enddo
GPU accelerated code
    !$ACC KERNELS
    !$ACC LOOP
    do i=1,imax
      a(i)=a(i)+z
    enddo
    !$ACC END KERNELS
For finer control: gang (~ grid), worker (~ warp), vector (~ thread), sequential run
    !$ACC LOOP GANG WORKER VECTOR SEQ

13 A Library GPU Approach to Initial Value Problems
L. Hanyk, D. Yuen and C. Matyska
for Lecture Notes in Earth Sciences, Springer Verlag
Chapter 1 Basic framework
Chapter 2 Introduction to GPU
Chapter 3 PGI CUDA Fortran
Chapter 4 OpenACC and PGI Accelerator directives
Chapter 5 Libraries for linear algebra and FFT
Chapter 6 Elliptic PDEs. FFT-based solvers. Eigenvalue problems
Chapter 7 Initial value problems for ODEs
Chapter 8 Method of lines
Chapter 9 Initial value problems for diffusion PDEs
Chapter 10 Initial value problems for advection-diffusion PDEs
Chapter 11 Initial value problems for hyperbolic PDEs
Chapter 12 Future perspectives of GPUs in geosciences

14 A Library GPU Approach to Initial Value Problems
Also included:
Introductory examples
- CUDA templates. Mandelbrot set. Lyapunov fractals
Linear algebra and static solvers
- Linear algebra and FFT with CPU libraries and GPU libraries
- Classical and fast Poisson solvers
- Eigenmodes of a 1D string and 2D drum
Initial value solvers
- ODEs: Lorenz attractor by Runge-Kutta methods and BDFs
- Method of lines and implicit ODE solvers
- Heat equation by Crank-Nicolson and ADI schemes
- Stokes problem
- Wave equation
Multi-GPU solutions
- Approaches by OpenMP and MPI
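As an illustration of the ODE example named on this slide, the following is a minimal CPU sketch of the Lorenz system integrated with the classical fourth-order Runge-Kutta method. It is not the authors' code: the parameter values (sigma = 10, rho = 28, beta = 8/3) are the standard textbook choice, and the step size, initial state and function names are assumptions made here.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (standard textbook parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    def shifted(k, h):
        # state advanced by h along the slope k
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = f(state)
    k2 = f(shifted(k1, 0.5 * dt))
    k3 = f(shifted(k2, 0.5 * dt))
    k4 = f(shifted(k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, state, dt, n_steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


# Assumed initial state and step size, chosen only for the demonstration.
trajectory = integrate(lorenz, (1.0, 1.0, 1.0), 0.01, 1000)
```

Because `rk4_step` takes the right-hand side as an argument, the same stepper works for any autonomous ODE system, which is the kind of reusable building block the library approach relies on.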

15 Conclusions
Languages
- high expertise required
- fast code possible
Directives
- low workload on users
- demanding implementation (work in progress)
Libraries
- highly optimized building blocks, most often linear algebra and FFT
- general algorithms like ODE and PDE solvers, based on standard building blocks
- pay off best in the hands of both experts and regular users
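The point about PDE solvers resting on standard building blocks can be sketched with the 1D heat equation u_t = u_xx: one Crank-Nicolson step reduces to a single tridiagonal linear solve. The code below is a plain-Python illustration under assumptions made here (zero Dirichlet boundaries, a sine initial profile, the grid and step sizes, and all function names); in a GPU setting the tridiagonal solve would be delegated to a library routine.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[i] multiplies x[i-1] and upper[i] multiplies
    x[i+1] in row i; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson_step(u, r):
    """Advance the interior values u by one step; r = dt / (2 dx^2).
    Boundary values are fixed at zero."""
    n = len(u)
    rhs = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        rhs.append(u[i] + r * (left - 2.0 * u[i] + right))  # explicit half
    # implicit half: (1 + 2r) on the diagonal, -r on both off-diagonals
    return solve_tridiagonal([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, rhs)


n_interior, dt, n_steps = 49, 0.001, 100
dx = 1.0 / (n_interior + 1)
u = [math.sin(math.pi * (i + 1) * dx) for i in range(n_interior)]
for _ in range(n_steps):
    u = crank_nicolson_step(u, dt / (2.0 * dx * dx))
# For this initial profile the exact solution is exp(-pi^2 t) sin(pi x).
```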

