
1 Large-scale geophysical electromagnetic imaging and modeling on graphical processing units Michael Commer (LBNL) Filipe R. N. C. Maia (LBNL-NERSC) Gregory A. Newman (LBNL)

2 Overview: • Introduction: Geophysical modeling on GPUs • Iterative Krylov solvers on GPU and implementation details • Krylov solver performance tests • Conclusions

3 CSEM data inversion using QMR: EMGeo-GPU has already been run successfully on 16 NVIDIA Tesla C2050 (Fermi) GPUs (3 GB memory, 448 parallel CUDA processor cores each), compared to 16 × 8 Intel quad-core Nehalem CPUs at 2.4 GHz. CSEM imaging experiment of the Troll gas field (North Sea).

4 ERT data inversion using CG: CO₂ plume imaging study

5 SIP data inversion using BiCG: Rifle SIP monitoring study

6 Finite-difference representation of Maxwell and Poisson equations: Maxwell equation → 13-point stencil; Poisson equation → 7-point stencil

7 Iterative Krylov subspace methods: solution of the linear system Ax = b involves constructing the Krylov subspace K_m(A, r_0) = span{r_0, A r_0, …, A^(m−1) r_0} in order to compute the optimal approximation x_m from x_0 + K_m.
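As a concrete reference for the iteration itself (not the EMGeo-GPU solver), the following minimal host-side conjugate-gradient sketch in C, compilable with nvcc or gcc, shows how one SpMV per iteration expands the Krylov subspace; the 1-D Poisson test matrix, the size N and the tolerance are placeholders chosen only for illustration.

/* Minimal unpreconditioned conjugate-gradient sketch (host code).
   Illustrative only: the matrix is a 1-D Poisson stencil, not a
   geophysical modeling operator. */
#include <stdio.h>
#include <math.h>

#define N 64

/* y = A*x for the 1-D Poisson matrix tridiag(-1, 2, -1) */
static void spmv(const double *x, double *y) {
    for (int i = 0; i < N; ++i) {
        double v = 2.0 * x[i];
        if (i > 0)     v -= x[i - 1];
        if (i < N - 1) v -= x[i + 1];
        y[i] = v;
    }
}

static double dot(const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

int main(void) {
    double b[N], x[N], r[N], p[N], Ap[N];
    for (int i = 0; i < N; ++i) { b[i] = 1.0; x[i] = 0.0; r[i] = b[i]; p[i] = r[i]; }
    double rho = dot(r, r);                       /* squared residual norm */
    for (int k = 0; k < 1000 && sqrt(rho) > 1e-10; ++k) {
        spmv(p, Ap);                              /* one SpMV expands the Krylov subspace */
        double alpha = rho / dot(p, Ap);
        for (int i = 0; i < N; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rho_new = dot(r, r);
        double beta = rho_new / rho;
        for (int i = 0; i < N; ++i) p[i] = r[i] + beta * p[i];
        rho = rho_new;
    }
    printf("final residual norm: %g\n", sqrt(rho));
    return 0;
}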

8 Numerical modeling on GPUs. Main challenge: manage memory access in the most efficient way

9 Sparse matrix types arising in electrical and electromagnetic modeling problems Maxwell: Controlled-source EM, Magnetotelluric Poisson: Electrical resistivity tomography, Induced polarization

10 Sparse matrix storage formats, ordered from structured to unstructured: Diagonal (DIA), Ellpack (ELL), Compressed Row (CSR), Hybrid (HYB), Coordinate (COO)

11 ELLPACK Format: • Storage of N non-zeros per matrix row • Zero-padding for rows with < N non-zeros • Ease of implementation
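As a small illustration (not taken from the slides), a 4 × 4 matrix with at most N = 2 non-zeros per row would be stored in ELLPACK as two dense 4 × 2 arrays, one for the values and one for the column indices; the numerical values here are made up for the example.

/* ELLPACK storage of a 4x4 matrix with at most 2 non-zeros per row.
   Padded value entries are 0.0; padded column indices simply repeat a
   valid index, so the padded products contribute nothing. */
#define NROWS 4
#define MAXNZ 2

/* A = [ 10   0   1   0 ]
       [  0  20   0   0 ]
       [  0   2  30   0 ]
       [  0   0   0  40 ] */
static const double ell_val[NROWS][MAXNZ] = {
    { 10.0,  1.0 },
    { 20.0,  0.0 },   /* padded */
    {  2.0, 30.0 },
    { 40.0,  0.0 }    /* padded */
};
static const int ell_col[NROWS][MAXNZ] = {
    { 0, 2 },
    { 1, 1 },         /* padded */
    { 1, 2 },
    { 3, 3 }          /* padded */
};

/* Reference SpMV over the padded arrays: y = A*x */
static void ell_spmv_ref(const double *x, double *y) {
    for (int i = 0; i < NROWS; ++i) {
        double sum = 0.0;
        for (int j = 0; j < MAXNZ; ++j)
            sum += ell_val[i][j] * x[ell_col[i][j]];
        y[i] = sum;
    }
}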

12 ELL SpMV GPU implementation: n – number of rows in the matrix (large), m – maximum number of non-zeros per row (small). (Figure: index matrix, value matrix, input vector x, output vector y.)

13–16 ELL SpMV GPU implementation: one thread per row, row concatenation. (Animation: memory position of matrix element (1,3), handled by GPU thread 1.) Memory access is not coalesced!
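A minimal CUDA sketch of this first variant, assuming row-concatenated (row-major) value and index arrays; the kernel name and signature are illustrative, not the EMGeo-GPU code.

/* One thread per row, row-concatenated (row-major) ELL arrays.
   Element j of row i lives at val[i*max_nz + j], so neighbouring
   threads read addresses max_nz apart: the loads are NOT coalesced. */
__global__ void ell_spmv_row_major(int n, int max_nz,
                                   const double *val, const int *col,
                                   const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = 0.0;
        for (int j = 0; j < max_nz; ++j)
            sum += val[row * max_nz + j] * x[col[row * max_nz + j]];
        y[row] = sum;
    }
}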

17–19 ELL SpMV GPU implementation: many threads per row, row concatenation, with an in-block reduction. Reads are coalesced, but the reduction and the writing of the rhs are slow!

20–22 ELL SpMV GPU implementation: one thread per row, column concatenation. Coalesced reads and no reductions.
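A corresponding sketch of the coalesced variant, again with an illustrative signature: the ELL arrays are concatenated column by column (all first non-zeros of every row, then all second non-zeros, and so on), so consecutive threads read consecutive addresses and no reduction is needed.

/* One thread per row, column-concatenated (column-major) ELL arrays.
   Element j of row i lives at val[j*n + i], so consecutive threads
   read consecutive addresses: coalesced loads, no reduction. */
__global__ void ell_spmv_col_major(int n, int max_nz,
                                   const double *val, const int *col,
                                   const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = 0.0;
        for (int j = 0; j < max_nz; ++j)
            sum += val[j * n + row] * x[col[j * n + row]];
        y[row] = sum;
    }
}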

23 ELL SpMV GPU implementation: results for 13 non-zero elements per row on a Tesla C2050.

24 Minimize memory bandwidth: use fused kernels; use pointer swaps instead of memory copies when possible.
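Two hedged sketches of what these two points can look like in practice (the names are illustrative, not the EMGeo-GPU routines): a fused kernel that performs the two CG vector updates in one pass over memory, and a pointer swap that replaces a device-to-device copy.

/* (a) Fused kernel: x += alpha*p and r -= alpha*Ap are done in a single
       pass over the vectors instead of two separate axpy kernels. */
__global__ void fused_cg_update(int n, double alpha,
                                const double *p, const double *Ap,
                                double *x, double *r)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] += alpha * p[i];
        r[i] -= alpha * Ap[i];
    }
}

/* (b) Pointer swap: instead of
       cudaMemcpy(d_old, d_new, n*sizeof(double), cudaMemcpyDeviceToDevice)
       at the end of an iteration, just exchange the device pointers. */
static void swap_vectors(double **d_old, double **d_new)
{
    double *tmp = *d_old;
    *d_old = *d_new;
    *d_new = tmp;
}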

25 CPU communication

26 Multi-GPU communication: use the same layout for vectors on the CPU and on the GPU; this simplifies the MPI communication routines. The extra complication is the data transfer to the CPU.

27 Multi-GPU communication: GPU communication diagram.

28 Multi-GPU communication: blocking communication.

29 Multi-GPU communication: non-blocking communication.
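A hedged host-side sketch of the non-blocking pattern for a simple 1-D domain decomposition; the kernel, buffer layout and sizes are placeholders rather than the EMGeo-GPU routines. Halo values are exchanged with MPI_Isend/MPI_Irecv while the GPU works on the interior rows, and the boundary rows are finished only after MPI_Waitall.

/* Non-blocking halo exchange overlapped with interior work (sketch).
   Device array layout: [ ghost_lo | NLOC owned values | ghost_hi ].
   Edge ranks can pass MPI_PROC_NULL as a neighbour rank. */
#include <mpi.h>
#include <cuda_runtime.h>

#define NLOC 1024   /* rows owned by this rank (illustrative) */

/* stand-in for the SpMV: out[i] = v[i-1] + v[i] + v[i+1] on rows [first, last) */
__global__ void stencil_rows(const double *v, double *out, int first, int last)
{
    int i = first + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < last) out[i] = v[i - 1] + v[i] + v[i + 1];
}

void halo_exchange_overlap(double *d_v, double *d_out, int rank_lo, int rank_hi)
{
    double h_send[2], h_recv[2];
    MPI_Request reqs[4];

    /* 1. copy the owned boundary values GPU -> CPU */
    cudaMemcpy(&h_send[0], d_v + 1,    sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_send[1], d_v + NLOC, sizeof(double), cudaMemcpyDeviceToHost);

    /* 2. post non-blocking receives and sends to both neighbours */
    MPI_Irecv(&h_recv[0], 1, MPI_DOUBLE, rank_lo, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&h_recv[1], 1, MPI_DOUBLE, rank_hi, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&h_send[0], 1, MPI_DOUBLE, rank_lo, 1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&h_send[1], 1, MPI_DOUBLE, rank_hi, 0, MPI_COMM_WORLD, &reqs[3]);

    /* 3. overlap: interior rows 2 .. NLOC-1 need no remote data */
    stencil_rows<<<(NLOC + 255) / 256, 256>>>(d_v, d_out, 2, NLOC);

    /* 4. finish communication, push the received ghost values to the GPU */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    cudaMemcpy(d_v,             &h_recv[0], sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v + NLOC + 1,  &h_recv[1], sizeof(double), cudaMemcpyHostToDevice);

    /* 5. boundary rows 1 and NLOC use the ghost values */
    stencil_rows<<<1, 32>>>(d_v, d_out, 1, 2);
    stencil_rows<<<1, 32>>>(d_v, d_out, NLOC, NLOC + 1);
    cudaDeviceSynchronize();
}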

30 Iterative Krylov solver performance tests Typically used for EM problems: CG, BiCG, QMR

31 Computing times for 1000 Krylov solver iterations

32 SpMV with "Constant-Coefficient-Matrix": vector Helmholtz equation, angular frequency ω = 2πf

33 SpMV with Constant-Coefficient-Matrix: choose Dirichlet boundary conditions such that the operator is in ℝ^(n×n)


36 Pseudo code for SpMV with “standard” matrix: Ax=b

37 Pseudo code for SpMV with Constant-Coefficient-Matrix: Cx + dx = b, with scaling of the solution vector and scaling of the rhs vector
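A hedged sketch of the idea as we read it from these slides: the off-diagonal stencil coefficients are identical for every row (the constant-coefficient matrix C, e.g. 13 entries for the Maxwell stencil), only a diagonal term d varies, and the grid-dependent scaling of the solution and rhs vectors is applied separately. The kernel below is illustrative and does not reproduce the actual EMGeo-GPU pseudo code.

/* SpMV for y = C*x + d.*x, where the stencil coefficients in c_coef are the
   same for every row and are kept in constant memory (filled once from the
   host with cudaMemcpyToSymbol). */
#define MAX_NZ 13                     /* 13-point Maxwell stencil */

__constant__ double c_coef[MAX_NZ];   /* constant stencil coefficients */

__global__ void ccm_spmv(int n, const int *col,   /* column-major index array */
                         const double *d,         /* variable diagonal term    */
                         const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = d[row] * x[row];                /* variable part */
        for (int j = 0; j < MAX_NZ; ++j)
            sum += c_coef[j] * x[col[j * n + row]];  /* constant part */
        y[row] = sum;
    }
}

With this layout only the integer index array and the diagonal/scaling vectors still grow with the grid, which is consistent with slide 42's observation that the index array is the only significant memory consumer.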

38 QMR solver performance on CPU & GPU using CCM – solution times for 1000 Krylov solver iterations. Example grid size: 190 × 190 × 100

39 QMR solver performance on GPU using CCM – memory throughput

40 Grid intervals → coefficients. Example grid size: 100 × 100 × 100

41 Grid intervals → solution times. Increase in computing time: ≈ 17 %

42 Grid intervals → memory usage. The only significant portion is given by the index array.

43 Conclusions: our GPU implementation of iterative Krylov methods exploits the massive parallelism of modern GPU hardware; efficiency increases with problem size; memory limitations are overcome by the multi-GPU scheme and a novel SpMV method for structured grids.

44 Thanks to the National Energy Research Scientific Computing Center (NERSC) for support provided through the NERSC Petascale Program

