1
Large-scale geophysical electromagnetic imaging and modeling on graphical processing units – Michael Commer (LBNL), Filipe R. N. C. Maia (LBNL-NERSC), Gregory A. Newman (LBNL)
2
Overview: Introduction (geophysical modeling on GPUs); iterative Krylov solvers on GPU and implementation details; Krylov solver performance tests; conclusions.
3
CSEM data inversion using QMR – EMGeo-GPU has already been run successfully on 16 NVIDIA Tesla C2050 (Fermi) GPUs (3 GB memory, 448 parallel CUDA processor cores each), compared to 16 × 8 Intel quad-core Nehalem CPUs at 2.4 GHz; CSEM imaging experiment of the Troll gas field (North Sea).
4
ERT data inversion using CG – CO2 plume imaging study
5
SIP data inversion using BiCG – Rifle SIP monitoring study
6
Finite-difference representation of Maxwell and Poisson equations – Maxwell equation: 13-point stencil; Poisson equation: 7-point stencil.
7
Iterative Krylov subspace methods: Solution of the linear system Ax = b involves constructing the Krylov subspace K_m(A, r_0) = span{r_0, A r_0, A^2 r_0, ..., A^(m-1) r_0} in order to compute the optimal approximation x_m in x_0 + K_m(A, r_0).
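The solver loop itself does not appear in this text dump, so the following is a minimal host-side sketch of one such Krylov method, conjugate gradients for a symmetric positive-definite system; the spmv callback, the caller-provided work arrays and the absence of a preconditioner are simplifications for illustration, not EMGeo-GPU's actual interface. In the GPU version every vector operation in this loop becomes a kernel (or part of a fused kernel), and the SpMV dominates the cost.

#include <math.h>

/* Minimal (unpreconditioned) conjugate-gradient loop for Ax = b.
   spmv(n, in, out) applies the sparse matrix; r, p, Ap are caller-provided
   work arrays of length n.  All names here are illustrative. */
static void cg(int n, void (*spmv)(int n, const double *in, double *out),
               const double *b, double *x,
               double *r, double *p, double *Ap,
               int max_iter, double tol)
{
    spmv(n, x, Ap);                                   /* r = b - A x, p = r */
    double rr = 0.0;
    for (int i = 0; i < n; ++i) { r[i] = b[i] - Ap[i]; p[i] = r[i]; rr += r[i] * r[i]; }

    for (int k = 0; k < max_iter && sqrt(rr) > tol; ++k) {
        spmv(n, p, Ap);                               /* one SpMV per iteration */
        double pAp = 0.0;
        for (int i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        double alpha = rr / pAp;                      /* optimal step along p */
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = 0.0;
        for (int i = 0; i < n; ++i) rr_new += r[i] * r[i];
        for (int i = 0; i < n; ++i) p[i] = r[i] + (rr_new / rr) * p[i];  /* new direction */
        rr = rr_new;
    }
}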
8
Numerical modeling on GPUs – main challenge: manage memory access in the most efficient way.
9
Sparse matrix types arising in electrical and electromagnetic modeling problems – Maxwell: controlled-source EM, magnetotelluric; Poisson: electrical resistivity tomography, induced polarization.
10
Sparse matrix storage formats (ranging from structured to unstructured): Diagonal (DIA), Ellpack (ELL), Compressed Row (CSR), Hybrid (HYB), Coordinate (COO).
11
ELLPACK Format: storage of N non-zeros per matrix row; zero-padding for rows with fewer than N non-zeros; ease of implementation.
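As a concrete illustration of this layout (field names are assumptions for the sketch, not EMGeo-GPU's actual data structures), an ELL matrix can be held as two dense n × max_nnz arrays, so for the 13-point Maxwell stencil roughly 13n values and 13n column indices are stored no matter how many rows are shorter:

/* ELL storage: every row keeps exactly max_nnz entries; shorter rows are
   padded with a value of 0 and a column index of -1. */
struct EllMatrix {
    int     n;        /* number of matrix rows                    */
    int     max_nnz;  /* entries stored per row (incl. padding)   */
    int    *cols;     /* n * max_nnz column indices, -1 = padding */
    double *vals;     /* n * max_nnz values, 0 = padding          */
};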
12
ELL SpMV GPU implementation – n: number of rows in the matrix (large); m: max number of non-zeros per row (small). [Diagram: index matrix, value matrix, input vector x, result vector y.]
13
ELL SpMV GPU implementation – one thread per row, row concatenation. [Diagram: memory position of matrix element (1,3) and the GPU thread (thread 1) that reads it.] Memory access is not coalesced!
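A minimal CUDA sketch of this first variant, assuming the ELL arrays above are laid out row by row and -1 marks padding: each thread walks its own row, so neighbouring threads read addresses max_nnz elements apart, which is why the loads are not coalesced.

__global__ void ell_spmv_row_major(int n, int max_nnz,
                                   const int *cols, const double *vals,
                                   const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;   /* one thread per row */
    if (row >= n) return;

    double sum = 0.0;
    for (int j = 0; j < max_nnz; ++j) {
        int c = cols[row * max_nnz + j];   /* stride of max_nnz between threads */
        if (c >= 0)                        /* skip zero padding                 */
            sum += vals[row * max_nnz + j] * x[c];
    }
    y[row] = sum;
}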
17
ELL SpMV GPU implementation – many threads per row, row concatenation: coalesced reads, in-block reduction. [Diagram: memory position of matrix element (1,3) and the GPU thread (thread 1) that reads it.] The reduction and the writing of the rhs are slow!
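A hedged sketch of the many-threads-per-row idea; for brevity it assigns one warp per row and uses a warp-shuffle reduction instead of the shared-memory in-block reduction of the slide, but the trade-off is the same: consecutive lanes read consecutive addresses of the row-concatenated arrays (coalesced), at the price of a reduction before the single write of the result.

/* Launch with blockDim.x a multiple of 32 and 32*n threads in total. */
__global__ void ell_spmv_warp_per_row(int n, int max_nnz,
                                      const int *cols, const double *vals,
                                      const double *x, double *y)
{
    int row  = (blockIdx.x * blockDim.x + threadIdx.x) / 32;  /* one warp per row */
    int lane = threadIdx.x & 31;
    if (row >= n) return;

    double sum = 0.0;
    for (int j = lane; j < max_nnz; j += 32) {   /* consecutive lanes ->       */
        int c = cols[row * max_nnz + j];         /* consecutive addresses      */
        if (c >= 0)
            sum += vals[row * max_nnz + j] * x[c];
    }
    for (int off = 16; off > 0; off >>= 1)       /* in-warp reduction: this    */
        sum += __shfl_down_sync(0xffffffffu, sum, off);
    if (lane == 0) y[row] = sum;                 /* single write of the result */
}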
20
ELL SpMV GPU implementation – one thread per row, column concatenation: coalesced reads and no reductions. [Diagram: memory position of matrix element (1,3) and the GPU threads that read it, one of them from another block.]
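A minimal CUDA sketch of this final variant, again with assumed array names: the ELL arrays are now stored column by column (all first entries of every row, then all second entries, and so on), so one thread per row reads vals[j*n + row] and neighbouring threads touch neighbouring addresses on every iteration – coalesced loads, and no reduction is needed.

__global__ void ell_spmv_col_major(int n, int max_nnz,
                                   const int *cols, const double *vals,
                                   const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;   /* one thread per row */
    if (row >= n) return;

    double sum = 0.0;
    for (int j = 0; j < max_nnz; ++j) {
        int c = cols[j * n + row];         /* consecutive threads ->            */
        if (c >= 0)                        /* consecutive addresses (coalesced) */
            sum += vals[j * n + row] * x[c];
    }
    y[row] = sum;                          /* one write per thread, no reduction */
}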
23
ELL SpMV GPU implementation – performance for 13 non-zero elements per row on a Tesla C2050.
24
Minimize Memory Bandwidth – use fused kernels; use pointer swaps instead of memory copies when possible.
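Two small illustrations of these measures (all names are made up for the sketch): a fused kernel that applies the CG-style updates of x and r in a single pass over memory instead of two separate axpy launches, and a pointer swap that advances a vector between iterations without a cudaMemcpy.

__global__ void fused_update(int n, double alpha,
                             const double *p, const double *Ap,
                             double *x, double *r)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] += alpha * p[i];    /* x <- x + alpha p                       */
        r[i] -= alpha * Ap[i];   /* r <- r - alpha A p, same kernel launch */
    }
}

static inline void advance(double **p_old, double **p_new)
{
    double *tmp = *p_old;        /* O(1) pointer exchange instead of */
    *p_old = *p_new;             /* copying the whole vector between */
    *p_new = tmp;                /* Krylov iterations                */
}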
25
CPU communication
26
Multi GPU communication – use the same layout for vectors on the CPU and GPU; this simplifies the MPI communication routines, at the cost of an extra complication in the data transfer to the CPU.
27
Multi GPU communication GPU communication diagram.
28
Multi GPU communication Blocking communication
29
Multi GPU communication Non-blocking communication
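A hedged sketch of what the non-blocking scheme can look like for a 1-D domain decomposition, using only standard MPI and CUDA runtime calls; buffer names, message tags and the packing of boundary values into d_bdry/h_send are assumptions. The point is that the SpMV on the interior rows can run while the halo messages are in flight.

#include <mpi.h>
#include <cuda_runtime.h>

void halo_exchange_overlap(int n_halo, int left, int right,
                           const double *d_bdry,  /* packed boundary values (GPU)  */
                           double *d_halo,        /* received halo goes here (GPU) */
                           double *h_send, double *h_recv,
                           cudaStream_t stream)
{
    MPI_Request req[4];
    /* 1. post the receives for both neighbours */
    MPI_Irecv(h_recv,          n_halo, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(h_recv + n_halo, n_halo, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);

    /* 2. stage the boundary slices on the host and start the sends */
    cudaMemcpyAsync(h_send, d_bdry, 2 * n_halo * sizeof(double),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    MPI_Isend(h_send,          n_halo, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(h_send + n_halo, n_halo, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* 3. ...launch the SpMV on interior rows here, in another stream... */

    /* 4. wait for the halo, upload it and finish the boundary rows */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    cudaMemcpyAsync(d_halo, h_recv, 2 * n_halo * sizeof(double),
                    cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);
}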
30
Iterative Krylov solver performance tests – typically used for EM problems: CG, BiCG, QMR.
31
Computing times for 1000 Krylov solver iterations
32
SpMV with “Constant-Coefficient-Matrix” – vector Helmholtz equation, ω = 2πf.
33
SpMV with Constant-Coefficient-Matrix – choose Dirichlet boundary conditions such that the operator is in ℝ^(n×n).
36
Pseudo code for SpMV with “standard” matrix: Ax=b
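The pseudo code itself is not reproduced in this text dump; a minimal serial reference for the standard ELL product y = Ax (the GPU kernels above parallelise exactly this loop over rows; array names as before) might read:

static void ell_spmv_reference(int n, int max_nnz,
                               const int *cols, const double *vals,
                               const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {            /* one output value per matrix row */
        double sum = 0.0;
        for (int j = 0; j < max_nnz; ++j) {  /* the stored entries of row i     */
            int c = cols[i * max_nnz + j];
            if (c >= 0)                      /* -1 marks padding                */
                sum += vals[i * max_nnz + j] * x[c];
        }
        y[i] = sum;
    }
}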
37
Pseudo code for SpMV with Constant-Coefficient-Matrix: Cx+dx=b; scaling of solution vector, scaling of rhs vector.
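The corresponding pseudo code is also missing from the dump; the following CUDA sketch conveys the idea under the assumption that on a structured grid the neighbours of row i sit at fixed index offsets and that, after the scaling of the solution and rhs vectors, the off-diagonal stencil coefficients are the same for every row, so only a diagonal d and a handful of constant coefficients remain to be stored. Names and the crude boundary guard are illustrative, not the actual EMGeo-GPU implementation.

/* y = C x + d.*x with a constant-coefficient stencil C and diagonal d. */
__global__ void ccm_spmv(int n, int n_stencil,
                         const int *offset,    /* n_stencil fixed index offsets   */
                         const double *c,      /* n_stencil constant coefficients */
                         const double *d,      /* diagonal part, length n         */
                         const double *x,      /* scaled solution vector          */
                         double *y)            /* scaled rhs vector               */
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    double sum = d[i] * x[i];                  /* spatially varying diagonal     */
    for (int j = 0; j < n_stencil; ++j) {
        int k = i + offset[j];                 /* structured grid: fixed offsets */
        if (k >= 0 && k < n)                   /* boundary guard (sketch only)   */
            sum += c[j] * x[k];
    }
    y[i] = sum;
}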
38
QMR solver performance on CPU & GPU using CCM – solution times for 1000 Krylov solver iterations; example grid size: 190 × 190 × 100.
39
QMR solver performance on GPU using CCM – memory throughput
40
Grid intervals – coefficients; example grid size: 100 × 100 × 100.
41
Grid intervals – solution times; increase in computing time: 17 %.
42
Grid intervals – memory usage; the only significant portion is given by the index array.
43
Conclusions: Our GPU implementation of iterative Krylov methods exploits the massive parallelism of modern GPU hardware; efficiency increases with problem size; memory limitations are overcome by a multi-GPU scheme and a novel SpMV method for structured grids.
44
Thanks to the National Energy Research Scientific Computing Center (NERSC) for support provided through the NERSC Petascale Program.