Sparse algebraic computing with GPUs and its application to electron tomography. Non-linear iterative optimization method for locating particles using HPC techniques.

1 Sparse algebraic computing with GPUs and its application to electron tomography
Non-linear iterative optimization method for locating particles using HPC techniques
G. Ortega (1), J. Lobera (2), I. García (3), M. P. Arroyo (2) and E. M. Garzón (1)
(1) Dpt. Computer Architecture and Electronics, University of Almería
(2) Aragón Institute of Engineering Research (I3A), University of Zaragoza
(3) Dpt. Computer Architecture, University of Málaga
Heteropar 2014

2 Outline
Introduction
Non-Linear ODT model for locating particles (NLODT-P)
Methodology to develop models
GPU computing (Forward procedure)
Validation of NLODT-P
Evaluation of NLODT-P
Conclusions

3 Optical Diffraction Tomography (ODT)
ODT characteristics:
ODT has recently been introduced in several fields.
It offers high accuracy with non-damaging radiation, and its imaging capability recovers information from the object.

4 Optical Diffraction Tomography (ODT)
Diagram labels: Laser, 2D holograms, ODT techniques.

5 Optical Diffraction Tomography (ODT)
Diagram labels: Laser, 2D holograms, ODT techniques.
INPUTS: 2D holograms and a priori information (diameter of the particles, refractive index of particles, fluid refractive index).
OUTPUT: set of seeding particle locations.

6 Optical Diffraction Tomography (ODT)
Two variants: Linear Optical Diffraction Tomography (LODT) and Non-Linear Optical Diffraction Tomography (NLODT).
NLODT overcomes the multiple-scattering problems of LODT and yields high-fidelity images.
NLODT:
Preliminary 2D model (proposed by J. Lobera and J.M. Coupland).
High computational cost: it requires solving large, sparse linear systems of equations over complex numbers in double precision.
Goal: implementation and validation of a 3D NLODT model for the location of particles.

7 Non-Linear ODT model for locating particles (NLODT-P)
Helmholtz equation: [equation image not preserved in the transcript]
E_r(r): illuminating field.
E_s(r): scattered field.
n(r): refractive index of the object to reconstruct.
f(r): scattering potential.
Reconstruction problem -> optimization problem
Optimization method: conjugate gradient (CG)
Result: estimation of n(r)
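Since the equation itself did not survive in the transcript, the following is only a common textbook form of the scalar Helmholtz equation, written with the slide's symbols. The free-space wavenumber k_0 and the background (fluid) refractive index n_0 are assumptions introduced here, and the sign and normalization of f(r) vary between references, so this may differ from what the slide showed.

```latex
\begin{align}
\nabla^2 E(\mathbf{r}) + k_0^2\, n^2(\mathbf{r})\, E(\mathbf{r}) &= 0,
  \qquad E(\mathbf{r}) = E_r(\mathbf{r}) + E_s(\mathbf{r}), \\
\nabla^2 E_s(\mathbf{r}) + k_0^2 n_0^2\, E_s(\mathbf{r})
  &= -f(\mathbf{r})\,\bigl[E_r(\mathbf{r}) + E_s(\mathbf{r})\bigr],
  \qquad f(\mathbf{r}) = k_0^2\bigl[n^2(\mathbf{r}) - n_0^2\bigr].
\end{align}
```

The second line follows from the first by subtracting the equation satisfied by the illuminating field in the homogeneous background. Because E_s appears on both sides, the problem is non-linear in n(r), which is consistent with treating reconstruction as an optimization problem.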

8 Non-Linear ODT model for locating particles (NLODT-P)
Compute PL(1)
It = 2
while It < iterMax and Value < threshold do
    Update n(r)
    for i = 1, 2, ..., Nh do
        E_s^i(r) = Forward(E_r^i(r), n(r))
        E_{r,n}^i(r) = E_s^i(r) + E_r^i(r)
        E_c^i(r) = Filter(E_s^i(r), k_0^i, NA)
        E_{m,n}^i(r) = Forward((E_c^i(r) − E_m^i(r))*, n(r))
        E_{m,n}^i(r) = E_{m,n}^i(r) + (E_c^i(r) − E_m^i(r))*
        g(r)* = g(r)* + (E_{m,n}^i(r), E_{r,n}^i(r))
    end for
    gMF(r) = Matched Filtering(g(r), sample)
    [PL(It), Value] = max(abs(gMF(r)))
    It = It + 1
end
Callouts on the slide:
Initial iteration: location of the 1st particle.
Iterative process: update the refractive index field; compute the updated gradient, g(r); locate the next particle.

9 Non-Linear ODT model for locating particles (NLODT-P)
The algorithm of slide 8 is shown again, split into the initial iteration and the iterative process.
Callout: solving the Helmholtz equation takes 95% of the runtime.

10 Methodology to develop models (MATLAB + GPU computing)
MATLAB: multithreaded; high-level programming language; visualization tools.
Libraries: GPUmat, Jacket, MEX-files.
GPU computing: accelerate the procedures with high computational cost, namely Forward (Helmholtz equation).

11 GPU computing (Forward procedure)
Green's functions
Spatial discretization (based on FEM)
Large linear system of equations: M is sparse, symmetric and has a regular pattern
Linear elliptic partial differential equation (PDE)
Biconjugate Gradient Method (BCG)

12 GPU computing (Forward procedure)
Biconjugate Gradient Method (BCG)
Operations: dots, saxpy, SpMV (with the Regular Format)
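To make the three building blocks concrete, here is a minimal pure-Python sketch of a biconjugate gradient iteration composed only of dot products, axpy updates and one SpMV per iteration. It is not the authors' CUDA code. It assumes a 7-diagonal matrix stored as one main-diagonal array plus three scalars `a`, `b`, `c` (one hypothetical reading of the regular pattern), and it uses the well-known simplification of BiCG for complex symmetric matrices in which the shadow residual is the conjugate of the residual (often called COCG), so all dot products are unconjugated. All function names are illustrative.

```python
from math import sqrt

def spmv_regular(diag, a, b, c, x, shape):
    """y = M x for a 7-diagonal matrix on an nx*ny*nz grid (matrix-free)."""
    nx, ny, nz = shape
    y = [0j] * len(x)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                idx = (i * ny + j) * nz + k
                acc = diag[idx] * x[idx]
                # Lateral diagonals: one shared value per grid direction.
                if i > 0:
                    acc += a * x[idx - ny * nz]
                if i < nx - 1:
                    acc += a * x[idx + ny * nz]
                if j > 0:
                    acc += b * x[idx - nz]
                if j < ny - 1:
                    acc += b * x[idx + nz]
                if k > 0:
                    acc += c * x[idx - 1]
                if k < nz - 1:
                    acc += c * x[idx + 1]
                y[idx] = acc
    return y

def dot_u(u, v):
    """Unconjugated dot product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def norm2(v):
    return sqrt(sum(abs(vi) ** 2 for vi in v))

def bicg_complex_symmetric(diag, a, b, c, rhs, shape, tol=1e-10, max_iter=500):
    """Solve M x = rhs; returns (x, iterations used)."""
    x = [0j] * len(rhs)
    r = list(rhs)            # residual for the zero initial guess
    p = list(r)
    rho = dot_u(r, r)
    stop = tol * norm2(rhs)
    for it in range(max_iter):
        if norm2(r) <= stop:
            return x, it
        q = spmv_regular(diag, a, b, c, p, shape)   # the only SpMV
        alpha = rho / dot_u(p, q)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, q, r)
        rho_new = dot_u(r, r)
        p = axpy(rho_new / rho, p, r)
        rho = rho_new
    return x, max_iter
```

A small usage example would build a 3x3x3 grid with `diag = [6 + 1j] * 27` and `a = b = c = -1`, call `bicg_complex_symmetric`, and check that `spmv_regular` applied to the result reproduces the right-hand side. The sketch has no safeguard against breakdown (a vanishing `rho` or `dot_u(p, q)`), which a production solver would need.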

13 GPU computing (Forward procedure)
Regularities of M:
1. Complex symmetric matrix
2. At most 7 nonzeros per row
3. Nonzeros are located on 7 diagonals
4. Same values for lateral diagonals (a, b, c)
Memory requirements (GB) for storing M:
VolTP     CRS      ELLR-T   Reg Format
160^3     0.55     0.44     0.06
640^3     35.14    28.33    3.91
1600^3    549.22   442.57   61.04
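The Reg Format column can be checked with a short calculation. The sketch below assumes, as one reading that reproduces the tabulated values, that the regular format stores a single double-precision complex value (16 bytes) per matrix row, that the three shared lateral values are negligible, and that "GB" means 2^30 bytes. None of these assumptions is stated on the slide.

```python
BYTES_PER_COMPLEX_DOUBLE = 16   # two 8-byte doubles (real, imaginary)
GIB = 2 ** 30                   # assumed meaning of "GB" in the table

def reg_format_gib(side):
    """Memory (GiB) to store one complex double per row of a side^3 volume."""
    rows = side ** 3            # one matrix row per voxel
    return rows * BYTES_PER_COMPLEX_DOUBLE / GIB

for side in (160, 640, 1600):
    print(f"{side}^3: {reg_format_gib(side):.2f} GB")
# prints:
# 160^3: 0.06 GB
# 640^3: 3.91 GB
# 1600^3: 61.04 GB
```

Under these assumptions the cost per row drops from seven stored values plus indices in the general-purpose formats to a single value, which accounts for the roughly ninefold gap between the CRS and Reg Format columns.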

14 GPU computing (Forward procedure)
CUDA interface.
CUBLAS library for saxpy and dot operations.
Optimization techniques:
Reads of the sparse matrix and of the data involved in vector operations are coalesced global memory accesses, which maximizes the bandwidth of global memory.
Shared memory and registers are used to store intermediate data.

15 Validation of NLODT-P
Volume to reconstruct: 4 particles of 2 µm; λ = 0.633 µm; NA = 0.55; Vol = 160^3; Nh = 1
A priori information: diameter of the particles, refractive index of particles, fluid refractive index

16 Validation of NLODT-P
Figure panels (1st, 2nd, 3rd and 4th iteration): evolution of the gradient of the cost function, g(r), and evolution of the distribution of the refractive index, n(r).

17 Evaluation of NLODT-P
GPU architecture (CUDA programming interface): Tesla M2090
Peak performance, double precision (GFlops)    665
Peak performance, single precision (GFlops)    1331
Device memory (GB)                             6
Clock rate (GHz)                               1.3
Memory bandwidth (GB/s)                        177
Multiprocessors                                16
CUDA cores                                     512
Compute capability                             2
Architecture year                              2011
DRAM type                                      GDDR5
Multicore: Intel Xeon E5620 (8 cores, 2.4 GHz, 48 GB RAM) under Linux.

18 Evaluation of NLODT-P
4 particles of 0.5 µm; λ = 0.633 µm; NA = 0.55; Vol = 200^3 to 280^3; Nh = 3; iterMax = 4

19 Conclusions
A physical model of non-linear ODT for application in velocimetry techniques (NLODT-P) has been implemented and evaluated on 3D prototypes of interest.
The MATLAB framework gives productivity in code generation.
GPU computing is integrated with MATLAB through MEX-files.
The work illustrates a methodology for developing scientific applications.
GPU memory is a limiting factor.
Future work: distributed implementation.

20 Conclusions
Distributed version of Forward: hybrid implementation (BCG-3DH)
Multi-GPU implementation + multicore CPU implementation
CUDA, Message Passing Interface (MPI)
Solve larger problems
Runtime is reduced

21 Conclusions
Figure: runtime (s) of 1000 iterations of BCG-3DH using 8 GPUs + 8 CPUs versus 8 GPUs, for each Vol.

22 Back cover

