GPU Acceleration in Registration Danny Ruijters 26 April 2007.


Presentation on theme: "GPU Acceleration in Registration Danny Ruijters 26 April 2007."— Presentation transcript:

1 GPU Acceleration in Registration Danny Ruijters 26 April 2007

2 GPU Acceleration in Registration, Danny Ruijters2 Outline The GPU Rigid 3D-3D Registration Elastic Registration Conclusions

3 GPU Acceleration in Registration, Danny Ruijters3 The GPU

4 GPU Acceleration in Registration, Danny Ruijters4 The graphics card Rasterization of primitives Texture mapping Colour interpolation

5 GPU Acceleration in Registration, Danny Ruijters5 The GPU Graphics Processing Unit Programmable processor in the graphics rendering pipeline Parallel execution (SIMD-like)

6 GPU Acceleration in Registration, Danny Ruijters6 on-chip cache memory video memory system memory rasterization CPU vertex shading (T&L) triangle setup fragment shading and raster operations textures frame buffer geometry commands pre-TnL cache post-TnL cache texture cache Graphics rendering pipeline

7 GPU Acceleration in Registration, Danny Ruijters7 Bottlenecks on-chip cache memory video memory system memory rasterization CPU vertex shading (T&L) triangle setup fragment shading and raster operations textures frame buffer geometry commands pre-TnL cache post-TnL cache texture cache transform limited fragment shader limited CPU limited texture limited frame buffer limited setup limited raster limited transfer limited

8 GPU Acceleration in Registration, Danny Ruijters8 128 processing units Local cache Shared memory

9 GPU Acceleration in Registration, Danny Ruijters9 Performance

10 GPU Acceleration in Registration, Danny Ruijters10 Performance
Parallelism & pipelining (up to 16 parallel pipelines)
Vector processor
Moore’s Law: CPU: 2* performance per 18 months; GPU: 2* performance per 6 months

                        GeForce 7900 GTX         GeForce 8800 GTX
Code name               G71                      G80
Release date            3 / 2006                 11 / 2006
Transistors             278 M (90 nm)            681 M (90 nm)
Clock speed             650 MHz                  1350 MHz
Processing units        24+8 (pixel + vertex)    128 (unified)
Peak pixel fill rate    10.4 Gigapixels/s        36.8 Gigapixels/s
Peak memory bandwidth   51.2 GB/s (256 bit)      86.4 GB/s (384 bit)
Memory                  512 MB                   768 MB
Peak performance        250 Gigaflops            520 Gigaflops

11 GPU Acceleration in Registration, Danny Ruijters11 Textures & buffers 1D, 2D, 3D textures 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer) 8, 10, 12, 16 bit integers, 16, 32 bit floating point 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel

12 GPU Acceleration in Registration, Danny Ruijters12 Historic overview GPU: RenderMan (1988, pre-history); Intel MMX (SIMD, 1997, pre-history); Register combiners (nVidia, 1999, bronze age); Vendor-specific APIs (2001, iron age); Generic assembly-like language (2002, Middle Ages); Different high-level languages (2003, industrial age); CUDA: general purpose C-like language (2007, modern age)

13 GPU Acceleration in Registration, Danny Ruijters13 Register combiners (1999, bronze age)
// Stage 0
// spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir)
// Inputs A and B: gradient (texture 0) and ViewDir (constant colour 1)
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                  GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                  GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
// Inputs C and D: negated gradient and ViewDir
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV,
                  GL_TEXTURE0_ARB, GL_EXPAND_NEGATE_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV,
                  GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
// A dot B goes to spare0, C dot D goes to spare1, the sum output is discarded
glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV, GL_SPARE1_NV,
                   GL_DISCARD_NV, GL_NONE, GL_NONE,
                   GL_TRUE, GL_TRUE, GL_FALSE);

14 GPU Acceleration in Registration, Danny Ruijters14 GL_ARB_fragment_program (2002)
!!ARBfp1.0
ATTRIB coord = fragment.texcoord[0];
ATTRIB color = fragment.color;
OUTPUT out = result.color;
TEMP texel;
TEMP lookup;
TEX texel, coord, texture[0], 3D;
TEX lookup, texel, texture[1], 1D;
MUL out, lookup, color;
END

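15 GPU Acceleration in Registration, Danny Ruijters15 GLSlang (2003)
uniform vec3 ViewDir;
// Sampler name is an assumption; the slide addressed texture unit 0 directly
uniform sampler3D GradientVolume;
void main (void)
{
    float value;
    vec3 gradient;
    // Expand the stored gradient from [0, 1] to [-1, 1]
    gradient = texture3D(GradientVolume, gl_TexCoord[0].xyz).rgb * 2.0 - 1.0;
    value = 1.0 - abs(dot(gradient, ViewDir));
    value *= 1.3 * dot(gradient, gradient);
    value = clamp(value, 0.0, 1.0);
    gl_FragColor = vec4(value);
}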

16 GPU Acceleration in Registration, Danny Ruijters16 CUDA (2007) Compute Unified Device Architecture General purpose C-like language nVidia only Very recently released

17 GPU Acceleration in Registration, Danny Ruijters17 Rigid 3D-3D Registration

18 GPU Acceleration in Registration, Danny Ruijters18 3DRA – MR registration

19 GPU Acceleration in Registration, Danny Ruijters19 3DRA – XperCT Registration 1 Pre-operative

20 GPU Acceleration in Registration, Danny Ruijters20 3DRA – XperCT Registration 2 Post-operative: verification of the embolization

21 GPU Acceleration in Registration, Danny Ruijters21 3DRA Slice

22 GPU Acceleration in Registration, Danny Ruijters22 Mutual information F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information," IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997

23 GPU Acceleration in Registration, Danny Ruijters23 Joint histogram

24 GPU Acceleration in Registration, Danny Ruijters24 Resampling Joint histogram: increment(g,g)
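
A minimal sketch of the joint histogram and the mutual information derived from it, assuming both images are already resampled onto the same grid and passed as flat lists of intensities. The function names, the binning scheme and the `max_val` parameter are illustrative assumptions, not taken from the slides.

```python
import math

def joint_histogram(ref, flo, bins, max_val):
    """Count co-occurring intensity pairs of two equally sized images (flat lists)."""
    hist = [[0] * bins for _ in range(bins)]
    for a, b in zip(ref, flo):
        # Map each intensity in [0, max_val] to a bin index in [0, bins - 1]
        i = min(int(a * bins / (max_val + 1)), bins - 1)
        j = min(int(b * bins / (max_val + 1)), bins - 1)
        hist[i][j] += 1  # the "increment" step for one voxel pair
    return hist

def mutual_information(hist):
    """Mutual information (in nats) of the joint distribution given by hist."""
    total = sum(sum(row) for row in hist)
    p_ref = [sum(row) / total for row in hist]
    p_flo = [sum(hist[i][j] for i in range(len(hist))) / total
             for j in range(len(hist[0]))]
    mi = 0.0
    for i, row in enumerate(hist):
        for j, count in enumerate(row):
            if count:  # empty bins contribute nothing
                p = count / total
                mi += p * math.log(p / (p_ref[i] * p_flo[j]))
    return mi
```

Perfectly aligned identical images concentrate the counts on the histogram diagonal and give a high value; statistically independent intensities give a value of zero.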

25 GPU Acceleration in Registration, Danny Ruijters25 3DRA – MR, before, after

26 GPU Acceleration in Registration, Danny Ruijters26 3DRA – MR: CPU interpolation

27 GPU Acceleration in Registration, Danny Ruijters27 3DRA – MR: GPU interpolation

28 GPU Acceleration in Registration, Danny Ruijters28 Elastic Registration

29 GPU Acceleration in Registration, Danny Ruijters29 Elastic deformation Parameterized deformation: B-spline deformation:

30 GPU Acceleration in Registration, Danny Ruijters30 Cubic B-spline
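
A minimal 1D sketch of a B-spline deformation, assuming the common form g(x) = x + sum_j c_j * beta3(x/h - j) with the uniform cubic B-spline beta3, control-point coefficients c_j and knot spacing h. The slide's own formula did not survive in this transcript, so the form, the names `beta3` and `deform_1d`, and the parameters are assumptions.

```python
def beta3(x):
    """Uniform cubic B-spline basis function, non-zero on (-2, 2)."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + ax ** 3 / 2.0
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0

def deform_1d(x, coeffs, spacing):
    """Deformed position: x plus a B-spline weighted sum of control-point coefficients."""
    u = x / spacing  # position measured in control-point units
    displacement = sum(c * beta3(u - j) for j, c in enumerate(coeffs))
    return x + displacement
```

Because the shifted basis functions sum to one, setting every coefficient to the same value produces a pure translation by that value away from the borders of the control grid.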

31 GPU Acceleration in Registration, Danny Ruijters31 GPU linear interpolation Hardwired: linear interpolation is much faster than separate lookups

32 GPU Acceleration in Registration, Danny Ruijters32 GPU Cubic Interpolation Compose cubic interpolation from weighted sum of linear interpolations. C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2
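
The idea can be checked in 1D with a small sketch: four cubic B-spline weights are folded into two linearly interpolated fetches. The CPU function `lerp_fetch` stands in for a hardware-filtered texture lookup, positions are in sample-index units (real textures add a half-texel offset), and all names are illustrative assumptions.

```python
import math

def bspline_weights(t):
    """Uniform cubic B-spline weights for the four samples around offset t in [0, 1)."""
    w0 = (1.0 - t) ** 3 / 6.0
    w1 = (3.0 * t ** 3 - 6.0 * t ** 2 + 4.0) / 6.0
    w2 = (-3.0 * t ** 3 + 3.0 * t ** 2 + 3.0 * t + 1.0) / 6.0
    w3 = t ** 3 / 6.0
    return w0, w1, w2, w3

def fetch(samples, k):
    """Nearest-sample read with clamp-to-edge addressing."""
    return samples[min(max(k, 0), len(samples) - 1)]

def lerp_fetch(samples, pos):
    """Emulates one linearly filtered lookup at a continuous position."""
    k = math.floor(pos)
    f = pos - k
    return (1.0 - f) * fetch(samples, k) + f * fetch(samples, k + 1)

def cubic_direct(samples, x):
    """Reference: four direct lookups, one per weight."""
    i = math.floor(x)
    w = bspline_weights(x - i)
    return sum(w[n] * fetch(samples, i - 1 + n) for n in range(4))

def cubic_two_lerps(samples, x):
    """Same result from two linear lookups at shifted positions."""
    i = math.floor(x)
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3  # combined weight of each sample pair
    return (g0 * lerp_fetch(samples, i - 1 + w1 / g0)
            + g1 * lerp_fetch(samples, i + 1 + w3 / g1))
```

Each linear lookup at offset w1/g0 (or w3/g1) returns the correctly weighted mix of its two neighbours, so scaling by g0 and g1 reproduces the four-tap sum.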

33 GPU Acceleration in Registration, Danny Ruijters33 Outline of proof =

34 GPU Acceleration in Registration, Danny Ruijters34 GPU Cubic Interpolation 2D: 4 linear-interpolated lookups, instead of 16 direct lookups 3D: 8 linear-interpolated lookups, instead of 64 direct lookups

35 GPU Acceleration in Registration, Danny Ruijters35 GPU Linear Interpolation Accuracy

36 GPU Acceleration in Registration, Danny Ruijters36 Linear deformation, linear interpolation

37 GPU Acceleration in Registration, Danny Ruijters37 Linear deformation, cubic interpolation

38 GPU Acceleration in Registration, Danny Ruijters38 Cubic deformation, linear interpolation

39 GPU Acceleration in Registration, Danny Ruijters39 Cubic deformation, cubic interpolation

40 GPU Acceleration in Registration, Danny Ruijters40 Optimization Many parameters: huge parameter space. Solution: use derivatives like Jacobian, Hessian. Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt

41 GPU Acceleration in Registration, Danny Ruijters41 GPU Elastic Registration Iteration 1. Generate deformed image on GPU & store to texture 2. Calculate Similarity Measure & First-Order Derivative on GPU - Texture with reference image - Texture with deformed image

42 GPU Acceleration in Registration, Danny Ruijters42 First-Order Derivative of Sim. Measure J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”

43 GPU Acceleration in Registration, Danny Ruijters43 Derivative of the Similarity Measure SSD:
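
The slide's formula is not preserved in this transcript. As a hedged reconstruction, the chain rule applied to a sum-of-squared-differences criterion gives the following, where the symbols f_r (reference image), f_t (image being deformed), c_{j,m} (component m of control-point coefficient j), h (knot spacing) and the 1/N normalization are assumptions:

```latex
\begin{align}
g(\mathbf{x}) &= \mathbf{x} + \sum_{j} \mathbf{c}_j \, \beta^3\!\left(\mathbf{x}/h - j\right) \\
E &= \frac{1}{N} \sum_{i} e_i^2, \qquad
e_i = f_t\big(g(\mathbf{x}_i)\big) - f_r(\mathbf{x}_i) \\
\frac{\partial E}{\partial c_{j,m}} &= \frac{2}{N} \sum_{i} e_i \,
\left.\frac{\partial f_t}{\partial x_m}\right|_{g(\mathbf{x}_i)}
\beta^3\!\left(\mathbf{x}_i/h - j\right)
\end{align}
```

The three factors in the last line are the derivative of the similarity measure, the spatial gradient of the deformed image, and the derivative of the deformation with respect to a control point, which is a fixed B-spline kernel.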

44 GPU Acceleration in Registration, Danny Ruijters44 Derivative of the Deformed Image Sobel operator to calculate gradients:
horizontal: [ -1 0 1 ; -4 0 4 ; -1 0 1 ]
vertical: [ 1 4 1 ; 0 0 0 ; -1 -4 -1 ]
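
A minimal CPU sketch of a Sobel-type gradient filter, assuming the textbook 1-2-1 weights, clamp-to-edge borders and a normalization of 8 so that a unit ramp yields a gradient of 1. Other kernel weights can be passed in together with a matching `norm`; the function names are illustrative.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def filter3x3(image, kernel):
    """Apply a 3x3 kernel to a 2D list-of-lists image with clamped borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out

def gradient(image, kernel_x=SOBEL_X, kernel_y=SOBEL_Y, norm=8.0):
    """Return (gx, gy): smoothed central differences along x and y."""
    gx = [[v / norm for v in row] for row in filter3x3(image, kernel_x)]
    gy = [[v / norm for v in row] for row in filter3x3(image, kernel_y)]
    return gx, gy
```

On the GPU the same nine-tap sum would be evaluated per fragment from the texture holding the deformed image.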

45 GPU Acceleration in Registration, Danny Ruijters45 Derivative of the Control Points Constant B-spline: separable kernel of fixed size

46 GPU Acceleration in Registration, Danny Ruijters46 Original Fluoroscopy Sequence

47 GPU Acceleration in Registration, Danny Ruijters47 2 * 2 Control Points

48 GPU Acceleration in Registration, Danny Ruijters48 8 * 8 Control Points

49 GPU Acceleration in Registration, Danny Ruijters49 Deformation Field

50 GPU Acceleration in Registration, Danny Ruijters50 GPU Elastic Registration 40 images: Quasi-Newton: 16 seconds; Gradient Descent: 63 seconds. 8 * 8 Control Points: residual motion. Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)

51 GPU Acceleration in Registration, Danny Ruijters51 CUDA Libraries

52 GPU Acceleration in Registration, Danny Ruijters52 CUDA Software Stack

53 GPU Acceleration in Registration, Danny Ruijters53 CUDA Libraries CUBLAS CUFFT

54 GPU Acceleration in Registration, Danny Ruijters54 CUBLAS Basic Linear Algebra Subprograms Vector, Matrix, Numerical Math Almost no initialization Function calls

55 GPU Acceleration in Registration, Danny Ruijters55 CUBLAS performance

56 GPU Acceleration in Registration, Danny Ruijters56 CUBLAS performance

57 GPU Acceleration in Registration, Danny Ruijters57 CUFFT performance

58 GPU Acceleration in Registration, Danny Ruijters58 Conclusion & Future work

59 GPU Acceleration in Registration, Danny Ruijters59 Conclusions GPU: powerful parallel processor, but has its limitations Rigid Registration: interpolation on the GPU Elastic Registration: calculation of the Similarity Measure & first order derivative on the GPU

60 GPU Acceleration in Registration, Danny Ruijters60 Future work Multi-resolution deformation fields 2D-3D registration of the Coronary Arteries (not presented)

61 GPU Acceleration in Registration, Danny Ruijters61 Questions?
