Download presentation
Presentation is loading. Please wait.
Published byChester Woods Modified over 9 years ago
1
GPU Acceleration in Registration Danny Ruijters 26 April 2007
2
GPU Acceleration in Registration, Danny Ruijters2 Outline The GPU Rigid 3D-3D Registration Elastic Registration Conclusions
3
GPU Acceleration in Registration, Danny Ruijters3 The GPU
4
GPU Acceleration in Registration, Danny Ruijters4 The graphics card Raserization of primitives Texture mapping Colour interpolation
5
GPU Acceleration in Registration, Danny Ruijters5 The GPU Graphics Processing Unit Programmable processor in the graphics rendering pipeline Parallel execution (SIMD like)
6
GPU Acceleration in Registration, Danny Ruijters6 on-chip cache memory video memory system memory rasterization CPU vertex shading (T&L) triangle setup fragment shading and raster operations textures frame buffer geometry commands pre-TnL cache post-TnL cache texture cache Graphics rendering pipeline
7
GPU Acceleration in Registration, Danny Ruijters7 Bottlenecks on-chip cache memoryvideo memory system memory rasterization CPU vertex shading (T&L) triangle setup fragment shading and raster operations textures frame buffer geometry commands pre-TnL cache post-TnL cache texture cache transform limited fragment shader limited CPU limited texture limited frame buffer limited setup limited raster limited transfer limited
8
GPU Acceleration in Registration, Danny Ruijters8 128 processing units Local cache Shared memory
9
GPU Acceleration in Registration, Danny Ruijters9 Performance
10
GPU Acceleration in Registration, Danny Ruijters10 Performance Parallelism & pipelining (up to 16 parallel pipelines) Vector processor Moore’s Law: CPU: 2* performance per 18 months GPU: 2* performance per 6 months GeForce 7900 GTXGeForce 8800 GTX Code nameG71G80 Release date3 / 200611 / 2006 Transistors278 M (90 nm)681 M (90 nm) Clock speed650 MHz1350 MHz Processing units24+8 (pixel + vertex)128 (unified) Peak pixel fill rate10.4 Gigapixels/s36.8 Gigapixels/s Peak memory bandwidth51.2 GB/s (256 bit)86.4 GB/s (384 bit) Memory512 MB768 MB Peak performance250 Gigaflops520 Gigaflops
11
GPU Acceleration in Registration, Danny Ruijters11 Textures & buffers 1D, 2D, 3D textures 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer) 8, 10, 12, 16 bit integers, 16, 32 bit floating point 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel
12
GPU Acceleration in Registration, Danny Ruijters12 Historic overview GPU RenderMan (1988, pre-history) Intel MMX (SIMD, 1997, pre-history) Register combiners (nVidia, 1999, bronze age) Vender specific APIs (2001, iron age) Generic assembly-like language (2002, middle-ages) Different high-level languages (2003, industrial age) CUDA: general purpose C-like language (2007, modern age)
13
GPU Acceleration in Registration, Danny Ruijters13 Register combiners (1999, bronse age) // Stage 0 // spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir) glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_A_NV,GL_TEXTURE0_A RB,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_B_NV,GL_CONSTANT_ COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_C_NV,GL_TEXTURE0_A RB,GL_EXPAND_NEGATE_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_D_NV,GL_CONSTANT_ COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerOutputNV(GL_COMBINER0_NV,GL_RGB,GL_SPARE0_NV,GL_SPARE1_NV,GL_ DISCARD_NV,GL_NONE,GL_NONE,GL_TRUE,GL_TRUE,GL_FALSE);
14
GPU Acceleration in Registration, Danny Ruijters14 GL_ARB_fragment_program (2002) !!ARBfp1.0 ATTRIB coord = fragment.texcoord[0]; ATTRIB color = fragment.color; OUTPUT out = result.color; TEMP texel; TEMP lookup; TEX texel, coord, texture[0], 3D; TEX lookup, texel, texture[1], 1D; MUL out, lookup, color; END
15
GPU Acceleration in Registration, Danny Ruijters15 GLSlang (2003) uniform vec3 ViewDir; void main (void) { float value; vec3 gradient; gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0; value = 1.0 - abs(dot(gradient, ViewDir)); value *= 1.3 * dot(gradient, gradient); value = clamp(value, 0.0, 1.0); gl_FragColor = vec4(value); }
16
GPU Acceleration in Registration, Danny Ruijters16 CUDA (2007) Compute Unified Device Architecture General purpose C-like language nVidia only Very recently released
17
GPU Acceleration in Registration, Danny Ruijters17 Rigid 3D-3D Registration
18
GPU Acceleration in Registration, Danny Ruijters18 3DRA – MR registration
19
GPU Acceleration in Registration, Danny Ruijters19 3DRA – XperCT Registration 1 Pre-operative
20
GPU Acceleration in Registration, Danny Ruijters20 3DRA – XperCT Registration 2 Post-operative: verification of the embolization
21
GPU Acceleration in Registration, Danny Ruijters21 3DRA Slice
22
GPU Acceleration in Registration, Danny Ruijters22 Mutual information F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information,“ IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997
23
GPU Acceleration in Registration, Danny Ruijters23 Joint histogram
24
GPU Acceleration in Registration, Danny Ruijters24 Resampling Joint histogram: increment(g,g)
25
GPU Acceleration in Registration, Danny Ruijters25 3DRA – MR, before, after
26
GPU Acceleration in Registration, Danny Ruijters26 3DRA – MR: CPU interpolation
27
GPU Acceleration in Registration, Danny Ruijters27 3DRA – MR: GPU interpolation
28
GPU Acceleration in Registration, Danny Ruijters28 Elastic Registration
29
GPU Acceleration in Registration, Danny Ruijters29 Elastic deformation Parameterized deformation: B-spline deformation:
30
GPU Acceleration in Registration, Danny Ruijters30 Cubic B-spline
31
GPU Acceleration in Registration, Danny Ruijters31 GPU linear interpolation Hardwired: linear interpolation is much faster than separate lookups
32
GPU Acceleration in Registration, Danny Ruijters32 GPU Cubic Interpolation Compose cubic interpolation from weighted sum of linear interpolations: = C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2
33
GPU Acceleration in Registration, Danny Ruijters33 Outline of proof =
34
GPU Acceleration in Registration, Danny Ruijters34 GPU Cubic Interpolation 2D: 4 linear-interpolated lookups, instead of 16 direct lookups 3D: 8 linear-interpolated lookups, instead of 64 direct lookups
35
GPU Acceleration in Registration, Danny Ruijters35 GPU Linear Interpolation Accuracy
36
GPU Acceleration in Registration, Danny Ruijters36 Linear deformation, linear interpolation
37
GPU Acceleration in Registration, Danny Ruijters37 Linear deformation, cubic interpolation
38
GPU Acceleration in Registration, Danny Ruijters38 Cubic deformation, linear interpolation
39
GPU Acceleration in Registration, Danny Ruijters39 Cubic deformation, cubic interpolation
40
GPU Acceleration in Registration, Danny Ruijters40 Optimization Many parameters: huge parameter space Solution: use derivatives like Jacobian, Hessian Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt
41
GPU Acceleration in Registration, Danny Ruijters41 GPU Elastic Registration Iteration 1.Generate deformed image on GPU & store to texture 2.Calculate Similarity Measure & First-Order Derivative on GPU –Texture with reference image –Texture with deformed image
42
GPU Acceleration in Registration, Danny Ruijters42 First-Order Derivative of Sim. Measure J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”
43
GPU Acceleration in Registration, Danny Ruijters43 Derivative of the Similarity Measure SSD:
44
GPU Acceleration in Registration, Danny Ruijters44 Derivative of the Deformed Image Sobel operator to calculate gradients: 01 -404 01 141 000 -4
45
GPU Acceleration in Registration, Danny Ruijters45 Derivative of the Control Points Constant B-spline: separatable kernel of fixed size
46
GPU Acceleration in Registration, Danny Ruijters46 Original Fluoroscopy Sequence
47
GPU Acceleration in Registration, Danny Ruijters47 2 * 2 Control Points
48
GPU Acceleration in Registration, Danny Ruijters48 8 * 8 Control Points
49
GPU Acceleration in Registration, Danny Ruijters49 Deformation Field
50
GPU Acceleration in Registration, Danny Ruijters50 GPU Elastic Registration 40 images: Quasi Newton: 16 seconds Gradient Descent: 63 seconds 8 * 8 Control Points: rest motion Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)
51
GPU Acceleration in Registration, Danny Ruijters51 CUDA Libraries
52
GPU Acceleration in Registration, Danny Ruijters52 CUDA Software Stack
53
GPU Acceleration in Registration, Danny Ruijters53 CUDA Libraries CUBLAS CUFFT
54
GPU Acceleration in Registration, Danny Ruijters54 CUBLAS Basic Linear Algebra Subprograms Vector, Matrix, Numerical Math Almost no initialization Function calls
55
GPU Acceleration in Registration, Danny Ruijters55 CUBLAS performance
56
GPU Acceleration in Registration, Danny Ruijters56 CUBLAS performance
57
GPU Acceleration in Registration, Danny Ruijters57 CUFFT performance
58
GPU Acceleration in Registration, Danny Ruijters58 Conclusion & Future work
59
GPU Acceleration in Registration, Danny Ruijters59 Conclusions GPU: powerful parallel processor, but has its limitations Rigid Registration: interpolation on the GPU Elastic Registration: calculation of the Similarity Measure & first order derivative on the GPU
60
GPU Acceleration in Registration, Danny Ruijters60 Future work Multi-resolution deformation fields 2D-3D registration of the Coronary Arteries (not presented)
61
GPU Acceleration in Registration, Danny Ruijters61 Questions?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.