Parallelization of System Matrix Generation Code
Mahmoud Abdallah, Antall Fernandes
SPECT System
Inverse Cone
Back Projection
Figure reference: "Tomographic Reconstruction of SPECT Data", Bill Amini, Magnus Björklund, Ron Dror, Anders Nygren
Filtered Back Projection (FBP) applies a ramp filter to the projection data before back projecting it, which removes the blurring produced by simple back projection. (Filtering the back-projected image with a 2D ramp filter is an equivalent alternative.) FBP is still widely used because it is fast and easy to implement.
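The ramp-filtering step can be sketched in C for one row of projection data. This is a minimal sketch, assuming parallel-beam geometry, unit bin spacing, and the spatial-domain Ram-Lak form of the ramp filter applied to the projections before back projection. `ramp_filter` and `ramp_kernel` are hypothetical names, not code from this project.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: unit detector-bin spacing, so the Ram-Lak kernel is
   h[0] = 1/4, h[n] = 0 for even n != 0, h[n] = -1/(n^2 * pi^2) for odd n. */
static const double PI = 3.14159265358979323846;

static double ramp_kernel(int n)
{
    if (n == 0)
        return 0.25;
    if (n % 2 == 0)          /* also true for negative even n */
        return 0.0;
    return -1.0 / ((double)n * n * PI * PI);
}

/* Ramp-filter one projection row p[0..len-1] into q[0..len-1] by direct
   convolution, treating samples outside the row as zero. O(len^2) work,
   which is acceptable for short rows (e.g. ~64 bins). */
void ramp_filter(const double *p, double *q, int len)
{
    assert(p != NULL && q != NULL && p != q && len > 0);
    for (int i = 0; i < len; i++) {
        double acc = 0.0;
        for (int k = 0; k < len; k++)
            acc += p[k] * ramp_kernel(i - k);
        q[i] = acc;
    }
}
```

For an impulse in the middle of a five-bin row, the output is 0.25 at the impulse, -1/π² at its two neighbours, and 0 two bins away. These negative side lobes are what cancel the blur of simple back projection.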
Maximum Likelihood-Expectation Maximization (ML-EM) Algorithm
An iterative algorithm that has been found to reduce noise in the reconstruction.
Solves the following linear problem iteratively: FX = P, where
    P: vector of projection data
    X: voxelized image
    F: projection matrix operator (the system matrix)
Needs a large number of iterations to reconstruct an image.
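EM Algorithm
The EM update of voxel i from iteration n to n+1 is
$$X_i^{(n+1)} = \frac{X_i^{(n)}}{\sum_j F_{ji}} \sum_j F_{ji} \frac{P_j}{\sum_k F_{jk} X_k^{(n)}}$$
where j indexes projection bins and i, k index voxels.
The summation over k is the projection operation.
The summation over j is the back projection operation.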
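One EM iteration can be sketched in C as a projection pass (sum over k) followed by a back projection pass (sum over j), normalized by each voxel's sensitivity (sum over j of F_ji). This minimal sketch assumes a small dense, row-major F with one row per projection bin and one column per voxel; `mlem_iteration` is a hypothetical name, and a real SPECT system matrix would be stored sparsely.

```c
#include <assert.h>
#include <stdlib.h>

/* One ML-EM update, X_i <- X_i / sum_j F[j][i] * sum_j F[j][i] * P_j / (F X)_j,
   for a dense row-major F of size nbins x nvox (dense storage is an
   assumption made for brevity). X is updated in place. */
void mlem_iteration(const double *F, const double *P, double *X,
                    int nbins, int nvox)
{
    double *ratio = malloc((size_t)nbins * sizeof *ratio);
    assert(ratio != NULL);

    /* Projection (sum over k): ratio_j = P_j / (F X)_j, using the old X */
    for (int j = 0; j < nbins; j++) {
        double proj = 0.0;
        for (int k = 0; k < nvox; k++)
            proj += F[j * nvox + k] * X[k];
        ratio[j] = proj > 0.0 ? P[j] / proj : 0.0;
    }

    /* Back projection (sum over j), divided by the voxel's sensitivity */
    for (int i = 0; i < nvox; i++) {
        double back = 0.0, sens = 0.0;
        for (int j = 0; j < nbins; j++) {
            back += F[j * nvox + i] * ratio[j];
            sens += F[j * nvox + i];
        }
        X[i] = sens > 0.0 ? X[i] * back / sens : 0.0;
    }
    free(ratio);
}
```

With F = [[1, 0], [1, 1], [0, 1]] and P = (2, 5, 3), starting from X = (1, 1), the first iteration gives (2.25, 2.75), and repeated iterations converge to (2, 3), which satisfies FX = P exactly.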
System Matrix
Maps the image space to the data space.
Takes the detector geometry as input.
Generates the detector response for every bin at each angle (usually there are 72 angles/frames).
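Because each bin's ray intersects only a small fraction of the voxels, a system matrix is often stored in a sparse format. The C sketch below assumes compressed sparse row (CSR) storage with one row per (angle, bin) pair and shows how the matrix maps an image to projection data; the `SysMat` struct and `forward_project` are hypothetical names, not the presentation's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR storage: row j is one detector bin at one angle and
   lists only the voxels whose weight in that bin is nonzero. */
typedef struct {
    int nrows;              /* number of bins (angles x U bins x V bins) */
    const int *row_start;   /* nrows + 1 offsets into col/val */
    const int *col;         /* voxel index of each nonzero */
    const double *val;      /* weight of that voxel in the bin */
} SysMat;

/* Forward projection P = F X: map image X (voxel space) to data P (bin space). */
void forward_project(const SysMat *F, const double *X, double *P)
{
    assert(F != NULL && X != NULL && P != NULL);
    for (int j = 0; j < F->nrows; j++) {
        double sum = 0.0;
        for (int e = F->row_start[j]; e < F->row_start[j + 1]; e++)
            sum += F->val[e] * X[F->col[e]];
        P[j] = sum;
    }
}
```

For example, with three rows holding {voxel 0: 1.0}, {voxel 0: 0.5, voxel 1: 0.5} and {voxel 1: 1.0}, the image X = (2, 3) projects to P = (2, 2.5, 3).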
System Matrix Algorithm
for each angle DO                                          // number of angles = 72
    for each detector bin in the U direction DO            // around 14 bins
        for each detector bin in the V direction DO        // around 64 bins
            for each row in the inverse cone grid DO       // <= 99
                for each column in the inverse cone grid DO    // <= 99
                    for each voxel intersected by the ray DO
                        calculate point response
                    end
                end
            end
        end
    end
end
Number of loops = 72 x 14 x 64 x 99 x 99 = 632,282,112 (about 6.3 x 10^8, not counting the innermost per-voxel loop)
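The serial loop nest can be written as a C skeleton. Since the slides do not show the per-ray code, this sketch assumes the innermost work (walking the intersected voxels and computing the point response) sits behind a hypothetical callback; `Dims`, `RayFn`, `generate_system_matrix`, `ray_count`, and `count_calls` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Loop bounds of the serial algorithm (grid sizes are upper bounds). */
typedef struct {
    int angles, u_bins, v_bins, grid_rows, grid_cols;
} Dims;

/* Hypothetical per-ray callback: the real code would trace the ray through
   the voxels it intersects and compute the point response. */
typedef void (*RayFn)(int angle, int u, int v, int row, int col, void *ctx);

void generate_system_matrix(const Dims *d, RayFn fn, void *ctx)
{
    for (int a = 0; a < d->angles; a++)
        for (int u = 0; u < d->u_bins; u++)
            for (int v = 0; v < d->v_bins; v++)
                for (int r = 0; r < d->grid_rows; r++)
                    for (int c = 0; c < d->grid_cols; c++)
                        fn(a, u, v, r, c, ctx);
}

/* Number of callback invocations, computed in 64 bits to avoid overflow. */
uint64_t ray_count(const Dims *d)
{
    return (uint64_t)d->angles * d->u_bins * d->v_bins
         * d->grid_rows * d->grid_cols;
}

/* Example callback that only counts how often it is called. */
void count_calls(int angle, int u, int v, int row, int col, void *ctx)
{
    (void)angle; (void)u; (void)v; (void)row; (void)col;
    *(uint64_t *)ctx += 1;
}
```

With the slide's bounds {72, 14, 64, 99, 99}, `ray_count` returns 632,282,112, before any per-voxel work, which is what motivates parallelizing the computation.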
System Matrix Parallelization
Observation: at each angle, the calculations for each bin are independent of those for every other bin.
Proposal: parallelize all calculations for each angle, e.g. on a GPU.
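Since the bins at one angle are independent, a natural mapping is one GPU work item per bin. The C sketch below assumes the 14 x 64 bins are flattened with v varying fastest and that each (angle, bin) pair owns one row of the system matrix, so no two work items write the same row and no synchronization is needed. `Bin`, `bin_from_gid`, `gid_from_bin`, and `sysmat_row` are hypothetical helpers.

```c
#include <assert.h>

/* Assumed layout: within one angle, bins form a u_bins x v_bins grid,
   flattened so that v varies fastest. Work item `gid` handles one bin. */
typedef struct { int u, v; } Bin;

Bin bin_from_gid(int gid, int v_bins)
{
    Bin b = { gid / v_bins, gid % v_bins };
    return b;
}

int gid_from_bin(Bin b, int v_bins)
{
    return b.u * v_bins + b.v;
}

/* Row of the system matrix owned by bin b at the given angle. */
int sysmat_row(int angle, Bin b, int u_bins, int v_bins)
{
    return (angle * u_bins + b.u) * v_bins + b.v;
}
```

With 14 x 64 bins, one angle maps to 896 work items, and the first bin of angle 1 owns row 896.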
System Matrix Parallelization on GPU
Parallelized System Matrix Algorithm
Host program:
for each angle DO
    run the kernel for all bins at the same time (one kernel instance per bin)
end
GPU kernel:
for each voxel intersected by the ray DO
    calculate attenuation and store it in SysMat
end
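The host/kernel split can be emulated sequentially in C to check that the work items of one launch never collide. In this sketch the inner `gid` loop stands in for what a GPU would run concurrently, and the kernel only records that its row was written; the real kernel would store the attenuation of each intersected voxel. `sysmat_kernel`, `host_generate`, and the row layout are assumptions for illustration.

```c
#include <assert.h>

/* Sizes from the slides: 72 angles, ~14 x 64 bins per angle. */
enum { ANGLES = 72, U_BINS = 14, V_BINS = 64, BINS = U_BINS * V_BINS };

/* Sequential stand-in for the GPU kernel: one call = one work item = one
   detector bin at a fixed angle. Each work item owns exactly one row of
   SysMat; here it only counts writes to that row. */
static void sysmat_kernel(int angle, int gid, int *row_writes)
{
    int row = angle * BINS + gid;
    row_writes[row] += 1;
}

/* Host program: one kernel launch per angle. On a GPU the inner loop is
   replaced by BINS work items executing concurrently. */
void host_generate(int *row_writes)
{
    for (int angle = 0; angle < ANGLES; angle++)
        for (int gid = 0; gid < BINS; gid++)
            sysmat_kernel(angle, gid, row_writes);
}
```

After `host_generate`, every one of the 72 x 896 rows has been written exactly once, which is the property that lets all bins of an angle run in parallel without locks.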
SIMD (Architecture of a GPU)
Figure source: Advanced Micro Devices (AMD), Inc., Introduction to OpenCL Programming, 2010
OpenCL
Based on ISO C99, with some extensions and restrictions.
Provides parallel computing using task-based and data-based parallelism.
Architecture: host program and kernels.
Program Architecture
Host program
    Executes on the host system.
    Sends kernels to OpenCL™ devices for execution through a command queue.
Kernels
    Similar to C functions.
    Executed on OpenCL™ devices (e.g., a GPU).
Thank You