1
Optimizing the trace transform
Using OpenMP and CUDA
Tim Besard, 2013-06-19
2
The trace transform needs to run in real time
MATLAB
– Slow
– Difficult to optimize
C++ base implementation
– Allows for optimizations
3
Optimizing the trace transform
– How to parallelize?
– OpenMP
– CUDA
– Performance
4
How to parallelize?
[Figure omitted]
5
Coarse-grained parallelism
[Figure: rotate (0°…359°) → T-functionals → sinogram row → sinogram]
6
How to parallelize?
Fine-grained parallelism
– Rotation
– Functionals: prefix sum (see the sketch below)
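The prefix sum is what makes the functionals awkward to parallelize naively: each output sample depends on every preceding sample along the traced line. A minimal CPU sketch of that dependency, with a hypothetical helper name (this is the loop that a parallel scan primitive replaces on the GPU):

#include <numeric>
#include <vector>

// Cumulative sum of the samples along one traced line. Element i depends on
// all elements 0..i, so element-wise parallelization is impossible as written;
// a parallel scan is the fine-grained way to compute the same result.
std::vector<float> line_prefix_sum(const std::vector<float> &line)
{
    std::vector<float> out(line.size());
    std::partial_sum(line.begin(), line.end(), out.begin());
    return out;
}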
7
OpenMP
Compiler directives (see the sketch below)
– #pragma omp parallel for
– #pragma omp critical
– #pragma omp barrier
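A rough sketch of how these directives map onto the angle loop of the trace transform; rotate_and_project is a hypothetical placeholder for the real rotate-then-apply-functionals step, not a name from the actual code base.

#include <vector>

// Hypothetical stand-in for "rotate the image by angle_deg and evaluate the
// T-functionals"; the trivial body only keeps the sketch self-contained.
static std::vector<float> rotate_and_project(int angle_deg)
{
    return std::vector<float>(64, static_cast<float>(angle_deg));
}

// sinogram must be pre-sized to 360 rows by the caller.
void compute_sinogram(std::vector<std::vector<float>> &sinogram)
{
    int rows_done = 0;

    // Every angle produces an independent sinogram row, so the iterations
    // are distributed over the available cores.
    #pragma omp parallel for schedule(dynamic)
    for (int angle = 0; angle < 360; ++angle) {
        sinogram[angle] = rotate_and_project(angle);  // distinct rows: no race

        // A genuinely shared update (here a progress counter) is guarded.
        #pragma omp critical
        ++rows_done;
    }
    // The parallel for ends with an implicit barrier, so every row is in
    // place before the sinogram is used; an explicit #pragma omp barrier is
    // only needed inside larger parallel regions.
}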
8
OpenMP
Compiler directives address the coarse-grained parallelism
– Unobtrusive
– Significant overhead
5× speed-up
– 8-core machine
– Unoptimized
9
CUDA
Parallel computing platform and programming model
– Lightweight threads
– Massively parallel
Addresses the fine-grained parallelism
– Pixel-centric approach (see the sketch below)
– Complete re-implementation
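A minimal sketch of what a pixel-centric rotation kernel could look like, assuming one thread per output pixel and nearest-neighbour sampling; the kernel name and the sampling choice are illustrative, not taken from the actual implementation.

#include <cuda_runtime.h>
#include <math.h>

// One thread per output pixel: each thread inverse-rotates its own output
// coordinate around the image centre and samples the input there
// (nearest-neighbour, for brevity).
__global__ void rotate_kernel(const float *in, float *out,
                              int width, int height, float angle_rad)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cx = 0.5f * (width - 1), cy = 0.5f * (height - 1);
    float s = sinf(angle_rad), c = cosf(angle_rad);

    // Inverse rotation of the output coordinate into the input image.
    float sx =  c * (x - cx) + s * (y - cy) + cx;
    float sy = -s * (x - cx) + c * (y - cy) + cy;

    int ix = __float2int_rn(sx);
    int iy = __float2int_rn(sy);

    // Consecutive threads write consecutive addresses, so the store is
    // coalesced; the read pattern varies with the angle.
    out[y * width + x] = (ix >= 0 && ix < width && iy >= 0 && iy < height)
                             ? in[iy * width + ix]
                             : 0.0f;
}

// Possible launch, one 16x16 block of threads per 16x16 tile of output:
//   dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
//   rotate_kernel<<<grid, block>>>(d_in, d_out, width, height, angle_rad);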
10
CUDA
Low-level details matter a lot
– Memory access patterns (see the sketch below)
– Branch divergence
10× speed-up
– GeForce GTX TITAN (20% usage)
– Unoptimized
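To make the memory-access point concrete: when the threads of a warp touch consecutive addresses, a load is served by a single transaction. The column-sum kernel below is a hypothetical illustration, not code from the actual implementation.

#include <cuda_runtime.h>

// Coalesced access pattern: thread t sums column t, so at every loop
// iteration the threads of a warp read consecutive floats from the same row
// (one memory transaction per warp).
__global__ void column_sums(const float *data, float *sums, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;

    float acc = 0.0f;
    for (int row = 0; row < rows; ++row)
        acc += data[row * cols + col];  // consecutive threads, consecutive addresses

    sums[col] = acc;
}

// Had each thread summed one row instead (data[tid * cols + i]), neighbouring
// threads would read addresses `cols` elements apart and every warp load would
// split into many transactions.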
11
Performance for 10 signatures
[Chart: execution time in milliseconds for the MEX, C++, OpenMP and CUDA implementations]
12
Future work
Optimize CUDA
– Compare against the state of the art
Julia implementation
– Algorithmic IR