1
Optimizing the trace transform
Using OpenMP and CUDA
Tim Besard, 2013-06-19
2
The trace transform needs to run in real time
MATLAB
– Slow
– Difficult to optimize
C++ base implementation
– Allows for optimizations
3
Optimizing the trace transform
– How to parallelize?
– OpenMP
– CUDA
– Performance
4
How to parallelize?
[Figure omitted]
5
Coarse-grained parallelism
[Figure: rotate (0°…359°) → T-functionals → sinogram row → sinogram]
6
How to parallelize?
Fine-grained parallelism
– Rotation
– Functionals: prefix sum (see the sketch below)
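The prefix sum is what makes the functionals awkward to parallelize naively: each output sample depends on every preceding sample along the traced line. A minimal CPU sketch of that dependency, with a hypothetical helper name (this is the loop that a parallel scan primitive replaces on the GPU):

#include <numeric>
#include <vector>

// Cumulative sum of the samples along one traced line. Element i depends on
// all elements 0..i, so element-wise parallelization is impossible as written;
// a parallel scan is the fine-grained way to compute the same result.
std::vector<float> line_prefix_sum(const std::vector<float> &line)
{
    std::vector<float> out(line.size());
    std::partial_sum(line.begin(), line.end(), out.begin());
    return out;
}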
7
OpenMP
Compiler directives (see the sketch below)
– #pragma omp parallel for
– #pragma omp critical
– #pragma omp barrier
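A rough sketch of how these directives map onto the angle loop of the trace transform; rotate_and_project is a hypothetical placeholder for the real rotate-then-apply-functionals step, not a name from the actual code base.

#include <vector>

// Hypothetical stand-in for "rotate the image by angle_deg and evaluate the
// T-functionals"; the trivial body only keeps the sketch self-contained.
static std::vector<float> rotate_and_project(int angle_deg)
{
    return std::vector<float>(64, static_cast<float>(angle_deg));
}

// sinogram must be pre-sized to 360 rows by the caller.
void compute_sinogram(std::vector<std::vector<float>> &sinogram)
{
    int rows_done = 0;

    // Every angle produces an independent sinogram row, so the iterations
    // are distributed over the available cores.
    #pragma omp parallel for schedule(dynamic)
    for (int angle = 0; angle < 360; ++angle) {
        sinogram[angle] = rotate_and_project(angle);  // distinct rows: no race

        // A genuinely shared update (here a progress counter) is guarded.
        #pragma omp critical
        ++rows_done;
    }
    // The parallel for ends with an implicit barrier, so every row is in
    // place before the sinogram is used; an explicit #pragma omp barrier is
    // only needed inside larger parallel regions.
}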
8
OpenMP
Compiler directives address the coarse-grained parallelism
– Unobtrusive
– Significant overhead
5× speed-up
– 8-core machine
– Unoptimized
9
CUDA
Parallel computing platform and programming model
– Lightweight threads
– Massively parallel
Addresses the fine-grained parallelism
– Pixel-centric approach (see the sketch below)
– Complete re-implementation
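A minimal sketch of what a pixel-centric rotation kernel could look like, assuming one thread per output pixel and nearest-neighbour sampling; the kernel name and the sampling choice are illustrative, not taken from the actual implementation.

#include <cuda_runtime.h>
#include <math.h>

// One thread per output pixel: each thread inverse-rotates its own output
// coordinate around the image centre and samples the input there
// (nearest-neighbour, for brevity).
__global__ void rotate_kernel(const float *in, float *out,
                              int width, int height, float angle_rad)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cx = 0.5f * (width - 1), cy = 0.5f * (height - 1);
    float s = sinf(angle_rad), c = cosf(angle_rad);

    // Inverse rotation of the output coordinate into the input image.
    float sx =  c * (x - cx) + s * (y - cy) + cx;
    float sy = -s * (x - cx) + c * (y - cy) + cy;

    int ix = __float2int_rn(sx);
    int iy = __float2int_rn(sy);

    // Consecutive threads write consecutive addresses, so the store is
    // coalesced; the read pattern varies with the angle.
    out[y * width + x] = (ix >= 0 && ix < width && iy >= 0 && iy < height)
                             ? in[iy * width + ix]
                             : 0.0f;
}

// Possible launch, one 16x16 block of threads per 16x16 tile of output:
//   dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
//   rotate_kernel<<<grid, block>>>(d_in, d_out, width, height, angle_rad);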
10
CUDA
Low-level details matter a lot
– Memory access patterns (see the sketch below)
– Branch divergence
10× speed-up
– GeForce GTX TITAN (20% usage)
– Unoptimized
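To make the memory-access point concrete: when the threads of a warp touch consecutive addresses, a load is served by a single transaction. The column-sum kernel below is a hypothetical illustration, not code from the actual implementation.

#include <cuda_runtime.h>

// Coalesced access pattern: thread t sums column t, so at every loop
// iteration the threads of a warp read consecutive floats from the same row
// (one memory transaction per warp).
__global__ void column_sums(const float *data, float *sums, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;

    float acc = 0.0f;
    for (int row = 0; row < rows; ++row)
        acc += data[row * cols + col];  // consecutive threads, consecutive addresses

    sums[col] = acc;
}

// Had each thread summed one row instead (data[tid * cols + i]), neighbouring
// threads would read addresses `cols` elements apart and every warp load would
// split into many transactions.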
11
Performance for 10 signatures
[Chart: execution time in milliseconds for the MEX, C++, OpenMP and CUDA implementations]
12
Future work
Optimize CUDA
– Compare against the state of the art
Julia implementation
– Algorithmic IR