Optimizing Katsevich Image Reconstruction Algorithm on Multicore Processors
Eric Fontaine (Georgia Tech), Hsien-Hsin Lee (Georgia Tech)



2 Outline
Image Reconstruction Overview
Katsevich Algorithm
Prior Work and Our Optimizations:
  – PI-Interval Method
  – Cone-Beam Cover Method
Our Work:
  – Symmetry Method
Results
Conclusion

3 Image Reconstruction Overview
Is it possible to reconstruct the 3-D volume of an object from its projections?
  – Early 20th century: Radon Transform and Fourier Slice Theorem.
Common methods:
  – MRI: a noninvasive magnetic field is applied. The main computation is the FFT.
  – Positron Emission Tomography: the patient is injected with radioactive matter. As it decays, it releases radiation that is detected by sensors.
  – Computed Tomography: uses x-ray projections of the object and filtered backprojection to recover the original volume. Contains both fine-grained and coarse-grained data parallelism.

4 Fourier Slice Theorem
The Fourier transform of a 1-D projection of a 2-D image equals the slice of the image's 2-D Fourier transform that passes through the origin at the projection angle.
The formula can be rearranged into filtered backprojection.
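The theorem can be checked numerically in its simplest discrete form. The sketch below is an illustration only, not code from the talk: it assumes an axis-aligned projection (angle 0) and a naive DFT, and the function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive 1-D discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def project_along_rows(image):
    """Parallel projection at angle 0: sum the image over its row index."""
    cols = len(image[0])
    return [sum(row[c] for row in image) for c in range(cols)]

def dft2_zero_row_frequency(image):
    """Slice of the 2-D DFT at zero frequency along the row axis."""
    rows, cols = len(image), len(image[0])
    out = []
    for fc in range(cols):
        total = 0
        for r in range(rows):
            for c in range(cols):
                total += image[r][c] * cmath.exp(-2j * cmath.pi * fc * c / cols)
        out.append(total)
    return out

image = [[1, 2, 0], [0, 3, 1], [4, 0, 2]]
lhs = dft(project_along_rows(image))    # transform of the projection
rhs = dft2_zero_row_frequency(image)    # central slice of the 2-D transform
```

Here `lhs` and `rhs` agree up to rounding. Other projection angles correspond to rotated slices and need interpolation in the discrete case.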

5 Filtered Backprojection
Projections are filtered first, then backprojected.
  – Less computationally expensive than filtering after backprojection.
Requires 180 degrees of projection data.
Can be extended to fan beams instead of parallel beams.
[Figure panels: Projection, Backprojection]

6 3-D Volume?
The previous methods reconstruct 2-D slices. They can be repeated for multiple slices to get a 3-D volume.
Two common 3-D backprojection algorithms:
  – FDK (1984)
      Approximate, fast reconstruction.
      Uses projections taken on a circular path surrounding the object.
      More accurate on the plane containing the circle.
      Can be generalized for helical scanning paths.
  – Katsevich (2003)
      Theoretically exact, but also more compute-intensive.
      Uses projections taken on a helical path surrounding the object.
      Can reconstruct long objects, unlike the original FDK.
      Fast scanning.

7 Katsevich Image Reconstruction
Reconstructs the density of a 3-D cylindrical volume.
  – Analyzes many 2-D cone-beam projections taken along a helical scanning path.
First exact helical cone-beam image reconstruction algorithm.
Filtered-backprojection form.
  – More computationally expensive than non-exact algorithms such as FDK.
  – Also requires differentiation and remapping of projections to and from filtering coordinates.

8 Katsevich Step 1: Differentiation
Take the difference between neighboring texels.
Take the difference between neighboring projections.
[Figure labels: Projection k, Projection k+1, Differentiated Projection k]
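A minimal sketch of the two differences named on this slide, using plain forward differences. The slide does not give the stencil or how the two terms are weighted and combined, so this hypothetical `differentiate` helper returns them separately; the spacings `ds` and `du` are assumed parameters.

```python
def differentiate(proj_k, proj_k1, ds=1.0, du=1.0):
    """Forward differences for one projection (a list of detector rows).

    Returns (d_source, d_texel):
      d_source[r][c]: difference between projection k+1 and projection k
      d_texel[r][c]:  difference between texel c+1 and texel c within projection k
    """
    rows, cols = len(proj_k), len(proj_k[0])
    d_source = [[(proj_k1[r][c] - proj_k[r][c]) / ds for c in range(cols)]
                for r in range(rows)]
    d_texel = [[(proj_k[r][c + 1] - proj_k[r][c]) / du for c in range(cols - 1)]
               for r in range(rows)]
    return d_source, d_texel
```

Both differences touch only adjacent elements, which is consistent with the later slide that applies SIMD to this step.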

9 Katsevich Step 2: Filtering
Remap projection to filtering coordinates.
Perform horizontal convolution along kappa lines.
Remap back to projection coordinates.

10 Katsevich Step 3: Backprojection
[Figure labels: X-ray projection source, Volume of Interest, Projection, Backprojection]
A projection is formed by the line integral of density along the path of each ray from the x-ray source to the detector.
Backprojection is the reverse: smear projection data from the detector onto the image voxels.
Use linear interpolation of the 4 neighboring texels when looking up the backprojection value.
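The 4-texel lookup is ordinary bilinear interpolation. The sketch below is a generic version, not the authors' code; `proj` is assumed to be a list of detector rows and `(u, w)` a fractional (column, row) position.

```python
import math

def bilinear_lookup(proj, u, w):
    """Interpolate proj at fractional column u and row w from its 4 neighbors.

    The caller must keep 0 <= u < cols - 1 and 0 <= w < rows - 1; padding the
    projection with a slack row and column is one way to avoid bounds tests.
    """
    c0, r0 = math.floor(u), math.floor(w)
    fu, fw = u - c0, w - r0
    top = (1 - fu) * proj[r0][c0] + fu * proj[r0][c0 + 1]
    bottom = (1 - fu) * proj[r0 + 1][c0] + fu * proj[r0 + 1][c0 + 1]
    return (1 - fw) * top + fw * bottom
```

The four weights depend only on the fractional parts of `(u, w)`, so voxels that share a backprojected coordinate can also share the weights.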

11 PI-Interval Method
The PI-interval is formed by the line intersecting:
  – A point inside the helix and two points on the helix less than one turn apart.
[Figure labels: voxel, PI-Interval, Helical Scanning Path]

12 PI-Interval Method
The PI-interval contains all data necessary for exact reconstruction.
Iterate over all projections in the PI-interval containing each voxel.
  – Calculate the voxel's backprojected coordinate.
  – Get the projection's value at the backprojected coordinate using linear interpolation, and weight it appropriately.
  – Accumulate the contribution from each projection.
  – Use special weighting for the beginning and end of the interval.
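The per-voxel loop above can be sketched as follows. The geometry is an assumption, not taken from the slides: a source on the helix (R cos s, R sin s, P s / 2π), a flat detector at distance D from the source, and a 1/v weight as in common flat-detector formulations of Katsevich backprojection. All names are hypothetical, and the special end-of-interval weighting is left out.

```python
import math

def backprojected_coord(x, y, z, s, R, D, P):
    """Detector coordinate (u, w) and distance term v for a voxel at source angle s."""
    v = R - x * math.cos(s) - y * math.sin(s)
    u = D * (-x * math.sin(s) + y * math.cos(s)) / v
    w = D * (z - P * s / (2 * math.pi)) / v
    return u, w, v

def backproject_voxel(x, y, z, s_values, ds, sample, R, D, P):
    """Accumulate filtered-projection samples over source angles in the PI-interval.

    sample(s, u, w) is a caller-supplied lookup, e.g. bilinear interpolation
    into the filtered projection taken at angle s.
    """
    total = 0.0
    for s in s_values:
        u, w, v = backprojected_coord(x, y, z, s, R, D, P)
        total += sample(s, u, w) / v
    return total * ds / (2 * math.pi)
```

Example call: `backproject_voxel(0, 0, 0, [0.0, 0.1], 0.1, lambda s, u, w: 1.0, 2.0, 4.0, 1.0)`.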

13 PI-Interval Method Voxel Reconstruction Done!

14 PI-Interval Method Parallelization Strategy:
[Figure labels: Proj 1, Proj 2, Proj 3, ..., Proj K; Diff, Remap, Convolve, Remap; Slice 1 ... Slice Z Max]
Assign projections to different threads.
Perform differentiation of each projection.
Remap projection to filtering coordinates.
Perform convolution along kappa lines.
Remap back to projection coordinates.
Barrier, then assign different image slices to different threads.
Each thread performs backprojection of its assigned slice.
Continue until all slices are done.
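The two-phase structure (filter per projection, barrier, backproject per slice) can be sketched with a thread pool. This is a structural illustration only: the talk used C with OpenMP, Python threads do not run CPU-bound work in parallel, and `filter_projection` and `backproject_slice` are hypothetical caller-supplied callables.

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct(projections, num_slices, filter_projection, backproject_slice, workers=4):
    """Phase 1: one task per projection. Phase 2: one task per image slice."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: differentiate, remap, convolve, remap back (all inside filter_projection).
        # list() waits for every result, so it plays the role of the barrier.
        filtered = list(pool.map(filter_projection, projections))
        # Phase 2: each task backprojects one slice from all filtered projections.
        slices = list(pool.map(lambda z: backproject_slice(z, filtered), range(num_slices)))
    return slices
```

In phase 2 each task writes only its own slice, so no locking is needed in this variant.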

15 PI-Interval Method Basic Optimizations
Majority of time is spent calculating PI-intervals and backprojection.
  – PI-intervals are constant for a particular helix.
      Precompute one slice of PI-intervals.
      PI-intervals for different horizontal slices can be determined by rotation.
      Easy ~25% speedup.
Next focused on the backprojection inner loop.
  – Removed trivial lookup tables. ~10% speedup.
  – Used sin, cos lookup tables. ~15% speedup.
  – Moved the if statements for smoothing the ends of the PI-interval outside the loop. Duplicated inner loop code. ~10% speedup.
  – Removed the if statements for bounds testing the backprojected coordinates. Needed to add an extra row and column of slack to the projection data. ~3% speedup.
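As an illustration of the sin/cos lookup-table idea: the source angle takes only as many values as there are projections, so both functions can be tabulated once and indexed by projection number in the inner loop. The names and the uniform angular spacing `delta_s` are assumptions.

```python
import math

def build_trig_tables(num_projections, delta_s, s0=0.0):
    """Tabulate sin and cos of the source angle for every projection index."""
    angles = [s0 + k * delta_s for k in range(num_projections)]
    return [math.sin(a) for a in angles], [math.cos(a) for a in angles]

def rotated_terms(x, y, k, sin_table, cos_table):
    """In-plane terms needed by the backprojection for projection index k."""
    along = x * cos_table[k] + y * sin_table[k]     # x cos s + y sin s
    across = -x * sin_table[k] + y * cos_table[k]   # -x sin s + y cos s
    return along, across
```

Inside the loop, `rotated_terms` replaces two trigonometric calls per voxel with two table reads.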

16 Cone-beam Cover Method
Formed by the intersection of the cone beam and the volume.
Contains the data necessary for reconstruction.
[Figure label: X-ray projection source]

17 Cone-beam Cover Method
Access projection and image memory linearly.
  – Rotate projection 90 degrees.
Accumulate partial image reconstruction.
Iterate from bottom to top of projection.
Bring in two columns of projection data.

18 Cone-beam Cover Method Parallelization Strategy:
[Figure labels: Proj 1, Proj 2, Proj 3, ..., Proj K; Diff, Remap, Convolve, Remap; Shared Image Memory]
Assign projections to different threads.
Perform differentiation of each projection.
Remap projection to filtering coordinates.
Perform convolution along kappa lines.
Remap back to projection coordinates.
Each thread performs backprojection of its assigned projection to shared image memory.
Continue until all projections are done.

19 SIMD Optimizations
Use SIMD for backprojection.
  – Backproject 4 consecutive z voxels at a time.
  – Requires data shuffling.
  – Not all memory accesses are aligned.
  – Treat top and bottom of cone-beam cover specially.
Use SIMD for differentiation and remapping steps.
  – Act on 4 consecutive texels at a time.

20 Symmetry Method
Exploit backprojection redundancy among source positions separated by π/2
  – due to the π/2 symmetry of sin, cos.
Reduces backprojection calculations by ~4x for each turn of the helix.
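One way to see the redundancy, under an assumed geometry (source on the helix (R cos s, R sin s, P s / 2π), flat detector at distance D; not stated on the slides): rotating a voxel by a quarter turn about the helix axis, raising it by P/4, and advancing the source by π/2 leaves its backprojected coordinate unchanged. The names below are hypothetical.

```python
import math

def backprojected_coord(x, y, z, s, R, D, P):
    """Detector coordinate (u, w) and distance term v for a voxel at source angle s."""
    v = R - x * math.cos(s) - y * math.sin(s)
    u = D * (-x * math.sin(s) + y * math.cos(s)) / v
    w = D * (z - P * s / (2 * math.pi)) / v
    return u, w, v

def quarter_turn_partner(x, y, z, s, P):
    """Voxel and source angle obtained by moving a quarter turn up the helix."""
    return -y, x, z + P / 4, s + math.pi / 2

R, D, P = 2.0, 4.0, 1.5
voxel = (0.3, -0.2, 0.1)
s = 0.7
base = backprojected_coord(*voxel, s, R, D, P)
px, py, pz, ps = quarter_turn_partner(*voxel, s, P)
partner = backprojected_coord(px, py, pz, ps, R, D, P)
# base and partner agree up to rounding, so one coordinate calculation
# can serve four voxels (quarter turns 0, 1, 2 and 3).
```

Applying `quarter_turn_partner` repeatedly gives the group of four voxels that share one coordinate and one set of interpolation weights.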

21 Symmetry Method
[Figure labels: Unpacked Image Data, Packed Image Data, Z Offset 0, Z Offset 1]
All the colored voxels have identical backprojection coordinates.
Pack them so they occupy adjacent memory locations.
Voxels with the same relative "z offset" are grouped together.
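The slide does not spell out the packed layout, so the following is only one layout consistent with it, offered as a sketch. It assumes an n x n x 4q volume, where q is the number of slices per quarter helix turn, and stores the four voxels related by quarter-turn rotation and a q-slice z shift next to each other as lanes 0 to 3. All names are hypothetical.

```python
def rot90_index(i, j, n, k):
    """Apply k quarter turns about the slice center to index (i, j) of an n x n slice."""
    for _ in range(k % 4):
        i, j = n - 1 - j, i
    return i, j

def pack(volume, n, q):
    """volume[z][i][j] with 4*q slices -> packed[z][i][j] = [lane0, lane1, lane2, lane3]."""
    packed = [[[None] * n for _ in range(n)] for _ in range(q)]
    for z in range(q):
        for i in range(n):
            for j in range(n):
                lanes = []
                for k in range(4):
                    ri, rj = rot90_index(i, j, n, k)
                    lanes.append(volume[z + k * q][ri][rj])
                packed[z][i][j] = lanes
    return packed

def unpack(packed, n, q):
    """Inverse of pack: scatter the four lanes back to their original voxels."""
    volume = [[[None] * n for _ in range(n)] for _ in range(4 * q)]
    for z in range(q):
        for i in range(n):
            for j in range(n):
                for k in range(4):
                    ri, rj = rot90_index(i, j, n, k)
                    volume[z + k * q][ri][rj] = packed[z][i][j][k]
    return volume
```

With four 32-bit values per group, each group would fill one 128-bit word, which fits the aligned-access claim on the next slide.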

22 Symmetry Method
Easily SIMDified.
  – No need for projection or image data shuffling.
  – All 128-bit memory accesses are aligned.
  – Needs a projection packing step (outside of main loop).
  – Needs an image unpacking step (outside of main loop).
  – Inner loop primarily consists of SIMD memory accesses. Coordinate and interpolation calculations are outside of the inner loop.

23 Results
System:
  – Two Intel 2.33 GHz quad-core Clovertown processors.
  – 4 GB RAM.
  – Windows Vista.
Programming:
  – C, ported from an open-source Matlab implementation.
  – OpenMP.
  – Intel Performance Primitives.
  – Intrinsic assembly.
Input: 2-D projections of the Shepp-Logan phantom.
  – 4 helical turns plus 1 overscan turn.
Output: 3-D density.

24 Original Shepp-Logan Phantom

25 PI-Interval Method Reconstruction

26 PI-Interval Method Error

27 Cone-beam Cover Method Reconstruction

28 Cone-beam Cover Method Error

29 Symmetry Method Reconstruction

30 Symmetry Method Error

31 Reconstruction Time
[Figure panels: image from x32 projections; image from x64 projections; image from x128 projections; image from x128 projections]

32 Comparison to U Iowa
[Figure panels: image from x32 projections; image from x64 projections]
~73x speedup for the Symmetry Method over the U Iowa implementation for 256^3, running on the same system with 1 thread.
Note: the U Iowa implementation uses MPI.
  – Focused primarily on parallel speedup.

33 Reconstruction Time Breakdown
Time in seconds (speedup) for image.
Columns: Step | Init-base Pi-Method | Opt-1T | Opt-2T | Opt-4T | Opt-8T
Derivative: (3.0x), 18.9 (4.0x), 18.8 (4.0x)
Forward remap:
Convolve:
Backward remap:
SIMD Pack: 0, 12.5
Backproject: (1.0x), (11.4x), (22.3x), (32.5x), (38.0x)
Total: (1.0x), (11.2x), (21.9x), (31.8x), (37.1x)

34 Scalability of Symmetry Method

35 Conclusion
Majority of time is spent in backprojection.
37.1x speedup.
  – Compares the final Symmetry Method running on eight threads to the baseline PI-Interval Method running on a single thread for 1024 image reconstruction.
The Symmetry Method has poor multi-thread speedup because it is memory bound.
Front-side bus bandwidth becomes saturated and limits scalability.
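The multi-thread claim can be checked against the Total row of the Reconstruction Time Breakdown slide. This assumes the Opt-1T and Opt-8T columns are the optimized Symmetry Method on one and eight threads; the helper name is hypothetical.

```python
def parallel_efficiency(speedup_multi, speedup_single, threads):
    """Thread scaling and per-thread efficiency from two speedups over one baseline."""
    scaling = speedup_multi / speedup_single
    return scaling, scaling / threads

# Total row: 11.2x on one thread, 37.1x on eight threads.
scaling, efficiency = parallel_efficiency(37.1, 11.2, 8)
```

Under that assumption, eight threads run about 3.3x faster than one, roughly 41% efficiency. Most of the 37.1x therefore comes from single-thread optimization rather than thread scaling.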

36 Questions?

37 Bus Utilization
Bus utilization = (# bus cycles with the data-ready line high) / (# bus cycles), averaged over the inner loop of a 1024^3 reconstruction for 60 seconds after a 60-second warmup.

38 Difference between PI-Method & Cone-Beam

39 Difference between PI-Method & Symmetry

40 Difference between Cone-Beam & Symmetry

41 Symmetry Method: Projection Packing
Interleave columns of projections.
Linear access to projection memory.
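A sketch of what interleaving might look like, assuming the four projections taken a quarter turn apart are stored so that the texels at the same detector position sit next to each other. The slide gives no layout details, so the names and the 4-lane arrangement are assumptions. The second helper models one 4-wide multiply-add of the inner loop with plain Python lists.

```python
def interleave_projections(projs):
    """projs: 4 projections of equal shape -> packed[r][c] = [p0, p1, p2, p3] at (r, c)."""
    rows, cols = len(projs[0]), len(projs[0][0])
    return [[[p[r][c] for p in projs] for c in range(cols)] for r in range(rows)]

def accumulate_lanes(acc, packed, r, c, weight):
    """Add weight * packed[r][c] lane by lane into the 4-element accumulator acc."""
    texel = packed[r][c]
    for k in range(4):
        acc[k] += weight * texel[k]

projs = [[[1, 2]], [[10, 20]], [[100, 200]], [[1000, 2000]]]
packed = interleave_projections(projs)
acc = [0.0] * 4
accumulate_lanes(acc, packed, 0, 1, 0.5)   # one interpolation tap, all four lanes
```

Each interpolation tap then reads one contiguous group of four values and updates four packed voxels with a shared weight.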