University of Michigan Electrical Engineering and Computer Science Power-Efficient Medical Image Processing using PUMA Ganesh Dasika, Kevin Fan 1, Scott.

1 University of Michigan Electrical Engineering and Computer Science
Power-Efficient Medical Image Processing using PUMA
Ganesh Dasika, Kevin Fan [1], Scott Mahlke
[1] Parakinetics, Inc.
University of Michigan Advanced Computer Architecture Laboratory

2 The Advent of the GPGPU
Increasingly popular substrate for HPC
–Astrophysics
–Weather Prediction
–EDA
–Financial instrument pricing
–Medical Imaging

3 Advantages of GPGPUs
High degree of parallelism
–Data-level
–Thread-level
High bandwidth
Commodity products
Increasingly programmable

4 Disadvantages of GPGPUs
Gap between computation and bandwidth
–933 GFLOPS : 142 GB/s bandwidth (0.15 B of data per FLOP, ~26:1 Compute:Mem Ratio)
Very high power consumption
–Graphics-specific hardware
–Multiple thread contexts
–Large register files and memories
–Fully general datapath
Inefficiencies in all general-purpose architectures
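
The two ratios quoted on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes 4-byte (32-bit floating-point) operands, which the slide does not state explicitly; the variable names are illustrative.

```python
# Figures quoted on the slide.
peak_gflops = 933.0      # peak compute, GFLOPS
bandwidth_gb_s = 142.0   # off-chip memory bandwidth, GB/s
bytes_per_word = 4       # assumption: 32-bit floating-point operands

# Bytes of memory traffic available per floating-point operation.
bytes_per_flop = bandwidth_gb_s / peak_gflops

# Operations the GPU can issue per 32-bit word it can fetch.
gwords_per_s = bandwidth_gb_s / bytes_per_word
compute_to_mem = peak_gflops / gwords_per_s

print(f"{bytes_per_flop:.2f} B of data per FLOP")
print(f"~{compute_to_mem:.0f}:1 compute:mem ratio")
```

A kernel needs roughly this many operations per fetched word to keep the GPU's arithmetic units busy; kernels with lower ratios are limited by bandwidth.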

5 Programmability vs Efficiency?
[Figure: design points on Efficiency vs. Flexibility axes: Loop Accelerators, ASICs; FPGAs; DSPs; Domain-specific Accelerators, GPGPUs; General Purpose Processors. A region marked "???": highly efficient, some programmability.]

6 Medical Image Reconstruction
Compute intensive loops
–32-bit floating point code
–High data/bandwidth requirements
Increased demand for portability, low power
Much current research focuses on using GPGPUs for this domain

7 CT Image Reconstruction
X-ray emitters and receptors on opposite sides of the patient
Received X-ray intensity corresponds to tissue density
Multiple scans ("slices") taken around the patient are put together to reconstruct one 2D image

8 Projection & Sinogram
Projection: all ray-sums in a direction
Sinogram: all projections
[Figure: X-rays pass through the object f(x,y) at angle θ and produce the projection P(θ, t) along detector axis t; the sinogram plots P over t and θ.]
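
The definitions on this slide can be sketched in code. The following is a simplified pixel-driven projector, not the method used in the presentation: each pixel's value is added to the detector bin that its centre falls into, with t = x cos θ + y sin θ. The function name `sinogram`, the bin layout, and the 4x4 test image are all illustrative assumptions.

```python
import numpy as np

def sinogram(image, angles_deg):
    """Return P[theta, t]: one row of ray-sums (a projection) per angle."""
    n = image.shape[0]
    coords = np.arange(n) - (n - 1) / 2.0          # pixel centres around the origin
    x, y = np.meshgrid(coords, coords)             # x varies along columns, y along rows
    n_bins = int(np.ceil(n * np.sqrt(2))) + 1      # enough bins to cover the diagonal
    sino = np.zeros((len(angles_deg), n_bins))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        # Signed distance of each pixel centre along the detector axis.
        t = np.round(x * np.cos(theta) + y * np.sin(theta), 9)
        bins = np.floor(t + n_bins / 2.0).astype(int)
        # Accumulate every pixel into its detector bin (a ray-sum).
        np.add.at(sino[i], bins.ravel(), image.ravel())
    return sino

# Hypothetical 4x4 density image, used only to exercise the function.
image = np.arange(16, dtype=float).reshape(4, 4)
sino = sinogram(image, [0, 45, 90, 135])
print(sino.shape)
```

Every projection carries the same total mass as the image, and at 0 degrees the non-empty bins are simply the column sums.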

9 Example: Backprojection
[Figure: Sinogram → Backprojected Image]

10 Example: Filtered Backprojection
[Figure: Filtered Sinogram → Reconstructed Image]
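
The difference between plain and filtered backprojection can be sketched on a single-point phantom. This is an illustrative toy under stated assumptions, not the reconstruction code behind these slides: the nearest-bin projector, the FFT ramp filter, and all function names (`bin_index`, `project`, `ramp_filter`, `backproject`) are hypothetical choices.

```python
import numpy as np

def bin_index(n, theta, n_bins):
    """Detector bin hit by each pixel centre for a view at angle theta (radians)."""
    coords = np.arange(n) - (n - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    t = np.round(x * np.cos(theta) + y * np.sin(theta), 9)
    return np.floor(t + n_bins / 2.0).astype(int)

def project(image, thetas, n_bins):
    """Forward projection: sum pixel values into detector bins, one row per angle."""
    sino = np.zeros((len(thetas), n_bins))
    for i, theta in enumerate(thetas):
        bins = bin_index(image.shape[0], theta, n_bins)
        np.add.at(sino[i], bins.ravel(), image.ravel())
    return sino

def ramp_filter(sino):
    """Multiply each projection by |frequency| in the Fourier domain."""
    ramp = np.abs(np.fft.fftfreq(sino.shape[1]))
    return np.real(np.fft.ifft(np.fft.fft(sino, axis=1) * ramp, axis=1))

def backproject(sino, thetas, n):
    """Smear every projection back across the image along its rays."""
    image = np.zeros((n, n))
    for row, theta in zip(sino, thetas):
        image += row[bin_index(n, theta, sino.shape[1])]
    return image

n = 16
n_bins = int(np.ceil(n * np.sqrt(2))) + 1
thetas = np.deg2rad(np.arange(0, 180, 3))      # 60 views over 180 degrees
phantom = np.zeros((n, n))
phantom[5, 9] = 1.0                            # single point source

sino = project(phantom, thetas, n_bins)
blurred = backproject(sino, thetas, n)               # plain backprojection
sharp = backproject(ramp_filter(sino), thetas, n)    # filtered backprojection

print(np.unravel_index(np.argmax(blurred), blurred.shape))
print(np.unravel_index(np.argmax(sharp), sharp.shape))
```

Both images peak at the point source. Plain backprojection leaves a star-shaped smear around it, and the ramp filter's negative side lobes cancel much of that smear.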

11 Reconstruction: Solve for μ's
[Figure: a 4x4 grid of unknown densities μ11 … μ44 (the "human body") between an X-ray emitter and detectors. Detector values: 16, 22, 11, 10 in one direction and 22, 12, 10, 15 in the other.]
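
The toy problem on this slide can be written as a linear system A μ = b. The sketch below assumes the detector values read 16, 22, 11, 10 for one ray direction and 22, 12, 10, 15 for the perpendicular one (the slide text is run together, so this split is an assumption), and that each ray sums the four densities it crosses. Which set belongs to rows and which to columns is also an assumption.

```python
import numpy as np

n = 4
row_sums = np.array([16.0, 22.0, 11.0, 10.0])   # assumption: horizontal rays
col_sums = np.array([22.0, 12.0, 10.0, 15.0])   # assumption: vertical rays

# One equation per ray; unknowns are the 16 densities in row-major order.
A = np.zeros((2 * n, n * n))
for i in range(n):
    A[i, i * n:(i + 1) * n] = 1.0    # ray i crosses every pixel in row i
    A[n + i, i::n] = 1.0             # ray n+i crosses every pixel in column i
b = np.concatenate([row_sums, col_sums])

# Minimum-norm least-squares solution of the underdetermined system.
mu, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
mu = mu.reshape(n, n)

print("equations:", A.shape[0], "unknowns:", A.shape[1], "rank:", rank)
print(np.round(mu, 2))
```

Two views give 8 equations of rank 7 for 16 unknowns, so many different density grids match the same detector values. More ray directions are needed to pin the densities down.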

12 Real Reconstruction Problem
Intensity measured
Rays transmitted through multiple "pixels"
Find individual "pixel" values from transmission data
[Figure: a grid of unknown pixel values ("?"), 512 values by 512 values, with measured values such as 534, 417, 364, 555, 501, 355, 255, 712, 199; 100's of diagonals at 100's of angles.]

13 Medical Imaging Applications
Image reconstruction for MRI/CT/PET scans
Large amounts of Vector/Thread-level parallelism
FP-intensive kernels
–Often requiring math library functions
Data-intensive (~5:1 compute:mem ratio)

Benchmark | Inner-loop %Scalar/Vector | Outer-loop TLP | Compute:Mem ratio
Segmentation | Fully vectorizable | Do-all | 4:1
Laplacian Filtering | Fully vectorizable | Do-all | 3:1
Gaussian Convolution | Fully vectorizable with predicates | Do-all | 6:1
MRI FH Vector | Fully vectorizable | Do-all | 6:1
MRI Q Vector | Fully vectorizable | Do-all | 5.5:1
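
The "~5:1" figure appears to be the average of the per-benchmark ratios in the table. The following sketch checks that reading (an assumption about how the number was derived) and compares it with the ~26:1 ratio quoted for the GPU on slide 4.

```python
# Compute:mem ratios from the benchmark table on this slide.
ratios = {
    "Segmentation": 4.0,
    "Laplacian Filtering": 3.0,
    "Gaussian Convolution": 6.0,
    "MRI FH Vector": 6.0,
    "MRI Q Vector": 5.5,
}
gpu_ratio = 26.0   # ~26:1 compute:mem ratio quoted on slide 4

mean_ratio = sum(ratios.values()) / len(ratios)
print(f"mean benchmark compute:mem ratio = {mean_ratio:.1f}:1")
print(f"GPU ratio is about {gpu_ratio / mean_ratio:.1f}x higher")
```

On this reading the kernels perform far fewer operations per fetched word than the GPU needs to stay compute-bound, so they are limited by bandwidth on such hardware.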

14 Current Concerns: Portability/Power
Currently, most scans require moving patient to imaging room
–Consumes time
–Stress on patient
Studies show benefits of portable, bed-side scanners:
–86% increase in patients suitable for post-stroke thrombolytic therapy [Weinreb et al, RSNA]
–80-100% drop in scan-related complications [Gunnarsson et al, J. of Neurosurgery]
New X-Ray emitters push for mAs of current use

15 Current Concerns: Performance
High-accuracy CT algorithms take too long
–Iterative forward/backward projection
–~Hours on modern CT scanners instead of minutes
Interventional radiology
–Scans currently take minutes, but should take seconds
CT-Fluoroscopy
–Several scans done in succession

16 Flexibility
Software algorithms change over time
NRE
Time-to-market

17 PUMA
Tiled architecture
Bandwidth-matched for improved efficiency
Each tile is a "Programmable Loop Accelerator"
[Figure labels: Extern. Interface, CPU, Mem, Disk, …]

18 Programmable Loop Accelerator
Generalize accelerator without losing efficiency
[Figure: design points on Efficiency, Performance vs. Flexibility axes: Loop Accelerators, ASICs; FPGAs; DSPs; Domain-specific Accelerators, GPGPUs; General Purpose Processors. Programmable Loop Accelerators are placed in the region previously marked "???".]

19 Designing Loop Accelerators
[Figure: a loop in C code is mapped to hardware: FUs (+, &, *, <<, MEM) with local memories, BR, CRF, and point-to-point connections.]

20 Loop Accelerator Architecture
Hardware realization of modulo scheduled loop
Parameterized hardware:
–FUs
–Shift Register Files
–Static Control
–Point-to-point Interconnect
[Figure: FUs (+, &, MEM) with local memory, BR, CRF, point-to-point connections, and an FSM issuing control signals.]

21 Programmable Loop-Accelerator Architecture
[Figure: FUs (+/-, &/|, MEM) with local memory, BR, CRF, RR, a literals register file, a ring interconnect, and a control memory issuing control signals; the LA's +, &, SRF, and FSM are shown for comparison.]

Aspect | LA | PLA
Functionality | Custom FU set | Generalized FUs + MOVs
Connectivity | Point-to-point | Ring + Port-swapping
Storage | Limited size, no addr. | Rotating Reg. Files
Control | Hardwired Control | Lit. Reg. File + Control Mem

22 MRI.FH PLA
~0.6 mm^2 per tile
38 FUs
128 32-bit registers
Inter-FU BW 1 TB/sec

FU Type | #
FP-ADDSUB | 6
FP-MPY | 9
I-ADDSUB | 8
MEM | 9
I-MPY | 1
Other | 5

23 Performance on MRI.FH PLA
[Chart legend: II preserved, II doubled, Unschedulable]
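
One reason a loop's initiation interval (II) can grow when it runs on a PLA built for a different loop is a mismatch between its operation mix and the available function units. The sketch below computes only the standard resource-constrained lower bound on II from modulo scheduling, using the FU counts listed for the MRI.FH PLA. The two operation mixes are hypothetical, and a real scheduler would also have to respect recurrences, registers, and interconnect, which this ignores.

```python
from math import ceil

# FU counts listed for the MRI.FH PLA tile.
PLA_UNITS = {"FP-ADDSUB": 6, "FP-MPY": 9, "I-ADDSUB": 8,
             "MEM": 9, "I-MPY": 1, "Other": 5}

def res_mii(op_counts, units=PLA_UNITS):
    """Resource-constrained lower bound on II, or None if an FU type is missing."""
    bound = 1
    for fu_type, ops in op_counts.items():
        if ops == 0:
            continue
        available = units.get(fu_type, 0)
        if available == 0:
            return None                      # no unit can execute this operation
        bound = max(bound, ceil(ops / available))
    return bound

# Hypothetical per-iteration operation mixes, for illustration only.
matched_loop = {"FP-ADDSUB": 6, "FP-MPY": 9, "I-ADDSUB": 8, "MEM": 9}
other_loop = {"FP-ADDSUB": 4, "FP-MPY": 5, "I-MPY": 2, "MEM": 6}

print(res_mii(matched_loop))   # 1
print(res_mii(other_loop))     # 2: two integer multiplies share one I-MPY unit
```

In the second mix a single scarce unit doubles the bound even though every other FU type has spare capacity.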

24 Efficiency on MRI.FH PLA

25 PUMA System Design
5 systems designed around 5 benchmarks
Each composed of identical tiles
Assume same B/W as GTX280 (142 GB/s)
# Tiles based on B/W requirements of benchmark
[Figure labels: Extern. Interface, CPU, Mem, Disk, …]
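
The tile-count rule on this slide can be sketched as a bandwidth budget. The slides do not give per-tile bandwidth demand, so the 8 GB/s value below is purely hypothetical, and rounding down (the largest tile count whose aggregate demand fits within 142 GB/s) is an assumed interpretation.

```python
from math import floor

SYSTEM_BW_GB_S = 142.0   # external bandwidth assumed on the slide (same as GTX280)

def tiles_for_bandwidth(system_bw_gb_s, tile_bw_gb_s):
    """Largest number of identical tiles whose combined demand fits the budget."""
    if tile_bw_gb_s <= 0:
        raise ValueError("per-tile bandwidth must be positive")
    return floor(system_bw_gb_s / tile_bw_gb_s)

# Hypothetical per-tile demand for one benchmark, in GB/s.
hypothetical_tile_bw = 8.0
print(tiles_for_bandwidth(SYSTEM_BW_GB_S, hypothetical_tile_bw))   # 17
```

Under this rule a benchmark with a lower compute:mem ratio needs more bandwidth per tile and therefore gets fewer tiles.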

26 System Performance
[Chart labels: 4 W, 3 W, 2.8 W, 2.3 W, 2.7 W]

27 Performance vs. GPGPU
63% performance of GTX 295
2X performance of GTS 250

28 Efficiency vs. GPGPU
[Chart annotations: 22X, 54X]

29 Conclusions
Power-efficient accelerator for medical imaging
ASIC-like efficiency with programmability
63-201% of GPU performance
22-54X GPU Performance/Power efficiency

30 Thank you!! Questions?

