1
Parallel Beam Back Projection: Implementation
Srdjan Coric, Miriam Leeser, Eric Miller
2
Outline
- Annapolis Wildstar
- “Simple Architecture”
  - algorithm
  - datapath
- Performance Results
- Parallelism extraction
- “Advanced Architecture 4x”
- Implementation issues
- Future directions
4
Data Flow
- Sinogram data address generation
- Sinogram data retrieval
- Linear interpolation
- Data accumulation (read, write)
- Sinogram data prefetch
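The stages listed on this slide can be sketched in software as follows. This is a minimal floating-point illustration, not the hardware design from the talk: the function name `back_project`, the image-centred geometry, and the handling of out-of-range samples are all assumptions, and the sinogram prefetch stage (a hardware memory optimization) is not modelled.

```python
import math

def back_project(sinogram, angles, size):
    """Accumulate a size x size image from parallel-beam projections.

    sinogram[p][i] is sample i of projection p, angles[p] is in radians.
    """
    n_samples = len(sinogram[0])
    centre = (size - 1) / 2.0           # image centre (assumed geometry)
    det_centre = (n_samples - 1) / 2.0  # detector centre (assumed geometry)
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = centre - row
            for col in range(size):
                x = col - centre
                # Sinogram data address generation: project the pixel
                # onto the detector axis.
                t = x * c + y * s + det_centre
                idx = math.floor(t)
                frac = t - idx
                if 0 <= idx < n_samples - 1:
                    # Sinogram data retrieval: two neighbouring samples.
                    a, b = proj[idx], proj[idx + 1]
                    # Linear interpolation between the two samples.
                    value = a + frac * (b - a)
                    # Data accumulation: read, add, write back.
                    image[row][col] += value
    return image
```

With a sinogram that is constant at 1.0 and a detector wider than the image diagonal, every pixel ends up equal to the number of projections, which makes a convenient sanity check.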
5
Error sources:
- Interpolation factor error
- Corner starting position
- LUT1 starting position
- LUT1 quantization error
- LUT2 quantization error
- LUT3 quantization error
- Bit reduction error
[Figure: critical error-accumulation path through LUT1, LUT2 and LUT3. Numeric labels as extracted: LUT2: LUT3: 15 . 2 LUT1: 10 5 1]
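To give a feel for one of these error sources, the sketch below interpolates between two integer sinogram samples using a factor truncated to 3 bits (the interpolation factor width given in the performance parameters). The truncating scheme and the name `interp_fixed` are assumptions made for illustration; the slides do not spell out the hardware arithmetic or the LUT contents.

```python
def interp_fixed(a, b, frac, frac_bits=3):
    """Interpolate between integers a and b with a truncated factor.

    frac is the ideal factor in [0, 1); only frac_bits bits of it are
    kept, as a hypothetical model of a narrow interpolation factor.
    """
    scale = 1 << frac_bits
    k = int(frac * scale)                   # truncated factor, 0..scale-1
    # Integer-only arithmetic; the final division drops the low bits.
    return (a * scale + (b - a) * k) // scale


def interp_exact(a, b, frac):
    """Floating-point reference for comparison."""
    return a + frac * (b - a)


if __name__ == "__main__":
    a, b, frac = 0, 511, 0.99               # full 9-bit step, worst-case-like
    err = interp_exact(a, b, frac) - interp_fixed(a, b, frac)
    print("fixed:", interp_fixed(a, b, frac), "error:", round(err, 2))
```

Under this model the error from truncating the factor stays below |b - a| / 8, plus at most one unit from the final integer division.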
6
“Simple Architecture” Datapath
7
Performance Results: Software vs. FPGA Hardware
Software - Floating point MHz Pentium: ~240 s
Software - Floating point - 1 GHz Dual Pentium: ~94 s
Software - Fixed point MHz Pentium: ~50 s
Software - Fixed point - 1 GHz Dual Pentium: ~28 s
Hardware - 50 MHz: ~5.4 s
Parameters:
- 1024 projections
- 1024 samples per projection
- 512*512 pixels image
- 9-bit sinogram data
- 3-bit interpolation factor
8
[Figure: original image (left), hardware output image (right). Zoom: ~200%]
Grayscale range < Pixel value range (heart features in focus)
9
[Figure: original image (left), hardware output image (right). Zoom: ~200%]
Grayscale range < Pixel value range (lung features in focus)
10
Original image - Hardware output image
11
Parallelism Issues
Case 1: No parallelism extracted
Case 2: Pixel level parallelism extracted
Case 3: Projection level parallelism extracted
[Figure: processing volumes V1, V2, V3 with axes Projections, Image columns, Image rows]
T ~ k1*V1, T ~ k1*V2, T ~ k2*V3
k1 < k2, V2 = V3 = V1/4, T = Execution time
Memory bandwidth requirements at 50 MHz (for data accumulation):
Case 1: 0.4 GB/s
Case 2: 1.6 GB/s
Case 3: 0.4 GB/s
Memory bandwidth limit: 1.2 GB/s
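The bandwidth figures on this slide can be reproduced with a short calculation. The sketch assumes each accumulation is one read plus one write of a 32-bit word per clock cycle; that word width is a guess chosen because it matches the 0.4 GB/s figure, and it is not stated on the slide. The 4-way factor for Case 2 follows from V2 = V1/4.

```python
CLOCK_HZ = 50e6            # from the slide: 50 MHz
BYTES_PER_WORD = 4         # assumption: 32-bit accumulator word
ACCESSES_PER_UPDATE = 2    # one read and one write per accumulation
LIMIT_GB_S = 1.2           # from the slide: memory bandwidth limit


def accumulation_bandwidth(pixels_per_cycle):
    """Accumulation bandwidth in GB/s for a given number of pixels
    updated in image memory per clock cycle."""
    bytes_per_cycle = BYTES_PER_WORD * ACCESSES_PER_UPDATE * pixels_per_cycle
    return CLOCK_HZ * bytes_per_cycle / 1e9


# Pixels written to image memory per cycle in each case. Case 3 handles
# several projections at once but still updates one pixel per cycle.
cases = {"Case 1": 1, "Case 2": 4, "Case 3": 1}

if __name__ == "__main__":
    for name, pixels in cases.items():
        need = accumulation_bandwidth(pixels)
        verdict = "fits" if need <= LIMIT_GB_S else "exceeds limit"
        print(f"{name}: {need:.1f} GB/s ({verdict})")
```

Under these assumptions only pixel level parallelism exceeds the 1.2 GB/s limit, which is consistent with the slide's figures for the three cases.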
12
Advanced Architecture - Data Path
[Figure labels: projection parallelism extracted; Simple Architecture]
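One way to see why projection level parallelism does not raise the accumulation bandwidth is to count image-memory accesses. In the sketch below, the contributions of a group of projections to a pixel are summed first and then accumulated with a single read and write. The function names, the group size parameter, and the access-counting model are illustrative assumptions, not details of the actual datapath.

```python
def accumulate_serial(contribs):
    """contribs[p][i] is the interpolated value of projection p at pixel i.

    Every projection triggers its own read-modify-write per pixel.
    """
    n_pixels = len(contribs[0])
    image = [0] * n_pixels
    accesses = 0
    for proj in contribs:
        for i in range(n_pixels):
            image[i] = image[i] + proj[i]   # one read + one write
            accesses += 2
    return image, accesses


def accumulate_grouped(contribs, group=4):
    """Sum `group` projections per pixel before touching image memory."""
    n_pixels = len(contribs[0])
    image = [0] * n_pixels
    accesses = 0
    for start in range(0, len(contribs), group):
        block = contribs[start:start + group]
        for i in range(n_pixels):
            partial = sum(proj[i] for proj in block)  # stays on chip
            image[i] = image[i] + partial             # one read + one write
            accesses += 2
    return image, accesses
```

Both functions produce the same image; with a group of 4 the grouped version makes a quarter of the memory accesses for the same number of projections.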
13
Performance Results: Software vs. FPGA Hardware
Software - Floating point MHz Pentium: ~240 s
Software - Floating point - 1 GHz Dual Pentium: ~94 s
Software - Fixed point MHz Pentium: ~50 s
Software - Fixed point - 1 GHz Dual Pentium: ~28 s
Hardware - 50 MHz: ~5.4 s
Hardware (Advanced Architecture) - 50 MHz: ~1.3 s
Parameters:
- 1024 projections
- 1024 samples per projection
- 512*512 pixels image
- 9-bit sinogram data
- 3-bit interpolation factor
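The speedups implied by these timings can be worked out directly. The snippet below uses the approximate run times from the slide; the dictionary keys are shorthand labels introduced here, and since the timings are rounded the ratios are only indicative.

```python
# Approximate run times in seconds, as listed on the slide.
runtimes_s = {
    "sw_float_pentium": 240.0,
    "sw_float_dual_1ghz": 94.0,
    "sw_fixed_pentium": 50.0,
    "sw_fixed_dual_1ghz": 28.0,
    "hw_simple_50mhz": 5.4,
    "hw_advanced_50mhz": 1.3,
}


def speedup(baseline, target):
    """How many times faster `target` is than `baseline`."""
    return runtimes_s[baseline] / runtimes_s[target]


if __name__ == "__main__":
    for hw in ("hw_simple_50mhz", "hw_advanced_50mhz"):
        for sw in ("sw_float_pentium", "sw_fixed_dual_1ghz"):
            print(f"{hw} vs {sw}: {speedup(sw, hw):.1f}x")
    ratio = speedup("hw_simple_50mhz", "hw_advanced_50mhz")
    print(f"advanced vs simple: {ratio:.1f}x")
```

The advanced architecture comes out roughly 4 times faster than the simple one, in line with its 4-way parallel design.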
19
Implementation Issues
- fanout
- prj_num(3) fanout = 1565 !
- routing delay = ns (~39.99%)
20
Implementation Issues
- fanout
- odd_2_A_4[4] fanout = 144 !
21
Memory Bridges Stuff
3 architectures implemented:
A. “Simple Architecture” = non-parallel (on slide 6)
B. “Advanced Architecture” = 4-way parallel (slide 12)
C. “Bridge Free Advanced Arch” = as B but contains no memory bridges (all design buffers in BlockRAMs)
Memory bridges from the PCI bus to the memory banks are required for Host-Memory communication. The bridges are a separate design that is downloaded before (after) design C is downloaded so that input data can be stored to (output data read from) memories on the WildStar board.
Virtex1000 resource utilization:
A. 11% logic, 90% BlockRAMs (with bridges)
B. 39% logic, 100% BlockRAMs
C. 21% logic, 100% BlockRAMs
22
Floorplan of the “Bridge Free Advanced Architecture” (design C on the previous slide)
23
Future Directions
- Graduate