
1 Backprojection Project Update January 2002
Haiqian Yu, Miriam Leeser, Srdjan Coric

2 Outline
Review of the backprojection algorithm
Hardware considerations
WildStar implementation and some results
Comparison of WildStar and FireBird
Current work
Issues and some proposals
Future work

3 Backprojection
Square pixels assumed
Incremental algorithm (sketched below)
Uses lookup tables
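The incremental formulation avoids recomputing x*cos(theta) + y*sin(theta) for every pixel: with square pixels, stepping one pixel along a row adds cos(theta) to the detector coordinate and stepping one row adds sin(theta), with the per-angle values taken from lookup tables. Below is a minimal floating-point software sketch of this loop (the hardware version is fixed point); names such as backproject, cos_lut, and sin_lut are illustrative, not from the project code.

```c
/* Minimal software sketch of incremental parallel-beam backprojection
 * with linear interpolation, assuming square pixels. Names are
 * illustrative, not taken from the project code. */
#define N      512          /* image is N x N pixels       */
#define NPROJ  1024         /* number of projection angles */
#define NSAMP  1024         /* samples per projection      */

void backproject(const float sino[NPROJ][NSAMP],
                 const float cos_lut[NPROJ],   /* LUT: cos(theta_p) */
                 const float sin_lut[NPROJ],   /* LUT: sin(theta_p) */
                 float image[N][N])
{
    for (int p = 0; p < NPROJ; p++) {
        /* Detector coordinate of pixel (0,0); one pixel step in x adds
         * cos(theta), one step in y adds sin(theta), so the position is
         * built incrementally instead of being recomputed per pixel. */
        float t_row = (-(N - 1) / 2.0f) * (cos_lut[p] + sin_lut[p])
                      + NSAMP / 2.0f;
        for (int y = 0; y < N; y++) {
            float t = t_row;
            for (int x = 0; x < N; x++) {
                int   i    = (int)t;          /* integer sample index */
                float frac = t - (float)i;    /* interpolation factor */
                if (i >= 0 && i + 1 < NSAMP)  /* linear interpolation */
                    image[y][x] += (1.0f - frac) * sino[p][i]
                                 + frac * sino[p][i + 1];
                t += cos_lut[p];              /* next pixel in the row */
            }
            t_row += sin_lut[p];              /* next image row */
        }
    }
}
```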

4 Data Flow Block Diagram
Sinogram data address generation
Sinogram data retrieval (read)
Linear interpolation
Data accumulation (write)
Sinogram data prefetch

5 Hardware Implementation Data Flow

6 Parameters selected in our hardware implementation
Fixed-point data is used (see the sketch below)
Sinogram quantization: 9 bits
Interpolation factor: 3 bits
Restored backprojection data width: 25 bits
Lookup table widths: table 1: 15 bits, table 2: 16 bits, table 3: 17 bits
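As a rough illustration of how these widths fit together, here is a fixed-point sketch of one interpolate-and-accumulate step. The 9-bit samples, 3-bit interpolation factor, and 25-bit accumulated result follow the widths on this slide; the particular scaling and the function and variable names are assumptions made for the example, not the project's actual datapath.

```c
/* Fixed-point sketch of one interpolate-and-accumulate step, using the
 * widths on this slide: 9-bit sinogram samples, 3-bit interpolation
 * factor, 25-bit accumulator. Names and scaling are illustrative. */
#include <stdint.h>

#define SINO_BITS    9                   /* quantized sample width      */
#define INTERP_BITS  3                   /* fractional bits             */
#define ACC_MASK     ((1u << 25) - 1u)   /* keep accumulator to 25 bits */

/* s0, s1: adjacent 9-bit sinogram samples; frac: 3-bit fraction (0..7) */
static uint32_t accumulate(uint32_t acc, uint16_t s0, uint16_t s1,
                           uint8_t frac)
{
    /* Linear interpolation s0 + (frac/8) * (s1 - s0), kept in fixed
     * point by scaling s0 up by 2^3 before adding the weighted
     * difference. */
    int32_t interp = ((int32_t)s0 << INTERP_BITS)
                   + (int32_t)frac * ((int32_t)s1 - (int32_t)s0);
    return (acc + (uint32_t)interp) & ACC_MASK;   /* 25-bit result */
}
```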

7 Simple Architecture - One-Projection Processing

8 Advanced Architecture - Four-Projection Parallel Processing

9 Some Results

10 Performance Results on WildStar: Software vs. FPGA Hardware
Software - Floating point - MHz Pentium: ~ 240 s
Software - Floating point - 1 GHz dual Pentium: ~ 94 s
Software - Fixed point - MHz Pentium: ~ 50 s
Software - Fixed point - 1 GHz dual Pentium: ~ 28 s
Hardware - 50 MHz: ~ 5.4 s
Hardware (advanced architecture) - 50 MHz: ~ 1.3 s
Parameters: 1024 projections, 1024 samples per projection, 512*512 pixel image, 9-bit sinogram data, 3-bit interpolation factor

11 Board Parameters Compared

12 Logical Block Diagram of WildStar

13 Backprojection on WildStar
PE1 is the only processing element used
Not all of the memory is used to store data:
Left and Right PE1 Mezzanine Mem0 store the sinogram data, the input to backprojection
Left and Right PE1 local memory banks store the results
The bottleneck is PE1's block RAM (on-chip memory) size: the XCV1000 has 32 block RAMs of 4,096 bits each, totaling 131,072 bits
We used 73,728 bits in our implementation, the maximum we can use

14 Logical Block Diagram of FireBird

15 Backprojection on FireBird
The XCV2000E has 160 block RAMs totaling 655,360 bits, 5 times the capacity of the XCV1000
FireBird has 5 on-board memory banks: 4 are 64 bits wide, one is 32 bits wide
The memory interface differs from WildStar's

16 Current Work
FireBird configuration:
Parameter settings for simulation scripts
Synthesis settings
C environment settings
Getting familiar with the FireBird memory interface
Implementing the simple architecture of parallel-beam backprojection

17 Issues Because of increased CLBs on Virtex2000E: we can increase parallelism in backprojection processing by a factor of 5 or 20 projection parallelism. On chip block RAM is no longer the bottleneck Memory bandwidth to off-chip memory is the new bottleneck: To process 20 projection at one time, we need to load 40 projections (for interpolation), or 2*4*5*9=360 bits in one clock cycle Firebird has 4*64-bit memories, we need 6 for 20 projections We should be able to achieve 12 projections in parallel

18 Parallelism Improvement
Data dependency for backprojection processing
Current parallelism used on WildStar
Desired parallelism to implement on FireBird
(Diagram axes: projections, image rows, image columns)

19 Example of Using Four Memory Banks to Increase Speed

20 More Memory Issues
If we use the four 64-bit memory banks to store the sinogram data, only one 32-bit memory bank is left for storing the result
Since the j-th step's results depend on the (j-1)-th results, we have to use a read-modify-write scheme to store the final data (sketched below), which decreases the overall speed
Possible solutions:
Process fewer projections in parallel
Use more memory to store results
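A minimal software sketch of the read-modify-write accumulation described above: each pixel's partial sum must be read from the single result bank, updated with the current projection's contribution, and written back. The names result_mem, contrib, and N are illustrative, not the board's actual memory map.

```c
/* Sketch of read-modify-write accumulation into a single result bank.
 * In hardware this read/add/write sequence serializes accesses to the
 * one 32-bit bank, which is the speed penalty discussed on this slide. */
#include <stdint.h>

#define N 512   /* image is N x N pixels */

/* Fold one projection's contribution into the running result memory. */
void accumulate_projection(uint32_t result_mem[N * N],
                           const uint32_t contrib[N * N])
{
    for (int i = 0; i < N * N; i++) {
        uint32_t prev = result_mem[i];      /* read previous partial sum */
        result_mem[i] = prev + contrib[i];  /* modify and write back     */
    }
}
```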

21 Alternatives An alternative is to use 2*512 cycles to double the number of projection processed. However, we need to make sure processing all the data needs more than 1024 clock cycle. That will introduce extra delay for the first time processing, for the consequent processing, data is pre-fetched while processing the current projection. We can also modify the hardware structure to see if there are some other ways to explore parallelism.

22 Future Work
Implement parallel backprojection on FireBird:
With bridge to host
Without bridge to host
Investigate parallelism to optimize performance

