Parallel Beam Back Projection: Implementation


Parallel Beam Back Projection: Implementation
Srdjan Coric, Miriam Leeser, Eric Miller

Outline
Annapolis Wildstar
“Simple Architecture”: algorithm, datapath
Performance Results
Parallelism extraction
“Advanced Architecture 4x”
Implementation issues
Future directions

Data Flow
Sinogram data address generation
Sinogram data retrieval
Linear interpolation
Data accumulation (read, write)
Sinogram data prefetch
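The data flow stages above can be illustrated with a minimal software sketch of parallel beam back projection. This is not the authors' hardware design: it uses floating point rather than the fixed-point lookup tables of the FPGA datapath, and the function name `back_project`, the centred geometry, and the evenly spaced angles over 180 degrees are all assumptions made for illustration.

```python
import math


def back_project(sinogram, size):
    """Accumulate every projection of `sinogram` into a size x size image.

    sinogram[p][s] is sample s of projection p. Projection angles are
    assumed (hypothetically) to be evenly spaced over 180 degrees.
    """
    num_proj = len(sinogram)
    num_samples = len(sinogram[0])
    image = [[0.0] * size for _ in range(size)]
    centre_pix = (size - 1) / 2.0
    centre_det = (num_samples - 1) / 2.0

    for p, projection in enumerate(sinogram):
        theta = math.pi * p / num_proj
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = row - centre_pix
            for col in range(size):
                x = col - centre_pix
                # Sinogram data address generation: detector coordinate
                # that this pixel projects onto at angle theta.
                t = x * cos_t + y * sin_t + centre_det
                idx = math.floor(t)
                frac = t - idx  # interpolation factor
                if 0 <= idx < num_samples - 1:
                    # Sinogram data retrieval and linear interpolation
                    # between the two neighbouring samples.
                    value = ((1.0 - frac) * projection[idx]
                             + frac * projection[idx + 1])
                    # Data accumulation: read-modify-write of the pixel.
                    image[row][col] += value
    return image


# A constant sinogram back-projects to a constant image.
flat_sinogram = [[1.0] * 16 for _ in range(8)]
flat_image = back_project(flat_sinogram, 4)
```

In the sketch, each inner-loop iteration performs one address computation, two sample reads, one interpolation, and one accumulator read and write, which is the per-pixel work the pipeline stages divide among themselves.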

[Figure: error sources in the address generation path. Labels: corner starting position, LUT1 starting position, critical error-accumulation path, LUT1 quantization error, LUT2 quantization error, LUT3 quantization error, bit reduction error, interpolation factor error. Bit-width annotations as printed: LUT2: LUT3: 15 . 2 LUT1: 10 5 1]

“Simple Architecture” Datapath

Performance Results: Software vs. FPGA Hardware
Software - Floating point - 450 MHz Pentium: ~240 s
Software - Floating point - 1 GHz Dual Pentium: ~94 s
Software - Fixed point - 450 MHz Pentium: ~50 s
Software - Fixed point - 1 GHz Dual Pentium: ~28 s
Hardware - 50 MHz: ~5.4 s
Parameters:
1024 projections
1024 samples per projection
512*512 pixels image
9-bit sinogram data
3-bit interpolation factor
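The ~5.4 s hardware figure is consistent with a simple cycle count, assuming (this is not stated on the slide) that the pipeline completes one pixel update per clock cycle. A short sketch of that arithmetic, with illustrative variable names:

```python
# Parameters taken from the slide.
projections = 1024
image_rows = 512
image_cols = 512
clock_hz = 50e6

# Assumption: one pixel/projection update retires per clock cycle.
updates = projections * image_rows * image_cols
estimated_seconds = updates / clock_hz

print(f"{updates} updates -> {estimated_seconds:.2f} s at 50 MHz")
```

Under that assumption the estimate comes to about 5.37 s, close to the measured ~5.4 s.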

[Figure: original image and hardware output image. Zoom: ~200%. Grayscale range < pixel value range (heart features in focus).]

[Figure: original image and hardware output image. Zoom: ~200%. Grayscale range < pixel value range (lung features in focus).]

Original image - Hardware output image

Parallelism Issues
Case 1: No parallelism extracted: T ~ k1 * V1
Case 2: Pixel level parallelism extracted: T ~ k1 * V2
Case 3: Projection level parallelism extracted: T ~ k2 * V3
k1 < k2, V2 = V3 = V1/4, T = execution time
[Figure: volumes V1, V2, V3 drawn on axes Projections, Image columns, Image rows]
Memory bandwidth requirements at 50 MHz (for data accumulation):
Case 1: 0.4 GB/s
Case 2: 1.6 GB/s
Case 3: 0.4 GB/s
Memory bandwidth limit: 1.2 GB/s
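The bandwidth figures above can be reproduced with a short calculation. The sketch assumes, hypothetically, that each pixel update reads and writes one 4-byte accumulator word. The slide does not state the word width, but this assumption matches the 0.4 GB/s baseline.

```python
clock_hz = 50e6
bytes_per_word = 4          # assumed accumulator width (not from the slide)
accesses_per_update = 2     # one read plus one write of the accumulator
limit_gb_s = 1.2            # memory bandwidth limit from the slide

# Pixel updates that reach accumulator memory on every clock cycle.
updates_per_cycle = {
    "Case 1: no parallelism": 1,
    "Case 2: pixel level (4 pixels)": 4,
    # Four projections are summed first, so still one update per cycle.
    "Case 3: projection level (4 projections)": 1,
}

bandwidth_gb_s = {
    case: n * accesses_per_update * bytes_per_word * clock_hz / 1e9
    for case, n in updates_per_cycle.items()
}

for case, gb_s in bandwidth_gb_s.items():
    verdict = "within" if gb_s <= limit_gb_s else "exceeds"
    print(f"{case}: {gb_s:.1f} GB/s ({verdict} the {limit_gb_s} GB/s limit)")
```

Only the pixel level case exceeds the limit, which is why projection level parallelism is the one carried forward despite its larger constant k2.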

Advanced Architecture - Data Path (projection parallelism extracted)
[Figure label: Simple Architecture]
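Projection level parallelism can be sketched in software as summing the contributions of several projections before touching the accumulator. The example below is an illustration only, not the authors' datapath: the function `accumulate`, the `lanes` parameter, and the toy contribution values are hypothetical.

```python
def accumulate(contributions, lanes):
    """Accumulate per-projection pixel contributions, `lanes` at a time.

    contributions[p][i] is the value projection p adds to pixel i.
    Returns the final image and the number of accumulator memory
    accesses (one read plus one write per pixel update).
    """
    num_pixels = len(contributions[0])
    image = [0] * num_pixels
    accesses = 0
    for start in range(0, len(contributions), lanes):
        group = contributions[start:start + lanes]
        for i in range(num_pixels):
            # The parallel lanes are combined before memory is touched.
            partial = sum(projection[i] for projection in group)
            image[i] += partial
            accesses += 2
    return image, accesses


# Toy data: 8 projections contributing to 3 pixels.
toy = [[p + i for i in range(3)] for p in range(8)]
simple_image, simple_accesses = accumulate(toy, lanes=1)
advanced_image, advanced_accesses = accumulate(toy, lanes=4)
```

Both calls produce the same image, but the 4-lane version makes a quarter of the accumulator accesses, so the memory bandwidth per clock cycle stays at the non-parallel level.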

Performance Results: Software vs. FPGA Hardware
Software - Floating point - 450 MHz Pentium: ~240 s
Software - Floating point - 1 GHz Dual Pentium: ~94 s
Software - Fixed point - 450 MHz Pentium: ~50 s
Software - Fixed point - 1 GHz Dual Pentium: ~28 s
Hardware - 50 MHz: ~5.4 s
Hardware (Advanced Architecture) - 50 MHz: ~1.3 s
Parameters:
1024 projections
1024 samples per projection
512*512 pixels image
9-bit sinogram data
3-bit interpolation factor
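The speedups implied by these timings can be computed directly. The sketch below uses only the approximate run times listed above, so the ratios are approximate as well. The dictionary keys are illustrative labels.

```python
software_seconds = {
    "floating point, 450 MHz Pentium": 240.0,
    "floating point, 1 GHz Dual Pentium": 94.0,
    "fixed point, 450 MHz Pentium": 50.0,
    "fixed point, 1 GHz Dual Pentium": 28.0,
}
hardware_seconds = {"simple": 5.4, "advanced": 1.3}

# Speedup of each hardware design over each software run.
speedup = {
    (design, run): sw / hw
    for design, hw in hardware_seconds.items()
    for run, sw in software_seconds.items()
}

for (design, run), ratio in speedup.items():
    print(f"{design} architecture vs {run}: ~{ratio:.1f}x")

# Gain of the 4-way parallel design over the non-parallel one.
advanced_over_simple = hardware_seconds["simple"] / hardware_seconds["advanced"]
print(f"advanced vs simple architecture: ~{advanced_over_simple:.1f}x")
```

The ratio between the two hardware designs comes out slightly above 4, which is in line with 4-way projection parallelism at the same 50 MHz clock given the rounding of the reported times.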

Implementation Issues - fanout
Signal prj_num(3): fanout = 1565!
Routing delay = 7.913 ns (~39.99%)

Implementation Issues - fanout
Signal odd_2_A_4[4]: fanout = 144!

Memory Bridges
Three architectures implemented:
A. “Simple Architecture” = non-parallel (on slide 6)
B. “Advanced Architecture” = 4-way parallel (slide 12)
C. “Bridge Free Advanced Architecture” = as B but contains no memory bridges (all design buffers in BlockRAMs)
Memory bridges from the PCI bus to the memory banks are required for Host-Memory communication. The bridges are a separate design that is downloaded before (after) design C is downloaded, so that input data can be stored to (output data read from) the memories on the WildStar board.
Virtex1000 resource utilization:
11% logic, 90% BlockRAMs (with bridges)
39% logic, 100% BlockRAMs
21% logic, 100% BlockRAMs

Floorplan of the “Bridge Free Advanced Architecture” (design C on the previous slide)

Future Directions Graduate