Acceleration of the Retinal Vascular Tracing Algorithm using FPGAs

Presentation transcript:

Acceleration of the Retinal Vascular Tracing Algorithm using FPGAs
Miriam Leeser, Shawn Miller

Smart Camera: Provides the host PC with image data along with image processing results at frame rate with low latency.

[Block diagram labels: Framegrabber, FIREBIRD BOARD, FPGA, Data Packing Design, Direction Filter Design, BlockRAM, Memory Switching Design (x2), Image Data MEMORY 0, Image Data MEMORY 1, Results MEMORY 2, Results MEMORY 3, PCI Bus, Host]

Retinal Vascular Tracing Application

Goal: Detection and enhancement of the vascular structure of a patient’s retina from a live video feed.

The latency and throughput requirements of real-time image processing cannot be met by software on a general-purpose processor.

Timing Issues
- Return results for each pixel at the frame rate of the camera
- Very low latency
  - Surgical laser must be shut off immediately after detecting that it is aimed incorrectly
  - Cannot tolerate a 1 frame delay (33 msec at 30 frames/sec)
- Complex memory management required to achieve minimum latency
- Latency currently on the order of 100 µsec

[Figure: storage of a 5x5 pixel image in Memory 0 and Memory 1]
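The timing constraints above can be put into numbers with a small sketch. The 30 frames/sec rate and the 512x512 image size come from the slides. The 100 microsecond latency is an assumed reading of the latency figure, and the function names are hypothetical, chosen only for illustration.

```python
# Timing budget sketch. Frame rate and image size come from the slides;
# the 100 microsecond latency is an assumed reading of the latency figure.
FRAME_RATE_HZ = 30
IMAGE_WIDTH = 512
IMAGE_HEIGHT = 512
ASSUMED_LATENCY_S = 100e-6


def frame_period_s(frame_rate_hz):
    """Time between consecutive frames, in seconds."""
    return 1.0 / frame_rate_hz


def pixel_rate(width, height, frame_rate_hz):
    """Pixels that must be processed per second to keep up with the camera."""
    return width * height * frame_rate_hz


def latency_fraction_of_frame(latency_s, frame_rate_hz):
    """Latency expressed as a fraction of one frame period."""
    return latency_s * frame_rate_hz


print(f"frame period: {frame_period_s(FRAME_RATE_HZ) * 1e3:.1f} msec")
print(f"pixel rate: {pixel_rate(IMAGE_WIDTH, IMAGE_HEIGHT, FRAME_RATE_HZ)} pixels/sec")
print(f"latency / frame period: "
      f"{latency_fraction_of_frame(ASSUMED_LATENCY_S, FRAME_RATE_HZ):.4f}")
```

Under these assumptions the design has to sustain roughly 7.9 million pixels per second, and the assumed latency is well under one percent of a frame period.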

Implementation

Hardware Acceleration
- Template responses are calculated in hardware in parallel
- All pixels in the image are processed
- Camera connected directly to the board (no host interaction)
- Only the results are sent to the host after they become available, and while new results are being calculated

Very Low Latency
- Surgical laser must be shut off immediately if we detect that it is aimed incorrectly
- Cannot tolerate a one frame delay
- Complex memory management scheme must be introduced

Algorithm

What does the algorithm do?
- Retinal vascular tracing: detection of blood vessels in images of the retina
- The algorithm finds blood vessels and traces out their structure

Where is the algorithm used?
- Processing live video of the patient retina during laser retinal surgery
- Highlighting the vascular structure helps the surgeon avoid damage

Why do we need to accelerate it?
- Current implementation: software on a general-purpose processor
- Images are 512x512 pixels, and need to be processed at frame rate

Acceleration of the Retinal Vascular Tracing Algorithm using FPGAs
Miriam Leeser, Shawn Miller, Badrinath Roysam, Charles Stewart, Ken Fritsche

This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC ).

More Information
In proceedings: Rapid Automated Tracing and Feature Extraction from Retinal Fundus Images Using Direct Exploratory Algorithms, A. Can, H. Shen, J.N. Turner, H.L. Tanenbaum and B. Roysam, IEEE Transactions on Information Technology in Biomedicine, June 99.
On the web:

Original Image: Each pixel is passed through the design unaltered.
Direction: The direction template with the maximum response for each pixel. The direction is represented by a value between 0 and 15.
Response: The maximum response that led to the direction decision for each pixel.
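The Direction and Response outputs described above amount to a maximum search over 16 template responses per pixel. The following is a minimal software sketch of that selection step only. How each template response is computed is not shown, the function name is hypothetical, and breaking ties toward the lowest direction index is an assumption rather than something the slides specify.

```python
NUM_DIRECTIONS = 16  # directions are encoded as values 0..15


def direction_and_response(template_responses):
    """Pick the winning direction for one pixel.

    template_responses: sequence of 16 numbers, one per direction template.
    Returns (direction, response), where direction is the index of the
    largest response. Ties go to the lowest index (an assumption).
    """
    if len(template_responses) != NUM_DIRECTIONS:
        raise ValueError("expected one response per direction template")
    # max() returns the first index that attains the maximum value.
    direction = max(range(NUM_DIRECTIONS), key=lambda d: template_responses[d])
    return direction, template_responses[direction]


# Made-up responses for a single pixel, for illustration only.
example = [0] * NUM_DIRECTIONS
example[5] = 9
print(direction_and_response(example))
```

In hardware the 16 responses would be available in parallel and compared in a tree of comparators; the loop-free `max` call here only mirrors the result, not the structure.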
Results (image panels: Original Image, Direction, Response)

Objective
To accelerate retinal vascular tracing by implementing computation of template responses in reconfigurable hardware.

Reconfigurable Hardware
Firebird reconfigurable computing engine from Annapolis Micro Systems
- 1 Xilinx Virtex-E (XCV2000E) FPGA
- 5 memory banks (4 x 64-bit, 1 x 32-bit)
- 5.4 Gbytes/sec of memory bandwidth
- 66 MHz/64-bit PCI interface to host

[Block diagram labels: Framegrabber (Dillon Eng.), FIREBIRD BOARD, FPGA, Data Packing Design, Direction Filter Design, BlockRAM, Memory Switching Design (x2), Image Data MEMORY 0, Image Data MEMORY 1, Image Data MEMORY 2, Image Data MEMORY 3, PCI Bus, Host]

[Figure: Direction Templates]

Conclusions
A stand-alone camera outputs only image data. Our design outputs not only image data, but directional template responses as well. The cost of the additional image processing is a latency on the order of microseconds. This is a low cost considering that at 30 frames/sec, a new frame of image data is introduced every 33 msec. The application for this project is Retinal Vascular Tracing, but the same method can be applied to any problem that requires real-time image processing.

Memory Management

Problems
- Must continuously write new data from the camera to input memory while reading the data to be processed
- Cannot read and write from one memory on the same clock cycle
- The image is stored row-wise, but must be read column-wise

Solution
- Store the image in “checkerboard” fashion in two memories. Every other pixel is stored in a different memory
- A column of data is read by alternating between the two memories on every clock cycle

[Timing diagram: Input Memory 0, Input Memory 1, Clock; legend: Writing data from camera, Reading data to be processed, Inactive Time]
[Figure: Checkerboard storage of a 5x5 image in Memory 0 and Memory 1]
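The checkerboard scheme can be illustrated with a short software model. This is a sketch under assumptions: the bank for a pixel is taken to be the parity of row plus column (which memory holds pixel (0, 0) is not stated in the slides), each memory is modelled as a dictionary keyed by pixel coordinates rather than by a real hardware address, and all function names are hypothetical.

```python
def memory_bank(row, col):
    """Bank (0 or 1) holding pixel (row, col) in a checkerboard layout."""
    return (row + col) % 2


def store_checkerboard(image):
    """Split a 2D image (list of rows) across two memories."""
    banks = ({}, {})
    for row, pixels in enumerate(image):
        for col, value in enumerate(pixels):
            banks[memory_bank(row, col)][(row, col)] = value
    return banks


def column_read_banks(col, height):
    """Sequence of banks accessed when reading one column top to bottom."""
    return [memory_bank(row, col) for row in range(height)]


def read_column(banks, col, height):
    """Read one column back, alternating between the two memories."""
    return [banks[memory_bank(row, col)][(row, col)] for row in range(height)]


# 5x5 example image with pixel values 0..24, stored row-wise by the camera.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
banks = store_checkerboard(image)
print(column_read_banks(0, 5))
print(read_column(banks, 2, 5))
```

Because neighbouring pixels along both a row and a column fall in different memories, consecutive accesses in either direction alternate between the two banks. A column read therefore never touches the same memory on two successive clock cycles.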