Fast Hessian Accelerator Hardware Specification V 1.0 Ahmed Al Maashri 10 June 2012 Updated 13 July 2012.



Introduction
The Fast Hessian (FH) accelerator is a hardware core that computes the Hessian determinants in a streaming fashion. The accelerator supports the first 3 octaves specified in the SURF algorithm. The accelerator is written to serve as an "operator" which can be instantiated within an SOP.

Algorithm Background
The fast Hessian is a sub-stage of interest point detection within the SURF algorithm. It is a group of box filters that vary in size and step size, and it operates on an integral image.
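The reason box filters pair well with an integral image can be sketched in a few lines: the sum over any rectangle needs only four lookups, whatever the filter size. The sketch below is a software illustration, not the accelerator's implementation; the function names `integral_image` and `box_sum` and the zero-padded layout are assumptions made for the example.

```python
def integral_image(img):
    """Build an integral image with an extra leading row and column of zeros.

    ii[y][x] holds the sum of all pixels above and to the left of (y, x).
    The zero padding is a convenience of this sketch, not part of the spec.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the height x width box whose top-left pixel is (top, left)."""
    a = ii[top][left]                    # corner above-left of the box
    b = ii[top][left + width]            # corner above-right
    c = ii[top + height][left]           # corner below-left
    d = ii[top + height][left + width]   # corner below-right
    return a - b - c + d


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # sum of 5, 6, 8, 9
```

Because the cost is four reads per box regardless of box size, a 99x99 filter is no more expensive per lookup than a 9x9 one.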

Fast Hessian Operator
[Block diagram: fh_accelerator_top. Signals shown: clk, new_frame, src_valid_in, dst_rdy_in, data_in [31:0], load_config, config [127:0], dst_rdy_out, src_valid_out_0 ... src_valid_out_7, data_out_0 [32:0] ... data_out_7 [32:0].]

Fast Hessian (cont.)

Input Ports         Description
new_frame           Asserted when a new frame is being processed. This signal can be used to perform any necessary initialization.
src_valid_in        Asserted to indicate the presence of a new pixel.
data_in [31:0]      Integral image pixel.
load_config         Asserted to indicate the presence of a new configuration.
config [127:0]      Configuration bus:
                    - image_width [10:0]
                    - image_height [21:11]
                    - Reserved [127:22]
dst_rdy_out         Asserted to indicate that the destination is ready to accept output.

Output Ports        Description
dst_rdy_in          Asserted to indicate that the operator is willing to accept input pixels.
src_valid_out_x     Asserted to indicate the presence of a valid output at the x-th filter, using a 0-based index: 0 is the 9x9 filter, 1 is 15x15, ..., 7 is 99x99.
data_out_x [32:0]   Data bus of the results:
                    - filter_response [31:0]
                    - laplacian [32]
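The bit layout of the two buses can be illustrated with a small host-side sketch. It assumes only the field positions listed above (image_width in config[10:0], image_height in config[21:11], filter_response in data_out_x[31:0], the laplacian flag in bit 32). The helper names `pack_config` and `unpack_data_out` are hypothetical, and the numeric encoding of filter_response is not stated in the specification, so it is returned as raw bits.

```python
WIDTH_BITS = 11    # image_width occupies config[10:0]
HEIGHT_BITS = 11   # image_height occupies config[21:11]


def pack_config(image_width, image_height):
    """Build the 128-bit config word as a Python int; reserved bits stay 0."""
    if not 0 <= image_width < (1 << WIDTH_BITS):
        raise ValueError("image_width does not fit in 11 bits")
    if not 0 <= image_height < (1 << HEIGHT_BITS):
        raise ValueError("image_height does not fit in 11 bits")
    return image_width | (image_height << WIDTH_BITS)


def unpack_data_out(word):
    """Split a 33-bit data_out_x word into (filter_response, laplacian).

    filter_response is returned as the raw 32-bit pattern because the
    specification does not say how the value is encoded.
    """
    filter_response = word & 0xFFFFFFFF
    laplacian = (word >> 32) & 1
    return filter_response, laplacian


cfg = pack_config(640, 480)
print(hex(cfg))
print(unpack_data_out((1 << 32) | 5))
```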

FH subcomponents
[Block diagram: the accelerator consists of a Control block, the i3b buffer, and a bank of box filters f0 ... f7. External signals: new_frame, src_valid_in, dst_rdy_in, data_in [31:0], load_config, config [127:0], dst_rdy_out, src_valid_out_0 ... src_valid_out_7, data_out_0 [32:0] ... data_out_7 [32:0]. Internal signals: i3b_img_width, i3b_img_height, i3b_init, octave_mode [2:0], filters_rdy, filters_valid_in [7:0], filters_complete [7:0], filters_data_in [1023:0]. Each filter receives new_frame and one bit each of filters_valid_in and filters_complete.]

Box Filter
There are 8 filters, each with a different filter size. Each filter computes three values: Dxx, Dyy and Dxy. Each of these components is normalized using a weight ω equal to the inverse of the filter size squared (i.e. ω = 1 / filter_size^2):
    Dxx * ω, Dyy * ω, Dxy * ω
Then, the Hessian response is computed as follows:
    (Dxx * Dyy) - (CF * Dxy^2)
where CF is the compensation factor (i.e. 0.81). Also, the sign of the response (i.e. the laplacian) is computed as follows:
    Dxx + Dyy >= 0 ? 1 : 0
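The arithmetic above can be written as a short reference model. This is a floating-point sketch for checking values by hand; the hardware's number format is not given in the specification, and the function name `hessian_response` is an assumption of this example. It reads the slide as applying the weight ω to all three components before the response is formed.

```python
CF = 0.81  # compensation factor given in the specification


def hessian_response(dxx, dyy, dxy, filter_size):
    """Return (response, laplacian) for one filter position.

    dxx, dyy, dxy are the raw box-filter sums; filter_size is the side
    length of the filter (9 for the 9x9 filter, and so on).
    """
    w = 1.0 / (filter_size * filter_size)   # normalization weight
    dxx_n, dyy_n, dxy_n = dxx * w, dyy * w, dxy * w
    response = dxx_n * dyy_n - CF * dxy_n * dxy_n
    laplacian = 1 if dxx_n + dyy_n >= 0 else 0
    return response, laplacian


print(hessian_response(81, 162, 0, 9))
print(hessian_response(-81, 0, 81, 9))
```

Since ω is positive, the laplacian bit is the same whether it is taken before or after normalization.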

Box Filter (cont.)
The figure below shows a fully pipelined implementation of the filter. The pipeline is frozen if the destination is not ready to accept results.
[Figure: filter pipeline. Inputs A, B, C, D feed the computation of Dxx, Dyy and Dxy, followed by the ω-weighted terms, the products Dxx*Dyy and Dxy^2, the 0.81 scaling, and finally the response and laplacian outputs. Stage latencies are annotated in clock cycles (CC).]
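The freeze behaviour can be modelled cycle by cycle. The sketch below is a generic fixed-latency pipeline, not the filter's actual stage structure: the class name `StallablePipeline`, the `depth` parameter, and the use of `None` for an empty slot are all assumptions made for illustration.

```python
class StallablePipeline:
    """Fixed-latency pipeline whose registers hold when dst_rdy is low."""

    def __init__(self, depth):
        # One slot per pipeline register; None marks an empty slot.
        self.stages = [None] * depth

    def step(self, data_in, dst_rdy):
        """Advance one clock cycle.

        Returns (accepted, data_out). accepted tells whether data_in was
        taken this cycle; data_out is the item leaving the pipeline, or
        None if the last slot was empty or the pipeline is frozen.
        """
        if not dst_rdy:
            return False, None            # frozen: nothing moves
        data_out = self.stages[-1]
        self.stages = [data_in] + self.stages[:-1]
        return True, data_out


pipe = StallablePipeline(depth=3)
for item, rdy in [("a", True), ("b", True), ("c", False), ("c", True), (None, True)]:
    print(item, rdy, pipe.step(item, rdy))
```

In this model a stalled cycle refuses the input as well, which mirrors how back-pressure would have to propagate upstream through dst_rdy_in.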

Controller
Responsible for the following:
- Resetting when a new frame is received
- Loading configurations to the i3b buffer
- Requesting a new batch of pixels for the box integrals. To do this, the controller uses the signal octave_mode as follows:
    3'b000: No request
    3'b001: Octave 1 only
    3'b010: Octaves 1 and 2
    3'b100: All octaves (1, 2, and 3)
- Tracking the progress of the filter's sliding window. Using pre-defined step sizes, the controller determines which octaves are active and which are idle for the current iteration.
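One way to picture how the octave_mode encoding follows from the window position is the sketch below. The step sizes are hypothetical (1, 2 and 4 pixels for octaves 1, 2 and 3); the specification only says they are pre-defined. The sketch also assumes each step is a multiple of the previous one, so that an active octave 3 implies active octaves 1 and 2, which is what the encoding above requires.

```python
NO_REQUEST = 0b000
OCTAVE_1_ONLY = 0b001
OCTAVES_1_AND_2 = 0b010
ALL_OCTAVES = 0b100

# Hypothetical sampling steps per octave; the real values are not in the spec.
DEFAULT_STEPS = {1: 1, 2: 2, 3: 4}


def octave_mode(x, y, steps=DEFAULT_STEPS):
    """Return the octave_mode code for sliding-window position (x, y)."""

    def on_grid(octave):
        s = steps[octave]
        return x % s == 0 and y % s == 0

    if not on_grid(1):
        return NO_REQUEST
    if not on_grid(2):
        return OCTAVE_1_ONLY
    if not on_grid(3):
        return OCTAVES_1_AND_2
    return ALL_OCTAVES


for pos in [(0, 0), (1, 0), (2, 0), (4, 4)]:
    print(pos, format(octave_mode(*pos), "03b"))
```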

i3b
Intelligent Integral Image Buffer (i3b): manages the storage of the integral image. This memory buffer is responsible for the following:
- Managing integral image buffering
- Supplying the Hessian filters with data while managing the position of the filter's sliding window
- Throttling the integral image source in case the box filters are unable to accept input
When i3b_init is asserted, the module registers the image width and height and resets its controller.
Depending on octave_mode, the module presents the desired batches of integral pixels in iterations over multiple cycles. At each step it presents the pixels corresponding to a particular filter, starting from the first filter (i.e. 9x9). The one-hot filters_valid bus indicates the current filter. As soon as all batches of pixels are produced, the done signal is asserted.
Concurrently, i3b advances the sliding window in the x- or y-direction, depending on the current position. After the sliding window has visited all positions, the module resets the buffer and gets ready for the next frame.
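The batch sequencing described above can be sketched as a function that turns an octave_mode code into the series of one-hot filters_valid values. The specification names only filters 0 (9x9), 1 (15x15) and 7 (99x99); the remaining sizes and the assignment of filters to octaves below follow the standard SURF scale layout and are assumptions of this sketch, as is the function name `filters_valid_sequence`.

```python
# Filter index -> filter size; sizes for indices 2..6 are assumed from SURF.
FILTER_SIZES = [9, 15, 21, 27, 39, 51, 75, 99]

# Assumed filter indices per octave (standard SURF layout, not from the spec).
OCTAVE_FILTERS = {
    1: [0, 1, 2, 3],   # 9, 15, 21, 27
    2: [1, 3, 4, 5],   # 15, 27, 39, 51
    3: [3, 5, 6, 7],   # 27, 51, 75, 99
}

# octave_mode encoding from the controller description.
MODE_OCTAVES = {0b000: [], 0b001: [1], 0b010: [1, 2], 0b100: [1, 2, 3]}


def filters_valid_sequence(octave_mode):
    """One-hot filters_valid value for each step, lowest filter first."""
    active = sorted({f for octave in MODE_OCTAVES[octave_mode]
                     for f in OCTAVE_FILTERS[octave]})
    return [1 << f for f in active]


for mode in (0b001, 0b010, 0b100):
    seq = filters_valid_sequence(mode)
    print(format(mode, "03b"), [format(v, "08b") for v in seq])
```

Under these assumptions the done signal would follow the last element of the returned list, and a filter shared by two octaves is served only once per position.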

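i3b
[Block diagram: i3b consists of a Controller and a buffer with rows r0, r1, r2, ..., k-1 and column indices up to n-1, where n = 1024 and k = 100. Signals shown: new_frame, src_valid_in, dst_rdy_in, data_in [31:0], i3b_img_width, i3b_img_height, i3b_init, octave_mode [2:0], filters_rdy, filters_valid [7:0], filters_complete [7:0], filters_data_in [1023:0].]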