Fast Hessian Accelerator Hardware Specification, V1.0
Ahmed Al Maashri
10 June 2012 (updated 13 July 2012)
Introduction
The Fast Hessian (FH) accelerator is a hardware core that computes Hessian determinants in a streaming fashion. It supports the first three octaves specified in the SURF algorithm. The accelerator is written as an “operator” that can be instantiated within an SOP.
Algorithm Background
The Fast Hessian is a sub-stage of interest point detection in the SURF algorithm. It consists of a group of box filters that vary in size and step size, and it operates on an integral image.
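Box filters are cheap on an integral image because any rectangular sum needs only four corner reads, regardless of the box size. The sketch below is a software reference, not the hardware implementation. It assumes a zero-padded integral image (one extra leading row and column) and a hypothetical corner labelling A = top-left, B = top-right, C = bottom-left, D = bottom-right.

```python
def integral_image(img):
    """Zero-padded integral image: ii[r][c] = sum of img[0..r-1][0..c-1]."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0  # running sum of the current image row
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r0, c0, r1, c1):
    """Sum of img over rows [r0, r1) and columns [c0, c1) from four corner reads."""
    a = ii[r0][c0]  # A: top-left corner (assumed labelling)
    b = ii[r0][c1]  # B: top-right corner
    c = ii[r1][c0]  # C: bottom-left corner
    d = ii[r1][c1]  # D: bottom-right corner
    return d - b - c + a
```

For example, with `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `box_sum(integral_image(img), 1, 1, 3, 3)` adds 5, 6, 8 and 9 using exactly four memory reads.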
Fast Hessian Operator
[Figure: block symbol of fh_accelerator_top. Input-side signals: src_valid_in, dst_rdy_in, data_in [31:0], new_frame, load_config, config [127:0]. Output-side signals: dst_rdy_out, src_valid_out_0 to src_valid_out_7, data_out_0 to data_out_7 [32:0]. Clock: clk.]
Fast Hessian (cont.)
Input ports:
– new_frame: Asserted when a new frame is being processed. This signal can be used to perform any necessary initialization.
– src_valid_in: Asserted to indicate the presence of a new pixel.
– data_in [31:0]: Integral image pixel.
– load_config: Asserted to indicate the presence of a new configuration.
– config [127:0]: Configuration bus: image_width [10:0], image_height [21:11], reserved [127:22].
– dst_rdy_out: Asserted to indicate that the destination is ready to accept output.
Output ports:
– dst_rdy_in: Asserted to indicate that the operator is willing to accept input pixels.
– src_valid_out_x: Asserted to indicate the presence of a valid output at the x-th filter, with a 0-based index: 0 is the 9x9 filter, 1 is 15x15, …, 7 is 99x99.
– data_out_x [32:0]: Result data bus: filter_response [31:0], laplacian [32].
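A host-side driver has to pack the configuration bus and split each result word. The helpers below are a minimal sketch of that bit layout only: `pack_config`, `unpack_config` and `unpack_data_out` are hypothetical names, the reserved bits are assumed to be written as zero, and `filter_response` is returned as raw bits because the number format of the 32-bit response is not stated here.

```python
FIELD_BITS = 11                   # image_width and image_height are 11-bit fields
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x7FF


def pack_config(image_width, image_height):
    """Build config[127:0]: width in [10:0], height in [21:11], reserved bits zero (assumed)."""
    for name, value in (("image_width", image_width), ("image_height", image_height)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name}={value} does not fit in {FIELD_BITS} bits")
    return (image_height << FIELD_BITS) | image_width


def unpack_config(config):
    """Return (image_width, image_height) from a config word."""
    return config & FIELD_MASK, (config >> FIELD_BITS) & FIELD_MASK


def unpack_data_out(word):
    """Split a 33-bit data_out_x word into (filter_response bits, laplacian bit)."""
    filter_response = word & 0xFFFFFFFF  # bits [31:0], raw; number format not assumed
    laplacian = (word >> 32) & 1         # bit [32]
    return filter_response, laplacian
```

For a 640x480 frame, `pack_config(640, 480)` places 640 in the low 11 bits and 480 in bits [21:11].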
FH subcomponents
[Figure: FH subcomponents: a Control block, the i3b buffer and a bank of eight box filters f0 to f7. External ports: new_frame, src_valid_in, data_in [31:0], dst_rdy_in, load_config, config [127:0], dst_rdy_out, src_valid_out_0 to src_valid_out_7, data_out_0 to data_out_7 [32:0]. Signals from Control to i3b: i3b_img_width, i3b_img_height, i3b_init, octave_mode [2:0]. Signals between i3b and the filter bank: filters_rdy, filters_valid_in [7:0], filters_data_in [1023:0], filters_complete [7:0], with one bit of filters_valid_in and filters_complete per filter.]
Box Filter
There are 8 filters, each with a different filter size. Each filter computes three values: Dxx, Dyy and Dxy. Each of these components is normalized by a weight ω equal to the inverse of the filter size squared:
$$D_{xx} \leftarrow \omega D_{xx}, \quad D_{yy} \leftarrow \omega D_{yy}, \quad D_{xy} \leftarrow \omega D_{xy}, \qquad \omega = \frac{1}{\text{filter\_size}^{2}}$$
The Hessian response is then computed from the normalized values as
$$\text{response} = D_{xx} D_{yy} - CF \cdot D_{xy}^{2}$$
where CF is the compensation factor (0.81). In addition, the sign of the Laplacian is computed and reported alongside the response:
$$\text{laplacian} = \begin{cases} 1 & \text{if } D_{xx} + D_{yy} \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
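The normalization and response formulas can be checked against a floating-point reference model. This is a sketch only: `hessian_response` is a hypothetical name, the box sums Dxx, Dyy and Dxy are assumed to be already computed, and the hardware's own arithmetic format is not modelled.

```python
CF = 0.81  # compensation factor from the specification


def hessian_response(dxx, dyy, dxy, filter_size):
    """Return (response, laplacian) for one filter position (reference model)."""
    w = 1.0 / (filter_size * filter_size)  # omega = 1 / filter_size^2
    dxx, dyy, dxy = dxx * w, dyy * w, dxy * w  # normalize all three components
    response = dxx * dyy - CF * dxy * dxy
    laplacian = 1 if dxx + dyy >= 0 else 0  # sign of the trace Dxx + Dyy
    return response, laplacian
```

Because ω is always positive, the laplacian bit is the same whether it is taken before or after normalization; only the response depends on the scaling.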
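Box Filter (cont.)
The figure below shows a fully pipelined implementation of the filter. The pipeline is frozen (stalled) whenever the destination is not ready to accept results.
[Figure: filter datapath. Corner samples A, B, C, D of the box integrals produce Dxx, Dyy and Dxy, each multiplied by ω; multipliers form Dxx*Dyy and Dxy^2; Dxy^2 is scaled by 0.81 and subtracted from Dxx*Dyy to give the response, while Dxx + Dyy is compared against 0 to give the laplacian. Stage latencies of 6 CC, 4 CC and 1 CC (clock cycles) are annotated along the datapath.]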
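The “freeze” behaviour amounts to a global clock enable: when the destination is not ready, every pipeline register holds its value and no new input is accepted. The cycle-level sketch below illustrates that idea under simplifying assumptions: `FrozenPipeline` is a hypothetical name, the whole datapath is collapsed into one function applied on entry, and the operator's ready output is assumed to simply follow the destination's ready.

```python
class FrozenPipeline:
    """Fixed-latency pipeline whose registers all hold while the destination stalls."""

    def __init__(self, depth, stage_fn):
        self.regs = [None] * depth  # None marks a bubble (no valid data in that stage)
        self.stage_fn = stage_fn    # whole datapath, applied on entry for simplicity

    def clock(self, dst_rdy, data_in=None):
        """Advance one cycle; return the result consumed by the destination, or None."""
        if not dst_rdy:
            # Frozen: every register keeps its value. Ready towards the source follows
            # dst_rdy in this model, so the source must not present data this cycle.
            if data_in is not None:
                raise RuntimeError("input presented while the pipeline is frozen")
            return None
        out = self.regs[-1]  # last stage is handed to the destination
        entering = None if data_in is None else self.stage_fn(data_in)
        self.regs = [entering] + self.regs[:-1]  # shift every stage forward by one
        return out
```

With a depth of 3, results appear three cycles after their inputs, and a cycle with `dst_rdy=False` delays every later result by exactly one cycle without losing data.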
Controller
The controller is responsible for:
– Resetting when a new frame is received
– Loading configurations into the i3b buffer
– Requesting a new batch of pixels for the box integrals, using the octave_mode signal as follows:
   3'b000: no request
   3'b001: octave 1 only
   3'b010: octaves 1 and 2
   3'b100: all octaves (1, 2 and 3)
The controller tracks the progress of the filters' sliding window; using predefined step sizes, it determines which octaves are active and which are idle in the current iteration.
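One way to derive octave_mode from the window position is to test the position against a sampling grid per octave. The sketch below assumes hypothetical step sizes of 2, 4 and 8 pixels for octaves 1, 2 and 3 (each a multiple of the previous, as in common SURF implementations); the actual predefined steps are not given here. With nested grids, every position active for a coarser octave is also active for the finer ones, which is why only the four encodings above are needed.

```python
OCTAVE_STEPS = (2, 4, 8)  # hypothetical sampling steps for octaves 1, 2 and 3


def octave_mode(x, y, steps=OCTAVE_STEPS):
    """Return the 3-bit octave_mode code for window position (x, y)."""

    def on_grid(step):
        return x % step == 0 and y % step == 0

    if on_grid(steps[2]):
        return 0b100  # all octaves (1, 2 and 3)
    if on_grid(steps[1]):
        return 0b010  # octaves 1 and 2
    if on_grid(steps[0]):
        return 0b001  # octave 1 only
    return 0b000      # no request
```

For example, position (4, 8) lies on the 4-pixel grid but not the 8-pixel grid, so under these assumed steps it requests octaves 1 and 2.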
i3b
The Intelligent Integral Image Buffer (i3b) manages the storage of the integral image. This memory buffer is responsible for:
– Buffering the integral image
– Supplying the Hessian filters with data while managing the position of the filters' sliding window
– Throttling the integral image source when the box filters are unable to accept input
When i3b_init is asserted, the module registers the image width and height and resets its controller. Depending on octave_mode, the module presents the required batches of integral pixels iteratively over multiple cycles. At each step it presents the pixels for one particular filter, starting from the first filter (9x9); a one-hot filters_valid bus indicates the current filter. As soon as all batches of pixels have been produced, the done signal is asserted. Concurrently, i3b advances the sliding window in the x or y direction, depending on the current position. After the sliding window has visited all positions, the module resets the buffer and gets ready for the next frame.
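The order in which i3b serves the filters can be modelled as a list of one-hot filters_valid values. This sketch assumes the standard SURF filter sizes per octave (octave 1: 9, 15, 21, 27; octave 2: 15, 27, 39, 51; octave 3: 27, 51, 75, 99), which give exactly eight distinct sizes matching filters 0 to 7; `filter_schedule` and the lookup tables are hypothetical names.

```python
# Assumed: filter index = position in this tuple (0 is 9x9, 7 is 99x99).
FILTER_SIZES = (9, 15, 21, 27, 39, 51, 75, 99)
OCTAVE_FILTERS = {  # assumed standard SURF filter sizes per octave
    1: (9, 15, 21, 27),
    2: (15, 27, 39, 51),
    3: (27, 51, 75, 99),
}
ACTIVE_OCTAVES = {0b000: (), 0b001: (1,), 0b010: (1, 2), 0b100: (1, 2, 3)}


def filter_schedule(mode):
    """One-hot filters_valid values presented for one window position, 9x9 first."""
    sizes = set()
    for octave in ACTIVE_OCTAVES[mode]:
        sizes.update(OCTAVE_FILTERS[octave])  # filters shared by octaves are served once
    indices = sorted(FILTER_SIZES.index(s) for s in sizes)
    return [1 << i for i in indices]
```

Under these assumptions, octave_mode 3'b010 yields six batches (filters 0 to 5), because the 15x15 and 27x27 filters are shared between octaves 1 and 2.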
i3b (cont.)
[Figure: i3b internals: a controller and k row buffers r0, r1, r2, …, r(k-1), each n entries deep, with n = 1024 and k = 100. Ports: dst_rdy_in, data_in [31:0], src_valid_in, i3b_img_width, i3b_img_height, i3b_init, octave_mode [2:0], new_frame, filters_rdy, filters_valid [7:0], filters_data_in [1023:0], filters_complete [7:0].]
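One plausible reading of the k = 100 row buffers is a circular line buffer: the largest filter (99x99) needs corner samples spanning 100 integral-image rows, so only the most recent k rows must be resident. The sketch below models that reading. `RowBuffer` is a hypothetical name, and the modulo slot mapping is an assumption about how r0 to r(k-1) are reused, not a description of the actual memory organization.

```python
class RowBuffer:
    """Circular buffer of k integral-image rows, each n entries wide (hypothetical model)."""

    def __init__(self, n=1024, k=100):
        self.n, self.k = n, k
        self.mem = [[0] * n for _ in range(k)]  # slot s holds image row y with y % k == s
        self.rows_written = 0

    def push_row(self, row):
        """Store the next integral-image row, overwriting the oldest resident row."""
        if len(row) > self.n:
            raise ValueError("row wider than the buffer (n entries)")
        slot = self.rows_written % self.k
        self.mem[slot][: len(row)] = row
        self.rows_written += 1

    def read(self, y, x):
        """Return entry x of image row y, which must be one of the last k rows written."""
        oldest = max(0, self.rows_written - self.k)
        if not oldest <= y < self.rows_written:
            raise IndexError(f"row {y} is not resident")
        return self.mem[y % self.k][x]
```

With `n = 4` and `k = 3`, after five rows have been pushed only rows 2, 3 and 4 can be read; row 1 has been overwritten.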