1
Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data
Tom VanCourt, Yongfeng Gu, Martin Herbordt
Boston University, ECE Department, CAAD lab
www.bu.edu/caadlab
2
BOSTON UNIVERSITY CAMP `05 3D Template Matching
Increasing use of volumetric data sets
  MRI / CAT, confocal microscopy, molecule structure
Increased complexity of correlation
  2D: $O(n^2)$ (x,y) $\times$ $O(n^1)$ rotations = $O(n^3)$
  3D: $O(n^3)$ (x,y,z) $\times$ $O(n^3)$ rotations = $O(n^6)$
Transform techniques help a little:
  $O(n^3) \to O(n^2) \log n$
  $O(n^6) \to O(n^4) \log n$
Solution: Application-specific accelerators
  Programmable off-the-shelf hardware
  Custom logic design, unique to each application
3
Volumetric Data Sets
Complex data types
  Multiple fluorescence channels
  Oriented data: flow vectors
  Nonlinear scoring models
True 3D data acquisition
  Medical imaging (MRI, PET, CAT, …)
  Confocal microscopy
  Emerging techniques:
    Diffusion tensor tomography
4
COTS AND Custom? How?
Field Programmable Gate Arrays
  1000s of uncommitted elements
  Custom processor built on demand
  On-chip RAM bandwidth: >1 TBit/sec
  Massive parallelism: 100s-1000s of PEs
Accelerator is tailored to each application
  ~100% payload computation cycles
    No load/store cycles
    No loop overhead cycles
    No address arithmetic cycles
  ~0% logic dedicated to unused features
5
Acceleration Strategy
Standard approach:
  [Diagram labels: Rotated Image, Molecule Grid; Transform Per Channel (FFT); Products of Transforms (x); FFT^-1; Correlation Result]
Accelerated approach:
  [Diagram labels: Molecule Grid; Rotated Addressing; Direct Correlation by Systolic Array; Correlation Result]
6
Correlation Pipeline
[Diagram stages: Rotated Image Access, Voxel Value Rotation, Systolic 3D Correlation, Data Reduction Filtering]
Customizable functions
High data reuse
Direct correlation
  Beats FFT for modest problems
  Generalizes correlation sum: $\sum_{ijk} F(A_{xyz}, T_{ijk})$
Natural for FPGA implementation
  Regular structure
  Simple data elements
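The generalized correlation sum on this slide can be illustrated in software. The following is a minimal NumPy sketch, not the authors' hardware design. It assumes that A_xyz stands for the image voxel aligned with template element T_ijk at offset (x, y, z), and it computes only the positions where the template fits entirely inside the image (no padding). The function name `direct_correlate` and the `score` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np

def direct_correlate(image, template, score=np.multiply):
    """Direct 3D correlation with a pluggable scoring function F(a, t).

    result[x, y, z] = sum over (i, j, k) of score(image[x+i, y+j, z+k], template[i, j, k])
    Only offsets where the template lies fully inside the image are computed.
    """
    ti, tj, tk = template.shape
    # Number of valid template offsets along each axis.
    ox, oy, oz = (image.shape[d] - template.shape[d] + 1 for d in range(3))
    result = np.zeros((ox, oy, oz))
    for i in range(ti):
        for j in range(tj):
            for k in range(tk):
                # One template element scored against every aligned image voxel.
                window = image[i:i + ox, j:j + oy, k:k + oz]
                result += score(window, template[i, j, k])
    return result

# Conventional (linear) correlation uses multiplication as F.
image = np.arange(27, dtype=float).reshape(3, 3, 3)
linear = direct_correlate(image, np.ones((2, 2, 2)))

# A nonlinear F, here negative absolute difference, drops in unchanged.
nonlinear = direct_correlate(image, image, score=lambda a, t: -np.abs(a - t))
```

Swapping `score` changes the scoring model without touching the loop structure, which is the property that FFT-based correlation does not offer.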
7
Rotated Memory Access
Load image once & reuse
Access image in rotated order
  via index transformation
  $$\begin{pmatrix} x_i & x_j & x_k \\ y_i & y_j & y_k \\ z_i & z_j & z_k \end{pmatrix} \begin{pmatrix} i \\ j \\ k \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
Allows axis scaling, mirror reversal
  Anisotropic: e.g. X,Y resolution ≠ Z
  No need for resampling
~0 delay & buffer overhead
  Strength reduction eliminates multiplication
  Arithmetic cost hidden by pipelining
[Figure: image axes x, y and rotated index axes i, j]
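The strength-reduction idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hardware address generator: it uses an integer transformation matrix (covering 90-degree rotations, mirroring, and integer axis scaling), whereas real hardware would presumably use fixed-point steps for arbitrary angles. The name `rotated_indices` is hypothetical. Each step of i, j, or k adds one matrix column to a running coordinate, so the inner loops contain no multiplication.

```python
def add(p, q):
    """Component-wise sum of two 3-vectors."""
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])

def rotated_indices(matrix, ni, nj, nk):
    """Return (x, y, z) for every (i, j, k) in raster order, using additions only.

    matrix is ((x_i, x_j, x_k), (y_i, y_j, y_k), (z_i, z_j, z_k)).
    """
    (xi, xj, xk), (yi, yj, yk), (zi, zj, zk) = matrix
    # Columns of the matrix: the change in (x, y, z) per unit step of i, j, k.
    col_i, col_j, col_k = (xi, yi, zi), (xj, yj, zj), (xk, yk, zk)
    out = []
    p_i = (0, 0, 0)
    for _ in range(ni):
        p_j = p_i
        for _ in range(nj):
            p_k = p_j
            for _ in range(nk):
                out.append(p_k)
                p_k = add(p_k, col_k)   # step k: one vector addition
            p_j = add(p_j, col_j)       # step j
        p_i = add(p_i, col_i)           # step i
    return out

# 90-degree rotation about z combined with 2x scaling of the z axis.
M = ((0, -1, 0),
     (1,  0, 0),
     (0,  0, 2))
coords = rotated_indices(M, 2, 3, 4)
```

The scaled z axis in `M` shows how anisotropic data could be addressed without resampling: the scale factor is folded into the per-step increment.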
8
Voxel Value Rotation
Not needed for scalar data (RGB, gray scale, etc)
  Step exists architecturally, as identity transform
For spatially oriented data (e.g. fluid flow in brain tissue)
  Perform rigid rotation of image …
  Then rotate oriented voxel values
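The two-step rotation of oriented data can be shown with a small NumPy example. This sketch assumes a vector field stored as `field[x, y, z, component]` and a fixed 90-degree rotation about the z axis; both the layout and the name `rotate_vector_field_z90` are assumptions made for illustration. Voxel positions move first, then each stored vector is rotated by the same matrix.

```python
import numpy as np

# Rotation by +90 degrees about z: (vx, vy, vz) -> (-vy, vx, vz).
R = np.array([[0, -1, 0],
              [1,  0, 0],
              [0,  0, 1]])

def rotate_vector_field_z90(field):
    """Rotate a vector field of shape (nx, ny, nz, 3) by 90 degrees about z."""
    # Step 1: rigid rotation of voxel positions in the x-y plane.
    moved = np.rot90(field, k=1, axes=(0, 1))
    # Step 2: rotate the vector stored in each voxel (v' = R v).
    return moved @ R.T

field = np.zeros((2, 2, 1, 3))
field[0, 1, 0] = (1.0, 0.0, 0.0)   # one vector pointing along +x
rotated = rotate_vector_field_z90(field)
```

For scalar voxels, step 2 reduces to the identity, which matches the slide's remark that the stage still exists architecturally.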
9
Correlation Array
3D extension of conventional array
Custom unit cell
  Holds constant value for template
  Custom F(a, b)
… 1D array + line buffer
  Extend line to result width
… 2D array + plane buffer
  Extend plane to result size
… 3D array
  One input voxel per cycle, padded
  One output correlation point per cycle
[Diagram: unit cell with inputs A and S in, stored T, function F, adder +, output S out; RAM FIFO]
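The behavior of the unit cell (S out = S in + F(A, T)) can be simulated cycle by cycle for the 1D case. This is a software sketch only. It assumes that each input value is broadcast to every cell in the same cycle and that partial sums shift one cell per cycle; the slide does not state which dataflow the hardware uses. The name `systolic_correlate_1d` is hypothetical.

```python
def systolic_correlate_1d(image, template, score=lambda a, t: a * t):
    """Simulate a 1D array of cells, each holding one template value.

    Per cycle, every cell computes S_out = S_in + F(A, T), where S_in is the
    previous cycle's output of its left neighbour (0 for the first cell).
    """
    m = len(template)
    sums = [0] * m                      # partial-sum register in each cell
    results = []
    for cycle, a in enumerate(image):   # one input value per cycle
        new_sums = []
        for j in range(m):
            s_in = sums[j - 1] if j > 0 else 0
            new_sums.append(s_in + score(a, template[j]))
        sums = new_sums
        if cycle >= m - 1:              # pipeline full: one result per cycle
            results.append(sums[-1])
    return results

def direct_1d(image, template, score=lambda a, t: a * t):
    """Reference: result[x] = sum over i of F(image[x + i], template[i])."""
    m = len(template)
    return [sum(score(image[x + i], template[i]) for i in range(m))
            for x in range(len(image) - m + 1)]

image = [1, 2, 3, 4, 5]
template = [1, 0, -1]
out = systolic_correlate_1d(image, template)
```

In this sketch, extending to 2D or 3D would mean feeding the last cell's output through a delay (the line or plane FIFO on the slide) into the first cell of the next row or plane.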
10
3D Correlation Result
Template is stored in computation array
FIFOs hold partial correlation sums
[Figure labels: Template data and computation array; 3D correlation result, whole volume shown; FIFO line buffers, pad to result width; FIFO plane buffers, pad to result depth; Correlation complete, result passed to data reduction filter]
11
Peak Capture / Data Reduction
3D result ≥ image size
  Full result would slow host
Template may occur > 1x
  Find multiple maxima
Reporting N highest points is not effective
Instead: Local max by region
  8x8x8 region, 512:1 reduction
  More maxima, less redundancy
  Record exact (x,y,z) in region
  BUT may miss close maxima
  Region template size may be OK
[Figure callouts: Broad maximum reported redundantly; Local maxima missed]
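The region-based reduction can be sketched with NumPy. This is an illustrative software version, not the hardware filter: it splits the result volume into non-overlapping 8x8x8 blocks, keeps the maximum of each block, and records that maximum's exact (x, y, z). The name `regional_maxima` is hypothetical, and trailing voxels that do not fill a whole block are simply dropped in this sketch.

```python
import numpy as np

def regional_maxima(result, block=8):
    """Return (peaks, coords): per-block maxima and their exact coordinates."""
    nx, ny, nz = (s // block for s in result.shape)
    trimmed = result[:nx * block, :ny * block, :nz * block]
    # Split each axis into (block index, offset within block), then group
    # the three offsets together and flatten them.
    blocks = (trimmed.reshape(nx, block, ny, block, nz, block)
                     .transpose(0, 2, 4, 1, 3, 5)
                     .reshape(nx, ny, nz, block ** 3))
    peaks = blocks.max(axis=-1)
    flat = blocks.argmax(axis=-1)
    dx, dy, dz = np.unravel_index(flat, (block, block, block))
    bx, by, bz = np.indices((nx, ny, nz))
    coords = np.stack([bx * block + dx, by * block + dy, bz * block + dz], axis=-1)
    return peaks, coords

result = np.zeros((16, 16, 16))
result[3, 4, 5] = 7.0
result[12, 9, 15] = 9.0
peaks, coords = regional_maxima(result)
```

Here 4096 result voxels reduce to 8 reported peaks, the 512:1 ratio quoted on the slide. Two true maxima that fall inside the same block would be reported as one, which is the limitation the slide notes.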
12
Why Reconfigurable?
Massive parallelism, modest cost
  COTS hardware, tracks technology
Application-optimized processing
  Tracks application changes
    Ex: 1, 2, 3-channel fluorescence
  Flexible performance tradeoffs
  Allows non-linear scoring
Available now
  PC add-ins
  SGI Altix
  Cray XD1
[Figure labels: 24 bit RGB, 8 bit Mono, 4 bit]
13
Performance Results

Voxel value          Voxel bits  Logic per PE (slices)  Number of PEs  Clock (MHz)  Speed (10^9 SAC/sec)
2-tuple              2           11                     2744 = 14^3    51.5         141.9
3-tuple              7           21                     1331 = 11^3    46.1         61.3
2-tuple (nonlinear)  5           44                     729 = 9^3      30.6         22.2
2-tuple              6           35                                    38.3         27.9
4-tuple (oriented)   7           16                     1331 = 11^3    46.3         61.7

Xilinx Virtex-II Pro VP70
Measured: Score-accumulate per sec (SAC/sec)
Complex models not limited in number of bits
Simple models not limited by worst-case speed
14
Conclusions
Accelerators enable 3D template matching
  >100x speedup over 3D FFT (n~100)
  Complex data types, including vector values
  Nonlinear comparisons supported
Programmability avoids common limitations
  No penalty due to over-generalization
  No limit due to data/function restrictions
3D data and FPGA coprocessors match well
  Both are emerging and expanding
  FPGAs three years ago couldn’t do it!