Published by Linette Hall. Modified over 9 years ago.
1
Deconstruction: Discriminative learning of local image descriptors
Samantha Horvath
Learning Based Methods in Vision
2/14/2012
2
Introduction
Computer vision makes use of many “hand-crafted” descriptors.
These descriptors share many common components.
This paper presents a modular framework for designing and optimizing new feature descriptors.
3
Common Descriptors
SIFT
▫Most well-known descriptor
▫Quantized gradient vectors
▫Grid-based spatial histogram
▫Post-normalization
PCA-SIFT
▫Quantized gradient vectors
▫PCA to reduce dimensionality
GLOH
▫Quantized gradient vectors
▫Polar-based histogram
▫PCA to reduce dimensionality
4
Common Descriptors
SURF
▫Haar wavelet responses
▫Grid-based histograms
HOG
▫Dense SIFT
Shape Context
▫Extract points from object contour
▫Polar-based histogram
Geometric blur
▫Extract sparse channels
▫Apply spatially varying blur
▫Subsample to create descriptor
5
Image with interest point → Extracted patch → Feature extraction → Pooling → Dimensionality reduction
7
Generalized Framework
Represent each portion of the descriptor algorithm as an interchangeable block.
Blocks are inspired by existing descriptor algorithms.
Blocks are organized into candidate descriptors, then their parameters are optimized.
8
Algorithmic building blocks
G-block: Gaussian smoothing
T-block: Non-linear transformation
S-block: Spatial summation/pooling
E-block: Embedding (dimensionality reduction)
N-block: Normalization
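The idea of interchangeable blocks can be sketched as a chain of functions. This is a minimal illustration, not the paper's implementation: the function names, the rectified-gradient T-block, the 2 x 2 box pooling, and all default values are assumptions, and the E-block is left out for brevity.

```python
import numpy as np

def g_block(patch, sigma=1.0):
    """G-block: separable Gaussian smoothing (sigma in pixels, assumed)."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    rows_done = np.apply_along_axis(smooth, 1, patch.astype(float))
    return np.apply_along_axis(smooth, 0, rows_done)

def t_block(patch):
    """T-block stand-in: gradients rectified into 4 non-negative channels."""
    gy, gx = np.gradient(patch)
    return np.stack([np.maximum(gx, 0), np.maximum(-gx, 0),
                     np.maximum(gy, 0), np.maximum(-gy, 0)], axis=-1)

def s_block(feat, grid=2):
    """S-block stand-in: plain box pooling over a grid x grid layout."""
    h, w, c = feat.shape  # h and w must be divisible by grid
    cells = feat.reshape(grid, h // grid, grid, w // grid, c)
    return cells.sum(axis=(1, 3)).ravel()

def n_block(vec, eps=1e-12):
    """N-block stand-in: normalize to a unit vector."""
    return vec / max(np.linalg.norm(vec), eps)

def describe(patch, blocks):
    """Run a patch through the chosen blocks in order."""
    out = patch
    for block in blocks:
        out = block(out)
    return out

# Hypothetical 16 x 16 patch; any block can be swapped for a variant.
rng = np.random.default_rng(0)
patch = rng.random((16, 16))
descriptor = describe(patch, [g_block, t_block, s_block, n_block])
```

Swapping one entry of the block list for another variant gives a new candidate descriptor without touching the rest of the chain.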
9
General Pipeline
10
Transformations
T1 – Gradient orientation binning
▫Variants: number of orientation bins
T2 – Rectified gradient binning
▫Variants: number of rectification bins
T3 – Steerable filters
▫Variants: filter order and number of orientations
T4 – Difference of Gaussian responses (center-surround)
▫Parameter: size of center
T5 – Haar wavelet transform
T6 – Fixed 4 x 4 classifier
T7 – Quantized gray levels
▫Variant: number of gray level bins
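A T1-style transformation can be sketched as follows. The soft assignment of each gradient's magnitude to its two nearest orientation bins is an assumption made for this illustration, as are the function name and the default bin count.

```python
import numpy as np

def t1_orientation_binning(patch, n_bins=4):
    """Turn a patch into n_bins per-pixel orientation channels.

    Assumption: magnitude is split linearly between the two nearest bins.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx) % (2 * np.pi)       # angle in [0, 2*pi)
    pos = theta / (2 * np.pi) * n_bins             # fractional bin position
    lo = np.floor(pos).astype(int) % n_bins        # lower neighbouring bin
    hi = (lo + 1) % n_bins                         # upper bin, wraps around
    w_hi = pos - np.floor(pos)                     # weight for the upper bin
    out = np.zeros(patch.shape + (n_bins,))
    rows, cols = np.indices(patch.shape)
    out[rows, cols, lo] += mag * (1 - w_hi)
    out[rows, cols, hi] += mag * w_hi
    return out

# A horizontal intensity ramp has gradient direction 0 everywhere,
# so all of its magnitude lands in bin 0.
ramp = np.tile(np.arange(8.0), (8, 1))
responses = t1_orientation_binning(ramp, n_bins=4)
```

The number of bins is the variant listed on the slide; here it is simply the `n_bins` argument.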
11
Spatial summation blocks – parametric
S1 – SIFT-style bilinear weighted grid
▫Parameters: overall footprint size (continuous)
S2 – GLOH-style log-polar regions
▫Variants: number/arrangement of pooling regions
▫Parameters: ring radius
12
Spatial summation blocks – parametric
S3 – Gaussian weighted pooling regions on a grid
▫Variants: grid size
▫Parameters: grid sample positions, size of the Gaussians
S4 – Gaussian weighted, polar arranged pooling regions
▫Variants: number of pooling regions
▫Parameters: ring radii, Gaussian kernel size, relative angular rotation
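An S3-style block can be sketched as a set of Gaussian windows centred on a regular grid. The normalized coordinate system, the parameter names `sample_spacing` and `sigma`, their default values, and the choice to normalize each window to sum to one are all assumptions made for this illustration.

```python
import numpy as np

def s3_gaussian_grid_pooling(feat, grid=3, sample_spacing=0.25, sigma=0.15):
    """Pool an (H, W, C) feature map with Gaussians centred on a grid.

    Coordinates are normalized so the patch spans [-0.5, 0.5] in x and y.
    Returns a vector of length grid * grid * C.
    """
    h, w, c = feat.shape
    ys = (np.arange(h) + 0.5) / h - 0.5
    xs = (np.arange(w) + 0.5) / w - 0.5
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    # Grid sample positions, symmetric about the patch centre.
    offsets = (np.arange(grid) - (grid - 1) / 2) * sample_spacing
    pooled = []
    for cy in offsets:
        for cx in offsets:
            weight = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma**2))
            weight /= weight.sum()                 # assumed normalization
            # Weighted sum over the two spatial axes, one value per channel.
            pooled.append(np.tensordot(weight, feat, axes=([0, 1], [0, 1])))
    return np.concatenate(pooled)

# Pooling a constant feature map returns the constant in every region.
flat = np.ones((16, 16, 4))
pooled_flat = s3_gaussian_grid_pooling(flat)
```

In this sketch `sample_spacing` and `sigma` play the role of the continuous parameters on the slide, while `grid` is the discrete variant.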
13
Embedding – non-parametric
E1 – PCA (non-discriminative technique)
E2/E3 – Projection minimizes the ratio of in-class variance for match pairs to the variance of all match pairs (LPP)
E4/E5 – Projection minimizes the ratio of variance between matched and non-matched pairs (LDE)
E6/E7 – Projection minimizes the ratio of in-class variance for match pairs to the total data variance (GLDE)
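A ratio-minimizing projection of the kind described for E4/E5 can be sketched as a generalized eigenvalue problem. This is only an illustration under stated assumptions: the inputs are difference vectors between paired descriptors, the function name is hypothetical, and the small ridge term is a simple stand-in regularizer chosen for this sketch.

```python
import numpy as np

def lde_style_projection(match_diffs, nonmatch_diffs, n_dims, ridge=1e-6):
    """Find directions w minimizing (w' A w) / (w' B w).

    A: second-moment matrix of match-pair differences.
    B: second-moment matrix of non-match-pair differences.
    Solves A w = lambda B w and keeps the smallest-ratio directions.
    """
    A = match_diffs.T @ match_diffs / len(match_diffs)
    B = nonmatch_diffs.T @ nonmatch_diffs / len(nonmatch_diffs)
    d = B.shape[0]
    B = B + ridge * np.trace(B) / d * np.eye(d)    # keep B positive definite
    # Whiten with the Cholesky factor of B to get a symmetric problem.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T
    M = (M + M.T) / 2                              # guard against round-off
    ratios, vecs = np.linalg.eigh(M)               # ascending eigenvalues
    W = L_inv.T @ vecs[:, :n_dims]                 # map back to input space
    return W, ratios[:n_dims]

# Synthetic example: matches agree closely along dimension 0 only.
rng = np.random.default_rng(0)
match = rng.normal(size=(2000, 3)) * np.array([0.1, 1.0, 1.0])
nonmatch = rng.normal(size=(2000, 3))
W, ratios = lde_style_projection(match, nonmatch, n_dims=1)
```

In the synthetic example the recovered direction lines up with dimension 0, the only axis along which matched pairs are much closer than non-matched pairs.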
14
Smoothing and Normalization
Gaussian smoothing
▫Parameter: σ
Normalization
▫Normalize to unit vector
▫Clip to threshold
▫Renormalize, rinse, repeat
▫Parameters: clipping threshold
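The normalize, clip, renormalize loop can be sketched as below. The default threshold of 0.2, the iteration cap, and the convergence tolerance are assumptions for this illustration; the slide treats the clipping threshold as a parameter to be optimized. The sketch also assumes non-negative entries, as in a histogram.

```python
import numpy as np

def n_block(vec, clip=0.2, max_iters=10, tol=1e-6):
    """Normalize to unit length, clip large entries, renormalize, repeat."""
    v = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        return v                                   # nothing to normalize
    v = v / norm
    for _ in range(max_iters):
        clipped = np.minimum(v, clip)              # cap dominant entries
        clipped = clipped / np.linalg.norm(clipped)
        converged = np.max(np.abs(clipped - v)) < tol
        v = clipped
        if converged:
            break
    return v

# One dominant entry among 32: clipping limits its influence.
raw = np.ones(32)
raw[0] = 10.0
normalized = n_block(raw)
```

Clipping keeps a single strong response from dominating the distance between two descriptors.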
15
New names!
SIFT
▫T1b – S1-16
GLOH
▫T1b – S2-17 – E1
PCA-SIFT
▫T1b – E1
SURF
▫T5 – S1
17
Ground Truth dataset
18
Ground truth dataset
Uses camera calibration and dense multi-view stereo data.
DoG interest points are detected.
Interest points are mapped from one image to another view of the scene.
Stereo constraints are used to help match interest points.
19
Same to same vs. like to like
20
Learning/optimization
The G, S, and N blocks and one T block (T4) contain parameters for optimization.
The G, S, N, and T blocks are jointly optimized using Powell minimization.
▫Powell's method is a conjugate direction method that does not require derivatives
▫Optimization is initialized with reasonable values
E blocks are optimized separately by solving a generalized eigenvalue problem.
▫Power regularization is used to avoid overfitting
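Derivative-free joint optimization with Powell's method can be sketched with SciPy's `scipy.optimize.minimize`. The objective below is a made-up quadratic standing in for the matching error on training pairs, and the three parameter names and starting values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def toy_error(params):
    """Hypothetical stand-in for descriptor matching error.

    A real objective would build the descriptor with these parameters
    and measure the error on labelled match / non-match pairs.
    """
    sigma, footprint, clip = params
    return ((sigma - 1.5) ** 2
            + 0.5 * (footprint - 0.8) ** 2
            + 2.0 * (clip - 0.2) ** 2
            + 0.3 * (sigma - 1.5) * (footprint - 0.8))

# Start from "reasonable values", as the slide suggests.
start = [1.0, 1.0, 0.3]
result = minimize(toy_error, x0=start, method="Powell",
                  options={"xtol": 1e-6, "ftol": 1e-8})
best_params = result.x
```

Because Powell's method only evaluates the objective, it suits cases where the error depends on block parameters in a way that is awkward to differentiate.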
22
Dimension Reduction on SIFT
23
T3 blocks (rectified steerable filters) with polar summation regions (S4/S2) performed the best.
Consistently outperformed SIFT descriptors.
24
Optimal summation regions
Optimal summation regions are foveated!!!
25
Foveated summation regions
The S4-25 spatial pooling variant is very similar to the DAISY descriptor (designed for dense matching).
Foveated regions are similar to geometric blur (increased blurring away from the interest point).
26
Results – Pipeline 2
Performance is more varied.
Steerable filters still perform the best.
LDE and LPP are the best embedding methods.
Does not consistently outperform SIFT.
28
Results – Pipeline 3
Dimensionality reduction after the learned T/S block combination.
Greatly outperforms SIFT.
Straightforward PCA works the best.
29
Thoughts
Would like to see how the optimized spatial pooling blocks vary with the different training sets.
Ultimately, would like to see this framework tested on different dataset types.
The difficulty is getting “ground truth” matches for identification/classification tasks.
31
Synthesis
The majority of my computer vision research involves medical imaging.
Medical images are very different from natural scene images.
▫Images represent a planar slice through the patient
▫Often poor contrast between different structures
32
Synthesis
Interest point detection has applications for medical imaging.
Non-rigid registrations
▫Warp one set of interest points to overlay the second set
Tracking
33
References
1. M. Brown, G. Hua, and S. Winder, “Discriminative learning of local image descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010. (original)
2. S. Winder and M. Brown, “Learning local image descriptors,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR07), Minneapolis, June 2007. (technical)
3. K. Mikolajczyk and C. Schmid, “Scale and affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004. (overview)
4. E. Tola, V. Lepetit, and P. Fua, “A fast local descriptor for dense matching,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, Anchorage, June 2008. (daisy)
5. Y. Ke and R. Sukthankar, “PCA-SIFT: a more distinctive representation for local image descriptors,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, vol. 2, July 2004, pp. 506–513. (PCA-sift)
6. A. Berg and J. Malik, “Geometric blur and template matching,” in International Conference on Computer Vision and Pattern Recognition, 2001, pp. I:607–614.
7. D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. (SIFT)
8. H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in 9th European Conference on Computer Vision, 2006.
9. G. Sharma and F. Jurie, “Learning discriminative spatial representation for image classification,” in British Machine Vision Conference (BMVC), 2011. (grid paper)
10. W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 891–906, 1991.