Features
- Introduction
- Moravec Corner Detection
- Harris/Plessey Corner Detection
- FAST Corner Detection
- Scale Invariant Feature Transform (SIFT)
Features Based on A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe © Wiley & Sons Inc. 2014
Introduction – Where have the edges gone?
Given two images (a) and (b) taken at different times, determine the movement of edge points from frame to frame. Looking only at a local region, the motion of a point along an edge cannot be determined uniquely. This is called the aperture problem.
Introduction – Possible interpretations
The aperture problem.
Introduction – Using Features / Corners instead
- Use corners / image features / interest points
- Corner = intersection of two edges
- Interest point = any feature which can be robustly detected
- Reduces number of points
- Easier to establish correspondences
- Spurious features

To overcome this problem, a common approach in computer vision is to use corners, image features or interest points instead (these terms are used interchangeably). Technically a corner is the intersection of two edges, whereas an interest point is any point which can be located robustly, which includes corners but also features such as small marks. In this text we refer to all of these by the most common term, "corners". Using corners significantly reduces the number of points considered from frame to frame, and each point is more complex than an edge point. Both factors make it much easier to establish reliable correspondences from frame to frame. Note also that occlusions cause a serious problem with spurious features (look at where the maroon meets the grey: there would appear to be 3 features to track in our example).
Introduction – Steps for corner detection
- Determine cornerness values: for each pixel; the main difference between detectors; produces a cornerness map
- Non-maxima suppression: removes multiple responses; compare to local neighbours
- Threshold the cornerness map: keeps significant corners

All of the algorithms for corner detection follow a similar series of steps, which are presented here.
Determine cornerness values. For each pixel in the image, compute a cornerness value based on the local pixels. This computation is what distinguishes most corner detectors. The output of this stage is a cornerness map.
Non-maxima suppression. Suppress all cornerness values which are less than a local neighbour (within n pixels distance; e.g. n=3) in order to avoid multiple responses for the same corner.
Threshold the cornerness map. Finally, select those remaining cornerness values which are significant. This is typically done using thresholding (i.e. the cornerness value must be above some threshold T in order to be considered a corner).
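The last two steps can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions: the `Corner` struct, the `selectCorners` name and the row-major `std::vector` cornerness map are hypothetical and do not come from the slides or from OpenCV. The threshold is checked first purely to skip work; the selected set is the same as suppressing first and thresholding afterwards.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type for illustration.
struct Corner { int row; int col; float value; };

// Keep a pixel only if its cornerness exceeds T and no neighbour within
// n pixels has a larger cornerness value (ties are all kept).
std::vector<Corner> selectCorners(const std::vector<std::vector<float>>& cornerness,
                                  int n, float T)
{
    std::vector<Corner> corners;
    const int rows = static_cast<int>(cornerness.size());
    const int cols = rows > 0 ? static_cast<int>(cornerness[0].size()) : 0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const float v = cornerness[r][c];
            if (v <= T) continue;                       // thresholding
            bool isMax = true;                          // non-maxima suppression
            for (int dr = -n; dr <= n && isMax; ++dr) {
                for (int dc = -n; dc <= n; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if ((dr == 0 && dc == 0) || rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                        continue;
                    if (cornerness[rr][cc] > v) { isMax = false; break; }
                }
            }
            if (isMax) corners.push_back({r, c, v});
        }
    }
    return corners;
}
```

Any cornerness map could be passed in, whichever detector produced it; only the first step differs between detectors.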
Moravec corner detection
- Looks at the local variation around a point
- Compares local image patches using V_{u,v}(i,j), where (u,v) ∈ { (-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1) } and the window is typically 3x3, 5x5 or 7x7
- Select the minimum value of V_{u,v}(i,j)

The Moravec corner detector looks at the local variation around a point by comparing local image patches and computing the un-normalized local autocorrelation between them. For each pixel it compares a patch centred on that pixel with 8 local patches which are shifted by a small amount (typically 1 pixel in each of the eight possible directions) from the current patch. It compares the patches using a sum of squared differences, V_{u,v}(i,j), and records the minimum of these 8 values as the cornerness for the pixel. See Figure 0‑3 for an illustrative (binary) example of the technique.
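The cornerness computation described above can be sketched as follows. The slide's formula did not survive in the text, so this assumes the usual form V_{u,v}(i,j) = Σ_{a,b ∈ window} (f(i+u+a, j+v+b) − f(i+a, j+b))²; the `Image` alias and the function names are hypothetical, and the caller is assumed to keep (i, j) far enough from the image border.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image type

// Sum of squared differences between the patch centred on (i, j) and the
// patch shifted by (u, v); half is the window half-width (1 for 3x3).
long patchSsd(const Image& f, int i, int j, int u, int v, int half)
{
    long sum = 0;
    for (int a = -half; a <= half; ++a)
        for (int b = -half; b <= half; ++b) {
            const long d = f[i + u + a][j + v + b] - f[i + a][j + b];
            sum += d * d;
        }
    return sum;
}

// Moravec cornerness: the minimum SSD over the eight one-pixel shifts.
// (i, j) must be at least half + 1 pixels away from every border.
long moravecCornerness(const Image& f, int i, int j, int half)
{
    long best = std::numeric_limits<long>::max();
    for (int u = -1; u <= 1; ++u)
        for (int v = -1; v <= 1; ++v) {
            if (u == 0 && v == 0) continue;   // skip the unshifted patch
            best = std::min(best, patchSsd(f, i, j, u, v, half));
        }
    return best;
}
```

On a binary image the SSD simply counts differing pixels, so a straight edge yields 0 (a shift along the edge changes nothing) while a corner yields a positive minimum.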
Moravec – Binary Example
- Corner: minimum difference 2
- Edge: minimum difference 0

Idea underlying the Moravec corner detector. Images of a corner and an edge are shown (a) & (c) together with the various patches used by the Moravec corner detector (b) & (d). Each of the patches with which the central patch is compared is annotated by the number of pixels which are different between that patch and the central patch. As the minimum difference is used, in the case of the edge the value computed is 0, whereas in the case of the corner the minimum computed is 2.
Moravec – Flaws
- Anisotropic response: diagonal lines; reduced by smoothing
- Noisy response: reduced by using a larger area or by smoothing

The Moravec corner detector has two flaws which have motivated the development of different detectors:
Anisotropic response. If you consider the road sign in Figure 0‑4 it is apparent that the Moravec corner detector responds quite strongly to diagonal lines (whereas it does not respond in the same way to vertical or horizontal lines). Hence the response of the operator is not isotropic. The anisotropic response can be reduced by smoothing in advance of applying the corner detector (see the last two lines in Figure 0‑4).
Noisy response. The Moravec detector is also quite sensitive to noise (e.g. look at the grid-like noise pattern in the cornerness map of the road sign in Figure 0‑4, which is caused by compression artifacts). This response to noise can be lessened by using a larger area or by smoothing before applying the corner detector.
Harris / Plessey corner detection
- Difference: cornerness determination
- Uses partial derivatives, Gaussian weighting and matrix eigenvalues

    GoodFeaturesToTrackDetector harris_detector( 1000, 0.01, 10, 3, true );
    vector<KeyPoint> keypoints;
    harris_detector.detect( gray_image, keypoints );

The Harris corner detector differs from the Moravec detector in how it determines the cornerness value. Rather than looking at the sum of squared differences directly, it makes use of partial derivatives, a Gaussian weighting function, and the eigenvalues of a matrix representation of the equation.
Harris / Plessey corner detection
Consider the intensity variation for an arbitrary shift (Δi, Δj) as [equation]. If [equation], then [equation].

The weighting function wx is used to put more weight on measurements which are made in the centre of the window. It is calculated using the Gaussian function.
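The equations on this slide were not preserved in the text. For orientation, the usual textbook form of the derivation is given below; the symbols (f for the image, W for the window, w(i,j) for the Gaussian weight, M for the resulting matrix) and the sign convention of the shift are assumptions and may differ from the original slide.

```latex
\begin{align*}
SSD_W(\Delta i, \Delta j) &= \sum_{(i,j) \in W} w(i,j)\,\bigl(f(i+\Delta i,\, j+\Delta j) - f(i,j)\bigr)^2 \\
\text{If}\quad f(i+\Delta i,\, j+\Delta j) &\approx f(i,j)
  + \frac{\partial f}{\partial i}\,\Delta i + \frac{\partial f}{\partial j}\,\Delta j \\
\text{then}\quad SSD_W(\Delta i, \Delta j) &\approx
  \begin{bmatrix} \Delta i & \Delta j \end{bmatrix}
  M
  \begin{bmatrix} \Delta i \\ \Delta j \end{bmatrix},
\qquad
M = \sum_{(i,j) \in W} w(i,j)
  \begin{bmatrix}
    \left(\frac{\partial f}{\partial i}\right)^2 & \frac{\partial f}{\partial i}\frac{\partial f}{\partial j} \\
    \frac{\partial f}{\partial i}\frac{\partial f}{\partial j} & \left(\frac{\partial f}{\partial j}\right)^2
  \end{bmatrix}
\end{align*}
```

The second line is a first-order Taylor expansion; substituting it into the first line cancels f(i,j) and leaves a quadratic form in the shift, which is why the behaviour of the intensity variation is captured by the eigenvalues of the 2x2 matrix M.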
Harris / Plessey corner detection
- From the matrix we can compute the eigenvalues
  - Both high => corner
  - One high => edge
  - None high => constant region
- Harris & Stephens proposed a cornerness measure based on the determinant and trace of the matrix

This matrix equation then allows the computation of the eigenvalues (λ1, λ2) of the matrix. If both eigenvalues are high then we have a corner (changes in both directions). If only one is high then we have an edge, and otherwise we have a reasonably constant region. Based on this, Harris and Stephens proposed the following cornerness measure: det(M) − k (trace(M))², where M is the matrix. Empirically it has been determined that k should normally be between 0.04 and 0.06.

The weighting function wx is used to put more weight on measurements which are made in the centre of the window. It is calculated using the Gaussian function. For example, for a 3x3 window the weights are shown in Figure 0‑5.
Figure 0‑5: Gaussian weights for a 3x3 window as used in the Harris corner detector.

In algebra, a determinant is a function depending on n that associates a scalar, det(A), to every n×n square matrix A. The fundamental geometric meaning of a determinant is as the scale factor for volume when A is regarded as a linear transformation.
In linear algebra, the trace of an n-by-n square matrix A is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of A. Equivalently, the trace of a matrix is the sum of its eigenvalues, making it an invariant with respect to the chosen basis.
In mathematics, a vector may be thought of as an arrow. It has a length, called its magnitude, and it points in some particular direction. A linear transformation may be considered to operate on a vector to change it, usually changing both its magnitude and its direction. An eigenvector of a given linear transformation is a vector which is multiplied by a constant, called the eigenvalue, during that transformation. The direction of the eigenvector is either unchanged by that transformation (for positive eigenvalues) or reversed (for negative eigenvalues).
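The eigenvalue reasoning above can be checked numerically with a small sketch. It assumes the commonly quoted Harris and Stephens measure, det(M) − k·trace(M)², since the slide's own equation is missing from the text; the `StructureMatrix` struct and the function name are hypothetical.

```cpp
#include <cassert>

// Entries of the symmetric 2x2 matrix M = [[a, b], [b, c]] accumulated over
// the window (a and c from squared partial derivatives, b from their product).
struct StructureMatrix { double a; double b; double c; };

// Cornerness without computing eigenvalues explicitly:
// det(M) is the product of the eigenvalues, trace(M) is their sum.
double harrisCornerness(const StructureMatrix& m, double k = 0.04)
{
    const double det = m.a * m.c - m.b * m.b;
    const double trace = m.a + m.c;
    return det - k * trace * trace;
}
```

With two large eigenvalues the measure is positive (corner), with one large eigenvalue it is negative (edge), and with none it is close to zero (constant region).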
Harris / Plessey – Pros and Cons
Cons:
- More expensive computationally
- Sensitive to noise
- Somewhat anisotropic
Pros:
- Very repeatable response
- Better detection rate

The Harris corner detector is significantly more expensive computationally than the Moravec corner detector. It is also quite sensitive to noise and has a somewhat anisotropic response (i.e. the response changes depending on the orientation). Despite this, the Harris detector is probably the most commonly used corner detector, mainly due to two factors: it has a very repeatable response, and it has a better detection rate (i.e. taking into account true positives, false positives, true negatives and false negatives) than the Moravec detector.

    Mat display_image;
    drawKeypoints( image, keypoints, display_image, Scalar( 0, 0, 255 ) );
Moravec & Harris/Plessey
Four different images are shown (a) together with their Moravec cornerness maps (b), the Moravec corners detected shown as green crosses (c) and the Harris corners detected shown as green stars (d). In both (c) and (d) a minimum distance of 3 was used between corners. Note that in the third case (which is part of a standard test image) a large number of corners are found on the diagonal edge, but once the image is smoothed (last case), which lessens the sampling effects, these extra corners disappear.
FAST Corner Detection
From "Machine learning for high-speed corner detection", by Edward Rosten & T. Drummond, in ECCV 2006.
Technique:
- Considers a circle of points around the candidate pixel
- If an arc of >= 9 points are all brighter or all darker than the centre, a corner is found with strength t
- t is the minimum difference between any of the points in the arc and the centre point
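The arc test can be sketched in plain C++ as below. This is a simplified illustration, not the machine-learned decision tree from the paper and not OpenCV's implementation: it assumes a 16-point circle of radius 3, takes the threshold t as an input rather than computing the strength, and uses hypothetical names (`Image`, `kCircle`, `isFastCorner`). The caller must keep the pixel at least 3 pixels from the border.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image type

// 16 offsets (dx, dy) on a circle of radius 3, in order around the circle.
static const int kCircle[16][2] = {
    {0, -3}, {1, -3}, {2, -2}, {3, -1}, {3, 0}, {3, 1}, {2, 2}, {1, 3},
    {0, 3}, {-1, 3}, {-2, 2}, {-3, 1}, {-3, 0}, {-3, -1}, {-2, -2}, {-1, -3}};

// True if at least arcLength contiguous circle points are all brighter than
// centre + t or all darker than centre - t.
bool isFastCorner(const Image& img, int row, int col, int t, int arcLength = 9)
{
    const int centre = img[row][col];
    int state[16];  // +1 brighter, -1 darker, 0 similar to the centre
    for (int n = 0; n < 16; ++n) {
        const int p = img[row + kCircle[n][1]][col + kCircle[n][0]];
        state[n] = (p > centre + t) ? 1 : (p < centre - t) ? -1 : 0;
    }
    // Walk the circle twice so that arcs wrapping past the last index count.
    int run = 0, previous = 0;
    for (int n = 0; n < 32; ++n) {
        const int s = state[n % 16];
        run = (s != 0 && s == previous) ? run + 1 : (s != 0 ? 1 : 0);
        previous = s;
        if (run >= arcLength) return true;
    }
    return false;
}
```

An isolated bright dot passes the test because all 16 circle points are darker than the centre, whereas a pixel on a straight edge fails because fewer than 9 contiguous points differ from it.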
FAST Corner Detection
FAST Corner Detection

    Ptr<FeatureDetector> feature_detector = FeatureDetector::create("FAST");
    vector<KeyPoint> keypoints;
    cvtColor( image, gray_image, CV_BGR2GRAY );
    feature_detector->detect( gray_image, keypoints );
    // Or 5 times faster using the FASTX routine:
    FASTX( gray_image, keypoints, 50, true, FastFeatureDetector::TYPE_9_16 );
Scale Invariant Feature Transform (SIFT)
From "Distinctive Image Features from Scale-Invariant Keypoints", by David G. Lowe, in International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
- Motivation: providing repeatable, robust features for tracking, recognition, panorama stitching, etc.
- Features: invariant to scaling and rotation; partly invariant to illumination and viewpoint changes
SIFT – Overview & Contents
- Scale Space Extrema Detection
  - Scale Space
  - Difference of Gaussian
  - Locate Extrema
- Accurate Keypoint Location
  - Sub-pixel locate
  - Filter response – remove low contrast and features primarily along an edge
- Keypoint Orientation assignment
- Keypoint Descriptors
- Matching Descriptors – including dropping poor ones
- Applications
SIFT – OpenCV code

    Ptr<FeatureDetector> feature_detector = FeatureDetector::create("SIFT");
    vector<KeyPoint> keypoints;
    feature_detector->detect( gray_image, keypoints );
SIFT – Scale Space Extrema Detection
- For scale invariance, consider the image at multiple scales: L(x,y,σ), L(x,y,kσ), L(x,y,k²σ), L(x,y,k³σ)
- L(x,y,σ) = G(x,y,σ) * I(x,y), where * denotes convolution of the Gaussian G with the image I
- Applied in different octaves of scale space
- Each octave corresponds to a doubling of σ
SIFT – Scale Space Extrema Detection
- Stable keypoint locations are defined to be at extrema in the Difference of Gaussian (DoG) functions across scale space: D(x,y,σ) = L(x,y,kσ) - L(x,y,σ)
- Extrema: the centre point is the minimum or maximum of the local 3x3 region in the current DoG and in the adjacent scales, in any octave
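The extremum check can be sketched as a comparison against the 26 neighbours in the 3x3 regions of the current and the two adjacent DoG images. The `Layer` alias and the function name are hypothetical, and requiring a strict maximum or minimum is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

using Layer = std::vector<std::vector<float>>;  // one DoG image D(x, y, sigma)

// True if dog[s][r][c] is strictly greater, or strictly smaller, than all 26
// neighbours in the current scale and the two adjacent scales.
// Requires 1 <= s <= dog.size() - 2 and (r, c) at least 1 pixel from the border.
bool isScaleSpaceExtremum(const std::vector<Layer>& dog, int s, int r, int c)
{
    const float v = dog[s][r][c];
    bool isMax = true, isMin = true;
    for (int ds = -1; ds <= 1; ++ds)
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                if (ds == 0 && dr == 0 && dc == 0) continue;  // skip the centre
                const float n = dog[s + ds][r + dr][c + dc];
                if (n >= v) isMax = false;
                if (n <= v) isMin = false;
            }
    return isMax || isMin;
}
```

The same routine would be applied within each octave, to the stack of DoG images computed for that octave.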
SIFT – Accurate Keypoint Location
- Originally location and scale were taken from the central point
- Locate keypoints more precisely
  - Model data locally using a 3D quadratic
  - Locate interpolated maximum/minimum
SIFT – Accurate Keypoint Location
- Discard low contrast keypoints
  - If the local contrast is too low, discard the keypoint
  - Evaluated from the curvature of the 3D quadratic…
SIFT – Accurate Keypoint Location
- Discard poorly localised keypoints (e.g. along an edge)

A poorly defined peak (i.e. a ridge) in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, H, computed at the location and scale of the keypoint:
H = …
The derivatives are estimated by taking differences of neighboring sample points. The eigenvalues of H are proportional to the principal curvatures of D. Borrowing from the approach used by Harris and Stephens (1988), we can avoid explicitly computing the eigenvalues, as we are only concerned with their ratio. Let alpha be the eigenvalue with the largest magnitude and beta be the smaller one. Then we can compute the sum of the eigenvalues from the trace of H and their product from the determinant:
Tr = …, Det = …
In the unlikely event that the determinant is negative, the curvatures have different signs, so the point is discarded as not being an extremum. Let r be the ratio between the largest magnitude eigenvalue and the smaller one, so that alpha = …
r depends only on the ratio of the eigenvalues rather than their individual values. The quantity Tr²/Det… is at a minimum when the two eigenvalues are equal and it increases with r. Therefore, to check that the ratio of principal curvatures is below some threshold, we only need to check Tr²/Det… and ensure it is below the ratio for some fixed (threshold) value of r.
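The elided expressions above are left as they appear in the slides. As a hedged sketch, the check as given in Lowe's paper can be written as follows: with H = [[Dxx, Dxy], [Dxy, Dyy]], Tr = Dxx + Dyy and Det = Dxx·Dyy − Dxy², a keypoint is kept when Tr²/Det < (r + 1)²/r. The `Hessian2x2` struct and the function name are hypothetical, and the default r = 10 is the value used in that paper, not one stated on the slide.

```cpp
#include <cassert>

// Second derivatives of D at the keypoint (hypothetical struct for illustration).
struct Hessian2x2 { double dxx; double dyy; double dxy; };

// Keep the keypoint only if the ratio of principal curvatures is below r.
bool passesEdgeTest(const Hessian2x2& h, double r = 10.0)
{
    const double trace = h.dxx + h.dyy;              // sum of the eigenvalues
    const double det = h.dxx * h.dyy - h.dxy * h.dxy; // product of the eigenvalues
    if (det <= 0.0) return false;  // curvatures of different sign: not an extremum
    return trace * trace / det < (r + 1.0) * (r + 1.0) / r;
}
```

Equal curvatures give Tr²/Det = 4, well below the bound of 12.1 for r = 10, while a ridge with curvatures 100 and 1 gives about 102 and is rejected.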
SIFT – Keypoint Orientation
- For scale invariance, the keypoint scale is used to select the smoothed image with the closest scale
- For orientation invariance, we describe the keypoint with respect to the principal orientation
  - Create an orientation histogram (36 bins)
  - Weight by gradient magnitude
  - Sample points around the keypoint
  - Highest peak + peaks within 80%
  - Oriented keypoint(s)
- Stable results

Following experimentation with a number of approaches to assigning a local orientation, the following approach was found to give the most stable results. The scale of the keypoint is used to select the Gaussian smoothed image, L, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample, L(x, y), at this scale, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences.
An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360 degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times that of the scale of the keypoint.
Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.
Figure 6 shows the experimental stability of location, scale, and orientation assignment under differing amounts of image noise. As before, the images are rotated and scaled by random amounts. The top line shows the stability of keypoint location and scale assignment. The second line shows the stability of matching when the orientation assignment is also required to be within 15 degrees. As shown by the gap between the top two lines, the orientation assignment remains accurate 95% of the time even after addition of ±10% pixel noise (equivalent to a camera providing less than 3 bits of precision). The measured variance of orientation for the correct matches is about 2.5 degrees, rising to 3.9 degrees for 10% noise. The bottom line in Figure 6 shows the final accuracy of correctly matching a keypoint descriptor to a database of 40,000 keypoints (to be discussed below). As this graph shows, the SIFT features are resistant to even large amounts of pixel noise, and the major cause of error is the initial location and scale detection.
Figure 6: The top line in the graph shows the percent of keypoint locations and scales that are repeatably detected as a function of pixel noise. The second line shows the repeatability after also requiring agreement in orientation. The bottom line shows the final percent of descriptors correctly matched to a large database.
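The histogram and peak selection described above can be sketched as follows. The sketch assumes the gradient magnitude, orientation and Gaussian window weight of each sample have already been computed; the `GradientSample` struct and the function names are hypothetical, and the final parabola interpolation of the peak position is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One sample around the keypoint (hypothetical struct for illustration).
struct GradientSample { double magnitude; double orientationDeg; double weight; };

// 36 bins of 10 degrees; each sample contributes its gradient magnitude
// multiplied by its Gaussian window weight.
std::vector<double> orientationHistogram(const std::vector<GradientSample>& samples)
{
    std::vector<double> hist(36, 0.0);
    for (const GradientSample& s : samples) {
        double angle = std::fmod(s.orientationDeg, 360.0);
        if (angle < 0.0) angle += 360.0;
        const int bin = static_cast<int>(angle / 10.0) % 36;
        hist[bin] += s.magnitude * s.weight;
    }
    return hist;
}

// Bins that are local peaks (on the circular histogram) with a value of at
// least 80% of the highest bin; each one would yield an oriented keypoint.
std::vector<int> dominantBins(const std::vector<double>& hist)
{
    double highest = 0.0;
    for (double h : hist) if (h > highest) highest = h;
    std::vector<int> peaks;
    if (highest <= 0.0) return peaks;
    const int n = static_cast<int>(hist.size());
    for (int b = 0; b < n; ++b) {
        const double left = hist[(b + n - 1) % n];
        const double right = hist[(b + 1) % n];
        if (hist[b] > left && hist[b] > right && hist[b] >= 0.8 * highest)
            peaks.push_back(b);
    }
    return peaks;
}
```

A location whose histogram has two comparable peaks therefore produces two keypoints at the same position and scale with different orientations.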
SIFT – Keypoint Description
- Could sample image intensity at the relevant scale
  - Match using normalized cross correlation
  - Sensitive to affine transformations, 3D viewpoint changes and non-rigid deformations
- A better approach: Edelman et al. (1997)
  - Based on a model of biological vision
  - Consider gradients at particular orientations and spatial frequencies
  - Location not required to be precise

One obvious approach would be to sample the local image intensities around the keypoint at the appropriate scale, and to match these using a normalized correlation measure. However, simple correlation of image patches is highly sensitive to changes that cause misregistration of samples, such as affine or 3D viewpoint change or non-rigid deformations. A better approach has been demonstrated by Edelman, Intrator, and Poggio (1997). Their proposed representation was based upon a model of biological vision, in particular of complex neurons in primary visual cortex. These complex neurons respond to a gradient at a particular orientation and spatial frequency, but the location of the gradient on the retina is allowed to shift over a small receptive field rather than being precisely localized. Edelman et al. hypothesized that the function of these complex neurons was to allow for matching and recognition of 3D objects from a range of viewpoints. They have performed detailed experiments using 3D computer models of object and animal shapes which show that matching gradients while allowing for shifts in their position results in much better classification under 3D rotation. For example, recognition accuracy for 3D objects rotated in depth by 20 degrees increased from 35% for correlation of gradients to 94% using the complex cell model. Our implementation described below was inspired by this idea, but allows for positional shift using a different computational mechanism.
SIFT – Keypoint Description
- Use blurred image at closest scale
- Sample points around the keypoint
- Compute gradients and orientations
- Rotate by keypoint orientation
- Divide region into subregions
- Create histograms (8 bins) for subregions
  - Weight by gradient
  - Weight by location (Gaussian)
  - Distribute into bins (trilinear interpolation)

Figure 7 illustrates the computation of the keypoint descriptor. First the image gradient magnitudes and orientations are sampled around the keypoint location, using the scale of the keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. For efficiency, the gradients are precomputed for all levels of the pyramid as described in Section 5. These are illustrated with small arrows at each sample location on the left side of Figure 7.
A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. This is illustrated with a circular window on the left side of Figure 7, although, of course, the weight falls off smoothly. The purpose of this Gaussian window is to avoid sudden changes in the descriptor with small changes in the position of the window, and to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors.
The keypoint descriptor is shown on the right side of Figure 7. It allows for significant shift in gradient positions by creating orientation histograms over 4x4 sample regions. The figure shows eight directions for each orientation histogram, with the length of each arrow corresponding to the magnitude of that histogram entry. A gradient sample on the left can shift up to 4 sample positions while still contributing to the same histogram on the right, thereby achieving the objective of allowing for larger local positional shifts.
SIFT – Matching Keypoints
- Nearest neighbour matching
  - Euclidean distance between keypoint descriptors
- What about keypoints which have no match?
  - Use a global distance threshold?
  - Compare the distance to the closest neighbour with the distance to the 2nd closest neighbour (from a different object)
  - Rejecting matches with a distance ratio > 0.8 eliminates 90% of false matches while discarding less than 5% of correct matches

The best candidate match for each keypoint is found by identifying its nearest neighbor in the database of keypoints from training images. The nearest neighbor is defined as the keypoint with minimum Euclidean distance for the invariant descriptor vector as was described in Section 6.
However, many features from an image will not have any correct match in the training database because they arise from background clutter or were not detected in the training images. Therefore, it would be useful to have a way to discard features that do not have any good match to the database. A global threshold on distance to the closest feature does not perform well, as some descriptors are much more discriminative than others. A more effective measure is obtained by comparing the distance of the closest neighbor to that of the second-closest neighbor. If there are multiple training images of the same object, then we define the second-closest neighbor as being the closest neighbor that is known to come from a different object than the first, such as by only using images known to contain different objects. This measure performs well because correct matches need to have the closest neighbor significantly closer than the closest incorrect match to achieve reliable matching. For false matches, there will likely be a number of other false matches within similar distances due to the high dimensionality of the feature space. We can think of the second-closest match as providing an estimate of the density of false matches within this portion of the feature space and at the same time identifying specific instances of feature ambiguity.
Figure 11 shows the value of this measure for real image data. The probability density functions for correct and incorrect matches are shown in terms of the ratio of closest to second-closest neighbors of each keypoint. Matches for which the nearest neighbor was a correct match have a PDF that is centered at a much lower ratio than that for incorrect matches. For our object recognition implementation, we reject all matches in which the distance ratio is greater than 0.8, which eliminates 90% of the false matches while discarding less than 5% of the correct matches. This figure was generated by matching images following random scale and orientation change, a depth rotation of 30 degrees, and addition of 2% image noise, against a database of 40,000 keypoints.
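The distance ratio test can be sketched with a brute-force nearest-neighbour search. This is a simplified illustration: it does not implement the refinement of taking the second-closest neighbour from a different object, and the `Descriptor` alias and function names are hypothetical rather than part of OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;  // hypothetical descriptor type

double euclideanDistance(const Descriptor& a, const Descriptor& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns the index of the nearest neighbour of query in database, or -1 if
// the match is rejected because the ratio of closest to second-closest
// distance exceeds maxRatio (or there are fewer than two candidates).
int matchWithRatioTest(const Descriptor& query,
                       const std::vector<Descriptor>& database,
                       double maxRatio = 0.8)
{
    double best = std::numeric_limits<double>::infinity();
    double second = std::numeric_limits<double>::infinity();
    int bestIndex = -1;
    for (std::size_t i = 0; i < database.size(); ++i) {
        const double d = euclideanDistance(query, database[i]);
        if (d < best) { second = best; best = d; bestIndex = static_cast<int>(i); }
        else if (d < second) { second = d; }
    }
    if (bestIndex < 0 || database.size() < 2) return -1;
    return (best <= maxRatio * second) ? bestIndex : -1;
}
```

A query whose two nearest neighbours are at nearly the same distance is treated as ambiguous and dropped, which is the behaviour the ratio threshold of 0.8 is meant to produce.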
SIFT – Matching Keypoints
SIFT – OpenCV code

    // Detect SIFT keypoints in both images
    SiftFeatureDetector sift_detector;
    vector<KeyPoint> keypoints1, keypoints2;
    sift_detector.detect( gray_image1, keypoints1 );
    sift_detector.detect( gray_image2, keypoints2 );
    // Extract feature descriptors
    SiftDescriptorExtractor sift_extractor;
    Mat descriptors1, descriptors2;
    sift_extractor.compute( gray_image1, keypoints1, descriptors1 );
    sift_extractor.compute( gray_image2, keypoints2, descriptors2 );
    …
SIFT – OpenCV code

    …
    // Match descriptors (brute force, Euclidean distance).
    BFMatcher sift_matcher(NORM_L2);
    vector< DMatch > matches;
    sift_matcher.match( descriptors1, descriptors2, matches );
    // Display SIFT matches
    Mat display_image;
    drawMatches( gray_image1, keypoints1, gray_image2, keypoints2, matches, display_image );
SIFT – Recognition
- Match at least 3 features
- Cluster matches using the Hough transform
  - Location (2D)
  - Scale
  - Orientation
- Really a 6D problem
- Use broad bins
  - 30° for orientation
  - Factor of 2 for scale
  - 0.25 times image dimension for location
- Consider all bins with at least 3 entries
- Determine affine transformation

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features. A typical image contains 2,000 or more features which may come from many different objects as well as background clutter. While the distance ratio test described in Section 7.1 will allow us to discard many of the false matches arising from background clutter, this does not remove matches from other valid objects, and we often still need to identify correct subsets of matches containing less than 1% inliers among 99% outliers. Many well-known robust fitting methods, such as RANSAC or Least Median of Squares, perform poorly when the percent of inliers falls much below 50%. Fortunately, much better performance can be obtained by clustering features in pose space using the Hough transform (Hough, 1962; Ballard, 1981; Grimson 1990).
The Hough transform identifies clusters of features with a consistent interpretation by using each feature to vote for all object poses that are consistent with the feature. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. Each of our keypoints specifies 4 parameters: 2D location, scale, and orientation, and each matched keypoint in the database has a record of the keypoint's parameters relative to the training image in which it was found. Therefore, we can create a Hough transform entry predicting the model location, orientation, and scale from the match hypothesis. This prediction has large error bounds, as the similarity transform implied by these 4 parameters is only an approximation to the full 6 degree-of-freedom pose space for a 3D object and also does not account for any nonrigid deformations. Therefore, we use broad bin sizes of 30 degrees for orientation, a factor of 2 for scale, and 0.25 times the maximum projected training image dimension (using the predicted scale) for location. To avoid the problem of boundary effects in bin assignment, each keypoint match votes for the 2 closest bins in each dimension, giving a total of 16 entries for each hypothesis and further broadening the pose range.