Features-based Object Recognition Pierre Moreels California Institute of Technology Thesis defense, Sept. 24, 2007
2 The recognition continuum variability Individual objects means of transportation BMW logo Categories cars
Applications Autonomous navigation Identification, Security. Help Daiki find his toys !
4 Outline: Problem setup; Features; Coarse-to-fine algorithm; Probabilistic model; Experiments; Conclusion
5 … The detection problem New scene (test image) Models from database Find models and their pose (location, orientation…)
6 … Hypotheses – models + positions New scene (test image) Models from database Θ_1, Θ_2 (Θ = affine transformation)
7 … Matching features Models from database New scene (test image) Set of correspondences = assignment vector
8 Features detection
9 Image characterization by features. Features = high information content: ‘locations in the image where the signal changes two-dimensionally’ (C. Schmid). Reduce the volume of information. Detectors: [Sobel 68]; Difference of Gaussians [Crowley84]; [Harris 88]; [Foerstner94]; Entropy [Kadir&Brady01]. (Figure panels: edge strength map, features.)
10 Correct vs. incorrect descriptor matches: mutual Euclidean distances in the appearance space of descriptors. Descriptors: pixel intensities within a patch; Steerable filters [Freeman1991]; SIFT [Lowe1999,2004]; Shape context [Belongie2002]; Spin [Johnson1999]; HOG [Dalal2005]
11 Stability with respect to nuisances Which detector / descriptor combination is best for recognition ?
Past work on evaluation of features Use of flat surfaces, ground truth easily established In 3D images appearance changes more ! [Schmid&Mohr00] [Mikolajczyk&Schmid 03,05,05]
13 Database : 100 3D objects
14 Testing setup [Moreels&Perona ICCV05, IJCV07] Used by [Winder, CVPR07]
Results – viewpoint change Mahalanobis distance No ‘background’ images
2D vs. 3D: the ranking of detector/descriptor combinations changes when switching from 2D to 3D objects
17 Features matching algorithm
18 Features assignments models from database New scene (test image)... Interpretation...
19 Coarse-to-fine strategy We do it every day ! Search for my place : Los Angeles area – Pasadena – Loma Vista my car
Coarse-to-fine example [Fleuret & Geman 2001,2002] Face identification in complex scenes Coarse resolution Intermediate resolution Fine resolution
21 Coarse-to-Fine detection: Progressively narrow down the focus onto the correct region of hypothesis space. Reject irrelevant regions of the search space at little computational cost. Use first the information that is easy to obtain. Simple building blocks organized in a cascade. Probabilistic interpretation of each step.
22 Coarse data: prior knowledge. Which objects are likely to be there, and which poses are they likely to have? (Figure: unlikely situations.)
23 New scene (test image) … Models from database 4 votes 2 votes 0 vote Model voting Search tree (appearance space – leaves = database features)
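The model-voting step can be sketched as follows. This is only a minimal illustration: a brute-force nearest-neighbour search stands in for the search tree on the slide, and the model names, 2-D descriptors and function names are all hypothetical toy values.

```python
import math
from collections import Counter

def nearest_feature(descriptor, database):
    """Return the (model_id, descriptor) database entry closest in Euclidean distance."""
    return min(database, key=lambda entry: math.dist(entry[1], descriptor))

def vote_for_models(scene_descriptors, database):
    """Each scene feature casts one vote for the model that owns its nearest database feature."""
    votes = Counter()
    for d in scene_descriptors:
        model_id, _ = nearest_feature(d, database)
        votes[model_id] += 1
    return votes

# Toy database: (model id, descriptor). Real descriptors are high-dimensional.
database = [
    ("car", (0.0, 0.0)),
    ("car", (1.0, 0.0)),
    ("toy", (5.0, 5.0)),
    ("mug", (9.0, 0.0)),
]
scene = [(0.1, 0.1), (0.9, -0.1), (0.2, 0.0), (1.1, 0.1), (5.2, 4.9), (4.8, 5.1)]

votes = vote_for_models(scene, database)
for model in ("car", "toy", "mug"):
    print(model, votes[model])
```

Models with few or no votes can be discarded before any geometric check, which is what makes this stage cheap.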
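24 Use of rich geometric information [Lowe1999,2004]. Two matched features: $(x_1, y_1, s_1, \theta_1)$ and $(x_2, y_2, s_2, \theta_2)$. Transform predicted by this match: $\Delta x = x_2 - x_1$, $\Delta y = y_2 - y_1$, $\Delta s = s_2 / s_1$, $\Delta\theta = \theta_2 - \theta_1$. Each match is represented by a dot in the space of 2D similarities (Hough space), with axes $\Delta x$, $\Delta y$, $\Delta s$.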
Coarse Hough transform. Prediction of the position of the model center after the transform. The space of transform parameters is discretized into ‘bins’. Coarse bins limit boundary issues and give a low false-alarm rate for this stage. We count the number of votes collected by each bin. (Figure labels: model, test scene, correct transformation.)
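A rough sketch of the coarse voting, under stated assumptions: each feature is a tuple (x, y, scale, orientation), the vote is cast on the raw translation of the match rather than on the predicted model center, and the bin sizes (50 pixels, a factor of 2 in scale, 30 degrees) are illustrative values, not the ones used in the thesis.

```python
import math
from collections import Counter

N_THETA_BINS = 12  # assumed: 30-degree orientation bins

def predicted_transform(feat_a, feat_b):
    """Similarity transform implied by one match of (x, y, scale, orientation) features."""
    x1, y1, s1, t1 = feat_a
    x2, y2, s2, t2 = feat_b
    return (x2 - x1, y2 - y1, s2 / s1, (t2 - t1) % (2 * math.pi))

def hough_bin(transform, xy_bin=50.0):
    """Map a transform to a coarse bin index (assumed bin sizes)."""
    dx, dy, ds, dtheta = transform
    theta_bin = 2 * math.pi / N_THETA_BINS
    return (
        math.floor(dx / xy_bin),
        math.floor(dy / xy_bin),
        round(math.log2(ds)),  # one bin per factor of 2 in scale
        int(dtheta / theta_bin) % N_THETA_BINS,
    )

def vote(matches):
    """Count how many matches fall into each Hough bin."""
    return Counter(hough_bin(predicted_transform(a, b)) for a, b in matches)

# Three roughly consistent matches and one outlier (toy values).
matches = [
    ((10, 10, 1.0, 0.0), (110, 60, 2.0, 0.5)),
    ((20, 30, 2.0, 1.0), (125, 85, 4.2, 1.45)),
    ((5, 5, 1.0, 0.2), (112, 70, 1.9, 0.6)),
    ((0, 0, 1.0, 0.0), (-30, 200, 0.5, 3.0)),
]
bins = vote(matches)
best_bin, best_count = bins.most_common(1)[0]
print(best_bin, best_count)
```

Only bins that collect enough votes are passed on to the more expensive geometric verification.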
26 PROSAC: correspondence or clutter? Similar to RANSAC – robust statistic for parameter estimation. Priority to candidates with good quality of appearance match. 2D affine transform: 6 parameters, so each sample contains 3 candidate correspondences. Output of PROSAC: pose transformation + set of feature correspondences. [Fischler 1973] [Chum&Matas 2005]
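The idea of sampling the best-ranked candidates first can be sketched as below. This is a simplified, PROSAC-like loop and not the exact sampling schedule or stopping criterion of [Chum&Matas 2005]: the pool of top-ranked correspondences grows by one at each step, and a fixed number of 3-point samples is drawn from it. All names, the tolerance and the toy data are assumptions.

```python
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_affine(sample):
    """Solve the 6 affine parameters from 3 correspondences ((x, y), (u, v)) by Cramer's rule."""
    A = [[x, y, 1.0] for (x, y), _ in sample]
    D = det3(A)
    if abs(D) < 1e-9:  # collinear points: degenerate sample
        return None
    params = []
    for k in (0, 1):  # k=0 solves the u-row (a, b, c), k=1 the v-row (d, e, f)
        rhs = [dst[k] for _, dst in sample]
        for col in range(3):
            M = [row[:] for row in A]
            for r in range(3):
                M[r][col] = rhs[r]
            params.append(det3(M) / D)
    return tuple(params)

def inlier_indices(p, matches, tol):
    """Indices of matches whose target lies within `tol` of the transformed source."""
    a, b, c, d, e, f = p
    out = []
    for i, ((x, y), (u, v)) in enumerate(matches):
        if (a * x + b * y + c - u) ** 2 + (d * x + e * y + f - v) ** 2 <= tol ** 2:
            out.append(i)
    return out

def prosac_like(matches, quality, tol=2.0, samples_per_size=20, seed=0):
    """Sample from progressively larger pools of the best-ranked matches."""
    rng = random.Random(seed)
    order = sorted(range(len(matches)), key=lambda i: -quality[i])
    ranked = [matches[i] for i in order]
    best_params, best_inl = None, []
    for n in range(3, len(ranked) + 1):
        for _ in range(samples_per_size):
            p = fit_affine(rng.sample(ranked[:n], 3))
            if p is None:
                continue
            inl = inlier_indices(p, ranked, tol)
            if len(inl) > len(best_inl):
                best_params, best_inl = p, inl
    return best_params, sorted(order[i] for i in best_inl)

# Five matches consistent with u = 2x + 10, v = 2y - 5, plus two clutter matches.
matches = [((0, 0), (10, -5)), ((10, 0), (30, -5)), ((0, 10), (10, 15)),
           ((10, 10), (30, 15)), ((5, 3), (20, 1)),
           ((3, 7), (100, 100)), ((8, 2), (-40, 60))]
quality = [0.9, 0.8, 0.85, 0.7, 0.75, 0.3, 0.2]
params, inliers = prosac_like(matches, quality)
print(params, inliers)
```

Because the three best-ranked matches here are all correct, the very first sample already recovers the transform, which is the benefit of ordering samples by appearance quality.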
27 Probabilistic model
28 Generative model
29 Recognition steps
Score of an extended hypothesis. Equation annotations: hypothesis = model + position; observed features = geometry + appearance; database of models; constant; consistency (after PROSAC); prior on model and poses; feature assignments; votes per model; votes per model pose bin (Hough transform); prior on assignments (before actual observations).
Consistency: consistency between observations and predictions from the hypothesis (model m, position of model m). Common-frame approximation: parts are conditionally independent once the reference position of the object is fixed [Lowe1999,Huttenlocher90,Moreels04]. (Figure panels: constellation model, common-frame.)
32 Consistency: consistency between observations and predictions from the hypothesis. Consistency – appearance; Consistency – geometry. (Figure labels: foreground features, ‘null’ assignments, geometry, appearance.)
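The equation on this slide did not survive extraction. One way the annotations could fit together is a Bayes-rule factorization, sketched here as an assumption rather than as the slide's actual formula. The symbols are hypothetical: $H$ is the hypothesis (model + position), $V$ the assignment vector, $F$ the observed features, and $M$ the database of models.

```latex
P(H, V \mid F, M)
  = \frac{
      \overbrace{P(F \mid V, H, M)}^{\text{consistency}}
      \;\overbrace{P(V \mid H, M)}^{\text{prior on assignments}}
      \;\overbrace{P(H \mid M)}^{\text{prior on model and poses}}
    }{
      \underbrace{P(F \mid M)}_{\text{constant}}
    }
```

Under this reading, the denominator does not depend on the hypothesis, so hypotheses can be ranked by the numerator alone. Where the "votes per model" and "votes per model pose bin" annotations attach is not recoverable from the text and is left open.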
Learning foreground & background densities. Ground truth pairs of matches are collected. Gaussian densities, centered on the nominal value that appearance / pose should have according to H. Learning background densities is easy: match to random images. [Moreels&Perona, IJCV, 2007]
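A minimal sketch of how such densities could be fit and compared. It assumes a single scalar residual (observed value minus the nominal value predicted by H), a zero-mean Gaussian for the foreground as the slide states, and, as an extra assumption, a Gaussian for the background as well. The sample values are made up.

```python
import math
import statistics

def fit_foreground(residuals):
    """Gaussian centered on the nominal value: mean fixed at 0, only the spread is learned."""
    std = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return 0.0, std

def fit_background(residuals):
    """Background density from matches to random images (Gaussian form is an assumption)."""
    return statistics.fmean(residuals), statistics.stdev(residuals)

def log_gaussian(x, mean, std):
    """Log of the 1-D Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def log_likelihood_ratio(x, fg, bg):
    """Positive values favour a true correspondence, negative values favour clutter."""
    return log_gaussian(x, *fg) - log_gaussian(x, *bg)

# Toy residuals: ground-truth matches are tight, random matches are spread out.
fg = fit_foreground([-0.2, 0.1, 0.0, 0.3, -0.1, -0.1])
bg = fit_background([-3.0, 2.5, 1.0, -1.5, 4.0, -2.0])

llr_close = log_likelihood_ratio(0.05, fg, bg)
llr_far = log_likelihood_ratio(3.0, fg, bg)
print(llr_close, llr_far)
```

A residual near the nominal value scores in favour of the foreground, while a large residual is better explained by the background density.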
34 Experiments
An example Model voting Hough bins
36 An example After PROSAC Probabilistic scores
37 Efficiency of coarse-to-fine processing
38 Giuseppe Toys database – Models 61 objects, 1-2 views/object
Giuseppe Toys database – Test scenes 141 test scenes
40 Home objects database – Models 49 objects, 1-2 views/object
41 Home objects database – Test scenes 141 test scenes
42 Results – Giuseppe Toys database. Comparison with Lowe’99,’04. Lower false alarm rate: more systematic verification of geometric consistency. Undetected objects: features with poor appearance distinctiveness index to incorrect models.
43 Results – Home objects database
44 Failure mode Test image hand-labeled before the experiments
45 Test – Text and graphics
46 Test – no texture
Test – Clutter
48 Conclusions: Coarse-to-fine strategy prunes irrelevant search branches at early stages. Probabilistic interpretation of each step. Higher performance than Lowe, especially in cluttered environments. Front end (features) needs more work for smooth or shiny surfaces.