1
Object Recognition
Jana Kosecka
Slides from D. Lowe, the D. Forsyth and J. Ponce book, and the ICCV 2005 tutorial by Fei-Fei Li, Rob Fergus, and A. Torralba
3
Challenges 1: viewpoint variation Michelangelo 1475-1564
4
Challenges 2: illumination slide credit: S. Ullman
5
Challenges 3: occlusion Magritte, 1957
6
Challenges 4: scale
7
Challenges 5: deformation Xu Beihong, 1943
8
Challenges 6: background clutter Klimt, 1913
9
Object Recognition - history
– Collected databases of objects on uniform background (no occlusions, no clutter)
– Mostly focused on viewpoint variations (at fixed scale)
– PCA-based techniques, Eigenimages [Turk 91], [Leonardis & Bischof 98]
– Not robust with respect to occlusions, clutter, or changes of viewpoint
– Requires segmentation
10
History - Recognition
Alternative representations:
– Color histogram [Swain 91] (not discriminative enough)
– Geometric invariants [Rothwell 92] - a function whose value is independent of the transformation
– Invariant for image rotation: distance between two points
– Invariant for planar homography: cross-ratio
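A minimal sketch of the color-histogram idea in the spirit of [Swain 91], matching images by histogram intersection; the bin count and the random test images are illustrative assumptions, not from the slides:

```python
import numpy as np

def color_histogram(img, bins=8):
    """3-D color histogram of an RGB image (H x W x 3, uint8), L1-normalized."""
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Swain-Ballard intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return np.minimum(h1, h2).sum()

# Toy usage: two random "images"
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
b = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
print(histogram_intersection(color_histogram(a), color_histogram(b)))
```

The histogram is invariant to rotation and translation of the object in the image, which is exactly why it is attractive, and also why it is not discriminative enough on its own.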
11
Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE (courtesy of Forsyth & Ponce, Computer Vision: A Modern Approach, Prentice Hall)
12
Matching by relations
Idea:
– find parts, then declare the object present if the parts are consistent
Advantage:
– objects with complex configuration spaces don't make good templates: internal degrees of freedom, aspect changes, (possibly) shading, variations in texture, etc.
13
Simplest approach
Define a set of local feature templates
– could find these with filters, etc.
– corner detector + filters
Think of objects as patterns: each template votes for all patterns that contain it, and the pattern with the most votes wins (see the sketch below).
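A toy sketch of this voting scheme; the template-to-pattern lookup table is a made-up example:

```python
from collections import Counter

# Hypothetical lookup: which object patterns contain each local template.
patterns_containing = {
    "corner_A": {"mug", "book"},
    "edge_B":   {"mug", "phone"},
    "blob_C":   {"mug"},
}

def recognize(detected_templates):
    """Each detected template votes for every pattern containing it."""
    votes = Counter()
    for t in detected_templates:
        votes.update(patterns_containing.get(t, ()))
    return votes.most_common(1)[0][0] if votes else None

print(recognize(["corner_A", "edge_B", "blob_C"]))  # -> "mug"
```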
14
History: single object recognition
15
Invariant Local Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters: SIFT features.
16
Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris-like approach):
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures
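The principal-curvature test can be written in terms of the 2x2 Hessian of the DoG image at the peak: keep the keypoint only if tr(H)^2 / det(H) < (r + 1)^2 / r (Lowe uses r = 10). A sketch, assuming the second derivatives at the peak are given:

```python
def passes_curvature_test(dxx, dyy, dxy, r=10.0):
    """Reject edge-like DoG peaks whose principal-curvature ratio exceeds r.

    dxx, dyy, dxy are second derivatives of the DoG image at the peak.
    Equivalent to requiring tr(H)^2 / det(H) < (r + 1)^2 / r.
    """
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:          # curvatures have opposite signs: not a stable peak
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```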
17
SIFT vector formation
Thresholded image gradients are sampled over a 16x16 array of locations in scale space. Create an array of orientation histograms: 8 orientations x 4x4 histogram array = 128 dimensions.
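A simplified sketch of this descriptor layout; it omits the Gaussian weighting, trilinear interpolation, and rotation normalization that the full SIFT pipeline adds, keeping only the 4x4 grid of 8-bin orientation histograms and Lowe's clip-at-0.2 normalization:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 grayscale patch:
    4x4 grid of cells, 8-bin orientation histogram per cell."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2*pi)
    bins = (ori / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12
    desc = np.minimum(desc, 0.2)                    # clip, as in Lowe's paper
    return desc / (np.linalg.norm(desc) + 1e-12)

print(sift_like_descriptor(np.random.rand(16, 16)).shape)  # (128,)
```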
18
Model verification
– Examine all clusters with at least 3 features
– Perform a least-squares affine fit to the model; discard outliers and perform a top-down check for additional features
– Evaluate the probability that the match is correct: use a Bayesian model with the probability that the features would arise by chance if the object were not present (Lowe, CVPR 01)
19
Solution for affine parameters
Affine transform of [x, y] to [u, v]:
\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \]
Rewrite to solve for the transform parameters: each correspondence contributes two rows to a linear system in the unknowns \( (m_1, m_2, m_3, m_4, t_x, t_y) \),
\[ \begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix} \]
which is solved in the least-squares sense.
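A minimal numpy sketch of this least-squares solve; the correspondences in the usage line are made up for illustration:

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transform mapping model [x, y] to image [u, v].

    Solves the stacked system A p = b for p = (m1, m2, m3, m4, tx, ty).
    Needs at least 3 correspondences.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return p[:4].reshape(2, 2), p[4:]   # 2x2 matrix M, translation t

# Toy usage: recover a pure translation from 3 correspondences
M, t = fit_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (3, 3), (2, 4)])
print(M, t)  # ~identity matrix, translation (2, 3)
```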
20
3D Object Recognition Extract outlines with background subtraction
21
3D Object Recognition Only 3 keys are needed for recognition, so extra keys provide robustness Affine model is no longer as accurate
22
Recognition under occlusion
23
Test of illumination invariance Same image under differing illumination 273 keys verified in final match
24
Recognition using View Interpolation
25
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
26
Employ spatial relations Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
28
Recognition …
– Integration of multiple view models (complex 3D objects)
– Generative vs. discriminative models
– Scaling issues (> 10,000 objects)
– Recognition of object categories
– Alternative models of context
– Within-class variations (e.g. chairs)
– Different feature types
– Enable models with a large number of parts
– Image-based retrieval: annotating by semantic context, associating words with pictures
29
History: early object categorization
30
Recognition by finding patterns
General strategy:
– search image windows at a range of scales
– correct for illumination
– present the corrected window to a classifier (e.g. face/no-face)
Figure from “A general framework for object detection,” by C. Papageorgiou, M. Oren, and T. Poggio, Proc. ICCV, copyright 1998, IEEE. Figure from “A Statistical Method for 3D Object Detection Applied to Faces and Cars,” by H. Schneiderman and T. Kanade, CVPR, 2000.
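A minimal sketch of the window-scanning loop; the toy classifier and the per-window normalization standing in for illumination correction are made up for illustration:

```python
import numpy as np

def detect(image, classifier, window=24, factors=(1, 2, 4), stride=4):
    """Scan windows over coarser and coarser subsamplings of the image;
    return (subsampling factor, row, col) for every positive window."""
    hits = []
    for f in factors:
        img = image[::f, ::f]          # crude rescale; a real system would
        H, W = img.shape               # smooth and resample properly
        for r in range(0, H - window + 1, stride):
            for c in range(0, W - window + 1, stride):
                patch = img[r:r + window, c:c + window].astype(float)
                # zero-mean, unit-variance: stand-in for illumination correction
                patch = (patch - patch.mean()) / (patch.std() + 1e-9)
                if classifier(patch):
                    hits.append((f, r, c))
    return hits

# Toy classifier (purely illustrative): "object" = bright centre region.
toy = lambda p: p[8:16, 8:16].mean() > 1.0
print(detect(np.random.rand(64, 64), toy))
```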
32
Object categorization: the statistical viewpoint
Is the object (e.g. a zebra) present or not? Compare \( p(\text{zebra} \mid \text{image}) \) vs. \( p(\text{no zebra} \mid \text{image}) \). By Bayes' rule:
\[ \underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}} \]
33
Object categorization: the statistical viewpoint
posterior ratio = likelihood ratio x prior ratio (as above)
– Discriminative methods model the posterior directly
– Generative methods model the likelihood and the prior
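A toy numeric sketch of this decision rule; the likelihood and prior values are made-up numbers, chosen to show that a strong likelihood ratio can still lose to a strong prior:

```python
def posterior_ratio(lik_obj, lik_bg, prior_obj, prior_bg):
    """p(object|image) / p(no object|image) = likelihood ratio * prior ratio."""
    return (lik_obj / lik_bg) * (prior_obj / prior_bg)

# Hypothetical values: the image is 5x more likely under the zebra model,
# but zebras are rare (prior 0.01 vs 0.99).
r = posterior_ratio(lik_obj=0.05, lik_bg=0.01, prior_obj=0.01, prior_bg=0.99)
print(r, "-> zebra" if r > 1 else "-> no zebra")  # ~0.0505 -> no zebra
```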
34
Discriminative
Direct modeling of the posterior: learn a decision boundary between zebra and non-zebra.
35
Generative
Model the likelihoods \( p(\text{image} \mid \text{zebra}) \) and \( p(\text{image} \mid \text{no zebra}) \). (Figure: test images scored low / middle / high under each likelihood model.)
36
Finding faces using relations
Strategy:
– a face is eyes, nose, mouth, etc. with appropriate relations between them
– build a specialised detector for each of these (template matching) and look for groups with the right internal structure
– once we've found enough of a face, there is little uncertainty about where the other bits could be
37
Three main issues
Representation
– how to represent an object category
Learning
– how to form the classifier, given training data
Recognition
– how the classifier is to be used on novel data
38
Representation –Generative / discriminative / hybrid –Appearance only or location and appearance
39
Object Bag of ‘words’
40
Analogy to documents
Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Its "words": sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Its "words": China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
42
Bag-of-words pipeline:
Learning: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers
Recognition: feature detection & representation → image representation (via the dictionary) → category decision
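A minimal sketch of the dictionary-plus-histogram part of this pipeline, assuming local descriptors (e.g. SIFT vectors) have already been extracted; the vocabulary size of 50 and the random stand-in descriptors are arbitrary choices:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.random((1000, 128))   # stand-in for SIFT descriptors

# 1. Learn the codeword dictionary by clustering training descriptors.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(train_descriptors)

# 2. Represent an image as a normalized histogram of codeword occurrences.
def bag_of_words(image_descriptors):
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

print(bag_of_words(rng.random((200, 128))).shape)  # (50,)
```

The resulting histograms feed the category models or classifiers of the last stage.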
43
Problem with bag-of-words
Very different images can produce the same word histogram, so all are assigned equal probability by bag-of-words methods. Location information is discarded, yet it is important.
44
Finding faces using relations
Strategy comparison. Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl, and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
45
Constellation of Parts Model
Fischler & Elschlager, 1973
[Cootes, Taylor et al. '95] Univ. of Manchester
[G. Csurka et al. '04] Xerox Research Centre Europe
[Weber et al. '00; Fei-Fei Li et al. '03] Caltech
[Fergus et al. '03, '04] Oxford
Constellation-of-parts models: object detection and part-based category recognition
46
Constellation of Parts Model
Strategy for learning models for recognition. Idea: learn a generative probabilistic model of objects.
1. Run part detectors to obtain parts (location, appearance, scale)
2. Form likely object hypotheses, update the probability model, and validate (a hypothesis is a particular configuration of parts)
Recognition: compute the likelihood ratio
47
Generative probabilistic model (figure courtesy of R. Fergus's presentation)
Foreground model: Gaussian shape pdf; Gaussian part appearance pdf; Gaussian relative scale pdf (on log scale); probability of detection per part (e.g. 0.8, 0.75, 0.9)
Clutter model: uniform shape pdf; Gaussian appearance pdf; uniform relative scale pdf (on log scale); Poisson pdf on number of detections
48
Constellation model and Bayesian framework
P parts with location X, scale S, and appearance A. Their distribution is modeled by hidden parameters θ (e.g. mean and covariance of a Gaussian). Summing over hypotheses h (assignments of detected features to model parts), the likelihood factors into appearance, shape, relative-scale, and hypothesis terms:
\[ p(X, S, A \mid \theta) = \sum_{h} \underbrace{p(A \mid X, S, h, \theta)}_{\text{appearance}} \, \underbrace{p(X \mid S, h, \theta)}_{\text{shape}} \, \underbrace{p(S \mid h, \theta)}_{\text{scale}} \, \underbrace{p(h \mid \theta)}_{\text{hypothesis}} \]
Maximum likelihood (ML) uses a single value of θ (Fergus et al.); an approximation makes the integral over θ tractable (Fei-Fei Li et al.).
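A toy sketch of just the shape term of the likelihood ratio: a Gaussian joint pdf over part locations for the foreground model versus a uniform pdf over the image for clutter. The two-part model, its mean, covariance, and the image size are all illustrative numbers, not learned values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-part model: stacked (x1, y1, x2, y2) locations.
mu = np.array([10.0, 10.0, 30.0, 10.0])      # "learned" mean configuration
cov = np.diag([4.0, 4.0, 4.0, 4.0])          # "learned" shape covariance
image_area = 100.0 * 100.0

def shape_log_ratio(locations):
    """log p(X | object) - log p(X | clutter) for one part configuration."""
    fg = multivariate_normal.logpdf(locations, mean=mu, cov=cov)
    bg = 2 * np.log(1.0 / image_area)        # uniform location pdf per part
    return fg - bg

print(shape_log_ratio([10.5, 9.8, 29.0, 10.4]))   # near the mean: high ratio
print(shape_log_ratio([80.0, 80.0, 5.0, 90.0]))   # far away: low ratio
```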
49
Motorbikes (Fergus’s results) Samples from appearance model
50
References
L. Fei-Fei, R. Fergus, and P. Perona, "A Bayesian approach to unsupervised one-shot learning of object categories," Proc. ICCV, 2003.
R. Fergus, P. Perona, and A. Zisserman, "A visual category filter for Google images," Proc. 8th European Conf. on Computer Vision (ECCV), 2004.
Y. Amit and D. Geman, "A computational model for visual selection," Neural Computation, vol. 11, no. 7, pp. 1691-1715, 1999.
M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," Proc. 6th ECCV, vol. 2, pp. 101-108, 2000.
51
Horses
53
Figure from “Efficient Matching of Pictorial Structures,” by P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE
54
Hausdorff distance matching
Let M be an n x n binary template and N an n x n binary image we want to compare to that template:
\[ H(M, N) = \max\big(h(M, N),\, h(N, M)\big), \qquad h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\| \]
– \( \| \cdot \| \) is a distance function such as the Euclidean distance
– h(A, B) is called the directed Hausdorff distance
– it ranks each point in A based on closeness to a point in B; the most mismatched point is the measure of match
– if h(A, B) = e, then all points in A must be within distance e of B
– generally, h(A, B) ≠ h(B, A)
– directed Hausdorff distances are easy to compute from a distance transform
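The distance-transform trick in a short sketch, using SciPy's Euclidean distance transform; the two small binary shapes in the usage lines are made up for illustration:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def directed_hausdorff(A, B):
    """h(A, B) for binary images: max over points of A of the distance
    to the nearest point of B, read off a distance transform of B."""
    dist_to_B = distance_transform_edt(~B)   # each pixel's distance to B
    return dist_to_B[A].max()

def hausdorff(A, B):
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy usage: two overlapping 4x4 squares
A = np.zeros((32, 32), bool); A[8:12, 8:12] = True
B = np.zeros((32, 32), bool); B[10:14, 10:14] = True
print(hausdorff(A, B))   # ~2.83 (corner-to-corner offset)
```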
55
Hausdorff Distance Matching
56
Shape Context
From “Shape Matching and Object Recognition Using Shape Contexts,” by S. Belongie, J. Malik, and J. Puzicha, IEEE PAMI 24(4), 2002