1
Object Recognition
Jana Kosecka
Slides from D. Lowe, the D. Forsyth and J. Ponce book, and the ICCV 2005 tutorial by Fei-Fei Li, Rob Fergus, and A. Torralba
3
Challenges 1: viewpoint variation Michelangelo 1475-1564
4
Challenges 2: illumination slide credit: S. Ullman
5
Challenges 3: occlusion Magritte, 1957
6
Challenges 4: scale
7
Challenges 5: deformation Xu Beihong, 1943
8
Challenges 6: background clutter Klimt, 1913
9
Object Recognition - history
– Collected databases of objects on uniform background (no occlusions, no clutter)
– Mostly focused on viewpoint variations (at fixed scale)
– PCA-based techniques, Eigenimages [Turk 91], [Leonardis & Bischof 98]
– Not robust with respect to occlusions, clutter, or changes of viewpoint
– Requires segmentation
10
History - Recognition
Alternative representations:
– Color histogram [Swain 91] (not discriminative enough)
– Geometric invariants [Rothwell 92] - a function whose value is independent of the transformation
– Invariant for image rotation: distance between two points
– Invariant for planar homography: cross-ratio
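A minimal sketch of the color-histogram idea in the spirit of [Swain 91], matching images by histogram intersection; the bin count and the random test images are illustrative assumptions, not from the slides:

```python
import numpy as np

def color_histogram(img, bins=8):
    """3-D color histogram of an RGB image (H x W x 3, uint8), L1-normalized."""
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Swain-Ballard intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return np.minimum(h1, h2).sum()

# Toy usage: two random "images"
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
b = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
print(histogram_intersection(color_histogram(a), color_histogram(b)))
```

The histogram is invariant to rotation and translation of the object in the image, which is exactly why it is attractive, and also why it is not discriminative enough on its own.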
11
Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE (courtesy of Forsyth & Ponce, Computer Vision: A Modern Approach, Prentice Hall)
12
Matching by relations
Idea:
– find parts, then declare the object present if the parts are consistent
Advantage:
– objects with complex configuration spaces don't make good templates: internal degrees of freedom, aspect changes, (possibly) shading, variations in texture, etc.
13
Simplest approach
Define a set of local feature templates
– could find these with filters, etc.
– corner detector + filters
Think of objects as patterns: each template votes for all patterns that contain it, and the pattern with the most votes wins (see the sketch below).
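A toy sketch of this voting scheme; the template-to-pattern lookup table is a made-up example:

```python
from collections import Counter

# Hypothetical lookup: which object patterns contain each local template.
patterns_containing = {
    "corner_A": {"mug", "book"},
    "edge_B":   {"mug", "phone"},
    "blob_C":   {"mug"},
}

def recognize(detected_templates):
    """Each detected template votes for every pattern containing it."""
    votes = Counter()
    for t in detected_templates:
        votes.update(patterns_containing.get(t, ()))
    return votes.most_common(1)[0][0] if votes else None

print(recognize(["corner_A", "edge_B", "blob_C"]))  # -> "mug"
```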
14
History: single object recognition
15
Invariant Local Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters: SIFT features.
16
Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris-like approach):
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures
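The principal-curvature test can be written in terms of the 2x2 Hessian of the DoG image at the peak: keep the keypoint only if tr(H)^2 / det(H) < (r + 1)^2 / r (Lowe uses r = 10). A sketch, assuming the second derivatives at the peak are given:

```python
def passes_curvature_test(dxx, dyy, dxy, r=10.0):
    """Reject edge-like DoG peaks whose principal-curvature ratio exceeds r.

    dxx, dyy, dxy are second derivatives of the DoG image at the peak.
    Equivalent to requiring tr(H)^2 / det(H) < (r + 1)^2 / r.
    """
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:          # curvatures have opposite signs: not a stable peak
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```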
17
SIFT vector formation
Thresholded image gradients are sampled over a 16x16 array of locations in scale space. Create an array of orientation histograms: 8 orientations x 4x4 histogram array = 128 dimensions.
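A simplified sketch of this descriptor layout; it omits the Gaussian weighting, trilinear interpolation, and rotation normalization that the full SIFT pipeline adds, keeping only the 4x4 grid of 8-bin orientation histograms and Lowe's clip-at-0.2 normalization:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 grayscale patch:
    4x4 grid of cells, 8-bin orientation histogram per cell."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2*pi)
    bins = (ori / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12
    desc = np.minimum(desc, 0.2)                    # clip, as in Lowe's paper
    return desc / (np.linalg.norm(desc) + 1e-12)

print(sift_like_descriptor(np.random.rand(16, 16)).shape)  # (128,)
```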
18
Model verification
– Examine all clusters with at least 3 features
– Perform a least-squares affine fit to the model; discard outliers and perform a top-down check for additional features
– Evaluate the probability that the match is correct: use a Bayesian model with the probability that the features would arise by chance if the object were not present (Lowe, CVPR 01)
19
Solution for affine parameters
Affine transform of [x, y] to [u, v]:
\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \]
Rewrite to solve for the transform parameters: each correspondence contributes two rows to a linear system in the unknowns \( (m_1, m_2, m_3, m_4, t_x, t_y) \),
\[ \begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix} \]
which is solved in the least-squares sense.
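A minimal numpy sketch of this least-squares solve; the correspondences in the usage line are made up for illustration:

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine transform mapping model [x, y] to image [u, v].

    Solves the stacked system A p = b for p = (m1, m2, m3, m4, tx, ty).
    Needs at least 3 correspondences.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return p[:4].reshape(2, 2), p[4:]   # 2x2 matrix M, translation t

# Toy usage: recover a pure translation from 3 correspondences
M, t = fit_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (3, 3), (2, 4)])
print(M, t)  # ~identity matrix, translation (2, 3)
```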
20
3D Object Recognition Extract outlines with background subtraction
21
3D Object Recognition Only 3 keys are needed for recognition, so extra keys provide robustness Affine model is no longer as accurate
22
Recognition under occlusion
23
Test of illumination invariance Same image under differing illumination 273 keys verified in final match
24
Recognition using View Interpolation
25
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
26
Employ spatial relations Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
28
Recognition …
– Integration of multiple view models (complex 3D objects)
– Generative vs. discriminative models
– Scaling issues (> 10,000 objects)
– Recognition of object categories
– Alternative models of context
– Within-class variations (e.g. chairs)
– Different feature types
– Enable models with a large number of parts
– Image-based retrieval: annotating by semantic context, associating words with pictures
29
History: early object categorization
30
Recognition by finding patterns
General strategy:
– search image windows at a range of scales
– correct for illumination
– present the corrected window to a classifier (e.g. face/no-face)
Figure from “A general framework for object detection,” by C. Papageorgiou, M. Oren, and T. Poggio, Proc. ICCV, copyright 1998, IEEE. Figure from “A Statistical Method for 3D Object Detection Applied to Faces and Cars,” by H. Schneiderman and T. Kanade, CVPR, 2000.
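A minimal sketch of the window-scanning loop; the toy classifier and the per-window normalization standing in for illumination correction are made up for illustration:

```python
import numpy as np

def detect(image, classifier, window=24, factors=(1, 2, 4), stride=4):
    """Scan windows over coarser and coarser subsamplings of the image;
    return (subsampling factor, row, col) for every positive window."""
    hits = []
    for f in factors:
        img = image[::f, ::f]          # crude rescale; a real system would
        H, W = img.shape               # smooth and resample properly
        for r in range(0, H - window + 1, stride):
            for c in range(0, W - window + 1, stride):
                patch = img[r:r + window, c:c + window].astype(float)
                # zero-mean, unit-variance: stand-in for illumination correction
                patch = (patch - patch.mean()) / (patch.std() + 1e-9)
                if classifier(patch):
                    hits.append((f, r, c))
    return hits

# Toy classifier (purely illustrative): "object" = bright centre region.
toy = lambda p: p[8:16, 8:16].mean() > 1.0
print(detect(np.random.rand(64, 64), toy))
```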
32
Object categorization: the statistical viewpoint
Is the object (e.g. a zebra) present or not? Compare \( p(\text{zebra} \mid \text{image}) \) vs. \( p(\text{no zebra} \mid \text{image}) \). By Bayes' rule:
\[ \underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}} \]
33
Object categorization: the statistical viewpoint
posterior ratio = likelihood ratio x prior ratio (as above)
– Discriminative methods model the posterior directly
– Generative methods model the likelihood and the prior
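A toy numeric sketch of this decision rule; the likelihood and prior values are made-up numbers, chosen to show that a strong likelihood ratio can still lose to a strong prior:

```python
def posterior_ratio(lik_obj, lik_bg, prior_obj, prior_bg):
    """p(object|image) / p(no object|image) = likelihood ratio * prior ratio."""
    return (lik_obj / lik_bg) * (prior_obj / prior_bg)

# Hypothetical values: the image is 5x more likely under the zebra model,
# but zebras are rare (prior 0.01 vs 0.99).
r = posterior_ratio(lik_obj=0.05, lik_bg=0.01, prior_obj=0.01, prior_bg=0.99)
print(r, "-> zebra" if r > 1 else "-> no zebra")  # ~0.0505 -> no zebra
```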
34
Discriminative
Direct modeling of the posterior: learn a decision boundary between zebra and non-zebra.
35
Generative
Model the likelihoods \( p(\text{image} \mid \text{zebra}) \) and \( p(\text{image} \mid \text{no zebra}) \). (Figure: test images scored low / middle / high under each likelihood model.)
36
Finding faces using relations
Strategy:
– a face is eyes, nose, mouth, etc. with appropriate relations between them
– build a specialised detector for each of these (template matching) and look for groups with the right internal structure
– once we've found enough of a face, there is little uncertainty about where the other bits could be
37
Three main issues
Representation
– how to represent an object category
Learning
– how to form the classifier, given training data
Recognition
– how the classifier is to be used on novel data
38
Representation –Generative / discriminative / hybrid –Appearance only or location and appearance
39
Object Bag of ‘words’
40
Analogy to documents
Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Its "words": sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Its "words": China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
42
Bag-of-words pipeline:
Learning: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers
Recognition: feature detection & representation → image representation (via the dictionary) → category decision
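A minimal sketch of the dictionary-plus-histogram part of this pipeline, assuming local descriptors (e.g. SIFT vectors) have already been extracted; the vocabulary size of 50 and the random stand-in descriptors are arbitrary choices:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.random((1000, 128))   # stand-in for SIFT descriptors

# 1. Learn the codeword dictionary by clustering training descriptors.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(train_descriptors)

# 2. Represent an image as a normalized histogram of codeword occurrences.
def bag_of_words(image_descriptors):
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

print(bag_of_words(rng.random((200, 128))).shape)  # (50,)
```

The resulting histograms feed the category models or classifiers of the last stage.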
43
Problem with bag-of-words
Very different images can produce the same word histogram, so all are assigned equal probability by bag-of-words methods. Location information is discarded, yet it is important.
44
Finding faces using relations
Strategy comparison. Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl, and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
45
Constellation of Parts Model
Fischler & Elschlager, 1973
[Cootes, Taylor et al. '95] Univ. of Manchester
[G. Csurka et al. '04] Xerox Research Centre Europe
[Weber et al. '00; Fei-Fei Li et al. '03] Caltech
[Fergus et al. '03, '04] Oxford
Constellation-of-parts models: object detection and part-based category recognition
46
Constellation of Parts Model
Strategy for learning models for recognition. Idea: learn a generative probabilistic model of objects.
1. Run part detectors to obtain parts (location, appearance, scale)
2. Form likely object hypotheses, update the probability model, and validate (a hypothesis is a particular configuration of parts)
Recognition: compute the likelihood ratio
47
Generative probabilistic model (figure courtesy of R. Fergus's presentation)
Foreground model: Gaussian shape pdf; Gaussian part appearance pdf; Gaussian relative scale pdf (on log scale); probability of detection per part (e.g. 0.8, 0.75, 0.9)
Clutter model: uniform shape pdf; Gaussian appearance pdf; uniform relative scale pdf (on log scale); Poisson pdf on number of detections
48
Constellation model and Bayesian framework
P parts with location X, scale S, and appearance A. Their distribution is modeled by hidden parameters θ (e.g. mean and covariance of a Gaussian). Summing over hypotheses h (assignments of detected features to model parts), the likelihood factors into appearance, shape, relative-scale, and hypothesis terms:
\[ p(X, S, A \mid \theta) = \sum_{h} \underbrace{p(A \mid X, S, h, \theta)}_{\text{appearance}} \, \underbrace{p(X \mid S, h, \theta)}_{\text{shape}} \, \underbrace{p(S \mid h, \theta)}_{\text{scale}} \, \underbrace{p(h \mid \theta)}_{\text{hypothesis}} \]
Maximum likelihood (ML) uses a single value of θ (Fergus et al.); an approximation makes the integral over θ tractable (Fei-Fei Li et al.).
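A toy sketch of just the shape term of the likelihood ratio: a Gaussian joint pdf over part locations for the foreground model versus a uniform pdf over the image for clutter. The two-part model, its mean, covariance, and the image size are all illustrative numbers, not learned values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-part model: stacked (x1, y1, x2, y2) locations.
mu = np.array([10.0, 10.0, 30.0, 10.0])      # "learned" mean configuration
cov = np.diag([4.0, 4.0, 4.0, 4.0])          # "learned" shape covariance
image_area = 100.0 * 100.0

def shape_log_ratio(locations):
    """log p(X | object) - log p(X | clutter) for one part configuration."""
    fg = multivariate_normal.logpdf(locations, mean=mu, cov=cov)
    bg = 2 * np.log(1.0 / image_area)        # uniform location pdf per part
    return fg - bg

print(shape_log_ratio([10.5, 9.8, 29.0, 10.4]))   # near the mean: high ratio
print(shape_log_ratio([80.0, 80.0, 5.0, 90.0]))   # far away: low ratio
```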
49
Motorbikes (Fergus’s results) Samples from appearance model
50
References
L. Fei-Fei, R. Fergus, and P. Perona, "A Bayesian approach to unsupervised one-shot learning of object categories," Proc. ICCV, 2003.
R. Fergus, P. Perona, and A. Zisserman, "A visual category filter for Google images," Proc. 8th European Conf. on Computer Vision (ECCV), 2004.
Y. Amit and D. Geman, "A computational model for visual selection," Neural Computation, vol. 11, no. 7, pp. 1691-1715, 1999.
M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," Proc. 6th ECCV, vol. 2, pp. 101-108, 2000.
51
Horses
53
Figure from “Efficient Matching of Pictorial Structures,” by P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE
54
Hausdorff distance matching
Let M be an n x n binary template and N an n x n binary image we want to compare to that template:
\[ H(M, N) = \max\big(h(M, N),\, h(N, M)\big), \qquad h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\| \]
– \( \| \cdot \| \) is a distance function such as the Euclidean distance
– h(A, B) is called the directed Hausdorff distance
– it ranks each point in A based on closeness to a point in B; the most mismatched point is the measure of match
– if h(A, B) = e, then all points in A must be within distance e of B
– generally, h(A, B) ≠ h(B, A)
– directed Hausdorff distances are easy to compute from a distance transform
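The distance-transform trick in a short sketch, using SciPy's Euclidean distance transform; the two small binary shapes in the usage lines are made up for illustration:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def directed_hausdorff(A, B):
    """h(A, B) for binary images: max over points of A of the distance
    to the nearest point of B, read off a distance transform of B."""
    dist_to_B = distance_transform_edt(~B)   # each pixel's distance to B
    return dist_to_B[A].max()

def hausdorff(A, B):
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy usage: two overlapping 4x4 squares
A = np.zeros((32, 32), bool); A[8:12, 8:12] = True
B = np.zeros((32, 32), bool); B[10:14, 10:14] = True
print(hausdorff(A, B))   # ~2.83 (corner-to-corner offset)
```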
55
Hausdorff Distance Matching
56
Shape Context
From “Shape Matching and Object Recognition Using Shape Contexts,” by S. Belongie, J. Malik, and J. Puzicha, IEEE PAMI 24(4), 2002