Object recognition. Jana Kosecka. Slides from D. Lowe, the D. Forsyth and J. Ponce book, and the ICCV 2005 tutorial by Fei-Fei Li, Rob Fergus and A. Torralba.



Challenges 1: viewpoint variation. Michelangelo

Challenges 2: illumination slide credit: S. Ullman

Challenges 3: occlusion Magritte, 1957

Challenges 4: scale

Challenges 5: deformation Xu, Beihong 1943

Challenges 6: background clutter Klimt, 1913

Object Recognition - history. Collected databases of objects on uniform background (no occlusions, no clutter); mostly focused on viewpoint variations (at fixed scale). PCA-based techniques: eigenimages [Turk 91], [Leonardis & Bischof 98]. Not robust with respect to occlusions, clutter, or changes of viewpoint; requires segmentation.

History - Recognition. Alternative representations: color histograms [Swain 91] (not discriminative enough); geometric invariants [Rothwell 92] - a function whose value is independent of the transformation. Invariant for image rotation: distance of two points. Invariant for planar homography: cross-ratio.
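The cross-ratio invariance mentioned above is easy to verify numerically. A minimal sketch: four collinear points are given by 1-D coordinates along their line, and the projective map `h` below is an arbitrary example chosen for illustration, not from the slides.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their 1-D
    coordinates along the line: (a-c)(b-d) / ((a-d)(b-c))."""
    return (a - c) * (b - d) / ((a - d) * (b - c))

def h(x):
    # An arbitrary 1-D projective map x -> (2x + 1) / (x + 3),
    # standing in for the restriction of a planar homography to a line.
    return (2 * x + 1) / (x + 3)

before = cross_ratio(0, 1, 2, 3)
after = cross_ratio(h(0), h(1), h(2), h(3))
# The cross-ratio is unchanged by the projective map: before == after
```

Distances between points are not preserved by `h`, but the cross-ratio is, which is what makes it usable as an index into a model library.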

Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE (courtesy Forsyth and Ponce, Computer Vision, Prentice Hall).

Matching by relations. Idea: find bits, then say the object is present if the bits are OK. Advantage: objects with complex configuration spaces don't make good templates - internal degrees of freedom, aspect changes, (possibly) shading, variations in texture, etc.

Simplest approach: define a set of local feature templates (could find these with filters, e.g. a corner detector + filters). Think of objects as patterns. Each template votes for all patterns that contain it; the pattern with the most votes wins.
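The voting scheme above can be sketched in a few lines. This is a toy illustration only: the object names and template names are made up, and real systems would match templates by filter responses rather than by string membership.

```python
from collections import Counter

# Hypothetical patterns: each object is the set of local feature
# templates it contains (names invented for illustration).
patterns = {
    "mug":   {"ellipse", "handle_curve", "vertical_edge"},
    "phone": {"rectangle", "vertical_edge", "grid"},
    "clock": {"ellipse", "grid"},
}

def vote(detected_templates):
    """Each detected template votes for every pattern containing it;
    return the (pattern, votes) pair with the most votes."""
    votes = Counter()
    for t in detected_templates:
        for obj, templates in patterns.items():
            if t in templates:
                votes[obj] += 1
    return votes.most_common(1)[0] if votes else None

# e.g. vote({"ellipse", "handle_curve"}) -> ("mug", 2)
```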

History: single object recognition

Lowe, et al. 1999, 2003 Mahamud and Herbert, 2000 Ferrari, Tuytelaars, and Van Gool, 2004 Rothganger, Lazebnik, and Ponce, 2004 Moreels and Perona, 2005 …

Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters SIFT Features

Example of keypoint detection. Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach). (a) 233x189 image (b) 832 DoG extrema (c) 729 left after peak-value threshold (d) 536 left after testing ratio of principal curvatures
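The ratio-of-principal-curvatures test can be written without an explicit eigendecomposition, using the trace and determinant of the 2x2 Hessian at the candidate point. A sketch with Lowe's default threshold r = 10 (function name is illustrative):

```python
def passes_curvature_test(dxx, dyy, dxy, r=10.0):
    """Edge-response test: reject DoG extrema whose ratio of
    principal curvatures exceeds r. If the Hessian eigenvalues are
    a and b with a = r*b, then Tr^2/Det = (r+1)^2 / r, so we keep
    the point only when Tr^2/Det stays below that bound."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r

# A blob-like point (similar curvatures) passes; an edge-like point
# (one curvature much larger than the other) fails.
```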

SIFT vector formation. Thresholded image gradients are sampled over a 16x16 array of locations in scale space. Create an array of orientation histograms: 8 orientations x 4x4 histogram array = 128 dimensions.
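A minimal sketch of this vector formation, under simplifying assumptions: no Gaussian weighting, no trilinear interpolation, and no rotation to the dominant orientation, all of which the real descriptor includes. The function name is illustrative, not Lowe's code.

```python
import numpy as np

def sift_descriptor(patch):
    """Simplified SIFT vector formation: a 16x16 patch is divided
    into a 4x4 grid of cells; each cell accumulates an 8-bin
    gradient-orientation histogram weighted by gradient magnitude,
    giving 4 * 4 * 8 = 128 dimensions."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)                       # in [-pi, pi]
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    if norm > 0:
        # Normalize, clamp large values (robustness to illumination,
        # as in Lowe 2004), then renormalize to unit length.
        desc = np.clip(desc / norm, 0, 0.2)
        desc /= np.linalg.norm(desc)
    return desc
```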

Model verification. Examine all clusters with at least 3 features. Perform a least-squares affine fit to the model; discard outliers and perform a top-down check for additional features. Evaluate the probability that the match is correct: use a Bayesian model with the probability that the features would arise by chance if the object were not present (Lowe, CVPR 01).

Solution for affine parameters. Affine transform of [x, y] to [u, v]: [u, v]' = [[m1, m2], [m3, m4]] [x, y]' + [tx, ty]'. Rewrite as a linear system in the six unknowns (m1, m2, m3, m4, tx, ty) and solve by least squares.
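The least-squares solution can be sketched with NumPy. Each correspondence contributes two rows to the design matrix; with three or more correspondences the system is (over)determined and `lstsq` returns the best fit. The helper name `fit_affine` is ours, not Lowe's.

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares fit of the affine parameters (m1, m2, m3, m4,
    tx, ty) mapping model points [x, y] to image points [u, v].
    Needs >= 3 correspondences; extra ones average out noise."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        # u = m1*x + m2*y + tx ;  v = m3*x + m4*y + ty
        A.append([x, y, 0, 0, 1, 0])
        A.append([0, 0, x, y, 0, 1])
        b.extend([u, v])
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return p  # m1, m2, m3, m4, tx, ty
```

With exact correspondences the recovered parameters match the generating transform; with outliers present, Lowe's verification step discards bad matches before refitting.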

3D Object Recognition Extract outlines with background subtraction

3D Object Recognition Only 3 keys are needed for recognition, so extra keys provide robustness Affine model is no longer as accurate

Recognition under occlusion

Test of illumination invariance Same image under differing illumination 273 keys verified in final match

Recognition using View Interpolation

Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Employ spatial relations Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Recognition - open issues: integration of multiple view models (complex 3D objects); generative vs discriminative models; scaling issues - recognition of object categories; alternative models of context; intra-class variations within a category (chairs); different feature types; enabling models with large numbers of parts; image-based retrieval - annotating by semantic context, associating words with pictures.

History: early object categorization

Recognition by finding patterns. General strategy: search image windows at a range of scales; correct for illumination; present the corrected window to a classifier (e.g. a face/no-face classifier). Figure from “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. ICCV 1998, copyright 1998, IEEE. Figure from “A Statistical Method for 3D Object Detection Applied to Faces and Cars,” by H. Schneiderman and T. Kanade, CVPR 2000.

Object categorization: the statistical viewpoint. Is the image a zebra or not? Bayes rule: p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] x [p(zebra) / p(no zebra)], i.e. posterior ratio = likelihood ratio x prior ratio.

Object categorization: the statistical viewpoint. Discriminative methods model the posterior ratio directly; generative methods model the likelihood and the prior.

Discriminative: direct modeling of the decision boundary between zebra and non-zebra.

Generative: model the likelihood p(image | zebra) and p(image | no zebra); example images score low, middle or high likelihood under each model.

Finding faces using relations Strategy: –Face is eyes, nose, mouth, etc. with appropriate relations between them –build a specialised detector for each of these (template matching) and look for groups with the right internal structure –Once we’ve found enough of a face, there is little uncertainty about where the other bits could be

Three main issues Representation –How to represent an object category Learning –How to form the classifier, given training data Recognition –How the classifier is to be used on novel data

Representation –Generative / discriminative / hybrid –Appearance only or location and appearance

Object Bag of ‘words’

Analogy to documents Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step- wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image. sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image Hubel, Wiesel China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. 
However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value. China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Bag-of-words pipeline. Learning: feature detection & representation -> codewords dictionary -> image representation -> category models (and/or) classifiers. Recognition: extract the same representation from a novel image, then make a category decision.
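The dictionary and image-representation steps can be sketched as follows. This is a toy version: a tiny hand-rolled k-means stands in for a real vector-quantization library, the descriptors would in practice be 128-dimensional SIFT vectors, and the function names are ours.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Tiny k-means over feature descriptors to form the codewords
    dictionary: centers are the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center, then recenter.
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Image representation: normalized histogram of nearest-codeword
    counts over the image's descriptors."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    h = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return h / h.sum()
```

A classifier (e.g. nearest-neighbor or an SVM) is then trained on these histograms; at recognition time a novel image is mapped to a histogram with the same codebook.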

Problem with bag-of-words: images containing the same features in any spatial arrangement all have equal probability under bag-of-words methods, yet location information is important.

Finding faces using relations. Strategy: compare. Notice that once some facial features have been found, the position of the rest is quite strongly constrained. Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE

Constellation of Parts Model. Fischler & Elschlager, 1973. [Cootes, Taylor et al. '95] Univ. of Manchester; [G. Csurka et al. '04] Xerox Research Centre Europe; [Weber et al. '00, Fei-Fei Li et al. '03] Caltech; [Fergus et al. '03, '04] Oxford. Constellation-of-parts models: object detection, part and category recognition.

Feature extraction and description. Scale & affine invariant Harris corner detector [K. Mikolajczyk, ECCV 2002]. SIFT feature descriptor [David Lowe, IJCV 2004]: a gradient-orientation histogram of the region, giving a feature vector (f1, f2, ..., fd).

How to Obtain Parts? Challenge: many features come from the background - how to automatically select the features which belong to the objects? Feature selection: discarding irrelevant / non-discriminative features. Parts: clusters in the feature descriptor space.

Feature Selection in Category Classification. 30 training images for the face and motorbike categories.

Constellation of Parts Model - strategy for learning models for recognition. Idea: learn a generative probabilistic model of objects. 1. Run part detectors to obtain parts (location, appearance, scale). 2. Form a likely object hypothesis, update the probability model and validate - a hypothesis is a particular configuration of parts. Recognition: compute the likelihood ratio.

Generative probabilistic model (courtesy of R. Fergus, presentation). Foreground model: Gaussian shape pdf, Gaussian part appearance pdf, Gaussian relative scale pdf (log scale), probability of detection. Clutter model: uniform shape pdf, Gaussian appearance pdf, uniform relative scale pdf (log scale), Poisson pdf on the number of detections.

Constellation model and Bayesian Framework. P parts with location X, scale S and appearance A. The distribution is modeled by hidden parameters θ (e.g. mean and covariance of a Gaussian). Maximum likelihood (ML) uses a single value of θ (Fergus et al.); an approximation makes the integral over appearance, shape, scale and hypotheses tractable (Fei-Fei Li et al.).
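The decision rule on this slide can be written out. As a sketch in the notation of the Fergus et al. formulation (with H the set of hypotheses h assigning model parts to detected features):

```latex
R = \frac{p(\mathrm{Object}\mid X, S, A)}{p(\mathrm{bg}\mid X, S, A)}
  \approx \frac{p(X, S, A\mid \theta)\, p(\mathrm{Object})}
               {p(X, S, A\mid \theta_{\mathrm{bg}})\, p(\mathrm{bg})},
\qquad
p(X, S, A\mid \theta)
  = \sum_{h \in H} p(A\mid h, \theta)\, p(X\mid h, \theta)\,
                   p(S\mid h, \theta)\, p(h\mid \theta).
```

The object is declared present when R exceeds a threshold; the sum over hypotheses is what makes exact evaluation expensive and motivates the approximations mentioned above.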

Motorbikes (Fergus’s results) Samples from appearance model

References
L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. Proc. ICCV, 2003.
R. Fergus, P. Perona and A. Zisserman. A visual category filter for Google images. Proc. of the 8th European Conf. on Computer Vision (ECCV), 2004.
Y. Amit and D. Geman. A computational model for visual selection. Neural Computation, vol. 11, no. 7, 1999.
M. Weber, M. Welling and P. Perona. Unsupervised learning of models for recognition. Proc. 6th ECCV, vol. 2, 2000.

Horses

Figure from “Efficient Matching of Pictorial Structures,” by P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE

Hausdorff distance matching. Let M be an n x n binary template and N an n x n binary image we want to compare to that template. H(M, N) = max(h(M, N), h(N, M)), where h(A, B) = max over a in A of min over b in B of ||a - b||, and || || is a distance function such as the Euclidean distance. h(A, B) is called the directed Hausdorff distance: it ranks each point in A based on closeness to a point in B, and the most mismatched point is the measure of match. If h(A, B) = e, then all points in A must be within distance e of B. Generally h(A, B) ≠ h(B, A). Hausdorff distances are easy to compute from the distance transform.
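A direct brute-force sketch of these definitions (real systems use the distance transform instead, as the slide notes; point sets here are lists of coordinate tuples):

```python
from math import dist

def directed_hausdorff(A, B):
    """h(A, B): for every point in A, the distance to its nearest
    point in B; return the worst (largest) such distance."""
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(N, M) analog) -- symmetric, unlike
    the directed distance."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Asymmetry of h: with A = {(0,0), (1,0)} and B = {(0,0)},
# h(A, B) = 1 but h(B, A) = 0; H(A, B) = 1.
```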

Hausdorff Distance Matching

Shape Context. From “Shape Matching and Object Recognition Using Shape Contexts,” by S. Belongie, J. Malik and J. Puzicha, IEEE PAMI 24(4), 2002