Template matching and object recognition

Name: Template matching and object recognition
Uploaded: 2017-08-16T22:53:16+00:00
Duration: PTM22S52
Description: Template matching and object recognition

Template matching and object recognition
Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, …

Tentative class schedule
Aug 26/28 - Introduction
Sep 2/4 Cameras Radiometry
Sep 9/11 Sources & Shadows Color
Sep 16/18 Linear filters & edges (hurricane Isabel)
Sep 23/25 Pyramids & Texture Multi-View Geometry
Sep 30/Oct 2 Stereo Project proposals
Oct 7/9 Tracking (Welch) Optical flow
Oct 14/16
Oct 21/23 Silhouettes/carving (Fall break)
Oct 28/30 Structure from motion
Nov 4/6 Project update Proj. SfM
Nov 11/13 Camera calibration Segmentation
Nov 18/20 Fitting Prob. segm.&fit.
Nov 25/27 Matching templates (Thanksgiving)
Dec 2/4 Matching relations Range data
Dec 9 Final project

Assignment 3: Use Hough, RANSAC and EM to estimate a noisy line embedded in noise (details on the web by tonight)

Last class: EM (Expectation Maximization)
Alternate between Expectation (determine feature membership) and Maximization (determine ML model parameters): optimization (weighted with i), counting

Last class: model selection
AIC: BIC (and MDL): structure complexity model complexity

Recognition by finding patterns
We have seen very simple template matching (under filters).
Some objects behave like quite simple templates, e.g. frontal faces.
Strategy:
- Find image windows
- Correct lighting
- Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

Basic ideas in classifiers
Loss: some errors may be more expensive than others.
e.g. for a fatal disease that is easily cured by a cheap medicine with no side-effects, false positives in diagnosis are better than false negatives.
We discuss two-class classification: L(1->2) is the loss caused by calling 1 a 2.
Total risk of using classifier s: R(s) = Pr{1->2 | using s} L(1->2) + Pr{2->1 | using s} L(2->1)

Basic ideas in classifiers
Generally, we should classify as 1 if the expected loss of classifying as 1 is lower than for 2. This gives:
1 if p(1|x) L(1->2) > p(2|x) L(2->1)
2 if p(1|x) L(1->2) < p(2|x) L(2->1)
Crucial notion: the decision boundary, i.e. the points where the loss is the same for either case.
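The rule on this slide can be sketched in a few lines. This is only an illustration: the function names and the example losses are made up here, and the posterior p(1|x) is assumed to be given.

```python
def classify_min_risk(p1, loss_1_to_2, loss_2_to_1):
    """Pick the class with the lower expected loss.

    p1 is the posterior p(1|x); p(2|x) is taken to be 1 - p1.
    loss_1_to_2 is L(1->2), the loss caused by calling a 1 a 2.
    """
    p2 = 1.0 - p1
    risk_if_say_1 = p2 * loss_2_to_1  # we are wrong only if the truth is 2
    risk_if_say_2 = p1 * loss_1_to_2  # we are wrong only if the truth is 1
    return 1 if risk_if_say_1 < risk_if_say_2 else 2


def boundary_posterior(loss_1_to_2, loss_2_to_1):
    """Posterior p(1|x) at which both decisions have the same expected loss."""
    return loss_2_to_1 / (loss_1_to_2 + loss_2_to_1)
```

With equal losses the boundary sits at p(1|x) = 0.5. Making L(1->2) ten times larger than L(2->1) moves it to 1/11, so class 1 is chosen even when it is fairly unlikely, which is the disease example from the previous slide.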

Some loss may be inevitable: the minimum risk (shaded area) is called the Bayes risk

Finding a decision boundary is not the same as modelling a conditional density.

Example: known distributions
Assume normal class densities, p-dimensional measurements with common (known) covariance and different (known) means Class priors are Can ignore a common factor in posteriors - important; posteriors are then:

Classifier boils down to: choose class that minimizes:
Mahalanobis distance. Because the covariance is common, this simplifies to the sign of a linear expression (i.e. a Voronoi diagram in 2D for Σ = I)
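A minimal sketch of this classifier, assuming the usual form of the rule (squared Mahalanobis distance to each class mean, minus twice the log prior); the slide's own formula was lost, and the function name and test values are illustrative only.

```python
import numpy as np


def classify_gaussian(x, means, cov, priors):
    """Return the index of the class minimizing
    (x - mu_k)^T Sigma^-1 (x - mu_k) - 2 log(prior_k),
    for normal class densities sharing one covariance Sigma."""
    cov_inv = np.linalg.inv(cov)
    scores = []
    for mu, prior in zip(means, priors):
        d = x - mu
        scores.append(d @ cov_inv @ d - 2.0 * np.log(prior))
    return int(np.argmin(scores))
```

With Σ = I and equal priors this reduces to picking the nearest mean, which is the Voronoi picture mentioned on the slide. Unequal priors shift the boundary towards the less likely class.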

Plug-in classifiers
Assume that distributions have some parametric form - now estimate the parameters from the data.
Common:
- assume a normal distribution with shared covariance, different means; use usual estimates
- ditto, but different covariances; ditto
Issue: parameter estimates that are “good” may not give optimal classifiers.

Histogram based classifiers
Use a histogram to represent the class-conditional densities (i.e. p(x|1), p(x|2), etc).
Advantage: estimates become quite good with enough data!
Disadvantage: histogram becomes big with high dimension, but maybe we can assume feature independence?

Finding skin
Skin has a very small range of (intensity independent) colours, and little texture.
Compute an intensity-independent colour measure, check if colour is in this range, check if there is little texture (median filter).
See this as a classifier - we can set up the tests by hand, or learn them.
Get class conditional densities (histograms), priors from data (counting).
Classifier is
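The classifier formula itself is missing from the transcript. A plausible sketch, assuming it thresholds the posterior p(skin|x) built from the two histograms and the prior: the 1D colour measure in [0, 1), the add-one smoothing, the threshold theta and all names below are assumptions made for illustration.

```python
import numpy as np


def fit_histogram(samples, bins):
    """Estimate p(bin | class) from 1D colour measurements in [0, 1).
    Add-one smoothing (an assumption here) avoids empty bins."""
    idx = np.minimum((np.asarray(samples) * bins).astype(int), bins - 1)
    counts = np.bincount(idx, minlength=bins) + 1.0
    return counts / counts.sum()


def is_skin(x, hist_skin, hist_bg, prior_skin, theta=0.5):
    """Label x as skin when the posterior p(skin | x) exceeds theta."""
    b = min(int(x * len(hist_skin)), len(hist_skin) - 1)
    joint_skin = hist_skin[b] * prior_skin
    joint_bg = hist_bg[b] * (1.0 - prior_skin)
    return bool(joint_skin / (joint_skin + joint_bg) > theta)
```

Sweeping theta from 0 to 1 trades false positives against missed detections, which is what an ROC curve plots.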

Receiver Operating Characteristic (ROC) curve
It’s quite hard to see much difference between 1, 2, and 3. Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999, copyright 1999, IEEE

Finding faces
Faces “look like” templates (at least when they’re frontal).
General strategy:
- search image windows at a range of scales
- correct for illumination
- present corrected window to classifier
Issues:
- How corrected?
- What features?
- What classifier?
- What about lateral views?

Naive Bayes (Important: naive not necessarily pejorative)
Find faces by vector quantizing image patches, then computing a histogram of patch types within a face.
- Histogram doesn’t work when there are too many features
- features are the patch types
- assume they’re independent and cross fingers
- reduction in degrees of freedom
- very effective for face finders
- why? probably because the examples that would present real problems aren’t frequent.
Many face finders on the face detection home page

Face Recognition
Whose face is this? (perhaps in a mugshot)
Issue: What differences are important and what not?
Reduce the dimension of the images, while maintaining the “important” differences.
One strategy: Principal components analysis

Template matching
Simple cross-correlation between images. Best match wins.
Computationally expensive, i.e. requires the presented image to be correlated with every image in the database!
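A small sketch of the brute-force matcher. It uses normalized cross-correlation rather than plain cross-correlation, a substitution made here so that brightness and contrast changes do not affect the score; the function names are illustrative.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (in [-1, 1])."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_match(query, database):
    """Correlate the query with every database image; the best score wins."""
    scores = [ncc(query, img) for img in database]
    return int(np.argmax(scores)), scores
```

The cost is one full-image correlation per database entry, which is the expense the slide points out.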

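Eigenspace matching
Consider PCA: each image is approximated by a linear combination of a few principal components. Then images can be compared through their coefficient vectors. Much cheaper to compute!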
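The slide's equations are missing, so the following is only a sketch of the usual eigenspace idea: project every database image onto the k leading principal components once, then compare a query to the database in that k-dimensional coefficient space. Names and the choice of SVD for the PCA are assumptions.

```python
import numpy as np


def fit_eigenspace(images, k):
    """PCA over a stack of images; returns the mean image, the k leading
    principal directions (rows of basis) and each image's k coefficients."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]
    coeffs = (X - mean) @ basis.T
    return mean, basis, coeffs


def match(query, mean, basis, coeffs):
    """Project the query and return the index of the nearest database image."""
    q = (query.ravel().astype(float) - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(coeffs - q, axis=1)))
```

Each comparison now costs k multiplications instead of one per pixel, at the price of the one-off PCA.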

Figure 22.6 - story in the caption

Eigenfaces plus a linear combination of eigenfaces

Appearance manifold approach
(Nayar et al. ‘96)
- for every object sample the set of viewing conditions
- use these images as feature vectors
- apply a PCA over all the images
- keep the dominant PCs
- sequence of views for 1 object represent a manifold in space of projections
- what is the nearest manifold for a given view?

Object-pose manifold Appearance changes projected on PCs (1D pose changes) Sufficient characterization for recognition and pose estimation

Real-time system (Nayar et al. ‘96)

Difficulties with PCA
- Projection may suppress important detail: smallest variance directions may not be unimportant
- Method does not take discriminative task into account: typically, we wish to compute features that allow good discrimination, which is not the same as largest variance

Fig 22.7 Principal components will give a very poor representation of this data set

Fig 22.10 - Two classes indicated by * and o; the first principal component captures all the variance, but completely destroys any ability to discriminate. The second is close to what’s required.

Linear Discriminant Analysis
We wish to choose linear functions of the features that allow good discrimination. Assume class-conditional covariances are the same Want linear feature that maximises the spread of class means for a fixed within-class variance I do the two lines of math on the whiteboard - it’s easier to explain notation, etc. that way. Section
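Since the math was done on the whiteboard, here is a sketch of the standard two-class result, assumed to be what was derived: the direction that maximizes the spread of the class means for fixed within-class variance is proportional to S_w^-1 (m1 - m2), with S_w the within-class scatter matrix. The function name is illustrative.

```python
import numpy as np


def fisher_direction(x1, x2):
    """First linear discriminant for two classes.

    x1, x2: arrays of shape (n_samples, n_features), one per class.
    Returns a unit vector w proportional to S_w^-1 (m1 - m2)."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Within-class scatter, pooled over both classes.
    sw = (x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)
    w = np.linalg.solve(sw, m1 - m2)
    return w / np.linalg.norm(w)
```

On data shaped like the two-class figure above (large variance along one axis, classes separated along the other) this direction lines up with the low-variance axis that PCA would discard.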

The first linear discriminant gets excellent separation

Figure 22.12 - read the caption. This yields quite a good illustration of the power of the method.

Neural networks
Linear decision boundaries are useful, but often not very powerful.
We seek an easy way to get more complex boundaries:
- compose linear decision boundaries, i.e. have several linear classifiers, and apply a classifier to their output
- a nuisance, because sign(ax+by+cz) etc. isn’t differentiable
- use a smooth “squashing function” in place of sign.

Here phi is a squashing function; this is figure 22.13. We seek to adjust the w’s to get the best classifier.

22.14 - a bunch of different squashing functions

Training
Choose parameters to minimize error on training set.
Stochastic gradient descent, computing gradient using trick (backpropagation, aka the chain rule).
Stop when error is low, and hasn’t changed much.
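A minimal sketch of one stochastic gradient step for a two-layer network. The architecture (one hidden layer), the tanh squashing function and the squared-error loss are assumptions chosen for brevity, not details taken from the slides.

```python
import numpy as np


def forward(params, x):
    """Two layers of linear units, each followed by the tanh squashing function."""
    w1, b1, w2, b2 = params
    h = np.tanh(w1 @ x + b1)
    y = np.tanh(w2 @ h + b2)
    return h, y


def loss(params, x, t):
    """Squared error on a single training example (x, t)."""
    return 0.5 * float(np.sum((forward(params, x)[1] - t) ** 2))


def backprop(params, x, t):
    """Gradient of the loss w.r.t. each parameter, via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    d2 = (y - t) * (1.0 - y ** 2)          # error signal at the output layer
    d1 = (w2.T @ d2) * (1.0 - h ** 2)      # propagated back to the hidden layer
    return [np.outer(d1, x), d1, np.outer(d2, h), d2]


def sgd_step(params, x, t, lr=0.1):
    """One stochastic gradient descent update on a single example."""
    return [p - lr * g for p, g in zip(params, backprop(params, x, t))]
```

Training repeats sgd_step over randomly drawn examples and stops once the error is low and no longer changing much, as the slide says.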

The vertical face-finding part of Rowley, Baluja and Kanade’s system
Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Histogram equalisation gives an approximate fix for illumination induced variability
Histogram equalisation (fig 22.15)
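A sketch of histogram equalisation for an 8-bit grey-level window, assuming the common lookup-table form based on the cumulative histogram (the slides do not give the exact variant used in the face detector).

```python
import numpy as np


def equalize(img, levels=256):
    """Map each grey level through the image's cumulative histogram so the
    output levels are spread over the full range. img must be an integer
    array with values in [0, levels)."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lut[img]
```

Because the mapping depends only on the rank order of the pixel values, adding a constant brightness offset to the window leaves the output unchanged, which is why it works as an approximate illumination fix.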

Architecture of the complete system: they use another neural net to estimate orientation of the face, then rectify it. They search over scales to find bigger/smaller faces. Always worth asking: “why is it better to search over scales than to use bigger templates?”
Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Convolutional neural networks
Template matching using NN classifiers seems to work.
Natural features are filter outputs: probably, spots and bars, as in texture.
But why not learn the filter kernels, too?

A convolutional neural network, LeNet; the layers filter, subsample, filter, subsample, and finally classify based on outputs of this process.
Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al, Proc. IEEE, 1998, copyright 1998, IEEE
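The filter, subsample, filter, subsample pipeline can be sketched with fixed kernels. This shows the data flow only: in LeNet the kernels are learned and there are many feature maps per layer, whereas here there is one map per layer, the kernels are passed in by hand, and average pooling and tanh are assumed.

```python
import numpy as np


def filter_valid(img, kernel):
    """Correlate img with kernel, keeping only fully overlapping positions."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def subsample(fmap, step=2):
    """Average over non-overlapping step x step blocks."""
    h, w = fmap.shape[0] // step * step, fmap.shape[1] // step * step
    blocks = fmap[:h, :w].reshape(h // step, step, w // step, step)
    return blocks.mean(axis=(1, 3))


def feature_stack(img, k1, k2):
    """filter -> subsample -> filter -> subsample, flattened for a classifier."""
    x = subsample(np.tanh(filter_valid(img, k1)))
    x = subsample(np.tanh(filter_valid(x, k2)))
    return x.ravel()
```

For a 32x32 input and 5x5 kernels the map sizes go 28x28, 14x14, 10x10, 5x5, so the final classifier in this sketch sees 25 numbers per feature map.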

LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the test set consists of items not in the training set. Not all classification schemes necessarily have small test error when they have small training error. (Fig 22.19)
Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al, Proc. IEEE, 1998, copyright 1998, IEEE

Support Vector Machines
Neural nets try to build a model of the posterior, p(k|x).
Instead, try to obtain the decision boundary directly:
- potentially easier, because we need to encode only the geometry of the boundary, not any irrelevant wiggles in the posterior
- not all points affect the decision boundary

Set S of points xiRn, each xi belongs to one of two classes yi {-1,1} The goals is to find a hyperplane that divides S in these two classes S is separable if w Rn,b R di Closest point Separating hyperplanes

Optimal separating hyperplane maximizes Problem 1: Minimize Subject to support vectors Optimal separating hyperplane (OSH)

Solve using Lagrange multipliers
Lagrangian at solution therefore

Dual problem Problem 2: Minimize Subject to where
(α_i > 0 only for support vectors) Kuhn-Tucker condition: (for x_j a support vector)
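The formulas on these SVM slides did not survive extraction. As a sketch only, assuming the slides follow the standard hard-margin formulation (the multipliers α_i and the matrix D are the usual textbook notation, not confirmed by the transcript), the chain from Problem 1 to the dual reads:

```latex
% Problem 1 (primal): the margin 1/\|w\| is maximal when w \cdot w is minimal
\min_{w,b}\ \tfrac{1}{2}\, w \cdot w
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1,\dots,N

% Lagrangian, with multipliers \alpha_i \ge 0
L(w,b,\alpha) = \tfrac{1}{2}\, w \cdot w - \sum_i \alpha_i \,\bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr]

% At the solution
\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0,
\qquad
\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0
\;\Rightarrow\; w = \sum_i \alpha_i y_i x_i

% Problem 2 (dual), obtained by substituting w back into L
\min_{\alpha}\ \tfrac{1}{2}\, \alpha^{T} D\, \alpha - \sum_i \alpha_i
\quad \text{subject to} \quad \sum_i y_i \alpha_i = 0,\ \ \alpha_i \ge 0,
\qquad \text{where } D_{ij} = y_i y_j \, x_i \cdot x_j

% Kuhn-Tucker condition, and b from any support vector x_j
\alpha_i \,\bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr] = 0,
\qquad b = y_j - w \cdot x_j
```

The Kuhn-Tucker condition is what forces α_i = 0 for every point that lies strictly outside the margin, so only the support vectors contribute to w.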

Linearly non-separable cases
Find trade-off between maximum separation and misclassifications Problem 3: Minimize Subject to

Dual problem for non-separable cases
Minimize Subject to where Kuhn-Tucker condition: Support vectors: margin vectors, misclassified, errors, too close to OSH
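The missing formulas for the non-separable case are sketched below, again assuming the standard soft-margin formulation; the slack variables ξ_i and the trade-off constant C are the usual notation and are not confirmed by the transcript.

```latex
% Problem 3 (primal with slack variables \xi_i)
\min_{w,b,\xi}\ \tfrac{1}{2}\, w \cdot w + C \sum_i \xi_i
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0

% Dual: same objective as in the separable case, with a box constraint
\min_{\alpha}\ \tfrac{1}{2}\, \alpha^{T} D\, \alpha - \sum_i \alpha_i
\quad \text{subject to} \quad \sum_i y_i \alpha_i = 0,\ \ 0 \le \alpha_i \le C,
\qquad \text{where } D_{ij} = y_i y_j \, x_i \cdot x_j

% Kuhn-Tucker conditions
\alpha_i \,\bigl[\, y_i\,(w \cdot x_i + b) - 1 + \xi_i \,\bigr] = 0,
\qquad (C - \alpha_i)\,\xi_i = 0
```

Under these conditions, support vectors with 0 < α_i < C lie exactly on the margin (ξ_i = 0), while those with α_i = C can be too close to the OSH (0 < ξ_i ≤ 1) or misclassified (ξ_i > 1).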

Decision function
Once w and b have been computed, the classification decision for input x is given by f(x) = sign(w · x + b).
Note that the globally optimal solution can always be obtained (convex problem).

Non-linear SVMs
Non-linear separation surfaces can be obtained by non-linearly mapping the data to a high dimensional space and then applying the linear SVM technique.
Note that data only appears through vector product.
Need for vector product in high-dimension can be avoided by using Mercer kernels, e.g.:
- Polynomial kernel
- Radial Basis Function
- Sigmoidal function
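The kernel formulas are missing from the transcript; the sketch below uses common textbook forms of the three kernels, with parameter names (degree, c, sigma, kappa, delta) chosen here. The decision function replaces every vector product by a kernel evaluation; the multipliers are assumed to come from an SVM solver that is not shown.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, c=1.0):
    return (dot(x, y) + c) ** degree


def rbf_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    return math.tanh(kappa * dot(x, y) - delta)


def decide(x, support_vectors, labels, alphas, b, kernel):
    """sign( sum_i alpha_i y_i K(x_i, x) + b ), returned as +1 or -1."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1
```

For example, the XOR layout (points (1,1) and (-1,-1) in one class, (1,-1) and (-1,1) in the other) has no separating line in the plane, but the degree-2 polynomial kernel separates it with all four multipliers equal to 1/8 and b = 0.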

Space in which decision boundary is linear - a conic in the original space has the form
An SVM in a non-linear space. I don’t do the Mercer representation of this process; this figure, etc. usually gives enough illumination to the process.

SVMs for 3D object recognition
(Pontil & Verri PAMI’98)
- Consider images as vectors
- Compute pairwise OSH using linear SVM
- Support vectors are representative views of the considered object (relative to other)
- Tournament like classification: competing classes are grouped in pairs, not selected classes are discarded, until only one class is left
- Complexity linear in number of classes
- No pose estimation
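The tournament scheme can be sketched as follows. The pairwise_winner callback stands in for a trained pairwise linear SVM and is hypothetical, as is the handling of an odd class out (it simply advances to the next round).

```python
def tournament(x, classes, pairwise_winner):
    """Knock-out classification: classes are grouped in pairs, the loser of
    each pairwise decision is discarded, until only one class is left.

    pairwise_winner(x, a, b) must return either a or b."""
    remaining = list(classes)
    while len(remaining) > 1:
        next_round = []
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(pairwise_winner(x, remaining[i], remaining[i + 1]))
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])  # odd one out gets a bye
        remaining = next_round
    return remaining[0]
```

Every pairwise decision eliminates exactly one class, so N classes need N - 1 classifier evaluations, which is the linear complexity noted on the slide.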

Vision applications
Reliable, simple classifier; use it wherever you need a classifier.
Commonly used for:
- Face finding
- Pedestrian finding: many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time; classify image regions, searching over scales
But what are the features? Compute wavelet coefficients for pedestrian windows, average over pedestrians. If the average is different from zero, probably strongly associated with pedestrian.

Fig 22.21 - caption tells the story
Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Figure 22.23 - caption tells the story

Figure 22.22 - caption tells the story

Latest results on Pedestrian Detection: Viola, Jones and Snow’s paper (ICCV’03: Marr prize)
- Combine static and dynamic features
- Cascade for efficiency (4 frames/s)
- 5 best out of 55k (AdaBoost)
- Some positive examples used for training
- 5 best static out of 28k (AdaBoost)

Dynamic detection
False detection: typically 1/400,000 (= 1 every 2 frames for 360x240)

Static detection

Next class: Object recognition
Reading: Chapter 23
