Template matching and object recognition

Recognition by finding patterns
We have seen very simple template matching (under filters). Some objects behave like quite simple templates:
–Frontal faces
Strategy:
–Find image windows
–Correct for lighting
–Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

Basic ideas in classifiers
Loss:
–Some errors may be more expensive than others. E.g. for a fatal disease that is easily cured by a cheap medicine with no side effects, false positives in diagnosis are better than false negatives.
–We discuss two-class classification: L(1→2) is the loss caused by calling a 1 a 2.
Total risk of using classifier s:
  R(s) = Pr{1→2 | using s} L(1→2) + Pr{2→1 | using s} L(2→1)

Basic ideas in classifiers
Generally, we should classify as 1 if the expected loss of classifying as 1 is smaller than that of classifying as 2. This gives the rule (a toy numeric example follows):
–choose 1 if p(1|x) L(1→2) > p(2|x) L(2→1)
–choose 2 if p(1|x) L(1→2) < p(2|x) L(2→1)
Crucial notion: decision boundary
–the points where the expected loss is the same for either choice
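A toy numeric example of this rule in Python (the posteriors and losses are illustrative numbers, not values from the slides):

```python
def decide(p1, p2, L12, L21):
    """Two-class Bayes decision with asymmetric loss.
    p1, p2: posteriors p(1|x), p(2|x); L12 = loss of calling a true 1 a 2,
    L21 = loss of calling a true 2 a 1. Choose 1 when its expected loss is lower."""
    return 1 if p1 * L12 > p2 * L21 else 2

# Missing the disease (1 -> 2) is much worse than a false alarm (2 -> 1),
# so we call "1" even though the posterior slightly favours class 2.
print(decide(p1=0.4, p2=0.6, L12=10.0, L21=1.0))   # -> 1
```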

Some loss may be inevitable: the minimum achievable risk is called the Bayes risk.

Finding a decision boundary is not the same as modelling a conditional density.

Example: known distributions
Assume normal class densities: p-dimensional measurements with a common (known) covariance Σ and different (known) means μ1, μ2.
Class priors are π1, π2.
We can ignore a factor common to both posteriors (important); the posteriors are then proportional to
  πk exp( −(x − μk)ᵀ Σ⁻¹ (x − μk) / 2 )

The classifier boils down to: choose the class k that minimizes the Mahalanobis distance penalised by the prior,
  (x − μk)ᵀ Σ⁻¹ (x − μk) − 2 log πk.
Because the covariance is common, this simplifies to the sign of a linear expression in x (i.e. a Voronoi diagram in 2D for Σ = I and equal priors). A small sketch follows.
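A minimal numpy sketch of this Gaussian classifier, assuming two known means and a shared known covariance (all numbers illustrative):

```python
import numpy as np

def gaussian_classifier(x, mu1, mu2, Sigma, prior1=0.5, prior2=0.5):
    """Choose the class whose Mahalanobis distance minus 2 log prior is smallest.
    With a shared covariance this is equivalent to the sign of a linear expression in x."""
    Sinv = np.linalg.inv(Sigma)
    d1 = (x - mu1) @ Sinv @ (x - mu1) - 2 * np.log(prior1)
    d2 = (x - mu2) @ Sinv @ (x - mu2) - 2 * np.log(prior2)
    return 1 if d1 < d2 else 2

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sigma = np.eye(2)   # with Sigma = I and equal priors this is nearest-mean (Voronoi) classification
print(gaussian_classifier(np.array([0.4, 0.6]), mu1, mu2, Sigma))   # -> 1
```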

Plug-in classifiers
Assume that the distributions have some parametric form, then estimate the parameters from the data. Common choices:
–assume a normal distribution with shared covariance and different means; use the usual estimates
–ditto, but with different covariances; again use the usual estimates
Issue: parameter estimates that are “good” may not give optimal classifiers.

Histogram-based classifiers
Use a histogram to represent the class-conditional densities (i.e. p(x|1), p(x|2), etc.)
Advantage: the estimates become quite good with enough data.
Disadvantage: the histogram becomes big in high dimensions
–but maybe we can assume feature independence?

Finding skin
Skin has a very small range of (intensity-independent) colours, and little texture.
–Compute an intensity-independent colour measure, check whether the colour is in this range, and check whether there is little texture (median filter).
–See this as a classifier: we can set up the tests by hand, or learn them.
–Get the class-conditional densities (histograms) and the priors from data (by counting).
The classifier then labels a pixel as skin if p(x|skin) p(skin) > p(x|not skin) p(not skin), as sketched below.
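A hedged numpy sketch of such a histogram classifier; the chromaticity measure, bin count, and prior are illustrative choices, not the ones used in the Jones and Rehg paper:

```python
import numpy as np

def chromaticity(rgb):
    """Intensity-independent colour: (r, g) = (R, G) / (R + G + B)."""
    s = rgb.sum(axis=-1, keepdims=True) + 1e-8
    return rgb[..., :2] / s

def fit_histogram(samples, bins=32):
    """Class-conditional density p(colour | class) as a normalised 2D histogram."""
    h, _, _ = np.histogram2d(samples[:, 0], samples[:, 1],
                             bins=bins, range=[[0, 1], [0, 1]], density=True)
    return h

def classify_pixel(c, h_skin, h_not, p_skin=0.3, bins=32):
    """Label a chromaticity value as skin if p(c|skin) p(skin) > p(c|not) p(not)."""
    i = min(int(c[0] * bins), bins - 1)
    j = min(int(c[1] * bins), bins - 1)
    return h_skin[i, j] * p_skin > h_not[i, j] * (1 - p_skin)
```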

Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999. Copyright 1999, IEEE.

Receiver Operating Curve. Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999. Copyright 1999, IEEE.

Finding faces
Faces “look like” templates (at least when they are frontal).
General strategy (sketched below):
–search image windows at a range of scales
–correct for illumination
–present the corrected window to a classifier
Issues:
–How is the window corrected?
–What features?
–What classifier?
–What about lateral views?
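A hedged sketch of that sliding-window strategy in Python; `classify`, `correct`, the window size, and the scale factors are placeholders, not values from any particular system:

```python
def find_faces(image, classify, correct, size=24, step=4, factors=(1, 2, 4)):
    """Sliding-window search over scales: coarser factors find larger faces.
    `classify` and `correct` stand in for the detector and the illumination correction."""
    hits = []
    for f in factors:
        small = image[::f, ::f]                      # crude downsampling by factor f
        H, W = small.shape[:2]
        for y in range(0, H - size + 1, step):
            for x in range(0, W - size + 1, step):
                window = correct(small[y:y + size, x:x + size])
                if classify(window):
                    hits.append((x * f, y * f, size * f))   # back to original coordinates
    return hits
```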

Naive Bayes
(Important: “naive” is not necessarily pejorative.)
Find faces by vector-quantising image patches, then computing a histogram of patch types within a face.
A joint histogram doesn’t work when there are too many features
–the features are the patch types
–so assume they are independent and cross your fingers
–this gives a large reduction in degrees of freedom
–very effective for face finding; why? probably because the examples that would present real problems aren’t frequent (a sketch follows)
Many face finders can be found on the face detection home page.
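A minimal sketch of the naive Bayes face score, assuming the image patches have already been vector-quantised into integer patch types (the smoothing and names are my own, illustrative choices):

```python
import numpy as np

def naive_bayes_is_face(patch_types, counts_face, counts_nonface, prior_face=0.5):
    """Naive Bayes over vector-quantised patch types.
    `patch_types`: indices of the patches found in a window;
    `counts_*`: per-type counts collected from training windows.
    Independence lets us sum per-type log-likelihoods instead of
    estimating one huge joint histogram."""
    p_face = (counts_face + 1) / (counts_face.sum() + len(counts_face))       # Laplace smoothing
    p_non = (counts_nonface + 1) / (counts_nonface.sum() + len(counts_nonface))
    log_odds = np.log(prior_face) - np.log(1 - prior_face)
    log_odds += np.sum(np.log(p_face[patch_types]) - np.log(p_non[patch_types]))
    return log_odds > 0       # True -> call it a face
```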

Figure from “A Statistical Method for 3D Object Detection Applied to Faces and Cars,” H. Schneiderman and T. Kanade, Proc. Computer Vision and Pattern Recognition, 2000. Copyright 2000, IEEE.

Face recognition
Whose face is this? (Perhaps in a mugshot.)
Issues:
–What differences are important and which are not?
–Reduce the dimension of the images while maintaining the “important” differences.
One strategy: principal components analysis (PCA).

Template matching
Simple cross-correlation between images; the best match wins.
Computationally expensive: the presented image must be correlated with every image in the database!

Eigenspace matching
Consider a PCA of the face images: x ≈ μ + Σk ck vk, where the vk are the leading principal components (eigenfaces).
Then the distance between two images is approximated by the distance between their coefficient vectors c, which is much cheaper to compute (see the sketch below).
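A small numpy sketch of eigenspace matching; the number of components k and the function names are illustrative:

```python
import numpy as np

def fit_eigenfaces(faces, k=20):
    """faces: (N, D) matrix of vectorised face images. Returns the mean face and the
    top-k principal directions (eigenfaces), computed via SVD of the centred data."""
    mean = faces.mean(axis=0)
    U, S, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, Vt[:k]                       # shapes (D,), (k, D)

def project(x, mean, eigenfaces):
    return (x - mean) @ eigenfaces.T          # k coefficients instead of D pixels

def nearest_face(query, gallery_coeffs, mean, eigenfaces):
    """Match in the k-dimensional coefficient space: distances between k-vectors
    approximate distances between full images, but are far cheaper to compute."""
    q = project(query, mean, eigenfaces)
    return int(np.argmin(np.linalg.norm(gallery_coeffs - q, axis=1)))
```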


Eigenfaces: a face is represented as the mean face plus a linear combination of eigenfaces.


Difficulties with PCA
Projection may suppress important detail
–the smallest-variance directions may not be unimportant
The method does not take the discriminative task into account
–typically, we wish to compute features that allow good discrimination
–which is not the same as the directions of largest variance


Linear Discriminant Analysis
We wish to choose linear functions of the features that allow good discrimination.
–Assume the class-conditional covariances are the same.
–Want the linear feature that maximises the spread of the class means for a fixed within-class variance (a minimal sketch follows).
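A minimal numpy sketch of the two-class Fisher discriminant direction, assuming the classes share their covariance as above:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher linear discriminant: the direction w that maximises the
    separation of the class means for a fixed within-class variance,
    w proportional to Sw^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)   # pooled within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu2)
    return w / np.linalg.norm(w)
```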




Neural networks
Linear decision boundaries are useful
–but often not very powerful
–we seek an easy way to get more complex boundaries
Compose linear decision boundaries
–i.e. have several linear classifiers, and apply a classifier to their output
–a nuisance, because sign(ax + by + cz) etc. isn’t differentiable
–so use a smooth “squashing function” in place of sign (see the sketch below)
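A minimal sketch of such a composed classifier, with a sigmoid as the squashing function (layer sizes and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Smooth "squashing function" used in place of sign so the network is differentiable."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, w2, b2):
    """Compose linear classifiers: a hidden layer of linear units, each squashed,
    feeding one more linear unit. This yields non-linear decision boundaries."""
    h = sigmoid(W1 @ x + b1)      # several linear classifiers
    return sigmoid(w2 @ h + b2)   # a classifier applied to their outputs
```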



Training
Choose parameters to minimize the error on the training set.
Stochastic gradient descent, computing the gradient using the chain rule (backpropagation); a toy example follows.
Stop when the error is low and hasn’t changed much.
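A toy training loop for the one-hidden-layer network sketched above, using stochastic gradient descent with the gradient obtained by the chain rule; the fixed epoch count stands in for a proper stopping rule:

```python
import numpy as np

def train_sgd(X, y, hidden=8, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent for a tiny one-hidden-layer network.
    X: (N, d) inputs, y: labels in {0, 1}. Gradients come from backpropagation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = rng.normal(0, 0.5, (hidden, d)), np.zeros(hidden)
    w2, b2 = rng.normal(0, 0.5, hidden), 0.0
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, t = X[i], y[i]
            h = sig(W1 @ x + b1)
            p = sig(w2 @ h + b2)
            # Backpropagation: chain rule through the squashing functions.
            dout = p - t                      # gradient of cross-entropy w.r.t. output pre-activation
            dh = dout * w2 * h * (1 - h)      # gradient at the hidden pre-activations
            w2 -= lr * dout * h
            b2 -= lr * dout
            W1 -= lr * np.outer(dh, x)
            b1 -= lr * dh
    return W1, b1, w2, b2
```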

The vertical face-finding part of Rowley, Baluja and Kanade’s system. Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998. Copyright 1998, IEEE.

Histogram equalisation gives an approximate fix for illumination-induced variability (a small sketch follows).
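A small numpy sketch of histogram equalisation for an 8-bit window, a simplified version of the usual formulation:

```python
import numpy as np

def histogram_equalize(window):
    """Approximate illumination correction: map grey levels through the empirical CDF
    so every window has roughly the same intensity distribution. Assumes uint8 input."""
    hist = np.bincount(window.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)
    return (cdf[window] * 255).astype(np.uint8)
```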

Architecture of the complete system: they use another neural net to estimate the orientation of the face, then rectify it. They search over scales to find bigger and smaller faces. Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998. Copyright 1998, IEEE.

Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998. Copyright 1998, IEEE.

Convolutional neural networks
Template matching using neural-network classifiers seems to work.
Natural features are filter outputs
–probably spots and bars, as in texture
–but why not learn the filter kernels, too? (A minimal sketch of the filter-and-subsample idea follows.)
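A minimal numpy sketch of the filter-and-subsample building blocks of such a network; a real system learns the kernels and stacks several such layers:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D filtering (cross-correlation, as in CNN practice) with a given kernel."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def subsample(fmap, k=2):
    """Average-pool by a factor k, as in the subsampling layers of LeNet-style nets."""
    H, W = (fmap.shape[0] // k) * k, (fmap.shape[1] // k) * k
    return fmap[:H, :W].reshape(H // k, k, W // k, k).mean(axis=(1, 3))
```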

A convolutional neural network, LeNet: the layers filter, subsample, filter, subsample, and finally classify based on the outputs of this process. Figure from “Gradient-Based Learning Applied to Document Recognition,” Y. LeCun et al., Proc. IEEE, 1998. Copyright 1998, IEEE.

LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the test set consists of items not in the training set. Not all classification schemes necessarily have small test error when they have small training error. Figure from “Gradient-Based Learning Applied to Document Recognition,” Y. LeCun et al., Proc. IEEE, 1998. Copyright 1998, IEEE.

Support Vector Machines
Neural nets try to build a model of the posterior p(k|x).
Instead, try to obtain the decision boundary directly
–potentially easier, because we need to encode only the geometry of the boundary, not any irrelevant wiggles in the posterior
–not all points affect the decision boundary



Support Vector Machines
Linearly separable data means there exist w and b with y_i (w · x_i + b) > 0 for all i.
Choosing the hyperplane so that min_i |w · x_i + b| = 1 means y_i (w · x_i + b) ≥ 1 for all i.
Hence the distance from the closest point to the hyperplane is 1/||w||, and the margin is 2/||w||.

Support Vector Machines
Actually, we construct a dual optimization problem. By being clever about what x means, I can have much more interesting boundaries.

A space in which the decision boundary is linear: a conic in the original space, a x² + b xy + c y² + d x + e y + f = 0, is linear in the lifted coordinates (x², xy, y², x, y, 1).

Support Vector Machines
A set S of points x_i ∈ R^n, each belonging to one of two classes y_i ∈ {−1, 1}.
The goal is to find a hyperplane that divides S into these two classes.
S is separable if there exist w ∈ R^n and b ∈ R such that y_i (w · x_i + b) ≥ 1 for all i.
The distance d_i of a point from a separating hyperplane is |w · x_i + b| / ||w||; the closest points determine the margin.

Problem 1: the optimal separating hyperplane (OSH)
The optimal separating hyperplane maximizes the margin 2/||w||, i.e.
  minimize (1/2) ||w||²
  subject to y_i (w · x_i + b) ≥ 1, i = 1, …, N.
The training points that satisfy the constraint with equality are the support vectors.

Solve using Lagrange multipliers
Lagrangian: L(w, b, α) = (1/2) ||w||² − Σ_i α_i [ y_i (w · x_i + b) − 1 ], with α_i ≥ 0.
At the solution ∂L/∂w = 0 and ∂L/∂b = 0, therefore
  w = Σ_i α_i y_i x_i  and  Σ_i α_i y_i = 0.

Problem 2: the dual problem
  minimize W(α) = (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j) − Σ_i α_i
  subject to α_i ≥ 0 and Σ_i α_i y_i = 0,
where w = Σ_i α_i y_i x_i.
Kühn-Tucker condition: α_i [ y_i (w · x_i + b) − 1 ] = 0, so α_i > 0 only for support vectors; b is recovered from any support vector x_j via y_j (w · x_j + b) = 1.

Linearly non-separable cases
Find a trade-off between maximum separation and misclassifications, using slack variables ξ_i ≥ 0.
Problem 3:
  minimize (1/2) ||w||² + C Σ_i ξ_i
  subject to y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0.

Dual problem for non-separable cases
Problem 4:
  minimize W(α) = (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j) − Σ_i α_i
  subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0,
where w = Σ_i α_i y_i x_i.
Kühn-Tucker condition: α_i [ y_i (w · x_i + b) − 1 + ξ_i ] = 0.
Support vectors include margin vectors, points that lie too close to the OSH, and misclassified points (errors).

Decision function
Once w and b have been computed, the classification decision for an input x is given by f(x) = sign(w · x + b); a sketch follows.
Note that the globally optimal solution can always be obtained (it is a convex problem).
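A minimal sketch of the resulting decision function; the kernel argument anticipates the non-linear case on the next slide:

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b, kernel=np.dot):
    """Classification decision f(x) = sign( sum_i alpha_i y_i K(x_i, x) + b ).
    Only the support vectors (alpha_i > 0) contribute to the sum."""
    s = sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)
```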

Non-linear SVMs
Non-linear separation surfaces can be obtained by non-linearly mapping the data to a high-dimensional space and then applying the linear SVM technique.
Note that the data only appear through the inner products x_i · x_j.
The need for inner products in the high-dimensional space can be avoided by using Mercer kernels K(x_i, x_j), e.g. (sketched below):
–polynomial kernel: K(x, y) = (x · y + 1)^d
–radial basis function: K(x, y) = exp(−||x − y||² / (2σ²))
–sigmoïdal function: K(x, y) = tanh(κ x · y + θ)
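Hedged sketches of the three kernels listed above; the parameter values are illustrative defaults:

```python
import numpy as np

def poly_kernel(x, y, d=2, c=1.0):
    """Polynomial kernel (x.y + c)^d."""
    return (np.dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=-1.0):
    """Sigmoidal kernel tanh(kappa x.y + theta); a Mercer kernel only for some parameter choices."""
    return np.tanh(kappa * np.dot(x, y) + theta)
```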

A space in which the decision boundary is linear: a conic in the original space, a x² + b xy + c y² + d x + e y + f = 0, is linear in the lifted coordinates (x², xy, y², x, y, 1).

SVMs for 3D object recognition (Pontil & Verri, PAMI ’98)
–Consider images as vectors.
–Compute a pairwise OSH for each pair of classes using a linear SVM.
–Support vectors are representative views of the considered object (relative to the other).
–Tournament-like classification:
  –competing classes are grouped in pairs
  –classes not selected are discarded
  –until only one class is left
–Complexity is linear in the number of classes.
–No pose estimation.

Vision applications
Reliable, simple classifier: use it wherever you need a classifier.
Commonly used for face finding.
Pedestrian finding:
–many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time
–classify image regions, searching over scales
–but what are the features?
–compute wavelet coefficients for pedestrian windows and average over pedestrians; if the average is different from zero, that coefficient is probably strongly associated with pedestrians

Figure from “A general framework for object detection,” C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.

Figure from “A general framework for object detection,” C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.

Figure from “A general framework for object detection,” C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.