Face Recognition Ying Wu Electrical and Computer Engineering Northwestern University, Evanston, IL


Recognizing Faces? Challenges: variations in lighting and in view.

Outline Bayesian Classification Principal Component Analysis (PCA) Fisher Linear Discriminant Analysis (LDA) Independent Component Analysis (ICA)

Bayesian Classification Classifier & Discriminant Function Discriminant Function for Gaussian Bayesian Learning and Estimation

Classifier & Discriminant Function Discriminant functions g_i(x), i = 1, …, C. Classifier: assign x to class ω_i if g_i(x) > g_j(x) for all j ≠ i. Example: g_i(x) = P(ω_i|x). Decision boundary: the set of points where the two largest discriminants are equal. The choice of discriminant function is not unique: any monotonically increasing transform gives the same decision.

Multivariate Gaussian For p(x) ~ N(μ, Σ), the principal axes (the directions) are given by the eigenvectors of Σ; the lengths of the axes (the uncertainty along each direction) are given by the eigenvalues of Σ.

Mahalanobis Distance The Mahalanobis distance is a covariance-normalized distance: r² = (x − μ)ᵀ Σ⁻¹ (x − μ).
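Not from the slides: a minimal NumPy sketch of this distance, with a made-up mean and covariance for illustration.

```python
import numpy as np

# Hypothetical 2-D Gaussian parameters (illustrative values only).
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))

x = np.array([2.0, 3.0])
print(mahalanobis_sq(x, mu, Sigma))  # distance normalized by the covariance
```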

Whitening Whitening: –Find a linear transformation y = Aᵀx (a rotation and scaling) such that the covariance becomes the identity matrix (i.e., the uncertainty in each component is the same). If p(x) ~ N(μ, Σ), then p(y) ~ N(Aᵀμ, AᵀΣA), so we need AᵀΣA = I.
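As an added illustration (not part of the original slides), the sketch below builds a whitening transform A = ΦΛ^(−1/2) from the eigen-decomposition Σ = ΦΛΦᵀ and checks that the transformed covariance is the identity; the Gaussian parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D samples from N(mu, Sigma) -- illustrative parameters.
mu = np.array([1.0, -1.0])
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=5000)   # shape (n, 2)

# Whitening transform: A = Phi Lambda^{-1/2}, where Sigma = Phi Lambda Phi^T.
lam, Phi = np.linalg.eigh(Sigma)
A = Phi @ np.diag(1.0 / np.sqrt(lam))

# y = A^T x  =>  cov(y) = A^T Sigma A = I
Y = (X - X.mean(axis=0)) @ A
print(np.cov(Y, rowvar=False).round(2))   # approximately the identity matrix
```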

Discriminant Function for Gaussian Minimum-error-rate classifier: g_i(x) = ln p(x|ω_i) + ln P(ω_i). For p(x|ω_i) ~ N(μ_i, Σ_i): g_i(x) = −½(x − μ_i)ᵀΣ_i⁻¹(x − μ_i) − (d/2) ln 2π − ½ ln|Σ_i| + ln P(ω_i).

Case I: Σ_i = σ²I Dropping the terms that are constant across classes gives a linear discriminant function: g_i(x) = w_iᵀx + w_i0, with w_i = μ_i/σ² and w_i0 = −μ_iᵀμ_i/(2σ²) + ln P(ω_i). Boundary: the hyperplane wᵀ(x − x₀) = 0 with w = μ_i − μ_j.

Example Assume P(ω_i) = P(ω_j). Let's derive the decision boundary: with equal priors and Σ_i = σ²I, the boundary is the hyperplane through the midpoint x₀ = (μ_i + μ_j)/2, perpendicular to μ_i − μ_j.

Case II: Σ_i = Σ The decision boundary is still linear: wᵀ(x − x₀) = 0, now with w = Σ⁻¹(μ_i − μ_j).

Case III: Σ_i arbitrary The decision boundary is no longer linear, but a hyperquadric!
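A small sketch, not from the slides, of the general (Case III) minimum-error-rate discriminant; the class parameters below are made-up examples, and Cases I and II are the special forms this reduces to when the covariances are shared.

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """Minimum-error-rate discriminant g_i(x) for a Gaussian class model
    (general Case III; Cases I and II are special forms of this)."""
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two hypothetical classes with arbitrary covariances (illustrative numbers).
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.8], [0.8, 1.5]]), 0.5),
]

x = np.array([1.5, 1.2])
scores = [gaussian_discriminant(x, m, S, p) for m, S, p in params]
print("assigned class:", int(np.argmax(scores)))
```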

Bayesian Learning Learning means "training", i.e., estimating some unknowns from "training data". WHY? –It is very difficult to specify these unknowns by hand –Hopefully, these unknowns can be recovered from the collected examples.

Maximum Likelihood Estimation Collected examples D = {x_1, x_2, …, x_n}. Estimate the unknown parameters θ in the sense that the data likelihood is maximized. Likelihood: p(D|θ) = ∏_k p(x_k|θ). Log-likelihood: l(θ) = Σ_k ln p(x_k|θ). ML estimation: θ̂ = argmax_θ l(θ).

Case I: unknown μ The ML estimate is the sample mean: μ̂ = (1/n) Σ_k x_k.

Case II: unknown μ and Σ The ML estimates generalize to the sample mean and the sample covariance: μ̂ = (1/n) Σ_k x_k, Σ̂ = (1/n) Σ_k (x_k − μ̂)(x_k − μ̂)ᵀ.
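A brief NumPy sketch (my addition, with synthetic data) of the Case II ML estimates: the sample mean and the 1/n sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set D = {x_1, ..., x_n} from a Gaussian (made-up parameters).
true_mu = np.array([2.0, -1.0])
true_Sigma = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
D = rng.multivariate_normal(true_mu, true_Sigma, size=2000)

# ML estimates: sample mean and the (biased, 1/n) sample covariance.
mu_hat = D.mean(axis=0)
diff = D - mu_hat
Sigma_hat = diff.T @ diff / len(D)

print("mu_hat    =", mu_hat.round(2))
print("Sigma_hat =\n", Sigma_hat.round(2))
```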

Bayesian Estimation Collected examples D = {x_1, x_2, …, x_n}, drawn independently from a fixed but unknown distribution p(x). Bayesian learning uses D to determine p(x|D), i.e., to learn a p.d.f. p(x) is unknown, but has a parametric form with parameters θ ~ p(θ). Difference from ML: in Bayesian learning, θ is not a single value but a random variable, and we need to recover its distribution p(θ|D), rather than a point estimate.

Bayesian Estimation p(x|D) = ∫ p(x|θ) p(θ|D) dθ. This follows from the total probability rule, i.e., p(x|D) is a weighted average of p(x|θ) over all θ. If p(θ|D) peaks very sharply about some value θ*, then p(x|D) ≈ p(x|θ*).

The Univariate Case Assume μ is the only unknown, p(x|μ) ~ N(μ, σ²). μ is a random variable with prior p(μ) ~ N(μ₀, σ₀²), i.e., μ₀ is the best prior guess of μ, and σ₀ is the uncertainty of that guess. p(μ|D) is also a Gaussian for any number of training examples.

The Univariate Case μ_n is the best guess for μ after observing n examples; σ_n measures the uncertainty of this guess: μ_n = (nσ₀² x̄_n + σ² μ₀)/(nσ₀² + σ²), σ_n² = σ₀²σ²/(nσ₀² + σ²). (Figure: p(μ|x_1, …, x_n) for n = 1, 5, 10, 30.) p(μ|D) becomes more and more sharply peaked as more examples are observed, i.e., the uncertainty decreases.
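The following sketch, assuming the univariate update formulas above, simulates the posterior for n = 1, 5, 10, 30 and shows σ_n shrinking; the prior and the true mean are made-up values.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 1.0              # known data standard deviation
mu0, sigma0 = 0.0, 2.0   # prior N(mu0, sigma0^2) over the unknown mean (made-up values)
true_mu = 1.5            # used only to simulate data

for n in (1, 5, 10, 30):
    x = rng.normal(true_mu, sigma, size=n)
    xbar = x.mean()
    # Posterior p(mu | D) = N(mu_n, sigma_n^2):
    mu_n = (n * sigma0**2 * xbar + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
    sigma_n2 = (sigma0**2 * sigma**2) / (n * sigma0**2 + sigma**2)
    print(f"n={n:3d}: mu_n={mu_n:.3f}, sigma_n={np.sqrt(sigma_n2):.3f}")
```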

The Multivariate Case The same reasoning applies with p(x|μ) ~ N(μ, Σ) and prior p(μ) ~ N(μ₀, Σ₀): the posterior is p(μ|D) ~ N(μ_n, Σ_n), with μ_n = Σ₀(Σ₀ + Σ/n)⁻¹ x̄_n + (Σ/n)(Σ₀ + Σ/n)⁻¹ μ₀ and Σ_n = Σ₀(Σ₀ + Σ/n)⁻¹ Σ/n.

PCA and Eigenface Principal Component Analysis (PCA) Eigenface for Face Recognition

PCA: motivation Pattern vectors are generally confined to some low-dimensional subspace. Recall the basic idea of the Fourier transform: –A signal is decomposed into a linear combination of a set of basis signals with different frequencies.

PCA: idea Approximate each sample x_k by its projection onto a line through the mean m with unit direction e: x_k ≈ m + a_k e. Choose e to minimize the total squared reconstruction error, which is equivalent to maximizing the variance of the projections.

PCA To maximize eᵀSe subject to ‖e‖ = 1, we need to select e as the eigenvector of the scatter matrix S with the largest eigenvalue λ_max.

Algorithm Learning the principal components from {x_1, x_2, …, x_n}: compute the mean m = (1/n) Σ_k x_k, form the scatter matrix S = Σ_k (x_k − m)(x_k − m)ᵀ, compute its eigenvectors and eigenvalues, and keep the eigenvectors with the largest eigenvalues as the principal components.
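A compact NumPy sketch of this algorithm (not from the slides); the data matrix is synthetic and the helper name pca is my own.

```python
import numpy as np

def pca(X, d_prime):
    """Learn the top d_prime principal components from the rows of X (n x d)."""
    m = X.mean(axis=0)                        # sample mean
    Xc = X - m
    S = Xc.T @ Xc                             # scatter matrix (d x d)
    lam, E = np.linalg.eigh(S)                # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:d_prime]   # keep the largest ones
    return m, E[:, order], lam[order]

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # made-up correlated data
m, E, lam = pca(X, d_prime=2)
Y = (X - m) @ E      # project onto the principal subspace
print(Y.shape)       # (500, 2)
```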

PCA for Face Recognition Training data D = {x_1, …, x_M} –Dimension: stacking the pixels together makes each image a vector of dimension N –Preprocessing: cropping, normalization These faces should lie in a "face" subspace. Questions: –What is the dimension of this subspace? –How do we identify this subspace? –How do we use it for recognition?

Eigenface The EigenFace approach: M. Turk and A. Pentland, 1992

An Issue In general, N >> M. However, S, the covariance matrix, is N×N! Difficulties: –S is ill-conditioned: rank(S) << N –The eigenvalue decomposition of S is expensive to compute when N is large Solution?

Solution I: Let's do the eigenvalue decomposition of AᵀA, which is only an M×M matrix: AᵀAv = λv ⇒ AAᵀAv = λAv. To see this clearly: (AAᵀ)(Av) = λ(Av), i.e., if v is an eigenvector of AᵀA, then Av is an eigenvector of AAᵀ corresponding to the same eigenvalue! Note: of course, you need to normalize Av to make it a unit vector.
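A hedged sketch of this trick with random stand-in "face" vectors (the sizes N and M are made up): it eigen-decomposes the small M×M matrix AᵀA and maps the eigenvectors back through A.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 10000, 40                     # N pixels per image >> M images (made-up sizes)
faces = rng.random((M, N))           # stand-in for M vectorized face images

m = faces.mean(axis=0)
A = (faces - m).T                    # A is N x M; columns are x_i - m

# Eigen-decompose the small M x M matrix A^T A instead of the N x N matrix A A^T.
lam, V = np.linalg.eigh(A.T @ A)     # A^T A v = lambda v
keep = lam > 1e-10 * lam.max()       # drop the near-zero eigenvalue from mean subtraction
U = A @ V[:, keep]                   # then (A A^T)(A v) = lambda (A v)
U /= np.linalg.norm(U, axis=0)       # normalize each Av to a unit eigenvector of A A^T

print(U.shape)                       # (N, ~M-1): at most M-1 eigenfaces
```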

Solution II: You can simply use the SVD (singular value decomposition). Let A = [x_1 − m, …, x_M − m] and A = UΣVᵀ, where –A: N×M –U: N×M, UᵀU = I –Σ: M×M diagonal –V: M×M, VᵀV = VVᵀ = I
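The same result via NumPy's SVD, again with stand-in data; the columns of U are the eigenvectors of AAᵀ, and the squared singular values are the corresponding eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 10000, 40
faces = rng.random((M, N))                  # stand-in face vectors (made-up data)
m = faces.mean(axis=0)
A = (faces - m).T                           # N x M

# Thin SVD: A = U S V^T, with U holding the eigenfaces directly.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vt.shape)           # (N, M), (M,), (M, M)
# Columns of U are eigenvectors of A A^T; S**2 are the corresponding eigenvalues.
```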

Fisher Linear Discrimination LDA PCA+LDA for Face Recognition

When does PCA fail? When the directions of largest variance are not the directions that separate the classes: PCA can project two classes on top of each other while discarding the discriminative direction.

Linear Discriminant Analysis Finding an optimal linear mapping W that catches the major differences between classes and discounts irrelevant factors. In the mapped space, the data are clustered by class.

Within/between class scatters Within-class scatter: S_W = Σ_i Σ_{x∈D_i} (x − m_i)(x − m_i)ᵀ. Between-class scatter: S_B = Σ_i n_i (m_i − m)(m_i − m)ᵀ (for two classes, S_B = (m_1 − m_2)(m_1 − m_2)ᵀ).

Fisher LDA Maximize the Fisher criterion J(w) = (wᵀS_B w)/(wᵀS_W w), i.e., find the projection that maximizes the between-class scatter relative to the within-class scatter.

Solution I If S_W is not singular, you can simply do an eigenvalue decomposition of S_W⁻¹S_B.

Solution II Noticing that: –S_B w is always in the direction of m_1 − m_2 (WHY? Because S_B = (m_1 − m_2)(m_1 − m_2)ᵀ, so S_B w = (m_1 − m_2)[(m_1 − m_2)ᵀw], a scalar multiple of m_1 − m_2) –We are only concerned about the direction of the projection, rather than its scale We have w ∝ S_W⁻¹(m_1 − m_2).
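A minimal sketch of this two-class Fisher solution on synthetic data (the class means and shared covariance are invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(6)
# Two made-up 2-D classes with the same covariance but different means.
X1 = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X2 = rng.multivariate_normal([2, 1], [[1, 0.6], [0.6, 1]], size=200)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter

# Solution II: w is proportional to S_W^{-1} (m1 - m2); the scale does not matter.
w = np.linalg.solve(Sw, m1 - m2)
w /= np.linalg.norm(w)

# Project both classes onto the Fisher direction and compare the projected means.
print("projected class means:", (X1 @ w).mean().round(2), (X2 @ w).mean().round(2))
```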

Multiple Discriminant Analysis For C classes, find the mapping W that maximizes |WᵀS_B W| / |WᵀS_W W|; the columns of W are the generalized eigenvectors of S_B w = λ S_W w with the largest eigenvalues, and there are at most C − 1 useful directions.

Comparing PCA and LDA PCA finds the directions of maximum variance regardless of class labels (unsupervised); LDA finds the directions that best separate the classes (supervised). The two can differ substantially.

MDA for Face Recognition Under lighting variation, PCA does not work well! (Why? The largest-variance directions mostly capture illumination changes, not identity.) Solution: PCA followed by MDA.

Independent Component Analysis The cross-talk problem ICA

Cocktail party Two source signals s_1(t) and s_2(t) are mixed by an unknown matrix A into three observed signals x_1(t), x_2(t), and x_3(t). Can you recover s_1(t) and s_2(t) from x_1(t), x_2(t), and x_3(t)?

Formulation X = AS. Both A and S are unknown! Can you recover both A and S from X alone?

The Idea Consider an estimate y = zᵀS, i.e., a linear combination of {s_i}. Recall the central limit theorem: a sum of even two independent random variables is more Gaussian than the original variables. So zᵀS is generally more Gaussian than any single s_i; in other words, zᵀS becomes least Gaussian exactly when it equals one of the {s_i}. Amazing! ICA therefore searches for the projections that maximize non-Gaussianity.
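A hedged demonstration of the cocktail-party setup using scikit-learn's FastICA (this assumes scikit-learn is available; the sources and mixing matrix below are made up). Note that ICA recovers the sources only up to permutation, sign, and scale.

```python
import numpy as np
from sklearn.decomposition import FastICA   # assumes scikit-learn is installed

rng = np.random.default_rng(7)
t = np.linspace(0, 8, 2000)

# Two made-up non-Gaussian sources: a sinusoid and a square wave.
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]                           # (n_samples, 2)

# An unknown mixing matrix A produces three observed "microphone" signals.
A = np.array([[1.0, 0.5],
              [0.5, 1.0],
              [0.8, 0.3]])
X = S @ A.T                                 # (n_samples, 3)

# FastICA recovers the sources up to permutation, sign, and scale.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
print(S_hat.shape)                          # (2000, 2)
```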

Face Recognition: Challenges Variation in view (pose).