Face Recognition: An Introduction

Face

Face Recognition. Face is the most common biometric used by humans. Applications range from static mug-shot verification to dynamic, uncontrolled face identification in a cluttered background. Challenges: automatically locate the face, and recognize it from a general viewpoint, under different illumination conditions, facial expressions, and aging effects. Face recognition approaches can be categorized into appearance-based, geometry-based, and hybrid methods.

Authentication vs. Identification. Face Authentication/Verification (1:1 matching) versus Face Identification/Recognition (1:N matching).

Applications: Access Control (www.viisage.com, www.visionics.com)

Applications: Video Surveillance (on-line or off-line); Face Scan at Airports (www.facesnap.de)

Why is Face Recognition Hard? Many faces of Madonna

Face Recognition Difficulties. Identify similar faces (inter-class similarity); accommodate intra-class variability due to head pose, illumination conditions, expressions, facial accessories, and aging effects. (Illustration: cartoon faces.)

Inter-class Similarity. Different persons may have very similar appearance, e.g., twins, or father and son. (Image sources: www.marykateandashley.com; news.bbc.co.uk/hi/english/in_depth/americas/2000/us_elections)

Intra-class Variability. Faces of the same subject vary with pose, illumination, expression, accessories, color, occlusion, and brightness.

Sketch of a Pattern Recognition Architecture: Image (window) → Feature Extraction → Feature Vector → Classification → Object Identity.

Example: Face Detection. Scan a window over the image and use a classifier to label each window as either Face or Non-face.

Detection Test Sets

Schneiderman's test set: profile views.

Face Detection: Experimental Results. Test sets: two CMU benchmark data sets. Test set 1: 125 images with 483 faces; test set 2: 20 images with 136 faces. [See also work by Viola & Jones, Rehg, and more recent work by Schneiderman.]

Example: Finding Skin. Non-parametric representation of the class-conditional densities (CCD). Skin has a very small range of (intensity-independent) colors and little texture. Compute an intensity-independent color measure, check whether the color lies in this range, and check that there is little texture (median filter). This can be viewed as a classifier; the tests can be set up by hand or learned. The class-conditional densities are obtained as histograms and the priors by counting, and the classifier labels a pixel as skin when the skin posterior (class-conditional density times prior) exceeds that of non-skin.
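
A minimal sketch of such a histogram-based (non-parametric) skin classifier in NumPy. The choice of normalized r,g chromaticity, 32 histogram bins, and the prior value are illustrative assumptions, not details taken from the slide or from the Jones and Rehg paper.

```python
import numpy as np

# Histogram-based (non-parametric) Bayes skin classifier sketch.
# Bin count, color space, and prior are illustrative assumptions.

BINS = 32

def chromaticity(rgb):
    """Intensity-independent color: normalized r, g in [0, 1)."""
    rgb = rgb.astype(np.float64)
    s = rgb.sum(axis=-1, keepdims=True) + 1e-9
    rg = rgb[..., :2] / s
    return np.clip(rg, 0.0, 1.0 - 1e-9)

def fit_histogram(pixels):
    """Class-conditional density as a normalized 2-D histogram (counting)."""
    rg = chromaticity(pixels)
    idx = (rg * BINS).astype(int)
    hist = np.zeros((BINS, BINS))
    np.add.at(hist, (idx[:, 0], idx[:, 1]), 1.0)
    return hist / hist.sum()

def classify(image, p_skin_hist, p_bg_hist, prior_skin=0.1):
    """Label a pixel as skin when P(color|skin)P(skin) > P(color|bg)P(bg)."""
    rg = chromaticity(image.reshape(-1, 3))
    idx = (rg * BINS).astype(int)
    p_skin = p_skin_hist[idx[:, 0], idx[:, 1]] * prior_skin
    p_bg = p_bg_hist[idx[:, 0], idx[:, 1]] * (1.0 - prior_skin)
    return (p_skin > p_bg).reshape(image.shape[:2])
```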

Figure from M. J. Jones and J. Rehg, "Statistical color models with application to skin detection," Proc. Computer Vision and Pattern Recognition, 1999. Copyright 1999, IEEE.

Face Detection

Face Detection Algorithm. Input Image → Lighting Compensation → Color Space Transformation → Skin Color Detection → Variance-based Segmentation → Connected Component Analysis and Grouping → Face Localization → Eye/Mouth Detection → Face Boundary Detection → Verifying/Weighting Eyes-Mouth Triangles → Facial Feature Detection → Output Image.

Canon PowerShot

Face Recognition: 2-D and 3-D. Recognition data may be 2-D images, video over time, or 3-D scans; recognition compares these data against a face database (2-D or 3-D), drawing on prior knowledge of the face class.

Taxonomy of Face Recognition Algorithms. Axes: pose-dependency (pose-dependent vs. pose-invariant) and face representation (viewer-centered images vs. object-centered models). Matching features range from appearance-based (holistic) methods such as PCA, LDA, and LFA, through hybrid methods, to feature-based (analytic) methods such as EGBM. Representative work: Gordon et al., 1995; Lengagne et al., 1996; Atick et al., 1996; Yan et al., 1996; Zhao et al., 2000; Zhang et al., 2000.

Image as a Feature Vector. Consider an n-pixel image to be a point x ∈ R^n in an n-dimensional space; each pixel value is one coordinate of x.

Nearest Neighbor Classifier. Let { R_j } be the set of training images. A test image I is assigned the identity of the training image R_j that minimizes the distance ||I - R_j||.
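
A minimal sketch of this classifier in NumPy; the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

# Nearest-neighbor ("template matching") recognition over vectorized images.

def nearest_neighbor(test_image, train_images, labels):
    """Return the label of the training image closest to test_image (L2 distance).

    test_image:   (h, w) array
    train_images: (N, h, w) array of templates
    labels:       length-N list of identities
    """
    x = test_image.reshape(-1).astype(np.float64)           # vectorize: point in R^n
    R = train_images.reshape(len(train_images), -1).astype(np.float64)
    dists = np.linalg.norm(R - x, axis=1)                   # one distance per template
    return labels[int(np.argmin(dists))]
```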

Comments. Sometimes called "template matching." Variations on the distance function are possible (e.g., L1, robust distances). Multiple templates per class: perhaps many training images per class. Expensive to compute k distances, especially when each image is large (n-dimensional). May not generalize well to unseen examples of a class. Some solutions: Bayesian classification; dimensionality reduction.

Eigenfaces (Turk and Pentland, 1991). Use Principal Component Analysis (PCA) to determine the most discriminating features between images of faces.

Eigenfaces: Linear Projection. An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx, where W is an m by n matrix. Recognition is performed using nearest neighbor in R^m. How do we choose a good W?

Eigenfaces: Principal Component Analysis (PCA). Main point: the first principal component is the direction of largest variance. Some details: use the Singular Value Decomposition (the "trick" described in the text) to compute the basis when the number of images is much smaller than their dimension (n << d).

How do you construct the eigenspace? Build a data matrix [ x1 x2 x3 x4 x5 ] by stacking the vectorized images as columns, then apply the Singular Value Decomposition (SVD) to obtain the projection W.

Matrix Decompositions. Definition: the factorization of a matrix M into two or more matrices M1, M2, …, Mn such that M = M1 M2 … Mn. Many decompositions exist: QR decomposition, LU decomposition, LDU decomposition, etc.

Singular Value Decomposition. Excellent reference: "Matrix Computations," Golub and Van Loan. Any m by n matrix A may be factored as A = UΣV^T, with sizes [m x n] = [m x m][m x n][n x n]. U: m by m orthogonal matrix; the columns of U are the eigenvectors of AA^T. V: n by n orthogonal matrix; its columns are the eigenvectors of A^TA. Σ: m by n diagonal matrix with non-negative entries σ1 ≥ σ2 ≥ … ≥ σs, s = min(m, n), called the singular values. The singular values are the square roots of the eigenvalues of both AA^T and A^TA, and the SVD algorithm returns them in non-increasing order.

SVD Properties. In Matlab, [U S V] = svd(A), and you can verify that A = U*S*V'. r = rank(A) = number of non-zero singular values. U and V give orthonormal bases for the fundamental subspaces of A: the first r columns of U span the column space of A; the last m - r columns of U span the left nullspace of A; the first r columns of V span the row space of A; the last n - r columns of V span the nullspace of A. For d ≤ r, the first d columns of U provide the best d-dimensional basis for the columns of A in the least-squares sense.
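
The same checks can be reproduced in NumPy, used here as an illustrative counterpart to the Matlab commands on the slide; the random matrix A is only an example.

```python
import numpy as np

# Verify the SVD properties listed above on a small random matrix.

A = np.random.randn(6, 4)

U, s, Vt = np.linalg.svd(A)              # full SVD: U (6x6), s (4,), Vt (4x4)
S = np.zeros_like(A)
S[:len(s), :len(s)] = np.diag(s)

assert np.allclose(A, U @ S @ Vt)        # A = U * S * V'
r = int(np.sum(s > 1e-10))               # rank = number of non-zero singular values
assert r == np.linalg.matrix_rank(A)

col_space_basis = U[:, :r]               # first r columns of U: column space of A
row_space_basis = Vt[:r, :].T            # first r columns of V: row space of A
```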

Thin SVD. Any m by n matrix A (with m > n) may be factored as A = UΣV^T with sizes [m x n] = [m x n][n x n][n x n]. When m > n, the full Σ can be viewed as an n by n diagonal block Σ' = diag(σ1, σ2, …, σn) stacked on top of an (m - n) by n block of zeros, so one can equivalently write A = U'Σ'V^T, keeping only the first n columns of U. In Matlab, the economy-size SVD is svd(A, 'econ'); svds(A, k) computes only the k largest singular values and vectors.
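
A corresponding sketch in NumPy, where the thin SVD is obtained with full_matrices=False; the matrix dimensions are illustrative.

```python
import numpy as np

# Thin (economy) SVD: keep only the first n columns of U when m > n.

A = np.random.randn(1000, 50)            # m >> n, e.g., many pixels, few images

U_thin, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U_thin.shape, s.shape, Vt.shape)   # (1000, 50) (50,) (50, 50)

assert np.allclose(A, U_thin @ np.diag(s) @ Vt)   # A = U' * Sigma' * V^T
```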

Application: Pseudoinverse. Given y = Ax, the solution is x = A+ y. For square, invertible A, A+ = A^-1. For any A, A+ = V Σ^-1 U^T (inverting only the non-zero singular values). A+ is called the pseudoinverse of A, and x = A+ y is the least-squares solution of y = Ax; it is an alternative to the previous solution.
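
A short sketch of the SVD-based pseudoinverse and least-squares solution in NumPy; the data are random and only for illustration.

```python
import numpy as np

# Least-squares solution of y = A x via the SVD-based pseudoinverse.

A = np.random.randn(20, 5)               # overdetermined system (full column rank)
y = np.random.randn(20)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T    # A+ = V Sigma^-1 U^T

x = A_pinv @ y                            # least-squares solution
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(x, np.linalg.lstsq(A, y, rcond=None)[0])
```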

Performing PCA with SVD. The singular values of A are the square roots of the eigenvalues of both AA^T and A^TA, and the columns of U are the corresponding eigenvectors of AA^T. The covariance matrix of the images is C = (1/n) Σ_i (x_i - μ)(x_i - μ)^T = (1/n) AA^T, where the columns of A are the mean-subtracted images. So, ignoring the 1/n factor: subtract the mean image μ from each input image, stack the results into a data matrix A, and perform a thin SVD of the data matrix; the columns of U are the principal components.
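
A minimal sketch of this recipe in NumPy; the image dimensions, the number of images, and the number k of retained components are illustrative placeholders.

```python
import numpy as np

# PCA via the thin SVD of the mean-subtracted data matrix.

n_pixels, n_images, k = 4096, 40, 10       # e.g., 64x64 images, 40 faces, keep 10 PCs

X = np.random.rand(n_pixels, n_images)     # columns are vectorized training images
mu = X.mean(axis=1, keepdims=True)         # mean image
A = X - mu                                 # subtract the mean image from each column

U, s, Vt = np.linalg.svd(A, full_matrices=False)
W = U[:, :k].T                             # (k, n_pixels): rows are the top-k eigenfaces

Y = W @ A                                  # k-dimensional features of the training images
```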

Figure 22.6: the mean and the first principal component (the direction of maximum variance); see the figure caption for the full story.

Eigenfaces: Modeling and Recognition. Modeling: given a collection of n labeled training images, (1) compute the mean image and the covariance matrix; (2) compute the k eigenvectors of the covariance matrix corresponding to the k largest eigenvalues (note that these eigenvectors are themselves images); (3) project the training images into the k-dimensional eigenspace. Recognition: given a test image, project it into the eigenspace and classify it against the projected training images.
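
A compact sketch of this modeling/recognition loop, assuming NumPy; the helper names train_eigenfaces and recognize and the array shapes are illustrative, not an established API.

```python
import numpy as np

# End-to-end eigenfaces sketch: train (mean, eigenvectors, projections),
# then recognize a test image by nearest neighbor in the eigenspace.

def train_eigenfaces(images, labels, k):
    """images: (N, h, w) array of training faces; returns (mu, W, Y, labels)."""
    X = images.reshape(len(images), -1).astype(np.float64).T   # columns = images
    mu = X.mean(axis=1, keepdims=True)                         # mean image
    U, _, _ = np.linalg.svd(X - mu, full_matrices=False)
    W = U[:, :k].T                         # k eigenvectors (eigenfaces) as rows
    Y = W @ (X - mu)                       # training images projected to eigenspace
    return mu, W, Y, list(labels)

def recognize(test_image, mu, W, Y, labels):
    """Project the test image to the eigenspace and classify by nearest neighbor."""
    y = W @ (test_image.reshape(-1, 1).astype(np.float64) - mu)
    dists = np.linalg.norm(Y - y, axis=0)
    return labels[int(np.argmin(dists))]
```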

Eigenfaces: Training Images [Turk and Pentland, 1991]

Eigenfaces: mean image and basis images.

Variable Lighting

Projection and Reconstruction. An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx. From y ∈ R^m, the reconstruction of the point is W^T y. The error of the reconstruction is ||x - W^T W x||.
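
A small sketch of this projection/reconstruction step, assuming W has orthonormal rows as with the eigenfaces basis.

```python
import numpy as np

# Project an image onto the subspace defined by W and measure the
# reconstruction error ||x - W^T W x||.

def reconstruction_error(x, W):
    """x: (n,) vectorized (mean-subtracted) image; W: (m, n) with orthonormal rows."""
    y = W @ x                  # projection:     y = W x        (in R^m)
    x_hat = W.T @ y            # reconstruction: x_hat = W^T y  (back in R^n)
    return np.linalg.norm(x - x_hat)
```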

Reconstruction using Eigenfaces Given image on left, project to Eigenspace, then reconstruct an image (right).

Underlying Assumptions. The background is not cluttered (or else only the interior of the object is examined). Lighting in the test image is similar to that in the training images. There is no occlusion. The size of the training image (window) is the same as the window in the test image.

Face Detection Using "Distance to Face Space". Scan a window ω across the image and classify each window as face/not-face as follows: project the window onto the subspace and reconstruct it as described earlier; compute the distance between ω and its reconstruction; local minima of this distance over all image locations that fall below some threshold are taken as the locations of faces; repeat at different scales; possibly normalize the window intensity so that ||ω|| = 1.
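
A rough sketch of this sliding-window procedure; the window size, stride, and threshold are illustrative assumptions, and a full implementation would also keep only local minima of the distance and repeat over several scales.

```python
import numpy as np

# "Distance to face space" detection: slide a window over the image,
# project/reconstruct with the eigenface rows W, and keep low-error locations.

def detect_faces(image, mu, W, win=(64, 64), stride=8, threshold=15.0):
    """image: 2-D grayscale array; mu: (n, 1) mean window; W: (k, n) eigenface rows."""
    H, Wd = image.shape
    h, w = win
    hits = []
    for r in range(0, H - h + 1, stride):
        for c in range(0, Wd - w + 1, stride):
            x = image[r:r + h, c:c + w].reshape(-1, 1).astype(np.float64)
            x = x / (np.linalg.norm(x) + 1e-9)          # optional intensity normalization
            d = x - mu
            err = np.linalg.norm(d - W.T @ (W @ d))     # distance to face space
            if err < threshold:
                hits.append((r, c, err))
    return hits                                          # candidate face locations
```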

Difficulties with PCA. Projection may suppress important detail: the smallest-variance directions may not be unimportant. The method does not take the discriminative task into account: typically we wish to compute features that allow good discrimination, which is not the same as largest variance.

Figure 22.7: principal components give a very poor representation of this data set.

Figure 22.10: two classes indicated by * and o; the first principal component captures all the variance but completely destroys any ability to discriminate; the second is close to what is required.

Illumination Variability Same person under variable lighting “The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity.” -- Moses, Adini, Ullman, ECCV ‘94

Fisherfaces: Class-Specific Linear Projection. P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," PAMI, July 1997, pp. 711-720. An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx, where W is an m by n matrix. Recognition is performed using nearest neighbor in R^m. How do we choose a good W?

PCA and Fisher's Linear Discriminant. Between-class scatter: S_B = Σ_{i=1..c} |χ_i| (μ_i - μ)(μ_i - μ)^T. Within-class scatter: S_W = Σ_{i=1..c} Σ_{x_k ∈ χ_i} (x_k - μ_i)(x_k - μ_i)^T. Total scatter: S_T = S_B + S_W. Here c is the number of classes, μ is the overall mean, μ_i is the mean of class χ_i, and |χ_i| is the number of samples in class χ_i.
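
A small sketch that computes S_B and S_W exactly as defined above, assuming labeled feature vectors as the rows of a matrix X; the function name scatter_matrices is a hypothetical helper.

```python
import numpy as np

# Between-class and within-class scatter matrices from labeled samples.

def scatter_matrices(X, y):
    """X: (N, n) samples as rows; y: (N,) integer class labels."""
    mu = X.mean(axis=0)
    n = X.shape[1]
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)                 # between-class scatter
        Sw += (Xc - mu_c).T @ (Xc - mu_c)         # within-class scatter
    return Sb, Sw                                 # total scatter: St = Sb + Sw
```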

PCA and Fisher's Linear Discriminant. PCA (eigenfaces) maximizes the projected total scatter: W_PCA = argmax_W |W S_T W^T|. Fisher's Linear Discriminant (FLD) maximizes the ratio of projected between-class to projected within-class scatter: W_FLD = argmax_W |W S_B W^T| / |W S_W W^T|. The point to emphasize: PCA preserves the information in the training set and is optimal in a least-squares sense for reconstruction, but in dropping dimensions it may smear classes together. FLD trades off two effects that are desirable for recognition: (a) the projected within-class scatter is minimized, which makes nearest-neighbor classification effective, and (b) the projected between-class scatter is maximized, which pushes classes far apart in the feature space.

Computing the Fisher Projection Matrix. The rows w_i of W are generalized eigenvectors satisfying S_B w_i = λ_i S_W w_i, and the w_i are orthonormal. There are at most c - 1 non-zero generalized eigenvalues, so m ≤ c - 1. W can be computed with eig in Matlab.

Fisherfaces. Since S_W has rank at most N - c, first project the training set onto the subspace spanned by the first N - c principal components of the training set. Then apply FLD in this (N - c)-dimensional subspace, yielding a (c - 1)-dimensional feature space. Fisher's Linear Discriminant projects away the within-class variation (lighting, expressions) found in the training set, while preserving the separability of the classes. The rows of W have the dimensions of images and, like the eigenfaces, can be viewed as "fisherfaces."
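
A minimal sketch of this two-stage recipe, assuming NumPy and SciPy are available; the function name fisherfaces and the use of scipy.linalg.eigh for the generalized eigenproblem are illustrative choices of this sketch, not part of the original slides.

```python
import numpy as np
from scipy.linalg import eigh

# Fisherfaces sketch: PCA down to N - c dimensions, then FLD
# (generalized eigenproblem Sb w = lambda Sw w) down to c - 1 dimensions.

def fisherfaces(X, y):
    """X: (N, n) vectorized training images as rows; y: (N,) class labels."""
    N, n = X.shape
    classes = np.unique(y)
    c = len(classes)

    # Step 1: PCA to N - c dimensions (thin SVD of the mean-subtracted data).
    mu = X.mean(axis=0)
    U, _, _ = np.linalg.svd((X - mu).T, full_matrices=False)
    Wpca = U[:, :N - c]                        # (n, N - c)
    Z = (X - mu) @ Wpca                        # training set in the PCA subspace

    # Step 2: scatter matrices in the reduced space.
    mz = Z.mean(axis=0)
    Sb = np.zeros((Z.shape[1], Z.shape[1]))
    Sw = np.zeros_like(Sb)
    for cl in classes:
        Zc = Z[y == cl]
        mc = Zc.mean(axis=0)
        d = (mc - mz)[:, None]
        Sb += len(Zc) * (d @ d.T)
        Sw += (Zc - mc).T @ (Zc - mc)

    # Step 3: FLD, keeping the c - 1 directions with largest generalized eigenvalues.
    evals, evecs = eigh(Sb, Sw)
    Wfld = evecs[:, np.argsort(evals)[::-1][:c - 1]]

    W = (Wpca @ Wfld).T                        # (c - 1, n): rows are the "fisherfaces"
    return W, mu
```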

PCA vs. FLD

Experimental Results 1: variation in facial expression, eyewear, and lighting. Input: 160 images of 16 people, with and without glasses, under 3 lighting conditions and with 5 expressions. Train on 159 images, test on the remaining 1 image.

Performance Evaluation. Leave-one-out evaluation of PCA and LDA on the Yale Face Database [Belhumeur, Hespanha, Kriegman 97]:

Approach          | Dim. of subspace | Error rate (close crop) | Error rate (full face)
Eigenface (PCA)   | 30               | 24.4%                   | 19.4%
Fisherface (LDA)  | 15               | 7.3%                    | 0.6%

Experimental Results - 2

Harvard Face Database: 10 individuals, 66 images per person. Train on 6 images with lighting within 15° of frontal; test on the remaining images, taken under more extreme lighting (up to about 60° from frontal).

Recognition Results: Lighting Extrapolation. Training on the near-frontal subset (lighting within 15 degrees of frontal) and testing on more extreme lighting. As expected, correlation performs slightly better than eigenfaces. Removing the first three principal components works better, since they capture the lighting variation; however, they also contain useful discriminatory information, which is lost. Fisherfaces perform better than the other methods.