Face Recognition: An Introduction
Face
Face Recognition
Face is the most common biometric used by humans.
Applications range from static mug-shot verification to dynamic, uncontrolled face identification in a cluttered background.
Challenges: automatically locate the face; recognize the face from a general viewpoint under different illumination conditions, facial expressions, and aging effects.
Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches.
Authentication vs Identification
Face Authentication/Verification: 1:1 matching
Face Identification/Recognition: 1:N matching
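The difference between 1:1 and 1:N matching can be sketched in a few lines. This is only an illustration: the feature vectors, the Euclidean distance, the threshold value, and the names `verify` and `identify` are assumptions, not something the slides specify.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(probe, claimed_template, threshold):
    """1:1 matching: accept the claimed identity if the probe is close enough."""
    return distance(probe, claimed_template) <= threshold

def identify(probe, gallery):
    """1:N matching: return the enrolled identity whose template is nearest."""
    return min(gallery, key=lambda name: distance(probe, gallery[name]))

# Hypothetical 3-D feature vectors standing in for real face features.
gallery = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 1.0, 0.0]}
probe = [0.1, 0.0, 0.9]

print(verify(probe, gallery["alice"], threshold=0.5))  # one comparison
print(identify(probe, gallery))                        # N comparisons
```

Verification needs a threshold because it answers yes or no; identification as written always returns some enrolled identity, so a real system would usually also threshold the best distance.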
Applications
Access Control (www.viisage.com, www.visionics.com)
Video Surveillance (on-line or off-line)
Face Scan at Airports (www.facesnap.de)
Why is Face Recognition Hard? Many faces of Madonna
Face Recognition Difficulties
Identify similar faces (inter-class similarity).
Accommodate intra-class variability due to: head pose, illumination conditions, expressions, facial accessories, aging effects.
Cartoon faces
Inter-class Similarity
Different persons may have very similar appearance.
Twins (www.marykateandashley.com)
Father and son (news.bbc.co.uk/hi/english/in_depth/americas/2000/us_elections)
Intra-class Variability
Faces with intra-subject variations in pose, illumination, expression, accessories, color, occlusions, and brightness.
Sketch of a Pattern Recognition Architecture
Image (window) -> Feature Extraction -> Feature Vector -> Classification -> Object Identity
Example: Face Detection
Scan window over image.
Classify window as either: Face, Non-face.
Window -> Classifier -> Face / Non-face
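The scan-and-classify loop can be sketched as follows. The classifier here is a toy stand-in (a brightness test), and the names `sliding_windows`, `detect`, and `bright` are hypothetical; a real detector would plug a trained face/non-face classifier into the same loop and repeat it over several scales.

```python
def sliding_windows(height, width, win, stride):
    """Yield the (row, col) top-left corner of every win x win window."""
    for r in range(0, height - win + 1, stride):
        for c in range(0, width - win + 1, stride):
            yield r, c

def detect(image, win, stride, is_face):
    """Return top-left corners of the windows the classifier labels as face."""
    h, w = len(image), len(image[0])
    hits = []
    for r, c in sliding_windows(h, w, win, stride):
        window = [row[c:c + win] for row in image[r:r + win]]
        if is_face(window):
            hits.append((r, c))
    return hits

def bright(window):
    """Toy classifier (an assumption): call a window a face if it is bright."""
    values = [v for row in window for v in row]
    return sum(values) / len(values) > 0.5

# 6x6 dark image with one bright 2x2 patch at rows 2-3, columns 2-3.
image = [[0.0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(2, 4):
        image[r][c] = 1.0

print(detect(image, win=2, stride=1, is_face=bright))
```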
Detection Test Sets
Profile views Schneiderman’s Test set
Face Detection: Experimental Results
Test sets: two CMU benchmark data sets
Test set 1: 125 images with 483 faces
Test set 2: 20 images with 136 faces
[See also work by Viola & Jones, Rehg, more recent by Schneiderman]
Example: Finding skin
Non-parametric representation of the class-conditional densities (CCD)
Skin has a very small range of (intensity-independent) colors, and little texture.
Compute an intensity-independent color measure, check if the color is in this range, and check if there is little texture (median filter).
See this as a classifier: we can set up the tests by hand, or learn them.
Get the class-conditional densities (histograms) and the priors from data (counting).
The classifier then follows from Bayes' rule applied to these densities and priors.
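A minimal sketch of such a learned skin classifier, assuming normalized (r, g) chromaticity as the intensity-independent color measure, 8 histogram bins per axis, and a plain Bayes comparison of the two classes. These choices, the function names, and the sample pixels are illustrative assumptions; the texture test is left out.

```python
from collections import Counter

def chroma_bin(rgb, bins=8):
    """Intensity-independent color: quantized (r, g) chromaticity."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0, 0)
    rn, gn = r / total, g / total
    return (min(int(rn * bins), bins - 1), min(int(gn * bins), bins - 1))

def train(skin_pixels, nonskin_pixels):
    """Histograms give the class-conditional densities; counts give the priors."""
    n_skin, n_non = len(skin_pixels), len(nonskin_pixels)
    return {
        "skin": Counter(chroma_bin(p) for p in skin_pixels),
        "non": Counter(chroma_bin(p) for p in nonskin_pixels),
        "n_skin": n_skin,
        "n_non": n_non,
        "prior_skin": n_skin / (n_skin + n_non),
    }

def is_skin(model, rgb):
    """Bayes rule: skin if p(x | skin) P(skin) > p(x | non-skin) P(non-skin)."""
    b = chroma_bin(rgb)
    p_x_skin = model["skin"][b] / model["n_skin"]
    p_x_non = model["non"][b] / model["n_non"]
    return p_x_skin * model["prior_skin"] > p_x_non * (1 - model["prior_skin"])

# Hypothetical labeled pixels (RGB, 0-255).
skin = [(200, 120, 100), (180, 110, 90), (220, 140, 120)]
nonskin = [(50, 200, 50), (30, 30, 200), (100, 100, 100)]
model = train(skin, nonskin)

print(is_skin(model, (210, 130, 110)))  # similar hue to the skin samples
print(is_skin(model, (100, 60, 50)))    # same hue at half the intensity
print(is_skin(model, (40, 190, 60)))    # green
```

Because the bins depend only on chromaticity, the second query falls in the same bin as the brighter skin samples, which is the point of using an intensity-independent measure.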
Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE
Face Detection
Face Detection Algorithm
Input Image
Face Localization: Lighting Compensation, Color Space Transformation, Skin Color Detection, Variance-based Segmentation, Connected Component & Grouping
Facial Feature Detection: Eye/Mouth Detection, Face Boundary Detection, Verifying/Weighting Eyes-Mouth Triangles
Output Image
Canon Powershot
Face Recognition: 2-D and 3-D
[Diagram: 2-D and 3-D recognition data, optionally over time (video), compared against a 2-D or 3-D face database, using prior knowledge of the face class]
Note: this slide is animated; green denotes topics covered in subsequent slides.
Taxonomy of Face Recognition Algorithms
Pose-dependency: Pose-dependent (viewer-centered images), Pose-invariant (object-centered models)
Face representation / matching features: Appearance-based (holistic), Hybrid, Feature-based (analytic)
Methods named: PCA, LDA, LFA, EBGM
References named: Gordon et al., 1995; Lengagne et al., 1996; Atick et al., 1996; Yan et al., 1996; Zhao et al., 2000; Zhang et al., 2000
Image as a Feature Vector
Consider an n-pixel image to be a point in an n-dimensional space, $x \in R^n$. Each pixel value is a coordinate of x.
[Figure: axes $x_1$, $x_2$, $x_3$]
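Nearest Neighbor Classifier
$\{R_j\}$ is the set of training images. A test image $I$ is assigned the identity of the nearest $R_j$.
[Figure: $R_1$, $R_2$, and $I$ as points in the space with axes $x_1$, $x_2$, $x_3$]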
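A minimal NumPy sketch of nearest-neighbor classification over vectorized images. The function name, the toy 3-pixel "images", and the labels are assumptions for illustration; the `ord` argument switches the distance function (2 for Euclidean, 1 for L1).

```python
import numpy as np

def nearest_neighbor(train_images, labels, test_image, ord=2):
    """Return the label of the training image closest to test_image.

    train_images: (k, n) array, one vectorized image R_j per row.
    ord: norm order passed to np.linalg.norm (2 = Euclidean, 1 = L1).
    """
    dists = np.linalg.norm(train_images - test_image, ord=ord, axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical 3-pixel "images".
R = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0]])
labels = ["A", "B", "C"]
I = np.array([0.9, 0.8, 1.0])

print(nearest_neighbor(R, labels, I))
print(nearest_neighbor(R, labels, I, ord=1))
```

Each query costs one distance per training image, each over all n pixels, which is the expense that motivates dimensionality reduction.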
Comments
Sometimes called “Template Matching”.
Variations on distance function (e.g. L1, robust distances).
Multiple templates per class, perhaps many training images per class.
Expensive to compute k distances, especially when each image is big (N dimensional).
May not generalize well to unseen examples of class.
Some solutions: Bayesian classification, dimensionality reduction.
Eigenface (Turk, Pentland, 91)
Use Principal Component Analysis (PCA) to determine the most discriminating features between images of faces.
Eigenfaces: linear projection
An n-pixel image $x \in R^n$ can be projected to a low-dimensional feature space $y \in R^m$ by $y = Wx$, where W is an m by n matrix.
Recognition is performed using nearest neighbor in $R^m$.
How do we choose a good W?
Eigenfaces: Principal Component Analysis (PCA)
Main point: the first principal component is the direction of largest variance. (I usually prove this on the whiteboard.)
Some details: use the singular value decomposition; the “trick” described in the text computes the basis when the number of training images is much smaller than the number of pixels (n << d in the text's notation).
How do you construct the Eigenspace?
Construct the data matrix by stacking vectorized images as columns, $[x_1\ x_2\ x_3\ x_4\ x_5]$, and then apply the Singular Value Decomposition (SVD) to obtain W.
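A sketch of that construction in NumPy, assuming five random 8x8 arrays as stand-in images and m = 3 retained components (both arbitrary choices). The mean image is subtracted before the SVD, as the later PCA slide describes.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((5, 8, 8))            # 5 hypothetical 8x8 images

# Stack each vectorized image as one column: X is (n_pixels, n_images).
X = images.reshape(len(images), -1).T
mean = X.mean(axis=1, keepdims=True)      # mean image
A = X - mean                              # centered data matrix

# Thin SVD: the columns of U are the basis images.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

m = 3                                     # number of components kept
W = U[:, :m].T                            # (m, n_pixels), so that y = W x
y = W @ (X[:, [0]] - mean)                # features of the first image

print(X.shape, W.shape, y.shape)
```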
Matrix Decompositions
Definition: the factorization of a matrix M into two or more matrices $M_1, M_2, \ldots, M_n$, such that $M = M_1 M_2 \cdots M_n$.
Many decompositions exist: QR decomposition, LU decomposition, LDU decomposition, etc.
Singular Value Decomposition
Excellent reference: “Matrix Computations,” Golub and Van Loan
Any m by n matrix A may be factored such that $A = U \Sigma V^T$, with dimensions [m x n] = [m x m][m x n][n x n].
U: m by m orthogonal matrix; its columns are the eigenvectors of $AA^T$.
V: n by n orthogonal matrix; its columns are the eigenvectors of $A^TA$.
$\Sigma$: m by n, diagonal with non-negative entries $\sigma_1, \sigma_2, \ldots, \sigma_s$, with s = min(m, n), called the singular values.
The nonzero singular values are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$.
Result of the SVD algorithm: $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_s$
SVD Properties
In Matlab, [u s v] = svd(A), and you can verify that A = u*s*v'.
r = rank(A) = number of non-zero singular values.
U and V give us orthonormal bases for the subspaces of A:
First r columns of U: column space of A
Last m - r columns of U: left nullspace of A
First r columns of V: row space of A
Last n - r columns of V: nullspace of A
For d <= r, the first d columns of U provide the best d-dimensional basis for the columns of A in the least-squares sense.
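The same checks can be run in NumPy instead of Matlab. The 4 x 3 matrix below is an arbitrary example chosen to have rank 2, and the 1e-10 cutoff for "non-zero" singular values is an assumption.

```python
import numpy as np

# Rank-2 example: row 2 = 2 * row 1, and row 1 = row 3 + 2 * row 4.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)               # full SVD: U is m x m, Vt is n x n
S = np.zeros((m, n))
S[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ S @ Vt                # should equal A

r = int(np.sum(s > 1e-10))                # numerical rank
col_space = U[:, :r]                      # basis for the column space
left_null = U[:, r:]                      # basis for the left nullspace
row_space = Vt[:r].T                      # basis for the row space
null = Vt[r:].T                           # basis for the nullspace

# Best d-dimensional approximation of the columns of A.
d = 1
A_d = U[:, :d] @ np.diag(s[:d]) @ Vt[:d]
err = np.linalg.norm(A - A_d)             # Frobenius norm of the residual

print(r, err)
```

The residual of the rank-d approximation equals the root sum of squares of the discarded singular values, which is what "best in the least-squares sense" means here.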
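Thin SVD
Any m by n matrix A may be factored such that $A = U \Sigma V^T$, with dimensions [m x n] = [m x m][m x n][n x n].
If m > n, then one can view $\Sigma$ as $\Sigma = \begin{bmatrix} \Sigma' \\ 0 \end{bmatrix}$, where $\Sigma' = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_s)$ with s = min(m, n), and the lower block is an (m-n) by n matrix of zeros.
Alternatively, you can write $A = U' \Sigma' V^T$, where U' holds the first n columns of U, with dimensions [m x n] = [m x n][n x n][n x n].
In Matlab, the thin SVD is [U S V] = svd(A, 'econ'); svds(A, k) instead returns only the k largest singular values and vectors.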
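In NumPy the full and thin factorizations differ only in the `full_matrices` flag of `np.linalg.svd`. The 6 x 3 random matrix is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))           # m = 6 > n = 3

# Full SVD: U is 6 x 6.
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)

# Thin SVD: U' is 6 x 3, Sigma' is 3 x 3.
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)

A_from_thin = U_thin @ np.diag(s_thin) @ Vt_thin
print(U_full.shape, U_thin.shape)
```

The thin form drops the last m - n columns of U, which only ever multiply the zero block of the singular value matrix.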
Application: Pseudoinverse
Given $y = Ax$, $x = A^+ y$.
For square, invertible A, $A^+ = A^{-1}$.
For any A, $A^+ = V \Sigma^{-1} U^T$, where only the non-zero singular values are inverted.
$A^+$ is called the pseudoinverse of A.
$x = A^+ y$ is the least-squares solution of $y = Ax$.
Alternative to previous solution.
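A sketch of the pseudoinverse built from the thin SVD, compared against NumPy's own `np.linalg.pinv` and `np.linalg.lstsq`. The helper name `pinv_svd`, the tolerance, and the line-fitting example are assumptions.

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Pseudoinverse V Sigma^+ U^T; singular values below tol are treated as zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_inv) @ U.T

# Overdetermined system: fit y = a + b*t at t = 0, 1, 2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])

x = pinv_svd(A) @ y                        # least-squares solution [a, b]
x_ref = np.linalg.lstsq(A, y, rcond=None)[0]

# For a square invertible matrix the pseudoinverse is the ordinary inverse.
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(x, x_ref)
```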
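Performing PCA with SVD
Singular values of A are the square roots of the eigenvalues of both $AA^T$ and $A^TA$, and the columns of U are the corresponding eigenvectors of $AA^T$.
The covariance matrix is $C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T = \frac{1}{n} A A^T$, where the columns of A are the mean-subtracted images $x_i - \mu$.
So, ignoring the factor 1/n: subtract the mean image from each input image, create the data matrix, and perform thin SVD on the data matrix.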
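The equivalence can be checked numerically. This sketch uses random data (6 hypothetical images of 10 pixels each) and compares the eigen-decomposition of the covariance matrix with the thin SVD of the centered data matrix; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 6))           # columns are vectorized images
n = X.shape[1]
A = X - X.mean(axis=1, keepdims=True)      # subtract the mean image

# Route 1: eigenvectors of the covariance matrix (10 x 10).
C = (A @ A.T) / n
evals, evecs = np.linalg.eigh(C)           # eigh returns ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

# Route 2: thin SVD of the data matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues are the squared singular values divided by n;
# the leading directions agree up to sign.
alignment = abs(U[:, 0] @ evecs[:, 0])
print(np.allclose(s ** 2 / n, evals[:len(s)]), alignment)
```

The SVD route never forms the pixels-by-pixels covariance matrix, which matters when images are large.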
[Figure 22.6 (story in the caption): the mean and the first principal component, the direction of maximum variance]
Eigenfaces
Modeling
Given a collection of n labeled training images:
Compute the mean image and covariance matrix.
Compute the k eigenvectors (note that these are images) of the covariance matrix corresponding to the k largest eigenvalues.
Project the training images to the k-dimensional Eigenspace.
Recognition
Given a test image, project it to the Eigenspace.
Perform classification against the projected training images.
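The modeling and recognition steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the eigenvectors come from a thin SVD rather than an explicit covariance matrix, classification is nearest neighbor, and the "faces" are synthetic 64-pixel vectors built from two random prototypes. The function names are hypothetical.

```python
import numpy as np

def train_eigenfaces(X, k):
    """X: (n_pixels, n_images), one vectorized training image per column."""
    mean = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = U[:, :k].T                         # rows are the k eigenfaces
    Y = W @ (X - mean)                     # projected training images
    return mean, W, Y

def recognize(x, mean, W, Y, labels):
    """Project a test image and return the label of the nearest training image."""
    y = W @ (x.reshape(-1, 1) - mean)
    dists = np.linalg.norm(Y - y, axis=0)
    return labels[int(np.argmin(dists))]

# Synthetic data: two "people", four noisy images each.
rng = np.random.default_rng(3)
protos = {"p0": rng.random(64), "p1": rng.random(64)}
labels, cols = [], []
for name, proto in protos.items():
    for _ in range(4):
        cols.append(proto + 0.01 * rng.standard_normal(64))
        labels.append(name)
X = np.stack(cols, axis=1)

mean, W, Y = train_eigenfaces(X, k=3)
test_image = protos["p1"] + 0.01 * rng.standard_normal(64)
print(recognize(test_image, mean, W, Y, labels))
```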
Eigenfaces: Training Images [Turk, Pentland 91]
Eigenfaces Mean Image Basis Images
Variable Lighting
Projection and reconstruction
An n-pixel image $x \in R^n$ can be projected to a low-dimensional feature space $y \in R^m$ by $y = Wx$.
From $y \in R^m$, the reconstruction of the point is $W^T y$.
The error of the reconstruction is $\|x - W^T W x\|$.
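A small numerical sketch of projection and reconstruction, assuming W has orthonormal rows (as it does when the rows are principal components). Here W is built from a QR factorization of a random matrix purely for illustration, and no mean is subtracted, matching the formulas as written.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 20, 5

# Hypothetical projection matrix with orthonormal rows (m x n).
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
W = Q.T

x = rng.standard_normal(n)
y = W @ x                                  # projection into R^m
x_hat = W.T @ y                            # reconstruction in R^n
error = np.linalg.norm(x - x_hat)          # ||x - W^T W x||

# A point that already lies in the subspace is reconstructed exactly.
x_in = W.T @ rng.standard_normal(m)
error_in = np.linalg.norm(x_in - W.T @ (W @ x_in))

print(error, error_in)
```

The error measures how far x lies from the subspace spanned by the rows of W.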
Reconstruction using Eigenfaces Given image on left, project to Eigenspace, then reconstruct an image (right).
Underlying assumptions
Background is not cluttered (or else only looking at the interior of the object).
Lighting in the test image is similar to that in the training image.
No occlusion.
Size of the training image (window) is the same as the window in the test image.
Face detection using “distance to face space”
Scan a window across the image, and classify the window as face/not face as follows:
Project the window to the subspace, and reconstruct it as described earlier.
Compute the distance between the window and its reconstruction.
Local minima of the distance over all image locations that are less than some threshold are taken as locations of faces.
Repeat at different scales.
Possibly normalize the window's intensity so that the window has unit norm.
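A single-scale sketch of this procedure. The "face space" is a hypothetical one-dimensional subspace spanned by a random 4x4 pattern, the mean is taken to be zero, and faces are reported by thresholding the distance map rather than by searching for local minima; all of these are simplifying assumptions.

```python
import numpy as np

def distance_to_face_space(window, mean, W):
    """Norm of the residual after projecting a window onto the face subspace."""
    x = window.ravel().astype(float)
    norm = np.linalg.norm(x)
    if norm > 0:
        x = x / norm                       # intensity normalization to unit norm
    d = x - mean
    return np.linalg.norm(d - W.T @ (W @ d))

def distance_map(image, win, mean, W):
    """Distance to face space for every win x win window position."""
    rows = image.shape[0] - win + 1
    cols = image.shape[1] - win + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = distance_to_face_space(image[r:r + win, c:c + win], mean, W)
    return out

rng = np.random.default_rng(5)
win = 4
pattern = rng.random((win, win))           # stand-in for a face
W = (pattern.ravel() / np.linalg.norm(pattern))[None, :]   # 1 x 16 basis
mean = np.zeros(win * win)

image = rng.random((10, 10))
image[3:7, 5:9] = 2.0 * pattern            # brighter copy of the pattern

dmap = distance_map(image, win, mean, W)
best = tuple(int(i) for i in np.unravel_index(np.argmin(dmap), dmap.shape))
hits = np.argwhere(dmap < 1e-6)            # threshold on the distance
print(best, hits.tolist())
```

The pasted pattern is twice as bright as the basis pattern, yet its distance is essentially zero because of the normalization step.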
Difficulties with PCA
Projection may suppress important detail: smallest-variance directions may not be unimportant.
Method does not take the discriminative task into account: typically, we wish to compute features that allow good discrimination, which is not the same as largest variance.
Fig 22.7 - Principal components will give a very poor representation of this data set.
Fig 22.10 - Two classes indicated by * and o; the first principal component captures all the variance, but completely destroys any ability to discriminate. The second is close to what’s required.
Illumination Variability
Same person under variable lighting
“The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity.” -- Moses, Adini, Ullman, ECCV ‘94
Fisherfaces: Class specific linear projection
P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, PAMI, July 1997, pp. 711--720.
An n-pixel image $x \in R^n$ can be projected to a low-dimensional feature space $y \in R^m$ by $y = Wx$, where W is an m by n matrix.
Recognition is performed using nearest neighbor in $R^m$.
How do we choose a good W?
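PCA & Fisher’s Linear Discriminant
Between-class scatter: $S_B = \sum_{i=1}^{c} |\chi_i| (\mu_i - \mu)(\mu_i - \mu)^T$
Within-class scatter: $S_W = \sum_{i=1}^{c} \sum_{x_k \in \chi_i} (x_k - \mu_i)(x_k - \mu_i)^T$
Total scatter: $S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T = S_B + S_W$
Where c is the number of classes, $\mu_i$ is the mean of class $\chi_i$, $|\chi_i|$ is the number of samples of $\chi_i$, $\mu$ is the mean of all N samples.
Note: use this slide to explain the three types of scatter.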
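The three scatter matrices can be computed directly from their standard definitions (between-class: class sizes times outer products of class-mean offsets; within-class: outer products of offsets from each class mean; total: offsets from the overall mean). The function name and the random 3-class data set are assumptions for illustration.

```python
import numpy as np

def scatter_matrices(X, labels):
    """X: (n_features, N) samples as columns; labels: length-N class labels."""
    labels = np.asarray(labels)
    mu = X.mean(axis=1, keepdims=True)     # overall mean
    n = X.shape[0]
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        S_B += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
        S_W += (Xc - mu_c) @ (Xc - mu_c).T
    S_T = (X - mu) @ (X - mu).T
    return S_B, S_W, S_T

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 12))           # 12 hypothetical 4-D samples
labels = [0] * 4 + [1] * 4 + [2] * 4       # c = 3 classes

S_B, S_W, S_T = scatter_matrices(X, labels)
print(np.allclose(S_T, S_B + S_W), np.linalg.matrix_rank(S_B))
```

The check confirms that total scatter splits into between-class plus within-class scatter, and that the between-class scatter has rank at most c - 1.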
PCA & Fisher’s Linear Discriminant
PCA (Eigenfaces): maximizes the projected total scatter.
Fisher’s Linear Discriminant (FLD): maximizes the ratio of projected between-class scatter to projected within-class scatter.
The point to emphasize is that PCA maximizes the projected total scatter, i.e., it preserves the information in the training set and is optimal in a least-squares sense for reconstruction. But in dropping dimensions, it may smear classes together.
FLD trades off two desirable effects for recognition:
A. The within-class scatter over all classes is minimized; this makes classification using nearest neighbor effective.
B. The between-class scatter is maximized; this causes classes to be far apart in the feature space.
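Computing the Fisher Projection Matrix
The $w_i$ are the generalized eigenvectors of the between-class and within-class scatter matrices: $S_B w_i = \lambda_i S_W w_i$. They can be chosen orthonormal with respect to $S_W$ when $S_W$ is nonsingular, but are not orthonormal in the usual sense in general.
There are at most c-1 non-zero generalized eigenvalues, so m <= c-1.
Can be computed with eig in Matlab.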
Fisherfaces
Since the within-class scatter $S_W$ has rank at most N-c (N is the number of training images), project the training set to the subspace spanned by the first N-c principal components of the training set.
Apply FLD to the (N-c)-dimensional subspace, yielding a (c-1)-dimensional feature space.
Fisher’s Linear Discriminant projects away the within-class variation (lighting, expressions) found in the training set.
Fisher’s Linear Discriminant preserves the separability of the classes.
Here’s how W is actually calculated: W is the product of the FLD projection and the PCA projection. The rows of W have the dimensions of images and, like the “eigenfaces”, can be viewed as “fisherfaces.”
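The two-stage computation (PCA to N-c dimensions, then FLD down to c-1) can be sketched as below. This is an illustration under assumptions, not the authors' code: the generalized eigenproblem is solved by applying `np.linalg.eig` to the product of the inverse within-class scatter and the between-class scatter, and the data are synthetic 30-pixel "images" of three classes.

```python
import numpy as np

def fisherfaces(X, labels):
    """X: (n_pixels, N) training images as columns.

    Returns (mean, W) such that y = W (x - mean); rows of W are fisherfaces.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    N, c = X.shape[1], len(classes)
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean

    # Step 1: PCA down to N - c dimensions so that S_W becomes nonsingular.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    W_pca = U[:, :N - c].T                 # (N-c, n_pixels)
    Z = W_pca @ A                          # reduced data; its overall mean is zero

    # Step 2: scatter matrices in the reduced space.
    S_B = np.zeros((N - c, N - c))
    S_W = np.zeros((N - c, N - c))
    for k in classes:
        Zk = Z[:, labels == k]
        mk = Zk.mean(axis=1, keepdims=True)
        S_B += Zk.shape[1] * mk @ mk.T
        S_W += (Zk - mk) @ (Zk - mk).T

    # Step 3: top c-1 generalized eigenvectors of S_B w = lambda S_W w.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-evals.real)[:c - 1]
    W_fld = evecs[:, order].real.T         # (c-1, N-c)

    return mean, W_fld @ W_pca             # (c-1, n_pixels)

# Synthetic data: 3 classes, 5 noisy images each, 30 pixels per image.
rng = np.random.default_rng(7)
protos = [rng.random(30) for _ in range(3)]
cols, labels = [], []
for k, proto in enumerate(protos):
    for _ in range(5):
        cols.append(proto + 0.05 * rng.standard_normal(30))
        labels.append(k)
X = np.stack(cols, axis=1)

mean, W = fisherfaces(X, labels)
Y = W @ (X - mean)                         # training images in Fisher space
print(W.shape, Y.shape)
```

With N = 15 images and c = 3 classes, PCA keeps 12 dimensions and the final feature space has 2.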
PCA vs. FLD
Experimental Results - 1
Variation in Facial Expression, Eyewear, and Lighting
Input: 160 images of 16 people
Train: 159 images
Test: 1 image
Conditions: with glasses, without glasses, 3 lighting conditions, 5 expressions
Performance Evaluation
Leave-one-out evaluation of PCA and LDA on the Yale Face Database [Belhumeur, Hespanha, Kriegman 97]

Approach           Dim. of the subspace   Error rate (close crop)   Error rate (full face)
Eigenface (PCA)    30                     24.4%                     19.4%
Fisherface (LDA)   15                     7.3%                      0.6%
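Leave-one-out evaluation holds out each image in turn, trains on the rest, and reports the fraction of held-out images that are misclassified. A generic sketch follows, with a nearest-neighbor classifier and toy data as assumptions; here samples are rows, and any classifier with the same call signature could be substituted.

```python
import numpy as np

def leave_one_out_error(X, labels, classify):
    """X: (N, n_features), one sample per row.

    classify(train_X, train_labels, x) must return a predicted label.
    """
    N = X.shape[0]
    errors = 0
    for i in range(N):
        keep = np.arange(N) != i           # everything except sample i
        train_labels = [labels[j] for j in range(N) if j != i]
        pred = classify(X[keep], train_labels, X[i])
        errors += int(pred != labels[i])
    return errors / N

def nn_classify(train_X, train_labels, x):
    """Nearest neighbor with Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_labels[int(np.argmin(dists))]

# Two well-separated toy clusters: no held-out sample is misclassified.
X1 = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
               [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels1 = ["a"] * 4 + ["b"] * 4
err1 = leave_one_out_error(X1, labels1, nn_classify)

# A class with a single sample is always missed when it is held out.
X2 = np.array([[0.0], [1.0], [10.0]])
labels2 = ["a", "a", "b"]
err2 = leave_one_out_error(X2, labels2, nn_classify)

print(err1, err2)
```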
Experimental Results - 2
Harvard Face Database
10 individuals
66 images per person
Train on 6 images at 15°
Test on remaining images (60°)
Recognition Results: Lighting Extrapolation
Training on the near-frontal subset (within 15 degrees of frontal), and testing on more extreme lighting.
As expected, Correlation performs slightly better than Eigenfaces.
Removing the first three principal components works better since they capture the lighting variation, but they also contain useful discriminatory information, which is lost.
Fisherface performs better than the other methods.