Face Recognition: An Introduction
Face
Face Recognition Face is the most common biometric used by humans Applications range from static, mug-shot verification to a dynamic, uncontrolled face identification in a cluttered background Challenges: automatically locate the face recognize the face from a general view point under different illumination conditions, facial expressions, and aging effects O O Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches.
Authentication vs Identification Face Authentication/Verification (1:1 matching) Face Identification/Recognition (1:N matching)
Applications Access Control www.viisage.com www.visionics.com
Applications Face Scan at Airports Video Surveillance (On-line or off-line) Face Scan at Airports www.facesnap.de
Why is Face Recognition Hard? Many faces of Madonna
Face Recognition Difficulties Identify similar faces (inter-class similarity) Accommodate intra-class variability due to: head pose illumination conditions expressions facial accessories aging effects O O Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches. Cartoon faces
news.bbc.co.uk/hi/english/in_depth/americas/2000/us_elections Inter-class Similarity Different persons may have very similar appearance www.marykateandashley.com news.bbc.co.uk/hi/english/in_depth/americas/2000/us_elections O O Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches. Twins Father and son
Intra-class Variability Faces with intra-subject variations in pose, illumination, expression, accessories, color, occlusions, and brightness O O Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches.
Sketch of a Pattern Recognition Architecture Feature Extraction Classification Image (window) Object Identity Feature Vector
Example: Face Detection Scan window over image Classify window as either: Face Non-face Classifier Window Face Non-face
Detection Test Sets
Profile views Schneiderman’s Test set
Face Detection: Experimental Results Test sets: two CMU benchmark data sets Test set 1: 125 images with 483 faces Test set 2: 20 images with 136 faces [See also work by Viola & Jones, Rehg, more recent by Schneiderman]
Example: Finding skin Non-parametric Representation of CCD Skin has a very small range of (intensity independent) colors, and little texture Compute an intensity-independent color measure, check if color is in this range, check if there is little texture (median filter) See this as a classifier - we can set up the tests by hand, or learn them. get class conditional densities (histograms), priors from data (counting) Classifier is
Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE
Face Detection
Face Detection Algorithm Lighting Compensation Color Space Transformation Skin Color Detection Input Image Variance-based Segmentation Connected Component & Grouping Face Localization Eye/ Mouth Detection O O Face recognition can be categorized into appearance-based, geometry-based, and hybrid approaches. Face Boundary Detection Verifying/ Weighting Eyes-Mouth Triangles Facial Feature Detection Output Image
Canon Powershot
Face Recognition: 2-D and 3-D Time (video) 2-D Face Database 2-D Recognition Data Recognition Comparison 3-D 3-D Play this, it’s animated – Green denotes topics covered subsequent slides Prior knowledge of face class
Taxonomy of Face Recognition Algorithms Pose-dependency Pose-dependent Pose-invariant Viewer-centered Images Object-centered Models Face representation Matching features -- Gordon et al., 1995 Appearance-based (Holistic) Hybrid Feature-based (Analytic) -- Lengagne et al., 1996 -- Atick et al., 1996 -- Yan et al., 1996 PCA, LDA LFA -- Zhao et al., 2000 EGBM -- Zhang et al., 2000
Image as a Feature Vector x 1 2 3 Consider an n-pixel image to be a point in an n-dimensional space, x Rn. Each pixel value is a coordinate of x.
Nearest Neighbor Classifier { Rj } are set of training images. x 1 2 3 R1 R2 I
Comments Sometimes called “Template Matching” Variations on distance function (e.g. L1, robust distances) Multiple templates per class- perhaps many training images per class. Expensive to compute k distances, especially when each image is big (N dimensional). May not generalize well to unseen examples of class. Some solutions: Bayesian classification Dimensionality reduction
Eigenfaces (Turk, Pentland, 91) -1 Use Principle Component Analysis (PCA) to reduce the dimsionality
How do you construct Eigenspace? [ ] [ ] [ x1 x2 x3 x4 x5 ] W Construct data matrix by stacking vectorized images and then apply Singular Value Decomposition (SVD)
Eigenfaces Modeling Recognition Given a collection of n labeled training images, Compute mean image and covariance matrix. Compute k Eigenvectors (note that these are images) of covariance matrix corresponding to k largest Eigenvalues. Project the training images to the k-dimensional Eigenspace. Recognition Given a test image, project to Eigenspace. Perform classification to the projected training images.
Eigenfaces: Training Images [ Turk, Pentland 01
Eigenfaces Mean Image Basis Images
Difficulties with PCA Projection may suppress important detail smallest variance directions may not be unimportant Method does not take discriminative task into account typically, we wish to compute features that allow good discrimination not the same as largest variance
22. 10 - Two classes indicated by 22.10 - Two classes indicated by * and o; the first principal component captures all the variance, but completely destroys any ability to discriminate. The second is close to what’s required.
Fisherfaces: Class specific linear projection An n-pixel image xRn can be projected to a low-dimensional feature space yRm by y = Wx where W is an n by m matrix. Recognition is performed using nearest neighbor in Rm. How do we choose a good W?
PCA & Fisher’s Linear Discriminant Between-class scatter Within-class scatter Total scatter Where c is the number of classes i is the mean of class i | i | is number of samples of i.. 1 2 1 2 Use this slide to explain the three types of scatter
PCA & Fisher’s Linear Discriminant PCA (Eigenfaces) Maximizes projected total scatter Fisher’s Linear Discriminant Maximizes ratio of projected between-class to projected within-class scatter 2 1 Point to emphasize is that PCA maximizes the projected total scatter – I.e., it presserves the information in the training set, and is optimal in a least squares sense for reconstruction. But in dropping dimensions, it may smear classes together. FLD trades off two desirable effects for recognition. A. The within class scatter over all classes is minimized – this makes classification using nearest neighbor effective. B. The between class scatter is maximized – this causes classes to be far apart in the feature space. FLD
Four Fisherfaces From ORL Database
Eigenfaces and Fisherfaces