Using Support Vector Machines to Enhance the Performance of Bayesian Face Recognition
IEEE Transactions on Information Forensics and Security
Zhifeng Li, Xiaoou Tang
Dept. of Information Engineering, Chinese University of Hong Kong
Outline
  Introduction
  Bayesian SVM
    SVM
    Bayesian Analysis
    Bayesian SVM
  Two-Stage Clustering-Based Classification
    Hierarchical Agglomerative Clustering (HAC)
    Two-Stage SVM
  Adaptive Clustering Bayesian SVM
  Adaptive Clustering Multilevel Subspace SVM Algorithm
  Experiments
  Conclusion
Introduction
  Face recognition has been one of the most challenging computer vision research topics
  Existing face recognition techniques:
    Eigenface
    Fisherface
    Bayesian algorithm
  Support vector machines (SVMs) improve the classification performance of the PCA and LDA subspace features
    Find one hyperplane to separate the two classes of vectors
Introduction
  SVM vs. face recognition
    Binary vs. multiclass
    The multiclass classification has to be reduced to a combination of SVMs
  Several strategies to solve the problem
    One-versus-all
    Pairwise
  A large number of SVMs have to be trained
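To make the training cost concrete, the sketch below counts the binary SVMs each strategy needs for a gallery of L classes: one-versus-all trains one SVM per class, and pairwise trains one per unordered pair of classes. The function names and the gallery sizes are illustrative and do not come from the slides.

```python
def one_versus_all_count(num_classes: int) -> int:
    """One binary SVM per class (that class against all the others)."""
    return num_classes


def pairwise_count(num_classes: int) -> int:
    """One binary SVM per unordered pair of classes: L * (L - 1) / 2."""
    return num_classes * (num_classes - 1) // 2


if __name__ == "__main__":
    for num_classes in (10, 100, 1000):  # hypothetical gallery sizes
        print(num_classes,
              one_versus_all_count(num_classes),
              pairwise_count(num_classes))
```

The pairwise count grows quadratically with the number of people in the gallery, which is why it becomes impractical for large galleries.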
Introduction
  Bayesian method
    Converts the multiclass face recognition problem into a two-class classification problem
    Suitable for using the SVM directly
    A single hyperplane may not be enough
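The two-class conversion can be sketched as follows: every pair of images yields one difference vector, labelled intrapersonal when both images show the same person and extrapersonal otherwise. The function name and the toy feature vectors are assumptions for illustration; real inputs would be face feature vectors.

```python
from itertools import combinations


def difference_sets(samples):
    """samples: list of (person_id, feature_vector) pairs.

    Returns (intrapersonal, extrapersonal): two lists of difference vectors,
    one vector per unordered pair of samples.
    """
    intra, extra = [], []
    for (id_a, vec_a), (id_b, vec_b) in combinations(samples, 2):
        delta = [a - b for a, b in zip(vec_a, vec_b)]
        # Same person -> intrapersonal variation, otherwise extrapersonal.
        (intra if id_a == id_b else extra).append(delta)
    return intra, extra


if __name__ == "__main__":
    toy = [("p1", [1.0, 2.0]), ("p1", [1.5, 2.5]), ("p2", [4.0, 0.0])]
    intra, extra = difference_sets(toy)
    print(len(intra), len(extra))
```

However many people are in the gallery, the classifier only ever sees these two classes of difference vectors.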
SVM
  [Figure: axes Var 1 and Var 2; label: Margin Width]
  Idea 1: Select the separating hyperplane that maximizes the margin!
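SVM
  [Figure: axes Var 1 and Var 2]
  The width of the margin is:
    $$\frac{2k}{\lVert w \rVert}$$
  So, the problem is:
    $$\max_{w,b} \; \frac{2k}{\lVert w \rVert} \quad \text{s.t.} \quad w \cdot x + b \ge k \;\; \forall x \text{ of class 1}, \qquad w \cdot x + b \le -k \;\; \forall x \text{ of class 2}$$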
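SVM
  [Figure: axes Var 1 and Var 2]
  There is a scale and unit for the data such that k = 1. Then the problem becomes:
    $$\max_{w,b} \; \frac{2}{\lVert w \rVert} \quad \text{s.t.} \quad w \cdot x + b \ge 1 \;\; \forall x \text{ of class 1}, \qquad w \cdot x + b \le -1 \;\; \forall x \text{ of class 2}$$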
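SVM
  If class 1 corresponds to y_i = 1 and class 2 corresponds to y_i = -1, we can rewrite the constraints as
    $$y_i \, (w \cdot x_i + b) \ge 1 \quad \forall x_i$$
  So the problem becomes:
    $$\max_{w,b} \; \frac{2}{\lVert w \rVert} \quad \text{s.t.} \quad y_i \, (w \cdot x_i + b) \ge 1 \;\; \forall x_i$$
  or
    $$\min_{w,b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i \, (w \cdot x_i + b) \ge 1 \;\; \forall x_i$$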
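As a small numeric check of the canonical formulation (k = 1), the sketch below computes the margin width 2 / ||w|| for a given hyperplane and tests whether every labelled point satisfies y_i (w · x_i + b) >= 1. It does not train an SVM; the function names and the toy points are assumptions for illustration.

```python
import math


def margin_width(w):
    """Width of the margin, 2 / ||w||, for the canonical hyperplane (k = 1)."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_constraints(w, b, points, labels):
    """True if every point has y_i * (w . x_i + b) >= 1."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1.0
        for x, y in zip(points, labels)
    )


if __name__ == "__main__":
    points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]  # toy data
    labels = [1, 1, -1, -1]
    w, b = (0.5, 0.5), -1.0
    print(satisfies_constraints(w, b, points, labels), margin_width(w))
```

Shrinking w widens the margin, but only until some point violates its constraint; the optimization problem picks the smallest ||w|| that keeps every constraint satisfied.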
Bayesian Face Recognition
  Intrapersonal
  Extrapersonal
  Equate ML similarity between any two images
  PCA-based density estimation
PCA-based Density Estimation
  Perform PCA and factorize into (orthogonal) Gaussian subspaces:
  Solve for the minimal KL divergence residual for the orthogonal subspace:
Bayesian SVM
  Intrapersonal variation set
  Extrapersonal variation set
  Project and whiten all the image difference vectors in the intrapersonal subspace, and use these two sets of vectors to train the SVM to generate the decision function
  For testing, compute the face difference vector, and then project and whiten it:
  Classification decision is made by:
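A minimal sketch of the test-time path, under stated assumptions: the intrapersonal eigenvectors and eigenvalues are taken as already computed, whitening is done by projecting onto each eigenvector and dividing by the square root of its eigenvalue, and `decision_function` is a hypothetical stand-in for the trained SVM. The sign convention (non-negative score means same person) is also an assumption, not something the slides specify.

```python
import math


def whiten(delta, eigenvectors, eigenvalues):
    """Project delta onto each eigenvector and scale by 1 / sqrt(eigenvalue)."""
    return [
        sum(v_i * d_i for v_i, d_i in zip(vec, delta)) / math.sqrt(lam)
        for vec, lam in zip(eigenvectors, eigenvalues)
    ]


def classify_pair(probe, gallery, eigenvectors, eigenvalues, decision_function):
    """Label a probe/gallery pair from its whitened difference vector."""
    delta = [p - g for p, g in zip(probe, gallery)]
    score = decision_function(whiten(delta, eigenvectors, eigenvalues))
    # Assumed convention: score >= 0 means intrapersonal (same person).
    return "intrapersonal" if score >= 0 else "extrapersonal"


if __name__ == "__main__":
    eigenvectors = [[1.0, 0.0], [0.0, 1.0]]  # toy intrapersonal subspace
    eigenvalues = [4.0, 1.0]

    def toy_decision(y):
        # Hypothetical stand-in for a trained SVM decision function.
        return 2.0 - sum(v * v for v in y)

    print(classify_pair([1.0, 0.5], [0.0, 0.0],
                        eigenvectors, eigenvalues, toy_decision))
```

Whitening gives every retained intrapersonal direction unit variance, so no single direction dominates the difference vector that the SVM sees.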
Two-Stage Clustering-Based Classification
  Problems with the above methods
    One-versus-all approach: too many SVMs
    Direct Bayesian SVM: too many samples for one SVM
  Try to find a solution that balances the two extremes
  When training an SVM
    The most important region is around the decision hyperplane
    Partition the gallery data into clusters
  Methods
    Use Bayesian SVM to estimate the similarity matrix
    Use HAC to group the similar face clusters
Hierarchical Agglomerative Clustering
  Basic process of the HAC:
    1) Initialize a set of clusters.
    2) Find the pair of clusters with the largest similarity measure and merge them into a new cluster. Estimate the similarity measure between the new cluster and all the other clusters.
    3) Repeat step 2 until the stopping rule is satisfied.
  Different strategies used in each step lead to different designs of the HAC algorithm.
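The three steps can be sketched in plain Python as below. Two design choices here are assumptions rather than anything the slides fix: the similarity between clusters is taken as the mean pairwise similarity (average linkage), and the stopping rule is a target number of clusters. The function name and the toy matrix are illustrative.

```python
def hac(similarity, num_clusters):
    """Agglomerative clustering on a symmetric similarity matrix.

    similarity: list of lists, larger values mean more alike.
    num_clusters: assumed stopping rule, merge until this many remain.
    """
    # Step 1: start with one cluster per item.
    clusters = [[i] for i in range(len(similarity))]

    def cluster_similarity(a, b):
        # Assumed linkage: mean similarity over all cross-cluster pairs.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    target = max(num_clusters, 1)
    while len(clusters) > target:
        # Step 2: find the most similar pair of clusters and merge it.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = cluster_similarity(clusters[p], clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
        # Step 3: loop until the stopping rule is met.
    return clusters


if __name__ == "__main__":
    toy = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(hac(toy, 2))
```

Swapping `cluster_similarity` for a maximum or minimum over cross-cluster pairs gives single or complete linkage, which is one instance of the different designs mentioned above.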
Two-Stage SVM
  Similarity measure between the two images:
  where
  Perform one-versus-all Bayesian SVM within each cluster
  During testing
    Compute the whitened face difference vector
    Find the class that gives the smallest
    Perform one-versus-all SVM on the cluster
Adaptive Clustering Bayesian SVM
  Method
    1) Use the Bayesian algorithm to find a cluster that is most similar to the test face
    2) Use the one-versus-all algorithm to reclassify the face in this cluster
  Train the one-versus-all Bayesian SVM in the training stage, and then use it to reclassify only the faces in the new cluster
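A rough sketch of the two steps, with loudly stated assumptions: the cluster is formed as the `cluster_size` gallery classes most similar to the test face, `similarity` stands in for the first-stage Bayesian measure, and `ova_score` stands in for the pre-trained one-versus-all SVM of each class. All three names and the toy values are hypothetical.

```python
def adaptive_cluster_classify(probe, gallery, similarity, cluster_size, ova_score):
    """Two-stage classification of a probe face.

    gallery: dict mapping class_id -> representative feature vector.
    similarity(probe, vec): first-stage score, larger means more alike.
    ova_score(class_id, probe): score of that class's one-versus-all SVM.
    Returns (predicted_class, cluster).
    """
    # Stage 1: build the cluster from the classes most similar to the probe.
    ranked = sorted(gallery,
                    key=lambda cid: similarity(probe, gallery[cid]),
                    reverse=True)
    cluster = ranked[:cluster_size]
    # Stage 2: reclassify using only the one-versus-all SVMs in the cluster.
    best = max(cluster, key=lambda cid: ova_score(cid, probe))
    return best, cluster


if __name__ == "__main__":
    gallery = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}

    def neg_sq_dist(x, y):
        # Toy first-stage similarity: negative squared Euclidean distance.
        return -sum((xi - yi) ** 2 for xi, yi in zip(x, y))

    toy_scores = {"a": 0.2, "b": 0.7, "c": 0.9}  # hypothetical SVM outputs
    print(adaptive_cluster_classify([0.6, 0.6], gallery, neg_sq_dist, 2,
                                    lambda cid, probe: toy_scores[cid]))
```

In the toy run, class "c" has the highest second-stage score but is never evaluated because the first stage excludes it from the cluster, which is the point of restricting the reclassification.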
Adaptive Clustering Multilevel Subspace SVM
  Detailed algorithm in the first stage:
    1) Divide the original face vector into K feature slices. Project each feature slice onto its PCA subspace, computed from the training set of the slice, and adjust the PCA dimension to remove most of the noise.
    2) Compute the intrapersonal subspace using the within-class scatter matrix in the reduced PCA subspace, and adjust the dimension of the intrapersonal subspace to reduce the intrapersonal variation.
    3) For the L individuals in the gallery, compute their training data class centers. Project all of the class centers onto the intrapersonal subspace, and then normalize the projections by the intrapersonal eigenvalues to compute the whitened feature vectors.
    4) Apply PCA to the whitened feature vector centers to compute the final discriminant feature vector.
    5) Combine the extracted discriminant feature vectors from each slice into a new feature vector.
    6) Apply PCA to the new feature vector to remove redundant information across the slices. The features with large eigenvalues are selected to form the final feature vector for recognition.
  The second stage is similar to that of the adaptive clustering Bayesian SVM.
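Only the bookkeeping of steps 1 and 5 is sketched here: cutting a face vector into K slices and concatenating per-slice features back into one vector. The per-slice PCA, intrapersonal whitening, and final PCA are omitted. Contiguous, near-equal slices are an assumption, since the slides do not say how the slices are chosen, and both function names are illustrative.

```python
def split_into_slices(vector, num_slices):
    """Split a vector into num_slices contiguous, near-equal slices (assumed)."""
    base, extra = divmod(len(vector), num_slices)
    slices, start = [], 0
    for k in range(num_slices):
        # The first `extra` slices take one additional element each.
        end = start + base + (1 if k < extra else 0)
        slices.append(vector[start:end])
        start = end
    return slices


def combine_slice_features(slice_features):
    """Concatenate the per-slice feature vectors into one new feature vector."""
    return [value for features in slice_features for value in features]


if __name__ == "__main__":
    face_vector = list(range(7))  # toy stand-in for a face vector
    slices = split_into_slices(face_vector, 3)
    print(slices, combine_slice_features(slices))
```

In the full algorithm each slice would be replaced by its extracted discriminant features before the concatenation in step 5.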
Experiments
Conclusion
  The direct Bayesian-based SVM is too simple: it tries to separate two complex subspaces with just one hyperplane
  To improve the recognition performance, the one-versus-all, HAC-based, and adaptive clustering Bayesian-based SVMs are further developed
  The experimental results clearly demonstrate the superiority of the new algorithm over traditional subspace methods
Eigenfaces
  Projects all the training faces onto a universal eigenspace to “encode” variations (“modes”) via principal component analysis (PCA)
  Uses inverse distance as a similarity measure for matching and recognition
Fisherfaces
  Eigenfaces attempt to maximize the scatter of the training images in face space
  Fisherfaces attempt to maximize the between-class scatter while minimizing the within-class scatter
  In other words, they move images of the same face closer together while moving images of different faces further apart
  Fisher Linear Discriminant
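The criterion can be illustrated in one dimension: for samples already projected onto a candidate direction, the ratio of between-class scatter to within-class scatter is what the Fisher Linear Discriminant tries to make large. The function name and the toy projections below are assumptions for illustration, and the sketch does not search for the best direction.

```python
def fisher_ratio(projected_by_class):
    """Between-class scatter divided by within-class scatter in 1-D.

    projected_by_class: dict mapping class_id -> list of projected samples.
    Assumes at least one class has non-zero spread (within-class scatter > 0).
    """
    all_values = [v for vals in projected_by_class.values() for v in vals]
    overall_mean = sum(all_values) / len(all_values)
    between = within = 0.0
    for vals in projected_by_class.values():
        class_mean = sum(vals) / len(vals)
        # Between-class: how far each class mean sits from the overall mean.
        between += len(vals) * (class_mean - overall_mean) ** 2
        # Within-class: how spread out the samples are around their own mean.
        within += sum((v - class_mean) ** 2 for v in vals)
    return between / within


if __name__ == "__main__":
    well_separated = {"a": [0.0, 2.0], "b": [4.0, 6.0]}
    overlapping = {"a": [0.0, 4.0], "b": [2.0, 6.0]}
    print(fisher_ratio(well_separated), fisher_ratio(overlapping))
```

A projection that keeps each person's images tight and different people far apart scores higher, which matches the description above.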