Deep Face Recognition Omkar M. Parkhi Andrea Vedaldi Andrew Zisserman Visual Geometry Group Department of Engineering Science University of Oxford Presented by Shih-Chuan Weng
Introduction In the present day, CNNs have taken the computer vision community by storm, significantly improving the state of the art in many applications. One of the most important ingredients for the success of such methods is the availability of large quantities of training data. However, large-scale public datasets have been lacking and, largely due to this factor, most of the recent advances in the community remain restricted to Internet giants such as Facebook and Google.
This figure compares the dataset sizes of these two large companies with other publicly available datasets.
Contribution This paper makes two contributions. First, it designs a procedure that can assemble a large-scale dataset with small label noise. Second, it shows that a deep CNN, without any embellishments but with appropriate training, can achieve results comparable to the state of the art.
Dataset Collection
Stage 1. Bootstrapping and filtering a list of candidate identity names. Obtain a list of candidate identity names from IMDB (5,000 names, half male and half female, with 200 images per person downloaded via Google Image Search). The candidate list is then filtered to remove identities for which there are not enough distinct images, and to eliminate any overlap with standard benchmark datasets. After filtering, 2,622 names remain.
Stage 2. Collecting more images for each identity. Each name is queried in both Google and Bing Image Search, yielding 2,000 images per person.
Stage 3. Improving purity with an automatic filter. This stage focuses on automatically removing erroneous faces from each identity's image set using a classifier. Reference from https://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf
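The classifier-based filtering can be sketched as follows: score every downloaded image of an identity with a linear classifier trained for that identity, then keep only the highest-scoring images, since erroneous faces tend to score low. This is a minimal numpy sketch; the feature vectors, weight vector, and top-k cutoff below are toy values for illustration, not the ones used in the paper.

```python
import numpy as np

def filter_by_classifier_score(features, w, b, top_k):
    """Rank one identity's images by linear classifier score
    (score = w . feature + b) and keep the top_k highest;
    erroneous faces tend to receive low scores."""
    scores = features @ w + b      # one score per image
    order = np.argsort(-scores)    # indices sorted by descending score
    return order[:top_k]

# toy example: 5 "images" described by 3-dim features
feats = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],   # off-identity image
                  [0.8, 0.2, 0.0],
                  [0.1, 0.9, 0.0]])  # off-identity image
w = np.array([1.0, -1.0, 0.0])       # hypothetical identity classifier
kept = filter_by_classifier_score(feats, w, 0.0, top_k=3)
```

In this toy run the two off-identity images score lowest and are dropped, which is the behaviour the filtering stage relies on.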
Stage 4. Near duplicate removal. Exact duplicates and near duplicates, such as images differing only in color balance or with text superimposed, are removed. Reference from https://hal.inria.fr/inria-00633013/document/
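One simple way to realize near-duplicate removal is to compare image descriptors and keep only one representative whenever two descriptors are very close. The greedy sketch below is an illustrative simplification (the paper uses clustering of VLAD descriptors); the descriptors and threshold are toy values.

```python
import numpy as np

def remove_near_duplicates(descs, thresh):
    """Greedy dedup: keep an image only if its descriptor is
    farther than `thresh` from every descriptor already kept."""
    kept = []
    for i, d in enumerate(descs):
        if all(np.linalg.norm(d - descs[j]) > thresh for j in kept):
            kept.append(i)
    return kept

descs = np.array([[0.0,  0.0],
                  [0.05, 0.0],    # near-duplicate of image 0
                  [1.0,  1.0],
                  [1.0,  1.02]])  # near-duplicate of image 2
unique = remove_near_duplicates(descs, thresh=0.1)
```

Each near-duplicate pair collapses to a single kept image, mirroring how this stage shrinks the dataset without losing distinct photos.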
Stage 5. Final manual filtering. The goal is to increase the purity (precision) of the data using human annotations. Although described as manual, the process is aided by a network: an AlexNet-based model scores the images, and the top 375 highest-scoring ones are used. This picture shows the AlexNet architecture, reference from https://www.nvidia.cn/content/tesla/pdf/machine-learning/imagenet-classification-with-deep-convolutional-nn.pdf
Dataset statistics after each stage of processing. The types A and M specify whether a processing stage was carried out automatically or manually.
Learning a face classifier
Architecture and training Training is posed as an N = 2622-way classification problem. The architecture is VGGNet, which ends with a classifier layer whose parameters are (W, b). The classification error is computed with the softmax log-loss. Reference from https://arxiv.org/pdf/1409.1556.pdf While this architecture can already be used for face verification by comparing images with the Euclidean distance between their descriptors, the paper uses a triplet loss to further improve the scores.
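The softmax log-loss on the final classifier layer can be written out concretely. Below is a minimal numpy sketch with a toy 4-way problem standing in for the paper's N = 2622 identities; the descriptor, weights, and target class are made-up values for illustration.

```python
import numpy as np

def softmax_log_loss(x, target):
    """Softmax log-loss for one score vector x = W @ phi + b:
    the negative log-probability of the true class."""
    x = x - x.max()                            # subtract max for numerical stability
    log_probs = x - np.log(np.exp(x).sum())    # log-softmax
    return -log_probs[target]

# toy setup: 2-dim face descriptor phi, 4 identities instead of 2622
phi = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
b = np.zeros(4)
scores = W @ phi + b            # the classifier layer (W, b)
loss = softmax_log_loss(scores, target=2)
```

Minimizing this loss over the training set is what drives the network to separate the 2,622 identities.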
Triplet loss The triplet loss is a loss function computed on a set of examples {anchor, positive, negative}: three images from which the loss value is derived. Its purpose is to pull images of the same identity closer together and push images of distinct identities farther apart. In this paper it is used to refine the score vectors (x_t = W φ(l_t) + b, similar to a linear regression y = w·x + b) so that they perform well in the final application, achieving higher accuracy than the previous architecture alone.
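The standard hinge form of the triplet loss can be sketched in a few lines: it is zero when the positive is already closer to the anchor than the negative by at least the margin, and positive otherwise. The embeddings and margin below are toy values, not the paper's.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge triplet loss max(0, ||a-p||^2 - ||a-n||^2 + alpha):
    pulls same-identity pairs together and pushes different-identity
    pairs apart by at least the margin alpha."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to negative
    return max(0.0, d_pos - d_neg + alpha)

a = np.array([0.0, 1.0])            # anchor embedding
p = np.array([0.1, 0.9])            # positive: same identity, close by
n_easy = np.array([1.0, 0.0])       # easy negative: already far away
n_hard = np.array([0.2, 0.8])       # hard negative: too close to the anchor
loss_easy = triplet_loss(a, p, n_easy)   # margin satisfied -> zero loss
loss_hard = triplet_loss(a, p, n_hard)   # margin violated -> positive loss
```

Only the hard negative produces gradient signal, which is why triplet training typically mines such violating triplets.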
Architecture and training The descriptors are ℓ2-normalized (so they can be thought of as lying on the surface of a unit hypersphere) and then mapped by an affine projection W', which must be learned by minimizing the empirical triplet loss. Optimizing the formula on the left yields W'; with W' in hand, the formula on the right is evaluated to produce the new score vectors. Reference from https://arxiv.org/pdf/1503.03832.pdf
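Putting the pieces together, the embedding step can be sketched as: ℓ2-normalize the network's score vector, apply the learned projection W', and compare two faces by Euclidean distance between the results. The projection matrix and score vectors below are hypothetical small-dimensional stand-ins for illustration.

```python
import numpy as np

def embed(x, W_proj):
    """l2-normalize the score vector, then apply the learned affine
    projection W'; re-normalize so embeddings live on the unit sphere."""
    x = x / np.linalg.norm(x)
    e = W_proj @ x
    return e / np.linalg.norm(e)

# hypothetical projection from a 4-dim score vector down to 2 dims
W_proj = np.array([[1.0, 0.0, 0.5, 0.0],
                   [0.0, 1.0, 0.0, 0.5]])
x1 = np.array([1.0, 0.0, 0.0, 0.0])   # score vector of face image 1
x2 = np.array([0.9, 0.1, 0.0, 0.0])   # score vector of the same person
dist = np.linalg.norm(embed(x1, W_proj) - embed(x2, W_proj))
```

Verification then reduces to thresholding `dist`: small distances are declared "same identity", large ones "different".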
The results of the above experiments. The last row, with embedding learning, achieves the highest accuracy; the fifth row is the result of the VGGNet network alone.
Results
LFW, unrestricted setting: the method achieves results comparable to the state of the art while requiring less data (than DeepFace and FaceNet) and using a simpler network architecture (than DeepID-2,3). Note that the DeepID3 results are for the test set with label errors corrected, which has not been done by any other method. The figure on the right shows the ROC curves.
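For reference, an ROC curve for verification is obtained by sweeping the distance threshold and recording true/false positive rates at each setting. This is a minimal sketch with toy pair distances and labels, not the paper's evaluation code.

```python
import numpy as np

def roc_points(dists, same, thresholds):
    """True/false positive rates for face verification: a pair is
    accepted as 'same identity' when its embedding distance is
    below the threshold."""
    pts = []
    for t in thresholds:
        pred = dists < t                 # accepted pairs at this threshold
        tpr = np.mean(pred[same])        # fraction of same-identity pairs accepted
        fpr = np.mean(pred[~same])       # fraction of different-identity pairs accepted
        pts.append((fpr, tpr))
    return pts

dists = np.array([0.2, 0.3, 0.8, 0.9])       # toy pair distances
same = np.array([True, True, False, False])  # ground-truth pair labels
pts = roc_points(dists, same, thresholds=[0.5])
```

Sweeping many thresholds traces out the full curve shown on the slide's right-hand side.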
Thank You