FaceNet: A Unified Embedding for Face Recognition and Clustering


FaceNet: A Unified Embedding for Face Recognition and Clustering
Coral Sharoni, Tal Sheffer

Overview
Face Recognition, Related Work, FaceNet, Applications, Network Architecture, Triplet Loss, Mini-Batch, Datasets, Experiments & Results, Conclusion

Face Recognition
Face recognition is a technology capable of identifying or verifying a person from a digital image or a video frame. Why? Face ID (Apple) biometric authentication, automatic photo tagging, security, and more.

Chinese man caught by facial recognition at pop concert
Chinese police used facial recognition technology to locate and arrest a man who was among a crowd of 60,000 concertgoers. According to the police, the man, who was wanted for "economic crimes", was "shocked" when he was caught. And it was not the first time: police in China also arrested 25 suspects using a facial recognition system set up at the International Beer Festival.

Related Work
FaceNet builds on two different deep network architectures:
The first is based on the Zeiler & Fergus model. It consists of multiple interleaved layers of convolutions, non-linear activations, local response normalizations, and max pooling layers.
The second is based on the Inception model of Szegedy et al. It uses mixed layers that run several different convolutional and pooling layers in parallel and concatenate their responses. This approach was the winning entry in ImageNet 2014.
Both architectures have been used with great success in the computer vision community.
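To make the "mixed layer" idea concrete, here is a minimal NumPy sketch of an Inception-style block: four parallel branches (1x1, 3x3 and 5x5 convolutions, plus a 3x3 max pool followed by a 1x1 projection) whose outputs are concatenated along the channel axis. It is a simplification assumed for illustration: real Inception modules also place 1x1 reduction layers before the larger convolutions, and all function names and weight shapes here are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution (cross-correlation) with 'same' zero padding.
    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            # sum over the k x k x C_in window -> one value per output channel
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k, :], w, axes=3)
    return out

def max_pool_same(x, k=3):
    """Stride-1 k x k max pooling that keeps the spatial size."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    h, wd, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = xp[i:i + k, j:j + k, :].max(axis=(0, 1))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def mixed_layer(x, w1, w3, w5, w_proj):
    """Run the parallel branches on the same input and concatenate their channels."""
    branches = [
        relu(conv2d_same(x, w1)),                     # 1x1 convolution
        relu(conv2d_same(x, w3)),                     # 3x3 convolution
        relu(conv2d_same(x, w5)),                     # 5x5 convolution
        relu(conv2d_same(max_pool_same(x), w_proj)),  # 3x3 max pool, then 1x1 projection
    ]
    return np.concatenate(branches, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
w1 = rng.standard_normal((1, 1, 4, 16))
w3 = rng.standard_normal((3, 3, 4, 8))
w5 = rng.standard_normal((5, 5, 4, 4))
w_proj = rng.standard_normal((1, 1, 4, 2))
out = mixed_layer(x, w1, w3, w5, w_proj)
print(out.shape)  # (8, 8, 30): 16 + 8 + 4 + 2 channels
```

Because every branch keeps the spatial size, their outputs can be stacked channel-wise; the output depth is simply the sum of the branch widths.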

FaceNet
A unified system for:
Face verification: is this the same person?
Face recognition: who is this person?
Face clustering: find common people among these faces.

Facial recognition technology reunites lost man with his family
A mentally ill Chinese man who had been missing for over a year was reunited with his family after being identified by China's vast facial recognition surveillance network. Hospital officials had been unable to identify the man until a facial recognition firm assisted.

FaceNet
The FaceNet method learns a Euclidean embedding per image using a deep convolutional network. The network is trained so that squared L2 distances in the embedding space directly correspond to face similarity: faces of the same person have small distances, and faces of distinct people have large distances. In the slide's example, a threshold of 1.1 on the squared distance would classify every pair correctly.
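A minimal NumPy sketch of how the squared L2 distance drives a same/different decision. The helper names are hypothetical, and the 1.1 threshold is simply the value from the slide's example; in practice a threshold would typically be tuned on held-out pairs. Because FaceNet embeddings are L2-normalized, the squared distance between two unit vectors equals 2 - 2 u·v and therefore always lies between 0 and 4.

```python
import numpy as np

def squared_l2(u, v):
    """Squared Euclidean distance ||u - v||_2^2 between two embeddings."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(diff @ diff)

def same_person(u, v, threshold=1.1):
    """Hypothetical helper: 'same identity' if the squared distance is below the threshold."""
    return squared_l2(u, v) < threshold

# Toy 2-D unit-length embeddings (real FaceNet embeddings have more dimensions).
a = np.array([1.0, 0.0])
b = np.array([0.8, 0.6])    # close to a: squared distance 0.4
c = np.array([-0.6, 0.8])   # far from a: squared distance 3.2
print(same_person(a, b), same_person(a, c))  # True False
```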

FaceNet: Face Clustering, Recognition and Verification
Once the embedding space has been produced, these tasks become straightforward:
Face verification: threshold the distance between the two embeddings.
Face recognition: a k-NN classification problem.
Face clustering: achievable with simple techniques such as k-means.
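Verification is just a threshold on the embedding distance; the other two tasks can be sketched in a few lines of NumPy, assuming the embeddings are already computed: recognition as a k-NN vote over a labelled gallery, and clustering with plain Lloyd's k-means. Both functions, the toy gallery and the names in it are hypothetical examples, not FaceNet code.

```python
import numpy as np

def knn_identify(query, gallery, labels, k=1):
    """Face recognition as k-NN: majority label among the k gallery
    embeddings with the smallest squared L2 distance to the query."""
    d = ((gallery - query) ** 2).sum(axis=1)
    nearest = np.argsort(d)[:k]
    values, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return values[np.argmax(counts)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on embeddings; returns a cluster id per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest center (squared L2)
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # move each non-empty cluster's center to the mean of its points
        for c in range(k):
            if np.any(assign == c):
                centers[c] = points[assign == c].mean(axis=0)
    return assign

gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["alice", "alice", "bob", "bob"]  # hypothetical identities
print(knn_identify(np.array([0.95, 0.05]), gallery, labels, k=3))  # alice
print(kmeans(gallery, k=2))  # first two points share one cluster id, last two the other
```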

FaceNet: Network Architecture
The network consists of a batch input layer and a deep CNN followed by L2 normalization, which produces the face embedding. During training, the embedding is followed by the triplet loss.
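The L2 normalization step can be sketched as follows: each CNN output vector is divided by its Euclidean norm, so every embedding lies on the unit hypersphere ($\|f(x)\|_2 = 1$). This is a minimal NumPy illustration; the function name and the small epsilon guarding against division by zero are assumptions of the sketch.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each row of a (batch, d) array to unit Euclidean length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)  # eps avoids dividing by zero

batch = np.array([[3.0, 4.0], [0.0, 2.0]])  # stand-in for raw CNN outputs
emb = l2_normalize(batch)
print(emb)  # rows become [0.6, 0.8] and [0.0, 1.0]
```

Normalizing makes all embeddings comparable on the same scale, which is what lets a single distance threshold and a fixed margin be meaningful.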

Triplet Loss
The triplet loss minimizes the distance between an anchor and a positive (both of which have the same identity) and maximizes the distance between the anchor and a negative (of a different identity).

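Triplet Loss
We want the following to hold for all possible triplets $(x_i^a, x_i^p, x_i^n)$ (anchor, positive, negative) in the training set, where $f$ is the embedding and $\alpha$ is the margin:
$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2$$
Assuming we have $N$ triplets, the loss function to minimize then becomes:
$$L = \sum_{i=1}^{N} \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$$
where $[z]_+ = \max(z, 0)$.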
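The loss translates almost directly into NumPy. This sketch assumes the embeddings of the N triplets are already stacked into three (N, d) arrays and uses the margin $\alpha = 0.2$ reported for the experiments; the function name is hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over triplets of [||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha]_+ ."""
    pos_dist = ((anchor - positive) ** 2).sum(axis=1)  # squared anchor-positive distances
    neg_dist = ((anchor - negative) ** 2).sum(axis=1)  # squared anchor-negative distances
    return float(np.maximum(pos_dist - neg_dist + alpha, 0.0).sum())  # hinge [.]_+

anchor   = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[0.8, 0.6], [0.0, 1.0]])
negative = np.array([[-1.0, 0.0], [0.8, 0.6]])
# Triplet 1 already satisfies the constraint (0.4 + 0.2 < 4.0) and contributes 0;
# triplet 2 violates it and contributes 2.0 - 0.4 + 0.2 = 1.8.
print(triplet_loss(anchor, positive, negative))  # ≈ 1.8
```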

Triplet Set
Generating all possible triplets would produce many triplets that already satisfy the constraint. These easy triplets would not contribute to training and would slow convergence. To ensure fast convergence, it is crucial to select triplets that violate the triplet constraint. This means that, given an anchor $x_i^a$, the optimal selection is:
A hard positive $x_i^p$ such that $\|f(x_i^a) - f(x_i^p)\|_2^2$ is maximal.
A hard negative $x_i^n$ such that $\|f(x_i^a) - f(x_i^n)\|_2^2$ is minimal.

Triplet Set
It is inefficient, and sometimes infeasible, to compute the minimum and maximum across the whole training set. The proposed solution is to generate triplets online, selecting the hard positive and negative exemplars from within a mini-batch.
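A minimal sketch of this online selection, assuming a mini-batch of embeddings with identity labels: for every anchor it takes the farthest same-identity embedding as the hard positive and the closest different-identity embedding as the hard negative, all within the batch. The function name and toy batch are hypothetical; this follows the hardest-positive/hardest-negative description above rather than any specific published code.

```python
import numpy as np

def batch_hard_triplets(emb, labels):
    """Return (anchor, hard positive, hard negative) index triples for a batch.
    Anchors without any positive or any negative in the batch are skipped."""
    labels = np.asarray(labels)
    sq = (emb ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T  # pairwise squared L2
    same = labels[:, None] == labels[None, :]
    triplets = []
    for a in range(len(emb)):
        pos_mask = same[a].copy()
        pos_mask[a] = False  # an anchor is not its own positive
        neg_mask = ~same[a]
        if not pos_mask.any() or not neg_mask.any():
            continue
        p = np.flatnonzero(pos_mask)[np.argmax(dist[a, pos_mask])]  # farthest positive
        n = np.flatnonzero(neg_mask)[np.argmin(dist[a, neg_mask])]  # closest negative
        triplets.append((a, int(p), int(n)))
    return triplets

emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]])
labels = [0, 0, 0, 1, 1]  # hypothetical identity ids
trips = batch_hard_triplets(emb, labels)
print(trips)  # [(0, 2, 3), (1, 0, 3), (2, 0, 3), (3, 4, 2), (4, 3, 2)]
```

Computing all pairwise distances once per batch keeps the cost at one matrix product, which is what makes the online selection cheap compared with scanning the whole training set.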

Mini-Batch
The mini-batches contain on the order of a few thousand exemplars. For the hard-positive selection to be meaningful, a minimal number of exemplars of any one identity must be present in each mini-batch. In the experiments, around 40 faces are selected per identity per mini-batch.
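One way to build such mini-batches, as a hypothetical sketch: sample a set of identities, then up to 40 face indices for each. The default of 50 identities per batch is an assumption chosen so that 50 x 40 = 2,000 exemplars, in line with "a few thousand"; the sampler is illustrative, not the authors' pipeline.

```python
import random
from collections import defaultdict

def sample_batch(labels, faces_per_identity=40, identities_per_batch=50, seed=None):
    """Hypothetical sampler: choose identities, then up to `faces_per_identity`
    image indices for each, so every identity has several exemplars in the batch."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, label in enumerate(labels):
        by_id[label].append(idx)
    # only identities with at least two faces can provide anchor-positive pairs
    eligible = [ident for ident, idxs in by_id.items() if len(idxs) >= 2]
    chosen = rng.sample(eligible, min(identities_per_batch, len(eligible)))
    batch = []
    for ident in chosen:
        idxs = by_id[ident]
        batch.extend(rng.sample(idxs, min(faces_per_identity, len(idxs))))
    return batch

labels = [i // 60 for i in range(6000)]  # toy dataset: 100 identities, 60 faces each
batch = sample_batch(labels, seed=0)
print(len(batch))  # 2000
```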

Deep Convolutional Networks
The CNN is trained using stochastic gradient descent (SGD) with standard backpropagation. The learning rate starts at 0.05 in most experiments and is lowered to finalize the model. The models are trained on a CPU cluster for 1,000 to 2,000 hours. The margin $\alpha$ is set to 0.2. Two types of architecture are used; in practice they differ in their number of parameters and FLOPS.
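To make "SGD with standard backprop" concrete for the triplet loss, the gradient of one triplet's term with respect to the three embeddings can be worked out directly. Backpropagation then carries these through the L2 normalization and the CNN to the weights $\theta$, and each SGD step uses a learning rate $\eta$ (0.05 at the start). This is a worked sketch of the calculus, with $\eta$ and $\theta$ introduced here as notation.

```latex
\ell_i = \left[\, \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \,\right]_+

\text{If } \ell_i > 0:\quad
\frac{\partial \ell_i}{\partial f(x_i^a)} = 2\big(f(x_i^n) - f(x_i^p)\big),\quad
\frac{\partial \ell_i}{\partial f(x_i^p)} = 2\big(f(x_i^p) - f(x_i^a)\big),\quad
\frac{\partial \ell_i}{\partial f(x_i^n)} = 2\big(f(x_i^a) - f(x_i^n)\big)

\text{If the bracketed term is negative, all three gradients are } 0.

\theta \leftarrow \theta - \eta \sum_{i=1}^{N} \frac{\partial \ell_i}{\partial \theta}, \qquad \eta = 0.05 \text{ initially}
```

The three embedding gradients sum to zero, and a triplet that already satisfies the margin contributes nothing at all, which is why easy triplets slow convergence.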

Datasets and Evaluation
The experiments use four datasets:
Labeled Faces in the Wild (LFW): the standard academic test set for face verification.
YouTube Faces: a newer dataset that has gained popularity in the face recognition community.
Hold-out test set: around one million images with the same distribution as the training set.
Personal photos: photo collections totalling around 12k images, manually verified to have very clean labels.

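Datasets and Evaluation
Given a pair of face images, the pair is classified as same or different by thresholding the squared L2 distance between their embeddings at a threshold $d$. Let $P_{same}$ be the set of all same-identity pairs and $P_{diff}$ the set of all different-identity pairs.
True accepts $TA(d)$: pairs correctly classified as same at threshold $d$.
False accepts $FA(d)$: pairs incorrectly classified as same at threshold $d$.
The validation rate and false accept rate are then:
$$\mathrm{VAL}(d) = \frac{|TA(d)|}{|P_{same}|}, \qquad \mathrm{FAR}(d) = \frac{|FA(d)|}{|P_{diff}|}$$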
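A small NumPy sketch of these two metrics, assuming the squared distance of every evaluation pair and a same/different ground-truth flag are already available; the function name and the toy numbers are hypothetical. A pair counts as "accepted" (classified as same) when its distance is at most $d$.

```python
import numpy as np

def val_far(distances, same, d):
    """VAL(d) = |TA(d)| / |P_same|,  FAR(d) = |FA(d)| / |P_diff|."""
    distances = np.asarray(distances, dtype=float)
    same = np.asarray(same, dtype=bool)
    accepted = distances <= d           # pairs classified as 'same' at threshold d
    ta = np.sum(accepted & same)        # true accepts
    fa = np.sum(accepted & ~same)       # false accepts
    return ta / same.sum(), fa / (~same).sum()

distances = [0.3, 0.9, 1.4, 0.8, 1.6, 2.5]          # toy squared distances
same = [True, True, True, False, False, False]      # ground truth per pair
val, far = val_far(distances, same, d=1.1)
print(val, far)  # VAL = 2/3, FAR = 1/3
```

Sweeping $d$ trades VAL against FAR, which is how a verification rate can be reported at a fixed false accept rate.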

Experiments
The training data consists of 100M to 200M face images covering about 8M different identities. Input sizes range from 96 x 96 to 224 x 224 pixels. A face detector is run on each image, and the detected faces are resized to the network's input size.

The Main Models (VAL computed on the hold-out test set; standard error between ±1.6 and ±2.9)

Model | Architecture | Input Size | Parameters | FLOPS | VAL
NN1 | Zeiler&Fergus | 220 x 220 | 140M | 1.6B | 87.9%
NN2 | Inception | 224 x 224 | 7.5M | 1.6B | 89.4%
NN3 | Inception | 160 x 160 | 7.5M | | 88.3%
NN4 | Inception | 96 x 96 | 7.5M | 285M | 82.0%
NNS1 | mini Inception | 165 x 165 | 26M | 220M | 82.4%
NNS2 | tiny Inception | 140 x 116 | 4.3M | 20M | 51.9%

Experiments & Results: ROC curve for the personal photos test set (figure).

Experiments & Results: training data size vs. VAL; embedding dimensionality (figures).

Experiments & Results: computation vs. accuracy trade-off (figure).

Academic Dataset Performance
LFW: a record-breaking classification accuracy of 99.63% ± 0.09 (standard error of the mean) using the NN1 model.
YouTube Faces DB: a classification accuracy of 95.12% ± 0.39.
(Images by an unknown author, licensed under CC BY-SA.)

LFW DB errors (figure)

Conclusion
FaceNet provides a method to directly learn an embedding into a Euclidean space for face verification. The method uses a deep convolutional network trained to directly optimize the embedding itself. The system performs very well and set a new record accuracy on LFW.

So, if you don't want to be arrested in the middle of a concert… Thank you! Tal & Coral