FaceNet A Unified Embedding for Face Recognition and Clustering

Presentation transcript:

FaceNet: A Unified Embedding for Face Recognition and Clustering. Coral Sharoni, Tal Sheffer

Overview: Face Recognition; Related Work; FaceNet; Applications; Network Architecture; Triplet Loss; Mini-Batch; Datasets; Experiments & Results; Conclusion

Face Recognition Face recognition: a technology capable of identifying or verifying a person from a digital image or a video frame. Why? Face ID (Apple), a biometric authentication method; automatic photo tagging; security; …

Chinese man caught by facial recognition at pop concert Chinese police have used facial recognition technology to locate and arrest a man who was among a crowd of 60,000 concert goers. Police said the man, wanted for "economic crimes", was "shocked" when he was caught. And it is not the first time: police in China arrested 25 suspects using a facial recognition system that was set up at the International Beer Festival.

Related Work FaceNet is based on two different deep network architectures. The first is based on the Zeiler & Fergus model: it consists of multiple interleaved layers of convolutions, non-linear activations, local response normalizations, and max pooling layers. The second is based on the Inception model of Szegedy et al.: it uses mixed layers that run several different convolutional and pooling layers in parallel and concatenate their responses, and it was recently used as the winning approach for ImageNet 2014. Both architectures have been used to great success in the computer vision community.

FaceNet A unified system for: Face verification - is this the same person? Face recognition - who is this person? Face clustering - find common people among these faces.

Facial recognition technology reunites lost man with his family A mentally ill Chinese man who had been missing for over a year was reunited with his family after being identified by China's vast facial recognition surveillance network. Hospital officials were unable to identify the man before the assistance of the facial recognition firm.

FaceNet The FaceNet method is based on learning a Euclidean embedding per image using a deep convolutional network. The network is trained such that the squared L2 distances in the embedding space directly correspond to face similarity: Faces of the same person have small distances. Faces of distinct people have large distances. For the example pairs shown, a threshold of 1.1 would classify every pair correctly.

FaceNet – face clustering, recognition and verification Once the embedding space has been produced, the aforementioned tasks become straightforward: Face verification - thresholding the distance between the two embeddings. Face recognition - becomes a k-NN classification problem. Face clustering - can be achieved using simple techniques such as k-means.
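To make the three tasks concrete, here is a minimal NumPy sketch that operates on embeddings already produced by some network. The function names, the toy 2-D vectors, and the naive k-means initialisation are assumptions for illustration only; the 1.1 threshold is the example value from the previous slide, not a recommended setting.

```python
import numpy as np

def squared_l2(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))

def verify(emb_a, emb_b, threshold=1.1):
    """Verification: accept as the same person if the distance is below the threshold."""
    return squared_l2(emb_a, emb_b) < threshold

def recognize(query, gallery, labels, k=1):
    """Recognition as k-NN: majority label among the k nearest gallery embeddings."""
    gallery = np.asarray(gallery, dtype=float)
    dists = np.sum((gallery - np.asarray(query, dtype=float)) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

def kmeans(points, k, iters=10):
    """Clustering with plain Lloyd iterations (naive init: first k points, an assumption)."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign

# Toy embeddings: two images of person A and two of person B.
a1, a2 = [1.0, 0.0], [0.8, 0.6]
b1, b2 = [-1.0, 0.0], [-0.8, 0.6]
same_person = verify(a1, a2)
different_person = verify(a1, b1)
who = recognize([0.6, 0.8], [a1, a2, b1, b2], ["A", "A", "B", "B"], k=1)
clusters = kmeans([a1, b1, a2, b2], k=2)
```

In practice a library k-NN or k-means implementation would replace these hand-written loops; the point is only that every task reduces to distance computations on the embeddings.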

FaceNet – Network Architecture The network consists of a batch input layer and a deep CNN followed by L2 normalization, which results in the face embedding. This is followed by the triplet loss during training.
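The L2 normalization step can be sketched as follows. This is an illustration of the normalization only, not of the CNN; the helper name `l2_normalize` and the toy vectors are assumptions. It also checks a useful consequence: for unit-length embeddings, the squared distance equals 2 minus twice the dot product, so distances are bounded between 0 and 4.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length (eps guards against division by zero)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

# Pretend these are raw CNN outputs for two face images (toy values).
raw = [[3.0, 4.0], [0.0, 2.0]]
emb = l2_normalize(raw)

squared_distance = float(np.sum((emb[0] - emb[1]) ** 2))
via_dot_product = 2.0 - 2.0 * float(np.dot(emb[0], emb[1]))
```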

Triplet loss The triplet loss minimizes the distance between an anchor and a positive (both of which have the same identity) and maximizes the distance between the anchor and a negative (of a different identity).

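Triplet loss The constraint we want to hold for all possible triplets in the training set is: $\|f(x_i^{\text{anchor}}) - f(x_i^{\text{positive}})\|_2^2 + \alpha < \|f(x_i^{\text{anchor}}) - f(x_i^{\text{negative}})\|_2^2$. Assuming that we have N triplets, the loss function to minimize becomes: $L = \sum_{i}^{N} \left[ \|f(x_i^{\text{anchor}}) - f(x_i^{\text{positive}})\|_2^2 - \|f(x_i^{\text{anchor}}) - f(x_i^{\text{negative}})\|_2^2 + \alpha \right]_+$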
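A small NumPy sketch of this loss, computed on embeddings that are assumed to be given as arrays of shape (N, d). Each term is clipped at zero (the hinge form used in the FaceNet paper), so triplets that already satisfy the constraint contribute nothing. The function name and toy values are illustrative; the default margin of 0.2 is the value quoted later in these slides.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over triplets of max(||a - p||^2 - ||a - n||^2 + alpha, 0)."""
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distances
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distances
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# Triplet 1 violates the margin (both distances are 1.0); triplet 2 is easily satisfied.
anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[1.0, 0.0], [0.1, 0.0]]
negatives = [[1.0, 0.0], [2.0, 0.0]]
loss = triplet_loss(anchors, positives, negatives)
```

Only the first triplet contributes here (1.0 - 1.0 + 0.2 = 0.2); the second has a negative margin term and is clipped to zero.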

Triplet set Generating all possible triplets would result in many triplets that are easily satisfied. These triplets would not contribute to the training and would result in slower convergence. In order to ensure fast convergence it is crucial to select triplets that violate the triplet constraint. This means that, given $x_i^{\text{anchor}}$, the optimal selection is: A 'hard positive' $x_i^p$ such that $\|f(x_i^{\text{anchor}}) - f(x_i^p)\|_2^2$ is maximal. A 'hard negative' $x_i^n$ such that $\|f(x_i^{\text{anchor}}) - f(x_i^n)\|_2^2$ is minimal.

Triplet set It is inefficient, and sometimes infeasible, to compute the minimum and maximum across the whole training set. The proposed solution is to generate triplets online, selecting the hard positive/negative exemplars from within a mini-batch.
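The in-batch selection can be sketched as below, assuming the mini-batch embeddings and identity labels are already available as arrays. For each anchor it picks the farthest same-identity example and the closest different-identity example, which is the selection rule as stated on these slides; the function name and toy data are hypothetical, and a real training pipeline may use a softer variant of this rule.

```python
import numpy as np

def mine_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets mined within one mini-batch."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared L2 distances, shape (batch, batch).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    triplets = []
    for a in range(len(labels)):
        same = labels == labels[a]
        same[a] = False                      # the anchor is not its own positive
        other = labels != labels[a]
        if not same.any() or not other.any():
            continue                         # no valid triplet for this anchor
        p = np.where(same)[0][np.argmax(dist[a, same])]    # hardest positive
        n = np.where(other)[0][np.argmin(dist[a, other])]  # hardest negative
        triplets.append((a, int(p), int(n)))
    return triplets

# Toy batch: identity 0 at x = 0 and 1, identity 1 at x = 3 and 5.
batch_embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
batch_labels = [0, 0, 1, 1]
triplets = mine_hard_triplets(batch_embeddings, batch_labels)
```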

MINI BATCH The mini-batches are on the order of a few thousand exemplars. For meaningful results we need to ensure that a minimal number of exemplars of any one identity is present in each mini-batch. Around 40 faces are selected per identity per mini-batch in the experiment.
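One simple way to guarantee a minimum number of exemplars per identity is to sample identities first and then a fixed number of faces for each. The sketch below is an assumption about how such a sampler could look, not the sampling procedure used in the experiments; the function name, parameters, and toy label list are hypothetical.

```python
import random
from collections import defaultdict

def sample_identity_batch(labels, identities_per_batch, faces_per_identity, seed=0):
    """Pick identities, then a fixed number of face indices for each picked identity."""
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for index, label in enumerate(labels):
        by_identity[label].append(index)
    # Only identities with enough faces can fill their quota.
    eligible = [label for label, idx in by_identity.items() if len(idx) >= faces_per_identity]
    chosen = rng.sample(eligible, identities_per_batch)  # raises ValueError if too few
    batch = []
    for label in chosen:
        batch.extend(rng.sample(by_identity[label], faces_per_identity))
    return batch

# Toy dataset: 4 identities with 5 faces each.
all_labels = [i // 5 for i in range(20)]
batch = sample_identity_batch(all_labels, identities_per_batch=2, faces_per_identity=3)
batch_labels = [all_labels[i] for i in batch]
```

With the figures from the slide, `faces_per_identity` would be around 40 and the batch a few thousand exemplars in total.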

Deep Convolutional Networks The CNN is trained using Stochastic Gradient Descent (SGD) with standard backprop. The learning rate mostly starts at 0.05 and is lowered to finalize the model. The models are trained on a CPU cluster for 1000-2000 hours. The margin $\alpha$ is set to 0.2. Two types of architecture are used, which differ mainly in the number of parameters and FLOPS.

Datasets and Evaluation The experiment datasets: Labeled Faces in the Wild (LFW) - the academic test set for face verification. YouTube Faces - a newer dataset that is highly popular in the face recognition community. Hold-out Test Set - one million images with the same distribution as the training set. Personal photos - collections with a total of around 12k images, manually verified to have very clean labels.

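Datasets and Evaluation Given a pair of two face images: True accepts TA(d) - pairs correctly classified as same at threshold d. False accepts FA(d) - pairs incorrectly classified as same at threshold d. Validation rate and false accept rate: $\mathrm{VAL}(d) = \frac{|TA(d)|}{|P_{same}|}$, $\mathrm{FAR}(d) = \frac{|FA(d)|}{|P_{diff}|}$, where $P_{same}$ and $P_{diff}$ are the sets of all same-identity and different-identity pairs.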
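These two rates can be computed directly from a list of pair distances and ground-truth labels. The sketch below assumes a pair is accepted as "same" when its distance is at most d; the function name and toy distances are illustrative.

```python
import numpy as np

def val_far(distances, is_same, d):
    """Validation rate and false accept rate at distance threshold d."""
    distances = np.asarray(distances, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    accepted = distances <= d                      # pairs classified as "same"
    true_accepts = np.sum(accepted & is_same)      # |TA(d)|
    false_accepts = np.sum(accepted & ~is_same)    # |FA(d)|
    val = true_accepts / np.sum(is_same)           # divide by |P_same|
    far = false_accepts / np.sum(~is_same)         # divide by |P_diff|
    return float(val), float(far)

# Three same-identity pairs followed by three different-identity pairs (toy values).
pair_distances = [0.3, 0.9, 1.5, 0.8, 2.0, 2.5]
pair_is_same = [True, True, True, False, False, False]
val, far = val_far(pair_distances, pair_is_same, d=1.1)
```

Here two of the three same-identity pairs are accepted (VAL = 2/3) and one of the three different-identity pairs is wrongly accepted (FAR = 1/3). Sweeping d traces out the ROC curve.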

Experiments 100M–200M training faces covering about 8M different identities. Input sizes range from 96×96 to 224×224 pixels. A face detector is run on each image. The faces are resized to the input size.

The Main Models (VAL computed on the Hold-out Test Set)
Model Name | Architecture   | Input Size | Parameters | FLOPS | VAL ± (1.6 to 2.9)
NN1        | Zeiler&Fergus  | 220×220    | 140M       | 1.6B  | 87.9%
NN2        | Inception      | 224×224    | 7.5M       | 1.6B  | 89.4%
NN3        | Inception      | 160×160    | -          | -     | 88.3%
NN4        | Inception      | 96×96      | -          | 285M  | 82.0%
NNS1       | mini Inception | 165×165    | 26M        | 220M  | 82.4%
NNS2       | tiny Inception | 140×116    | 4.3M       | 20M   | 51.9%

Experiments & Results [Figure: ROC graph for personal photos]

Experiments & Results [Figures: training data size against VAL; embedding dimensionality]

Experiments & Results [Figure: computation-accuracy trade-off]

Academic data set performance LFW: Achieved record-breaking classification accuracy of 99.63% ± 0.09 (standard error of the mean) using the NN1 model. YouTube Faces DB: Achieved classification accuracy of 95.12% ± 0.39. (Image by an unknown author, licensed under CC BY-SA.)

LFW DB errors

Conclusion FaceNet provides a method to directly learn an embedding into a Euclidean space for face verification. The method uses a deep convolutional network trained to directly optimize the embedding itself. The system achieves great success and a new record accuracy.

So, if you don't want to be arrested in the middle of a concert… Thank you! Tal & Coral