NormFace: L2 Hypersphere Embedding for Face Verification. Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille. ACM MM 2017. Presented by Feng Wang.

Motivation. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, Taigman et al., CVPR 2014. L2 normalization is applied only in the testing phase.

Training and Testing Pipeline

Preliminary Experiments. The normalization term is critical in the testing phase. Cosine similarity is used as the verification score (written out below). Note: pretrained model from https://github.com/ydwen/caffe-face
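
The cosine similarity used as the verification score is the standard one; it appears only as an image in the slide, so it is written out here for completeness:

    \cos(f_1, f_2) = \frac{f_1^{\top} f_2}{\lVert f_1 \rVert_2 \, \lVert f_2 \rVert_2}
                   = \tilde{f}_1^{\top} \tilde{f}_2,
    \qquad \tilde{f} = \frac{f}{\lVert f \rVert_2}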

Why is normalization so effective? A toy experiment on MNIST. Network: an 8-layer CNN with the feature dimension changed to 2. Each point corresponds to one 2D feature from the test set.

Angular distance is a good metric for verification. (Figures: a counter-example for Euclidean distance and a counter-example for the inner product.)

Why does the feature distribution have this shape?

Softmax is a soft max. The argmax operation is scale invariant, and softmax is the soft version of max; however, the softmax output itself does depend on the scale of its input, so enlarging a feature's norm makes the prediction more confident without changing the predicted class.
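
A minimal numerical illustration of this point (plain NumPy, not from the slides; the numbers are arbitrary):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())          # subtract max for numerical stability
        return e / e.sum()

    logits = np.array([1.0, 2.0, 3.0])
    for scale in (1.0, 10.0):
        p = softmax(scale * logits)
        print(scale, int(np.argmax(p)), p.round(3))
    # The argmax is identical at both scales, but the softmax output
    # becomes much more peaked ("more confident") as the scale grows.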

The feature norm is related to recognizability. Figure credit: L2-constrained Softmax Loss for Discriminative Face Verification, Ranjan et al., arXiv:1703.09507.

Bias term: do not use a bias term in the inner-product layer before softmax.
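
Concretely, this just means disabling the bias in the final classification layer. A PyTorch-style sketch (the sizes are hypothetical, not from the slides):

    import torch.nn as nn

    feat_dim, num_classes = 512, 10000                     # hypothetical sizes
    fc = nn.Linear(feat_dim, num_classes, bias=False)      # inner-product layer without a bias term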

Optimize cosine instead of the inner product. Normalization layer and its gradient: (written out below).
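
The equations on this slide are images in the original; written out in the standard form (consistent with how the NormFace paper defines them; epsilon is a small constant added for numerical stability), the normalization layer and its gradient are:

    \tilde{x} = \frac{x}{\lVert x \rVert_2} = \frac{x}{\sqrt{\sum_i x_i^2 + \epsilon}},
    \qquad
    \frac{\partial L}{\partial x} = \frac{1}{\lVert x \rVert_2}
        \left( \frac{\partial L}{\partial \tilde{x}}
               - \tilde{x} \left\langle \frac{\partial L}{\partial \tilde{x}}, \tilde{x} \right\rangle \right)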

It's not so easy. After using cosine to replace the inner-product layer, the network cannot converge. (Figure: an extreme case of the softmax-loss gradient w.r.t. the softmax activation, with 9,999 negative classes and 1 positive class.) Because the cosine score is bounded in [-1, 1], an easy sample's gradient is about the same as a hard sample's gradient, so training has difficulty converging. In practice, the lowest loss reached is ~8.5 (initial loss: ~9.2).

Formal mathematics. The theoretical lower bound of the loss for 10,000 classes is 8.27, very close to the observed value of ~8.5.
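
The bound referred to here is stated in the NormFace paper roughly as follows (reproduced from memory, so treat the exact form as an assumption); n is the number of classes and l is the length to which both the features and the weight vectors are normalized:

    \mathcal{L} \;\ge\; \log\!\left( 1 + (n - 1)\, e^{-\frac{n}{n-1}\,\ell^{2}} \right)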

Solution: add a scale parameter after the cosine layer, similar to the scale used in Batch Normalization, Weight Normalization, and Layer Normalization. The scale is learned as a parameter of the CNN.
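
A minimal PyTorch-style sketch of such a layer (my own illustration rather than the authors' Caffe implementation; the class and parameter names are mine):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NormalizedLinear(nn.Module):
        """Cosine similarity between L2-normalized features and class weights,
        multiplied by a learnable scale s."""
        def __init__(self, in_features, num_classes, init_scale=10.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_classes, in_features))
            self.scale = nn.Parameter(torch.tensor(init_scale))

        def forward(self, x):
            x = F.normalize(x, dim=1)            # normalize features
            w = F.normalize(self.weight, dim=1)  # normalize class weights
            return self.scale * (x @ w.t())      # scaled cosine logits

    # The logits then feed the usual softmax cross-entropy loss:
    # loss = F.cross_entropy(layer(features), labels)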

Another solution: normalization is very common in metric learning, which does not seem to suffer from this convergence problem. Popular metric learning loss functions: contrastive loss and triplet loss.

Metric learning has a sampling problem. When the number of training samples is huge, e.g. 1 million, there are on the order of 1M x 1M pairs to train on, so hard mining is usually needed, which is difficult to implement and difficult to tune.

Reformulate the metric learning losses. Normalized softmax, plus reformulated (agent-based) versions of the contrastive loss and the triplet loss (written out below).
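
The normalized-softmax loss referred to above can be written as follows (s is the learned scale; W~_j and f~_i denote the L2-normalized class weights and features). The contrastive and triplet losses are then reformulated by replacing one feature in each pair or triplet with the class agent W~_{y_i}:

    \mathcal{L}_{S} = -\frac{1}{m} \sum_{i=1}^{m}
        \log \frac{ e^{\, s\, \tilde{W}_{y_i}^{\top} \tilde{f}_i} }
                  { \sum_{j=1}^{n} e^{\, s\, \tilde{W}_{j}^{\top} \tilde{f}_i} }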

Effect of W_i: we call W_i the "agent" of class i.

Results

Results

Drawback: all the experiments are fine-tuned from other models (pre-trained with the softmax loss). When training from scratch, the performance is comparable with state-of-the-art works but cannot beat them. (Figure: loss surface for the softmax cross-entropy loss.)

Some recent progress

Classification and metric learning: this model is good for classification (>99% accuracy), but not good for metric learning.

Large-margin softmax. Liu W, Wen Y, Yu Z, et al., Large-Margin Softmax Loss for Convolutional Neural Networks, ICML 2016. Liu W, Wen Y, Yu Z, et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017.

Classification loss for metric learning. If the average angle span of the classes is θ, the margin should be larger than θ to ensure that the maximum intra-class distance stays smaller than the minimum inter-class distance. Liu W, Wen Y, Yu Z, et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017.

Large margin can be achieved by tuning s

A large margin can be achieved by tuning s. (Figures: softmax at a low scale vs. softmax at a high scale.)

Set a smaller scale for the positive score: positive scale = scale * 0.75, with s = 15. LFW 6,000 pairs: 99.19% -> 99.25%. LFW BLUFR: 95.83% -> 96.49%.
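
One way to read this trick (my interpretation, a PyTorch-style sketch rather than the authors' code): the target-class cosine is scaled by 0.75 * s while the other classes keep the full scale s, which effectively imposes a margin on the positive score.

    import torch
    import torch.nn.functional as F

    def scaled_cosine_logits(cosines, labels, s=15.0, pos_factor=0.75):
        """cosines: (batch, num_classes) cosine similarities; labels: (batch,) target classes."""
        logits = s * cosines                                     # negatives use the full scale s
        pos = s * pos_factor * cosines.gather(1, labels.unsqueeze(1))
        logits = logits.scatter(1, labels.unsqueeze(1), pos)     # shrink only the target logit
        return logits

    # loss = F.cross_entropy(scaled_cosine_logits(cosines, labels), labels)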