NormFace: L2 Hypersphere Embedding for Face Verification
Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille, NormFace: L2 Hypersphere Embedding for Face Verification, ACM MM 2017
Motivation
DeepFace: Closing the Gap to Human-Level Performance in Face Verification, Taigman et al., CVPR 2014
L2 normalization is applied only in the testing phase.
Training and Testing Pipeline
Preliminary Experiments
The normalization term is critical in the testing phase.
Cosine similarity: cos(f1, f2) = f1ᵀf2 / (‖f1‖₂ ‖f2‖₂)
Note: pretrained model from https://github.com/ydwen/caffe-face
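As a concrete reference, a minimal NumPy sketch of this test-time score (the feature values below are made up; real features would come from the pretrained model above):

```python
import numpy as np

def cosine_similarity(f1, f2, eps=1e-8):
    """Test-time score: L2-normalize both features, then take the inner product."""
    return np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + eps)

# Toy vectors standing in for deep face features.
f1 = np.array([0.8, 1.9, -0.5])
f2 = np.array([1.6, 3.8, -1.0])   # same direction as f1, twice the norm

print(np.dot(f1, f2))             # raw inner product: depends on the feature norms
print(cosine_similarity(f1, f2))  # ~1.0: the norms are divided out
```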
Why is normalization so effective?
A toy experiment on MNIST. Network: an 8-layer CNN with the feature dimension changed to 2. Each point corresponds to one 2D feature from the test set.
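A sketch of such a toy setup in PyTorch (the slides only specify an "8-layer CNN" with a 2-D feature layer, so the exact layer configuration below is an illustrative stand-in):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the MNIST toy network: the key point is the
# 2-D bottleneck before the 10-way classifier, which lets every test
# feature be plotted directly as a point in the plane.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.feature = nn.Linear(64 * 7 * 7, 2)   # 2-D feature for visualization
        self.classifier = nn.Linear(2, 10)        # logits over the 10 digits

    def forward(self, x):
        x = self.conv(x).flatten(1)
        feat = self.feature(x)
        return self.classifier(feat), feat

logits, feat = ToyNet()(torch.randn(4, 1, 28, 28))
print(feat.shape)  # torch.Size([4, 2]) -- each row is one plotted point
```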
Angular distance is a good metric for verification.
Figures: a counter-example for Euclidean distance; a counter-example for the inner product.
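A small numeric illustration of both counter-examples (the feature values below are made up for illustration, not taken from the slides):

```python
import numpy as np

# Hypothetical 2-D features: a and b point in the same direction
# (same identity) but differ in norm; c points elsewhere with a large norm.
a = np.array([1.0, 0.0])
b = np.array([10.0, 0.0])
c = np.array([6.0, 8.0])

cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Euclidean distance: a and b (same direction) are 9.0 apart,
# almost as far as a and c (~9.43), because of the norm difference.
print(np.linalg.norm(a - b), np.linalg.norm(a - c))

# Inner product: the large-norm, differently-oriented c scores higher
# against b than the perfectly aligned a does.
print(b @ a, b @ c)            # 10.0 vs 60.0
print(cos(b, a), cos(b, c))    # 1.0 vs 0.6 -- the angle ranks them correctly
```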
Why is the distribution in this shape?
Softmax is soft-max
The argmax operation is scale-invariant; softmax is the soft version of (arg)max, and the scale of its input controls how close it gets to a hard argmax.
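A quick NumPy check of both statements:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])

for s in (1, 4, 16):
    scaled = s * logits
    print(s, np.argmax(scaled), np.round(softmax(scaled), 3))
# argmax is always index 0, regardless of s;
# softmax sharpens toward the one-hot argmax as the scale grows.
```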
The feature norm is related to recognizability.
Figure credit: L2-constrained Softmax Loss for Discriminative Face Verification, Ranjan et al., arXiv:1703.09507
Bias term
Do not use a bias term in the inner-product layer before softmax.
Optimize cosine instead of inner-product
Normalization layer: x̃ = x / ‖x‖₂
Gradient: ∂L/∂x = (1 / ‖x‖₂) · ( ∂L/∂x̃ − ⟨∂L/∂x̃, x̃⟩ · x̃ )
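A NumPy sketch of the normalization layer with a finite-difference check of the gradient formula above (the loss used in the check is an arbitrary linear function of x̃):

```python
import numpy as np

def l2_normalize_forward(x, eps=1e-12):
    norm = np.sqrt(np.sum(x * x) + eps)
    return x / norm, norm

def l2_normalize_backward(grad_xt, xt, norm):
    # dL/dx = ( dL/dx~  -  <dL/dx~, x~> * x~ ) / ||x||
    return (grad_xt - np.dot(grad_xt, xt) * xt) / norm

# Numerical check of the analytic gradient for L = g . x~ .
x = np.random.randn(5)
g = np.random.randn(5)          # upstream gradient dL/dx~
xt, norm = l2_normalize_forward(x)
analytic = l2_normalize_backward(g, xt, norm)

numeric = np.zeros_like(x)
h = 1e-6
for i in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[i] += h
    xm[i] -= h
    numeric[i] = (g @ l2_normalize_forward(xp)[0] -
                  g @ l2_normalize_forward(xm)[0]) / (2 * h)

print(np.max(np.abs(analytic - numeric)))   # ~1e-9: the formula checks out
```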
It’s not so easy
After replacing the inner-product layer with cosine, the network cannot converge. An extreme case (softmax loss gradient w.r.t. the softmax activation, with 1 positive class vs. 9,999 negative classes): because the cosine is bounded in [−1, 1], an easy sample’s gradient ≈ a hard sample’s gradient, which makes convergence difficult. In practice, the lowest loss reached is ~8.5 (initial loss: ~9.2).
Formal mathematics
The theoretical lower bound of the loss for 10,000 classes is 8.27, very close to the value observed in practice (~8.5).
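Two back-of-the-envelope checks put these numbers in context (the 8.27 figure itself comes from the paper’s tighter bound, which additionally accounts for how 10,000 weight vectors can be arranged on the unit sphere; the estimates below are only crude sanity checks):

```latex
\begin{align*}
  &\text{Initial loss, uniform softmax over } n = 10000 \text{ classes:}
   && L_{\text{init}} \approx -\log\tfrac{1}{n} = \log 10000 \approx 9.21,\\
  &\text{crude best case with cosine logits in } [-1,1]
   \text{ ($+1$ for the true class, $-1$ for all others):}
   && L \ge \log\!\bigl(1 + (n-1)\,e^{-2}\bigr)
        = \log\!\bigl(1 + 9999\,e^{-2}\bigr) \approx 7.21.
\end{align*}
```

So even in an unattainable best case the loss stays above 7, and the paper’s geometric argument raises the bound to 8.27, close to the observed 8.5.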
Solution
Add a scale parameter. A similar solution is used in Batch Normalization, Weight Normalization, and Layer Normalization. The scale is learned as a parameter of the CNN.
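A minimal PyTorch sketch of a normalized-softmax head with a learnable scale (the class name, initial scale, and dimensions are my own choices, not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineSoftmax(nn.Module):
    """Cosine logits with a single learnable scale, trained with cross-entropy."""
    def __init__(self, feat_dim, num_classes, init_scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learned with the CNN

    def forward(self, features, labels):
        f = F.normalize(features, dim=1)      # ||f|| = 1
        w = F.normalize(self.weight, dim=1)   # ||W_i|| = 1
        logits = self.scale * f @ w.t()       # s * cos(theta)
        return F.cross_entropy(logits, labels)

# Usage with dummy features:
head = ScaledCosineSoftmax(feat_dim=128, num_classes=10000)
loss = head(torch.randn(32, 128), torch.randint(0, 10000, (32,)))
loss.backward()
```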
Another solution
Normalization is very common in metric learning, and metric learning losses do not seem to have this convergence problem. Popular metric learning loss functions:
- Contrastive Loss
- Triplet Loss
Metric learning has a sampling problem
When the number of training samples is huge, e.g. 1 million, there are on the order of 1M × 1M pairs to train on, so hard mining is usually needed. This is difficult to implement, and the hyperparameters are difficult to tune.
Re-formulate the metric learning loss
Normalized softmax: L = −(1/m) Σᵢ log [ exp(s·cos(θ_{yᵢ,i})) / Σⱼ exp(s·cos(θ_{j,i})) ], where cos(θ_{j,i}) = W̃ⱼᵀ f̃ᵢ and both Wⱼ and fᵢ are L2-normalized.
The Contrastive Loss and the Triplet Loss are reformulated in the same normalized form, with one sample in each pair/triplet replaced by a class weight vector Wᵢ.
Effect of Wᵢ
We call Wᵢ the “agent” of class i.
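To make the agent idea concrete, a rough PyTorch sketch of a contrastive loss computed against class agents rather than sampled feature pairs (the margin value and the squared-hinge form follow the standard contrastive loss and are my own choices; the paper’s reformulation may differ in details):

```python
import torch
import torch.nn.functional as F

def agent_contrastive_loss(features, agents, labels, margin=1.0):
    """Pull each normalized feature toward its own class agent W_{y_i},
    push it away from the other agents that fall inside a margin."""
    f = F.normalize(features, dim=1)          # (m, d)
    w = F.normalize(agents, dim=1)            # (C, d)
    d = torch.cdist(f, w)                     # pairwise Euclidean distances (m, C)

    pos = d.gather(1, labels.view(-1, 1))     # distance to the correct agent
    neg_mask = torch.ones_like(d).scatter_(1, labels.view(-1, 1), 0.0)

    pull = pos.pow(2).mean()
    push = (neg_mask * F.relu(margin - d).pow(2)).sum(1).mean()
    return pull + push

# Usage with dummy data:
loss = agent_contrastive_loss(torch.randn(32, 128),
                              torch.randn(1000, 128),
                              torch.randint(0, 1000, (32,)))
```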
Results
Results
Drawback
All the experiments are fine-tuned from other models trained with the softmax loss. When training from scratch, the performance is comparable with state-of-the-art works, but cannot beat them.
Figure: loss surface of the softmax cross-entropy loss.
Some recent progress
Classification and Metric Learning
This model is good for classification (>99% accuracy), but not good for metric learning.
Large-margin softmax
Liu W, Wen Y, Yu Z, et al., Large-Margin Softmax Loss for Convolutional Neural Networks, ICML 2016
Liu W, Wen Y, Yu Z, et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017
Classification loss for metric learning
If the average angular span of the classes is θ, the margin should be larger than θ to ensure that the intra-class angular distance stays smaller than the inter-class angular distance.
Liu W, Wen Y, Yu Z, et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017
Large margin can be achieved by tuning s
Large margin can be achieved by tuning s
Figures: softmax with a low scale vs. softmax with a high scale.
Set a smaller scale for the positive score
positive scale = s × 0.75, with s = 15.
LFW 6,000 pairs: 99.19% → 99.25%; LFW BLUFR: 95.83% → 96.49%.
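A sketch of how this asymmetric scaling could be applied to the logits (only the factor 0.75 and s = 15 come from the slide; the rest is a hypothetical implementation):

```python
import torch
import torch.nn.functional as F

def asymmetric_scaled_loss(features, weight, labels, s=15.0, pos_factor=0.75):
    """Scale the positive cosine by a smaller factor than the negatives.

    Negative logits use the full scale s; the logit of the ground-truth
    class uses s * pos_factor (0.75 as in the slide).
    """
    f = F.normalize(features, dim=1)
    w = F.normalize(weight, dim=1)
    cos = f @ w.t()                                          # (m, C) cosines

    scales = torch.full_like(cos, s)                         # scale s everywhere...
    scales.scatter_(1, labels.view(-1, 1), s * pos_factor)   # ...but s*0.75 for the target
    return F.cross_entropy(scales * cos, labels)

loss = asymmetric_scaled_loss(torch.randn(32, 128),
                              torch.randn(10000, 128),
                              torch.randint(0, 10000, (32,)))
```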