1
NormFace: L2 Hypersphere Embedding for Face Verification
Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille, NormFace: L2 Hypersphere Embedding for Face Verification, ACM MM 2017
2
Motivation: In DeepFace ("DeepFace: Closing the Gap to Human-Level Performance in Face Verification", Taigman et al., CVPR 2014), L2 normalization is applied only in the testing phase.
3
Training and Testing Pipeline
4
Preliminary Experiments
The normalization term is critical in the testing phase. Similarity is measured with cosine similarity (see the formula below). Note: Pretrained model from
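For reference, this is the standard cosine similarity between two feature vectors f_1 and f_2 (written out here for clarity; it is not spelled out in the transcript):

```latex
\[
\cos(f_1, f_2) \;=\; \frac{f_1^{\top} f_2}{\lVert f_1 \rVert_2 \, \lVert f_2 \rVert_2}
\]
```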
5
Why is normalization so effective?
A toy experiment on MNIST. Network: an 8-layer CNN with the feature dimension changed to 2. Each point corresponds to one 2D feature from the test set.
6
Angle is a good metric for verification
[Figures: a counter-example for Euclidean distance, and a counter-example for the inner product.]
7
Why is the distribution in this shape?
8
Softmax is soft-max. The argmax operation is scale invariant.
Softmax is the soft version of max.
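To make this concrete, here are the standard definitions (not taken from the slide): argmax is invariant to any positive scaling of the logits, while softmax is not, and it interpolates between a uniform distribution and a hard argmax as the scale grows.

```latex
\[
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
\arg\max_i\,(s\,z_i) = \arg\max_i\, z_i \quad \text{for any } s > 0,
\]
\[
\operatorname{softmax}(s\,z) \to \text{uniform as } s \to 0, \qquad
\operatorname{softmax}(s\,z) \to \text{one-hot at } \arg\max_i z_i \text{ as } s \to \infty.
\]
```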
9
The feature norm is related to recognizability
Figure credit: L2-constrained Softmax Loss for Discriminative Face Verification, Ranjan et al., arXiv 2017
10
Bias term: don't use a bias term in the inner-product layer before the softmax (see the sketch below).
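A minimal sketch of what this means in code (my own PyTorch-style illustration; the variable names and dimensions are assumptions, not from the paper):

```python
import torch.nn as nn

# The inner-product (fully connected) layer that produces the class scores
# fed into the softmax loss. Per the slide's advice, bias is disabled so that
# each logit is a pure inner product W_i^T f.
feat_dim, num_classes = 512, 10000
classifier = nn.Linear(feat_dim, num_classes, bias=False)
```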
11
Optimize cosine instead of inner-product
Normalization layer and its gradient (reconstructed below).
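A reconstruction of the L2-normalization layer and its backward pass (this is the standard derivation; the notation may differ slightly from the slide):

```latex
\[
\tilde{\mathbf{x}} = \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_2}, \qquad
\frac{\partial \tilde{\mathbf{x}}}{\partial \mathbf{x}}
  = \frac{1}{\lVert \mathbf{x} \rVert_2}\left(I - \tilde{\mathbf{x}}\,\tilde{\mathbf{x}}^{\top}\right), \qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{1}{\lVert \mathbf{x} \rVert_2}
    \left(\frac{\partial \mathcal{L}}{\partial \tilde{\mathbf{x}}}
          - \tilde{\mathbf{x}}\left\langle \tilde{\mathbf{x}},
            \frac{\partial \mathcal{L}}{\partial \tilde{\mathbf{x}}} \right\rangle\right).
\]
```

Note that the gradient passed back to x is orthogonal to the normalized feature, i.e. the layer only propagates the tangential component of the incoming gradient.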
12
It's not so easy. After using cosine to replace the inner-product layer, the network cannot converge. An extreme case with 1 positive class vs. 9,999 negative classes: because all cosine scores are squashed into [-1, 1], the softmax-loss gradient (w.r.t. the softmax activation) of an easy sample is approximately equal to that of a hard sample, which makes convergence difficult. In practice, the lowest loss reached is ~8.5 (initial loss: ~9.2). A numeric sketch follows.
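A small numeric illustration of this (my own sketch, not code from the paper): even in the unrealistically optimistic case where a sample scores +1 on its own class and -1 on every other class, the softmax probability of the correct class stays tiny, so easy and hard samples produce nearly identical gradients.

```python
import numpy as np

n_classes = 10000  # number of identities, as in the slide's example

def loss_and_prob(target_score, other_score):
    """Cross-entropy loss and correct-class probability when the target class
    scores `target_score` and the other n_classes - 1 classes score `other_score`."""
    logits = np.full(n_classes, other_score, dtype=np.float64)
    logits[0] = target_score
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[0]), probs[0]

print(loss_and_prob(+1.0, -1.0))  # "easiest" sample:  loss ~7.2,  p_correct ~7e-4
print(loss_and_prob(-1.0, +1.0))  # "hardest" sample:  loss ~11.2, p_correct ~1e-5
print(loss_and_prob(0.0, 0.0))    # all scores equal:  loss = ln(10000) ~9.21 (the initial loss)

# The gradient w.r.t. the target logit is (p_correct - 1), which is ~ -1 in every
# case above: the learning signal barely distinguishes easy from hard samples.
```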
13
Formal mathematics: the loss has a lower bound, which for 10,000 classes evaluates to 8.27, very close to the value observed in practice (~8.5). A reconstruction of the bound's form is given below.
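From memory of the NormFace paper, the bound for n classes with features and class weights normalized to norm ℓ has roughly the form below (the exact constants should be checked against the paper); for n = 10,000 and ℓ = 1 it evaluates to a value in the same ballpark (~8.2) as the number quoted on the slide:

```latex
\[
\mathcal{L}_{\text{softmax}} \;\ge\; \log\!\left(1 + (n-1)\,e^{-\frac{n}{n-1}\,\ell^{2}}\right).
\]
```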
14
Solution: add a scale parameter.
A similar solution is used in Batch Normalization, Weight Normalization, and Layer Normalization. The scale is learned as a parameter of the CNN (see the sketch below).
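A minimal sketch of such a layer (my own PyTorch-style illustration with assumed names and initialization; the paper's implementation may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineSoftmax(nn.Module):
    """Cosine logits with a learnable scale, trained with the usual
    cross-entropy (softmax) loss."""
    def __init__(self, feat_dim, num_classes, init_scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learned scale s

    def forward(self, features, labels):
        f = F.normalize(features, dim=1)      # ||f|| = 1
        w = F.normalize(self.weight, dim=1)   # ||W_i|| = 1
        cosine = F.linear(f, w)               # cos(theta_ij), in [-1, 1]
        logits = self.scale * cosine          # s * cos(theta)
        return F.cross_entropy(logits, labels)
```

With the scale s learned jointly with the network, the effective magnitude of the logits can grow, which loosens the lower bound discussed above and lets the loss keep decreasing.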
15
Another solution: normalization is very common in metric learning.
Metric-learning methods do not seem to have this convergence problem. Popular metric learning loss functions: Contrastive Loss and Triplet Loss (standard forms below).
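For reference, the two losses in one common formulation (standard definitions, not copied from the slide), where y_ij = 1 for a same-identity pair, (a, p, n) is an (anchor, positive, negative) triplet, and m is the margin:

```latex
\[
\mathcal{L}_{\text{contrastive}} =
  y_{ij}\,\lVert f_i - f_j \rVert_2^{2}
  \;+\; (1 - y_{ij})\,\max\!\bigl(0,\; m - \lVert f_i - f_j \rVert_2\bigr)^{2},
\]
\[
\mathcal{L}_{\text{triplet}} =
  \max\!\bigl(0,\; \lVert f_a - f_p \rVert_2^{2} - \lVert f_a - f_n \rVert_2^{2} + m\bigr).
\]
```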
16
Metric learning has a sampling problem
When the number of training samples is huge, e.g. 1 million, there are on the order of 1M × 1M candidate pairs to train on. Hard mining is usually needed, which is difficult to implement and whose hyperparameters are difficult to tune.
17
Re-formulate metric learning loss
Normalized Softmax (formula below), plus reformulated versions of the Contrastive Loss and the Triplet Loss in which one sample of each pair/triplet is replaced by a class weight vector W_i (the "agent", see the next slide).
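For reference, the normalized softmax loss written from the pieces introduced above (normalized features f̃_i, normalized class weights W̃_j, scale s); the notation may differ slightly from the paper:

```latex
\[
\mathcal{L}_{\text{n-softmax}} =
  -\frac{1}{M}\sum_{i=1}^{M}
   \log \frac{e^{\,s\,\tilde{W}_{y_i}^{\top}\tilde{f}_i}}
             {\sum_{j=1}^{n} e^{\,s\,\tilde{W}_{j}^{\top}\tilde{f}_i}}.
\]
```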
18
Effect of W_i. We call W_i the "agent" of class i.
19
Results
20
Results
21
Drawback: all the experiments are fine-tuned from other models (trained with the softmax loss). When training from scratch, the performance is comparable with state-of-the-art works but cannot beat them. [Figure: loss surface for the softmax cross-entropy loss.]
22
Some recent progress
23
Classification and Metric Learning
This model is good for classification (>99%), but not good for metric learning.
24
Large margin softmax. Liu W., Wen Y., Yu Z., et al., Large-Margin Softmax Loss for Convolutional Neural Networks, ICML 2016; Liu W., Wen Y., Yu Z., et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017. (A sketch of the idea follows.)
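The core idea, sketched from memory of those papers (the scale and the exact angular function differ between L-Softmax, SphereFace, and later variants): the target-class logit cos θ_{y_i} is replaced by a function that demands a larger angular margin, e.g.

```latex
\[
\mathcal{L}_i =
  -\log \frac{e^{\,s\,\cos(m\,\theta_{y_i})}}
             {e^{\,s\,\cos(m\,\theta_{y_i})} + \sum_{j \neq y_i} e^{\,s\,\cos\theta_j}},
  \qquad m > 1,
\]
```

where the original papers use the feature norm in place of a fixed scale s and replace cos(mθ) by a piecewise extension ψ(θ) so that the target term decreases monotonically in θ over [0, π].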
25
Classification loss for Metric Learning
If the average angular span of a class is θ, the margin should be larger than θ to ensure that the maximal intra-class angle is smaller than the minimal inter-class angle. (Liu W., Wen Y., Yu Z., et al., SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017.)
26
Large margin can be achieved by tuning s
27
Large margin can be achieved by tuning s
[Figures: softmax output at a low scale vs. at a high scale.]
28
Set a smaller scale for the positive score: positive scale = positive scale × 0.75, with s = 15 (see the sketch below).
LFW 6000 pairs: 99.19% -> 99.25%
LFW BLUFR: 95.83% -> 96.49%
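A sketch of what this does to the logits (my own illustration with assumed names, not the authors' code): the target-class cosine gets a smaller multiplier than the non-target ones, so a well-separated sample (positive target cosine) receives a lower target logit than it otherwise would, and the network has to push the target cosine higher to reach the same probability, which acts like a margin.

```python
import torch

def discounted_positive_logits(cosine, labels, s=15.0, positive_factor=0.75):
    """cosine: (batch, num_classes) tensor of cosine similarities in [-1, 1].
    Every class uses scale s except the target class, which uses s * positive_factor."""
    logits = s * cosine
    rows = torch.arange(cosine.size(0), device=cosine.device)
    logits[rows, labels] = s * positive_factor * cosine[rows, labels]
    return logits

# Usage in a training step (labels: (batch,) tensor of class indices):
#   loss = torch.nn.functional.cross_entropy(
#       discounted_positive_logits(cosine, labels), labels)
```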