
1 NormFace: L2 Hypersphere Embedding for Face Verification
Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille. NormFace: L2 Hypersphere Embedding for Face Verification. ACM MM 2017.

2 Motivation DeepFace: Closing the Gap to Human-Level Performance in Face Verification, Taigman et al., CVPR 2014. L2 normalization is applied only in the testing phase.

3 Training and Testing Pipeline

4 Preliminary Experiments
The normalization step is critical in the testing phase; features are compared by cosine similarity (a minimal sketch follows). Note: pretrained model from
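The cosine-similarity formula on the slide was an image; a minimal NumPy stand-in for the comparison step (function name and threshold comment are mine):

```python
import numpy as np

def cosine_similarity(f1, f2, eps=1e-12):
    """Cosine similarity between two feature vectors: <f1, f2> / (||f1|| * ||f2||)."""
    f1, f2 = np.asarray(f1, dtype=np.float64), np.asarray(f2, dtype=np.float64)
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + eps))

# Two face features are declared "same person" if their cosine similarity
# exceeds a threshold chosen on a validation set.
```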

5 Why is normalization so effective?
A toy experiment on MNIST: an 8-layer CNN with the feature dimension reduced to 2. Each point corresponds to one 2-D feature from the test set.

6 Angle is a good metric for verification
Counter-example for Euclidean distance; counter-example for the inner product (a numeric stand-in follows).
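The counter-examples on the slide were figures; a small numeric stand-in with made-up vectors makes the same point:

```python
import numpy as np

a = np.array([1.0, 0.0])        # short feature
b = np.array([10.0, 0.5])       # long feature pointing almost the same way
c = np.array([0.8, 0.8])        # feature at ~45 degrees

cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Euclidean distance calls a and b far apart (||a - b|| ~ 9) even though the
# angle between them is tiny, so same-identity features can look dissimilar.
print(np.linalg.norm(a - b), cos(a, b))

# The inner product rewards large norms: b scores higher with c than a scores
# with itself, even though the angle between b and c is much larger than zero.
print(a @ a, b @ c, cos(b, c))
```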

7 Why does the distribution have this shape?

8 Softmax is soft-max
Softmax is the soft version of the max (argmax) operation, and argmax is scale invariant: multiplying every logit by the same positive factor does not change which class wins, only how peaked the softmax probabilities are (see the sketch below).
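A tiny illustration of both claims, with toy logits of my own choosing:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])

# argmax is scale invariant: multiplying all logits by a positive constant
# never changes the winner ...
assert np.argmax(logits) == np.argmax(10 * logits) == np.argmax(0.1 * logits)

# ... but softmax is not: the scale controls how close it is to a hard max.
print(softmax(0.1 * logits))   # nearly uniform
print(softmax(logits))         # soft preference for class 0
print(softmax(10 * logits))    # almost a one-hot argmax
```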

9 Norm is related to recognizability
Figure credit: L2-constrained Softmax Loss for Discriminative Face Verification, Ranjan et al., arXiv 2017.

10 Bias term Do not use a bias term in the inner-product layer before the softmax.

11 Optimize cosine instead of inner-product
Normalization layer: x̃ = x / ||x||. Gradient: ∂L/∂x = (∂L/∂x̃ − x̃ (x̃ · ∂L/∂x̃)) / ||x|| (a NumPy sketch with a gradient check follows).
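The slide's formulas were images; below is a minimal NumPy sketch of the forward pass and of that gradient (this is just the standard derivative of L2 normalization, not code from the paper), with a finite-difference check:

```python
import numpy as np

def l2_normalize_forward(x, eps=1e-12):
    norm = np.sqrt(np.sum(x * x) + eps)
    return x / norm, norm

def l2_normalize_backward(grad_out, x_tilde, norm):
    # Derivative of x / ||x|| applied to the upstream gradient:
    # grad_in = (grad_out - x_tilde * <x_tilde, grad_out>) / ||x||
    return (grad_out - x_tilde * np.dot(x_tilde, grad_out)) / norm

# Finite-difference check on a random vector.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
g = rng.normal(size=5)                   # pretend upstream gradient
x_tilde, norm = l2_normalize_forward(x)
analytic = l2_normalize_backward(g, x_tilde, norm)

numeric = np.zeros_like(x)
h = 1e-6
for i in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[i] += h; xm[i] -= h
    numeric[i] = (np.dot(g, l2_normalize_forward(xp)[0]) -
                  np.dot(g, l2_normalize_forward(xm)[0])) / (2 * h)

print(np.max(np.abs(analytic - numeric)))   # ~1e-9
```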

12 It's not so easy
After replacing the inner-product layer with cosine, the network cannot converge. An extreme case: with 10,000 classes, look at the softmax loss gradient w.r.t. the softmax activations (9,999 negative classes vs. 1 target class). Because the cosine scores are bounded in [−1, 1], an easy sample's gradient is roughly the same as a hard sample's gradient, so the network is difficult to converge. In practice the lowest loss reached is ~8.5 (initial loss: ~9.2).

13 Formal mathematics The lower bound of the loss for 10,000 classes: 8.27.
Very close to the value observed in practice, ~8.5 (a quick numeric sanity check follows).
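A quick numeric sanity check, assuming the idealized configuration from the previous slide where the correct-class cosine is 1 and the 9,999 negative cosines sit near −1/(n−1) (my own simplification; the paper's exact bound is 8.27):

```python
import numpy as np

n = 10_000                          # number of classes
pos = 1.0                           # best possible cosine with the correct agent
neg = -1.0 / (n - 1)                # negative cosines when the agents are spread out

# Softmax cross-entropy with no scaling of the cosines.
logits = np.full(n, neg)
logits[0] = pos
loss = -np.log(np.exp(logits[0]) / np.exp(logits).sum())
print(loss)    # ~8.2: even a perfectly classified sample keeps a large loss
```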

14 Solution Add a scale parameter.
A similar solution is used in Batch Normalization, Weight Normalization, and Layer Normalization. The scale is learned as a parameter of the CNN (a sketch follows).
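A minimal sketch of this solution, assuming PyTorch (class and parameter names are mine, not the authors' code): normalize both the features and the class weights, then multiply the cosines by a learnable scale before the softmax loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, init_scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        # The scale is learned as an ordinary network parameter.
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, feats):
        # Cosine between each feature and each class weight ("agent"),
        # stretched by the learned scale so the softmax can saturate.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        return self.scale * cos

# Usage: logits = ScaledCosineClassifier(512, 10000)(features)
#        loss = F.cross_entropy(logits, labels)
```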

15 Another solution Normalization is very common in metric learning.
Metric learning methods do not seem to have this convergence problem. Popular metric learning loss functions: contrastive loss and triplet loss.

16 Metric learning has a sampling problem
When the number of training samples is huge, e.g. 1 million, there are on the order of 1M × 1M pairs to train on, so hard-example mining is usually needed. This is difficult to implement, and the hyperparameters are difficult to tune.

17 Re-formulate metric learning loss
Normalized softmax. Reformulated metric learning losses: contrastive loss and triplet loss.

18 Effect of W_i We call W_i the "agent" of class i (a rough sketch of the idea follows).
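A rough sketch of the agent idea, assuming PyTorch (my own paraphrase, not the paper's exact C-contrastive formulation): in a contrastive-style loss, the second sample of each pair is replaced by the class agent W_j, so no pair sampling is needed.

```python
import torch
import torch.nn.functional as F

def agent_contrastive_loss(feats, labels, agents, margin=1.0):
    """Contrastive-style loss between normalized features and class agents.

    feats:  (B, D) batch of features
    labels: (B,)   class indices
    agents: (C, D) one learnable agent vector per class (the softmax weights W_i)
    """
    f = F.normalize(feats, dim=1)
    w = F.normalize(agents, dim=1)
    dist = torch.cdist(f, w)                         # (B, C) Euclidean distances
    pos = dist[torch.arange(f.size(0)), labels]      # distance to own agent
    neg_mask = torch.ones_like(dist, dtype=torch.bool)
    neg_mask[torch.arange(f.size(0)), labels] = False
    neg = dist[neg_mask].view(f.size(0), -1)         # distances to other agents
    # Pull each feature toward its own agent, push it away from the others.
    return pos.pow(2).mean() + F.relu(margin - neg).pow(2).mean()
```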

19 Results

20 Results

21 Drawback All the experiments are fine-tuned from other models (trained with the softmax loss). When training from scratch, the performance is comparable with state-of-the-art works but cannot beat them. (Figure: loss surface of the softmax cross-entropy loss.)

22 Some recent progress

23 Classification and Metric Learning
This model is good for classification (>99%), but not good for metric learning.

24 Large margin softmax Liu W, Wen Y, Yu Z, et al. Large-Margin Softmax Loss for Convolutional Neural Networks, ICML 2016. Liu W, Wen Y, Yu Z, et al. SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017.
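A sketch of the multiplicative angular-margin idea behind these papers, assuming PyTorch; it omits the piecewise extension SphereFace uses to keep the modified target score monotonic, so it is only illustrative:

```python
import torch
import torch.nn.functional as F

def large_margin_logits(feats, weights, labels, m=4, scale=30.0):
    """Multiplicative angular margin: use cos(m*theta) for the target class only."""
    cos = F.linear(F.normalize(feats), F.normalize(weights))          # (B, C) cosines
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target_cos = torch.cos(m * theta.gather(1, labels.view(-1, 1)))   # harder target score
    logits = cos.clone()
    logits.scatter_(1, labels.view(-1, 1), target_cos)
    return scale * logits
```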

25 Classification loss for Metric Learning
If the average angular span of the classes is θ, the margin should be larger than θ to ensure that the maximum intra-class distance is smaller than the minimum inter-class distance. Liu W, Wen Y, Yu Z, et al. SphereFace: Deep Hypersphere Embedding for Face Recognition, CVPR 2017.

26 Large margin can be achieved by tuning s

27 Large margin can be achieved by tuning s
(Figure: softmax with a low scale vs. softmax with a high scale; a numeric illustration follows.)
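A numeric stand-in for the two panels, with toy cosines of my own choosing: at a low scale the loss stays large even for a well-separated sample, so the optimizer keeps enlarging the angular margin; at a high scale the loss saturates as soon as the sample is barely correct.

```python
import numpy as np

def softmax_loss(cosines, target, s):
    z = s * np.asarray(cosines, dtype=np.float64)
    z = z - z.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[target])

# One sample that is already classified correctly, but not by a huge margin.
cosines = [0.6, 0.3, 0.1, -0.2]
for s in (1, 4, 16, 64):
    print(s, softmax_loss(cosines, target=0, s=s))
# Low s: loss stays high and keeps pushing cos(theta_target) toward 1.
# High s: loss is already near 0, so no extra margin is encouraged.
```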

28 Set a smaller scale for the positive score
With s = 15, setting positive scale = positive scale * 0.75 improves LFW 6000 pairs from 99.19% to 99.25% and LFW BLUFR from 95.83% to 96.49% (a sketch of the modification follows).
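One possible way to implement the trick, assuming PyTorch (function name and shapes are mine): the target-class cosine is multiplied by s * 0.75 while the other classes keep the full scale s, so the loss keeps pushing the positive cosine higher.

```python
import torch
import torch.nn.functional as F

def asymmetric_scale_logits(feats, weights, labels, s=15.0, pos_factor=0.75):
    """Scale the target-class cosine by s * pos_factor and all others by s."""
    cos = F.linear(F.normalize(feats), F.normalize(weights))   # (B, C) cosines
    scales = torch.full_like(cos, s)
    scales.scatter_(1, labels.view(-1, 1), s * pos_factor)     # smaller scale on the positive
    return scales * cos

# Usage: loss = F.cross_entropy(asymmetric_scale_logits(f, W, y), y)
```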

