1
End-to-End Facial Alignment and Recognition
2
Introduction: increase in face recognition accuracy
3
End-to-End Facial Alignment and Recognition
4
Why do we need STN? Ideal images vs. real images
5
STN Architecture: SPN predicts the coefficients of an affine transformation
6
Trying different localization architectures
7
STN Architecture: SPN predicts the coefficients of an affine transformation
8
Parameterized Sampling grid
The grid generator's job is to output a parametrised sampling grid: the set of points at which the input map should be sampled to produce the desired transformed output. The column vector (x_in, y_in) holds the coordinates that tell us where to sample the input. To compute a pixel value in the output image, take the value of the input image at the corresponding location.
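The grid generator can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' implementation: the function name `affine_grid`, the use of normalized coordinates in [-1, 1], and the 2x3 layout of `theta` are assumptions made for the example.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Return the input coordinates to sample for every output pixel.

    theta is a (2, 3) affine matrix acting on normalized coordinates
    in [-1, 1] (an assumed convention for this sketch).
    Returns (xs, ys), each of shape (out_h, out_w).
    """
    theta = np.asarray(theta, dtype=np.float64).reshape(2, 3)
    # Regular grid over the output image, in normalized coordinates.
    xt, yt = np.meshgrid(np.linspace(-1.0, 1.0, out_w),
                         np.linspace(-1.0, 1.0, out_h))
    # Homogeneous output coordinates, shape (3, out_h * out_w).
    target = np.stack([xt.ravel(), yt.ravel(), np.ones(xt.size)])
    # Apply the affine map to get the input sampling locations.
    source = theta @ target
    xs = source[0].reshape(out_h, out_w)
    ys = source[1].reshape(out_h, out_w)
    return xs, ys
```

With `theta = [[0.5, 0, 0], [0, 0.5, 0]]`, for instance, the sampling grid covers only the central half of the input in each direction, so the output is a zoomed-in crop.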
9
Spatial Transformer Network
10
Identity transformation
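As a worked special case, the identity transformation can be written out explicitly. The notation below (superscript s for source/input coordinates, t for target/output coordinates) follows Jaderberg et al. and is an assumption about what the slide depicts.

```latex
\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix}
= A_{\theta} \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix},
\qquad
A_{\theta} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix}
= \begin{pmatrix} x^{t} \\ y^{t} \end{pmatrix}
```

Every output pixel samples the input at its own location, so the output is an unchanged copy of the input.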
11
STN Architecture: SPN predicts the coefficients of an affine transformation
12
Bilinear Interpolation
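Bilinear interpolation at real-valued sampling locations might look like the sketch below. The function name `bilinear_sample`, the use of pixel (not normalized) coordinates, and the clamping of out-of-range coordinates to the image border are assumptions made for this example.

```python
import numpy as np

def bilinear_sample(image, xs, ys):
    """Sample a 2-D image at real-valued pixel coordinates (xs, ys)."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    # Assumption: coordinates outside the image are clamped to the border.
    xs = np.clip(np.asarray(xs, dtype=np.float64), 0, w - 1)
    ys = np.clip(np.asarray(ys, dtype=np.float64), 0, h - 1)
    # Integer corners surrounding each sampling location.
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as interpolation weights.
    wx = xs - x0
    wy = ys - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

Each output value is a weighted average of the four nearest input pixels, which is what makes the sampling step differentiable with respect to the sampling coordinates almost everywhere.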
13
Differential Gradient
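For reference, the bilinear sampling kernel and its (sub-)gradients as given by Jaderberg et al. are reproduced below; whether the slide shows exactly these expressions is an assumption. Here U is the input feature map of size H x W, V_i is the i-th output value, and (x_i^s, y_i^s) is its sampling location.

```latex
V_i^{c} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c}\,
          \max(0,\, 1 - |x_i^{s} - m|)\, \max(0,\, 1 - |y_i^{s} - n|)

\frac{\partial V_i^{c}}{\partial U_{nm}^{c}}
  = \max(0,\, 1 - |x_i^{s} - m|)\, \max(0,\, 1 - |y_i^{s} - n|)

\frac{\partial V_i^{c}}{\partial x_i^{s}}
  = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c}\, \max(0,\, 1 - |y_i^{s} - n|)
    \begin{cases}
      0  & \text{if } |m - x_i^{s}| \ge 1 \\
      1  & \text{if } m \ge x_i^{s} \\
      -1 & \text{if } m < x_i^{s}
    \end{cases}
```

The derivative with respect to y_i^s is analogous. Because these gradients exist, the loss can be backpropagated through the sampler to the predicted transformation parameters.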
14
STN Result
15
STN Result
18
Recognition: ResNet with 9 residual blocks, 24 convolution layers in total, 512-dimensional output feature vector
19
Results
21
End-to-End Spatial Transform Face Detection and Recognition
22
Architecture: region feature transformation; align the detected faces
23
Detection: similar to Faster R-CNN. VGG-16 (pre-trained on ImageNet), Region Proposal Network, ROI Pooling, Spatial Transformer Network
24
Faster RCNN
25
Region Proposal
26
ROI Pooling
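ROI pooling turns a variable-sized region of the feature map into a fixed-size output by max-pooling over a coarse grid of bins. The sketch below illustrates the idea for a single-channel map; the function name `roi_max_pool`, the end-exclusive box convention, and the bin-splitting rule are assumptions for the example, and the region is assumed to be at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool one region of a 2-D feature map to a fixed output size.

    roi is (x0, y0, x1, y1) in feature-map cells, end-exclusive.
    Assumes the region has at least out_size cells in each dimension.
    """
    x0, y0, x1, y1 = roi
    region = np.asarray(feature_map)[y0:y1, x0:x1]
    out_h, out_w = out_size
    # Bin edges splitting the region into out_h x out_w roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(out_size, dtype=region.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Keep the maximum activation inside each bin.
            rows = slice(row_edges[i], row_edges[i + 1])
            cols = slice(col_edges[j], col_edges[j + 1])
            pooled[i, j] = region[rows, cols].max()
    return pooled
```

Regions of different sizes all yield the same output shape, which lets the later fully connected or transformer stages operate on every detected face proposal uniformly.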
27
Architecture: region feature transformation; align the detected faces
28
STN Architecture: SPN predicts the coefficients of an affine transformation
29
Recognition: another STN is added before the recognition part. ResNet: 9 residual blocks, 24 convolution layers in total, 512-dimensional output feature vector
30
Results
31
References
Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in Neural Information Processing Systems.
Chi, Liying, Hongxin Zhang, and Mingxiu Chen. "End-To-End Face Detection and Recognition." arXiv preprint arXiv: (2017).