End-to-End Facial Alignment and Recognition
Introduction: increase in face recognition accuracy
End-to-End Facial Alignment and Recognition
Why do we need an STN? Ideal images vs. real images
STN Architecture: the SPN predicts the coefficients of an affine transformation
Trying different localization architectures
Parameterized Sampling Grid: The grid generator outputs a parameterized sampling grid, that is, the set of points at which the input map should be sampled to produce the desired transformed output. The column vector (xin, yin) holds the coordinates that tell us where to sample the input. To compute a pixel value in the output image, take the value in the input image at the corresponding location.
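The grid generator described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation used in either paper. The function name `affine_grid`, the 2x3 layout of `theta`, and the choice of normalized coordinates in [-1, 1] are assumptions made for the example.

```python
def affine_grid(theta, out_h, out_w):
    """Map each output pixel to the input location it should be sampled from.

    theta: 2x3 affine matrix (nested lists) acting on normalized
    coordinates in [-1, 1] (an assumption of this sketch).
    Returns an out_h x out_w nested list of (x_in, y_in) tuples.
    """
    def normalized(i, n):
        # Spread n pixel centres evenly over [-1, 1]; a single pixel sits at 0.
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)

    (a, b, tx), (c, d, ty) = theta
    grid = []
    for row in range(out_h):
        y_out = normalized(row, out_h)
        grid_row = []
        for col in range(out_w):
            x_out = normalized(col, out_w)
            # Affine map: [x_in, y_in]^T = theta @ [x_out, y_out, 1]^T
            grid_row.append((a * x_out + b * y_out + tx,
                             c * x_out + d * y_out + ty))
        grid.append(grid_row)
    return grid
```

For example, `affine_grid([[0.5, 0, 0.25], [0, 0.5, 0]], 2, 2)` yields sampling points covering only half of the input width and height, shifted to the right, which corresponds to a zoomed-in crop.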
Spatial Transformer Network
Identity transformation
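As a worked case, the identity transformation corresponds to the affine parameters below. The symbols x^{out}, y^{out} for the output grid coordinates are notation assumed here for illustration. Every output pixel samples the input at its own location, so the STN leaves the image unchanged.

```latex
\begin{pmatrix} x^{in} \\ y^{in} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x^{out} \\ y^{out} \\ 1 \end{pmatrix}
=
\begin{pmatrix} x^{out} \\ y^{out} \end{pmatrix}
```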
Bilinear Interpolation
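A minimal sketch of bilinear interpolation for a single sampling point, in plain Python. It assumes coordinates are given in pixel units (not normalized) and that points outside the image are clamped to the border. The function name `bilinear_sample` and both conventions are choices made for this example.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2-D image (list of rows) at real-valued pixel coordinates (x, y).

    Assumption of this sketch: out-of-range points are clamped to the border.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)

    # The four integer neighbours surrounding (x, y).
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)

    # Fractional offsets act as interpolation weights.
    wx, wy = x - x0, y - y0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom
```

Sampling the 2x2 image `[[0, 10], [20, 30]]` at its centre `(0.5, 0.5)` blends all four pixels equally and returns `15.0`.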
Differential Gradient
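The reason the sampler can be trained end to end is that bilinear sampling is (sub-)differentiable with respect to both the input values and the sampling coordinates. The equations below follow the formulation in Jaderberg et al. (2015). Here U is the input map, V the output map, and (x_i^{in}, y_i^{in}) the sampling point for output pixel i. The index names are notation assumed for this sketch.

```latex
V_i = \sum_{n}\sum_{m} U_{nm}\,
      \max\bigl(0,\, 1 - |x_i^{in} - m|\bigr)\,
      \max\bigl(0,\, 1 - |y_i^{in} - n|\bigr)

\frac{\partial V_i}{\partial U_{nm}}
  = \max\bigl(0,\, 1 - |x_i^{in} - m|\bigr)\,
    \max\bigl(0,\, 1 - |y_i^{in} - n|\bigr)

\frac{\partial V_i}{\partial x_i^{in}}
  = \sum_{n}\sum_{m} U_{nm}\,
    \max\bigl(0,\, 1 - |y_i^{in} - n|\bigr)
    \begin{cases}
      0  & \text{if } |m - x_i^{in}| \ge 1 \\
      1  & \text{if } m \ge x_i^{in} \\
      -1 & \text{if } m < x_i^{in}
    \end{cases}
```

The derivative with respect to y_i^{in} is analogous, and the chain rule through the affine grid then carries the gradient back to the predicted transformation coefficients.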
STN Result
Recognition: ResNet with 9 residual blocks; 24 convolution layers in total; 512-dimensional output feature vector
Results
End-to-End Spatial Transform Face Detection and Recognition
Architecture: region feature transformation; align the detected faces
Detection: similar to Faster R-CNN; VGG-16 (pre-trained on ImageNet); Region Proposal Network; ROI Pooling; Spatial Transformer Network
Faster R-CNN
Region Proposal
ROI Pooling
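ROI pooling turns a variable-sized region of the feature map into a fixed-size output by splitting the region into bins and taking the maximum in each bin. The sketch below is a plain-Python illustration under assumed conventions: a single-channel feature map, an inclusive `(x0, y0, x1, y1)` box in feature-map cells, and floor/ceil bin boundaries. It is not the code used in the paper.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map to a fixed out_h x out_w size.

    roi: (x0, y0, x1, y1), inclusive, in feature-map cells (assumed convention).
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1

    def bin_range(index, roi_size, n_bins, offset):
        # Floor for the start and ceil for the end, so the bins cover the
        # whole region and no bin is ever empty.
        start = (index * roi_size) // n_bins
        end = -(-((index + 1) * roi_size) // n_bins)
        return offset + start, offset + end

    pooled = []
    for i in range(out_h):
        r0, r1 = bin_range(i, roi_h, out_h, y0)
        row = []
        for j in range(out_w):
            c0, c1 = bin_range(j, roi_w, out_w, x0)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Because the output size is fixed regardless of the proposal's size, every detected region can be fed to the same downstream layers.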
Recognition: another STN is added before the recognition part. ResNet: 9 residual blocks; 24 convolution layers in total; 512-dimensional output feature vector
Results
References
Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in Neural Information Processing Systems. 2015.
Chi, Liying, Hongxin Zhang, and Mingxiu Chen. "End-To-End Face Detection and Recognition." arXiv preprint arXiv:1703.10818 (2017).