Dictionary Representation of Deep Features for Robust Face Recognition Feng Cen
Outline Recent advances in face recognition (FR) Our research work on occluded FR
Face Recognition: applications
Biometrics / access control
  No action required
  Scan many people at once
  Places: airports, banks, safes
  Data: laptops, medical info
Searching mugshot databases
Tagging photo albums
Detecting fake ID cards
Identifying TV shows
…
Notes: A face recognition system is a computer application capable of identifying or verifying a person from a digital image or a frame from a video source. One way to do this is to compare selected facial features from the image against a face database. It is typically used in security systems and can be compared to other biometrics such as fingerprint or iris recognition systems. Recently, it has also become popular as a commercial identification and marketing tool.
Identifying TV shows: One of a number of apps aiming to be the "Shazam for TV", TVtak identifies the TV show you are watching when you point your iPhone's camera at the screen. Within one second, it works out exactly which show or ad you are watching. From there, users can share details of the show via Twitter or Facebook, with a comment attached. The Israeli startup behind it plans to let advertisers use it as a "call to action" too: while watching an ad for a new snack, taking a shot of the screen with TVtak could lead to a voucher for a free sample. Still in beta and only available in Israel, TVtak's wider rollout could be slowed by its reliance on server-side monitoring of the output of multiple TV stations for fast matching.
Gaming: Image and face recognition is bringing a whole new dimension to gaming. The advanced motion-sensing capabilities of Microsoft's Kinect have given the Xbox 360 a new lease of life and opened up gaming to new audiences by doing away with hardware controllers. Meanwhile, the startup Viewdle recently launched a game that uses face recognition to decide whether you are a human or a vampire, setting the stage for a battle between the two species. Many more examples of face recognition in games are likely to follow.
Humans:
  "Built-in" face detection / recognition ability
  Detection and recognition happen in different areas of the brain
  Can be fooled by look-alikes
Computers:
  Algorithms must be built from scratch
  Virtually perfect memory
  Can work 24/7 without degrading performance
  Can apply stricter matching criteria
Face Recognition Pipeline Detection Alignment Recognition
Two Types of Comparison in Face Recognition
1. Verification (1:1): the system compares the given individual with the identity that individual claims.
2. Identification (1:N): the system compares the given individual with all other individuals in the database and returns a ranked list of matches.
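A minimal sketch of the two comparison modes, assuming faces have already been encoded as feature vectors and are compared by cosine similarity. The function names, the toy gallery, and the 0.5 threshold are illustrative assumptions, not taken from any particular system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed_template, threshold=0.5):
    """1:1 comparison: accept or reject one claimed identity.
    The threshold is a hypothetical value; real systems tune it on data."""
    return cosine_similarity(probe, claimed_template) >= threshold

def identify(probe, gallery):
    """1:N comparison: rank every enrolled identity by similarity to the probe."""
    scores = [(name, cosine_similarity(probe, feat)) for name, feat in gallery.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-D "features"; real deep features have hundreds or thousands of dimensions.
gallery = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.0, 0.0, 1.0],
}
probe = [0.9, 0.1, 0.0]

is_alice = verify(probe, gallery["alice"])  # 1:1 decision
ranked = identify(probe, gallery)           # 1:N ranked list
```

Verification returns a single yes/no decision, while identification returns the whole ranked list, so its cost grows with the size of the database.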
Conventional Image-based FR
Labeled Faces in the Wild (LFW) 13,233 face images 5,749 people
Deep Learning and Face Recognition CVPR 2014: DeepFace, DeepID Now: Deep learning achieves 99.80% face verification accuracy on Labeled Faces in the Wild (LFW), higher than human performance
Convolutional Neural Networks (CNN)
– First proposed by Fukushima in 1980
– Improved by LeCun, Bottou, Bengio and Haffner in 1998
CNNs are essentially layers of convolutions followed by subsampling and dense layers. Intuitively, the convolution and subsampling layers work as feature extraction layers, while a dense layer uses the extracted features to decide which category the current input belongs to.
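The layer roles described above can be sketched in a few lines of NumPy. This is a toy forward pass with random, untrained weights; the ReLU non-linearity, the 8x8 input, the 3x3 kernel, and the 4 output classes are assumptions chosen only for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the operation CNN libraries call convolution)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a block are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))             # stand-in for a grayscale input
kernel = rng.standard_normal((3, 3))   # one untrained convolution filter

# Feature extraction: convolution -> ReLU -> subsampling.
features = max_pool(np.maximum(conv2d(image, kernel), 0.0))

# Classification: a dense layer maps the flattened features to class probabilities.
weights = rng.standard_normal((4, features.size))
probs = softmax(weights @ features.ravel())
```

The 8x8 input shrinks to 6x6 after the valid convolution and to 3x3 after pooling, so the dense layer sees 9 features.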
Popular CNN Architectures AlexNet (2012) VGG (2014) 3x3 convolution
Popular CNN architectures GoogLeNet (2014) 22 layers ResNet (2015) 152 layers
CNN-based FR DeepFace Alignment: 2D, 3D Input: RGB image 152x152 Output feature size: 4096 Parameters: ~ 120 million Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In CVPR, 2014
CNN-based FR DeepID Alignment: 2D For each patch: Input: 39x31 RGB or grayscale Output feature size: 160 Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.
CNN-based FR VGG Face (2015) FaceNet (Google 2015)
[Figure: network architecture diagram; layer labels: image, Conv-64, maxpool, Conv-128, Conv-256, Conv-512, fc-4096, fc-2622, Softmax]
OpenFace https://cmusatyalab.github.io/openface/
OpenFace
What makes deep learning successful in computer vision?
Comparison of CNN-based FR
Method      #Training images   Acc. on LFW
DeepFace    4M                 97.35%
VGG Face    2.6M               98.95%
FaceNet     200M               99.65%
Face Datasets
Dataset         #Subjects   #Images     Availability
LFW             5,749       13,233      Public
CACD            2,000       163,446
CASIA-WebFace   10,575      494,414
MegaFace        672,057     4,753,520
MS-Celeb-1M     100k        10M         Public
Is Face Recognition Solved? Performance of Face++: 99.50% on LFW. Not good enough on a Chinese identification task: 66% TPR at an FPR of 10^-5. “Results show that 90% failed cases can be solved by human. There still exists a big gap between machine recognition and human level.”
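To make the TPR/FPR operating point concrete, here is a hedged sketch of how a true positive rate at a fixed false positive rate can be measured from similarity scores. The function name and the toy scores are hypothetical; a real evaluation at an FPR of 10^-5 needs at least 10^5 impostor comparisons, otherwise no impostor acceptance is allowed at all.

```python
def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Pick the threshold that keeps the FPR at or below target_fpr,
    then report the fraction of genuine pairs accepted at that threshold."""
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we may accept by mistake.
    allowed = int(target_fpr * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")      # every pair is accepted
    else:
        threshold = impostors[allowed]
    # A pair is accepted only if its score is strictly above the threshold,
    # so at most `allowed` impostor pairs pass.
    accepted = sum(score > threshold for score in genuine_scores)
    return threshold, accepted / len(genuine_scores)

# Toy similarity scores: higher means "more likely the same person".
impostor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
genuine = [0.85, 0.92, 0.97, 0.6]

threshold, tpr = tpr_at_fpr(genuine, impostor, target_fpr=0.1)
```

Tightening the allowed FPR raises the threshold, which is why a system with very high accuracy on a balanced benchmark can still reject many genuine users at a strict operating point.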
Is Face Recognition Solved? How well do current face recognition algorithms scale? Is the size of the training data important? How does age affect recognition performance? How do pose and corruption affect recognition performance? … (Kemelmacher-Shlizerman et al 2016) (MS-Celeb-1M 2016 challenge)
Outline Recent advances in face recognition (FR) Our research work on occluded FR
Motivation
Deep convolutional neural networks:
  Outperform human vision for face verification on the LFW database
  Fail to handle contiguous occlusion
Sparse representation classifier:
  Classical method for face images with occlusion
  Operates in image space or a linear feature space
  Has difficulty with pose variations, facial expressions, illumination changes, etc.
Training dictionary:
Observation
Assumption
Algorithm Training Testing
Algorithm Residual:
Algorithm
Dimension reduction with PCA
Normalization of the dictionary atoms
Normalization of the residual with the l2-norm of the gallery coding coefficients
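The normalization steps listed above can be illustrated with a small NumPy sketch. It is not the exact formulation from the talk, whose objective is not given in this text: the ridge (squared l2) regularizer, the joint coding over a gallery dictionary plus an auxiliary dictionary, the value of lam, and all function names are assumptions. The features are assumed to be PCA-reduced already.

```python
import numpy as np

def unit_columns(matrix):
    """Scale every dictionary atom (column) to unit l2-norm."""
    return matrix / np.linalg.norm(matrix, axis=0, keepdims=True)

def classify(y, gallery, labels, auxiliary, lam=1e-3):
    """Code y over [gallery, auxiliary] and pick the class whose atoms
    leave the smallest residual, normalized by that class's coefficient norm."""
    labels = np.asarray(labels)
    dictionary = np.hstack([gallery, auxiliary])
    n_atoms = dictionary.shape[1]
    # Assumed regularizer: ridge-regularized least squares (closed form).
    coeffs = np.linalg.solve(
        dictionary.T @ dictionary + lam * np.eye(n_atoms), dictionary.T @ y
    )
    x = coeffs[:gallery.shape[1]]   # gallery coding coefficients
    b = coeffs[gallery.shape[1]:]   # auxiliary (occlusion) coefficients
    scores = {}
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - gallery[:, mask] @ x[mask] - auxiliary @ b)
        # Normalize the residual by the l2-norm of this class's gallery coefficients.
        scores[int(c)] = float(residual / (np.linalg.norm(x[mask]) + 1e-12))
    return min(scores, key=scores.get), scores

# Toy 3-D features: two gallery atoms per subject, one auxiliary atom that
# stands in for the feature change caused by occlusion.
gallery = unit_columns(np.array([[1.0, 0.9, 0.0, 0.1],
                                 [0.0, 0.1, 1.0, 0.9],
                                 [0.0, 0.0, 0.0, 0.0]]))
labels = [0, 0, 1, 1]
auxiliary = unit_columns(np.array([[0.0], [0.0], [1.0]]))

# Probe: a subject-1 feature shifted along the occlusion direction.
probe = np.array([0.0, 1.0, 0.8])
predicted, scores = classify(probe, gallery, labels, auxiliary)
```

In this toy case the auxiliary atom absorbs the occlusion component, so the residual for subject 1 is small while its coefficient norm is large, and the normalized score favors subject 1.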
Experiments: AR Database Parameters Auxiliary dictionary
Experiments: AR Database Auxiliary dictionary generation
Experiments: AR Database Performance
Experiments: AR Database A single training sample per person
Experiments: FERET database Training: 150 subjects, non-occlusion ‘ba’, ‘bj’, ‘bk’ Testing: 150 subjects, block occlusion Auxiliary dictionary: other 44 subjects
Time consumption Less than 0.4s per image – Intel i7 CPU Dictionary coding: <2ms CNNs: <0.4s without GPU acceleration
Thank you! Q&A