Dictionary Representation of Deep Features for Robust Face Recognition


1 Dictionary Representation of Deep Features for Robust Face Recognition
Feng Cen

2 Outline
Recent advances in face recognition (FR)
Our research work on occluded FR

3 Face Recognition: applications
Biometrics / access control
– No action required
– Scan many people at once
– Places: airports, banks, safes
– Data: laptops, medical info
Searching mugshot databases
Tagging photo albums
Detecting fake ID cards
Identifying TV shows
Gaming

A face recognition system is a computer application capable of identifying or verifying a person from a digital image or a video frame from a video source. One way to do this is by comparing selected facial features from the image with a face database. It is typically used in security systems and can be compared to other biometrics such as fingerprint or iris recognition systems. Recently, it has also become popular as a commercial identification and marketing tool.

Identifying TV shows: One of a number of apps aiming to be ‘Shazam for TV’, TVtak identifies the TV show you’re watching when you point your iPhone’s camera at the screen. Within one second, it works out exactly which show or ad you are watching. From there, users can share details of the show they’re viewing via Twitter or Facebook, with a comment attached. The Israeli startup behind it plans to allow advertisers to use it as a ‘call to action’, too: you could be watching an ad for a new snack, and taking a shot of the screen with TVtak could then take you to a voucher entitling you to a free sample. Still in beta and only available in Israel, TVtak’s further rollout could be slowed by its reliance on server-side monitoring of the output of multiple TV stations for fast matching.

Gaming: Image and face recognition are bringing a whole new dimension to gaming. The advanced motion-sensing capabilities of Microsoft’s Kinect have given the Xbox 360 a whole new lease of life and opened up gaming to new audiences by completely doing away with hardware controllers. Meanwhile, startup Viewdle recently launched a game that uses face recognition to decide whether you’re a human or a vampire, setting the stage for a battle between the two species. We’re sure to see many more examples of face recognition in games in the future, with all kinds of interesting possibilities.

Humans:
– "Built-in" face detection / recognition ability
– Detection and recognition in different areas of the brain
– Can be fooled by look-alikes
Computers:
– Algorithms must be built from scratch
– Virtually perfect memory
– Can work 24/7 without degrading performance
– Can apply stricter matching criteria

4 Face Recognition Pipeline
Detection Alignment Recognition

5 Two Types of Comparison in Face Recognition
1. Verification (1:1): The system compares the given individual with the identity that individual claims.
2. Identification (1:N): The system compares a given individual to all the other individuals in the database and gives a ranked list of matches.
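
The difference between the two comparison types can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the cosine-similarity measure, the threshold of 0.8, the function names `verify` and `identify`, and the toy three-dimensional feature vectors are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed_template, threshold=0.8):
    """1:1 verification: accept if the probe matches the claimed identity.

    The threshold is an arbitrary value chosen for this example.
    """
    return cosine_similarity(probe, claimed_template) >= threshold

def identify(probe, gallery):
    """1:N identification: rank every gallery identity by similarity."""
    scores = [(name, cosine_similarity(probe, feat)) for name, feat in gallery.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy, made-up feature vectors (not real face features).
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.9, 0.1],
    "carol": [0.0, 0.2, 0.9],
}
probe = [0.8, 0.2, 0.1]

print(verify(probe, gallery["alice"]))                 # claim "alice"
print([name for name, _ in identify(probe, gallery)])  # ranked candidate list
```

Verification needs a single comparison and a decision threshold, while identification needs one comparison per gallery entry and returns a ranking.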

6 Conventional Image-based FR

7 Labeled Faces in the Wild (LFW)
13,233 face images
5,749 people

8

9 Deep Learning and Face Recognition
CVPR 2014: DeepFace, DeepID
Now: Deep learning achieves 99.80% face verification accuracy on Labeled Faces in the Wild (LFW), higher than human performance

10 Convolutional Neural Networks (CNN)
– First proposed by Fukushima in 1980
– Improved by LeCun, Bottou, Bengio and Haffner in 1998
CNNs are essentially layers of convolutions followed by subsampling and dense layers. Intuitively, the convolution and subsampling layers act as feature extraction layers, while a dense layer uses the extracted features to classify which category the current input belongs to.
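
The division of labor between feature extraction and classification can be made concrete with a tiny forward pass. This is a minimal sketch using only the Python standard library. The 5x5 input, the hand-picked edge kernel, and the untrained dense weights are made up for illustration, and real networks learn these values and stack many such layers.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise non-linearity: negative responses become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Subsampling: keep the maximum of each non-overlapping 2x2 block."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def dense(features, weights, biases):
    """Fully connected layer: one score per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

# Hypothetical 5x5 input with a vertical edge, and a hand-picked edge kernel.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 1], [-1, 1]]

fmap = max_pool2x2(relu(conv2d(image, kernel)))  # feature extraction
features = [v for row in fmap for v in row]      # flatten to a vector
weights = [[1, 0, 1, 0], [0, 1, 0, 1]]           # made-up, untrained weights
scores = dense(features, weights, [0.0, 0.0])    # classification
predicted_class = scores.index(max(scores))
print(fmap, scores, predicted_class)
```

The convolution responds only where the edge is, pooling halves the spatial resolution, and the dense layer turns the remaining four numbers into class scores.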

11 Popular CNN Architectures
AlexNet (2012)
VGG (2014): 3x3 convolution

12 Popular CNN Architectures
GoogLeNet (2014): 22 layers
ResNet (2015): 152 layers

13 CNN-based FR: DeepFace
Alignment: 2D, 3D
Input: RGB image, 152x152
Output feature size: 4096
Parameters: ~120 million
Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.

14 CNN-based FR: DeepID
Alignment: 2D
For each patch:
– Input: 39x31, RGB or grayscale
– Output feature size: 160
Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.

15 CNN-based FR
VGG Face (2015)
FaceNet (Google 2015)
[Figure: network architecture; layer labels: image, Conv-64, Conv-128, Conv-256, Conv-512, maxpool, fc-4096, fc-2622, Softmax]

16 OpenFace

17 OpenFace

18 What makes deep learning successful in computer vision?

19 Comparison of CNN-based FR
Method     #Training images   Acc. on LFW
DeepFace   4M                 97.35%
VGG Face   2.6M               98.95%
FaceNet    200M               99.65%

20 Face Datasets
Dataset         #Subjects   #Images     Availability
LFW             5,749       13,233      Public
CACD            2,000       163,446
CASIA-WebFace   10,575      494,414
MegaFace        672,057     4,753,520
MS-Celeb-1M     100k        10M         Public

21 Is Face Recognition Solved?
Performance of Face++:
– 99.50% on LFW
– Not good enough on a Chinese identification task: 66% TPR at 10^-5 FPR
“Results show that 90% failed cases can be solved by human. There still exists a big gap between machine recognition and human level.”
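
An operating point such as "TPR at a fixed FPR" can be computed from two lists of similarity scores: genuine pairs (same person) and impostor pairs (different people). The sketch below is one straightforward way to do it and is not taken from the Face++ evaluation. The function name `tpr_at_fpr` and the toy scores are assumptions for illustration.

```python
def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Return (threshold, TPR) for a threshold whose FPR does not exceed target_fpr.

    A pair is accepted when its score is strictly greater than the threshold.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we are allowed to accept wrongly (rounded down).
    allowed = int(target_fpr * len(impostors))
    # Accepting only scores above the (allowed + 1)-th highest impostor score
    # lets through at most `allowed` impostor pairs.
    threshold = impostors[allowed] if allowed < len(impostors) else float("-inf")
    accepted = sum(1 for s in genuine_scores if s > threshold)
    return threshold, accepted / len(genuine_scores)

# Toy, made-up similarity scores.
genuine = [0.9, 0.8, 0.6, 0.3]
impostor = [0.7, 0.4, 0.2, 0.1]

print(tpr_at_fpr(genuine, impostor, target_fpr=0.25))
print(tpr_at_fpr(genuine, impostor, target_fpr=0.0))
```

Measuring an FPR as low as 10^-5 in this way requires at least 10^5 impostor comparisons, which is one reason small benchmarks such as LFW cannot expose this kind of failure.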

22 Is Face Recognition Solved?
– How well do current face recognition algorithms scale?
– Is the size of the training data important?
– How does age affect recognition performance?
– How do pose and corruption affect recognition performance?
(Kemelmacher-Shlizerman et al. 2016) (MS-Celeb-1M 2016 challenge)

23 Outline
Recent advances in face recognition (FR)
Our research work on occluded FR

24 Motivation
Deep convolutional neural networks:
– Outperform human vision for face verification on the LFW database
– Fail to handle contiguous occlusion
Sparse representation classifier:
– Classical method for face images with occlusion
– Image space or linear feature space
– Difficult to deal with pose variations, facial expressions, illumination changes, etc.
Training dictionary:

25 Observation

26

27 Assumption

28 Algorithm Training Testing

29 Algorithm Residual:

30 Algorithm
– Dimension reduction with PCA
– Normalization of the dictionary atoms
– Normalization of the residual with the l2-norm of the gallery coding coefficients
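
The transcript does not preserve the formulas behind these steps, so the following is only a sketch of one plausible reading, not the author's implementation. It assumes a dictionary made of gallery atoms (one or more feature vectors per person) plus auxiliary atoms, codes the probe with ridge-regularized least squares as a simple stand-in for whatever solver the talk used, and scores each class by its reconstruction residual divided by the l2-norm of that class's gallery coefficients. The PCA step is omitted for brevity, and the names `classify`, `ridge_code`, `lam`, and the toy vectors are hypothetical.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale a dictionary atom to unit l2-norm."""
    n = l2_norm(v)
    return [x / n for x in v]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_code(atoms, y, lam=0.01):
    """Coefficients x minimizing ||y - D x||^2 + lam ||x||^2 (atoms = columns of D)."""
    n = len(atoms)
    gram = [[sum(p * q for p, q in zip(atoms[i], atoms[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(p * q for p, q in zip(atom, y)) for atom in atoms]
    return solve(gram, rhs)

def classify(gallery, auxiliary, y, lam=0.01):
    """Return (predicted label, per-class normalized residuals)."""
    labels = [lab for lab, feats in gallery.items() for _ in feats]
    atoms = [normalize(f) for feats in gallery.values() for f in feats]
    atoms += [normalize(f) for f in auxiliary]
    x = ridge_code(atoms, y, lam)
    n_gallery = len(labels)
    scores = {}
    for lab in gallery:
        # Keep this class's gallery coefficients plus all auxiliary coefficients.
        keep = [i for i in range(len(atoms)) if i >= n_gallery or labels[i] == lab]
        recon = [sum(x[i] * atoms[i][d] for i in keep) for d in range(len(y))]
        residual = l2_norm([yi - ri for yi, ri in zip(y, recon)])
        coef_norm = l2_norm([x[i] for i in range(n_gallery) if labels[i] == lab])
        scores[lab] = residual / (coef_norm + 1e-12)  # assumed normalization
    return min(scores, key=scores.get), scores

# Toy, made-up features: person "A", person "B", and one auxiliary (occlusion) atom.
gallery = {"A": [[1.0, 0.0, 0.0]], "B": [[0.0, 1.0, 0.0]]}
auxiliary = [[0.0, 0.0, 1.0]]
probe = [0.9, 0.1, 0.5]  # mostly "A", plus an occlusion component

label, scores = classify(gallery, auxiliary, probe)
print(label, scores)
```

In this toy case the auxiliary atom absorbs the occlusion component, so the residual for "A" stays small while its coefficient norm stays large, and the normalized score separates the two classes clearly.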

31 Experiments: AR Database
Parameters Auxiliary dictionary

32 Experiments: AR Database
Auxiliary dictionary generation

33 Experiments: AR Database
Performance

34 Experiments: AR Database
A single training sample per person

35 Experiments: FERET database
Training: 150 subjects, non-occlusion ‘ba’, ‘bj’, ‘bk’
Testing: 150 subjects, block occlusion
Auxiliary dictionary: other 44 subjects

36 Time consumption
Less than 0.4s per image on an Intel i7 CPU
– Dictionary coding: <2ms
– CNNs: <0.4s without GPU acceleration

37 Thank you! Q&A

