Challenges in Learning the Appearance of Faces for Automated Image Analysis: part I alessandro verri DISI – università di genova


1 Challenges in Learning the Appearance of Faces for Automated Image Analysis: part I alessandro verri DISI – università di genova verri@disi.unige.it

2 actually, i’m gonna talk about: ► brief introduction (the whole thing) ► what some people do for detecting faces ► what we are doing

3 the problem(s) ► geometry (position, rotation, pose, scale,…) ► facial features (beards, glasses,…) ► facial expressions ► occlusions ► imaging conditions (illumination, resolution, color contrast, camera quality,…)

4 where we are: face detection we address face detection as a brute force classification problem (sometimes sophisticated, but still brute force) the model is encoded in the training samples but not explicitly defined

5 face recognition and the like explicit image models are derived from examples separating identity and imaging parameters

6 motivation we want to explore who should learn from whom… we come back to this at the end!

7 some approaches ► knowledge-based (Yang & Huang, 94) ► feature invariant (Leung et al, 95; Yow & Cipolla, 97) ► template matching (Lanitis et al, 95) ► appearance based: eigenfaces (Turk & Pentland, 91), SVM (Osuna et al, 97), naive bayes (Schneiderman & Kanade, 98), AdaBoost (Viola & Jones, 01)

8 SVM: global detector (Poggio’s group) ► some preprocessing essential (equalization and normalization) ► polynomial SVM applied to pixels ► training set: about 2,500 face images (58x58 pixels) and about 10,000 non-face images (extended to 13,000)

9 SVM: component-based detector (Poggio’s group) ► some preprocessing essential (equalization and normalization) ► two-level system (always linear SVMs): component classifiers (14: eyes, nose,…) and a geometrical configuration classifier based on maximal outputs

10 global vs component-based ► component-based performs better (more robust to pose variation and/or occlusion) ► global a little faster (though they are both pretty slow, too many patches have to be stored!)

11 naive bayes (Kanade’s group) ► multiple detectors for different views (size and orientation) ► for each view: statistical modeling using predefined attribute histograms (17), about 2,000 face examples; independence is required… very good for out-of-plane rotation, but an involved procedure for building histograms (bootstrap, AdaBoost…)

12 AdaBoost (Viola & Jones) ► wavelet-like features (computed efficiently) ► features selected through AdaBoost (each weak classifier depends on a single feature) ► detection is obtained by training a cascade of classifiers ► very fast and effective on frontal faces
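The "computed efficiently" part of the Viola & Jones features rests on the integral image (summed-area table), which turns any rectangle sum into four lookups. The sketch below is only an illustration of that idea in plain Python; the function names and the particular two-rectangle feature are assumptions made here, not taken from the slides.

```python
def integral_image(img):
    """Summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # sum of everything above (previous row) plus the current row so far
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical wavelet-like feature: left half minus right half (even width)."""
    half = width // 2
    return (box_sum(ii, top, left, height, half)
            - box_sum(ii, top, left + half, height, half))
```

Once the table is built, the cost of a feature no longer depends on the size of its rectangles, which is what makes scanning many locations and scales affordable.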

13 summing up ► SVM: components based on prior knowledge, simple, very good results but rather slow (optimization approaches…) ► naive bayes: robust against rotation, prior knowledge on feature selection, rather hand crafted statistical analysis, many models need to be learned (each with many examples) ► AdaBoost: data-driven feature selection, fast, frontal face only

14 what we do ► we assume we are given a fair number of positive examples only (no negatives) ► we want to explore how far one can get by combining fully data driven techniques based on 1D data ► we look at old-fashioned hypothesis testing (false negative rate under full control)

15 one possible way to object detection ► building models can be expensive (domain dependent) ► learning from positive examples only is more difficult, but… ► classical hypothesis testing controls the false negative rate naturally

16 testing hypotheses ► HT with only one observation ► testing for independence with rank test (seems appropriate for comparing different features)
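One way to read "hypothesis testing with only one observation" is the following: for a single feature, the values measured on the positive training examples define an acceptance region, and a new observation passes the test if it falls inside. The sketch below assumes a two-sided region built from empirical quantiles; that construction and the names `acceptance_interval`, `passes` and `alpha` are assumptions made for illustration, since the slides do not say how the tests are built.

```python
def acceptance_interval(values, alpha):
    """Central interval holding about (1 - alpha) of the positive training values.

    alpha is the fraction of true positives we accept to reject
    (the false negative rate of this single test).
    """
    s = sorted(values)
    n = len(s)
    cut = int(n * alpha / 2)  # samples trimmed from each tail
    return s[cut], s[n - 1 - cut]


def passes(value, interval):
    """True if a single observation falls inside the acceptance interval."""
    lo, hi = interval
    return lo <= value <= hi
```

Because the interval is built from positive examples alone, the false negative rate is set directly by `alpha`, while the false positive rate is whatever the feature happens to deliver.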

17 CBCL database ► faces (19x19 pixels): training 2429, test 472 ► nonfaces (19x19 pixels): training 4548, test 23573

18 training by hypothesis testing I. we first compute a large number of features (for the moment about 16,000) on the training set images II. a subset of good features (about 1,000) is then selected III. of these, a subset of independent features is considered (ending up with about 100) IV. multiple statistical tests are then constructed using the training set (one test for each feature)

19 image measurements ► grey value at fixed locations (19 x 19) ► tomographies (19 vertical + 19 horizontal + 37 along the 45deg diagonals) ► ranklets (5184 vertical, 5184 horizontal, 5184 diagonal) ► a total of about 16,000 features
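The feature counts on this slide can be checked with a small sketch. It assumes that a tomography is the mean grey value along a line of the image (a column, a row, or a diagonal); the slide gives only the counts, so that definition and the function name are assumptions.

```python
def tomographies(img):
    """Mean grey value along every column, row and 45-degree diagonal of a square image."""
    n = len(img)
    vertical = [sum(img[y][x] for y in range(n)) / n for x in range(n)]
    horizontal = [sum(row) / n for row in img]
    sums = [0.0] * (2 * n - 1)
    counts = [0] * (2 * n - 1)
    for y in range(n):
        for x in range(n):
            d = x - y + n - 1  # index of the diagonal through (y, x)
            sums[d] += img[y][x]
            counts[d] += 1
    diagonal = [s / c for s, c in zip(sums, counts)]
    return vertical, horizontal, diagonal


# Feature budget for a 19 x 19 image, using the ranklet counts quoted on the slide.
n_grey = 19 * 19                   # 361 grey values
n_tomo = 19 + 19 + (2 * 19 - 1)    # 75 tomographies
n_ranklets = 3 * 5184              # 15552 ranklets
n_features = n_grey + n_tomo + n_ranklets  # 15988, i.e. "about 16,000"
```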

20 ranklets (Smeraldi, 2002)
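As a rough reminder of what a ranklet computes: the pixels of a window are ranked, the window is split into two halves in the manner of a Haar wavelet, and the rank sum of one half is rescaled to lie in [-1, 1]. The sketch below follows that idea for the vertical case using the Mann-Whitney statistic; the choice of the right half as the "treatment" set, the use of midranks for ties, and the function names are assumptions here, not details from the slide.

```python
def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def vertical_ranklet(window):
    """Rank-based left/right contrast in [-1, 1] for a window of even width."""
    h, w = len(window), len(window[0])
    half = w // 2
    pixels = [window[y][x] for y in range(h) for x in range(w)]
    in_treatment = [x >= half for y in range(h) for x in range(w)]
    ranks = midranks(pixels)
    n_t = sum(in_treatment)
    n_c = len(pixels) - n_t
    rank_sum = sum(r for r, t in zip(ranks, in_treatment) if t)
    u = rank_sum - n_t * (n_t + 1) / 2   # Mann-Whitney statistic
    return 2 * u / (n_t * n_c) - 1       # rescale from [0, n_t * n_c] to [-1, 1]
```

A value near +1 means the right half is consistently brighter than the left, near -1 the opposite, and 0 means no rank preference; since only ranks enter, the value is unchanged by monotonic grey-level transformations.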

21 vertical ranklets (variance-to-natural support ratio)

22 a salient and a non-salient feature we discard all features for which the ratio falls below the threshold of 0.15 (this leaves us with about 2000 features)

23 independent feature selection 1. we run independence tests on all possible pairs of salient features of the same category 2. we build a complete graph for each category with as many nodes as features. An edge between two features is deleted if for the two features the Spearman’s test rejects the independence hypothesis with probability 

24 independent feature selection 3. we then search the graph for maximally complete subgraphs (cliques) which we regard as sets of independent features; for a probability of 0.5 we are left with 44 vertical, 64 horizontal, 35 diagonal ranklets, and 38 tomographies
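The three steps of independent feature selection can be sketched as follows. This is a simplified illustration under stated assumptions: the p-value uses the large-sample normal approximation for Spearman's rho, the clique is found greedily (so it is maximal but not necessarily the largest), features are assumed non-constant, and `level`, `spearman_p_value` and `independent_subset` are hypothetical names.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman_p_value(a, b):
    """Two-sided p-value for independence (normal approximation, non-constant inputs)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    rho = cov / math.sqrt(va * vb)
    z = abs(rho) * math.sqrt(n - 1)
    return math.erfc(z / math.sqrt(2))


def independent_subset(features, level):
    """Indices of a maximal clique in the graph of 'not rejected as dependent' pairs.

    features: one list of training values per feature.
    An edge i-j survives when the test does not reject independence at `level`.
    """
    m = len(features)
    keep = [[i == j or spearman_p_value(features[i], features[j]) >= level
             for j in range(m)] for i in range(m)]
    clique = []
    # visit well-connected features first, adding each one compatible with the clique
    for i in sorted(range(m), key=lambda i: -sum(keep[i])):
        if all(keep[i][j] for j in clique):
            clique.append(i)
    return clique
```

In this sketch a larger `level` deletes more edges and therefore yields smaller, more conservative sets of features.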

25 testing for all image locations, all applicable scales and a fixed number of tests: I. compute the values of the good, independent features II. perform multiple statistical tests at a certain confidence level III. a positive example is located if that number of tests are passed
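The decision rule at a single location and scale can be sketched as a count of passed tests. The sketch assumes each test is an acceptance interval on one feature and that a detection requires at least `min_passed` passes; the interval form, the "at least" reading and all names are assumptions for illustration.

```python
def count_passed(feature_values, intervals):
    """Number of per-feature tests passed by one candidate window."""
    return sum(lo <= v <= hi for v, (lo, hi) in zip(feature_values, intervals))


def is_detection(feature_values, intervals, min_passed):
    """A window is declared positive when enough tests are passed."""
    return count_passed(feature_values, intervals) >= min_passed


def detect(candidates, intervals, min_passed):
    """Indices of the candidate windows (one feature vector each) declared positive."""
    return [i for i, values in enumerate(candidates)
            if is_detection(values, intervals, min_passed)]
```

In a full detector, `candidates` would hold the feature vectors computed at every image location and applicable scale.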

26 multiple tests ► we run multiple tests living with the fact that we won’t detect a certain fraction of the objects we want to find ► luckily we are in a position to decide the fraction beforehand ► we gain power because each test looks at a new feature
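To see why the missed fraction can be fixed beforehand, consider the idealized case in which the n tests are independent and each one passes a true object with the same probability p. The number of passed tests is then binomial, and the detection rate for "at least k passes" is a binomial tail. The independence and equal-probability assumptions, and the function name, belong to this sketch rather than to the slides.

```python
import math


def prob_at_least(k, n, p):
    """P(at least k of n independent tests pass), each passing with probability p."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))
```

For a chosen per-test pass probability and a target detection rate, one can scan k downward from n until `prob_at_least(k, n, p)` reaches the target; the missed fraction is then at most one minus that value under the stated assumptions.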

27 some results (Franceschi et al, 2004): 472 positives vs 23,573 negatives; tomographies + ranklets; randomly chosen overlapping features

28 once you have detected a face… ask Thomas

