High-Level Vision Face Detection.


1 High-Level Vision Face Detection

2 Face detection: Viola & Jones
Multiple view-based classifiers based on the simple features that best discriminate faces vs. non-faces
The most discriminating features are learned from thousands of samples of face and non-face image windows
Attentional mechanism: a cascade of increasingly discriminating classifiers improves performance

- The problem of face detection: the approach developed by Viola & Jones constructs a classifier that takes a small window of the image and determines whether or not it contains a face. Imagine a strategy where we scan the image, look at each small patch, and ask: is this a face? In the sample result, the detector found all the faces, plus a soccer ball that looks a little like a face.
- To design the classifier, ask: which features of the image within the window are most effective at distinguishing faces from non-faces? The features best for finding faces in a roughly frontal view may differ from those best for detecting a profile face, so Viola & Jones designed multiple classifiers specialized for frontal or profile views (keep this point in the back of your mind).
- They started out with thousands of simple features that could be used for classification, and learned the most effective ones by training the classifier on many examples of image patches labeled as faces or non-faces.
- Another key aspect of the approach is an attentional mechanism that greatly reduces the amount of computation needed. Suppose a couple hundred features suffice to build a classifier with good performance (for each window: 200 features, then a decision). There is no point wasting time computing all 200 features for every sub-window of the image; for example, don't bother with regions of roughly uniform brightness. It would be nice to use a few features as a quick filter to remove from consideration the many regions that are clearly not faces, and then focus computational resources on the more promising regions, using more discriminating features to determine whether each really is a face.

3 Viola & Jones use simple features
Use simple rectangle features: Σ I(x,y) in gray area − Σ I(x,y) in white area, within 24 × 24 image sub-windows
→ initially consider 160,000 potential features per sub-window!
→ features computed very efficiently
Which features best distinguish face vs. non-face?
Learn the most discriminating features from thousands of samples of face and non-face image windows

- What are the features? Simple rectangular features. The squares represent a sub-window of the image, 24 × 24 pixels. Each feature consists of 2–4 rectangular sub-regions at particular locations within the window. To calculate the value of a feature, add all the intensities within the gray areas and subtract the sum of all the intensities within the white areas, then compare the brightness difference between the regions to a threshold (e.g., is the difference larger than a certain amount?). This provides evidence about whether or not the window is a face.
- Think of the first feature as measuring the change in brightness between left and right sub-regions. Where is its value large? Wherever there is a large difference between adjacent vertical strips of the image, such as the left/right face boundaries or the side hairline.
- A number of different geometric configurations were defined, with different positive/negative regions, sizes, and orientations, providing an initial set of 160,000 potential features of this sort. That is a lot, but they are very simple and can be computed very efficiently: there is much redundancy among the regions whose intensities are summed, so an intermediate representation of sums over image regions avoids the redundant computation.
- Which features best distinguish whether a window is a face or not? This is learned. The two most informative features are shown here, overlaid on a typical face. Why are they good at distinguishing faces? A face typically has an eye region darker than the area below it, and the bridge of the nose between the eyes is typically brighter than the eyes; these templates capture those relationships.
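The "intermediate representation of sums over image regions" is the summed-area table (integral image): any rectangle sum then costs four lookups, so a two-rectangle feature costs eight. A minimal sketch in Python with NumPy; the function names are my own, not Viola & Jones's.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y, :x].
    Padded with a leading zero row/column so every rectangle
    sum needs only four lookups, even at the image border."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in the h x w rectangle with top-left corner (y, x)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, y, x, h, w):
    """Left-minus-right two-rectangle feature (w must be even):
    responds to brightness contrast between adjacent vertical strips,
    e.g. at the left/right face boundary."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

Three- and four-rectangle features are built the same way from extra `rect_sum` calls, which is why all 160,000 features stay cheap once the table is built.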

4 Learning the best features
Weak classifier using one feature: h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise
x = image window, f = feature, p = +1 or −1 (polarity), θ = threshold
n training samples with known classes and (initially equal) weights: (x1, w1, 1) … (xn, wn, 0)
Repeat: normalize weights → find the next best weak classifier → use classification errors to update the weights
Final classifier: AdaBoost; ~200 features yield good results for this "monolithic" formulation

- What is the learning process? An iterative one: progressively find the next best feature until the system can do a good enough job of classifying image patches as faces vs. non-faces. First define a weak classifier as one that uses just a single feature: h is the classifier, with value 1 (face) or 0 (non-face), and it is a function of four things: the window x, the feature f, the threshold θ used to decide face or not, and a polarity p = ±1. The polarity is needed because, depending on the specific feature, a window may be more likely to be a face when the feature value is larger than the threshold, or when it is smaller.
- To learn good features, use training data: a large number n of sub-windows with known class (1 = face, 0 = non-face) and a weight associated with each sample. The weights start out equal and change later: certain examples may be more challenging to classify (the soccer ball), and we want to give them more weight in the training process. At each iteration, the weights are normalized to sum to 1, and we find the weak classifier (a combination of f, θ, p) that does the best job of correctly identifying the class of each window in the training set. The error is the difference between what the feature indicates (0/1) and the correct class (0/1), summed over all windows, each term multiplied by the current weight of that sample. This expression measures the incorrect classifications; we want to minimize the discrepancy between the true class and what the classifier gives us, based on this feature.
- The chosen classifier is not perfect and gets some samples wrong; increase the weight of those samples for the next round of searching for the best feature. This strategy is AdaBoost.
- We end up with many weak classifiers based on single features, each doing a better or worse job of classifying. The final classifier integrates the evidence from all of them, weighting the contribution of each feature according to how well it distinguished face from non-face during training: sum all the evidence and check whether it exceeds a threshold; if yes, then it is a face. About 200 features give good results for this formulation of the classifier.
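One round of the loop above can be sketched as: given one feature's values over the training set, exhaustively pick the (θ, p) that minimizes the weighted error, then down-weight the correctly classified samples so the hard ones matter more next round. A toy sketch with my own function names; the full algorithm also searches over all 160,000 features each round and assigns each chosen classifier a voting weight derived from its error.

```python
import numpy as np

def best_weak_classifier(feature_vals, labels, weights):
    """Choose (theta, p) minimizing the weighted error
    sum_i w_i * |h(x_i) - y_i| for one feature.
    feature_vals: f(x_i) for every sample; labels: 1 face, 0 non-face."""
    best = (None, None, np.inf)  # (theta, p, error)
    for theta in np.unique(feature_vals):
        for p in (+1, -1):
            preds = (p * feature_vals < p * theta).astype(int)
            err = np.sum(weights * np.abs(preds - labels))
            if err < best[2]:
                best = (theta, p, err)
    return best

def update_weights(weights, preds, labels, err):
    """AdaBoost reweighting: correctly classified samples are
    down-weighted by beta = err / (1 - err), then all weights
    are renormalized to sum to 1."""
    beta = err / (1.0 - err)
    correct = (preds == labels)
    w = weights * np.where(correct, beta, 1.0)
    return w / w.sum()
```

After renormalization, the misclassified samples hold a larger share of the total weight, so the next feature is chosen to fix exactly the mistakes this one made.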

5 “Attentional cascade” of increasingly discriminating classifiers
Early classifiers use a few highly discriminating features and a low threshold
The 1st classifier uses two features and removes 50% of non-face windows
Later classifiers distinguish harder examples
→ increases efficiency
→ allows the use of many more features
→ cascade of 38 classifiers, using ~6000 features

- As noted, we don't want to compute the values of all 200 features everywhere in the image; that is not efficient. So Viola & Jones used the idea of an attentional cascade: start with all sub-windows, use a small number of features to reject the many sub-windows that are clearly non-faces, and preserve only the more promising ones for further analysis with additional features.
- The first two features remove half the non-face windows while preserving nearly 100% of the real faces; the second layer uses about 10 more features and rejects about 80% of the remaining non-faces.
- Since features are computed for fewer windows, in the long run this allows many more features to be used. In practice, a cascade of 38 classifiers with about 6000 features in total yields a high level of performance. [User intervention: for each layer, specify a minimum detection rate (e.g., detect > 99%) and a maximum tolerable false-positive rate (e.g., up to 10% false positives); the learning process determines the number of features needed to accomplish this, and layers are added until overall performance is good enough.]
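The early-rejection logic of the cascade can be sketched in a few lines: a window must clear every stage in turn, and most non-faces exit at the cheap early stages, so the expensive later stages run rarely. A hypothetical sketch; each `predict` stands in for a stage's weighted sum of weak classifiers.

```python
def run_cascade(x, stages):
    """Evaluate one window against a cascade of stage classifiers.

    Each stage is a (predict, threshold) pair: predict maps the
    window to a real-valued score; the window survives only if
    score >= threshold.  Rejection at any stage is final."""
    for predict, threshold in stages:
        if predict(x) < threshold:
            return False  # rejected: clearly not a face
    return True  # passed every stage: report as a face
```

With the rates quoted above (stage 1 rejects 50% of non-faces, stage 2 about 80% of the remainder), only about 10% of non-face windows ever reach the third stage, which is where the efficiency comes from.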

6 Training with normalized faces
Faces are normalized for scale & rotation; small variation in pose; many more non-face patches

- A fairly small training set: 5000 faces, plus many more non-face patches (9500 images containing no faces, with many random patches taken from each).
- Faces were normalized (by hand) to a common rotation and scale, with small variation in pose over the samples (and some variation in facial expression and background).
- [ROC curve: correct detection rate (e.g., 0.5 to 1.0) vs. false-positive rate (0.0 to 0.01). Varying the parameters of the algorithm gives different performance; we want to be close to the upper-left corner.]

7 Viola & Jones results
With additional diagonal features, classifiers were created to handle image rotations and profile views

- The original classifiers tolerate some rotation: about 45° in depth and 15° in the image plane. Adding more diagonal features produced classifiers that tolerate more rotation in the image plane, and specialized classifiers were built to recognize profile views (specialized classifiers work better than one general-purpose classifier).
- A detail shuffled under the rug: faces appear at different sizes in the image, but the classifier is always based on a 24 × 24 pixel window. So the image is sampled at multiple scales (e.g., a 400 × 400 image scanned with 24 × 24 windows is resampled to a 300 × 300 image scanned with 24 × 24 windows, which finds larger faces in the original). Faces are sought at multiple scales (12), and the detector goes with whichever assessment has the strongest evidence.
- Among the greatest challenges are occlusion, extreme illumination, and accessories such as sunglasses.
- In summary: simple image features; learn the most effective ones by training a classifier on a large dataset; strategies for computational efficiency. Many approaches try to find the best features for classifying image regions into different object classes; they differ in how they define the features and how they go about the training process.
- One of the things that makes Labeled Faces in the Wild not so wild: its faces were detected by the Viola-Jones face detector.
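The multi-scale detail above (a fixed 24 × 24 classifier window, with the image effectively resampled so the window covers larger faces) can be sketched by enumerating window positions and sizes directly; growing the window by a constant factor is equivalent to shrinking the image. The step and scale values here are illustrative assumptions, not the paper's settings.

```python
def scan_positions(h, w, scale=1.25, win=24, step=4):
    """Yield (y, x, size) detection windows over an h x w image.

    The window grows by `scale` at each pyramid level; sliding a
    size-s window over the original image is equivalent to sliding
    the fixed 24 x 24 window over the image shrunk by s / 24.
    The detector would classify each window and keep the strongest
    responses across scales."""
    size = float(win)
    while size <= min(h, w):
        s = int(round(size))
        for y in range(0, h - s + 1, step):
            for x in range(0, w - s + 1, step):
                yield (y, x, s)
        size *= scale
```

In a real detector, each yielded window would be cropped, variance-normalized, and fed to the cascade; overlapping detections at nearby positions and scales are then merged.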

8 Faces everywhere...

