1
Face Detection
- Before a system can recognize a face, it needs to detect the face in the image. For forward-looking, full views of faces (the GHC photo), detection is not so challenging (Face++ missed one face, with no false detections).
- Conditions like profile views, rotation in depth or in the image plane, or partial occlusion pose a much greater challenge for automated systems (Face++ found all the faces in the other Wells photo).
- The Face++ platform was created by the Chinese company Megvii. Notable deployments: Alibaba (China's e-commerce giant, analogous to Amazon) uses the technology to grant employees access to workplaces, and it is used in train and subway stations in parts of China (surveillance, checking faces against government-issued IDs for entry).
- Under the hood, Face++ is based on training deep neural networks, but the system is proprietary, so few technical details are available.
- Here we explore the method developed by Viola and Jones long ago, still a common approach used today. The MATLAB Computer Vision Toolbox has an implementation of the Viola-Jones method.
- An application more near and dear to the heart: Snapchat (thank you Lizao for finding the video; interesting to see how the filters work).
- From the start: fun, and beyond face detection, there is relevant reading on face recognition (constructing an average face). (~4:16)
2
Face detection: Viola & Jones
Multiple view-based classifiers based on simple features that best discriminate faces vs. non-faces. The most discriminating features are learned from thousands of samples of face and non-face image windows. Attentional mechanism: a cascade of increasingly discriminating classifiers improves performance.
- Construct a classifier that takes a small image window and determines whether or not it contains a face. Imagine a strategy that scans the image, looking at each small patch and asking: is this a face? In the sample published result, the system found all the faces, plus a soccer ball that looks a little like a face.
- To design the classifier, ask: what features of the image within the window are most effective at distinguishing faces from non-faces? The features best for finding roughly frontal views may differ from those best for detecting profile views, so multiple classifiers are designed, each specialized for frontal or profile views (keep this in the back of your mind).
- The method starts out with thousands of simple features that could be used for classification, and learns the most effective ones by training the classifier on many example image patches labeled as faces or non-faces.
- Another key aspect of the approach is the attentional mechanism, which greatly reduces the amount of computation needed. Suppose a couple hundred features are sufficient to construct a classifier with good performance (for each window: compute 200 features, make a decision). There is no point wasting time computing all 200 features for every sub-window of the image; for example, don't bother computing them for regions of roughly uniform brightness. It would be nice to use a few features as a quick filter to remove from consideration the many regions that are clearly not faces, and then focus computational resources, using more discriminating features, on the more promising regions that may contain faces, to determine whether each really is a face.
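The scan-every-patch strategy described above can be sketched in a few lines of Python. The 24 x 24 window size matches the method; the stride and image size here are illustrative assumptions:

```python
import numpy as np

def sliding_windows(image, win=24, stride=4):
    """Yield (row, col, patch) for each win x win sub-window of a 2-D image."""
    h, w = image.shape
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            yield r, c, image[r:r + win, c:c + win]

# Example: enumerate candidate windows in a 100 x 100 image.
img = np.zeros((100, 100))
n_windows = sum(1 for _ in sliding_windows(img))
```

Even this small image yields hundreds of candidate windows, which is why the attentional cascade below matters: most windows must be rejected cheaply.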
3
Viola & Jones use simple features
Use simple rectangle features: Σ I(x,y) over the gray area minus Σ I(x,y) over the white area, within 24 x 24 image sub-windows. Initially consider 160,000 potential features per sub-window! The features can be computed very efficiently. Which features best distinguish face vs. non-face? Learn the most discriminating features from thousands of samples of face and non-face image windows.
- What features are used? Simple rectangular features: the squares represent a 24 x 24 sub-window of the image, and each feature consists of 2-4 rectangular sub-regions at particular locations within the window.
- To calculate the value of a feature, add the intensities in the gray areas, subtract the sum of the intensities in the white areas, and compare the difference in brightness between the regions to a threshold. If the difference is larger than a certain amount, it can provide evidence for whether or not the window contains a face.
- Think of the first feature as measuring the change in brightness between left and right sub-regions. Where does it have a large value? Wherever there is a large difference between adjacent vertical strips of the image (left/right face boundaries, the side hairline).
- A number of different geometric configurations are defined, with positive and negative regions of different sizes and orientations, providing an initial set of 160,000 potential features of this sort. That is a lot, but they are very simple and can be computed very efficiently. (Imagine the redundancy among the regions whose intensities are added: by computing an intermediate representation of sums over particular image regions, the redundant computations are avoided.)
- Which features best distinguish face from non-face? This is learned. The two most informative features are shown here; see where they lie on a typical face. Why are they good at distinguishing faces? A face typically has an eye region darker than the area below it, and the bridge of the nose between the eyes is typically brighter than the eyes; these templates capture those relationships. (video)
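The "intermediate representation of sums over image regions" is the integral image: each entry holds the sum of all pixels above and to the left, so any rectangle sum takes just four lookups. A minimal numpy sketch (the specific two-rectangle layout below is an illustrative assumption):

```python
import numpy as np

def integral_image(img):
    # Cumulative sums over rows then columns, padded with a zero row and
    # column so that ii[r, c] equals the sum of img[:r, :c].
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, r, c, h, w):
    # Sum of the h x w rectangle with top-left corner (r, c): four lookups.
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    # Left half ("gray") minus right half ("white"): responds to the
    # brightness change between adjacent vertical strips.
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

# A window that is dark on the left and bright on the right gives a
# large-magnitude response.
win = np.hstack([np.zeros((24, 12)), np.ones((24, 12))])
ii = integral_image(win)
```

Because the integral image is computed once per image, every one of the 160,000 candidate features costs only a handful of additions regardless of rectangle size.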
4
Learning the best features
A weak classifier using one feature:

h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise

where x = image window, f = feature, p = +1 or −1 (polarity), θ = threshold. Training uses n samples with equal initial weights and known classes, (x1, w1, 1) … (xn, wn, 0). Repeat: normalize the weights, find the next best weak classifier, use the classification errors to update the weights. AdaBoost with ~200 features yields good results for this "monolithic" classifier.
- What is the learning process? It is iterative: progressively find the next best feature until the system does a good enough job of classifying image patches as faces or non-faces.
- First, define a weak classifier that uses just a single feature. h is the classifier, with value 1 (face) or 0 (non-face). It is a function of four things: the window x, the feature f, the threshold θ used to decide face vs. non-face, and an extra factor p = ±1. A window may be more likely to be a face if the feature value is above the threshold or below it, depending on the specific feature; p = ±1 allows the decision to be based on the value being either larger or smaller than the threshold.
- To learn good features we need training data: n training samples (n large), sub-windows of known class (1 = face, 0 = non-face), each with an associated weight. The weights start out equal and change later: certain examples may be more challenging to classify (the soccer ball), and we want to give them more weight in the training process.
- The process is iterative. The weights are normalized to sum to 1, and we find the next weak classifier (a combination of f, θ, and p) that does the best job of correctly identifying the class of each image window in the training set. The criterion is the difference between what the feature indicates (0/1) and the correct class (0/1), summed over all windows, with each term multiplied by the current weight of that sample: ε = Σᵢ wᵢ |h(xᵢ) − yᵢ|. This expression measures the incorrect classifications; we want to minimize the discrepancy between the true class and what the classifier gives us, based on this feature.
- The classifier is not perfect; it gets some samples wrong. Increase the weight on those samples for the next time around, when searching for the next best feature. This strategy is AdaBoost.
- We end up with lots of weak classifiers, each based on a single feature and each doing a better or worse job of classifying. Define a final classifier that integrates the evidence from all the features: weigh the contribution of each feature according to how good it was at distinguishing face vs. non-face during training, sum all the evidence, and check whether the total is larger than a threshold. If yes, then it's a face. About 200 features give good results for this formulation of the classifier; a test set of novel image patches is used to evaluate the performance (% correct) of the final classifier.
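One round of the boosting procedure described above can be sketched in Python. Each weak classifier is a (feature, threshold, polarity) triple, and the selected one minimizes the weighted error Σᵢ wᵢ|h(xᵢ) − yᵢ|; the toy feature values and labels below are illustrative assumptions:

```python
import numpy as np

def weak_classify(fvals, theta, p):
    # h = 1 (face) when p * f(x) < p * theta, else 0 (non-face).
    return (p * fvals < p * theta).astype(int)

def best_weak_classifier(features, labels, weights):
    """features: (n_features, n_samples) array of precomputed feature values.
    Returns (feature index, theta, p, weighted error) minimizing the error."""
    best = (None, None, None, np.inf)
    for j, fvals in enumerate(features):
        for theta in np.unique(fvals):          # candidate thresholds
            for p in (+1, -1):                  # both polarities
                err = np.sum(weights * np.abs(weak_classify(fvals, theta, p) - labels))
                if err < best[3]:
                    best = (j, theta, p, err)
    return best

# Toy data: feature 0 separates the classes, feature 1 is noise.
features = np.array([[0.1, 0.2, 0.8, 0.9],
                     [0.5, 0.9, 0.4, 0.6]])
labels = np.array([1, 1, 0, 0])   # here, faces have *small* feature-0 values
weights = np.full(4, 0.25)        # equal, normalized initial weights

j, theta, p, err = best_weak_classifier(features, labels, weights)
```

After each round, AdaBoost scales down the weights of the correctly classified samples (by β = ε / (1 − ε)) and renormalizes, so the harder, still-misclassified samples carry more weight in the search for the next feature.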
5
“Attentional cascade” of increasingly discriminating classifiers
Early classifiers use a few highly discriminating features and a low threshold: the first classifier uses two features and removes 50% of the non-face windows; later classifiers distinguish harder examples. This increases efficiency and allows the use of many more features: a cascade of 38 classifiers, using ~6000 features.
- As noted, we don't want to compute the values of all 200 features everywhere in the image; that is not efficient. Instead, use the idea of an attentional cascade: start with all sub-windows, use a small number of features to reject the many sub-windows that are clearly non-faces, and preserve only the more promising ones for further analysis with additional features.
- The first two features remove half of the non-face windows while preserving ~100% of the real faces; the second layer uses about 10 more features and rejects about 80% of the remaining non-faces.
- Since features are computed for fewer and fewer windows, in the long run this allows many more features to be used. In practice, a cascade of 38 classifiers using about 6000 features in total yields a high level of performance.
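The cascade logic amounts to short-circuit evaluation: a window must pass every stage, and most non-faces exit at the cheap early stages. A minimal sketch, where the stage functions and thresholds are toy assumptions rather than trained classifiers:

```python
def cascade_classify(window, stages):
    """stages: list of (stage_fn, threshold); stage_fn maps a window to a score.
    Reject as soon as any stage's score falls below its threshold."""
    for stage_fn, threshold in stages:
        if stage_fn(window) < threshold:
            return False          # rejected early: later stages never run
    return True                   # survived all stages: report a face

# Toy stages: a cheap brightness test first, a stricter contrast test later.
stages = [
    (lambda w: sum(w) / len(w), 0.2),   # reject nearly uniform dark windows
    (lambda w: max(w) - min(w), 0.5),   # reject low-contrast windows
]

flat_dark = [0.1] * 16        # fails stage 1 immediately
contrasty = [0.0, 1.0] * 8    # passes both stages
```

Because the early stages are tuned to keep ~100% of true faces while discarding most non-faces, the expensive later stages run on only a small fraction of windows.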
6
Training with normalized faces
Many more non-face patches than faces; the faces are normalized for scale and rotation, with small variation in pose.
- A small sample of the training set is shown: 5000 faces and many more non-face patches (9500 images containing no faces, with many random image patches taken from each).
- The faces are normalized (by hand) to a common rotation and scale, with small variation in pose across samples (and some variation in facial expression and background).
- The system gives different levels of performance depending on its parameters, captured in an ROC curve (receiver operating characteristic): correct detection rate (e.g. 0.5 to 1.0) vs. false positive rate (0.0 to 0.05). Varying the parameters of the algorithm gives different performance. Suppose we get 75% detection with 1% false positives (not so good): what will happen if we impose stricter criteria that reduce the false positives? There is a trade-off: probably a worse detection rate. Similarly, being more accepting will increase correct detections but also accept more incorrect ones. Where is the ideal? Try to set the parameters so the operating point is as close as possible to the upper-left corner of the plot (which may not be reachable if the algorithm is not good to begin with).
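An ROC curve just tabulates (false positive rate, correct detection rate) pairs as the decision threshold is swept. A minimal sketch over toy classifier scores (the scores and labels are illustrative assumptions):

```python
import numpy as np

def roc_points(scores, labels):
    """For each candidate threshold t, report (false positive rate,
    detection rate). labels: 1 = face, 0 = non-face; a window is
    declared a face when its score >= t."""
    points = []
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.mean(pred[labels == 1])   # correct detection rate
        fpr = np.mean(pred[labels == 0])   # false positive rate
        points.append((fpr, tpr))
    return points

scores = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
pts = roc_points(scores, labels)
```

For this perfectly separable toy data, one threshold reaches the ideal upper-left corner (fpr = 0, tpr = 1); a real detector's curve bends away from that corner, and the trade-off is choosing the operating point along it.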
7
Viola & Jones results: with additional diagonal features, classifiers were created to handle image rotations and profile views.
- The original classifiers tolerate some rotation: roughly 45 degrees in depth and 15 degrees in the image plane. Adding more diagonal features produced classifiers that tolerate more rotation in the image, and specialized classifiers recognize profile views (specialized classifiers work better than one general-purpose classifier).
- One detail shuffled under the rug: faces appear at different sizes in the image, while the classifier is always based on a 24 x 24 pixel window. How could you handle different face sizes? Sample the image at different scales (e.g. from a 400 x 400 image with 24 x 24 windows, downsample to a 300 x 300 image with 24 x 24 windows, to find larger faces in the original). Find faces at multiple scales (12) and go with the assessment that has the strongest evidence.
- Among the greatest challenges are occlusion, extreme illumination, and accessories such as sunglasses.
- In summary: simple image features; the most effective ones are learned by training a classifier with a large dataset; strategies for computational efficiency. Many approaches try to find the best features for classifying image regions into different object classes; they differ in the way they define features and in the way they go about the training process.
- One of the things that makes Labeled Faces in the Wild not so wild: its faces were detected by the Viola-Jones face detector!
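The multi-scale trick above, downsampling the image so a fixed 24 x 24 window covers progressively larger faces, can be sketched as an image pyramid. The per-level scale factor of 1.25 and the nearest-neighbor resizing are illustrative assumptions:

```python
import numpy as np

def resize_nn(img, new_h, new_w):
    # Nearest-neighbor resize using index arrays (no external dependencies).
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows][:, cols]

def pyramid(image, win=24, scale=1.25):
    """Yield downsampled copies of the image until it is smaller than the
    detection window; a face of size win * scale**k pixels in the original
    appears as roughly win pixels at level k."""
    level = image
    while min(level.shape) >= win:
        yield level
        new_h = int(level.shape[0] / scale)
        new_w = int(level.shape[1] / scale)
        if min(new_h, new_w) < 1:
            break
        level = resize_nn(level, new_h, new_w)

img = np.zeros((100, 100))
levels = list(pyramid(img))
```

Running the fixed-size detector over every level and keeping the strongest-evidence detection is what lets a single 24 x 24 classifier find faces of many sizes.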
8
Viola & Jones results (MATLAB Computer Vision Toolbox)
- MATLAB has a separate Computer Vision Toolbox that includes software for applying the Viola-Jones method, already trained, with multiple classifiers, e.g. frontal face, profile, eyes, nose, upper body. Scrutinize the examples: there are misses and false positives.
- Suppose you want to detect other objects, such as cars or dogs. How would you modify the strategy? Build a database of that object class vs. not, and train a classifier to recognize that one object class (this has been applied to cars and to animals, e.g. cats).