1 Rotation Invariant Face Detection Using Neural Network
Lecturers: Mehdi Dehghani and Mahdy Bashary
Supervisor: Dr. Bagheri Shouraki
Spring 2007
2 Agenda
What is face detection?
Usages
Face Detection Techniques in Grayscale Images
Template-Based Face Detection with Neural Network
Structure
Router Network
Detector Network
Arbitration Among Multiple Networks
Empirical Results
3 Face Detection Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees and bodies.
4 Usages
Biometrics: often as a part of a face recognition system
Security surveillance (e.g. logging people passing through an area by saving their faces)
Image database management (e.g. making the face pictures in a database uniform by aligning each face in the center of its image)
5 Face Detection Techniques in Grayscale Images Template-based face detection: these techniques encode facial images directly in terms of pixel intensities. These images can be characterized by probabilistic models of the set of face images or implicitly by neural networks or other mechanisms.
6 Face Detection Techniques in Grayscale Images (cont.) Feature-based face detection: this approach is based on extracting features and applying either manually or automatically generated rules for evaluating these features (e.g. finding the locations of the eyes, mouth, and nose, and checking whether the nose lies inside the triangle formed by the eyes and the mouth).
7 Template-based Face Detection
8 Image Pyramid The image pyramid is used to detect faces larger than the window size. It is built by repeatedly reducing the size of the input image by subsampling. The amount of size reduction at each stage is determined by the detector network’s invariance to scale.
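As a rough illustration of how such a pyramid could be built, here is a minimal NumPy sketch; the scale step of 1.2 and the nearest-neighbour subsampling are assumptions for illustration, not values given on the slides.

```python
import numpy as np

def image_pyramid(image, window=20, scale_step=1.2):
    """Repeatedly subsample a grayscale image until it is smaller than
    the 20x20 detector window. scale_step=1.2 is an assumed value."""
    levels = [image.astype(np.float32)]
    while min(levels[-1].shape) / scale_step >= window:
        h, w = levels[-1].shape
        nh, nw = int(h / scale_step), int(w / scale_step)
        rows = (np.arange(nh) * scale_step).astype(int)
        cols = (np.arange(nw) * scale_step).astype(int)
        levels.append(levels[-1][rows][:, cols])  # nearest-neighbour subsampling
    return levels
```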
9 Rotation Invariance Rotation invariance is the ability to detect faces that are rotated in-plane.
10 Rotation Invariance (cont.) The simplest approach would be to employ an upright face detector, repeatedly rotating the input image in small increments and applying the detector to each rotated image. However, this would be an extremely computationally expensive procedure.
11 Structure Image Pyramid → Router Network → Detector Network
12 Router Network First, the window is preprocessed using histogram equalization, and given to a router network. The rotation angle returned by the router is then used to rotate the window with the potential face to an upright position. Finally, the derotated window is preprocessed and passed to one or more detector networks.
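The per-window flow just described could be sketched as follows; `preprocess`, `router`, `rotate_window`, and the `detectors` list are hypothetical callables standing in for the stages named on this slide.

```python
def classify_window(window, preprocess, router, rotate_window, detectors):
    """One pass of the router/detector pipeline for a single 20x20 window.
    `router` returns an estimated in-plane angle in degrees, `rotate_window`
    derotates the window, and each detector returns a score in [-1, +1]."""
    angle = router(preprocess(window))        # estimate the face orientation
    upright = rotate_window(window, -angle)   # derotate the candidate face
    upright = preprocess(upright)             # preprocess again after derotation
    return [detector(upright) for detector in detectors]
```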
13 Router Network (cont.) Diagram: Compute Orientation → Derotator
14 Output Angle
Single unit: the activation of a single output unit (usually in the range 0 to 1 or -1 to +1) is mapped linearly onto the range 0 to 360 degrees to determine the angle of rotation.
1-of-N encoding: N units are used to represent the output; each unit covers 360/N degrees. For example, with 180 units, if unit 30 had the highest activation, this would indicate a rotation of 60 degrees.
15 Output Angle (cont.) Suppose each output unit corresponds to a vector from the center of a circle toward that unit’s angle, with length equal to the unit’s output value. The direction of the average of these vectors is interpreted as the angle of the face.
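A small sketch of that interpretation for the 36 router outputs (one per 10 degrees): the face angle is read off as the direction of the activation-weighted average vector.

```python
import numpy as np

def decode_angle(outputs):
    """Treat output i as a vector of length outputs[i] pointing at angle
    i * (360 / N) degrees; return the direction of their average."""
    outputs = np.asarray(outputs, dtype=float)
    angles = np.deg2rad(np.arange(outputs.size) * 360.0 / outputs.size)
    x = np.sum(outputs * np.cos(angles))
    y = np.sum(outputs * np.sin(angles))
    return np.degrees(np.arctan2(y, x)) % 360.0
```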
16 Architecture The architecture for the router network consists of three layers: an input layer of 400 units, a hidden layer of 15 units, and an output layer of 36 units. Each layer is fully connected to the next. Each unit uses a hyperbolic tangent activation function, and the network is trained using the standard error backpropagation algorithm.
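A minimal forward-pass sketch of that 400-15-36 network in NumPy; the weight initialisation scale is an assumption, and the backpropagation training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fully connected 400 -> 15 -> 36 network with tanh units.
W1 = rng.normal(0.0, 0.05, (15, 400)); b1 = np.zeros(15)
W2 = rng.normal(0.0, 0.05, (36, 15));  b2 = np.zeros(36)

def router_forward(window):
    x = window.reshape(400)            # 20x20 preprocessed window
    hidden = np.tanh(W1 @ x + b1)      # 15 hidden units
    return np.tanh(W2 @ hidden + b2)   # 36 output units, 10 degrees each
```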
17 Generating training set The training examples are generated from a set of manually labelled example images containing 1048 faces. In each face, the eyes, tip of the nose, and the corners and center of the mouth are labelled. We first compute the average location for each of the labelled features over the entire training set. Then, each face is aligned with the average feature locations, by computing the rotation, translation, and scaling that minimizes the distances between the corresponding features. After iterating these steps a small number of times, the alignments converge.
18 Generating training set (cont.) …
19 Generating training set (cont.) Example upright frontal face images aligned to one another.
20 Training Router Network To generate the training set, the faces are rotated to a random orientation.
21 Training Router Network (cont.) Target output of unit i: Value[i] = cos(θ - i×10º), for i = 0, …, 35.
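Generating the 36 target values for one face rotated by θ, following the cosine encoding above, might look like this:

```python
import numpy as np

def router_targets(theta_deg):
    """Target for output unit i is cos(theta - i*10 degrees), i = 0..35."""
    i = np.arange(36)
    return np.cos(np.deg2rad(theta_deg - 10.0 * i))
```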
22 Review Diagram: Compute Orientation → Derotator
23 Detector Network at a glance It takes a 20×20 pixel region of the image as input and generates an output ranging from -1 to +1, signifying the absence or presence of a face.
24 The Preprocessing
Light correction: this step equalizes lighting effects across different parts of the window, compensating for a variety of lighting conditions.
Histogram equalization: histogram equalization is performed on the window, compensating for differences in camera input gains.
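One way to realise the two steps is sketched below: fit and subtract a best-fit brightness plane, then equalise the histogram. The least-squares plane fit is only one possible reading of the light-correction step described above, and the 8-bit intensity range is an assumption.

```python
import numpy as np

def light_correct(window):
    """Fit a brightness plane a*x + b*y + c to the window by least squares
    and subtract it, flattening illumination gradients across the window."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, window.ravel(), rcond=None)
    plane = (A @ coef).reshape(h, w)
    return window - plane + plane.mean()

def hist_equalize(window, levels=256):
    """Remap intensities so their cumulative histogram is roughly linear,
    compensating for differences in camera gain and overall brightness."""
    flat = np.clip(window, 0, levels - 1).astype(int).ravel()
    cdf = np.cumsum(np.bincount(flat, minlength=levels)) / flat.size
    return (cdf[flat] * (levels - 1)).reshape(window.shape)
```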
25 The Preprocessing
26 Detector Neural Network It uses a multi-layer perceptron. There are three types of hidden units: four which look at 10 × 10 pixel subregions, 16 which look at 5 × 5 pixel subregions, and six which look at overlapping 20 × 5 pixel horizontal stripes of pixels. In particular, the horizontal stripes allow the hidden units to detect such features as mouths or pairs of eyes, while the hidden units with square receptive fields might detect features such as individual eyes, the nose, or corners of the mouth.
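To make the layout concrete, the 26 receptive fields could be enumerated roughly as below; the exact overlap of the horizontal stripes is an assumption, since the slide only says that they overlap.

```python
def detector_receptive_fields():
    """(row_slice, col_slice) pairs over the 20x20 window for the three
    kinds of hidden units described above (4 + 16 + 6 = 26 fields)."""
    fields = []
    for r in (0, 10):                       # four 10x10 subregions
        for c in (0, 10):
            fields.append((slice(r, r + 10), slice(c, c + 10)))
    for r in range(0, 20, 5):               # sixteen 5x5 subregions
        for c in range(0, 20, 5):
            fields.append((slice(r, r + 5), slice(c, c + 5)))
    for r in (0, 3, 6, 9, 12, 15):          # six overlapping 20x5 stripes
        fields.append((slice(r, r + 5), slice(0, 20)))
    return fields
```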
27 Training Technique The network is trained using backpropagation with momentum. The detectors have two sets of training examples: images which are faces, and images which are not. Training a neural network for the face detection task is challenging because of the difficulty in characterizing prototypical “non-face” images.
28 Generating the face image training set From each original image, training examples are generated by randomly rotating the image (about its center point) by up to 10º, scaling it between 90 percent and 110 percent, translating it by up to half a pixel, and mirroring it. The randomization gives the filter invariance to translations of less than a pixel, scalings of 20 percent, and rotations of up to 20º.
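A sketch of this per-image randomisation using scipy.ndimage; the parameter ranges come from the slide, while the interpolation modes and the use of SciPy are assumptions.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def randomized_face(face):
    """Apply the random rotation, scaling, sub-pixel translation, and
    mirroring described above to one aligned face image."""
    angle = rng.uniform(-10, 10)              # rotate up to 10 degrees
    scale = rng.uniform(0.9, 1.1)             # scale between 90% and 110%
    dy, dx = rng.uniform(-0.5, 0.5, size=2)   # translate up to half a pixel
    out = ndimage.rotate(face, angle, reshape=False, mode='nearest')
    out = ndimage.zoom(out, scale, mode='nearest')
    out = ndimage.shift(out, (dy, dx), mode='nearest')
    if rng.random() < 0.5:                    # mirror half of the examples
        out = out[:, ::-1]
    return out
```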
29 General non-face images Practically any image can serve as a nonface example because the space of nonface images is much larger than the space of face images. However, collecting a “representative” set of nonfaces is difficult.
30 A “bootstrap” training algorithm
1. Create an initial set of non-face images by generating 1000 random images.
2. Train the neural network to produce an output of +1.0 for the face examples and -1.0 for the non-face examples. In the first iteration, the network’s weights are initialized randomly. After the first iteration, we use the weights computed by training in the previous iteration as the starting point.
3. Run the system on an image of scenery which contains no faces. Collect subimages in which the network incorrectly identifies a face (an output activation > 0.0).
4. Select up to 250 of these subimages at random and add them into the training set as negative examples. Go to step 2.
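The loop could be skeletonised as below; `train_fn` and `run_fn` are hypothetical callables (the warm-start of weights between iterations is folded into `train_fn`), and the fixed number of rounds is an assumption, since the slide simply loops back to step 2.

```python
import numpy as np

def bootstrap_train(train_fn, run_fn, faces, scenery_images, rounds=5):
    """Skeleton of the bootstrap loop. `train_fn(pos, neg)` returns a trained
    network; `run_fn(net, image)` returns the false-positive subimages
    (output activation > 0.0) found in a face-free scenery image."""
    rng = np.random.default_rng(0)
    nonfaces = [rng.random((20, 20)) for _ in range(1000)]   # step 1
    net = None
    for _ in range(rounds):
        net = train_fn(faces, nonfaces)                      # step 2
        for scenery in scenery_images:                       # step 3
            false_pos = run_fn(net, scenery)
            idx = rng.permutation(len(false_pos))[:250]      # step 4
            nonfaces.extend(false_pos[i] for i in idx)
    return net
```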
31 An Example
32 An Example Result
33 Refinement The raw output from a single network will contain a number of false detections, so a strategy is needed to reduce the number of false positives. There are two ways to improve the reliability of the detector: cleaning up the outputs from an individual network, and arbitrating among multiple networks.
34 Clean-Up Heuristic Real faces are detected at nearby positions and scales, while false detections often occur with less consistency. This observation leads to a heuristic which can eliminate many false detections. If a particular location is correctly identified as a face, then all other detection locations which overlap it are likely to be errors and can therefore be eliminated. So we preserve the locations with the higher number of detections within a small neighborhood, and eliminate locations with fewer detections.
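A rough sketch of that heuristic at one scale: count nearby detections for each location, then keep well-supported locations and suppress the ones that overlap them. The neighbourhood radius and the minimum count are assumed values.

```python
def clean_up(detections, radius=5, min_count=2):
    """detections: list of (x, y) window centres reported by the network.
    Keep centres supported by several nearby detections and drop any
    location that overlaps an already accepted one."""
    def near(a, b):
        return abs(a[0] - b[0]) <= radius and abs(a[1] - b[1]) <= radius

    counts = [sum(near(d, other) for other in detections) for d in detections]
    order = sorted(range(len(detections)), key=lambda i: -counts[i])
    kept = []
    for i in order:
        if counts[i] >= min_count and all(not near(detections[i], k) for k in kept):
            kept.append(detections[i])
    return kept
```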
35 Illustration For Heuristic
36 The Result
37 Arbitration Among Multiple Networks To reduce the number of false positives, we can apply multiple networks and arbitrate between their outputs to produce the final decision. Each network is trained using the same algorithm with the same set of face examples, but with different random initial weights, different random initial non-face images, and different permutations of the order of presentation of the scenery images. The detection and false-positive rates of the individual networks will be quite close. However, because of the different training conditions and the self-selection of negative training examples, the networks will have different biases and will make different errors.
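The simplest arbitration rule, ANDing the thresholded outputs of the individual networks for each window, could look like this; the threshold of 0.0 follows the output convention used earlier, and other combination schemes are possible.

```python
def arbitrate_and(scores):
    """scores: the outputs of the independently trained detector networks
    for the same window, each in [-1, +1]. Report a face only if every
    network signals one (logical AND of the thresholded outputs)."""
    return all(score > 0.0 for score in scores)
```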
38 Arbitration Among Multiple Networks
40 Analysis of the Networks Since the output of the router network is used to derotate the input for the detector, the angular accuracy of the router must be compatible with the angular invariance of the detector. To measure the accuracy of the router, we generated test example images based on the training images, with angles between -30º and 30º at 1º increments. We applied the detector to the same set of test images as the router, and measured the fraction of faces which were correctly classified as a function of the angle of the face. Because 92% of the router’s angle errors fall between -10º and 10º, and the detector network detects about 90 percent of faces rotated between -10º and 10º, the two networks are compatible.
41 Empirical Results
Upright test set: a total of 130 images with 511 faces, of which 469 are within 10º of upright.
Rotated test set: 50 images containing 223 faces, of which 210 are at angles of more than 10º from upright.
42 Proposed System In the current system, the detector network is trained with scenery images fed directly to it. If we instead train the detector network with scenery images that have passed through the router network, the performance of the system increases.
43 Exhaustive Search of Orientations To demonstrate the effectiveness of the router for rotation invariant detection, we applied the two sets of detector networks described above without the router. The detectors were instead applied at 18 different orientations (in increments of 20º) for each image location.
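One way to realise this baseline is to derotate each window to all 18 candidate orientations and keep the best detector response; a sketch, with `detector` as a hypothetical callable returning a score in [-1, +1]:

```python
from scipy import ndimage

def exhaustive_orientations(window, detector, step=20):
    """Apply the detector at every orientation in 20-degree increments
    (18 orientations) and return the best response."""
    best = -1.0
    for angle in range(0, 360, step):
        rotated = ndimage.rotate(window, -angle, reshape=False, mode='nearest')
        best = max(best, detector(rotated))
    return best
```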
44 Upright Detection Accuracy To ensure that adding the capability to detect rotated faces has not come at the expense of losing accuracy on upright faces, we apply the upright face detector to the test set images.
45 Comparison Our new system has a slightly lower detection rate on upright faces for two reasons. First, the detector networks cannot recover from all the errors made by the router network. Second, the detector networks which are trained with derotated negative examples are more conservative in signalling detections; this is because the derotation process makes the negative examples look more like faces, which makes the classification problem harder.
48 Movie Examples
49 References
1. H. A. Rowley, S. Baluja, and T. Kanade, “Neural Network-Based Face Detection,” IEEE Trans. PAMI, vol. 20, pp. 23-38, Jan. 1998.
2. H. A. Rowley, S. Baluja, and T. Kanade, “Rotation Invariant Neural Network-Based Face Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 38-44, 1998.
3. H. A. Rowley, “Neural Network Face Detection,” PhD thesis, May 1999.
4. S. Baluja, “Face Detection with In-Plane Rotation: Early Concepts and Preliminary Results,” JPRC-1997-001-1, Justsystem Pittsburgh Research Center, 1997.
50 Any Questions?