Neural Networks Ensembles For Face Recognition


1 Neural Networks Ensembles For Face Recognition
B. Mohabbati, R. Ebrahimpour, S. Kasaei, E. Kabir
Department of Mathematics and Computer Science, Amirkabir University of Technology
School of Cognitive Sciences, Institute for Studies in Theoretical Physics and Mathematics (IPM)
Computer Engineering Department, Sharif University of Technology
Department of Electrical and Computer Engineering, Tarbiat Modarres University
Tehran, Iran

2 Outline
Face Recognition: What is face recognition? Its applications
Different approaches
Neural Networks approach
Combining classifiers
Experimental results
Conclusions

3 What is Face Recognition?
Face recognition works with the most obvious individual identifier: the human face. It is a set of two tasks:
Face identification: given a face image that belongs to a person in a database, tell whose image it is.
Face verification: given a face image that might not belong to the database, verify whether it is from the person it is claimed to be.
Biometrics: digital analysis, using cameras or scanners, of biological characteristics such as facial structure, fingerprints, and iris patterns in order to match profiles to databases of people.

4 Face recognition system
Applications
Applications in security: authentication, identification
Multimedia management
Smart cards
Surveillance: airports and railway stations, stadiums, public transportation, government offices, businesses of all kinds; a quick way to discover criminals
Figure: a face recognition system stores face information for each person in a fixed domain and, given an input image, returns the person whose face is in the input image.

5 Different Approaches
Feature-based approaches: geometric features, Euclidean distance, graph matching, template matching, Hidden Markov Model
Holistic approaches: Neural Networks, Eigenfaces (Principal Component Analysis, PCA), Fisherfaces (Linear Discriminant Analysis, LDA)
Neural computing provides technical information-processing methods that are similar to the way information is processed in biological systems, such as the human brain.

6 Biological inspirations
Some numbers…
The human brain contains about 10 billion nerve cells (neurons)
Each neuron is connected to the others through synapses
Properties of the brain
It can learn and reorganize itself from experience
It adapts to the environment
It is robust and fault tolerant

7 Biological neuron
A neuron has a branching input (the dendrites) and a branching output (the axon)
The information circulates from the dendrites to the axon via the cell body
The axon connects to dendrites via synapses
Synapses vary in strength
Synapses may be excitatory or inhibitory

8 What is an artificial neuron?
Definition: a nonlinear, parameterized function with a restricted output range, computing an output y from inputs x1, x2, …, xn through weights w0, w1, …, wn and an activation function
Activation functions: linear, logistic, …
The perceptron (Rosenblatt, 1962): linear separation; inputs are a vector of real values, the output is 1 or -1
The perceptron algorithm converges if the examples are linearly separable
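As a concrete illustration (not from the original slides), here is a minimal perceptron sketch in Python, assuming real-valued input vectors and +1/-1 targets; the function name and toy data are hypothetical:

```python
import numpy as np

def perceptron_train(X, t, epochs=100, lr=1.0):
    """Train a perceptron; X is (n_samples, n_features), t contains +1/-1 labels."""
    w = np.zeros(X.shape[1])   # weights w1..wn
    b = 0.0                    # bias term (w0)
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) + b > 0 else -1   # threshold activation
            if y != target:                          # update only on mistakes
                w += lr * target * x
                b += lr * target
    return w, b

# Hypothetical linearly separable toy data
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
t = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, t)
```

Because the toy data are linearly separable, the updates stop once every example is classified correctly, in line with the convergence statement above.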

9 Learning
Learning is the procedure of estimating the parameters of the neurons so that the whole network can perform a specific task.
Two types of learning: supervised and unsupervised.
Supervised learning: the desired response of the neural network for particular inputs is well known; a "professor" may provide examples and teach the neural network how to fulfil a certain task. The learning process: present the network a number of inputs and their corresponding outputs, see how closely the actual outputs match the desired ones, and modify the parameters to better approximate the desired outputs.
Unsupervised learning: the idea is to group typical input data according to resemblance criteria unknown a priori (data clustering). No professor is needed; the network itself finds the correlations between the data. Examples of such networks: Kohonen feature maps.

10 Multi-Layer Perceptron (MLP)
Network diagram: input data feeds the 1st hidden layer, then the 2nd hidden layer (one or more hidden layers with sigmoid activation functions), then the output layer, which separates Class 1 from Class 2.

11 Error Back Propagation
MLP learning method: error back-propagation
- Used to train the MLP
- Uses gradient descent to minimise the squared error between the actual and desired outputs
- The error is summed over all inputs
- The error surface can have local minima, which trap gradient descent
In the back-propagation algorithm, the error signal for the jth node depends on whether it is an output unit or a hidden unit (credit assignment). The learning rate controls the step size, and a momentum term smooths the weight changes over time.
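A minimal sketch (not from the slides) of the gradient-descent weight update with a momentum term; the names eta (learning rate), alpha (momentum), and the gradient array are hypothetical placeholders:

```python
import numpy as np

def update_weights(W, grad, prev_delta, eta=0.1, alpha=0.9):
    """One gradient-descent step with momentum.

    W          : current weight matrix
    grad       : dE/dW, gradient of the squared error w.r.t. the weights
    prev_delta : weight change applied at the previous step
    """
    delta = -eta * grad + alpha * prev_delta   # momentum smooths successive updates
    return W + delta, delta
```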

12 Back Propagation Algorithm
Training pairs: input pattern 1 with target output 1, input pattern 2 with target output 2, …, input pattern n with its target (targets coded as binary digits, e.g. 1000, 0100, …).
Algorithm steps:
1. Initialise the weights
2. Present an input and its target
3. Calculate the actual output given this input
4. From the last layer, work backwards, updating the weights
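To make the steps concrete, here is a minimal back-propagation sketch for a one-hidden-layer MLP with sigmoid units, written in Python with hypothetical toy data (not the ORL faces used later in the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))            # 20 hypothetical input patterns, 4 features
T = np.eye(2)[rng.integers(0, 2, 20)]   # one-hot targets, e.g. [1, 0] or [0, 1]

# Step 1: initialise the weights
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(8, 2))   # hidden -> output
eta = 0.5

for epoch in range(500):
    # Steps 2-3: present the inputs and compute the actual outputs
    H = sigmoid(X @ W1)          # hidden activations
    Y = sigmoid(H @ W2)          # output activations

    # Step 4: work backwards from the last layer (squared-error gradient)
    delta_out = (Y - T) * Y * (1 - Y)            # error signal at the output units
    delta_hid = (delta_out @ W2.T) * H * (1 - H) # credit assignment to hidden units

    W2 -= eta * H.T @ delta_out
    W1 -= eta * X.T @ delta_hid
```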

13 Properties of Neural Networks
Supervised networks are universal approximators.
Theorem: a Multi-Layer Perceptron (MLP) with only one hidden layer has the capability to act as a universal approximator (Hornik et al., 1989).
Types of approximators:
Linear approximators (e.g. polynomials): for a given precision, the number of parameters grows exponentially with the number of variables.
Non-linear approximators (neural networks): the number of parameters grows linearly with the number of variables.
Figure: a three-layer network can form arbitrary decision regions, with complexity limited only by the number of nodes.

14 Face recognition using Neural Network
Problems: work has been done on a very selective set of face images, mostly in an upright position, with controlled lighting and background, either in frontal or profile view, and with no occlusions or facial hair.
Requirements: accurate, efficient, light invariant, rotation invariant, and so on.
Standard face recognition architecture: image data (train/test) → domain transformation (preprocessing) → neural network model (classifier) → classified face result.

15 Preprocessing methods
Why preprocessing?
The curse of dimensionality: the quantity of training data needed grows exponentially with the dimension of the input space, but in practice we only have a limited quantity of input data, so increasing the dimensionality of the problem leads to a poor representation of the mapping.
Preprocessing has a huge impact on the performance of neural networks.
Preprocessing methods:
Normalization: inputs of the neural net are often of different types with different orders of magnitude (e.g. pressure, temperature, etc.); it is necessary to normalize the data so that they have the same impact on the model, i.e. to translate input values so that they can be exploited by the neural network.
Component reduction: sometimes the number of inputs is too large to be exploited, and reducing the number of inputs simplifies the construction of the model. The goal is a better representation of the data, giving a more synthetic view without losing relevant information. Reduction methods include PCA, CCA (Curvilinear Component Analysis), etc.
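For illustration (not part of the original slides), a minimal normalization sketch in Python; the zero-mean, unit-variance scaling shown here is one common choice, and the variable names are hypothetical:

```python
import numpy as np

def normalize(X, eps=1e-8):
    """Scale each input feature to zero mean and unit variance.

    X : (n_samples, n_features) array whose columns may have very
        different orders of magnitude (e.g. pressure vs. temperature).
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std

# Apply the training-set statistics to new data so both share the same scale:
# X_test_scaled = (X_test - mean) / (std + eps)
```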

16 Principal Components Analysis (PCA)
Principle
A linear projection method to reduce the number of parameters
Transforms a set of correlated variables into a new set of uncorrelated variables
Maps the data into a space of lower dimensionality
A form of unsupervised learning
Properties
It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
The new axes are orthogonal and represent the directions of maximum variability
Faces viewed as vectors: suppose a data set X with N data points, each of dimension P.

17 The PCA - Eigenface
Here we summarize the concept and method for finding the principal factors, for a training set of face vectors X1, X2, …, XN:
Average face: X̄ = (1/N) Σ Xi
Difference vectors (zero mean): XM = X − X̄
Covariance matrix: C = (1/N) Σ XM XMᵀ
Choose the i largest eigenvalues of C, where "i is the inherent dimensionality of the subspace governing the original image"; the corresponding eigenvectors are the first principal components (the eigenfaces).
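A minimal numpy sketch of these steps (not from the slides); it assumes the face images are already flattened into row vectors and uses the standard trick of eigendecomposing the small N×N matrix instead of the full P×P covariance matrix:

```python
import numpy as np

def eigenfaces(X, k):
    """Compute the first k eigenfaces.

    X : (N, P) matrix, one flattened face image per row.
    k : number of principal components to keep.
    """
    mean_face = X.mean(axis=0)            # average face
    A = X - mean_face                     # zero-mean difference vectors, (N, P)

    # Eigendecompose the small N x N matrix A A^T instead of the P x P covariance
    L = A @ A.T
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k] # keep the k largest eigenvalues

    # Map back to image space and normalise: each eigenface is A^T v
    U = A.T @ eigvecs[:, order]           # (P, k)
    U /= np.linalg.norm(U, axis=0)
    return mean_face, U

# Projection of a face onto the eigenface subspace:
# weights = (face - mean_face) @ U
```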

18 The PCA - Eigenface
Figure: a face image is reconstructed as a weighted sum of eigenfaces, e.g. face ≈ 0.9571 × eigenface 1 − 0.1945 × eigenface 2 + 0.0461 × eigenface 3 + 0.586 × eigenface 4 + …
Pipeline: training image set → image mean → adjusted (zero-mean) training image set → PCA → N eigenfaces → M-PCA projection coefficients.

19 Training the classifiers
Figure: training image set → image mean → adjusted (zero-mean) training set → PCA → N eigenfaces → M-PCA feature vectors, which are fed to the combining classifiers for training.

20 Neural Networks Ensembles
There are two main approaches to committee machines (combining classifiers, classifier fusion):
1. Static committee machines (ensembles)
2. Dynamic committee machines (modular)
Static committee machines (ensembles): the responses of the experts are combined without the combining mechanism seeing the input. Ensembles of classifiers are trained on different or similar data and using different or similar features; the classifiers are run simultaneously and their outputs are merged into one compound classification. The two main methods are ensemble averaging and boosting.
Figure: input x(n) feeds Expert 1 … Expert L, whose outputs y1(n) … yL(n) go to a combiner that produces the final output.
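As a hedged illustration (not the authors' exact setup), a static ensemble built from several MLPs that differ only in their random initial weights, with their output probabilities merged by averaging; the data, class count, and scikit-learn classifier choice are hypothetical:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # hypothetical M-PCA feature vectors
y = rng.integers(0, 4, size=100)          # hypothetical class labels

# Static ensemble: identical data, experts differ only in their initial weights
experts = [MLPClassifier(hidden_layer_sizes=(40,), max_iter=1000,
                         random_state=seed).fit(X, y) for seed in range(5)]

# Ensemble averaging: merge the simultaneous outputs into one compound classification
probs = np.mean([e.predict_proba(X) for e in experts], axis=0)
y_pred = probs.argmax(axis=1)
```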

21 Dynamic committee machines
Dynamic committee machines: the input signal is directly involved in combining the outputs, e.g. mixtures of experts and hierarchical mixtures of experts. A gating network decides the weighting of each expert network.
Figure: input x(n) feeds Expert 1 … Expert L and the gating network; the gating outputs g1(n) … gL(n) weight the expert outputs y1(n) … yL(n), which are summed to produce the final output.
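A minimal sketch of the dynamic combination (not from the slides): a softmax gating network produces input-dependent weights g1…gL that mix the experts' outputs. The toy experts, gate weights, and shapes here are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mixture_of_experts(x, expert_fns, gate_W):
    """Combine experts dynamically: the gate sees the input x.

    expert_fns : list of L functions, each mapping x -> class-score vector
    gate_W     : (L, dim_x) gating weights producing one score per expert
    """
    g = softmax(gate_W @ x)                        # g1(n) ... gL(n), sum to 1
    outputs = np.stack([f(x) for f in expert_fns]) # (L, n_classes)
    return g @ outputs                             # weighted sum of expert outputs

# Hypothetical toy experts and gate
experts = [lambda x: np.array([0.9, 0.1]),
           lambda x: np.array([0.2, 0.8])]
gate_W = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mixture_of_experts(np.array([2.0, 0.5]), experts, gate_W))
```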

22 Some of Ensemble Methods:
Majority voting: every voter has one vote that can be cast for any one candidate; the candidate that receives the majority (i.e. more than half) of the votes wins the election.
Average voting: each voter gives a score for each candidate; the candidate with the highest average score wins the election.
Product rule voting: each voter gives a confidence value for each candidate; all confidence values are multiplied per candidate, and the candidate with the highest confidence product wins.
Example (classifiers C1-C5 voting on classes W1 and W2); a code sketch of the three rules follows this table:

Classifier   Support for W1   Support for W2   Decision
C1           0.8              0.2              W1
C2           0.4              0.6              W2
C3           0.3              0.7              W2
C4
C5
MAJ
AVR          0.48             0.52             W2
PRO          0.2352

Figure: classifiers C1 … C5 feed a combiner that outputs W1 or W2.
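A small sketch (not from the slides) of the three combination rules applied to per-class support values. The first three rows reuse C1-C3 from the table above; the last two rows are made-up values for illustration only:

```python
import numpy as np

def combine(supports, rule):
    """supports: (n_classifiers, n_classes); returns the winning class index."""
    if rule == "majority":
        votes = supports.argmax(axis=1)                  # each classifier votes once
        return int(np.bincount(votes, minlength=supports.shape[1]).argmax())
    if rule == "average":
        return int(supports.mean(axis=0).argmax())       # highest average support wins
    if rule == "product":
        return int(supports.prod(axis=0).argmax())       # highest confidence product wins
    raise ValueError(rule)

S = np.array([[0.8, 0.2],   # C1
              [0.4, 0.6],   # C2
              [0.3, 0.7],   # C3
              [0.6, 0.4],   # hypothetical
              [0.3, 0.7]])  # hypothetical
for rule in ("majority", "average", "product"):
    print(rule, "->", "W1" if combine(S, rule) == 0 else "W2")
```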

23 Proposed method: Consider the ensemble network system shown in the figure. Here we see a set of M "level-0" networks, N1 to Nm, whose outputs are combined using a "level-1" network N*.
Training rule: the idea is to train the "level-0" networks first and then examine their behaviour when generalizing. This provides a new training set, which is used to train the "level-1" network.
Figure: M-PCA features feed the level-0 networks N1, N2, …, Nm, whose outputs y1, y2, …, ym feed the level-1 network N*, which produces the final output y.
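A minimal stacked-generalization sketch along these lines (a sketch under stated assumptions, not the authors' exact procedure): scikit-learn MLPs stand in for both levels, the data are hypothetical placeholders for the M-PCA features, and a held-out split is used so that level-1 learns from the level-0 networks' generalization behaviour:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 40))        # hypothetical M-PCA feature vectors
y = rng.integers(0, 40, size=400)     # hypothetical subject labels

# Hold out part of the data: level-1 must see level-0 outputs on unseen examples
X0, X1, y0, y1 = train_test_split(X, y, test_size=0.5, random_state=0)

# Level-0: train M networks on the first split
level0 = [MLPClassifier(hidden_layer_sizes=(80,), max_iter=1000,
                        random_state=seed).fit(X0, y0) for seed in range(5)]

# Their outputs on the second split form the new training set for level-1
Z1 = np.hstack([n.predict_proba(X1) for n in level0])
level1 = MLPClassifier(hidden_layer_sizes=(40,), max_iter=1000,
                       random_state=0).fit(Z1, y1)

def predict(x_new):
    z = np.hstack([n.predict_proba(x_new) for n in level0])
    return level1.predict(z)
```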

24 Experimental results and discussion
Face Database: We have used the ORL database, which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. There are 10 different images of each of 40 distinct subjects. There are variations in facial expression (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20 degrees. There is some variation in scale of up to about 10 percent. The images are grayscale with a resolution of 92×112 pixels.
Figure: the set of 10 images for one subject, in which considerable variation can be seen.

25 Experimental results and discussion
Step 1: training image set → image mean → adjusted (zero-mean) training images → PCA → N eigenfaces → M-PCA feature vectors.
Step 2: train the combining classifiers on the M-PCA feature vectors.
Step 3: testing image set → zero-mean testing images → M-PCA feature vectors → combining classifiers → classified faces.
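To tie the three steps together, a hedged end-to-end sketch using scikit-learn; the randomly generated stand-in images, the train/test split, and the ensemble-averaging combiner are hypothetical placeholders, not the authors' exact experimental setup:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in for the ORL faces: 400 flattened 92x112 images, 40 subjects
rng = np.random.default_rng(0)
faces = rng.random(size=(400, 92 * 112))
labels = np.repeat(np.arange(40), 10)

train_idx = np.arange(400) % 10 < 5          # e.g. 5 images per subject for training
X_train, y_train = faces[train_idx], labels[train_idx]
X_test, y_test = faces[~train_idx], labels[~train_idx]

# Step 1: zero-mean the images and project onto the first M eigenfaces
pca = PCA(n_components=40).fit(X_train)       # PCA subtracts the image mean itself
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# Step 2: train the (level-0) classifiers on the M-PCA features
nets = [MLPClassifier(hidden_layer_sizes=(80,), max_iter=1000,
                      random_state=s).fit(Z_train, y_train) for s in range(5)]

# Step 3: combine on the test features (here: simple ensemble averaging)
probs = np.mean([n.predict_proba(Z_test) for n in nets], axis=0)
print("accuracy:", (probs.argmax(axis=1) == y_test).mean())
```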

26 Different net topologies for the same input data
Percentage of correct classification on the test set (200 faces, 5 repetitions).

Table 1: different numbers of principal components.

Principal Components   Net Topology   Best (%)   Average (%)
1-25                   25:40:40       89         87.7
1-30                   30:80:40       88.5       87.9
1-35                   35:80:40       90         89.7
1-40                   40:40:40       89.5       88.9
1-50                   50:40:40       86.6
1-100                  100:60:40      86         84.1
1-200                  200:80:40      84         82.2

Table 2: different net topologies for the same input data.

Principal Components   Net Topology   Best (%)   Average (%)
1-40                   40:40:40       89.5       88.9
1-40                   40:20:40       83.8
1-40                   40:80:40       90.5       89.1
1-40                   40:100:40      89         87.6

27 Correct classification percentage of some manually selected committees

Committee (level-0 nets)   Averaging   Majority voting   Proposed method
40:20:40 (5 nets)          90.5        88.15             93.35
40:40:40 (5 nets)          88.35       88.5              95
40:80:40 (5 nets)          92          90                97.15

Figure: M-PCA features feed five level-0 networks MLP1 … MLP5, whose outputs y1 … y5 feed the level-1 network MLP*, producing the final output y.

28 Conclusion
Three main strategies for making independent individual classifiers:
1. Using different procedures, for example different kinds of classifiers, or different parameters in an identical procedure; e.g. different MLPs can be trained from different initial weights, learning parameters, or numbers of nodes.
2. Using different representations of the input (or feature sets); although this approach is practically effective in improving generalization, managing the procedure and analyzing it are more complicated (K.M. Ali and M.J. Pazzani, 1995).
3. Using an identical representation but different training sets.

29 Acknowledgments
I would like to thank Prof. Shahshahani for helpful comments and the Olivetti Research Laboratory for developing and maintaining the ORL database.

30 Thanks for your attention

