Université du Québec
École de technologie supérieure

Face Recognition in Video Using What-and-Where Fusion Neural Network

Mamoudou Barry and Eric Granger
Laboratoire d'imagerie, de vision et d'intelligence artificielle
École de technologie supérieure
Montreal, Canada

Overview
1. Introduction
2. What-and-Where fusion neural network
3. Experimental methodology
4. Results
5. Conclusion

1. Introduction: Challenges of video-based face recognition
- low quality and resolution of frames
- uncontrolled environments: variation in pose, orientation, expression, illumination, occlusion, etc.

1. Introduction: General system for face recognition in video

1. Introduction: State of the art
1. Methods based on static images
   - exploit a quality metric and recognize only high-quality regions of interest (ROIs)
2. Spatiotemporal approaches
   - track faces in the environment and recognize individuals over several samples

1. Introduction: Objectives
- Assess the effectiveness of the What-and-Where fusion neural network in video-based face recognition
- Achieve robust operation in uncontrolled environments

2. What-and-Where fusion neural network (Granger et al., 2001): Division of data streams
1. What data: intrinsic properties of a face (sent to the classifier)
2. Where data: contextual information (sent to the tracker)
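
The division into two streams can be illustrated with a minimal sketch. The `Detection` type, the scaling of pixels to [0, 1], and the use of the bounding-box centre as Where data are all assumptions made for illustration; the slides do not specify how either stream is encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical face detection: a grayscale ROI plus its bounding box."""
    roi: List[List[int]]              # pixel intensities in 0..255
    box: Tuple[int, int, int, int]    # (x, y, width, height) in frame coordinates


def what_data(det: Detection) -> List[float]:
    """Intrinsic appearance: ROI flattened and scaled to [0, 1] (for the classifier)."""
    return [pixel / 255.0 for row in det.roi for pixel in row]


def where_data(det: Detection) -> Tuple[float, float]:
    """Contextual information: centre of the bounding box (for the tracker)."""
    x, y, w, h = det.box
    return (x + w / 2.0, y + h / 2.0)


det = Detection(roi=[[0, 255], [51, 102]], box=(10, 20, 4, 6))
what = what_data(det)     # appearance vector, one value per ROI pixel
where = where_data(det)   # (12.0, 23.0)
print(what, where)
```

Keeping the two functions separate mirrors the slide: the classifier never sees position, and the tracker never sees appearance.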

2. What-and-Where fusion neural network
Tracker: bank of Kalman filters
- estimates the future position of faces in a scene
Classifier: fuzzy ARTMAP
- classifies faces detected in a scene
- neural network architecture capable of fast, stable, online, unsupervised or supervised, incremental learning, classification and prediction
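
To make the tracker's role concrete, here is a minimal sketch of one Kalman filter predicting the next position of a face along a single image axis. The constant-velocity motion model, the noise values, and the class name `ConstantVelocityKalman1D` are assumptions for illustration, not the configuration used in the presented system; a bank would hold one such filter per axis and per tracked face.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity]; only position is measured."""

    def __init__(self, position, velocity=0.0, p_var=100.0, q=0.01, r=1.0):
        self.x = [position, velocity]             # state estimate
        self.P = [[p_var, 0.0], [0.0, p_var]]     # state covariance
        self.q = q                                # process noise (assumed, diagonal)
        self.r = r                                # measurement noise variance (assumed)

    def predict(self, dt=1.0):
        """Project the state one frame ahead and return the predicted position."""
        p, v = self.x
        (p00, p01), (p10, p11) = self.P
        self.x = [p + dt * v, v]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                  # innovation variance
        k0, k1 = p00 / s, p10 / s         # Kalman gain
        y = z - self.x[0]                 # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P with H = [1, 0]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# A face moving 2 pixels per frame along x, observed in frames 1..19.
kf = ConstantVelocityKalman1D(position=0.0)
for frame in range(1, 20):
    kf.predict()
    kf.update(2.0 * frame)
predicted_next = kf.predict()   # estimate for frame 20, close to 40
print(round(predicted_next, 2))
```

The predicted position is what allows a new detection to be associated with an existing track before its classification is taken into account.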

2. What-and-Where fusion neural network: Evidence accumulation

2. What-and-Where fusion neural network: Sequential evidence accumulation
Fusion of responses from the classifier and the tracker
1. accumulation rule:
2. prediction of the recognition system:
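
The formulas for the two steps are not reproduced in this text, so the following sketch shows only one plausible form of sequential evidence accumulation, not the authors' rule: classifier scores are summed per track, and the system predicts the class with the largest accumulated evidence. The additive rule, the function name `accumulate`, and the labels and scores are all assumptions for illustration.

```python
from collections import defaultdict


def accumulate(evidence, track_id, class_scores):
    """Add this frame's classifier scores to the track's accumulators
    and return the class with the largest accumulated evidence."""
    acc = evidence[track_id]
    for label, score in class_scores.items():
        acc[label] += score
    return max(acc, key=acc.get)


# evidence[track_id][label] -> accumulated score
evidence = defaultdict(lambda: defaultdict(float))

# Hypothetical classifier responses for one tracked face over three frames.
frames = [
    (1, {"person_a": 0.9, "person_b": 0.1}),
    (1, {"person_a": 0.3, "person_b": 0.7}),   # isolated misclassification
    (1, {"person_a": 0.8, "person_b": 0.2}),
]

frame_wise = [max(scores, key=scores.get) for _, scores in frames]
fused = [accumulate(evidence, track, scores) for track, scores in frames]
print(frame_wise)   # decisions from the classifier alone
print(fused)        # decisions after accumulation along the track
```

In this toy run the classifier alone flips to `person_b` in the second frame, while the accumulated prediction stays on `person_a`, because the tracker ties all three responses to the same face.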

3. Experimental methodology: Data set (D. Gorodnichy, CNRC, 2005)
Video-based framework for face recognition in video
Task: recognize the user of a PC
- 11 individuals
- 2 video sequences per individual, one dedicated to training and the other to testing

3. Experimental methodology: Data set
Different scenarios: pose, expression, orientation, motion, proximity, resolution and partial occlusion.

3. Experimental methodology: Protocol for experiments
- train: train fuzzy ARTMAP with What data, using two training strategies
  - Hold-Out Validation (HV)
  - Particle Swarm Optimization (PSO) to optimize hyper-parameters (Granger et al., 2007)
- test: classify What data with fuzzy ARTMAP and track Where data with Kalman filters
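
As a rough illustration of the PSO training strategy, the sketch below runs a basic particle swarm over two hyper-parameters. It is a generic PSO, not the procedure of Granger et al. (2007): the inertia and acceleration constants, the swarm size, and the function `validation_error` are assumptions. In particular, `validation_error` is a hypothetical stand-in for "train fuzzy ARTMAP with these hyper-parameters and measure the error on the hold-out validation set".

```python
import random


def pso(objective, bounds, n_particles=20, n_iterations=50, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimise objective over the box given by bounds; return (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [objective(p) for p in pos]          # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]      # global best
    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                # Move the particle and clip it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def validation_error(params):
    """Hypothetical error surface standing in for a real train-and-validate run."""
    vigilance, learning_rate = params
    return (vigilance - 0.3) ** 2 + (learning_rate - 0.7) ** 2


best_params, best_error = pso(validation_error, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best_params, best_error)
```

Every call to the objective corresponds to one complete training run, which is why this strategy costs far more training epochs than a single hold-out run.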

3. Experimental methodology: Performance measures
- accuracy: average classification error (estimate of generalization error)
- resource requirements:
  - compression: average number of training patterns per category
  - convergence time: average number of epochs required to complete learning
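
The measures above can be written out as a short sketch. The function names and the toy numbers are illustrative assumptions; the averages reported in the results would be taken over repeated trials.

```python
def classification_error(true_labels, predicted_labels):
    """Fraction of test patterns assigned to the wrong class."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def compression(n_training_patterns, n_categories):
    """Number of training patterns per category created by the classifier."""
    return n_training_patterns / n_categories


def average(values):
    """Mean of a measure over independent trials."""
    return sum(values) / len(values)


# Toy example: 4 test patterns, one of which is misclassified.
error = classification_error(["a", "b", "a", "c"], ["a", "b", "c", "c"])

# Toy example: 200 training patterns summarised by 25 categories.
ratio = compression(200, 25)

# Convergence time averaged over three hypothetical trials (in epochs).
epochs = average([1, 1, 2])

print(error, ratio, epochs)
```

A higher compression means fewer stored categories for the same training data, and therefore lower memory and computation at test time.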

4. Results: Examples of face detections

4. Results: Average error and compression vs. ROI scaling size (with 100% of training data)

4. Results: Average error and compression vs. training subset size (with |ROI| = 10x10)

4. Results: Average convergence time
- fuzzy ARTMAP with HV: ~1 epoch
- fuzzy ARTMAP with PSO: ~543 epochs (60 particles x ~8.9 iterations x 1 epoch)
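
The PSO figure can be checked with a short calculation. Using the rounded factors as printed, the product comes to about 534 rather than 543; the slide does not explain the difference, which may come from rounding of the average iteration count or from a transcription slip, so both readings are shown here as assumptions.

```python
particles = 60
avg_iterations = 8.9          # rounded average, as printed on the slide
epochs_per_evaluation = 1

# Total epochs implied by the printed factors.
total_epochs = particles * avg_iterations * epochs_per_evaluation

# Average iteration count that would give the reported ~543 epochs.
implied_iterations = 543 / (particles * epochs_per_evaluation)

print(round(total_epochs))            # 534
print(round(implied_iterations, 2))   # 9.05
```

Either way, the PSO strategy costs on the order of 500 times more training epochs than hold-out validation.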

4. Results: Average confusion matrix

4. Results: Example of prediction errors over time

5. Conclusion
- The What-and-Where fusion neural network improves accuracy on complex video data (by about 50% over fuzzy ARTMAP alone and over k-NN).
- The system is less sensitive to noise: poor fuzzy ARTMAP predictions are attenuated.
- Optimizing the network's internal parameters with the PSO learning strategy improves the accuracy of the system.
- Fuzzy ARTMAP yields higher compression than k-NN, which makes it suitable for real-time and resource-limited applications.

Future work
- Explore different ARTMAP models to improve the classification rate.
- Explore other face representations (features) based on biological vision perception.
- Investigate more robust tracking algorithms, such as the extended Kalman filter and particle filters, for nonlinear tracking.