
Face Recognition in Video. Int. Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA '03), Guildford, UK, June 9-11, 2003. Dr. Dmitry Gorodnichy.



Presentation transcript:

1. Face Recognition in Video
Int. Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA '03), Guildford, UK, June 9-11, 2003
Dr. Dmitry Gorodnichy
Computational Video Group, Institute for Information Technology, National Research Council Canada
http://www.cv.iit.nrc.ca/~dmitry

2. What makes FR in video special?
Constraints:
- Real-time processing is required.
- Low resolution: 160x120 images or mpeg-decoded.
- Low quality: weak exposure, blurriness, cheap lenses.
Importance:
- Video is becoming ubiquitous. Cameras are everywhere.
- For security, computer-human interaction, video-conferencing, entertainment ...
Essence:
- It is inherently dynamic!
- It has parallels with biological vision!
NB: Living organisms also process very poor images*, yet they are very successful in tracking, detection and recognition.
* except for a very small area (the fovea)

3. Lessons from biological vision
- Images are of very low resolution except at the fixation point. The eyes look at points which attract visual attention.
- Saliency lies in: a) motion, b) colour, c) disparity, d) intensity. These channels are processed independently in the brain. (Think of a frog catching a fly, or a bull charging at a torero.)
- Intensity means: frequencies, orientation, gradient.
- The brain processes sequences of images rather than one image. Bad quality of images is compensated by the abundance of images.
- Animals and humans perceive colour non-linearly.
- Colour and motion are used for segmentation. Intensity is used for recognition.
- Bottom-up (image-driven) visual attention is very fast and precedes top-down (goal-driven) attention: 25 ms vs 1 s.
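The independent saliency channels above can be sketched in code. This is a minimal illustration, not the presentation's implementation: the per-channel formulas (frame difference for motion, red-green opponency for colour, deviation from mean intensity) and the winner-take-all combination are assumptions chosen for simplicity.

```python
import numpy as np

def saliency_channels(frame_prev, frame_curr):
    """Illustrative multi-channel saliency: motion, colour and intensity
    are computed independently, normalized, then combined by a per-pixel
    maximum (winner-take-all). Frames are HxWx3 arrays."""
    motion = np.abs(frame_curr.mean(axis=2) - frame_prev.mean(axis=2))
    colour = np.abs(frame_curr[..., 0].astype(float) - frame_curr[..., 1])
    inten = frame_curr.mean(axis=2)
    inten = np.abs(inten - inten.mean())  # deviation from mean intensity
    maps = []
    for m in (motion, colour, inten):
        rng = m.max() - m.min()
        maps.append((m - m.min()) / rng if rng > 0 else np.zeros_like(m))
    # each channel is normalized independently before combination,
    # mirroring the claim that the channels are processed separately
    return np.maximum.reduce(maps)
```

A pixel that changes between frames dominates the combined map even when the colour channel is silent, which is the point of keeping the channels separate.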

4. Localization first. Then recognition
Try to recognize the face at right. What about the next one? What did you do?
- First you detected face-looking regions.
- Then, if they were too small or badly oriented, you did nothing. Otherwise, you turned your face, right?
- ... to align your eyes with the eyes in the picture,
- ... since this was the coordinate system in which you stored the face.
This is what biological vision does:
- Localization (and tracking) of the object precedes its recognition.
- These tasks are performed by two different parts of the visual cortex.
So, why should computer vision not do the same?

5. These mesmerizing eyes
Did you notice that you started examining this slide by looking at the eyes (or circles) at left?
- Such pictures are sold commercially to capture infants' attention.
Now imagine that the eyes blinked ...
- For sure you would be looking at them!
No wonder animals and humans look at each other's eyes.
- This is apart from psychological reasons.
Eyes are the most salient features on a face. Besides, there are two of them, which creates a hypnotic effect (due to the fact that the saliency of a pixel just attended is inhibited, to avoid attending it again soon).
Finally, they are also the best (and the only) stable landmarks on a face which can be used as a reference. The intra-ocular distance (IOD) makes a very convenient unit of measurement!

6. Which part of the face is the most informative? What is the minimal size of a recognizable face?
1. By studying previous work: [CMU, MIT, UIUC, Fraunhofer, MERL, ...]
2. By examining averaged faces at sizes 9x9, 12x12, 16x16 and 24x24.
3. By computing the statistical relationship between face pixels in 1500 faces from the BioID Face Database: using the RGB colours, each point in this 576x576 array shows how frequently two pixels of the 24x24 face are darker than one another, brighter than one another, or the same (within a certain boundary). The presence of high-contrast RGB colours in the image indicates a strong relationship between the face pixels. The strongest relationship is observed for 24x24 images centered on the eyes, as shown on the next slide.
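The 576x576 pairwise-pixel statistic described in item 3 can be sketched as follows. This is a reconstruction from the slide's description, not the original code: the equality tolerance `tol` ("within a certain boundary") is an assumed value, and the three counts are stacked as an RGB-like array to echo the colour-coded plot.

```python
import numpy as np

def pixel_relation_map(faces, tol=8):
    """For a set of 24x24 faces, tally for every ordered pixel pair (i, j)
    how often pixel i is darker than, brighter than, or equal to pixel j
    (within `tol` grey levels). Returns a (576, 576, 3) count array."""
    flat = faces.reshape(len(faces), -1).astype(int)   # (N, 576)
    diff = flat[:, :, None] - flat[:, None, :]         # (N, 576, 576)
    darker = (diff < -tol).sum(axis=0)
    brighter = (diff > tol).sum(axis=0)
    same = len(faces) - darker - brighter              # remaining pairs
    return np.stack([darker, brighter, same], axis=-1)
```

High, consistent counts in the darker/brighter planes for a pair (i, j) are what the slide calls a "strong relationship" between those two face pixels.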

7. Anthropometrics of the face
Surprised by the binary nature of our faces? But it's true, as tested with 1500 faces from the BioID face database and in multiple experiments with perceptual user interfaces [Nouse'02, BlinkDet'03].
Do you also see that colour is not important for recognition?
- While for detection, it is.
[Figure: face crop annotated in IOD units; the 24-pixel-wide canonical crop spans 2 IOD.]

8. Canonical eye-centered face model
Size 24x24 is sufficient for face memorization and recognition, and is optimal for low-quality video and for fast processing.
Canonical face model suitable for on-line face memorization and recognition in video [Gorodnichy'03].
[Figure: canonical face crop annotated in IOD units; the 24-pixel crop spans 2 IOD.]
Procedure: after the eyes are located, the face is extracted from the video and resized to the canonical 24x24 form, in which it is memorized or recognized.
Compare: the canonical face model suitable for face recognition in documents [Identix'02].
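The extraction procedure above can be sketched as an eye-anchored similarity warp. The canonical eye positions (eyes placed half the width apart, one third from the top) are assumptions for illustration; the presentation only specifies resizing the eye-aligned face to 24x24. Nearest-neighbour sampling is used to keep the sketch dependency-free.

```python
import numpy as np

def canonical_face(image, left_eye, right_eye, size=24):
    """Warp a face to the canonical eye-centred size x size form.
    Eye coordinates are (x, y). The similarity transform maps the
    canonical eye positions onto the detected eyes, then each output
    pixel is inverse-mapped into the source image."""
    ex_l = np.array([size / 4.0, size / 3.0])      # assumed canonical left eye
    ex_r = np.array([3 * size / 4.0, size / 3.0])  # assumed canonical right eye
    le = np.array(left_eye, float)
    re = np.array(right_eye, float)
    src, dst = re - le, ex_r - ex_l
    scale = np.hypot(*src) / np.hypot(*dst)        # IOD ratio
    ang = np.arctan2(src[1], src[0]) - np.arctan2(dst[1], dst[0])
    c, s = np.cos(ang) * scale, np.sin(ang) * scale
    M = np.array([[c, -s], [s, c]])                # canonical -> image
    out = np.zeros((size, size), dtype=image.dtype)
    for y in range(size):
        for x in range(size):
            p = M @ (np.array([x, y], float) - ex_l) + le
            xi, yi = int(round(p[0])), int(round(p[1]))
            if 0 <= yi < image.shape[0] and 0 <= xi < image.shape[1]:
                out[y, x] = image[yi, xi]
    return out
```

After this warp, every memorized or probed face shares the same eye-defined coordinate system, which is what makes the 24x24 representations comparable.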

9. Face Processing Tasks
Hierarchy of face recognition tasks, and applicability of 160x120 video to each task according to face anthropometrics:
- Face Segmentation (FS): "Something yellow moves"
- Face Detection (FD): "It's a face"
- Face Tracking, crude (FT): "Let's follow it!"
- Face Localization, precise (FL): "It's at (x, y, z, angle)"
- Facial Event Recognition (FER): "S/he smiles, blinks"
- Face Classification (FC): "It's the face of a child"
- Face Memorization (FM): "Face unknown. Store it!"
- Face Identification (FI): "It's Mila!"

Face size            1/2 image   1/4 image   1/8 image   1/16 image
In pixels            80x80       40x40       20x20       10x10
Between eyes (IOD)   40          20          10          5
Eye size             20          10          5           2
Nose size            10          5           -           -
FS                   ✓           ✓           ✓           b
FD                   ✓           ✓           b           -
FT                   ✓           ✓           b           -
FL                   ✓           b           -           -
FER                  ✓           ✓           b           -
FC                   ✓           ✓           b           -
FM / FI              ✓           ✓           -           -

✓ = good, b = barely applicable, - = not good (tested with Perceptual User Interfaces)

10. Perceptual Vision Interfaces
Goal: To detect, track and recognize the face and facial movements of the user.
[Diagram: multi-channel video processing framework feeding a PUI monitor, with channels for colour calibration, face detection, face tracking (crude), nose tracking (precise, giving x, y, z and "click" events), blink detection (binary ON/OFF event), face classification, and face memorization / face identification (recognition or memorization; "Unknown User!").]

11. Recent Advances in PUI
1. Nouse™ (Use Nose as Mouse) face tracking
- based on tracking the rotation-invariant convex-shape nose feature [FGR'02]
- head-motion- and scale-invariant, with sub-pixel precision
"Nouse™ brings users with disabilities and video game fans one step closer to a more natural way of interacting hands-free with computers" (Silicon Valley North magazine, Jan 2002)
"It is a convincing demonstration of the potential uses of cameras as natural interfaces." (The Industrial Physicist, Feb 2003)
2. Eye blink detection in moving heads
- based on computing second-order change [Gorodnichy'03] and non-linear change detection [Durucan'01]
- currently used to enable people with brain injury [AAATE'03]
1 & 2: After each blink, the eye and nose positions are retrieved. If they form an equilateral triangle (i.e. the face is parallel to the image plane), then the face is extracted and recognized / memorized.
Figure 1. This logo of the Nouse™ Technology website is written by nose.
Figure 2. A camera tracks the point of each player's nose closest to the camera and links it to the red "bat" at the top (or bottom) of the table to return the computer ball across the "net." (The Industrial Physicist)
Figure 3. The commonly used first-order change (left image) has many pixels due to head motion (shown in the middle). Second-order change over frames t-2, t-1, t (right image) detects only the local change (a change in a change), making it possible to detect eye blinks in moving heads, which was previously not possible.
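The "change in a change" idea from Figure 3 can be sketched over three consecutive frames. This is one plausible reading of the slide, not the published method: the threshold value is an assumption, and the frames are taken as plain grey-level arrays.

```python
import numpy as np

def second_order_change(f0, f1, f2, thresh=25):
    """Second-order change over frames t-2 (f0), t-1 (f1), t (f2).
    First-order change |f1 - f0| fires on all head motion; differencing
    two successive first-order changes cancels the globally consistent
    motion response and keeps only the local change, e.g. an eye blink.
    Returns a boolean mask of locally changed pixels."""
    d1 = np.abs(f1.astype(int) - f0)   # first-order change t-2 -> t-1
    d2 = np.abs(f2.astype(int) - f1)   # first-order change t-1 -> t
    return np.abs(d2 - d1) > thresh    # change in the change
```

With a steadily moving head, d1 and d2 are similar almost everywhere, so their difference highlights only the blink region, which is the behaviour the slide credits for blink detection in moving heads.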

12. Recognition with Associative Memory
We use a Pseudo-Inverse Associative Memory for on-line memorization and storing of faces in video. The advantages of this memory over others, as well as the C++ code, are available from our website.
Main features:
- It stores binary patterns as attractors.
- The associativity is achieved by converging from any state to an attractor.
- Faces are made attractors by using the pseudo-inverse learning rule: C = V V+.
- Saturation of the network is avoided by using the desaturation technique [Gorodnichy'95]: C_ii = D * C_ii (0 < D < 1).
Converting a 24x24 face to a binary feature vector:
A) V_i = I_i - I_ave,
B) V_{i,j} = sign(I_i - I_j),
C) V_{i,j} = Viola(i, j, k, l),
D) V_{i,j} = Haar(i, j, k, l).
PINN website: www.cv.iit.nrc.ca/~dmitry/pinn
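The learning rule C = V V+ and the desaturation step from the slide can be sketched directly with numpy's pseudo-inverse. The C++ code on the PINN website is the reference implementation; this is only a compact illustration, and the desaturation factor value is an assumption within the stated range 0 < D < 1.

```python
import numpy as np

def train_pinn(patterns, desat=0.9):
    """Pseudo-inverse learning rule: C = V V+ makes the stored binary
    (+1/-1) patterns attractors of the network; the desaturation step
    C_ii <- D * C_ii (0 < D < 1) follows [Gorodnichy'95]."""
    V = np.array(patterns, float).T          # columns are patterns
    C = V @ np.linalg.pinv(V)                # C = V V+
    C[np.diag_indices_from(C)] *= desat      # desaturation of the diagonal
    return C

def recall(C, v, max_iter=50):
    """Associative recall: converge from any state to an attractor by
    iterating v <- sign(C v) until a fixed point is reached."""
    v = np.sign(v).astype(float)
    for _ in range(max_iter):
        nxt = np.sign(C @ v)
        nxt[nxt == 0] = 1                    # break ties toward +1
        if np.array_equal(nxt, v):
            break
        v = nxt
    return v
```

A face vector with a few flipped bits still lies in the basin of attraction of the stored pattern, so iteration restores the memorized face, which is what "converging from any state to an attractor" means on the slide.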

13. Summary & Demos
The face is detected:
- using motion at far range (non-linear change detection),
- using colour at close range (non-linear colour mapping to a perceptually uniform space),
then tracked until convenient for recognition (using blink detection and nose tracking), then localized and transformed to the canonical 24x24 representation, then recognized using the PINN associative memory trained on pixel differences.
In experiments: with 63 faces from the BioID database and 9 faces of our lab users (all of which are shown) stored, the system has no problem recognizing our users after a single blink (or several). In many cases, as a user involuntarily blinks, s/he is not even aware that his/her face is being memorized / recognized.
E.g. images retrieved from a blink (at left) are recognized as the right image.
More at www.perceptual-video.com/faceinvideo.html




