Towards building user seeing computers


Towards building user seeing computers
CRV'05 Workshop on Face Processing in Video, August 8-11, 2005, Victoria, BC, Canada
Gilles Bessens and Dmitry Gorodnichy
Computational Video Group, Institute for Information Technology, National Research Council Canada
http://synapse.vit.iit.nrc.ca

What it means "to see"

When humans lose the sense of touch or hearing, they can still communicate using vision. The same holds for computers: when information cannot be entered into a computer using hands or speech, vision could provide a solution, if only computers could see... Users with accessibility needs (e.g. residents of the SCO Ottawa Health Centre) would benefit the most, but other users would benefit too.

Seeing tasks:
1. Where - to see where the user is: {x, y, z, ...}
2. What - to see what the user is doing: {actions}
3. Who - to see who the user is: {names}

Our goal: to build systems that can do all three tasks.

[Figure: a perceptual user interface (PUI) monitor maps face position {x, y, z} and orientation {α, β, γ} to binary ON/OFF events and to recognition / memorization output ("Unknown User!").]

Wish-list and constraints

Users want computers to be able to:
1. Automatically detect and recognize a user:
   a) to load the user's personal Windows settings (e.g. font size, application window layout), which is very tedious work for users with disabilities;
   b) to find the range of the user's motion, to map it to the computer control coordinates.
2. Enable written communication: e.g. typing a message in an email client or internet browser.
3. Enable navigation in the Windows environment: selecting items from window menus and pushing buttons of Windows applications.
4. Detect visual cues from users (intentional blinks, mouth opening, repetitive or predefined motion patterns) for hands-free remote control:
   a) mouse-type "clicks";
   b) a vision-based lexicon;
   c) computer control commands: "go to next/last window", "copy/cut/paste", "start Editor", "save and quit".

But limitations should be acknowledged:
- computer limitations: the system should run in real time (>10 fps);
- user mobility limitations: users have a limited range of motion; besides, the camera's field of view and resolution are limited;
- environmental limitations: the environment (e.g. lighting) keeps changing.

To accommodate these constraints, we develop a state-transition machine that switches between the face detection, face recognition and face tracking modules.

Other needs:
a) the need for the missing feedback that the feeling of touch normally provides to users who hold a mouse;
b) the need for limited-motion-based cursor control and key entry.
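The state-transition machine mentioned above can be sketched as follows. This is only an illustration: the state names and transition predicates are assumptions, not the actual system's logic.

```python
# Illustrative sketch of a detection -> recognition -> tracking state machine
# that switches modules to meet real-time and environment constraints.
# State names and transition conditions are hypothetical.

DETECT, RECOGNIZE, TRACK = "detect", "recognize", "track"

def next_state(state, face_found, identity_known, track_lost):
    """Return the next module to run, given the current state and events."""
    if state == DETECT:
        # Stay in cheap detection until a face appears.
        return RECOGNIZE if face_found else DETECT
    if state == RECOGNIZE:
        # Once identified, hand over to the fast tracker.
        return TRACK if identity_known else RECOGNIZE
    if state == TRACK:
        # If the track is lost (lighting change, occlusion), start over.
        return DETECT if track_lost else TRACK
    raise ValueError(f"unknown state: {state}")
```

The point of such a machine is that only the cheapest module adequate for the current situation runs on each frame, which helps keep the system above 10 fps.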

Evolution of seeing computers
- 1998: Proof-of-concept colour-based skin tracking [Bradski'98] - not precise.
- 2001: Motion-based segmentation & localization - not precise.
- 1999-2002: Several skin colour models developed - reached their limits.
- 2001: Rapid face detection using rectangular wavelets of intensities [fg02].
- 2002: Sub-pixel-accuracy convex-shape nose tracking [Nouse™, fg02, ivc04].
- 2002: Stereo face tracking using projective vision [w. Roth, ivc04].
- 2003: Second-order change detection [Double-blink, ivc04].
- 2003-now: Neuro-biological recognition of low-resolution faces [avbpa05, fpiv04, fpiv05].

Figure: Typical results for face detection using the colour, motion and intensity components of video, with six different webcams.

Nouse™: "Nose as Mouse"

Good news: the precision and convenience of tracking the convex-shape nose feature allow one to use the nose as a mouse (or as a joystick handle).

Tracking pipeline:
- image → motion, colour, edges, Haar wavelets → nose search box: (x, y, width, height);
- convex-shape template matching → nose tip detection: (I, J) (pixel precision);
- integration over continuous intensity → (X, Y) (sub-pixel precision).

[Press coverage: rated by Planeta Digital (Aug. 2003); newspaper image © S.A. LA NACION 2003.]
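The last pipeline step, refining a pixel-precision detection to sub-pixel coordinates by integrating over continuous intensity, can be illustrated with an intensity-weighted centroid over the detected neighbourhood. This is a generic sketch, not the actual Nouse™ computation.

```python
def subpixel_peak(patch):
    """Refine a pixel-precision detection to sub-pixel precision.

    `patch` is a small 2-D list of intensities centred on the
    pixel-precision nose-tip detection (I, J).  The intensity-weighted
    centroid gives continuous coordinates (X, Y) within the patch.
    """
    total = sum(sum(row) for row in patch)
    # Weight each column index j / row index i by its intensity.
    x = sum(v * j for row in patch for j, v in enumerate(row)) / total
    y = sum(v * i for i, row in enumerate(patch) for v in row) / total
    return x, y
```

For a symmetric intensity bump the centroid coincides with the patch centre; an asymmetric bump shifts the estimate by a fraction of a pixel, which is what makes nose-driven cursor control smooth.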

Main face recognition challenge

Image-based biometric modalities: an ICAO-conformant passport photograph (presently used for forensic identification) versus images obtained from surveillance cameras (e.g. of the 9/11 hijackers) and from TV. NB: VCD resolution is 320x240 pixels.

[Figure: face recognition performance.]

Keys to resolving the FRiV problem

Nominal face resolution: 12 pixels between the eyes should be sufficient.

To beat low resolution and quality, use lessons from the human vision system:
1) efficient visual attention mechanisms;
2) decisions based on accumulating results over several frames (rather than on one frame);
3) efficient neuro-associative mechanisms:
   a) to accumulate learning data over time by adjusting synapses, and
   b) to associate a visual stimulus with a semantic meaning based on the computed synaptic values,
   using non-linear processing, massively distributed collective decision making, and synaptic plasticity.

Lessons from biological vision
- Saliency-based localization and rectification - implemented.
- Fovea vision: accumulation over time and space - implemented.
- Local brightness adjustment - implemented.
- The recognition decision at time t depends on the recognition decision at time t-1 - implemented.

Lessons from biological memory

The brain stores information in the synapses connecting its neurons. The brain contains 10^10 to 10^13 interconnected neurons. A neuron is either at rest or activated, Yi = {+1, -1}, depending on the values of the other neurons Yj and the strengths of the synaptic connections.

The brain is thus a network of "binary" neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state - an attractor. What we remember are attractors! This is the associative principle we all live by.

Refs: Hebb'49, Little'74,'78, Willshaw'71. Implemented?..
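The attractor dynamics described above can be sketched with a tiny Hopfield-style binary network. This is a textbook illustration of the principle, not the authors' implementation; the network size and the stored pattern are arbitrary.

```python
def recall(C, state, max_steps=20):
    """Evolve a network of binary (+1/-1) neurons from an initial stimulus
    until it settles into a stable state -- an attractor.

    C is the synaptic weight matrix; `state` is the initial stimulus
    (e.g. a corrupted version of a memorized pattern).
    """
    for _ in range(max_steps):
        # Each neuron takes the sign of its weighted input from all others.
        new = [1 if sum(C[i][j] * state[j] for j in range(len(state))) >= 0
               else -1
               for i in range(len(state))]
        if new == state:
            return new  # stable: an attractor has been reached
        state = new
    return state
```

Storing a pattern by Hebbian outer product makes that pattern an attractor, so a noisy stimulus converges back to the memorized one, which is exactly the "what we remember are attractors" principle.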

From visual image to saying a name

From the neuro-biological perspective, memorization and recognition are two stages of one associative process: from receptor stimulus R to effector stimulus E, both in the brain and in the computer.

Main associative principle: a stimulus neuron Xi = {+1 or -1} is connected to a response neuron Yj = {+1 or -1} (e.g. the one for "Dmitry") with synaptic strength -1 < Cij < +1.

How to update the weights

Learning rules range from the biologically plausible to the mathematically justifiable. Models of learning:
- Hebb (correlation learning): dCij = (1/N) Xi Yj;
- generalized Hebb rules, and better rules built on them;
- Widrow-Hoff's (delta) rule: dCij = a (Yj - Sj) Xi, where Yj is the desired response and Sj the actual postsynaptic potential;
- projection (pseudo-inverse) learning, which is both incremental and takes into account the relevance of the training stimuli and their attributes.

Refs: Amari'71,'77, Kohonen'72, Personnaz'85, Kanter-Sompolinsky'86, Gorodnichy'95-'99.
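The two rules with standard closed forms can be sketched as follows. These are the textbook Hebb and Widrow-Hoff updates, shown for illustration; the learning rates and matrix shapes are arbitrary choices, not the authors' settings.

```python
def hebb_update(C, X, Y):
    """Correlation (Hebbian) learning: strengthen the synapse C[i][j]
    whenever pre- and post-synaptic neurons X[i], Y[j] fire together."""
    n = len(X)
    for i in range(n):
        for j in range(len(Y)):
            C[i][j] += X[i] * Y[j] / n
    return C

def delta_update(C, X, target, lr=0.5):
    """Widrow-Hoff (delta) rule: error-driven and incremental -- the
    weights move only as far as needed to reduce the response error."""
    n = len(X)
    # Actual binary response of each output neuron to stimulus X.
    actual = [1 if sum(C[i][j] * X[i] for i in range(n)) >= 0 else -1
              for j in range(len(target))]
    for i in range(n):
        for j in range(len(target)):
            C[i][j] += lr * (target[j] - actual[j]) * X[i] / n
    return C
```

The practical difference is the one the slide points at: Hebb keeps accumulating correlations regardless of performance, while the delta rule only adjusts synapses when the network's answer is still wrong.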

Testing the FRiV framework
- TV program annotation.
- IIT-NRC 160x120 video-based facial database (one video clip to memorize, another to recognize).

From video input to neural output

Pipeline:
1. Face-looking regions are detected using rapid classifiers.
2. They are verified to have skin colour and not to be static.
3. Face rotation is detected and corrected; an eye-aligned face, resampled to 12-pixels-between-the-eyes resolution, is extracted.
4. The extracted face is converted to a binary feature vector (Receptor).
5. This vector is then appended with a name-tag vector (Effector).
6. The synapses of the associative neural network are updated.

Time-weighted decision modes:
a) neural mode: all neurons with a postsynaptic potential (PSP) greater than a certain threshold, Sj > S0, are considered "winning";
b) max mode: the neuron with the maximal PSP wins;
c) time-filtered: the average or median of several consecutive frame decisions, each made according to a) or b), is used;
d) PSP time-filtered: the technique of a) or b) is applied to PSPs averaged over several consecutive frames, instead of to the PSPs of individual frames;
e) any combination of the above.

Scoring (for a pair of video clips, the 1st memorized and the 2nd recognized):
- S10: the number of frames in the 2nd clip in which the face is associated with the correct person (the one seen in the 1st clip) and with no other seen person - the best (non-hesitant) case;
- S11: ... in which the face is associated with several individuals, one of which is the correct one - a good (hesitating) case;
- S01: ... in which the face is associated with someone else - the worst case;
- S02: ... in which the face is associated with several individuals, none of which is correct - a wrong but hesitating case;
- S00: ... in which the face is not associated with any of the seen faces - a not-bad case.
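The decision modes above can be sketched directly. The threshold, window size and PSP values below are illustrative, not the system's actual parameters.

```python
from collections import Counter

def neural_mode(psp, threshold):
    """Mode (a): every neuron whose PSP exceeds the threshold wins."""
    return [j for j, s in enumerate(psp) if s > threshold]

def max_mode(psp):
    """Mode (b): the neuron with the maximal PSP wins."""
    return max(range(len(psp)), key=lambda j: psp[j])

def time_filtered(frame_psps):
    """Mode (c): per-frame max-mode decisions, then the majority vote
    over several consecutive frames."""
    decisions = [max_mode(p) for p in frame_psps]
    return Counter(decisions).most_common(1)[0][0]

def psp_time_filtered(frame_psps):
    """Mode (d): average the PSPs over consecutive frames first,
    then decide once on the averaged PSPs."""
    n = len(frame_psps)
    avg = [sum(p[j] for p in frame_psps) / n
           for j in range(len(frame_psps[0]))]
    return max_mode(avg)
```

Modes (c) and (d) can disagree: a neuron that never wins a single frame may still have the highest average PSP, which is why accumulating evidence over frames is more robust than per-frame voting for low-resolution faces.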

Perceptual Vision Interface Nouse™: combining the results

Nouse™ has evolved from a single demo program into a hands-free perceptual operating system. It combines all the techniques presented and provides a clear vision for other to-be-developed seeing computers. It requires more man-power for tuning and software design, contingent upon extra funding...

[Flowchart: Nouse connected → user's face detected → user recognized → user's motion range obtained → Nouse zero position (0,0) set (Nouse initialization and calibration) → face position converted to (X, Y) (used for typing and cursor control) → visual pattern analyzed (for hands-free commands).]